IDTAXA Classify Functions - Frequently Asked Questions:
Yes, please install DECIPHER and then look at the code page.
Training sets for functional classification were derived from the KEGG database. The complete KEGG All training set is subsampled by lineage to have up to 100 representatives per KEGG Orthology group, while the lineage-specific KEGG subsets include representatives of all sequences clustered at ≥ 90% sequence identity. The lineage-specific subsets therefore offer higher-resolution classifications. The subsampled KEGG All training set includes lineage information, so it can be used to identify genome contamination (e.g., prokaryotic genes in a eukaryotic genome).
It is best to choose the training set that is most specific to the taxon from which the input sequences originated. For example, if the input sequences are from E. coli, then it is best to choose the Gammaproteobacteria training set rather than the Proteobacteria [Other] or KEGG All training sets. However, if the input sequences originated from multiple taxonomic groups, then the only option is the KEGG All training set.
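The selection rule above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not part of DECIPHER or IDTAXA: the `TRAINING_SETS` table, the lineage tuples, and both function names are hypothetical, and only the three training set names are taken from the text above.

```python
from typing import Iterable, Sequence

# Hypothetical table: training set name -> lineage prefix it covers.
# The lineage prefixes are assumptions made for this illustration.
TRAINING_SETS = {
    "Gammaproteobacteria": ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    "Proteobacteria [Other]": ("Bacteria", "Proteobacteria"),
    "KEGG All": (),
}


def most_specific_set(lineage: Sequence[str]) -> str:
    """Return the training set whose lineage prefix matches most deeply."""
    best, best_depth = "KEGG All", 0
    for name, prefix in TRAINING_SETS.items():
        depth = len(prefix)
        if depth > best_depth and tuple(lineage[:depth]) == prefix:
            best, best_depth = name, depth
    return best


def choose_training_set(lineages: Iterable[Sequence[str]]) -> str:
    """Pick one training set for a batch of input sequences.

    If every source organism maps to the same specific set, use it;
    otherwise fall back to the all-lineage set.
    """
    choices = {most_specific_set(lineage) for lineage in lineages}
    return choices.pop() if len(choices) == 1 else "KEGG All"


# Illustrative lineages (assumed, abbreviated).
e_coli = ("Bacteria", "Proteobacteria", "Gammaproteobacteria", "Escherichia")
human = ("Eukaryota", "Metazoa", "Chordata", "Homo")

print(choose_training_set([e_coli]))         # single taxon
print(choose_training_set([e_coli, human]))  # mixed taxa
```

With these assumed lineages, the E. coli input alone maps to the Gammaproteobacteria set, while the mixed input falls back to KEGG All, matching the guidance above.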