IDTAXA Classify Functions - Inputs:
Please select the training set that you wish to use for classification. KEGG lineage-specific training sets will generally yield higher-confidence results than the subsampled KEGG training set. However, the subsampled KEGG training set includes lineage information, which can be useful for identifying prokaryotic contaminants in eukaryotic genomes.
Select a minimum confidence threshold for classifications. We recommend using a confidence of 50% (very high) or 40% (high) for amino acid training sets such as KEGG.
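To illustrate what the threshold does, here is a minimal Python sketch. It is not the tool's implementation. It assumes the classifier reports one confidence value per level of the assigned lineage, and that a classification is kept only down to the last level that meets the threshold. The level names, the confidence values, and the function truncate_at_threshold are hypothetical.

```python
# Hypothetical per-level confidences (percent) for one classified sequence.
# Names and values are made up for illustration; they are not real output.
lineage = [
    ("Root", 92.0),
    ("group_A", 81.5),
    ("subgroup_A1", 63.2),
    ("family_A1a", 47.8),
    ("function_X", 31.4),
]


def truncate_at_threshold(lineage, threshold):
    """Keep levels from the root down until confidence first falls below threshold."""
    kept = []
    for name, confidence in lineage:
        if confidence < threshold:
            break  # everything below this level is left unclassified
        kept.append(name)
    return kept


at_50 = truncate_at_threshold(lineage, 50.0)  # stricter threshold
at_40 = truncate_at_threshold(lineage, 40.0)  # more permissive threshold
print(at_50)
print(at_40)
```

With these made-up values, the 50% threshold keeps three levels and the 40% threshold keeps four. A lower threshold gives deeper but less certain classifications.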
Choose a text file containing the sequence records that you wish to classify. An example input file containing coding sequences from Chlamydia trachomatis can be downloaded here. Some general remarks about input files: