DECIPHER - IDTAXA Classify Functions Inputs

IDTAXA Classify Functions - Inputs:

Training set:
Please select the training set that you wish to use for classification. KEGG lineage-specific training sets will generally give higher-confidence results than the subsampled KEGG training set. However, the subsampled KEGG training set includes lineage information, which can be useful for identifying prokaryotic contaminants in eukaryotic genomes.
Confidence level:
Select a minimum confidence threshold for classifications. We recommend using a confidence of 50% (very high) or 40% (high) for amino acid training sets such as KEGG.
FASTA File:
Choose a text file containing the sequence records that you wish to classify. An example input file containing coding sequences from Chlamydia trachomatis can be downloaded here. Some general remarks about input files:
- Sequences must be in FASTA format, where each sequence record begins with a single line that starts with a ">" symbol and contains the description, and the subsequent lines contain the sequence itself.
- Sequences can be either amino acids or nucleotides. Nucleotides must represent coding sequences that are directly translatable with the standard genetic code.
- Sequences are expected to be full-length (i.e., the complete gene or protein) when using amino acid training sets such as KEGG. If you wish to classify partial-length sequences, please use the stand-alone DECIPHER software.
- The uploaded file must be smaller than 100 MB.
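
The remarks above can be checked locally before uploading. The following is a minimal Python sketch, not part of DECIPHER or the IDTAXA web tool: the function names (read_fasta, check_records, check_file) are hypothetical, "100 MB" is assumed to mean 100,000,000 bytes, and the test for nucleotide input (only the letters A, C, G, T, U, N, with a length that is a multiple of 3) is a rough heuristic for "directly translatable", not the check the server performs.

```python
import os

# Assumption: "100 MB" is read as 100,000,000 bytes; the server may count differently.
MAX_BYTES = 100 * 1000 * 1000


def read_fasta(lines):
    """Yield (description, sequence) pairs from an iterable of text lines."""
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_nucleotides(seq):
    """Heuristic only: True if the sequence uses nothing but A, C, G, T, U, N."""
    return len(seq) > 0 and set(seq.upper()) <= set("ACGTUN")


def check_records(lines):
    """Return a list of human-readable problems found in the FASTA lines."""
    problems = []
    count = 0
    for header, seq in read_fasta(lines):
        count += 1
        if not seq:
            problems.append(f"{header}: empty sequence")
        elif looks_like_nucleotides(seq) and len(seq) % 3 != 0:
            # A coding sequence should consist of whole codons.
            problems.append(
                f"{header}: nucleotide length {len(seq)} is not a multiple of 3"
            )
    if count == 0:
        problems.append("no FASTA records found")
    return problems


def check_file(path):
    """Check the file size first, then the records inside the file."""
    if os.path.getsize(path) >= MAX_BYTES:
        return [f"{path}: file is not smaller than 100 MB"]
    with open(path) as handle:
        return check_records(handle)
```

Calling check_file("my_sequences.fasta") (a placeholder file name) returns an empty list when no problems are found. A passing result only means the file meets the basic format remarks above; it does not confirm that the sequences are full-length.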