IDTAXA – Classify Organisms
This short example describes how to use IDTAXA to classify organisms using nucleotide sequences, as described in:
A Murali et al. (2018) "IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences." Microbiome, doi:10.1186/s40168-018-0521-5.
How do I perform taxonomic classification?
First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing your sequences (e.g., "~/mySeqs.fas"). Trained classifiers for different marker gene (nucleotide) sequences are available on the Downloads page.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> seqs <- readDNAStringSet(fas) # or readRNAStringSet
>
> # remove any gaps (if needed)
> seqs <- RemoveGaps(seqs)
>
> # for help, see the IdTaxa help page (optional)
> ?IdTaxa
>
> # load a training set object (trainingSet)
> # see http://DECIPHER.codes/Downloads.html
> load("<<REPLACE WITH PATH TO RData file>>")
>
> # classify the sequences
> ids <- IdTaxa(seqs,
+ trainingSet,
+ strand="both", # or "top" if same as trainingSet
+ threshold=60, # 60 (cautious) or 50 (sensible)
+ processors=NULL) # use all available processors
|============================================| 100%
Time difference of 135.83 secs
>
> # look at the results
> print(ids)
  A test set of class 'Taxa' with length 1000
       confidence name                 taxon
   [1]      78.0% ENA|OBRS01158965|... Root; Bacter...
   [2]      44.7% ENA|OBRS01551965|... Root; unclas...
   [3]      74.8% ENA|OBRS01920881|... Root; Bacter...
   [4]      15.9% ENA|OBRS01851995|... Root; unclas...
   [5]      19.7% ENA|OBRS01760119|... Root; unclas...
   ...        ... ...                  ...
 [996]      54.0% ENA|OBRS01119407|... Root; unclas...
 [997]      56.0% ENA|OBRS01447422|... Root; unclas...
 [998]      51.5% ENA|OBRS01883532|... Root; unclas...
 [999]      64.7% ENA|OBRS01350537|... Root; Bacter...
[1000]      47.5% ENA|OBRS01488581|... Root; unclas...
> plot(ids)
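
Beyond printing and plotting, the classifications can be pulled out of the result as plain strings for export or filtering. The following is a minimal sketch, assuming the object returned by IdTaxa is a list with one element per sequence, each holding parallel "taxon" and "confidence" vectors. The object "ids_demo" is a hypothetical stand-in with made-up values, used only so the snippet runs without DECIPHER or a training set.

```r
# Hypothetical stand-in for the 'ids' object returned by IdTaxa():
# one list element per sequence, each with parallel 'taxon' and
# 'confidence' vectors. All names and values below are made up.
ids_demo <- list(
  seqA = list(taxon = c("Root", "Bacteria", "Firmicutes"),
              confidence = c(90.1, 85.2, 78.0)),
  seqB = list(taxon = c("Root", "unclassified_Root"),
              confidence = c(44.7, 44.7))
)

# collapse each lineage into a single semicolon-delimited string
assignment <- vapply(ids_demo,
                     function(x) paste(x$taxon, collapse = "; "),
                     character(1))

# confidence at the deepest assigned level of each sequence
confidence <- vapply(ids_demo,
                     function(x) x$confidence[length(x$confidence)],
                     numeric(1))

# one row per sequence, ready for write.csv() or further filtering
results <- data.frame(name = names(ids_demo),
                      assignment = assignment,
                      confidence = confidence,
                      row.names = NULL,
                      stringsAsFactors = FALSE)
print(results)
```

With a real classification, replacing "ids_demo" with "ids" should give one row per input sequence. Whether the element names match the FASTA headers depends on the names carried by "seqs".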