IDTAXA – Classify Organisms
This short example describes how to use IDTAXA to classify organisms using nucleotide sequences, as described in:
A Murali et al. (2018) "IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences." Microbiome, doi:10.1186/s40168-018-0521-5.
Instructions
First, install DECIPHER and load the library in R. Next, set the "fas" variable to the path of the FASTA file containing the sequences to classify (e.g., "~/mySeqs.fas"). Trained classifiers for different marker gene (nucleotide) sequences can be found on the Downloads page.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> seqs <- readDNAStringSet(fas) # or readRNAStringSet
>
> # remove any gaps (if needed)
> seqs <- RemoveGaps(seqs)
>
> # for help, see the IdTaxa help page (optional)
> ?IdTaxa
>
> # load a training set object (trainingSet)
> # see http://DECIPHER.codes/Downloads.html
> load("<<REPLACE WITH PATH TO RData file>>")
>
> # classify the sequences
> ids <- IdTaxa(seqs,
+ trainingSet,
+ strand="both", # or "top" if same as trainingSet
+ threshold=60, # 60 (cautious) or 50 (sensible)
+ processors=NULL) # use all available processors
|============================================| 100%
Time difference of 135.83 secs
>
> # look at the results
> print(ids)
  A test set of class 'Taxa' with length 1000
         confidence name                 taxon
     [1]      78.0% ENA|OBRS01158965|... Root; Bacter...
     [2]      44.7% ENA|OBRS01551965|... Root; unclas...
     [3]      74.8% ENA|OBRS01920881|... Root; Bacter...
     [4]      15.9% ENA|OBRS01851995|... Root; unclas...
     [5]      19.7% ENA|OBRS01760119|... Root; unclas...
     ...        ... ...                  ...
   [996]      54.0% ENA|OBRS01119407|... Root; unclas...
   [997]      56.0% ENA|OBRS01447422|... Root; unclas...
   [998]      51.5% ENA|OBRS01883532|... Root; unclas...
   [999]      64.7% ENA|OBRS01350537|... Root; Bacter...
  [1000]      47.5% ENA|OBRS01488581|... Root; unclas...
> plot(ids)
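
The printed summary truncates both the sequence names and the lineages, so it can help to flatten the classifications into a table. The following is a minimal sketch, assuming each element of "ids" is a list with "taxon" and "confidence" components, as described on the IdTaxa help page. So that it runs without a training set, it uses a hypothetical stand-in object (ids_mock) with invented names and values; with real output, replace ids_mock with ids.

```r
# Hypothetical stand-in for the object returned by IdTaxa():
# one list element per sequence, each holding a lineage ('taxon')
# and one confidence per rank. All values here are invented.
ids_mock <- list(
  seqA = list(taxon = c("Root", "Bacteria", "Firmicutes"),
              confidence = c(100, 81.2, 78.0)),
  seqB = list(taxon = c("Root", "unclassified_Root"),
              confidence = c(100, 44.7))
)

# collapse each lineage into a single delimited string
assignment <- vapply(ids_mock,
                     function(x) paste(x$taxon, collapse = "; "),
                     character(1),
                     USE.NAMES = FALSE)

# take the confidence of the last (lowest) assigned rank
confidence <- vapply(ids_mock,
                     function(x) x$confidence[length(x$confidence)],
                     numeric(1),
                     USE.NAMES = FALSE)

# combine into one row per sequence
results <- data.frame(name = names(ids_mock),
                      confidence = confidence,
                      assignment = assignment,
                      stringsAsFactors = FALSE)
print(results)
```

The resulting data frame can be written to a file with write.csv(results, "classifications.csv", row.names = FALSE), where the file name is only an example.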