IDTAXA – Classify Functions
This short example describes how to use IDTAXA to annotate the functions of protein sequences, as described in:
NP Cooley & ES Wright (2021) "Accurate annotation of protein coding sequences with IDTAXA." NAR Genomics and Bioinformatics, doi:10.1093/nargab/lqab080.
How do I assign functions to sequences?
First, install DECIPHER and load the library in R. Second, set the "fas" variable to the path to the FASTA file of protein sequences (e.g., "~/mySeqs.fas"). Third, download a trained classifier for protein sequences from the Downloads page and load it as shown below.
> # load the DECIPHER library in R
> library(DECIPHER)
>
> # specify the path to the FASTA file (in quotes)
> fas <- "<<REPLACE WITH PATH TO FASTA FILE>>"
>
> # load the sequences from the file
> seqs <- readAAStringSet(fas)
>
> # remove any gaps (if needed)
> seqs <- RemoveGaps(seqs)
>
> # for help, see the IdTaxa help page (optional)
> ?IdTaxa
>
> # load a training set object (trainingSet)
> # see http://DECIPHER.codes/Downloads.html
> load("<<REPLACE WITH PATH TO RData file>>")
>
> # classify the sequences
> ids <- IdTaxa(seqs,
+    trainingSet,
+    threshold=50, # 60 (cautious) or 50 (sensible)
+    processors=NULL) # use all available processors
|============================================| 100%
Time difference of 135.83 secs
>
> # look at the results
> print(ids)
  A test set of class 'Taxa' with length 1000
       confidence name                 taxon
   [1]      78.0% ENA|OBRS01158965|... Root; Bacter...
   [2]      44.7% ENA|OBRS01551965|... Root; unclas...
   [3]      74.8% ENA|OBRS01920881|... Root; Bacter...
   [4]      15.9% ENA|OBRS01851995|... Root; unclas...
   [5]      19.7% ENA|OBRS01760119|... Root; unclas...
   ...        ... ...                  ...
 [996]      54.0% ENA|OBRS01119407|... Root; unclas...
 [997]      56.0% ENA|OBRS01447422|... Root; unclas...
 [998]      51.5% ENA|OBRS01883532|... Root; unclas...
 [999]      64.7% ENA|OBRS01350537|... Root; Bacter...
[1000]      47.5% ENA|OBRS01488581|... Root; unclas...
>
> # plot the classifications
> plot(ids)
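The printed summary truncates each assignment, so it can help to pull the classifications into a table. The sketch below is a minimal illustration, assuming that each element of the result is a list holding parallel "taxon" and "confidence" vectors ordered from Root to the deepest assigned level. It uses a small mock object (ids_mock) with hypothetical labels and made-up confidences so that it runs without a training set. With real results, substitute ids for ids_mock.

```r
# Mock classifications standing in for the output of IdTaxa (hypothetical
# labels and confidences; the list layout is an assumption about 'Taxa').
ids_mock <- list(
  seq1 = list(taxon = c("Root", "GroupA", "FunctionA1"),
              confidence = c(78.0, 78.0, 78.0)),
  seq2 = list(taxon = c("Root", "unclassified_Root"),
              confidence = c(44.7, 44.7))
)

# deepest assigned label for each sequence
assignment <- unname(vapply(ids_mock,
                            function(x) x$taxon[length(x$taxon)],
                            character(1)))

# confidence reported at that deepest label
confidence <- unname(vapply(ids_mock,
                            function(x) x$confidence[length(x$confidence)],
                            numeric(1)))

# one row per sequence
results <- data.frame(sequence = names(ids_mock),
                      assignment = assignment,
                      confidence = confidence,
                      stringsAsFactors = FALSE)
print(results)

# number of sequences receiving each assignment
print(table(results$assignment))
```

The resulting data frame can be written to a file with write.csv(results, "assignments.csv", row.names=FALSE), where the file name is only a placeholder.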