Analysis of big biological sequence datasets using the DECIPHER package
Originally presented at the useR! 2016 conference in California.
Recent advances in DNA sequencing have led to the generation of massive amounts of biological sequence data. As a result, there is an urgent need for packages that assist in organizing and evaluating large collections of sequences. The DECIPHER package enables the construction of databases for curating sequence sets in a space-efficient manner. Sequence databases offer improved organization and greatly reduce memory requirements by allowing subsets of sequences to be accessed independently. Using DECIPHER, sequences can be imported into a database, explored, viewed, and exported under non-destructive workflows that simplify complex analyses. For example, DECIPHER workflows could be used to quickly search for thousands of short sequences (oligonucleotides) within millions of longer sequences that are contained in a database. DECIPHER also includes state-of-the-art functions for sequence alignment, primer/probe design, sequence manipulation, phylogenetics, and other common bioinformatics tasks. Collectively, these features empower DECIPHER users to handle big biological sequence data using only a regular laptop computer.
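The memory argument above, that a database lets subsets of sequences be read independently, can be illustrated with a short sketch. This is not DECIPHER's interface (DECIPHER is an R package); it is a minimal Python illustration of the general idea using the standard sqlite3 module. The table layout, the function names build_db and count_oligo_hits, the batch size, and the toy sequences are all assumptions made for the example.

```python
import sqlite3


def build_db(records):
    """Store (name, sequence) pairs in an in-memory SQLite table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seqs (id INTEGER PRIMARY KEY, name TEXT, seq TEXT)")
    conn.executemany("INSERT INTO seqs (name, seq) VALUES (?, ?)", records)
    conn.commit()
    return conn


def count_oligo_hits(conn, oligos, batch_size=2):
    """Count, for each oligo, how many stored sequences contain it.

    Sequences are fetched in small batches, so only batch_size sequences
    are held in memory at any one time.
    """
    hits = {oligo: 0 for oligo in oligos}
    cursor = conn.execute("SELECT seq FROM seqs ORDER BY id")
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        for (seq,) in batch:
            for oligo in oligos:
                if oligo in seq:  # exact substring match only
                    hits[oligo] += 1
    return hits


# Toy data, invented for illustration
records = [
    ("seq1", "ACGTACGTGG"),
    ("seq2", "TTGACGTTAA"),
    ("seq3", "GGGCCCAAAT"),
]
conn = build_db(records)
hits = count_oligo_hits(conn, ["ACGT", "CCC", "TTTT"])
print(hits)
```

The sketch uses exact substring matching for brevity. A real oligonucleotide search would typically also need to handle ambiguity codes, mismatches, and reverse complements, which this example does not attempt.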