Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R

Abstract:

In recent years, the cost of DNA sequencing has decreased at a rate that has outpaced improvements in memory capacity. It is now common to collect or have access to many gigabytes of biological sequences. This has created an urgent need for approaches that analyze sequences in subsets without requiring all of the sequences to be loaded into memory at one time. It has also opened opportunities to improve the organization and accessibility of information acquired in sequencing projects. The DECIPHER package offers solutions to these problems by assisting in the curation of large sets of biological sequences stored in compressed format inside a database. This approach has many practical advantages over standard bioinformatics workflows, and enables large analyses that would otherwise be prohibitively time-consuming.

Author: Erik S. Wright

Published: April 30, 2016

Received: Jan 29, 2016

DOI: 10.32614/RJ-2016-025

Volume/Issue: 8/1

Pages: 352 - 359

CRAN packages used

RSQLite

CRAN Task Views implied by cited packages

Databases

Bioconductor packages used

Biostrings, DECIPHER

Footnotes

    Reuse

    Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

    Citation

    For attribution, please cite this work as

    Wright, "Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R", The R Journal, 2016

    BibTeX citation

    @article{RJ-2016-025,
      author = {Wright, Erik S.},
      title = {Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R},
      journal = {The R Journal},
      year = {2016},
      note = {https://doi.org/10.32614/RJ-2016-025},
      doi = {10.32614/RJ-2016-025},
      volume = {8},
      issue = {1},
      issn = {2073-4859},
      pages = {352-359}
    }