Abstract

mBLAST: Keeping up with the Sequencing Explosion for (Meta)Genome Analysis

Curtis Davis, Karthik Kota, Venkat Baldhandapani, Wei Gong, Sahar Abubucker, Eric Becker, John Martin, Kristine M. Wylie, Radhika Khetani, Matthew E. Hudson, George M. Weinstock and Makedonka Mitreva

Recent advances in next-generation sequencing technologies require alignment algorithms and software that can keep pace with heightened data production. Standard algorithms, especially protein similarity searches, represent significant bottlenecks in analysis pipelines. For metagenomic approaches in particular, it is now often necessary to search hundreds of millions of sequence reads against large databases. Here we describe mBLAST, an accelerated search algorithm for translated and/or protein alignments against large datasets. It is based on the Basic Local Alignment Search Tool (BLAST) and retains the high sensitivity of BLAST. The mBLAST algorithms achieve a substantial speedup over the National Center for Biotechnology Information (NCBI) programs BLASTX, TBLASTX and BLASTP for large datasets, allowing analysis within reasonable timeframes on standard computer architectures. In this article, the impact of mBLAST is demonstrated with sequences originating from the microbiota of healthy humans from the Human Microbiome Project. mBLAST is designed as a plug-in replacement for BLAST for any study that involves short-read sequences and high-throughput analysis. The mBLAST software is freely available to academic users at www.multicorewareinc.com.