Journal of Chromatography & Separation Techniques

Open Access

ISSN: 2157-7064



ABOid: A Software for Automated Identification and Phyloproteomics Classification of Tandem Mass Spectrometric Data

Samir V. Deshpande, Rabih E. Jabbour, Peter A. Snyder, Michael Stanford, Charles H. Wick and Alan W. Zulich

We have developed a suite of bioinformatics algorithms for the automated identification and classification of microbes based on comparative analysis of protein sequences. The application uses sequence information on microbial proteins, obtained by mass spectrometry-based proteomics, for identification and phyloproteomics classification. The algorithms transform the results of searching product ion spectra of peptide ions against a protein database, performed with commercially available software (e.g., SEQUEST), into a taxonomically meaningful and easy-to-interpret output. To achieve this goal, we constructed a custom protein database, in FASTA format, composed of theoretical proteomes derived from all fully sequenced bacterial genomes (1204 microorganisms as of August 25, 2010). Each protein sequence in the database is supplemented with information on its source organism, and the chromosomal position of each protein-coding open reading frame (ORF) is embedded in the protein sequence header. In addition, this information is linked to the taxonomic position of each database bacterium. ABOid analyzes SEQUEST search result files to provide the probabilities that peptide sequence assignments to product ion mass spectra (MS/MS) are correct, and uses the accepted spectrum-to-sequence matches to generate a sequence-to-organism (STO) matrix of assignments. Because peptide sequences are differentially present or absent in the strains being compared, this matrix allows bacterial species to be classified in a high-throughput manner. For this purpose, the STO matrices of assignments, viewed as assignment bitmaps, are then analyzed by an ABOid module that uses phylogenetic relationships between bacterial species as part of a decision tree process and applies multivariate statistical techniques (principal component and cluster analysis) to reveal the relationship of the analyzed unknown sample to the database microorganisms.
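The sequence-to-organism matrix described above can be pictured as a binary table. The following Python sketch is illustrative only and is not ABOid's implementation: it assumes that rows are accepted peptide sequences, that columns are database organisms, and that a cell is 1 when the peptide occurs in that organism's theoretical proteome. The function names, peptide strings and organism labels are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def build_sto_matrix(
    accepted: Dict[str, Set[str]], organisms: List[str]
) -> Tuple[List[str], List[List[int]]]:
    """Build a binary matrix: one row per peptide, one column per organism.

    `accepted` maps each accepted peptide sequence to the set of organisms
    whose theoretical proteome contains it (assumed input layout).
    """
    peptides = sorted(accepted)
    matrix = [
        [1 if org in accepted[pep] else 0 for org in organisms]
        for pep in peptides
    ]
    return peptides, matrix


def summarize(
    matrix: List[List[int]], organisms: List[str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count, per organism, all matched peptides and the peptides unique to it."""
    totals = {org: 0 for org in organisms}
    unique = {org: 0 for org in organisms}
    for row in matrix:
        hits = [org for org, bit in zip(organisms, row) if bit]
        for org in hits:
            totals[org] += 1
        if len(hits) == 1:  # peptide found in exactly one organism
            unique[hits[0]] += 1
    return totals, unique


# Toy data: hypothetical peptides and organisms, not real search results.
organisms = ["Organism A", "Organism B", "Organism C"]
accepted = {
    "AAAGK": {"Organism A", "Organism B"},
    "CCDER": {"Organism A"},
    "EFGHK": {"Organism C"},
    "LMNPR": {"Organism A", "Organism B"},
}

peptides, sto = build_sto_matrix(accepted, organisms)
totals, unique = summarize(sto, organisms)
```

In this toy case, peptides shared by two organisms cannot separate them, while a peptide matched to a single organism can. This is the sense in which differential presence or absence of sequences supports classification.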
Our bacterial classification and identification algorithm assigns an analyzed organism to taxonomic groups following an organized scheme that begins at the phylum level and proceeds through class, order, family and genus down to the strain level.
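A top-down assignment of this kind might be sketched as follows. The abstract does not state the decision rule used at each rank, so this sketch assumes one: each taxon is scored by its best-supported member organism, and the walk stops when the two leading taxa tie. The inclusion of a species rank, the name `classify`, the scores and the three example lineages are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

RANKS = ["phylum", "class", "order", "family", "genus", "species", "strain"]


def classify(
    lineages: Dict[str, Tuple[str, ...]], scores: Dict[str, int]
) -> List[Tuple[str, str]]:
    """Walk the ranks from phylum to strain, keeping the best-supported taxon.

    `lineages` maps an organism to its taxon name at each rank in RANKS;
    `scores` maps an organism to its number of matched peptides.
    Returns the (rank, taxon) pairs resolved before any tie.
    """
    candidates = list(lineages)
    path: List[Tuple[str, str]] = []
    for depth, rank in enumerate(RANKS):
        # Support for a taxon = highest score among its candidate organisms.
        support: Dict[str, int] = {}
        for org in candidates:
            taxon = lineages[org][depth]
            support[taxon] = max(support.get(taxon, 0), scores.get(org, 0))
        ranked = sorted(support.items(), key=lambda kv: kv[1], reverse=True)
        if not ranked:
            break
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            break  # ambiguous at this rank: report only the path so far
        best = ranked[0][0]
        path.append((rank, best))
        candidates = [o for o in candidates if lineages[o][depth] == best]
    return path


# Illustrative lineages and hypothetical peptide-match scores.
ECOLI = ("Proteobacteria", "Gammaproteobacteria", "Enterobacterales",
         "Enterobacteriaceae", "Escherichia", "Escherichia coli")
lineages = {
    "E. coli K-12": ECOLI + ("K-12",),
    "E. coli O157:H7": ECOLI + ("O157:H7",),
    "B. subtilis 168": ("Firmicutes", "Bacilli", "Bacillales",
                        "Bacillaceae", "Bacillus", "Bacillus subtilis", "168"),
}

resolved = classify(lineages, {"E. coli K-12": 3, "E. coli O157:H7": 2,
                               "B. subtilis 168": 1})
tied = classify(lineages, {"E. coli K-12": 3, "E. coli O157:H7": 3,
                           "B. subtilis 168": 1})
```

With distinct scores the walk reaches the strain level. With the two E. coli strains tied, it stops at the species level, the deepest rank the evidence supports under this assumed rule.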