Population Analysis of Bacterial Samples for Individual Identification in Forensics Application

John P Jakupciak; Jeffrey M Wells; Jeffrey S Lin; Andrew B Feldman

doi:10.4172/2153-0602.1000138

Awards Nomination 20+ Million Readerbase

PMC/PubMed Indexed Articles

Causal Inference in the Age of Decision Medicine

Mining Next Generation Sequencing Data: How to Avoid â€œTreasure in, Error Outâ€

Google Scholar citation report

Citations : 1039

Journal of Data Mining in Genomics & Proteomics received 1039 citations as per Google Scholar report

Journal of Data Mining in Genomics & Proteomics peer review process verified at publons

25+ Million Website Visitors

Indexed In

Academic Journals Database
Open J Gate
Genamics JournalSeek
JournalTOCs
ResearchBible
Ulrich's Periodicals Directory
Electronic Journals Library
RefSeek
Hamdard University
EBSCO A-Z
OCLC- WorldCat
Scholarsteer
SWB online catalog
Virtual Library of Biology (vifabio)
Publons
MIAR
Geneva Foundation for Medical Education and Research
Euro Pub
Google Scholar

Useful Links

Share This Page

Journal Flyer

Tweets by JohnMat36980096

Open Access Journals

Abstract

Population Analysis of Bacterial Samples for Individual Identification in Forensics Application

John P Jakupciak, Jeffrey M Wells, Jeffrey S Lin and Andrew B Feldman

Biodefense preparedness begins with the ability to detect and respond to bio-threats, based on accurate interpretation of genetic information with sophisticated, yet easy-to-use bioinformatics tools. Microbial forensics further enables attribution of microbial pathogen samples back to a suspected source. Sample characterization and traceability back to source are dependent on genome identification of specific targets within samples, comprehensive analysis of mixtures of populations’ present, and detection of major/minor variations in the identified genomes and comparison of sample genetic profile against other samples. Commercial Next Generation Sequencing (NGS) platforms offer the promise of dramatically higher detection sensitivity and resolution of forensic DNA samples than is possible with methods in current use. Before applying these technologies for forensic analyses of bacterial samples, however, it is critical to fully elucidate the benefits, caveats and pitfalls of NGS for hypothesis testing in comparative analyses, as ultimately this will be required for NGS use both as an investigative tool and tool for attribution in courts of law. Methods: We developed and evaluated novel probabilistic algorithms to process metagenomic sequence data from direct sample sequencing to identify genomes present in mixtures. Results: We present a pipeline for reference-free sample-to-sample comparisons to improve target characterization beyond one microorganism to characterization of comprehensive sample content. Our tools strengthen statistical confidence to trace the ancestry of samples and attribute samples to source with probabilistic certainties on many targets instead of a single genome. Conclusion: This study developed a novel reference free, bioinformatics strategy to account for and identify genetic diversity in samples. Sequence variants must be non-arbitrarily confirmed in both forward and reverse reads at a rate above the background noise level of sequencer machine error. A similarity distance metric compares genomes within a range of near relationships. Using sequence data from bio-threat agents, we successfully attributed known related strains together, and excluded near relation of known unrelated strains. The major strengths of this forensic method are the non-arbitrary determinations of data validation and relatedness metrics, as well as the ability to compare microbial genomes with or without a reference database of related genomes.