Model-Free Inference for ChIP-Seq Data

Mingqi Wu; Monique Rijnkels; Faming Liang

doi:10.4172/2153-0602.1000153

Awards Nomination 20+ Million Readerbase

PMC/PubMed Indexed Articles

Causal Inference in the Age of Decision Medicine

Mining Next Generation Sequencing Data: How to Avoid â€œTreasure in, Error Outâ€

Google Scholar citation report

Citations : 1039

Journal of Data Mining in Genomics & Proteomics received 1039 citations as per Google Scholar report

Journal of Data Mining in Genomics & Proteomics peer review process verified at publons

25+ Million Website Visitors

Indexed In

Academic Journals Database
Open J Gate
Genamics JournalSeek
JournalTOCs
ResearchBible
Ulrich's Periodicals Directory
Electronic Journals Library
RefSeek
Hamdard University
EBSCO A-Z
OCLC- WorldCat
Scholarsteer
SWB online catalog
Virtual Library of Biology (vifabio)
Publons
MIAR
Geneva Foundation for Medical Education and Research
Euro Pub
Google Scholar

Useful Links

Share This Page

Journal Flyer

Tweets by JohnMat36980096

Open Access Journals

Abstract

Model-Free Inference for ChIP-Seq Data

Mingqi Wu, Monique Rijnkels and Faming Liang

Due to its higher resolution mapping and stronger ChIP enrichment signals, ChIP-seq tends to replace ChIP-chip technology in studying genome-wide protein-DNA interactions, while the massive digital ChIP-seq data present new challenges to statisticians. To date, most methods proposed in the literature for ChIP-seq data analysis are model based, however, finding a single model workable for all datasets is impossible, given the complexity of biological systems and variations generated in the sequencing process. In this paper, we present a model-free approach, the so-called MICS (Model-free Inference for ChIP-Seq), for ChIP-seq data analysis. MICS has a few advantages over the existing methods: Firstly, MICS avoids assumptions for the data distribution, and thus it maintains high power even when model assumptions for the data are violated. Secondly, MICS employs a simulation-based method in estimating the false discovery rate. Since the simulation-based method works independently of ChIP samples, MICS can perform robustly to variety of ChIP samples; it can produce accurate identification of peak regions, even for those where the enrichment is weak. Thirdly, MICS is very efficient in computation, which takes only a few seconds on a personal computer for a reasonably large dataset. In this paper, we also present a simple semi-empirical method for simulating ChIP-seq data, which allows a better assessment of performance of different approaches for ChIP-seq data analysis. MICS is compared with several existing methods, including MACS, CCAT, PICS, BayesPeak and QuEST, based on real and simulated datasets. The numerical results indicate that MICS can outperform others. Availability: An R package called MICS is available at http://www.stat.tamu.edu/~mqwu.