Gradient Boosting as a SNP Filter: an Evaluation Using Simulated and Hair Morphology Data

Lubke GH; Laurin C; Walters R; Eriksson N; Hysi P; Spector TD; Montgomery GW; Martin NG; Medland SE; Boomsma DI

doi:10.4172/2153-0602.1000143

Abstract

Lubke GH, Laurin C, Walters R, Eriksson N, Hysi P, Spector TD, Montgomery GW, Martin NG, Medland SE and Boomsma DI

Typically, genome-wide association studies (GWAS) consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive or dominant effects and for SNP-SNP or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach in which the first step is a filter that is sensitive to different types of SNP main and interaction effects. The aim is to reduce the number of SNPs substantially, so that more specific modeling becomes feasible in the second step. We provide an evaluation of a statistical learning method called “gradient boosting machine” (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model and permits inclusion of large numbers of covariates. GBM can therefore be used to explore multiple gene-environment (GxE) interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and that it is sensitive to interaction effects even if one of the interacting variables has a zero main effect. Such effects would not be detected in GWAS. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach.
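The filtering idea can be illustrated with a small, self-contained sketch. It is not the authors' implementation or their simulation design: the sample size, effect sizes, allele frequency, tree depth, shrinkage, and all function names below are assumptions chosen for illustration. The sketch simulates 0/1/2 genotypes in which SNP 0 has an additive main effect and SNP 1 acts only through an interaction with SNP 0, so that its marginal effect is zero in expectation. It then ranks SNPs by the summed squared-error reduction of their splits in a hand-written boosted ensemble of depth-2 regression trees, and compares that with a per-SNP marginal squared correlation standing in for single-SNP additive regression.

```python
import random

def simulate(n=400, p=12, seed=1):
    """Simulate 0/1/2 genotypes (assumed allele frequency 0.5) and a phenotype."""
    rng = random.Random(seed)
    X = [[(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(p)]
         for _ in range(n)]
    # SNP 0: additive main effect. SNP 1: interaction only. Centering both
    # genotypes at their expected mean (1) gives SNP 1 a zero marginal effect.
    y = [row[0] + 1.5 * (row[0] - 1) * (row[1] - 1) + rng.gauss(0, 0.5)
         for row in X]
    return X, y

def best_split(X, r, idx):
    """Best (gain, feature, threshold) by squared-error reduction among rows idx."""
    n, total = len(idx), sum(r[i] for i in idx)
    best = (0.0, None, None)
    for j in range(len(X[0])):
        for t in (0.5, 1.5):  # the only useful cut points for 0/1/2 coding
            left = [r[i] for i in idx if X[i][j] < t]
            nl, sl = len(left), sum(left)
            nr, sr = n - nl, total - sl
            if nl == 0 or nr == 0:
                continue
            gain = sl * sl / nl + sr * sr / nr - total * total / n
            if gain > best[0]:
                best = (gain, j, t)
    return best

def fit_tree(X, r, importance):
    """Fit a depth-2 tree to residuals r; credit split gains to importance."""
    rows = list(range(len(r)))
    pred = [sum(r) / len(r)] * len(r)
    gain, j, t = best_split(X, r, rows)
    if j is None:
        return pred
    importance[j] += gain
    children = ([i for i in rows if X[i][j] < t],
                [i for i in rows if X[i][j] >= t])
    for child in children:
        g2, j2, t2 = best_split(X, r, child)
        if j2 is None:
            leaves = [child]
        else:
            importance[j2] += g2
            leaves = [[i for i in child if X[i][j2] < t2],
                      [i for i in child if X[i][j2] >= t2]]
        for leaf in leaves:
            mean = sum(r[i] for i in leaf) / len(leaf)
            for i in leaf:
                pred[i] = mean
    return pred

def gbm_importance(X, y, n_trees=50, shrinkage=0.1):
    """Boost depth-2 trees on squared-error residuals; return per-SNP importance."""
    importance = [0.0] * len(X[0])
    fitted = [sum(y) / len(y)] * len(y)
    for _ in range(n_trees):
        resid = [yi - fi for yi, fi in zip(y, fitted)]  # negative gradient
        tree = fit_tree(X, resid, importance)
        fitted = [fi + shrinkage * ti for fi, ti in zip(fitted, tree)]
    return importance

def marginal_r2(X, y, j):
    """Squared correlation between SNP j and the phenotype (single-SNP additive fit)."""
    n = len(y)
    x = [row[j] for row in X]
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

X, y = simulate()
imp = gbm_importance(X, y)
ranking = sorted(range(len(imp)), key=lambda j: -imp[j])
r2 = [marginal_r2(X, y, j) for j in range(len(imp))]
print("Top SNPs by boosting importance:", ranking[:5])
print("Marginal R^2 of SNP 0 and SNP 1:", round(r2[0], 3), round(r2[1], 3))
```

In a two-step analysis, the top-ranked SNPs from such a filter would be passed on to more specific models. The importance used here is computed on the training data only; a practical analysis would more likely rely on an established GBM implementation with subsampling and a tuned number of trees.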