Gynecology & Obstetrics

Open Access

ISSN: 2161-0932


An Application of Machine Learning in IVF: Comparing the Accuracy of Classification Algorithms for the Prediction of Twins

Rinehart John

Background: Clinical decision-making dilemmas are particularly notable in IVF practice, where large datasets are routinely generated that could enable clinicians to make predictions that inform treatment choices. This study applied machine learning to IVF data to determine the risk of twins when two or more embryos are available for transfer. While most classifiers can provide an estimate of accuracy, this study went further and compared classifiers by both accuracy and Area Under the Curve (AUC).
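To illustrate why comparing classifiers on both accuracy and AUC can be informative, the following is a minimal sketch in plain Python. The labels and predicted probabilities are hypothetical toy values, not study data, and the helper names `accuracy` and `auc` are illustrative. AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of cases whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == t) for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_prob):
    """Rank-based AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [p for p, t in zip(y_prob, y_true) if t == 1]
    neg = [p for p, t in zip(y_prob, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (assumption): 1 = twins, 0 = no twins.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]

# Two hypothetical models. Neither ever crosses the 0.5 threshold, so both
# predict "no twins" for every case, but model_b ranks the cases better.
model_a = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.32, 0.48]
model_b = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.40, 0.45]

for name, probs in (("model_a", model_a), ("model_b", model_b)):
    print(name, "accuracy =", accuracy(y_true, probs), "AUC =", auc(y_true, probs))
```

In this toy case both models have the same accuracy, yet their AUC values differ because AUC depends on how the cases are ranked rather than on a single decision threshold.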
Methods: Study data were derived from a large electronic medical record system that is used by over 140 IVF clinics and contained 135,000 IVF cycles. The dataset was reduced from 88 variables to 40 and restricted to IVF cycles in which two or more blastocyst embryos were created. The following classifiers were compared in terms of accuracy and AUC: a generalized linear model, linear discriminant analysis, quadratic discriminant analysis, K-nearest neighbors, support vector machine, random forests, and boosting. A stacking ensemble learning algorithm was also applied, using the predictions of the individual classifiers to create a new model.
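The comparison described above can be sketched roughly as follows with scikit-learn. This is not the study's code: the data are synthetic (generated with `make_classification`), logistic regression stands in for the generalized linear model, gradient boosting stands in for "boosting", and all hyperparameters, the train/test split, and the logistic meta-learner in the stacking step are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis,
    QuadraticDiscriminantAnalysis,
)
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in (assumption): 40 predictors and a binary outcome.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=10,
    n_redundant=0, n_repeated=0, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One estimator per classifier family named in the text; settings are illustrative.
base = {
    "glm_logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "lda": LinearDiscriminantAnalysis(),
    "qda": QuadraticDiscriminantAnalysis(),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=15)),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "boosting": GradientBoostingClassifier(random_state=0),
}

# Stacking: out-of-fold predicted probabilities from the base classifiers
# become the inputs of a second-level (meta) model.
stack = StackingClassifier(
    estimators=list(base.items()),
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",
    cv=5,
)

models = {**base, "stacking": stack}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
    results[name] = {
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, prob),
    }

# Report classifiers from highest to lowest AUC on the held-out split.
for name, r in sorted(results.items(), key=lambda kv: kv[1]["auc"], reverse=True):
    print(f"{name:14s} accuracy={r['accuracy']:.3f}  auc={r['auc']:.3f}")
```

The ranking produced on synthetic data says nothing about the ranking on real IVF data; the sketch only shows the mechanics of fitting several classifiers, stacking them, and scoring each on the same held-out set with both metrics.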
Results: Although the ensemble classifier was the most accurate, no classifier was significantly superior to the others. Boosting performed poorly; the logistic and linear discriminant analysis classifiers performed better than the quadratic discriminant analysis classifier; and the support vector machine performed almost as well as the tree classifier. AUC results were consistent with the accuracy comparisons. External validation was also performed on a different dataset containing 588 observations. All models performed better on the external validation dataset, with the random forest classifier performing markedly better than any other classifier.
Conclusions: These results support the impression that big data can be of value in the clinical decision-making process, but that no single statistical algorithm provides maximum accuracy for all datasets. Each dataset will therefore require investigation to determine which algorithms are the most accurate for that particular set of data. These findings underscore the premise that clinicians with access to large amounts of data can use advanced predictive analytic models to create robust clinical information of vital importance for patient care.