Abstract

Computational Feature Selection and Classification of RET Phenotypic Severity

David K. Crockett, Stephen R. Piccolo, Scott P. Narus, Joyce A. Mitchell and Julio C. Facelli

Although many reported mutations in the RET oncogene have been directly associated with hereditary thyroid carcinoma, other mutations are labelled as uncertain gene variants because they have not been clearly associated with a clinical phenotype. Determining the severity of a mutation is costly and time-consuming. Informatics tools and methods may help bridge this genotype-phenotype gap. Towards this goal, machine-learning classification algorithms were evaluated for their ability to distinguish benign from pathogenic RET gene variants, with each variant characterized by the differences in physicochemical property values between the wild-type residue and the mutant residue. Representative algorithms were chosen from different categories of machine-learning classification techniques: rules, Bayes, regression, nearest neighbour, support vector machines and trees. The machine-learning models were then compared with well-established techniques for predicting mutation severity. Machine-learning classification can accurately predict RET mutation status from primary sequence information alone; existing algorithms based on sequence homology (ortholog conservation) or protein structural data are not necessarily superior.
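The feature encoding described above, which represents a substitution by how a residue's physicochemical properties change from wild type to mutant, can be illustrated with a minimal sketch. The abstract does not list the properties or classifier settings actually used, so this example assumes just two properties: the Kyte-Doolittle hydropathy index and a simplified side-chain charge (histidine treated as neutral). It pairs them with a 1-nearest-neighbour rule, one of the classifier categories named above. The names `substitution_features` and `nearest_neighbour_label` are hypothetical, and the toy training labels are invented placeholders, not RET data or results from the paper.

```python
from __future__ import annotations

from math import dist

# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge at neutral pH (assumption: His counted as 0).
CHARGE = {"D": -1, "E": -1, "K": 1, "R": 1}


def substitution_features(wild: str, mutant: str) -> tuple[float, float]:
    """Encode a single-residue substitution as (mutant - wild type) property differences."""
    w, m = wild.upper(), mutant.upper()
    return (
        KD_HYDROPATHY[m] - KD_HYDROPATHY[w],
        float(CHARGE.get(m, 0) - CHARGE.get(w, 0)),
    )


def nearest_neighbour_label(
    query: tuple[str, str],
    training: list[tuple[tuple[str, str], str]],
) -> str:
    """Return the label of the training substitution closest (Euclidean) to the query."""
    q = substitution_features(*query)
    _, label = min(training, key=lambda item: dist(q, substitution_features(*item[0])))
    return label


if __name__ == "__main__":
    # Invented toy labels for illustration only; not clinical annotations.
    toy_training = [
        (("L", "I"), "benign"),
        (("C", "R"), "pathogenic"),
    ]
    print(nearest_neighbour_label(("V", "A"), toy_training))
    print(nearest_neighbour_label(("C", "K"), toy_training))
```

With the toy data, V→A (a small hydropathy drop, no charge change) lands nearest the L→I example, while C→K (a large hydropathy drop plus a gained positive charge) lands nearest C→R. A real study would use many more properties and labelled variants, and would likely scale the features before measuring distance.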