Assessing the significance of novel genetic variants revealed by DNA sequencing is a major challenge to the integration of genomic techniques with medical practice. Many variants remain difficult to classify by traditional genetic methods. Computational methods have been developed that could contribute to classifying these variants, but they have not been properly validated and are generally not considered mature enough to be used effectively in a clinical setting. We developed a computational method for predicting the effects of missense variants detected in patients with hypertrophic cardiomyopathy (HCM). We used a curated clinical data set of 74 missense variants in six genes associated with HCM to train and validate an automated predictor. The predictor is based on support vector regression and uses phylogenetic and structural features specific to genes involved in HCM. Ten-fold cross validation estimated our predictor's sensitivity at 94% (95% confidence interval: 83%-98%) and specificity at 89% (95% confidence interval: 72%-100%). This corresponds to an odds ratio of 10 for a prediction of pathogenic (95% confidence interval: 4.0-infinity), or an odds ratio of 9.9 for a prediction of benign (95% confidence interval: 4.6-21). Coverage (proportion of variants for which a prediction was made) was 57% (95% confidence interval: 49%-64%). This performance exceeds that of existing methods that are not specifically designed for HCM. The accuracy of this predictor provides support for the clinical use of automated predictions alongside family segregation and population frequency data in the interpretation of new missense variants and suggests future development of similar tools for other diseases.
Copyright © 2011 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.