Motivation: Polymorphisms in human genes are being described in remarkable numbers. Determining which polymorphisms and which environmental factors are associated with common, complex diseases has become a daunting task. This is partly because the effect of any single genetic variation will likely depend on other genetic variations (gene-gene interaction, or epistasis) and on environmental factors (gene-environment interaction). Detecting and characterizing interactions among multiple factors is both a statistical and a computational challenge. To address this problem, we have developed a multifactor dimensionality reduction (MDR) method that collapses high-dimensional genetic data into a single dimension, thus permitting interactions to be detected in relatively small samples. In this paper, we describe the MDR approach and an MDR software package.
Results: We developed a program that integrates MDR with a cross-validation strategy for estimating the classification and prediction errors of multifactor models. The software can analyze interactions among 2 to 15 genetic and/or environmental factors. A dataset may contain up to 500 variables and up to 4000 study subjects.
Availability: Information on obtaining the executable code, example data, example analysis, and documentation is available upon request.
Supplementary information: All supplementary information can be found at http://phg.mc.vanderbilt.edu/Software/MDR.
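To make the idea of collapsing multifactor genotypes into a single dimension concrete, the following is a minimal Python sketch of the pooling step combined with cross-validation. It is an illustration under stated assumptions, not the code of the MDR software package: the function names, the case:control threshold of 1.0, the rule that tied or unseen cells are treated as low-risk, and the modulo-based fold assignment are all hypothetical choices made for this example.

```python
from collections import Counter
from itertools import combinations


def fit_mdr(rows, labels, factors, threshold=1.0):
    """Return the multifactor cells labelled high-risk for the given factors.

    rows: sequence of tuples of categorical values (e.g. genotypes 0/1/2).
    labels: 1 for case, 0 for control.
    threshold: assumed case:control ratio above which a cell is high-risk.
    """
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[f] for f in factors)
        (cases if y == 1 else controls)[cell] += 1
    # cases > threshold * controls avoids dividing by zero in empty cells.
    return {cell for cell in set(cases) | set(controls)
            if cases[cell] > threshold * controls[cell]}


def error_rate(rows, labels, factors, high_risk):
    """Fraction of subjects misclassified by the one-dimensional risk label."""
    wrong = 0
    for row, y in zip(rows, labels):
        # Cells not seen during fitting are treated as low-risk (assumption).
        predicted = 1 if tuple(row[f] for f in factors) in high_risk else 0
        wrong += predicted != y
    return wrong / len(rows)


def cross_validate(rows, labels, n_factors, k=10, threshold=1.0):
    """Return (most frequently selected factor set, mean prediction error).

    Folds are assigned by index modulo k, so the data are assumed to be
    shuffled beforehand.
    """
    n_vars = len(rows[0])
    picks, errors = Counter(), []
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        tr_rows = [rows[i] for i in train]
        tr_y = [labels[i] for i in train]
        te_rows = [rows[i] for i in test]
        te_y = [labels[i] for i in test]
        # Choose the factor set with the lowest classification (training) error.
        best = min(
            combinations(range(n_vars), n_factors),
            key=lambda f: error_rate(
                tr_rows, tr_y, f, fit_mdr(tr_rows, tr_y, f, threshold)),
        )
        high_risk = fit_mdr(tr_rows, tr_y, best, threshold)
        # Prediction error is measured on the held-out fold only.
        errors.append(error_rate(te_rows, te_y, best, high_risk))
        picks[best] += 1
    return picks.most_common(1)[0][0], sum(errors) / k
```

In this sketch, classification error is computed on the training portion and prediction error on the held-out portion of each fold. The exhaustive search over factor combinations grows quickly with the number of variables, so it is suitable only for small illustrative datasets.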