Finding the best ridge regression subset by genetic algorithms: applications to multilocus quantitative trait mapping

Conf Proc IEEE Eng Med Biol Soc. 2004;2004:2793-6. doi: 10.1109/IEMBS.2004.1403798.

Abstract

Genetic algorithms (GAs) are increasingly used for large, complex optimization problems. Here we use GAs to optimize fitness functions related to ridge regression, a classical statistical procedure for handling a large number of features in a multivariable linear regression setting. The algorithm avoids overfitting, handles collinearity gracefully, and yields easily interpretable results. We use the method to model the relationship between a quantitative trait and genetic markers in a mouse cross involving 69 F2 mice. The approach should be useful for the many genomic data sets in which the number of features far exceeds the number of observations and features can be highly correlated.
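
The abstract does not specify the fitness function or the GA operators, so the following Python sketch is only one plausible way to combine the two ideas, not the authors' implementation. It assumes that fitness is the negative K-fold cross-validated prediction error of a ridge fit on the selected markers, and that the GA uses tournament selection, uniform crossover, bit-flip mutation, and elitism. The names `ridge_fit`, `cv_error`, and `ga_ridge_subset`, along with every default parameter value, are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients; X and y are assumed to be centered."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_error(X, y, mask, lam, folds):
    """Cross-validated mean squared error of ridge on the columns in `mask`."""
    cols = np.flatnonzero(mask)
    n = len(y)
    sse = 0.0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        ytr = y[train]
        if cols.size == 0:
            # Empty subset: predict the training mean.
            pred = np.full(len(test), ytr.mean())
        else:
            Xtr = X[train][:, cols]
            mu_x, mu_y = Xtr.mean(axis=0), ytr.mean()
            beta = ridge_fit(Xtr - mu_x, ytr - mu_y, lam)
            pred = (X[test][:, cols] - mu_x) @ beta + mu_y
        sse += np.sum((y[test] - pred) ** 2)
    return sse / n

def ga_ridge_subset(X, y, lam=1.0, pop_size=40, generations=30,
                    p_mut=0.02, k=5, seed=0):
    """Search marker subsets with a simple GA (assumed design, see lead-in).

    Returns the best boolean mask evaluated and its cross-validated error.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    folds = np.array_split(rng.permutation(n), k)  # fixed folds for all subsets
    pop = rng.random((pop_size, p)) < 0.1          # sparse random initial subsets
    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        # Fitness = negative CV error, so larger is better.
        fit = np.array([-cv_error(X, y, m, lam, folds) for m in pop])
        i = int(np.argmax(fit))
        if fit[i] > best_fit:
            best_fit, best_mask = fit[i], pop[i].copy()
        # Tournament selection: the fitter of two random individuals survives.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(fit[a] >= fit[b], a, b)]
        # Uniform crossover with randomly paired mates.
        mates = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, p)) < 0.5
        children = np.where(take_parent, parents, mates)
        # Bit-flip mutation toggles marker membership.
        children ^= rng.random((pop_size, p)) < p_mut
        children[0] = best_mask                    # elitism: keep the best so far
        pop = children
    return best_mask, -best_fit

if __name__ == "__main__":
    # Synthetic stand-in for an F2 cross: 69 individuals, genotypes coded 0/1/2.
    rng = np.random.default_rng(1)
    X = rng.integers(0, 3, size=(69, 60)).astype(float)
    y = 3 * X[:, 5] - 2 * X[:, 17] + rng.normal(scale=0.5, size=69)
    mask, err = ga_ridge_subset(X, y)
    print("selected markers:", np.flatnonzero(mask), "CV error:", err)
```

The synthetic markers here are independent, unlike linked markers in a real cross, where the ridge penalty `lam` matters more because neighboring markers are highly correlated. In practice `lam` would itself need to be tuned, for example by cross-validation or by encoding it in the chromosome.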