This manuscript considers a data-mining approach for the prediction of mild obstructive sleep disordered breathing, defined as an elevated respiratory disturbance index (RDI), in 5,530 participants in a community-based study, the Sleep Heart Health Study. The prediction algorithm was built using modern ensemble learning algorithms, boosting in specific, which allowed for assessing potential high-dimensional interactions between predictor variables or classifiers. To evaluate the performance of the algorithm, the data were split into training and validation sets for varying thresholds for predicting the probability of a high RDI (≥7 events per hour in the given results). Based on a moderate classification threshold from the boosting algorithm, the estimated post-test odds of a high RDI were 2.20 times higher than the pre-test odds given a positive test, while the corresponding post-test odds were decreased by 52% given a negative test (sensitivity and specificity of 0.66 and 0.70, respectively). In rank order, the following variables had the largest impact on prediction performance: neck circumference, body mass index, age, snoring frequency, waist circumference, and snoring loudness.
Keywords: Sleep disorders; machine learning; prediction; sleep apnea; variable importance.