Introduction: Alzheimer's disease (AD) is the most common form of dementia in the elderly. Given that AD neuropathology begins decades before symptoms, there is a dire need for effective screening tools for early detection of AD to facilitate early intervention.
Methods: Here, we used tree-based and deep learning methods to train polyomic prediction models for AD affection status and age at onset, employing genomic, proteomic, metabolomic, and drug use data from UK Biobank. We used SHapley Additive exPlanations (SHAP) to determine feature importance.
Results: Our best-performing polyomic model achieved an area under the receiver operating characteristic curve (AUROC) of 0.87. We identified the proteins GFAP and CXCL17 as the strongest predictors of AD besides apolipoprotein E (APOE) alleles. Increasing the number of cases by including "AD-by-proxy" cases did not improve AD prediction.
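For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The following is a minimal sketch of that pairwise definition in plain Python. The function name `auroc` and the toy labels and scores are illustrative assumptions, not the study's code or data.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (case, control) pairs in which the case
    has the higher score, counting ties as 0.5."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (1 = case, 0 = control); not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls. The quadratic pairwise loop is chosen for clarity; a rank-based computation would be preferable on cohort-sized data.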
Discussion: Among the four modalities, genomics and proteomics were the most informative based on AUROC. Our data suggest that two blood-based biomarkers (glial fibrillary acidic protein [GFAP] and CXCL17) may be effective for early presymptomatic prediction of AD.
Highlights: We developed a polyomic model to predict AD and age at onset (AAO) using omics and medication use data from electronic health records (EHR). We identified the proteins GFAP and CXCL17 as the strongest predictors of AD besides APOE alleles. "AD-by-proxy" cases, if used in training, do not improve AD prediction. Proteomics was the most informative modality overall for affection status and AAO prediction.
Keywords: Alzheimer's disease; machine learning; omics; polyomic model; prediction.
© 2024 The Author(s). Alzheimer's & Dementia published by Wiley Periodicals LLC on behalf of Alzheimer's Association.