Assessing polyomic risk to predict Alzheimer's disease using a machine learning model

Tiffany Ngai; Julian Willett; Mohammad Waqas; Lucas H Fishbein; Younjung Choi; Georg Hahn; Kristina Mullin; Christoph Lange; Julian Hecker; Rudolph E Tanzi; Dmitry Prokopenko

doi:10.1002/alz.14319


Alzheimers Dement. 2024 Nov 7. doi: 10.1002/alz.14319. Online ahead of print.


Affiliations

¹ Department of Neurology, Genetics and Aging Research Unit and the McCance Center for Brain Health, Massachusetts General Hospital and Harvard Medical School, Charlestown, Massachusetts, USA.
² Department of Systems Design Engineering, University of Waterloo, Waterloo, Ontario, Canada.
³ Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.
⁴ Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA.
⁵ Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.

PMID: 39511865

Abstract

Introduction: Alzheimer's disease (AD) is the most common form of dementia in the elderly. Because AD neuropathology begins decades before symptoms appear, there is a dire need for effective screening tools that detect AD early enough to allow early intervention.

Methods: Here, we used tree-based and deep learning methods to train polyomic prediction models for AD affection status and age at onset, using genomic, proteomic, metabolomic, and drug use data from the UK Biobank. We used SHapley Additive exPlanations (SHAP) to determine feature importance.
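SHAP attributes a model's prediction to its input features using Shapley values from cooperative game theory. The sketch below illustrates that idea only. It is not the authors' pipeline, and it is not the `shap` library's algorithm: it computes exact Shapley values by enumerating every feature coalition, and it fills in features outside a coalition with a single baseline vector. The `shap` library instead uses faster estimators, such as TreeSHAP for tree ensembles, and usually averages over a background dataset. The function name `shapley_values`, the toy model, and its coefficients are all hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley attributions of model(x) relative to model(baseline).

    Features outside a coalition are set to their baseline value (a
    single-reference value function). Cost grows as O(2^n), so this is
    only practical for a handful of features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Take x's value for features in the coalition, baseline otherwise.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy risk score: features 0..2 stand in for, e.g., an allele
# count and two protein levels; the coefficients are illustrative only.
def toy_model(z: Sequence[float]) -> float:
    return 0.8 * z[0] + 0.5 * z[1] + 0.3 * z[0] * z[2]


x = [2.0, 1.5, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print([round(p, 6) for p in phi])  # [1.9, 0.75, 0.3]
```

In this example the interaction term between features 0 and 2 is split equally between them. The attributions also add up to `toy_model(x) - toy_model(baseline)`, which is the efficiency property that makes Shapley values attractive for ranking features.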

Results: Our best-performing polyomic model achieved an area under the receiver operating characteristic curve (AUROC) of 0.87. Apart from apolipoprotein E (APOE) alleles, we identified the proteins glial fibrillary acidic protein (GFAP) and C-X-C motif chemokine ligand 17 (CXCL17) as the strongest predictors of AD. Increasing the number of cases by including "AD-by-proxy" cases did not improve AD prediction.
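AUROC has a useful probabilistic reading. It is the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. An AUROC of 0.87 therefore means that cases outrank controls in roughly 87% of case-control pairs. The minimal sketch below computes AUROC directly from that definition, which is equivalent to the normalized Mann-Whitney U statistic. The function name and the example labels and scores are hypothetical, not study data. The pairwise loop is quadratic in sample size; at biobank scale a rank-based O(n log n) computation or a library routine would be preferable.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random case > score of a random control).

    labels: 1 for a case, 0 for a control. Tied scores count as 1/2.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case correctly ranked above control
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Made-up example: three cases and three controls.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
print(round(auroc(labels, scores), 4))  # 0.8889
```

Here 8 of the 9 case-control pairs are ranked correctly. The single miss is the case scored 0.35 against the control scored 0.4.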

Discussion: Among the four modalities, genomics and proteomics were the most informative based on AUROC. Our data suggest that two blood-based biomarkers (glial fibrillary acidic protein [GFAP] and CXCL17) may be effective for presymptomatic prediction of AD.

Highlights: We developed a polyomic model to predict AD and age at onset (AAO) using omics data and medication use data from electronic health records (EHRs). Apart from APOE alleles, we identified the proteins GFAP and CXCL17 as the strongest predictors of AD. Including "AD-by-proxy" cases in training does not improve AD prediction. Proteomics was the most informative modality overall for predicting both affection status and AAO.

Keywords: Alzheimer's disease; machine learning; omics; polyomic model; prediction.
