Background: Electroencephalography (EEG) has a long history as a clinical tool to study brain function, and its potential to derive biomarkers for various applications is far from exhausted. Machine learning (ML) can guide future innovation by harnessing the wealth of complex EEG signals to isolate relevant brain activity. Yet, ML studies in EEG tend to ignore physiological artefacts, which may cause problems for deriving biomarkers specific to the central nervous system (CNS).
Methods: We present a framework for conceptualising machine learning from CNS versus peripheral signals measured with EEG. A signal representation based on Morlet wavelets allowed us to define traditional brain activity features (e.g. log power) as well as alternative inputs based on covariance matrices, as used by state-of-the-art ML approaches. Using more than 2600 EEG recordings from large public databases (TUAB, TDBRAIN), we studied the impact of peripheral signals and artefact removal techniques on ML models in age and sex prediction analyses.
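To illustrate how one wavelet representation can yield both feature types, the sketch below convolves multichannel EEG with complex Morlet wavelets and derives, per frequency, a channel-by-channel covariance matrix (the real part of the wavelet cross-spectrum) whose diagonal gives log power. This is a minimal NumPy sketch, not the authors' implementation: the function names `morlet_wavelet` and `wavelet_features`, the default of 7 cycles, the 3.5-standard-deviation truncation and the example frequencies are all hypothetical choices. Covariance-based pipelines typically apply further steps (for example a Riemannian tangent-space projection) before regression; those are omitted here.

```python
import numpy as np


def morlet_wavelet(freq, sfreq, n_cycles=7.0):
    """Complex Morlet wavelet centred on `freq` (Hz), normalised to unit energy.

    Hypothetical helper: n_cycles and the +/-3.5 sigma support are illustrative choices.
    """
    sigma_t = n_cycles / (2.0 * np.pi * freq)  # temporal standard deviation (s)
    t = np.arange(-3.5 * sigma_t, 3.5 * sigma_t, 1.0 / sfreq)
    wavelet = np.exp(2j * np.pi * freq * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
    return wavelet / np.sqrt(np.sum(np.abs(wavelet) ** 2))


def wavelet_features(eeg, sfreq, freqs, n_cycles=7.0):
    """Hypothetical feature extractor for eeg of shape (n_channels, n_times).

    Returns log10 power (n_freqs, n_channels) and covariance (n_freqs, n_channels, n_channels).
    """
    n_channels, n_times = eeg.shape
    log_power = np.empty((len(freqs), n_channels))
    covs = np.empty((len(freqs), n_channels, n_channels))
    for i, f in enumerate(freqs):
        w = morlet_wavelet(f, sfreq, n_cycles)
        # Complex time-frequency coefficients for every channel at frequency f
        coefs = np.array([np.convolve(ch, w, mode="same") for ch in eeg])
        # Real part of the cross-spectral matrix: symmetric, positive semi-definite
        covs[i] = np.real(coefs @ coefs.conj().T) / n_times
        # The diagonal holds the mean power of each channel; take log10 for log power
        log_power[i] = np.log10(np.diag(covs[i]))
    return log_power, covs


# Usage on synthetic data: 4 channels, 10 s of white noise at 250 Hz
rng = np.random.default_rng(0)
sfreq = 250.0
eeg = rng.standard_normal((4, int(10 * sfreq)))
freqs = np.array([4.0, 8.0, 16.0])
log_power, covs = wavelet_features(eeg, sfreq, freqs)
print(log_power.shape, covs.shape)  # (3, 4) (3, 4, 4)
```

Because log power is read off the diagonal of the same matrices, the two feature types share one representation: models on log power see only per-channel variance, whereas covariance-based models also see the spatial correlations between channels.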
Findings: Across benchmarks, basic artefact rejection improved model performance, whereas further removal of peripheral signals using ICA decreased performance. Our analyses revealed that peripheral signals enable age and sex prediction. However, they explained only a fraction of the performance provided by brain signals.
Interpretation: We show that brain signals and body signals, both present in the EEG, allow for prediction of personal characteristics. While these results may depend on specific applications, our work suggests that great care is needed to separate these signals when the goal is to develop CNS-specific biomarkers using ML.
Funding: All authors have been working for F. Hoffmann-La Roche Ltd.
Keywords: Biomarker; EEG; Machine learning; Preprocessing; Wavelets.
Copyright © 2024 The Author(s). Published by Elsevier B.V. All rights reserved.