Model selection to achieve reproducible associations between resting state EEG features and autism

William E Carson 4th; Samantha Major; Harshitha Akkineni; Hannah Fung; Elias Peters; Kimberly L H Carpenter; Geraldine Dawson; David E Carlson

doi:10.1038/s41598-024-76659-5

Sci Rep. 2024 Oct 25;14(1):25301. doi: 10.1038/s41598-024-76659-5.

Authors

William E Carson 4th¹, Samantha Major²,³, Harshitha Akkineni²,³, Hannah Fung²,³, Elias Peters²,³, Kimberly L H Carpenter²,³,⁴, Geraldine Dawson²,³,⁴, David E Carlson⁵,⁶,⁷

Affiliations

¹ Department of Biomedical Engineering, Duke University, Durham, NC, 27708, USA.
² Duke Center for Autism and Brain Development, Duke University, Durham, NC, 27708, USA.
³ Department of Psychiatry and Behavioral Sciences, Duke University, Durham, NC, 27708, USA.
⁴ Duke Institute for Brain Sciences, Duke University, Durham, NC, 27708, USA.
⁵ Department of Civil and Environmental Engineering, Duke University, Durham, NC, 27708, USA. david.carlson@duke.edu.
⁶ Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, 27708, USA. david.carlson@duke.edu.
⁷ Department of Electrical and Computer Engineering, Duke University, Durham, NC, 27708, USA. david.carlson@duke.edu.

Abstract

A concern in the field of autism electroencephalography (EEG) biomarker discovery is the lack of reproducibility of candidate biomarkers. In the present study, we considered the problem of learning reproducible associations between multiple features of resting state (RS) neural activity and autism, using EEG data collected during a RS paradigm from 36- to 96-month-old children diagnosed with autism (N = 224) and neurotypical children (N = 69). Specifically, EEG spectral power and functional connectivity features were used as inputs to a regularized generalized linear model trained to predict diagnostic group (autism versus neurotypical). To evaluate our model, we proposed a procedure that quantified both the predictive generalization and reproducibility of learned associations produced by the model. When prioritizing both model predictive performance and reproducibility of associations, a highly reproducible profile of associations emerged. This profile revealed a distinct pattern of increased gamma power and connectivity in occipital and posterior midline regions associated with an autism diagnosis. Conversely, model selection based on predictive performance alone resulted in non-robust associations. Finally, we built a custom machine learning model that further empirically improved robustness of learned associations. Our results highlight the need for model selection criteria that maximize the scientific utility provided by reproducibility instead of predictive performance.
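The idea of selecting a model on both predictive performance and reproducibility of its learned associations can be illustrated with a minimal sketch. This is not the authors' procedure or their custom model: the L2 penalty, the random train/test splits, the use of mean pairwise cosine similarity between coefficient vectors as a reproducibility score, the AUC tolerance rule, and all function names (`fit_logistic_l2`, `evaluate`, `select`) are assumptions made for illustration, and the data are synthetic rather than EEG features.

```python
import numpy as np

def fit_logistic_l2(X, y, lam, n_iter=500, lr=0.1):
    """Fit an L2-penalized logistic regression by plain gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
        w -= lr * (X.T @ (p - y) / n + lam * w)  # penalized gradient step
        b -= lr * np.mean(p - y)
    return w, b

def auc(scores, y):
    """Rank-based AUC: chance that a positive case outscores a negative one."""
    pos, neg = scores[y == 1], scores[y == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))

def evaluate(X, y, lam, n_splits=10, seed=0):
    """Return (mean held-out AUC, mean pairwise cosine similarity of coefficients)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    aucs, coefs = [], []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        test, train = perm[: n // 4], perm[n // 4:]
        w, b = fit_logistic_l2(X[train], y[train], lam)
        aucs.append(auc(X[test] @ w + b, y[test]))
        coefs.append(w)
    W = np.array(coefs)
    W_unit = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    sim = W_unit @ W_unit.T
    upper = np.triu_indices(n_splits, k=1)       # each pair of splits once
    return float(np.mean(aucs)), float(np.mean(sim[upper]))

def select(results, tol=0.02):
    """Among settings within `tol` of the best AUC, pick the most reproducible."""
    best_auc = max(a for a, _ in results.values())
    eligible = {lam: r for lam, (a, r) in results.items() if a >= best_auc - tol}
    return max(eligible, key=eligible.get)

# Synthetic stand-in data: 3 informative features out of 20 (assumption).
rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:3] = 1.5
y = (X @ true_w + rng.normal(size=n) > 0).astype(float)

results = {lam: evaluate(X, y, lam) for lam in (0.001, 0.1, 1.0)}
chosen = select(results)
for lam, (a, r) in results.items():
    print(f"lam={lam}: AUC={a:.3f}, reproducibility={r:.3f}")
print("selected lam:", chosen)
```

Selecting on AUC alone would ignore the second number entirely; the tolerance rule in `select` is one simple way to trade a small amount of predictive performance for more stable coefficients.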

Keywords: Autism; EEG; Electroencephalography; Reproducibility; Reproducible; Resting state.

MeSH terms

Autistic Disorder* / diagnosis
Autistic Disorder* / physiopathology
Brain / diagnostic imaging
Brain / physiopathology
Child
Child, Preschool
Electroencephalography* / methods
Female
Humans
Machine Learning
Male
Reproducibility of Results
Rest / physiology
