Clinical signatures of genetic epilepsies precede diagnosis in electronic medical records of 32,000 individuals

Genet Med. 2024 Nov;26(11):101211. doi: 10.1016/j.gim.2024.101211. Epub 2024 Jul 14.

Abstract

Purpose: An early genetic diagnosis can guide the time-sensitive treatment of individuals with genetic epilepsies. However, most genetic diagnoses occur long after disease onset. We aimed to identify early clinical features suggestive of genetic diagnoses in individuals with epilepsy through large-scale analysis of full-text electronic medical records.

Methods: We extracted 89 million time-stamped standardized clinical annotations using natural language processing from 4,572,783 clinical notes from 32,112 individuals with childhood epilepsy, including 1,925 individuals with known or presumed genetic epilepsies. We applied these features to train random forest models to predict SCN1A-related disorders and any genetic diagnosis.
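As a rough illustration of how time-stamped annotations could be turned into model inputs, the sketch below bins each annotation by age and builds a binary patient-by-feature matrix. It is not the authors' pipeline: the record layout, the term names, and the 3-month bin width are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical annotation records: (patient_id, age_in_years, clinical_term).
# The term names are placeholders, not the vocabulary used in the study.
annotations = [
    ("p1", 0.6, "developmental_delay"),
    ("p1", 0.7, "seizure"),
    ("p1", 2.1, "seizure"),
    ("p2", 1.4, "seizure"),
]

BIN_WIDTH_YEARS = 0.25  # assumed 3-month age bins; not taken from the paper


def age_bin(age_years, width=BIN_WIDTH_YEARS):
    """Return the index of the age bin that contains age_years."""
    return int(age_years // width)


def build_features(records):
    """Map each patient to the set of (age_bin, term) pairs seen in their notes."""
    features = defaultdict(set)
    for patient_id, age_years, term in records:
        features[patient_id].add((age_bin(age_years), term))
    return dict(features)


def to_matrix(features):
    """Turn per-patient feature sets into a binary matrix with a fixed column order."""
    columns = sorted({f for fs in features.values() for f in fs})
    patients = sorted(features)
    rows = [[1 if col in features[p] else 0 for col in columns] for p in patients]
    return patients, columns, rows


patient_features = build_features(annotations)
patients, columns, matrix = to_matrix(patient_features)
```

A matrix of this shape, paired with a label per patient (for example, whether a genetic diagnosis was later made), is the kind of input a random forest classifier typically expects.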

Results: We identified 47,774 age-dependent associations of clinical features with genetic etiologies a median of 3.6 years before molecular diagnosis. Across all 710 genetic etiologies identified in our cohort, neurodevelopmental differences between 6 and 9 months increased the likelihood of a later molecular diagnosis 5-fold (P < .0001, 95% CI = 3.55-7.42). A later diagnosis of SCN1A-related disorders (area under the curve [AUC] = 0.91) or an overall positive genetic diagnosis (AUC = 0.82) could be reliably predicted using random forest models.
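For readers less familiar with the AUC values reported here, the sketch below computes an AUC from its pairwise definition: the fraction of (diagnosed, undiagnosed) pairs in which the diagnosed individual receives the higher predicted score, with ties counted as half. The labels and scores are invented for illustration and are unrelated to the study data; the function name `roc_auc` is likewise an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = later genetic diagnosis, 0 = no diagnosis.
labels = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)
```

Under this definition, an AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives some context for the reported values of 0.91 and 0.82.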

Conclusion: Clinical features predictive of genetic epilepsies precede molecular diagnoses by up to several years in conditions with known precision treatments. An earlier diagnosis facilitated by automated analysis of electronic medical records has the potential to enable earlier targeted therapeutic strategies in the genetic epilepsies.

Keywords: Developmental epileptic encephalopathy; Electronic medical record; Epilepsy; Natural language processing; Precision medicine.

MeSH terms

  • Adolescent
  • Adult
  • Child
  • Child, Preschool
  • Early Diagnosis
  • Electronic Health Records*
  • Epilepsy* / diagnosis
  • Epilepsy* / genetics
  • Female
  • Genetic Testing / methods
  • Humans
  • Infant
  • Male
  • NAV1.1 Voltage-Gated Sodium Channel* / genetics
  • Natural Language Processing

Substances

  • NAV1.1 Voltage-Gated Sodium Channel
  • SCN1A protein, human