Linking glycemic dysregulation in diabetes to symptoms, comorbidities, and genetics through EHR data mining

Isa Kristina Kirk; Christian Simon; Karina Banasik; Peter Christoffer Holm; Amalie Dahl Haue; Peter Bjødstrup Jensen; Lars Juhl Jensen; Cristina Leal Rodríguez; Mette Krogh Pedersen; Robert Eriksson; Henrik Ullits Andersen; Thomas Almdal; Jette Bork-Jensen; Niels Grarup; Knut Borch-Johnsen; Oluf Pedersen; Flemming Pociot; Torben Hansen; Regine Bergholdt; Peter Rossing; Søren Brunak

doi:10.7554/eLife.44941

eLife. 2019 Dec 10:8:e44941. doi: 10.7554/eLife.44941.

Authors

Isa Kristina Kirk^#¹, Christian Simon^#¹, Karina Banasik¹, Peter Christoffer Holm¹, Amalie Dahl Haue¹, Peter Bjødstrup Jensen¹,², Lars Juhl Jensen¹, Cristina Leal Rodríguez¹, Mette Krogh Pedersen¹, Robert Eriksson¹, Henrik Ullits Andersen³, Thomas Almdal³,⁴, Jette Bork-Jensen⁵, Niels Grarup⁵, Knut Borch-Johnsen⁶, Oluf Pedersen³,⁵, Flemming Pociot³,⁷, Torben Hansen³,⁵, Regine Bergholdt³, Peter Rossing³,⁸, Søren Brunak¹,⁹

Affiliations

¹ Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark.
² Odense Patient Data Explorative Network (OPEN), Odense University Hospital, Odense, Denmark.
³ Steno Diabetes Center Copenhagen, Gentofte, Denmark.
⁴ Department of Endocrinology, Rigshospitalet, Copenhagen, Denmark.
⁵ Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark.
⁶ Holbæk Hospital, Holbæk, Denmark.
⁷ Department of Clinical Medicine, Herlev-Gentofte Hospital, Herlev, Denmark.
⁸ Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark.
⁹ Center for Biological Sequence Analysis, Department of Bio and Health Informatics, Technical University of Denmark, Lyngby, Denmark.

^# Contributed equally.

Abstract

Diabetes is a diverse and complex disease, with considerable variation in phenotypic manifestation and severity. This variation hampers the study of etiological differences and reduces the statistical power of analyses of associations with genetics, treatment outcomes, and complications. We address these issues through deep, fine-grained phenotypic stratification of a diabetes cohort. By text mining the electronic health records of 14,017 patients, we matched two controlled vocabularies (ICD-10 and a custom vocabulary developed at the clinical center Steno Diabetes Center Copenhagen) to clinical narratives spanning a 19-year period. The two matched vocabularies comprise over 20,000 medical terms describing symptoms, other diagnoses, and lifestyle factors. The cohort is genetically homogeneous (Caucasian diabetes patients from Denmark), so the resulting stratification is not driven by ethnic differences but rather by inherently dissimilar progression patterns and lifestyle-related risk factors. Using unsupervised Markov clustering, we defined 71 clusters of at least 50 individuals within the diabetes spectrum. The clusters display both distinct and shared longitudinal glycemic dysregulation patterns, temporal co-occurrences of comorbidities, and associations with single nucleotide polymorphisms in or near genes relevant for diabetes comorbidities.
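The clustering step mentioned in the abstract can be illustrated with a minimal sketch of the generic Markov clustering (MCL) procedure. This is not the authors' pipeline: the abstract does not state how patient similarity was computed or which parameters were used, so the function name `markov_cluster`, the inflation value, the toy similarity matrix, and the minimum cluster size of 2 (standing in for the study's threshold of 50 individuals) are all assumptions made for illustration.

```python
import numpy as np


def markov_cluster(similarity, inflation=2.0, max_iter=100, tol=1e-8):
    """Generic MCL on a symmetric, non-negative similarity matrix (illustrative)."""
    m = np.asarray(similarity, dtype=float)
    m = m + np.eye(m.shape[0])               # add self-loops so no column is empty
    m = m / m.sum(axis=0, keepdims=True)     # make columns sum to 1 (random walk)
    for _ in range(max_iter):
        previous = m
        m = m @ m                            # expansion: two-step random walk
        m = m ** inflation                   # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True) # renormalize columns
        if np.abs(m - previous).max() < tol:
            break
    # Rows with a positive diagonal act as attractors; the nonzero entries
    # of each attractor row are the members of its cluster.
    found = set()
    for row in np.flatnonzero(m.diagonal() > 1e-8):
        found.add(tuple(np.flatnonzero(m[row] > 1e-8).tolist()))
    return sorted(list(members) for members in found)


# Toy input (assumed): 7 hypothetical patients, two groups of three with
# mutually similar term profiles, and one patient similar to nobody.
similarity = np.zeros((7, 7))
for group in ([0, 1, 2], [3, 4, 5]):
    for i in group:
        for j in group:
            if i != j:
                similarity[i, j] = 1.0

all_clusters = markov_cluster(similarity)
# Keep only clusters above a minimum size (2 here; the study used 50).
clusters = [c for c in all_clusters if len(c) >= 2]
print(clusters)
```

In this toy setting the isolated patient forms a singleton cluster that the size filter discards, which mirrors, under the stated assumptions, how a minimum-size rule restricts the analysis to well-populated clusters.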

Keywords: EHR; comorbidities; computational biology; diabetes; diabetes subtypes; epidemiology; genotyping; global health; human; systems biology; text mining.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Adolescent
Adult
Aged
Aged, 80 and over
Algorithms
Child
Cohort Studies
Data Mining*
Denmark / epidemiology
Diabetes Complications / diagnosis
Diabetes Complications / epidemiology*
Diabetes Complications / genetics
Diabetes Complications / therapy
Diabetes Mellitus / diagnosis
Diabetes Mellitus / epidemiology*
Diabetes Mellitus / genetics
Diabetes Mellitus / therapy
Electronic Health Records
Female
Humans
Male
Middle Aged
Risk Factors
Terminology as Topic*
Treatment Outcome
Vocabulary
Young Adult
