Big data analysis for Covid-19 in hospital information systems

PLoS One. 2024 May 22;19(5):e0294481. doi: 10.1371/journal.pone.0294481. eCollection 2024.

Abstract

The COVID-19 pandemic has triggered a global public health crisis, affecting hundreds of countries. As the number of infected cases grows, automated COVID-19 identification tools based on CT images can effectively assist clinical diagnosis and reduce the tedious workload of image interpretation. To expand the dataset available to machine learning methods, cases from different medical systems must be aggregated so that robust and generalizable models can be learned. This paper proposes a novel joint deep learning framework that effectively handles heterogeneous datasets with distribution discrepancies for accurate COVID-19 identification. We address the cross-site domain shift by redesigning COVID-Net's network architecture and learning strategy and by applying independent feature normalization in latent space, which improves prediction accuracy and learning efficiency. Additionally, we propose a contrastive training objective to enhance the domain invariance of semantic embeddings and boost classification performance on each dataset. We develop and evaluate our method on two large-scale public COVID-19 diagnosis datasets containing CT images. Extensive experiments show that our method consistently improves performance on both datasets, outperforming the original COVID-Net trained on each dataset by 13.27% and 15.15% in AUC, respectively, and also exceeding existing state-of-the-art multi-site learning methods.
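The two ideas named in the abstract, independent feature normalization per site and a contrastive objective on the embeddings, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes that "independent feature normalization" means standardizing latent features separately for each site, and it uses a supervised contrastive loss as a stand-in for the contrastive training objective. The function names `site_normalize` and `supervised_contrastive_loss`, the `eps` constant, and the `temperature` value are all hypothetical.

```python
import numpy as np


def site_normalize(features, site_ids, eps=1e-5):
    """Standardize each feature dimension separately within each site.

    Assumption: this stands in for the abstract's "independent feature
    normalization in latent space"; the paper's exact scheme may differ.
    """
    features = np.asarray(features, dtype=float)
    site_ids = np.asarray(site_ids)
    out = np.empty_like(features)
    for site in np.unique(site_ids):
        mask = site_ids == site
        mean = features[mask].mean(axis=0)
        var = features[mask].var(axis=0)
        out[mask] = (features[mask] - mean) / np.sqrt(var + eps)
    return out


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of embeddings.

    Assumption: a generic loss of this family, not the paper's objective.
    Same-label samples (from any site) act as positives for each anchor.
    """
    z = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = z @ z.T / temperature
    sim = sim - sim.max(axis=1, keepdims=True)  # numerical stability

    not_self = ~np.eye(len(labels), dtype=bool)
    exp_sim = np.exp(sim) * not_self  # exclude each anchor from its own sum
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0  # skip anchors with no same-label partner
    per_anchor = -(log_prob * positives).sum(axis=1)[valid] / pos_count[valid]
    return per_anchor.mean()
```

In this sketch, features from two hospitals with different intensity statistics would first pass through `site_normalize`, so that each site's features have roughly zero mean and unit variance, and the loss would then be computed on embeddings from both sites together, which encourages same-class embeddings to cluster regardless of site.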

MeSH terms

  • Big Data*
  • COVID-19* / epidemiology
  • Deep Learning
  • Hospitals
  • Humans
  • Information Systems
  • Machine Learning
  • Pandemics
  • SARS-CoV-2 / isolation & purification
  • Tomography, X-Ray Computed / methods

Grants and funding

The authors received no specific funding for this work.