The COVID-19 pandemic has triggered a global public health crisis, affecting more than two hundred countries and territories. With the growing number of infected cases, automated COVID-19 identification tools based on CT images can effectively assist clinical diagnosis and reduce the tedious workload of image interpretation. To expand the data available to machine learning methods, it is necessary to aggregate cases from different medical systems and learn robust, generalizable models from these heterogeneous sources. This paper proposes a novel joint deep learning framework that effectively handles heterogeneous datasets with distribution discrepancies for accurate COVID-19 identification. We address the cross-site domain shift by redesigning COVID-Net's network architecture and learning strategy, applying independent feature normalization in the latent space to improve both prediction accuracy and learning efficiency. In addition, we propose a contrastive training objective that enhances the domain invariance of semantic embeddings and thereby boosts classification performance on each dataset. We develop and evaluate our method on two large-scale public COVID-19 diagnosis datasets of CT images. Extensive experiments show that our method consistently improves performance on both datasets, outperforming the original COVID-Net trained separately on each dataset by 13.27% and 15.15% in AUC, respectively, and also exceeding existing state-of-the-art multi-site learning methods.
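To make the contrastive objective concrete, the following is a minimal NumPy sketch of a supervised contrastive loss of the kind described above: embeddings from either site that share a class label are pulled together, while all other pairs are pushed apart, encouraging site-invariant semantic clusters. The function name, temperature value, and exact loss form are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative supervised contrastive loss (assumed form, not the
    authors' code). Positives are samples sharing a class label,
    regardless of which site they come from, so minimizing this loss
    pushes same-class embeddings from different sites together."""
    # L2-normalize embeddings so similarities are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature  # pairwise scaled similarities
    n = len(labels)
    loss, count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue
        # denominator runs over all other samples (self excluded)
        others = np.delete(sim[i], i)
        log_denom = np.log(np.exp(others).sum())
        for j in positives:
            loss += -(sim[i, j] - log_denom)
            count += 1
    return loss / count
```

With well-separated class clusters this loss is low; shuffling the labels so that positives lie far apart raises it, which is the behavior the training objective relies on.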
Copyright: © 2024 Ying et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.