Deep learning has shown great potential to automate abdominal organ segmentation and quantification. However, most existing algorithms rely on expert annotations and do not have comprehensive evaluations in real-world multinational settings. To address these limitations, we organised the FLARE 2022 challenge to benchmark fast, low-resource, and accurate abdominal organ segmentation algorithms. We first constructed an intercontinental abdomen CT dataset from more than 50 clinical research groups. We then independently validated that deep learning algorithms achieved a median dice similarity coefficient (DSC) of 90·0% (IQR 87·4-91·3%) by use of 50 labelled images and 2000 unlabelled images, which can substantially reduce manual annotation costs. The best-performing algorithms successfully generalised to holdout external validation sets, achieving a median DSC of 89·4% (85·2-91·3%), 90·0% (84·3-93·0%), and 88·5% (80·9-91·9%) on North American, European, and Asian cohorts, respectively. These algorithms show the potential to use unlabelled data to boost performance and alleviate annotation shortages for modern artificial intelligence models.
Copyright © 2024 Elsevier Ltd. All rights reserved.