Purpose: To quantify interobserver variation (IOV) in target volume and organs-at-risk (OAR) contouring across 31 institutions in breast cancer cases and to explore the clinical utility of deep learning (DL)-based auto-contouring in reducing potential IOV.
Methods and materials: In phase 1, two breast cancer cases were randomly selected and distributed to multiple institutions for contouring six clinical target volumes (CTVs) and eight OARs. In phase 2, auto-contour sets were generated using a previously published DL breast segmentation model and were made available to all participants. The difference in IOV between the contours submitted in phases 1 and 2 was investigated quantitatively using the Dice similarity coefficient (DSC) and Hausdorff distance (HD). The qualitative analysis used contour heat maps to visualize the extent and location of these variations and the required modifications.
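As an illustration of the two quantitative metrics named above, the following is a minimal sketch of pairwise DSC and HD computation, not the study's actual pipeline. It assumes each contour has been rasterized to a set of voxel coordinates; the helper names (`dice`, `hausdorff`, `pairwise_agreement`, `square`) and the toy observers are hypothetical. Distances are in voxel units, so a real analysis would need to scale by voxel spacing to report millimetres, and would typically work on surface points rather than filled sets.

```python
from itertools import combinations
from math import dist


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # two empty contours agree trivially
    return 2.0 * len(a & b) / (len(a) + len(b))


def directed_hausdorff(a, b):
    """Largest distance from any point in a to its nearest point in b."""
    return max(min(dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    if not a or not b:
        raise ValueError("Hausdorff distance is undefined for empty contours")
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def pairwise_agreement(contours):
    """Return {(observer_i, observer_j): (DSC, HD)} for every observer pair."""
    results = {}
    for (name_a, a), (name_b, b) in combinations(sorted(contours.items()), 2):
        results[(name_a, name_b)] = (dice(a, b), hausdorff(a, b))
    return results


def square(x0, y0, size):
    """Hypothetical toy contour: a filled size-by-size square of pixels."""
    return {(x, y) for x in range(x0, x0 + size) for y in range(y0, y0 + size)}


# Three toy observers; obs2 is shifted by one pixel relative to obs1 and obs3.
contours = {
    "obs1": square(0, 0, 4),
    "obs2": square(1, 0, 4),
    "obs3": square(0, 0, 4),
}
results = pairwise_agreement(contours)
for pair, (dsc, hd) in results.items():
    print(pair, f"DSC={dsc:.2f}", f"HD={hd:.1f}")
```

With n submitted contour sets for a structure, this yields n(n-1)/2 pairwise comparisons, whose mean DSC and HD can then be compared between phases.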
Results: Over 800 pairwise comparisons were analysed for each structure in each case. Quantitative phase 2 metrics showed significant improvement in the mean DSC (from 0.69 to 0.77) and HD (from 34.9 to 17.9 mm). Quantitative analysis showed increased interobserver agreement in phase 2, specifically for CTV structures (5-19%), leading to fewer manual adjustments. The underlying causes of IOV differences were examined using a questionnaire and a hierarchical clustering analysis based on CTV volumes.
Conclusion: DL-based auto-contours significantly improved contour agreement for OARs and CTVs, both qualitatively and quantitatively, suggesting their potential role in minimizing radiation therapy protocol deviation.
Keywords: Auto-contouring; Breast cancer; Deep learning; Inter-observer variation; RTQA.
Copyright © 2023. Published by Elsevier Ltd.