Comparison of the output of a deep learning segmentation model for locoregional breast cancer radiotherapy trained on 2 different datasets

Nienke Bakx; Maurice van der Sangen; Jacqueline Theuws; Hanneke Bluemink; Coen Hurkmans

doi:10.1016/j.tipsro.2023.100209

Tech Innov Patient Support Radiat Oncol. 2023 May 13:26:100209. doi: 10.1016/j.tipsro.2023.100209. eCollection 2023 Jun.

Authors

Nienke Bakx¹, Maurice van der Sangen¹, Jacqueline Theuws¹, Hanneke Bluemink¹, Coen Hurkmans¹,²

Affiliations

¹ Catharina Hospital, Department of Radiation Oncology, 5602ZA Eindhoven, the Netherlands.
² Technical University Eindhoven, Faculties of Physics and Electrical Engineering, 5600MB Eindhoven, the Netherlands.

Abstract

Introduction: The development of deep learning (DL) models for auto-segmentation is increasing, and more models are becoming commercially available. Commercial models are mostly trained on external data. To study the effect of using a model trained on external data rather than the same model trained on in-house collected data, the performance of these two DL models was evaluated.

Methods: The evaluation was performed using in-house collected data of 30 breast cancer patients. Quantitative analysis was performed using the Dice similarity coefficient (DSC), the surface DSC (sDSC) and the 95th percentile of the Hausdorff distance (95% HD). These values were compared with previously reported inter-observer variations (IOV).
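As an illustration of two of the metrics named above, the following is a minimal sketch of how DSC and 95% HD could be computed for a pair of binary segmentation masks. It is not the authors' implementation: the function names, the surface definition (foreground voxels with at least one background neighbour along an axis), the choice to take the larger of the two directed 95th percentiles, and the brute-force distance computation are all assumptions made for this example. The sDSC is left out because it depends on a distance tolerance that the abstract does not state.

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / total)


def surface_points(mask, spacing):
    """Physical coordinates of foreground voxels that touch the background."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = padded.copy()
    # A voxel is interior if it and both of its neighbours along every axis
    # are foreground; the False padding makes np.roll's wrap-around harmless.
    for axis in range(mask.ndim):
        interior &= np.roll(padded, 1, axis=axis) & np.roll(padded, -1, axis=axis)
    core = tuple(slice(1, -1) for _ in range(mask.ndim))
    surface = mask & ~interior[core]
    return np.argwhere(surface) * np.asarray(spacing, dtype=float)


def hd95(a, b, spacing):
    """95th percentile Hausdorff distance, in the units of `spacing`.

    Assumes both masks are non-empty. Uses a full pairwise distance matrix,
    which is only practical for small surfaces.
    """
    pa, pb = surface_points(a, spacing), surface_points(b, spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    a_to_b = d.min(axis=1)  # each surface point of a to the nearest point of b
    b_to_a = d.min(axis=0)  # each surface point of b to the nearest point of a
    return float(max(np.percentile(a_to_b, 95), np.percentile(b_to_a, 95)))


# Toy example: two 4x4 squares offset by one voxel on a 1 mm grid.
reference = np.zeros((10, 10), dtype=bool)
reference[2:6, 2:6] = True
prediction = np.zeros((10, 10), dtype=bool)
prediction[2:6, 3:7] = True

dsc_value = dice(reference, prediction)
hd95_value = hd95(reference, prediction, spacing=(1.0, 1.0))
```

For clinical-size 3D volumes, a distance-transform based implementation would be a more practical choice than the pairwise matrix used here, and other conventions for the 95% HD (for example pooling both directed distance sets before taking the percentile) give slightly different values.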

Results: For a number of structures, statistically significant differences were found between the two models. For organs at risk, mean DSC values ranged from 0.63 to 0.98 for the in-house model and from 0.71 to 0.96 for the external model. For target volumes, mean DSC values ranged from 0.57 to 0.94 and from 0.33 to 0.92, respectively. The difference in 95% HD values between the two models ranged from 0.08 to 3.23 mm, except for CTVn4, where it was 9.95 mm. For the external model, both DSC and 95% HD are outside the range of IOV for CTVn4; for the in-house model, the DSC for the thyroid is outside the range of IOV.

Conclusions: Statistically significant differences were found between the two models, but these were mostly within published inter-observer variations, showing the clinical usefulness of both models. Our findings could encourage discussion and revision of existing guidelines to further reduce not only inter-observer but also inter-institute variability.

Keywords: Auto-segmentation; Clinical validation; Deep learning; Loco-regional breast cancer; Radiotherapy.