Reliability as a Precondition for Trust: Segmentation Reliability Analysis of Radiomic Features Improves Survival Prediction

Gustav Müller-Franzes; Sven Nebelung; Justus Schock; Christoph Haarburger; Firas Khader; Federico Pedersoli; Maximilian Schulze-Hagen; Christiane Kuhl; Daniel Truhn

doi:10.3390/diagnostics12020247

Diagnostics (Basel). 2022 Jan 19;12(2):247. doi: 10.3390/diagnostics12020247.

Authors

Gustav Müller-Franzes¹, Sven Nebelung¹, Justus Schock², Christoph Haarburger³, Firas Khader¹, Federico Pedersoli¹, Maximilian Schulze-Hagen¹, Christiane Kuhl¹, Daniel Truhn¹

Affiliations

¹ Department of Diagnostic and Interventional Radiology, University Hospital Aachen, 52074 Aachen, Germany.
² Department of Diagnostic and Interventional Radiology, University Hospital Düsseldorf, 40225 Düsseldorf, Germany.
³ Department of Research and Development, CheckupPoint GmbH, 81669 Munich, Germany.

Abstract

Machine learning results based on radiomic analysis are often not transferable. A potential reason for this is the variability of radiomic features caused by varying human-made segmentations. Therefore, the aim of this study was to provide a comprehensive inter-reader reliability analysis of radiomic features in five clinical image datasets and to assess the association between inter-reader reliability and survival prediction. In this study, we analyzed 4598 tumor segmentations in both computed tomography (CT) and magnetic resonance imaging (MRI) data. We used a neural network to generate 100 additional segmentation outlines for each tumor and performed a reliability analysis of radiomic features. To demonstrate clinical utility, we predicted patient survival based on all features and on the most reliable features. Survival prediction models for both CT and MRI datasets demonstrated less statistical spread and superior survival prediction when based on the most reliable features. Mean concordance indices were C_mean = 0.58 [most reliable] vs. C_mean = 0.56 [all] (p < 0.001, CT) and C_mean = 0.58 vs. C_mean = 0.57 (p = 0.23, MRI). Thus, a preceding reliability analysis and selection of the most reliable radiomic features improve the underlying model's ability to predict patient survival across clinical imaging modalities and tumor entities.
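
For readers unfamiliar with the concordance index reported above, the following minimal sketch shows how such a value can be computed. It assumes Harrell's formulation with a simplified handling of ties; the abstract does not state which implementation the authors used, and the function name `concordance_index` and the four-patient example data are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C (simplified sketch, not the authors' implementation).

    times  -- observed follow-up time per patient
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0  # higher risk and shorter survival: concordant
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))  # 4 of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported mean values of 0.56 to 0.58 into context.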

Keywords: inter-rater reliability; neural network; overall survival; radiomic features; robustness; segmentation variability.