Differential diagnosis between low-risk and high-risk thymoma: Comparison of diagnostic performance of radiologists with and without deep learning model

Yuriko Yoshida; Masahiro Yanagawa; Yukihisa Sato; Tomo Miyata; Atsushi Kawata; Akinori Hata; Noriyuki Tomiyama

doi:10.1177/20584601241288509

Acta Radiol Open. 2024 Oct 4;13(10):20584601241288509. doi: 10.1177/20584601241288509. eCollection 2024 Oct.

Authors

Yuriko Yoshida¹, Masahiro Yanagawa¹, Yukihisa Sato², Tomo Miyata³, Atsushi Kawata⁴, Akinori Hata¹, Noriyuki Tomiyama¹

Affiliations

¹ Department of Diagnostic and Interventional Radiology, Osaka University Graduate School of Medicine, Osaka, Japan.
² Department of Diagnostic Radiology, Suita Municipal Hospital, Osaka, Japan.
³ Department of Diagnostic Radiology, Sakai City Medical Center, Osaka, Japan.
⁴ Department of Diagnostic Radiology, Osaka International Cancer Institute, Osaka, Japan.

Abstract

Background: There are few CT-based deep learning (DL) studies on thymoma according to the World Health Organization classification.

Purpose: To develop a CT-based DL model to distinguish between low-risk and high-risk thymoma and to compare the diagnostic performance of radiologists with and without the DL model.

Material and methods: A total of 159 patients with 160 thymomas were included. A fine-tuned VGG16 network model with the Adam optimizer was used, followed by k-fold cross-validation. The dataset consisted of three axial slices from the CT volume data, including the slice showing the maximum tumor size. The data were augmented 50 times by rotation, zoom, shear, and horizontal/vertical flip. Three independent networks for the CT dataset were considered, and the result was determined by voting. Three radiologists independently diagnosed thymomas with and without the model. The area under the curve (AUC) of the diagnostic performance was compared using receiver operating characteristic analysis.
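The voting step can be illustrated with a minimal Python sketch. The abstract does not specify the voting rule, so this assumes simple majority voting over binary labels (0 = low-risk, 1 = high-risk) from three networks. The function name `majority_vote` and the prediction values are hypothetical and not taken from the study.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label among an odd number of binary votes."""
    if len(labels) % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return Counter(labels).most_common(1)[0][0]


# Hypothetical predictions from three networks for four tumors
# (0 = low-risk, 1 = high-risk); illustrative values only.
net_a = [0, 1, 1, 0]
net_b = [0, 1, 0, 1]
net_c = [1, 1, 0, 0]

# Combine the three per-tumor votes into one ensemble label per tumor.
ensemble = [majority_vote(votes) for votes in zip(net_a, net_b, net_c)]
print(ensemble)  # [0, 1, 0, 0]
```

With three voters and two classes a tie cannot occur, which is one common reason for choosing an odd number of networks.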

Results: Accuracy of the DL model was 71.3%. Diagnostic performance of the radiologists was as follows: AUC and accuracy without the DL model, 0.61-0.68 and 61.9%-69.3%; and with the DL model, 0.66-0.69 and 68.1%-70.0%, respectively. AUC of the diagnostic performance showed no significant differences between radiologists with and without the DL model. The DL model tended to increase the diagnostic accuracy, but AUC was not significantly improved.
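For readers unfamiliar with the two metrics reported above, the following Python sketch shows how accuracy and AUC can be computed from labels and scores. It assumes each reader (or model) gives a continuous confidence score per tumor and uses the rank-based (Mann-Whitney) definition of AUC. The helper names and example values are hypothetical, and this is not the statistical software or comparison test used in the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """AUC as the probability that a positive case outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Illustrative values only: 1 = high-risk, 0 = low-risk.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]          # hypothetical confidence scores
y_pred = [int(s >= 0.5) for s in scores]  # assumed 0.5 decision threshold

print(accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, scores))       # 0.75
```

Accuracy depends on the chosen decision threshold while AUC does not, which is one way accuracy can rise without a matching significant change in AUC.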

Conclusion: Diagnostic performance of the DL model was comparable to that of radiologists. Assistance from the DL model tended to increase diagnostic accuracy.

Keywords: CT imaging; Deep learning; mediastinum; radiology; risk classification; thymoma.