Deep Learning for Automated Classification of Hip Hardware on Radiographs

J Imaging Inform Med. 2024 Sep 12. doi: 10.1007/s10278-024-01263-y. Online ahead of print.

Abstract

Purpose: To develop a deep learning model for automated classification of orthopedic hardware on pelvic and hip radiographs, which can be clinically implemented to decrease radiologist workload and improve consistency among radiology reports.

Materials and methods: Pelvic and hip radiographs from 4279 studies in 1073 patients were retrospectively obtained and reviewed by musculoskeletal radiologists. Two convolutional neural networks, EfficientNet-B4 and NFNet-F3, were trained to perform the image classification task into the following most represented categories: no hardware, total hip arthroplasty (THA), hemiarthroplasty, intramedullary nail, femoral neck cannulated screws, dynamic hip screw, lateral blade/plate, THA with additional femoral fixation, and post-infectious hip. Model performance was assessed on an independent test set of 851 studies from 262 patients and compared to individual performance of five subspecialty-trained radiologists using leave-one-out analysis against an aggregate gold standard label.

Results: For multiclass classification, the area under the receiver operating characteristic curve (AUC) for NFNet-F3 was 0.99 or greater for all classes, and EfficientNet-B4 0.99 or greater for all classes except post-infectious hip, with an AUC of 0.97. When compared with human observers, models achieved an accuracy of 97%, which is non-inferior to four out of five radiologists and outperformed one radiologist. Cohen's kappa coefficient for both models ranged from 0.96 to 0.97, indicating excellent inter-reader agreement.

Conclusion: A deep learning model can be used to classify a range of orthopedic hip hardware with high accuracy and comparable performance to subspecialty-trained radiologists.

Keywords: Deep learning; Hip radiography; Image classification; Orthopedic hardware; Pelvic radiography.