We systematically evaluate the training methodology and efficacy of two inpainting-based pretext tasks, context prediction and context restoration, for medical image segmentation using self-supervised learning (SSL). Multiple versions of self-supervised U-Net models were trained to segment MRI and CT datasets, each using a different combination of design choices and pretext tasks, to determine the effect of these design choices on segmentation performance. The optimal design choices were then used to train SSL models that were compared with baseline supervised models on clinically-relevant metrics in label-limited scenarios. We observed that SSL pretraining with context restoration using 32 × 32 patches and Poisson-disc sampling, transferring only the pretrained encoder weights, and fine-tuning immediately with an initial learning rate of 1 × 10⁻³ provided the greatest benefit over supervised learning for MRI and CT tissue segmentation accuracy (p < 0.001). For both datasets and most label-limited scenarios, scaling up the amount of unlabeled pretraining data improved segmentation performance. SSL models pretrained with the largest amount of unlabeled data outperformed baseline supervised models on clinically-relevant metrics, especially when supervised performance was low. Our results demonstrate that SSL pretraining with inpainting-based pretext tasks can increase model robustness in label-limited scenarios and reduce the worst-case errors that occur with supervised learning.
Keywords: CT; MRI; deep learning; machine learning; segmentation; self-supervised learning.
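
For illustration, the following is a minimal sketch, not the authors' exact implementation, of the context-restoration corruption described above: pairs of 32 × 32 patches at approximately Poisson-disc-distributed locations are swapped, and the pretext model is trained to restore the original image. The dart-throwing sampler and all function names are hypothetical simplifications.

```python
import numpy as np

def poisson_disc_points(h, w, r, n_target, rng):
    """Approximate Poisson-disc sampling via dart throwing: accept a
    candidate only if it lies at least r pixels from every accepted
    point (a simplified stand-in for Bridson's algorithm)."""
    points = []
    for _ in range(10000):                      # rejection budget
        cand = rng.uniform((0.0, 0.0), (h, w))
        if all(np.hypot(*(cand - p)) >= r for p in points):
            points.append(cand)
            if len(points) == n_target:
                break
    return np.asarray(points, dtype=int)

def corrupt_for_context_restoration(image, patch=32, n_swaps=8, seed=0):
    """Swap n_swaps pairs of patch x patch regions at Poisson-disc
    sampled locations. Because patches are swapped rather than erased,
    the image's overall intensity distribution is preserved; the SSL
    target is the original, uncorrupted image."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # Sample twice as many centers as swaps, kept >= patch apart and
    # fully inside the image bounds.
    pts = poisson_disc_points(h - patch, w - patch, r=patch,
                              n_target=2 * n_swaps, rng=rng)
    corrupted = image.copy()
    for (y1, x1), (y2, x2) in zip(pts[0::2], pts[1::2]):
        a = corrupted[y1:y1 + patch, x1:x1 + patch].copy()
        corrupted[y1:y1 + patch, x1:x1 + patch] = \
            corrupted[y2:y2 + patch, x2:x2 + patch]
        corrupted[y2:y2 + patch, x2:x2 + patch] = a
    return corrupted  # network input; loss is e.g. L2 to `image`
```

Under the design choices reported above, only the pretrained encoder weights would then be transferred to the segmentation U-Net, which is fine-tuned immediately with an initial learning rate of 1 × 10⁻³.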