Visual Field Prediction: Evaluating the Clinical Relevance of Deep Learning Models

Mohammad Eslami; Julia A Kim; Miao Zhang; Michael V Boland; Mengyu Wang; Dolly S Chang; Tobias Elze

doi:10.1016/j.xops.2022.100222


Ophthalmol Sci. 2022 Sep 13;3(1):100222. doi: 10.1016/j.xops.2022.100222. eCollection 2023 Mar.

Authors

Mohammad Eslami¹, Julia A Kim², Miao Zhang², Michael V Boland¹, Mengyu Wang¹, Dolly S Chang²,³, Tobias Elze¹

Affiliations

¹ Schepens Eye Research Institute, Massachusetts Eye and Ear, Harvard Medical School, Boston, Massachusetts.
² Early Clinical Development, Genentech, Inc, South San Francisco, California.
³ Byers Eye Institute, Stanford University, Palo Alto, California.

Abstract

Purpose: Two novel deep learning methods using a convolutional neural network (CNN) and a recurrent neural network (RNN) have recently been developed to forecast future visual fields (VFs). The original evaluations of these models focused on overall accuracy and did not assess whether the models can accurately identify patients with progressive glaucomatous vision loss, which is what clinicians need to prevent further decline. We evaluated these 2 prediction models for potential biases in overestimating or underestimating VF changes over time.

Design: Retrospective observational cohort study.

Participants: All available and reliable Swedish Interactive Thresholding Algorithm Standard 24-2 VFs from the Massachusetts Eye and Ear Glaucoma Service collected between 1999 and 2020 were extracted. Because the 2 methods have different input data requirements, the CNN data set included 54 373 samples from 7472 patients, and the RNN data set included 24 430 samples from 1809 patients.

Methods: The CNN and RNN methods were reimplemented. A fivefold cross-validation procedure was performed on each model, and pointwise mean absolute error (PMAE) was used to measure prediction accuracy. Test data were stratified into categories based on the severity of VF progression to investigate the models' performance in predicting worsening cases. The models were additionally compared with a no-change model whose prediction is simply the baseline VF (in the CNN comparison) or the last-observed VF (in the RNN comparison).
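The comparison against the no-change model can be illustrated with a minimal Python sketch. It assumes each VF is a flat list of pointwise sensitivities in dB measured at the same test locations, and that summary PMAE is the average of per-sample PMAEs over the test set. The function names `pmae`, `no_change_prediction`, and `compare_to_no_change` are hypothetical and are not taken from the original CNN or RNN implementations.

```python
from statistics import mean


def pmae(predicted, observed):
    """Pointwise mean absolute error (dB) between two VFs of equal length."""
    if len(predicted) != len(observed):
        raise ValueError("VFs must have the same number of test points")
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def no_change_prediction(history):
    """Hypothetical no-change baseline: repeat the last VF in the history.

    In the CNN setting the history would hold only the baseline VF; in the
    RNN setting it would be the sequence whose last element is the
    last-observed VF.
    """
    return list(history[-1])


def compare_to_no_change(model_predictions, histories, targets):
    """Return (model PMAE, no-change PMAE), each averaged over test samples."""
    model_err = mean(pmae(p, t) for p, t in zip(model_predictions, targets))
    baseline_err = mean(
        pmae(no_change_prediction(h), t) for h, t in zip(histories, targets)
    )
    return model_err, baseline_err


# Toy data with 4-point "VFs" (real 24-2 VFs have many more test points).
# The second sample worsens by several dB between input and target.
histories = [[[30, 28, 25, 20]], [[27, 26, 22, 18]]]
targets = [[29, 27, 24, 19], [24, 22, 18, 14]]
model_predictions = [[29, 28, 24, 20], [26, 25, 21, 17]]

print(compare_to_no_change(model_predictions, histories, targets))
# -> (1.625, 2.375)
```

A model is only clinically useful for progression if it beats this trivial baseline, particularly on the worsening samples, where a model that stays close to the input VF can still report a low overall PMAE.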

Main outcome measures: PMAE in predictions.
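For reference, PMAE for a single predicted VF can be written as below, where $N$ is the number of VF test locations, $y_i$ is the observed sensitivity at location $i$, and $\hat{y}_i$ is the predicted sensitivity, all in dB. The notation is ours, and averaging this quantity over all test samples to obtain an overall PMAE is an assumption about how the summary values were aggregated.

```latex
\mathrm{PMAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|
% Worked example with N = 4 and absolute errors 2, 3, 3, 3 dB:
% \mathrm{PMAE} = \frac{2 + 3 + 3 + 3}{4} = \frac{11}{4} = 2.75\ \mathrm{dB}
```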

Results: The overall PMAE 95% confidence intervals were 2.21 to 2.24 decibels (dB) for the CNN and 2.56 to 2.61 dB for the RNN, which were close to the original studies' reported values. However, both models exhibited large errors in identifying patients with worsening VFs and often failed to outperform the no-change model. Pointwise mean absolute error values were higher in patients with greater changes in mean sensitivity (for the CNN) and mean total deviation (for the RNN) between baseline and follow-up VFs.
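The stratified analysis behind this result can be sketched by binning test samples according to the change in mean sensitivity between the baseline and follow-up VF, then averaging PMAE within each bin. The bin edges (in dB), labels, and helper names below are hypothetical choices for illustration, since the abstract does not state the category boundaries that were used; for the RNN, the same idea would apply with mean total deviation in place of mean sensitivity.

```python
from collections import defaultdict
from statistics import mean


def pmae(predicted, observed):
    """Pointwise mean absolute error (dB) between two VFs of equal length."""
    return mean(abs(p - o) for p, o in zip(predicted, observed))


def mean_sensitivity_change(baseline, follow_up):
    """Follow-up minus baseline mean sensitivity (dB); negative means worsening."""
    return mean(follow_up) - mean(baseline)


def stratify(change, edges=(-6.0, -3.0, -1.0)):
    """Assign a severity bin from the change in mean sensitivity.

    The edges are hypothetical and chosen only for illustration.
    """
    if change < edges[0]:
        return "< -6 dB"
    if change < edges[1]:
        return "-6 to -3 dB"
    if change < edges[2]:
        return "-3 to -1 dB"
    return ">= -1 dB"


def pmae_by_stratum(baselines, follow_ups, predictions):
    """Mean PMAE per severity bin, keyed by bin label."""
    groups = defaultdict(list)
    for base, obs, pred in zip(baselines, follow_ups, predictions):
        label = stratify(mean_sensitivity_change(base, obs))
        groups[label].append(pmae(pred, obs))
    return {label: mean(errors) for label, errors in groups.items()}


# Toy data: the first eye is nearly stable, the second worsens by 7 dB,
# and the "model" predicts little change for both.
baselines = [[30, 28, 25, 20], [27, 26, 22, 18]]
follow_ups = [[29, 27, 24, 19], [20, 19, 15, 11]]
predictions = [[29, 28, 24, 20], [26, 25, 21, 17]]

print(pmae_by_stratum(baselines, follow_ups, predictions))
# -> {'>= -1 dB': 0.5, '< -6 dB': 6}
```

In this toy case the error is small for the stable eye and large for the rapidly worsening eye, which is the pattern a single overall PMAE hides.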

Conclusions: Although our evaluation confirms the low overall PMAEs reported in the original studies, our findings also reveal that both models severely underpredict worsening of VF loss. Because the accurate detection and projection of glaucomatous VF decline is crucial in ophthalmic clinical practice, we recommend that this limitation be explicitly taken into account when developing and evaluating future deep learning models.

Keywords: Artificial intelligence; CI, confidence interval; CNN, convolutional neural network; DL, deep learning; Deep learning; Glaucoma; MD, mean deviation; MPark, recurrent neural network method from Park et al; MWen, convolutional neural network method from Wen et al; PMAE, pointwise mean absolute error; Prediction; RNN, recurrent neural network; ROP, rate of progression; TD, total deviation; VF, visual field; Visual fields; dB, decibel.
