Purpose: Trachoma surveys are used to estimate the prevalence of trachomatous inflammation-follicular (TF) to guide mass antibiotic distribution. These surveys currently rely on human graders, introducing a significant resource burden and potential for human error. This study describes the development and evaluation of machine learning models intended to reduce cost and improve reliability of these surveys.
Methods: Fifty-six thousand seven hundred twenty-five everted eyelid photographs were obtained from 11,358 children aged 0 to 9 years in a single trachoma-endemic region of Ethiopia over a 3-year period. Three groups of expert graders reviewed all images from each examination to estimate the number of tarsal conjunctival follicles and the degree of trachomatous inflammation-intense. The median of the 3 groups' estimates was used as the ground truth to train a MobileNetV3 large deep convolutional neural network to detect cases with TF.
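The median-based ground truth described in the Methods can be illustrated with a minimal sketch. It assumes each grader group reports one follicle-count estimate per examination and that TF is assigned at the World Health Organization simplified grading cutoff of 5 or more follicles; the abstract does not state the cutoff used in the study, and the names `TF_FOLLICLE_THRESHOLD` and `consensus_label` are hypothetical.

```python
from statistics import median

# Assumption: WHO simplified grading defines TF as 5 or more follicles.
# The cutoff actually used in the study is not given in the abstract.
TF_FOLLICLE_THRESHOLD = 5


def consensus_label(follicle_estimates):
    """Return (median follicle estimate, TF label) for one examination.

    follicle_estimates: one follicle-count estimate per grader group.
    """
    median_count = median(follicle_estimates)
    has_tf = median_count >= TF_FOLLICLE_THRESHOLD
    return median_count, has_tf


# Illustrative (made-up) estimates from three grader groups.
median_count, has_tf = consensus_label([3, 6, 7])
print(median_count, has_tf)
```

Taking the median rather than the mean keeps a single outlying grader group from shifting the label.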
Results: The classification model predicted a TF prevalence of 32%, which was not significantly different from the human consensus estimate (30%; 95% confidence interval of the difference, -2% to +4%). The model had an area under the receiver operating characteristic curve of 0.943, F1 score of 0.923, 88% accuracy, 83% sensitivity, and 91% specificity. The area under the receiver operating characteristic curve increased to 0.995 when the analysis was restricted to nonborderline cases of TF.
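For readers less familiar with the threshold-based metrics reported in the Results, the sketch below shows how accuracy, sensitivity, specificity, and F1 score follow from the four cells of a binary confusion matrix. The counts are invented for illustration and are not the study's data; the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Illustrative counts only (not taken from the study):
# 100 TF-positive and 100 TF-negative examinations.
metrics = binary_metrics(tp=83, fp=9, tn=91, fn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The area under the receiver operating characteristic curve is not derived from a single confusion matrix; it summarizes sensitivity and specificity across all decision thresholds.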
Conclusions: Deep convolutional neural network models performed well at classifying TF and detecting the number of follicles evident in conjunctival photographs. Implementation of similar models may enable accurate, efficient, large-scale trachoma screening. Further validation in diverse populations with varying TF prevalence is needed before implementation at scale.
Copyright © 2024 The Author(s). Published by Wolters Kluwer Health, Inc.