Purpose: Urologists rely heavily on videourodynamics to identify patients with neurogenic bladders who are at risk of upper tract injury, but interpretation of these studies has high interobserver variability. Our objective was to develop deep learning models that categorize the severity of bladder dysfunction from videourodynamics studies.
Materials and methods: We performed a cross-sectional study of patients aged 2 months to 28 years with spina bifida who underwent videourodynamics at a single institution between 2019 and 2021. The outcome was degree of bladder dysfunction, categorized as none/mild, moderate, or severe by a panel of 5 expert reviewers. In determining severity, reviewers considered factors that increase the risk of upper tract injury, such as poor compliance, elevated detrusor leak point pressure, and detrusor sphincter dyssynergia. We built 4 models to predict severity of bladder dysfunction: (1) a random forest clinical model using prospectively collected clinical data from videourodynamics studies, (2) a deep learning convolutional neural network of raw data from the volume-pressure recordings, (3) a deep learning imaging model of fluoroscopic images, and (4) an ensemble model averaging the risk probabilities of the volume-pressure and fluoroscopic models.
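As an illustration of the ensemble step, the sketch below averages per-class probabilities from two models and assigns the class with the highest mean probability. It is a minimal sketch under stated assumptions, not the study's implementation: the function name, the equal weighting of the two models, the argmax decision rule, and the example probabilities are all hypothetical.

```python
# Hypothetical sketch of probability averaging across two classifiers.
# Class order, equal weighting, and the argmax rule are assumptions.
CLASSES = ["none/mild", "moderate", "severe"]


def ensemble_average(p_volume_pressure, p_fluoro):
    """Average two lists of per-study class probabilities.

    Each argument is a list of rows, one row per study, with one
    probability per class. Returns (averaged_rows, predicted_labels).
    """
    if len(p_volume_pressure) != len(p_fluoro):
        raise ValueError("both models must score the same studies")
    averaged, labels = [], []
    for row_a, row_b in zip(p_volume_pressure, p_fluoro):
        if len(row_a) != len(CLASSES) or len(row_b) != len(CLASSES):
            raise ValueError("each row needs one probability per class")
        mean_row = [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        averaged.append(mean_row)
        # Pick the class with the highest averaged probability.
        best = max(range(len(CLASSES)), key=lambda i: mean_row[i])
        labels.append(CLASSES[best])
    return averaged, labels


# Made-up probabilities for two studies (illustration only).
p_vp = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
p_fl = [[0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
avg_probs, predicted = ensemble_average(p_vp, p_fl)
```

In this made-up example the two models disagree on the second study (moderate versus severe), and averaging resolves the disagreement in favor of the model with the more confident prediction.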
Results: Among 306 videourodynamics studies, the ensemble model classified bladder dysfunction with an accuracy of 70% (95% CI 66%, 76%) and a weighted kappa of 0.54 (moderate agreement) when at least 75% of expected bladder capacity was reached. The clinical model, built from data extracted by pediatric urologists, performed worst, with an accuracy of 61% (95% CI 55%, 66%) and a weighted kappa of 0.37.
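For readers unfamiliar with the two reported metrics, the sketch below computes accuracy and a weighted kappa for ordinal labels coded 0 (none/mild), 1 (moderate), and 2 (severe). The weighting scheme used in the study is not stated here, so linear weights are an assumption, and the function names and example labels are hypothetical.

```python
# Hypothetical sketch: accuracy and weighted Cohen's kappa for ordinal
# labels. Linear disagreement weights are an assumption.
def accuracy(y_true, y_pred):
    """Fraction of studies where the prediction matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_kappa(y_true, y_pred, n_classes, weights="linear"):
    """Weighted kappa: 1 - (weighted observed / weighted expected disagreement)."""
    n = len(y_true)
    # Observed joint proportions of (reference, predicted) labels.
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    power = 1 if weights == "linear" else 2  # any other value means quadratic
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            w = (abs(i - j) / (n_classes - 1)) ** power
            num += w * observed[i][j]      # observed disagreement
            den += w * row[i] * col[j]     # disagreement expected by chance
    if den == 0.0:
        raise ValueError("kappa is undefined when all labels fall in one class")
    return 1.0 - num / den


# Made-up labels (illustration only).
reference = [0, 1, 2, 0, 1, 2]
kappa_perfect = weighted_kappa(reference, reference, n_classes=3)
kappa_chance = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], n_classes=2)
acc_example = accuracy([0, 1, 2, 2], [0, 1, 1, 2])
```

Unlike accuracy, the weighted kappa corrects for chance agreement and penalizes a none/mild versus severe disagreement more heavily than a disagreement between adjacent categories, which is why it suits an ordinal severity scale.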
Conclusions: Our models built from urodynamic pressure-volume tracings and fluoroscopic images were able to automatically classify bladder dysfunction with moderately high accuracy.
Keywords: machine learning; spinal dysraphism; urinary bladder, neurogenic; urodynamics.