Risk prediction models for discrete ordinal outcomes: Calibration and the impact of the proportional odds assumption

Michael Edlinger; Maarten van Smeden; Hannes F Alber; Maria Wanitschek; Ben Van Calster

doi:10.1002/sim.9281

Stat Med. 2022 Apr 15;41(8):1334-1360. doi: 10.1002/sim.9281. Epub 2021 Dec 12.

Authors

Michael Edlinger¹ ², Maarten van Smeden³ ⁴, Hannes F Alber⁵ ⁶, Maria Wanitschek⁷, Ben Van Calster¹ ⁸ ⁹

Affiliations

¹ Department of Development and Regeneration, KU Leuven, Leuven, Belgium.
² Department of Medical Statistics, Informatics, and Health Economics, Medical University Innsbruck, Innsbruck, Austria.
³ Julius Centre for Health Science and Primary Care, University Medical Centre Utrecht, Utrecht, The Netherlands.
⁴ Department of Clinical Epidemiology, Leiden University Medical Centre, Leiden, The Netherlands.
⁵ Department of Internal Medicine and Cardiology, Klinikum Klagenfurt am Wörthersee, Klagenfurt, Austria.
⁶ Karl Landsteiner Institute for Interdisciplinary Science, Rehabilitation Centre, Münster, Austria.
⁷ Department of Internal Medicine III-Cardiology and Angiology, Tirol Kliniken, Innsbruck, Austria.
⁸ EPI-Centre, KU Leuven, Leuven, Belgium.
⁹ Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands.

Abstract

Calibration is a vital aspect of the performance of risk prediction models, but research in the context of ordinal outcomes is scarce. This study compared calibration measures for risk models predicting a discrete ordinal outcome, and investigated the impact of the proportional odds assumption on calibration and overfitting. We studied the multinomial, cumulative, adjacent category, continuation ratio, and stereotype logit/logistic models. To assess calibration, we investigated calibration intercepts and slopes, calibration plots, and the estimated calibration index. Using large sample simulations, we studied the performance of models for risk estimation under various conditions, assuming that the true model has either a multinomial logistic form or a cumulative logit proportional odds form. Small sample simulations were used to compare the tendency for overfitting between models. As a case study, we developed models to diagnose the degree of coronary artery disease (five categories) in symptomatic patients. When the true model was multinomial logistic, proportional odds models often yielded poor risk estimates, with calibration slopes deviating considerably from unity even on large model development datasets. The stereotype logistic model improved the calibration slope, but still provided biased risk estimates for individual patients. When the true model had a cumulative logit proportional odds form, multinomial logistic regression provided biased risk estimates, although these biases were modest. Nonproportional odds models require more parameters to be estimated from the data, and hence suffered more from overfitting. Despite larger sample size requirements, we generally recommend multinomial logistic regression for risk prediction modeling of discrete ordinal outcomes.
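To make the contrast between the two data-generating forms named in the abstract concrete, the following is a minimal sketch of how category probabilities arise under a cumulative logit proportional odds model and under a multinomial logistic model. It is not the authors' code. The function names, the parameterization logit P(Y <= k) = threshold_k - eta, the choice of the first category as the multinomial reference, and all numeric values are assumptions made for illustration only.

```python
import numpy as np


def cumulative_logit_probs(eta, thresholds):
    """Category probabilities under a cumulative logit proportional odds model.

    Assumed parameterization: logit P(Y <= k) = thresholds[k] - eta,
    with strictly increasing thresholds. One linear predictor (eta) is
    shared by all K - 1 cumulative logits: the proportional odds assumption.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    cum = 1.0 / (1.0 + np.exp(-(thresholds - eta)))  # P(Y <= k), k = 1..K-1
    cum = np.concatenate(([0.0], cum, [1.0]))        # add P(Y <= 0) and P(Y <= K)
    return np.diff(cum)                              # P(Y = k), k = 1..K


def multinomial_logit_probs(etas):
    """Category probabilities under a multinomial logistic model.

    Assumed parameterization: the first category is the reference with
    linear predictor 0; each other category has its own linear predictor,
    so K - 1 sets of coefficients are needed instead of one.
    """
    z = np.concatenate(([0.0], np.asarray(etas, dtype=float)))
    z -= z.max()                                     # numerical stability
    p = np.exp(z)
    return p / p.sum()


# Hypothetical five-category example (values are illustrative, not from the study).
po = cumulative_logit_probs(eta=0.5, thresholds=[-2.0, -1.0, 1.0, 2.0])
mn = multinomial_logit_probs([0.3, -0.2, 0.8, 0.1])
print(po, po.sum())
print(mn, mn.sum())
```

The sketch shows why the nonproportional model needs more parameters: the cumulative model uses a single linear predictor plus K - 1 thresholds, whereas the multinomial model uses a separate linear predictor for each non-reference category.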

Keywords: calibration; discrete ordinal outcome; predictive performance; proportional odds; risk prediction; simulation.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Calibration*
Humans
Logistic Models
Probability
Sample Size