Background: Preterm birth is a major worldwide public health concern and the leading cause of infant mortality. Understanding of its risk factors remains limited, and early identification of women at high risk of preterm birth is an open challenge. Objective: The aim of the study was to develop and validate a novel pre-pregnancy score for preterm delivery in nulliparous women using information from Italian healthcare utilization databases. Study Design: Using LASSO logistic regression, twenty-six variables that independently predicted preterm delivery were selected from a large number of features describing the clinical history and socio-demographic characteristics of 126,839 nulliparous women from the Lombardy region who gave birth between 2012 and 2017; the features were collected in the 4 years prior to conception. Each selected variable was assigned a weight proportional to its coefficient estimated by the model, and the weighted variables contributed to the Preterm Birth Score. Discrimination and calibration of the Preterm Birth Score were assessed using an internal validation set (a further 54,359 deliveries from Lombardy) and two external validation sets (14,703 and 62,131 deliveries from Marche and Sicily, respectively). Results: The occurrence of preterm delivery increased with increasing Preterm Birth Score in all regions studied. Almost ideal calibration plots were obtained for the internal validation set and Marche, while expected and observed probabilities differed slightly in Sicily for high Preterm Birth Score values. The area under the receiver operating characteristic curve was 60%, 61% and 56% for the internal validation set, Marche and Sicily, respectively. Conclusions: Despite its limited discriminatory power, the Preterm Birth Score is able to stratify women according to their risk of preterm birth, allowing the early identification of mothers who are more likely to have a preterm delivery.
Keywords: healthcare utilization database; nulliparous; preterm birth; real-world evidence; score.
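The scoring step described in the Study Design can be illustrated with a minimal sketch. It assumes that coefficients from a LASSO logistic regression are already available, and it uses one possible weighting rule (each non-zero coefficient divided by the smallest absolute non-zero coefficient, then rounded); the abstract states only that weights were proportional to the coefficients, so this rule is an assumption. The variable names, coefficients and records below are hypothetical and are not taken from the study. Discrimination is computed as the area under the ROC curve via pairwise comparison of cases and non-cases.

```python
from typing import Dict, Sequence


def coefficients_to_weights(coefs: Dict[str, float]) -> Dict[str, int]:
    """Turn non-zero model coefficients into integer weights.

    Assumption (not from the source): each weight is the coefficient
    divided by the smallest absolute non-zero coefficient, rounded.
    Variables shrunk to zero by the LASSO penalty are dropped.
    """
    selected = {name: c for name, c in coefs.items() if c != 0.0}
    if not selected:
        return {}
    unit = min(abs(c) for c in selected.values())
    return {name: round(c / unit) for name, c in selected.items()}


def score(record: Dict[str, int], weights: Dict[str, int]) -> int:
    """Sum the weights of the binary risk factors present in one record."""
    return sum(w for name, w in weights.items() if record.get(name, 0))


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """Area under the ROC curve.

    Computed as the probability that a randomly chosen case has a higher
    score than a randomly chosen non-case, counting ties as one half.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical coefficients; "asthma" was shrunk to zero and is dropped.
example_coefs = {
    "diabetes": 0.30,
    "hypertension": 0.45,
    "low_education": 0.15,
    "asthma": 0.0,
}
weights = coefficients_to_weights(example_coefs)

# Hypothetical women (1 = factor present) and outcomes (1 = preterm birth).
records = [
    {"diabetes": 1, "hypertension": 1},
    {"low_education": 1},
    {},
    {"hypertension": 1},
    {"diabetes": 1, "low_education": 1},
]
outcomes = [1, 0, 0, 1, 0]

scores = [score(r, weights) for r in records]
example_auc = auc(scores, outcomes)
```

The pairwise AUC is quadratic in the number of records, which is fine for illustration; a cohort of the size reported here would call for a rank-based implementation instead.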