Do machine learning methods lead to similar individualized treatment rules? A comparison study on real data

Florie Bouvier; Etienne Peyrot; Alan Balendran; Corentin Ségalas; Ian Roberts; François Petit; Raphaël Porcher

doi:10.1002/sim.10059

Stat Med. 2024 May 20;43(11):2043-2061. doi: 10.1002/sim.10059. Epub 2024 Mar 12.

Authors

Florie Bouvier¹, Etienne Peyrot¹, Alan Balendran¹, Corentin Ségalas², Ian Roberts³, François Petit¹, Raphaël Porcher¹,⁴

Affiliations

¹ Inserm, INRAE, Center for Research in Epidemiology and StatisticS (CRESS), Université Paris Cité and Université Sorbonne Paris Nord, Paris, France.
² Bordeaux Population Health Research Center, Université de Bordeaux, Inserm, Bordeaux, France.
³ Clinical Trials Unit, London School of Hygiene & Tropical Medicine, London, UK.
⁴ Centre d'Épidémiologie Clinique, Assistance Publique-Hôpitaux de Paris, Hôtel-Dieu, Paris, France.

PMID: 38472745
DOI: 10.1002/sim.10059

Abstract

Identifying patients who benefit from a treatment is a key aspect of personalized medicine, which allows the development of individualized treatment rules (ITRs). Many machine learning methods have been proposed to create such rules. However, to what extent these methods lead to similar ITRs, that is, recommend the same treatment for the same individuals, is unclear. In this work, we compared 22 of the most common approaches in two randomized controlled trials. Two classes of methods can be distinguished. The first class relies on predicting individualized treatment effects, from which an ITR is derived by recommending the evaluated treatment to individuals with a predicted benefit. In the second class, methods directly estimate the ITR without estimating individualized treatment effects. For each trial, the performance of ITRs was assessed by various metrics, and the pairwise agreement between all ITRs was also calculated. Results showed that the ITRs obtained via the different methods generally had considerable disagreements regarding the patients to be treated. Concordance was better among similar methods. Overall, when evaluating the performance of ITRs in a validation sample, all methods produced ITRs with limited performance, suggesting a high potential for optimism. For non-parametric methods, this optimism was likely due to overfitting. The different methods do not lead to similar ITRs and are therefore not interchangeable. The choice of method strongly influences which patients a given treatment is recommended for, raising concerns about their practical use.
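
As an illustration of the first class of methods and of the pairwise agreement mentioned above, the following minimal sketch derives an ITR from predicted individualized treatment effects and compares two such rules. It is not the authors' code. It assumes that a positive predicted effect means benefit, uses a threshold of zero, and measures agreement with Cohen's kappa, which may differ from the metrics used in the study. All function names and the toy values are hypothetical.

```python
from itertools import combinations


def itr_from_effects(predicted_effects, threshold=0.0):
    """Recommend treatment (1) when the predicted benefit exceeds the threshold, else 0.

    Assumption: larger predicted effects mean more benefit from treatment.
    """
    return [1 if effect > threshold else 0 for effect in predicted_effects]


def observed_agreement(rule_a, rule_b):
    """Proportion of individuals for whom both rules give the same recommendation."""
    if len(rule_a) != len(rule_b) or not rule_a:
        raise ValueError("rules must be non-empty and of equal length")
    return sum(a == b for a, b in zip(rule_a, rule_b)) / len(rule_a)


def cohen_kappa(rule_a, rule_b):
    """Chance-corrected agreement between two binary treatment rules."""
    p_obs = observed_agreement(rule_a, rule_b)
    n = len(rule_a)
    p_a = sum(rule_a) / n  # share of individuals treated under rule A
    p_b = sum(rule_b) / n  # share of individuals treated under rule B
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if p_exp == 1.0:
        return float("nan")  # both rules are constant: kappa is undefined
    return (p_obs - p_exp) / (1 - p_exp)


def pairwise_kappa(rules):
    """Cohen's kappa for every pair of named rules (dict of name -> recommendations)."""
    return {
        (a, b): cohen_kappa(rules[a], rules[b])
        for a, b in combinations(sorted(rules), 2)
    }


if __name__ == "__main__":
    # Toy predicted effects for six individuals from two hypothetical methods.
    effects = {
        "method_a": [0.2, -0.1, 0.3, -0.4, 0.1, -0.2],
        "method_b": [0.1, 0.2, 0.4, -0.3, -0.1, -0.5],
    }
    rules = {name: itr_from_effects(values) for name, values in effects.items()}
    for pair, kappa in pairwise_kappa(rules).items():
        print(pair, round(kappa, 3))
```

In this toy example both rules treat half of the individuals but agree on only four of six, so the chance-corrected agreement is well below the raw agreement. This is the kind of disagreement the abstract reports between methods.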

Keywords: comparison study; individualized treatment rule; machine learning; personalized medicine.

Publication types

Comparative Study

MeSH terms

Humans
Machine Learning*
Precision Medicine* / methods
Randomized Controlled Trials as Topic* / methods

Grants and funding

ANR-18-CE36-0010-01/Agence Nationale de la Recherche