Identifying patients who benefit from a treatment is a key aspect of personalized medicine and enables the development of individualized treatment rules (ITRs). Many machine learning methods have been proposed to create such rules. However, the extent to which these methods lead to similar ITRs, that is, recommend the same treatment for the same individuals, is unclear. In this work, we compared 22 of the most common approaches in two randomized controlled trials. Two classes of methods can be distinguished. Methods in the first class predict individualized treatment effects, from which an ITR is derived by recommending the evaluated treatment to individuals with a predicted benefit. Methods in the second class estimate the ITR directly, without estimating individualized treatment effects. For each trial, the performance of the ITRs was assessed using several metrics, and the pairwise agreement between all ITRs was calculated. The ITRs obtained with the different methods generally disagreed considerably about which patients should be treated. Concordance was better among closely related methods. Overall, when performance was evaluated in a validation sample, all methods produced ITRs with limited performance, suggesting a high potential for optimism. For non-parametric methods, this optimism was likely due to overfitting. The different methods do not lead to similar ITRs and are therefore not interchangeable. The choice of method strongly influences which patients are recommended a given treatment, raising concerns about the practical use of these methods.
Keywords: comparison study; individualized treatment rule; machine learning; personalized medicine.
© 2024 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
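The two ideas at the core of the abstract, deriving an ITR from predicted individualized treatment effects and measuring pairwise agreement between ITRs, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the zero threshold for "predicted benefit", the toy numbers, and the choice of agreement measures (simple proportion of concordant recommendations and Cohen's kappa). The abstract does not state which agreement metric the authors used.

```python
from itertools import combinations


def itr_from_effects(predicted_benefit, threshold=0.0):
    """First class of methods (sketch): recommend treatment (1) when the
    predicted benefit exceeds a threshold, otherwise recommend control (0).
    The threshold of 0 is an illustrative assumption."""
    return [1 if b > threshold else 0 for b in predicted_benefit]


def agreement(rule_a, rule_b):
    """Proportion of individuals who receive the same recommendation."""
    if len(rule_a) != len(rule_b):
        raise ValueError("rules must cover the same individuals")
    return sum(a == b for a, b in zip(rule_a, rule_b)) / len(rule_a)


def cohen_kappa(rule_a, rule_b):
    """Chance-corrected agreement between two binary rules."""
    n = len(rule_a)
    p_obs = agreement(rule_a, rule_b)
    p_a1 = sum(rule_a) / n  # share treated under rule A
    p_b1 = sum(rule_b) / n  # share treated under rule B
    p_exp = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
    if p_exp == 1.0:
        return float("nan")  # kappa is undefined when both rules are constant
    return (p_obs - p_exp) / (1 - p_exp)


def pairwise_agreement(rules):
    """Agreement for every pair of named ITRs."""
    return {
        (m1, m2): agreement(rules[m1], rules[m2])
        for m1, m2 in combinations(rules, 2)
    }


# Toy data (hypothetical): one rule derived from predicted effects,
# one rule as it might come from a method that estimates the ITR directly.
rules = {
    "effect_model": itr_from_effects([0.2, -0.1, 0.05, -0.3]),
    "direct_rule": [1, 1, 0, 0],
}
print(pairwise_agreement(rules))
print(cohen_kappa(rules["effect_model"], rules["direct_rule"]))
```

In this toy case the two rules agree on half of the individuals, which is no better than chance given that each rule treats half of them. This is the kind of disagreement a chance-corrected measure makes visible.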