Introduction: Colorectal cancer (CRC) is a global public health problem. There are strong indications that nutrition could be an important component of primary prevention. Dietary pattern analysis is a powerful technique for understanding how the relationship between diet and cancer varies across populations.
Objective: We used an unsupervised machine learning approach to identify, through clustering, the Moroccan dietary patterns associated with CRC.
Methods: The study was based on the reported dietary intake of 1483 matched pairs of CRC cases and controls. Baseline dietary intake was measured using a validated food-frequency questionnaire adapted to the Moroccan context. Food items were consolidated into 30 food groups, which were reduced to six dimensions by principal component analysis (PCA).
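The dimensionality-reduction step can be sketched as follows. This is a minimal, standard-library-only illustration, not the authors' code: it assumes the 30 food-group intakes are standardized first (so the PCA runs on their correlation matrix), uses a cyclic Jacobi eigensolver in place of a numerical library, and runs on hypothetical random data. The function names `standardize`, `jacobi_eigen` and `pca` are illustrative.

```python
import math
import random


def standardize(rows):
    """Centre each column on its mean and scale it to unit (population) standard deviation."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def jacobi_eigen(a, tol=1e-12, max_sweeps=100):
    """Eigenvalues and eigenvectors of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # matrix is (numerically) diagonal
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-300:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    values = [a[i][i] for i in range(n)]
    vectors = [[v[k][i] for k in range(n)] for i in range(n)]  # i-th eigenvector = column i of V
    return values, vectors


def pca(rows, k):
    """Project standardized data onto its k leading principal components."""
    z = standardize(rows)
    n, p = len(z), len(z[0])
    # Covariance of standardized data = correlation matrix of the original food groups.
    corr = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    values, vectors = jacobi_eigen(corr)
    order = sorted(range(p), key=lambda i: values[i], reverse=True)[:k]
    loadings = [vectors[i] for i in order]
    scores = [[sum(x * w for x, w in zip(r, load)) for load in loadings] for r in z]
    explained = [values[i] / sum(values) for i in order]
    return scores, loadings, explained


# Hypothetical data: 100 respondents x 30 food groups (e.g. servings per week).
rng = random.Random(0)
rows = [[rng.uniform(0, 7) for _ in range(30)] for _ in range(100)]
scores, loadings, explained = pca(rows, k=6)
print([round(e, 3) for e in explained])
```

Each respondent ends up with six component scores, which is the subspace a clustering step would then operate in. In practice a library routine (for example scikit-learn's `PCA`) would replace the hand-written eigensolver.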
Results: The K-means method, applied in the PCA subspace, identified two patterns: a 'prudent pattern' (moderate consumption of almost all foods, with a slightly higher intake of fruits and vegetables) and a 'dangerous pattern' (vegetable oil, cake, chocolate, cheese, red meat, sugar and butter), with small variation between components and clusters. Student's t-test showed a significant relationship between the clusters and the consumption of every food group except poultry. Simple logistic regression showed that people belonging to the 'dangerous pattern' have a higher risk of developing CRC, with an OR of 1.59, 95% CI (1.37 to 1.38).
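The clustering and effect-size steps can be sketched in the same hedged way. The K-means below is plain Lloyd's algorithm with a farthest-point seeding heuristic (an assumption; the abstract does not say how centroids were initialised), demonstrated on hypothetical two-dimensional points standing in for PCA scores; it works unchanged on six-dimensional scores. The odds ratio relies on the fact that a logistic regression with an intercept and a single binary predictor (here, membership of the cluster of interest) gives OR = ad/bc, with Wald standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log scale. The 2×2 counts and the helper names `kmeans` and `odds_ratio_wald` are invented for illustration, and, like an unconditional model, the sketch ignores the case-control matching.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's K-means; returns (labels, centroids).

    Seeding (an assumption of this sketch): one random point, then repeatedly
    the point farthest from all centroids chosen so far.
    """
    rng = random.Random(seed)
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        far = max(points, key=lambda pt: min(sq_dist(pt, c) for c in centroids))
        centroids.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(pt, centroids[j])) for pt in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def odds_ratio_wald(a, b, c, d, z=1.959963984540054):
    """OR and Wald 95% CI from a 2x2 table.

    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls.
    Equals exp(beta) and exp(beta +/- z*SE) from a logistic regression with one binary predictor.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical 2-D points standing in for PCA scores: two well-separated groups of 100.
rng = random.Random(42)
points = [(rng.gauss(-3, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
points += [(rng.gauss(3, 0.5), rng.gauss(0, 0.5)) for _ in range(100)]
labels, centroids = kmeans(points, k=2)

# Hypothetical counts, where "exposed" means belonging to the cluster of interest.
or_, lo, hi = odds_ratio_wald(a=60, b=40, c=45, d=55)
print(f"OR {or_:.2f}, 95% CI ({lo:.2f} to {hi:.2f})")
```

Because the interval is exp(log OR ± 1.96·SE), it is symmetric on the log scale: the point estimate always lies strictly inside it, and the two limits multiply to OR².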
Conclusion: The proposed algorithm, applied to the CCR Nutrition database, identified two dietary profiles associated with CRC: the 'dangerous pattern' and the 'prudent pattern'. These results could inform recommendations for a CRC-preventive diet in the Moroccan population.
Keywords: BMJ Health Informatics; Data Mining; Unsupervised Machine Learning.
© Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ.