Heterogeneity of major depressive disorder (MDD) illness course complicates clinical decision-making. Although efforts to use symptom profiles or biomarkers to develop clinically useful prognostic subtypes have had limited success, a recent report showed that machine-learning (ML) models developed from self-reports about incident episode characteristics and comorbidities among respondents with lifetime MDD in the World Health Organization World Mental Health (WMH) Surveys predicted MDD persistence, chronicity and severity with good accuracy. We report results of model validation in an independent prospective national household sample of 1056 respondents with lifetime MDD at baseline. The WMH ML models were applied to these baseline data to generate predicted outcome scores that were compared with observed scores assessed 10-12 years after baseline. ML model prediction accuracy was also compared with that of conventional logistic regression models. Area under the receiver operating characteristic curve based on ML (0.63 for high chronicity and 0.71-0.76 for the other prospective outcomes) was consistently higher than for the logistic models (0.62-0.70) despite the latter models including more predictors. A total of 34.6-38.1% of respondents with subsequent high persistence chronicity and 40.8-55.8% with the severity indicators were in the top 20% of the baseline ML-predicted risk distribution, while only 0.9% of respondents with subsequent hospitalizations and 1.5% with suicide attempts were in the lowest 20% of the ML-predicted risk distribution. These results confirm that clinically useful MDD risk-stratification models can be generated from baseline patient self-reports and that ML methods improve on conventional methods in developing such models.