Developing effective strategies to predict landslide-prone areas and reduce risk is vital. This requires methods that deliver precise predictions while addressing challenges such as limited data. Recent studies have highlighted the potential of ensemble methods to improve landslide susceptibility maps (LSM). The ensemble approach considered here samples landslide and non-landslide points from areas of high and low susceptibility, respectively. The application of ensemble methods in machine learning, particularly to classification problems, has been studied extensively. This study examines ensemble strategies as a promising method for future landslide applications. The proposed method was tested in the Kangra district of Himachal Pradesh, for which three datasets of presence (landslide) and absence (non-landslide) points were prepared. Dataset 1 consisted of the initial landslide points and randomly generated non-landslide points. In dataset 2, the initial landslide data were supplemented with additional landslide points drawn from the very high susceptibility zone of the initial LSM, while the non-landslide points were generated randomly from the study area. Finally, dataset 3 used the same landslide points as dataset 2, with non-landslide points drawn from the very low susceptibility zone of the initial LSM. Each dataset was used with random forest (RF) and support vector machine (SVM) models, yielding six landslide susceptibility maps. To assess the applicability of the proposed method, we used AUC-ROC, precision, recall, F-score, accuracy and the Matthews correlation coefficient (MCC). The AUC for dataset 1 was 0.89 with both SVM and RF; it increased to 0.898 and 0.952 for datasets 2 and 3 with SVM, and to 0.937 and 0.954 with RF. Precision and recall were highest for dataset 3 with both SVM and RF.
Hence, based on several accuracy metrics, we conclude that when landslide and non-landslide samples were drawn from very high and very low susceptibility areas, respectively, the resulting LSM outperformed those from all the other sampling strategies. Drawing only the additional landslide points from very high susceptibility areas (dataset 2) did not perform as well and led to misclassification errors. The study demonstrated that when the landslide and non-landslide data were obtained from very high and very low susceptibility areas, the predictive capability of the LSM increased significantly. Thus, the results demonstrate the effectiveness of the proposed ensemble approach in providing precise delineation of landslide zones, facilitating informed decision-making for land and hazard management.
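The construction of the three datasets can be illustrated with a minimal sketch. It is not the authors' implementation: the susceptibility grid is synthetic, the function names (`sample_cells`, `build_datasets`), the class thresholds (0.8 for very high, 0.2 for very low), the number of added points, and the choice of equal numbers of presence and absence points are all assumptions made for illustration.

```python
import numpy as np


def sample_cells(mask, n, rng):
    """Pick n distinct (row, col) cells where mask is True."""
    candidates = np.argwhere(mask)
    idx = rng.choice(len(candidates), size=n, replace=False)
    return candidates[idx]


def build_datasets(susceptibility, landslide_cells, n_extra, rng,
                   very_high=0.8, very_low=0.2):
    """Return presence/absence cells for the three sampling strategies.

    The thresholds and the balanced presence/absence counts are
    hypothetical; the abstract does not state them.
    """
    occupied = np.zeros(susceptibility.shape, dtype=bool)
    occupied[landslide_cells[:, 0], landslide_cells[:, 1]] = True

    # Dataset 1: initial landslides, random non-landslide cells.
    absence_1 = sample_cells(~occupied, len(landslide_cells), rng)

    # Dataset 2: add landslide cells from the very high susceptibility
    # zone of the initial map; non-landslide cells stay random.
    extra = sample_cells((susceptibility >= very_high) & ~occupied,
                         n_extra, rng)
    presence_2 = np.vstack([landslide_cells, extra])
    occupied_2 = occupied.copy()
    occupied_2[extra[:, 0], extra[:, 1]] = True
    absence_2 = sample_cells(~occupied_2, len(presence_2), rng)

    # Dataset 3: same landslide cells as dataset 2; non-landslide cells
    # come only from the very low susceptibility zone.
    absence_3 = sample_cells((susceptibility <= very_low) & ~occupied_2,
                             len(presence_2), rng)

    return {
        "dataset_1": (landslide_cells, absence_1),
        "dataset_2": (presence_2, absence_2),
        "dataset_3": (presence_2, absence_3),
    }


# Synthetic stand-in for an initial LSM (values in [0, 1)).
rng = np.random.default_rng(0)
susceptibility = rng.random((50, 50))
landslide_cells = sample_cells(susceptibility > 0.6, 30, rng)
datasets = build_datasets(susceptibility, landslide_cells,
                          n_extra=20, rng=rng)

for name, (presence, absence) in datasets.items():
    print(name, len(presence), "landslide,", len(absence), "non-landslide")
```

In a real workflow the presence and absence cells of each dataset would be paired with the conditioning factors at those locations and passed to the RF and SVM classifiers; that step is omitted here.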
Keywords: Landslide susceptibility; Non-landslide; Random forest; Support vector machine.
© 2024. The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature.