Abstract:
To mitigate the influence of non-landslide sample selection on landslide susceptibility assessment, this study introduces a farthest point sampling (FPS) strategy that optimizes the spatial distribution of the non-landslide samples used in model training. Four machine learning methods (logistic regression, support vector machine, naïve Bayes classification, and Gaussian process classification) were employed to evaluate the effectiveness of the proposed approach. Landslide susceptibility models were constructed from landslide samples together with non-landslide samples selected by the FPS strategy, and the results were compared with those obtained using conventional random sampling over 100 randomized experimental iterations. The results showed that, with conventional random sampling, the
AUC values for the logistic regression model across the 100 iterations ranged from 0.52 to 0.98, with a mean of approximately 0.80. In contrast, the proposed FPS method yielded AUC values ranging from 0.72 to 1.00, narrowing the range of variation by 39.1% (from 0.46 to 0.28) and raising the mean AUC by 12.5% to approximately 0.90. Higher predictive performance and lower run-to-run variability were also consistently observed for the other three models. These findings demonstrate that the farthest point sampling method effectively enhances both the accuracy and the reliability of machine learning-based landslide susceptibility assessments.
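
For readers unfamiliar with farthest point sampling, the following is a minimal Python sketch of the generic greedy algorithm, not the implementation used in this study. The function name `farthest_point_sampling`, the `seeds` parameter (here assumed to hold landslide locations, so that selected non-landslide points lie far from known landslides as well as from each other), and the toy grid coordinates are all illustrative assumptions.

```python
import math
import random


def farthest_point_sampling(candidates, k, seeds=(), rng=None):
    """Greedily pick k candidate indices, each maximizing the distance to the
    nearest point already chosen (or to the nearest seed point).

    Hypothetical helper for illustration; `seeds` is an assumed option.
    """
    if not 0 <= k <= len(candidates):
        raise ValueError("k must be between 0 and len(candidates)")
    rng = rng or random.Random()
    # min_dist[i]: distance from candidate i to its nearest seed/selected point.
    min_dist = [
        min((math.dist(c, s) for s in seeds), default=math.inf)
        for c in candidates
    ]
    selected = []
    for _ in range(k):
        if not selected and not seeds:
            # No reference point yet: start from an arbitrary candidate.
            idx = rng.randrange(len(candidates))
        else:
            idx = max(range(len(candidates)), key=min_dist.__getitem__)
        selected.append(idx)
        # Refresh nearest distances with the newly chosen point.
        for i, c in enumerate(candidates):
            min_dist[i] = min(min_dist[i], math.dist(c, candidates[idx]))
        min_dist[idx] = -1.0  # never pick the same candidate twice
    return selected


# Toy example: one assumed landslide at the origin, candidates on a 5 x 5 grid.
landslides = [(0.0, 0.0)]
grid = [(x, y) for x in range(5) for y in range(5)]
picked = farthest_point_sampling(grid, 3, seeds=landslides)
picked_points = [grid[i] for i in picked]
```

Because each new point is chosen to maximize its distance to everything already selected, the resulting sample spreads evenly over the candidate area, whereas random sampling can cluster points by chance. This brute-force version costs O(k·n) distance evaluations for n candidates, which is adequate for a sketch but may need a spatial index for large rasters.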