Efficient Unsupervised Parameter Estimation for One-Class Support Vector Machines

IEEE Trans Neural Netw Learn Syst. 2018 Oct;29(10):5057-5070. doi: 10.1109/TNNLS.2017.2785792. Epub 2018 Jan 23.

Abstract

One-class support vector machines (OCSVMs) are very effective for semisupervised anomaly detection. However, their performance strongly depends on the settings of their hyperparameters, which has not been well studied. Moreover, unavailability of a clean training set that only comprises normal data in many real-life problems has given rise to the application of OCSVMs in an unsupervised manner. However, it has been shown that if the training set includes anomalies, the normal boundary created by OCSVMs is prone to skew toward the anomalies. This problem decreases the detection rate of anomalies and results in poor performance of the classifier. In this paper, we propose a new technique to set the hyperparameters and clean suspected anomalies from unlabelled training sets. The proposed method removes suspected anomalies based on a $K$-nearest neighbors technique, which is then used to directly estimate the hyperparameters. We examine several benchmark data sets with diverse distributions and dimensionality. Our findings suggest that on the examined data sets, the proposed technique is roughly 70 times faster than supervised parameter estimation via grid-search and cross validation, and one to three orders of magnitude faster than broadly used semisupervised and unsupervised parameter estimation methods for OCSVMs. Moreover, our method statistically outperforms those semisupervised and unsupervised methods and its accuracy is comparable to supervised grid-search and cross validation.
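
The two-stage idea described above (flag suspected anomalies with a $K$-nearest neighbors score, then estimate the OCSVM hyperparameters directly) can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract gives neither the scoring rule nor the estimators. The median-plus-MAD cutoff, the use of the removed fraction as $\nu$, the median-distance heuristic for the RBF width $\gamma$, and the names `knn_scores` and `clean_and_estimate` are all assumptions made for illustration.

```python
import itertools
import math
import statistics


def knn_scores(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def clean_and_estimate(points, k=3, cutoff=3.0):
    """Drop suspected anomalies, then estimate (nu, gamma) without labels.

    Assumed heuristics, not taken from the paper:
      * a point is suspect if its k-NN distance exceeds median + cutoff * MAD
      * nu    = fraction of points flagged as suspect (at least 1/n)
      * gamma = 1 / (2 * sigma^2), sigma = median pairwise distance
                among the retained points
    """
    scores = knn_scores(points, k)
    med = statistics.median(scores)
    mad = statistics.median([abs(s - med) for s in scores])
    threshold = med + cutoff * mad

    kept = [p for p, s in zip(points, scores) if s <= threshold]
    removed = len(points) - len(kept)
    nu = max(removed, 1) / len(points)

    pairwise = [math.dist(p, q) for p, q in itertools.combinations(kept, 2)]
    if not pairwise:
        raise ValueError("need at least two retained points to estimate gamma")
    sigma = statistics.median(pairwise)
    if sigma == 0:
        raise ValueError("retained points are all identical")
    gamma = 1.0 / (2.0 * sigma ** 2)
    return kept, nu, gamma
```

The returned `nu` and `gamma` could then be passed to any RBF-kernel OCSVM implementation trained on `kept`. The brute-force neighbor search is quadratic in the number of points and is written for readability rather than speed.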