An Adaptative Eps Parameter of DBSCAN Algorithm for Identifying Clusters with Heterogeneous Density
Abstract
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is one of the most important data clustering algorithms. Its importance lies in its ability to recognize clusters of arbitrary shape and its robustness to noise in the data. To identify clusters, DBSCAN requires two parameters. The first, Eps, is the radius of the circle that defines the neighborhood of each observation. The second, minpts, is the minimum neighborhood size required for a point to be a seed of a cluster rather than noise. However, choosing an adequate value of Eps is not easy and is a major issue when applying DBSCAN, since the accuracy of the algorithm depends heavily on the values of its parameters. In this paper, we present a new version of DBSCAN in which only the minpts parameter needs to be specified; the k-nearest neighbors (kNN) algorithm is then used to calculate the value of Eps automatically for every point in the data. This technique not only reduces the number of parameters by eliminating Eps, which is very difficult to determine, but also gives DBSCAN the ability to detect clusters with heterogeneous density. The experimental results show that the proposed method is more efficient and more accurate than the original DBSCAN algorithm.
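To make the idea of a per-point Eps concrete, the following is a minimal sketch in Python. The abstract does not state exactly how the kNN distances are turned into Eps, so this sketch assumes that the Eps of a point is the distance to its minpts-th nearest neighbor. The function name `knn_eps` and the toy data are hypothetical and chosen only for illustration; this is not the implementation evaluated in the paper.

```python
import math


def knn_eps(points, minpts):
    """Return one Eps value per point.

    Assumption (not taken from the paper): Eps for a point is the
    Euclidean distance to its minpts-th nearest neighbor.
    """
    if not 1 <= minpts < len(points):
        raise ValueError("minpts must be between 1 and len(points) - 1")
    eps = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        eps.append(dists[minpts - 1])
    return eps


# Hypothetical toy data: a dense group near the origin and a sparse group far away.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
eps_values = knn_eps(dense + sparse, minpts=2)

for point, eps in zip(dense + sparse, eps_values):
    print(point, round(eps, 3))
```

On this toy data, the points in the dense group receive a much smaller Eps than the points in the sparse group, which is the property that would let a single run handle clusters of different density. The brute-force distance computation is quadratic in the number of points and is used here only to keep the sketch short.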