A Density-Based Method for Determining the Optimal Number of Clusters (基于密度的最佳聚类数确定方法)
[Abstract] Determining the correct number of clusters in a data set is a fundamental problem in cluster analysis. Commonly used methods usually depend on a particular clustering algorithm and are not effective when the data set contains clusters of clusters (groups of subclusters). This paper proposes a new index for the optimal number of clusters. The index focuses on the geometric structure of the clusters and measures intra-class compactness and inter-class separation from the point of view of the distribution density of the data objects. It is insensitive to noise and can identify groups of subclusters in a data set. Experimental results on real and synthetic data show that the new index outperforms other indices in use.
[Keywords] cluster evaluation; number of clusters; cluster validity index
0 Introduction
Clustering is an important analysis method in data mining research. Its purpose is to group the objects of a data set into classes such that objects in the same class are similar and objects in different classes are dissimilar. Researchers have so far proposed numerous clustering algorithms, which are widely used in fields such as business intelligence, graphic analysis and bioinformatics. Because clustering is an unsupervised learning method, the results it produces need to be evaluated. Many clustering algorithms require the number of clusters of a given data set as an input, yet in practice this number is usually unknown. Determining the number of clusters in a data set therefore remains one of the fundamental problems in cluster analysis research.
Cluster evaluation assesses the quality of clustering results and is considered one of the important factors in the success of a cluster analysis. Its place in the cluster analysis process is shown in Figure 1. Clus