文档详情

Introduction to data mining-lecture notes chap09_advanced_cluster_analysis.pdf

发布:2017-04-10约1.28万字共37页下载文档
文本预览下载声明
Data Mining Cluster Analysis: Advanced Concepts and Algorithms Lecture Notes for Chapter 9 Introduction to Data Mining by Tan, Steinbach, Kumar ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2 Hierarchical Clustering: Revisited Creates nested clusters Agglomerative clustering algorithms vary in terms of how the proximity of two clusters are computed MIN (single link): susceptible to noise/outliers MAX/GROUP AVERAGE: may not work well with non-globular clusters – CURE algorithm tries to handle both problems Often starts with a proximity matrix – A type of graph-based algorithm ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3 Uses a number of points to represent a cluster Representative points are found by selecting a constant number of points from a cluster and then “shrinking” them toward the center of the cluster Cluster similarity is the similarity of the closest pair of representative points from different clusters CURE: Another Hierarchical Approach × × ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4 CURE Shrinking representative points toward the center helps avoid problems with noise and outliers CURE is better able to handle clusters of arbitrary shapes and sizes ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5 Experimental Results: CURE Picture from CURE, Guha, Rastogi, Shim. ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6 Experimental Results: CURE Picture from CURE, Guha, Rastogi, Shim. (centroid) (single link) ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7 CURE Cannot Handle Differing Densities Original Points CURE ? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8 Graph-Based Clustering Graph-Based clustering uses the pro
显示全部
相似文档