Introduction to data mining-lecture notes chap09_advanced_cluster_analysis.pdf
文本预览下载声明
Data Mining
Cluster Analysis: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 9
Introduction to Data Mining
by
Tan, Steinbach, Kumar
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 2
Hierarchical Clustering: Revisited
Creates nested clusters
Agglomerative clustering algorithms vary in terms
of how the proximity of two clusters are computed
MIN (single link): susceptible to noise/outliers
MAX/GROUP AVERAGE:
may not work well with non-globular clusters
– CURE algorithm tries to handle both problems
Often starts with a proximity matrix
– A type of graph-based algorithm
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 3
Uses a number of points to represent a cluster
Representative points are found by selecting a constant
number of points from a cluster and then “shrinking” them
toward the center of the cluster
Cluster similarity is the similarity of the closest pair of
representative points from different clusters
CURE: Another Hierarchical Approach
× ×
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 4
CURE
Shrinking representative points toward the center
helps avoid problems with noise and outliers
CURE is better able to handle clusters of arbitrary
shapes and sizes
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 5
Experimental Results: CURE
Picture from CURE, Guha, Rastogi, Shim.
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 6
Experimental Results: CURE
Picture from CURE, Guha, Rastogi, Shim.
(centroid)
(single link)
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 7
CURE Cannot Handle Differing Densities
Original Points CURE
? Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 8
Graph-Based Clustering
Graph-Based clustering uses the pro
显示全部