扔休矣.PDF
文本预览下载声明
Polynomial Time Approximation Schemes for Geometric
k-Clustering
y z
Rafail Ostrovsky Yuval Rabani
July 1, 2001
Abstract
The Johnson-Lindenstrauss lemma states that n p oints in a high dimensional Hilb ert space can
b e emb edded with small distortion of the distances into an O (log n ) dimensional space by applying
a random linear transformation. We show that similar (though weaker) prop erties hold for certain
random linear transformations over the Hamming cub e. We use these transformations to solve NP-
hard clustering problems in the cub e as well as in geometric settings.
More sp ecically, we address the following clustering problem. Given n p oints in a larger set (for
example, Rd ) endowed with a distance function (for example, L2 distance), we would like to partition
the data set into k disjoint clusters, each with a \cluster center, so as to minimize the sum over all data
p oints of the distance b etween the p oint and the center of the cluster containing the p oint. The problem
is provably NP-hard in some high dimensional geometric settings, even for k = 2. We give p olynomial
d
time approximation schemes for this problem in several settings, including the binary cub e f0 ; 1g with
Hamming distance, and Rd either with L 1 distance, or with L2 distance, or with the square of L2
distance. In all these settings, the b est p
显示全部