文档详情

An investigation of computational and informational limits in gaussian mixture clustering.pdf

发布:2017-04-08约3.78万字共9页下载文档
文本预览下载声明
Department of Computer Science 6 King’s College Rd, Toronto University of Toronto M5S 3G4, Canada fax: +1 416 978 1455 February 10, 2006 UTML TR 2006–002 An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering Nathan Srebro Department of Computer Science, University of Toronto Gregory Shakhnarovich Department of Computer Science, Brown University Sam Roweis Department of Computer Science, University of Toronto Abstract We investigate under what conditions clustering by learning a mixture of spherical Gaussians is (a) computationally tractable; and (b) statistically pos- sible. We show that using principal component projection greatly aids in recovering the clustering using EM; present empirical evidence that even us- ing such a projection, there is still a large gap between the number of samples needed to recover the clustering using EM, and the number of samples needed without computational restrictions; and characterize the regime in which such a gap exists. An Investigation of Computational and Informational Limits in Gaussian Mixture Clustering Nathan Srebro nati@ Department of Computer Science, University of Toronto Gregory Shkhnarovich gregory@ Department of Computer Science, Brown University Sam Roweis roweis@ Department of Computer Science, University of Toronto Abstract We investigate under what conditions clus- tering by learning a mixture of spherical Gaussians is (a) computationally tractable; and (b) statistically possible. We show that using principal component projection greatly aids in recovering the clustering using EM; present empirical evidence that even using such a projection, there is still a large gap between the number of samples needed to re- cover the clustering using EM, and the num- ber of samples needed without computational restrictions; and characterize the regime in which such a gap exists. 1. Introduction Consider clustering a collection of points by fitting a mixture-of-Gaussians model to the data.
显示全部
相似文档