
对kmeans聚类算法的改进.pdf (Improvements to the k-means Clustering Algorithm)

Published: 2015-09-07, 3 pages


Improved k-means Clustering Algorithm

Yuan Fang (1, 2), Meng Zenghui, Yu Ge
(1) College of Information Science and Engineering, Northeastern University, Shenyang 110004
(2) College of Mathematics and Computer Science, Hebei University, Baoding 071002
E-mail: yuanfang@mail.hbu.edu.cn

Abstract: This paper investigates the standard k-means clustering algorithm and proposes a new method for finding the initial cluster centers the algorithm starts from. The method first computes the distances between data points, then uses these distances to find points that are likely to belong to the same cluster, and finally constructs the initial cluster centers from those points. Experiments show that different initial points lead to different clustering results, and that initial centers consistent with the distribution of the data yield good clusterings. The improved method achieves higher accuracy than choosing the initial cluster centers at random.

Keywords: k-means clustering algorithm; clustering; pattern recognition
Article ID: 1002-8331-(2004)36-0177-02   Document code: A   CLC number: TP181

1 Introduction

Data clustering is a method for discovering the natural groupings of objects, and it is an important research area in machine learning and pattern recognition. For the problem of partitioning n d-dimensional data points into k sets, finding the globally optimal solution is NP-hard [1]. Many clustering algorithms have therefore been proposed, such as k-means [2], Gaussian expectation-maximization (EM) [1], k-harmonic means [1], CURE [3], and CLIQUE [4]. These algorithms …

2 Basic Idea

Reference [5] gives the k-means procedure as follows:
Input: the number of clusters k and a sample set containing n data objects.
Output: k clusters that satisfy the minimum-variance criterion.
Procedure:
(1) Arbitrarily select k of the n data objects as the initial cluster centers; …
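The preview breaks off after step (1), and the paper's own initialization procedure is not visible here. As an illustration only, the sketch below fills the remaining steps with the textbook Lloyd iteration (assign each object to its nearest center, recompute each center as its cluster mean, repeat until the centers stop moving), which may differ in detail from reference [5]. It also adds one hypothetical reading of the abstract's idea: grow k groups of mutually close points from the pairwise distance matrix and use each group's mean as an initial center. The function names, the nearest-member rule for adding points to a group, and the `alpha` group-size parameter are assumptions, not taken from the paper. `sse` computes the within-cluster sum of squared distances, i.e. the minimum-variance criterion named in the Output line.

```python
import numpy as np


def sse(X, labels, centers):
    """Within-cluster sum of squared distances (the minimum-variance criterion)."""
    return float(((X - centers[labels]) ** 2).sum())


def assign(X, centers):
    """Index of the nearest center (Euclidean distance) for every data object."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
    return d.argmin(axis=1)


def random_init(X, k, rng):
    """Step (1): arbitrarily pick k of the n data objects as initial centers."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()


def distance_based_init(X, k, alpha=0.75):
    """HYPOTHETICAL initializer in the spirit of the abstract (not the paper's code).

    Repeatedly seeds a group with the closest remaining pair of points, then
    adds the remaining point nearest to any group member until the group holds
    about alpha * n / k points. Each group's mean becomes an initial center.
    `alpha` is an assumed parameter.
    """
    n = len(X)
    if n < 2 * k or not 0 < alpha <= 1:
        raise ValueError("need n >= 2k and 0 < alpha <= 1")
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # n x n distances
    group_size = max(2, int(alpha * n / k))
    remaining = set(range(n))
    centers = []
    for _ in range(k):
        rem = np.array(sorted(remaining))
        sub = D[np.ix_(rem, rem)].copy()
        np.fill_diagonal(sub, np.inf)  # ignore zero self-distances
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        group = [int(rem[i]), int(rem[j])]  # closest remaining pair seeds the group
        remaining -= set(group)
        while len(group) < group_size and remaining:
            rem = np.array(sorted(remaining))
            # distance from each remaining point to its nearest group member
            d = D[np.ix_(rem, group)].min(axis=1)
            nxt = int(rem[np.argmin(d)])
            group.append(nxt)
            remaining.discard(nxt)
        centers.append(X[group].mean(axis=0))
    return np.array(centers)


def kmeans(X, k, centers=None, max_iter=100, seed=0):
    """Lloyd-style k-means; uses random initial centers unless `centers` is given."""
    rng = np.random.default_rng(seed)
    if centers is None:
        centers = random_init(X, k, rng)
    else:
        centers = np.asarray(centers, dtype=float).copy()
    for _ in range(max_iter):
        labels = assign(X, centers)
        # recompute each center as its cluster mean; keep the old center if empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    return assign(X, centers), centers
```

Usage: `kmeans(X, k, centers=distance_based_init(X, k))` starts from the distance-based centers, while `kmeans(X, k, seed=s)` mimics step (1)'s random choice; comparing `sse` across several seeds is one way to observe the sensitivity to initial centers that the abstract describes. Building the full n×n distance matrix costs O(n²) memory, which is acceptable for a sketch but not for large data sets.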


Similar documents