An Improved OPTICS Algorithm and Its Application to Text Clustering, by ZENG Yi-ling (改进的optics算法及其在文本聚类中的应用_曾依灵.pdf)

Uploaded: 2017-08-23. Approx. 13,700 characters, 6 pages.
JOURNAL OF CHINESE INFORMATION PROCESSING, Vol. 22, No. 1, Jan. 2008
Article ID: 1003-0077(2008)01-0051-05

OPTICS-Plus for Text Clustering

ZENG Yi-ling (1, 2), XU Hong-bo (1), BAI Shuo (1)
(1. Research Center of Information Intelligence and Information Security, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China; 2. Graduate University, Chinese Academy of Sciences, Beijing 100080, China)

Abstract (translated from the Chinese): The density-based OPTICS clustering algorithm presents the structure of a corpus intuitively through its visual output. However, because its result-organization strategy handles sparse points poorly, the algorithm does not reach its full potential. To address this defect, this paper proposes an effective result-reorganization strategy that helps relocate sparse points, and changes the distance measure to suit the text domain, yielding the OPTICS-Plus text clustering algorithm. Experiments on a real text classification corpus show that the result-reorganization strategy helps the algorithm produce reachability plots that reflect the corpus structure more clearly, and a comparison with the k-means algorithm confirms that OPTICS-Plus has fairly good clustering performance.

Keywords: computer applications; Chinese information processing; OPTICS algorithm; density-based clustering; text mining

CLC number: TP391    Document code: A

Abstract (English original): As a density-based clustering algorithm, OPTICS is capable of showing the intrinsic corpus structure within a visual plot. However, because of an improper strategy for organizing points in sparse space, the algorithm does not reach its best performance. To solve this problem, we propose an effective result-reorganization strategy for reordering those sparse points. Based on this strategy, a new text clustering algorithm named OPTICS-Plus is proposed, adapted to the characteristics of the text mining field. Experiments on the Fudan text classification corpus show that our result-reorganization strategy helps produce reachability plots that give a clearer view of corpus structure. Furthermore, a comparison with k-means proves that the clustering performance of OPTICS-Plus is ...