Journal of Computer Research and Development, Vol. 41, No. 9, September 2004
Research on Outlier Mining Algorithms Based on Incomplete Data
Yang Hu1,2, Zhong Zhen1, Cheng Daijie2
1 (College of Sciences, Chongqing University, Chongqing 400044)
2 (College of Computer Science, Chongqing University, Chongqing 400044)
(yh@cqu.edu.cn)
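Abstract  Outlier mining is one of the important research topics in data mining, and with incomplete data it faces a twofold difficulty: the missing values must be filled in and the outliers must be detected. First, the EM algorithm and the MI algorithm for filling in missing data are extended to the mixed-missing case, and an RE algorithm is proposed on the basis of Weisberg's theory of incomplete-data imputation. Then, by combining cluster analysis with the forward search algorithm, an algorithm superior to the plain forward search method is obtained. Finally, outlier mining on incomplete data is discussed on the basis of the above imputation algorithms. Both theoretical analysis and worked examples show that the outlier mining algorithm based on incomplete data is effective and feasible.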
Key words  missing data; EM algorithm; cluster analysis; outlier mining
CLC number  TP311
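To make the idea of EM-based filling of missing data concrete, the following is a minimal sketch of the textbook EM algorithm for a bivariate normal model in which x is fully observed and some y values are missing. It is not the paper's mixed-missing extension or its RE algorithm, whose details are not given in the abstract; the function name `em_bivariate_impute` and the sample data are hypothetical.

```python
def em_bivariate_impute(xs, ys, max_iter=1000, tol=1e-12):
    """Fill missing ys (marked None) under a bivariate normal model.

    Assumption: xs is fully observed and only ys has gaps.
    Returns the filled ys and the estimates (mu_x, mu_y, s_xx, s_xy, s_yy).
    """
    n = len(xs)
    obs = [i for i in range(n) if ys[i] is not None]

    # x is complete, so its mean and variance are fixed ML estimates.
    mu_x = sum(xs) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n

    # Start the y-related parameters from the complete cases.
    mu_y = sum(ys[i] for i in obs) / len(obs)
    s_yy = sum((ys[i] - mu_y) ** 2 for i in obs) / len(obs)
    s_xy = sum((xs[i] - mu_x) * (ys[i] - mu_y) for i in obs) / len(obs)

    for _ in range(max_iter):
        beta = s_xy / s_xx                 # slope of y on x
        resid_var = s_yy - beta * s_xy     # conditional variance of y given x

        # E-step: expected y and y^2 for every row under current parameters.
        ey = [ys[i] if ys[i] is not None else mu_y + beta * (xs[i] - mu_x)
              for i in range(n)]
        ey2 = [ys[i] ** 2 if ys[i] is not None else ey[i] ** 2 + resid_var
               for i in range(n)]

        # M-step: re-estimate the parameters from the expected statistics.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(x * e for x, e in zip(xs, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2

        change = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy),
                     abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break

    # Fill each gap with its conditional mean under the final parameters.
    beta = s_xy / s_xx
    filled = [ys[i] if ys[i] is not None else mu_y + beta * (xs[i] - mu_x)
              for i in range(n)]
    return filled, (mu_x, mu_y, s_xx, s_xy, s_yy)


# Hypothetical data: three y values are missing.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, None, 9.8, None, 14.1, None]
filled, params = em_bivariate_impute(xs, ys)
print(filled)
```

For this simple missing pattern the EM iterates converge to the regression of y on x fitted on the complete cases, so the filled values lie on that regression line; more general patterns require the full multivariate E-step.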
An Outlier Mining Algorithm Based on Incomplete Data
YANG Hu1,2, ZHONG Zhen1, and CHENG Dai-Jie2
1 (College of Sciences, Chongqing University, Chongqing 400044)
2 (College of Computer Science, Chongqing University, Chongqing 400044)
Abstract  Many different methods can be used to mine outliers, among which the forward search algorithm is one of the most important. When the data are incomplete, mining for outliers encounters additional difficulties, which motivates the present study. First, one must consider how to fill in the missing data. For the mixed-missing case, the application of algorithms such as the EM algorithm and the MI algorithm can be simplified. Furthermore, a simpler and more convenient RE algorithm is proposed. Filling in actual data demonstrates the effectiveness of the method. When the forward search algorithm is used to mine outliers, the unknown parameters can be estimated in the same way, following the construction of the EM algorithm. Even in conventional statistical outlier tests, the test statistics that rely on residuals can also be generated by the EM algorithm. This means that the result of data mining is more credible when one first completes and then mines th
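As an illustration of the forward search idea named in the abstract, here is a much simplified univariate sketch: it starts from the points nearest the median, grows the "clean" subset one point at a time, and stops when the next candidate lies too far from the subset mean. It is not the paper's clustering-combined variant, which the abstract does not specify; the function name `forward_search_outliers`, the `cutoff` parameter, and the sample values are all assumptions.

```python
import statistics


def forward_search_outliers(values, start_size=None, cutoff=3.0):
    """Return the indices of values never admitted to the clean subset.

    Simplified sketch: the subset is refitted (mean, standard deviation)
    at every step, and growth stops when the closest excluded point is
    more than `cutoff` subset standard deviations from the subset mean.
    """
    n = len(values)
    m = start_size or (n // 2 + 1)

    # Initial clean subset: the m points closest to the median.
    median = statistics.median(values)
    order = sorted(range(n), key=lambda i: abs(values[i] - median))
    subset = order[:m]

    while len(subset) < n:
        inside = [values[i] for i in subset]
        centre = statistics.mean(inside)
        spread = statistics.stdev(inside)

        # Rank all points by distance from the current fit.
        order = sorted(range(n), key=lambda i: abs(values[i] - centre))
        candidate = order[len(subset)]
        if abs(values[candidate] - centre) > cutoff * spread:
            break  # the next point is too far away: stop growing
        subset = order[:len(subset) + 1]

    return sorted(set(range(n)) - set(subset))


# Hypothetical measurements with one gross error in the last position.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 25.0]
print(forward_search_outliers(data))
```

Because the standard deviation is computed from a deliberately central subset, it underestimates the true spread in the early steps, so the fixed cutoff here is only a rough stand-in for the calibrated thresholds used in practice.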