Data mining: rule mining algorithms
Roberto Innocente
rinnocente@
10 May 2002

Introduction /1
Data mining, also known as Knowledge Discovery in Databases or KDD (Piatetsky-Shapiro 1991), is the process of extracting useful hidden information from very large databases in an unsupervised manner.

Introduction /2
Central themes of data mining are:
- Classification
- Cluster analysis
- Association analysis
- Outlier analysis
- Evolution analysis

ARM /1 (association rules mining)
- Formally introduced in 1993 by Agrawal, Imielinski and Swami (AIS) in connection with market basket analysis.
- Formalizes statements of the form: "What percentage of customers buy beer together with cheese?"

ARM /2
- We have a set of items I = {i1, i2, ...} and a set of transactions T = {t1, t2, ...}. Each transaction (like a supermarket bill) is a set of items, more precisely called an itemset.
- If U and V are disjoint itemsets, the support of the rule U ⇒ V, written s(U ⇒ V), is the fraction of transactions that contain U ∪ V. The support of a single itemset is likewise the fraction of transactions that contain it.
- An itemset is frequent if its support is greater than a chosen threshold called minsupp.
- If A and B are disjoint itemsets, the confidence of A ⇒ B, written c(A ⇒ B), is the fraction of the transactions containing A that also contain B, i.e. c(A ⇒ B) = s(A ⇒ B) / s(A). This is the (empirical) conditional probability p(B|A).
- A rule is strong if its confidence is greater than a threshold called minconf.

ARM /3
ARM can then be formulated as: given a set I of items and a set T of transactions over I, automatically produce all association rules whose support is greater than x% (frequent) and whose confidence is greater than y% (strong).

ARM /4
On the right we have 6 transactions T = {1,2,3,4,5,6} over a set of 5 items I = {A,B,C,D,E}. The itemset BC is present
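The definitions of support, confidence, frequent itemsets and strong rules in ARM /2, and the problem statement in ARM /3, can be checked on a toy example. The Python sketch below is a naive brute-force miner, not an algorithm taken from these slides: it enumerates every itemset, keeps those whose support exceeds minsupp, and splits each one into disjoint rules A ⇒ B whose confidence exceeds minconf (strict inequalities, matching the definitions). The names `TRANSACTIONS`, `support`, `confidence` and `mine_rules` are hypothetical, and the six transactions are invented for illustration; they are not the table referred to in ARM /4, which is not reproduced in this text.

```python
from itertools import combinations

# Hypothetical toy data, invented for illustration (NOT the table of ARM /4).
# Each transaction is modelled as a frozenset of single-letter items.
TRANSACTIONS = [
    frozenset("ABC"),
    frozenset("BC"),
    frozenset("ABD"),
    frozenset("BCE"),
    frozenset("ACD"),
    frozenset("BCD"),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """c(A => B) = s(A u B) / s(A): share of A-transactions that also contain B."""
    a = frozenset(antecedent)
    s_a = support(a, transactions)
    if s_a == 0:  # A never occurs, so the rule has no supporting evidence
        return 0.0
    return support(a | frozenset(consequent), transactions) / s_a


def mine_rules(transactions, minsupp, minconf):
    """Return (A, B, support, confidence) for every rule A => B that is
    frequent (support > minsupp) and strong (confidence > minconf)."""
    items = sorted(set().union(*transactions))
    rules = []
    # A rule needs a non-empty A and a non-empty B, so start from 2-itemsets.
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s <= minsupp:  # not frequent: skip every rule built from it
                continue
            # Split the frequent itemset into a disjoint antecedent/consequent.
            for r in range(1, k):
                for lhs in combinations(combo, r):
                    a = frozenset(lhs)
                    b = itemset - a
                    c = confidence(a, b, transactions)
                    if c > minconf:
                        rules.append((a, b, s, c))
    return rules


if __name__ == "__main__":
    for a, b, s, c in mine_rules(TRANSACTIONS, minsupp=0.3, minconf=0.7):
        print(f"{''.join(sorted(a))} => {''.join(sorted(b))}  s={s:.2f}  c={c:.2f}")
```

Run as a script with minsupp = 0.3 and minconf = 0.7, this toy data yields only B ⇒ C and C ⇒ B (support 4/6 ≈ 0.67, confidence 4/5 = 0.8). The enumeration is exponential in the number of items, so it only illustrates the definitions; it would not scale to the very large databases mentioned in the introduction.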