文档详情

Abstract Frequency-Based Views to Pattern Collections.pdf

发布:2017-04-08约4.23万字共11页下载文档
文本预览下载声明
Frequency-Based Views to Pattern Collections Taneli Mielika?inen Department of Computer Science P.O.Box 26 (Teollisuuskatu 23) FIN-00014 University of Helsinki, Finland Taneli.Mielikainen@cs.helsinki.fi Abstract Finding frequently occurring patterns from data sets is a central computational task in data mining. In this paper we suggest to focus on pattern frequencies. We advocate frequency simplifications as a complementary approach to structural constraints on patterns. As a special case of the frequency simplifications, we consider discretizing the frequencies. We analyze the worst case error of certain discretization functions and give efficient algorithms minimizing several error functions. In addition, we show that the discretizations can be used to find small approximate condensed representations for the frequent patterns. 1 Introduction Finding interesting patterns from data sets is the fun- damental problem in data mining [24, 25, 34]. This problem is known as the pattern discovery problem. The interestingness of a pattern depends (at least) on the data set and the objective of the data analysis. While surprising patterns could be interesting one day, some other day it might be more interesting to find regularities of the data. Pattern discovery problem can be formulated as follows: Given a data set d from a set D of possible data sets, a pattern class P, and an interestingness predicate q : P ×D → {0, 1}, find all interesting patterns, i.e., patterns p ∈ P such that q(p, d) = 1. (The set D of possible data sets could be e.g. the set of all binary matrices and thus d would be some binary matrix.) Unfortunately defining the interestingness predicate (or even its vague approximations) is highly nontrivial. In practice the problem of inventing the interesting- ness predicate has been partly postponed by relaxing the interestingness predicate to an interestingness mea- sure. An interestingness measure can be defined as a mapping φ : P ×D → [0, 1], where the inte
显示全部
相似文档