Abstract Frequency-Based Views to Pattern Collections.pdf
文本预览下载声明
Frequency-Based Views to Pattern Collections
Taneli Mielika?inen
Department of Computer Science
P.O.Box 26 (Teollisuuskatu 23)
FIN-00014 University of Helsinki, Finland
Taneli.Mielikainen@cs.helsinki.fi
Abstract
Finding frequently occurring patterns from data sets is
a central computational task in data mining. In this
paper we suggest to focus on pattern frequencies. We
advocate frequency simplifications as a complementary
approach to structural constraints on patterns. As a
special case of the frequency simplifications, we consider
discretizing the frequencies. We analyze the worst
case error of certain discretization functions and give
efficient algorithms minimizing several error functions.
In addition, we show that the discretizations can be used
to find small approximate condensed representations for
the frequent patterns.
1 Introduction
Finding interesting patterns from data sets is the fun-
damental problem in data mining [24, 25, 34]. This
problem is known as the pattern discovery problem.
The interestingness of a pattern depends (at least)
on the data set and the objective of the data analysis.
While surprising patterns could be interesting one day,
some other day it might be more interesting to find
regularities of the data.
Pattern discovery problem can be formulated as
follows: Given a data set d from a set D of possible data
sets, a pattern class P, and an interestingness predicate
q : P ×D → {0, 1},
find all interesting patterns, i.e., patterns p ∈ P
such that q(p, d) = 1. (The set D of possible data
sets could be e.g. the set of all binary matrices and
thus d would be some binary matrix.) Unfortunately
defining the interestingness predicate (or even its vague
approximations) is highly nontrivial.
In practice the problem of inventing the interesting-
ness predicate has been partly postponed by relaxing
the interestingness predicate to an interestingness mea-
sure. An interestingness measure can be defined as a
mapping
φ : P ×D → [0, 1],
where the inte
显示全部