DATA QUALITY MINING
– Making a Virtue of Necessity –
Jochen Hipp
DaimlerChrysler AG, Research Technology, Ulm, Germany
Wilhelm-Schickard-Institute, University of Tübingen, Germany
Email: jochen.hipp@
Ulrich Güntzer
Wilhelm-Schickard-Institute, University of Tübingen, Germany
Email: guentzer@informatik.uni-tuebingen.de
Udo Grimmer
DaimlerChrysler AG, Research Technology, Ulm, Germany
Email: udo.grimmer@
Abstract
In this paper we introduce data quality mining (DQM) as a new and promising data mining approach from
the academic and the business point of view. The goal of DQM is to employ data mining methods in order to
detect, quantify, explain and correct data quality deficiencies in very large databases. Data quality is crucial for
many applications of knowledge discovery in databases (KDD). A typical application scenario for DQM is therefore to
support KDD projects, especially during the initial phases. Moreover, improving data quality is also a burning
issue in many areas outside KDD. DQM thus opens new and promising application fields for data mining
methods outside the field of pure data analysis. To give a first impression of a concrete DQM approach, we
describe how to employ association rules for the purpose of DQM.
1 MOTIVATION
Since the early nineties, knowledge discovery in
databases (KDD) has developed into a well-established
field of research. Over the years, new methods together
with scalable algorithms have been developed
to efficiently analyze even very large datasets. However,
KDD has not been broadly established outside
academia. Although there are numerous success stories
of practical applications, today many of the people
concerned with KDD seem to be somewhat disillusioned.
“Crossing the chasm”, as Rakesh Agrawal
puts it in (Agrawal, 1999), is overdue. Otherwise
KDD might end like many promising technologies
that were welcomed enthusiastically but ultimately
failed to satisfy the expectations they generated.
The research community is aware t