Data Quality Mining – Making a Virtue of Necessity.pdf

Uploaded: 2017-04-09, approx. 25,900 characters, 6 pages
DATA QUALITY MINING
Making a Virtue of Necessity

Jochen Hipp
DaimlerChrysler AG, Research Technology, Ulm, Germany
Wilhelm-Schickard-Institute, University of Tübingen, Germany
Email: jochen.hipp@

Ulrich Güntzer
Wilhelm-Schickard-Institute, University of Tübingen, Germany
Email: guentzer@informatik.uni-tuebingen.de

Udo Grimmer
DaimlerChrysler AG, Research Technology, Ulm, Germany
Email: udo.grimmer@

Abstract

In this paper we introduce data quality mining (DQM) as a new and promising data mining approach from both the academic and the business point of view. The goal of DQM is to employ data mining methods to detect, quantify, explain, and correct data quality deficiencies in very large databases. Data quality is crucial for many applications of knowledge discovery in databases (KDD), so a typical application scenario for DQM is to support KDD projects, especially during their initial phases. Moreover, improving data quality is a pressing issue in many areas outside KDD. DQM therefore opens new and promising application fields for data mining methods beyond pure data analysis. To give a first impression of a concrete DQM approach, we describe how to employ association rules for the purpose of DQM.

1 MOTIVATION

Since the early nineties, knowledge discovery in databases (KDD) has developed into a well-established field of research. Over the years, new methods together with scalable algorithms have been developed to analyze even very large datasets efficiently. However, KDD has not been broadly established outside academia. Although there are numerous success stories of practical applications, today many of the people concerned with KDD seem to be somewhat disillusioned. "Crossing the chasm", as Rakesh Agrawal puts it in (Agrawal, 1999), is overdue. Otherwise KDD might end like many promising technologies that were welcomed enthusiastically but ultimately failed to satisfy the expectations they generated. The research community is aware t