Comparing Machine Learning and Knowledge Discovery in DataBases: An
Application to Knowledge Discovery in Texts
Yves Kodratoff
CNRS, LRI Bat. 490 Univ. Paris-Sud, F - 91405 Orsay Cedex
yk@lri.fr
Text associated with a course delivered at the ECCAI summer course, Crete, July 1999.
To be published by Springer-Verlag
in the Lecture Notes on AI (LNAI) - Tutorials series, 2000.
SUMMARY:
This presentation has two goals.
The first goal is to compare Machine Learning (ML) and Knowledge Discovery in Data (KDD,
also often called Data Mining, DM) in order to stress how much they actually differ. To make
my ideas somewhat easier to understand, and as an illustration, I will include a description of
several research topics that I find relevant to KDD and to KDD only.
The second goal is to show that the definition I give of KDD can be applied almost directly
to text analysis, which leads to a very restrictive definition of Knowledge
Discovery in Texts (KDT). I will provide a compelling example of a real-life set of rules
obtained by what I call KDT techniques.
1. INTRODUCTION
KDD is better known by the oversimplified name of Data Mining (DM). In fact, most
academics are chiefly interested in DM, which develops methods for extracting knowledge
from a given set of data. Industrialists and experts should be more interested in KDD, which
comprises the whole process: selecting the data, cleaning the data, transferring them to a DM
technique, applying the DM technique, validating its results, and finally interpreting
them for the user. In general, this process is a cycle that improves under the criticism of the
expert.
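As an illustration only, the KDD cycle described above can be sketched in a few lines of Python. Everything in this sketch is an assumption made for the sake of the example and is not taken from the course: the function names, the toy "DM technique" (counting pairs of field values that co-occur), and the way the expert's criticism is simulated by lowering a support threshold until at least one pattern is found.

```python
from collections import Counter
from itertools import combinations


def select(records, wanted_fields):
    # Data selection: keep only the fields judged relevant (hypothetical choice).
    return [{k: r.get(k) for k in wanted_fields} for r in records]


def clean(records):
    # Data cleaning: here, simply drop records that contain a missing value.
    return [r for r in records if all(v is not None for v in r.values())]


def to_transactions(records):
    # Transfer to the DM technique: encode each record as a set of "field=value" items.
    return [frozenset(f"{k}={v}" for k, v in r.items()) for r in records]


def mine(transactions, min_support):
    # Stand-in DM technique: keep item pairs whose relative frequency
    # reaches min_support. Any real DM method could be plugged in here.
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def validate(patterns):
    # Validation stand-in: a result is accepted as soon as it is non-empty.
    return len(patterns) > 0


def interpret(patterns):
    # Interpretation: render each pattern as a sentence for the user.
    return [f"{a} and {b} co-occur in {s:.0%} of records"
            for (a, b), s in sorted(patterns.items())]


def kdd_cycle(records, wanted_fields, min_support=0.9, max_rounds=10):
    transactions = to_transactions(clean(select(records, wanted_fields)))
    for _ in range(max_rounds):
        patterns = mine(transactions, min_support)
        if validate(patterns):
            return interpret(patterns)
        # Simulated expert criticism: the threshold was too strict, relax it.
        min_support = round(min_support - 0.1, 2)
    return []
```

In a real application the feedback step would of course be a human expert revising the selection, the cleaning, or the choice of DM technique, not a single numeric threshold; the loop only shows where that criticism enters the process.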
Machine Learning (ML) and KDD share a very strong link: they both
acknowledge the importance of induction as a normal way of thinking, while other scientific
fields are reluctant to accept it, to say the least. We shall first explore this common point. We
believe that this reluctance rests on a misuse of apparent contradictions inside the theory of
confi