
Top Ten Classic Algorithms in Data Mining.pdf

Published: 2025-03-28, about 11,900 characters, 17 pages

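In December 2006, the IEEE International Conference on Data Mining (ICDM) selected the top ten classic algorithms in data mining: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.

The ten were chosen from 18 candidate algorithms.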

1. C4.5

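C4.5 is a decision tree classification algorithm whose core is the ID3 algorithm.

C4.5 inherits the strengths of ID3 and improves on ID3 in the following respects: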

1)

2)

3)

4)

C4.5

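2. The k-means algorithm (K-Means)

The k-means algorithm is a clustering algorithm that partitions n objects into k clusters according to their attributes, with k < n.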

3. Support vector machines

Support Vector Machine, abbreviated SV machine (usually written SVM in papers).

C. J. C. Burges

van der Walt and Barnard

4. The Apriori algorithm

Apriori

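5. The Expectation–Maximization (EM) algorithm

In statistical computing, the Expectation–Maximization (EM) algorithm finds maximum-likelihood estimates of the parameters of a probabilistic model that depends on unobserved latent variables (Latent Variable).

EM is frequently used for data clustering (Data Clustering) in machine learning and computer vision.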

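6. PageRank

PageRank is an important part of Google's algorithm. It was granted a US patent in September 2001.

The inventor named on the patent is Google co-founder Larry Page, so the "page" in PageRank refers to Larry Page rather than to a web page.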

PageRank

PageRank

“”——

PageRank

——

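7. AdaBoost

AdaBoost is an iterative algorithm whose core idea is to train different classifiers (weak classifiers) on the same training set and then combine them into a stronger final classifier (strong classifier).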

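8. kNN: k-nearest neighbor classification

The K-nearest-neighbor (k-Nearest Neighbor, KNN) classification algorithm is one of the simplest machine learning methods.

The idea is this: if most of the k samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to one class, the sample is assigned to that class as well.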
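The majority vote among the k nearest samples can be sketched as follows. This is a minimal illustration, not a reference implementation: Euclidean distance, the default `k=3`, the function name `knn_predict`, and the toy points are all assumptions made for the example.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def knn_predict(train, query, k=3):
    """train: list of (feature_vector, label) pairs.

    Returns the most common label among the k training points
    closest to `query` (Euclidean distance is an assumption here).
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in the plane.
train = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]
near_a = knn_predict(train, (0.5, 0.5))
near_b = knn_predict(train, (5.5, 5.5))
```

A query near the first group is labeled by its three neighbors from that group, and likewise for the second group.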

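9. Naive Bayes

Among classification models, the two most widely used are the decision tree model (Decision Tree Model) and the naive Bayes model (Naive Bayesian Model, NBC).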

NBC

NBC

NBC

NBC

NBC

NBC

10. CART: classification and regression trees

CART, Classification and Regression Trees

(1) C4.5

,

“”

1)

2)

3)

Building on ID3, Quinlan proposed the C4.5 algorithm.

C4.5ID3ID3.

C4.5 inherits the strengths of ID3 and improves on ID3 in the following respects:

1)

2)

3)

4)

C4.5

C4.5

C4.5 is a decision tree classification algorithm whose core is the ID3 algorithm.

.

:

:.

:.

:.

§4.3.2 The ID3 algorithm

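1. The CLS algorithm

1) Initialize C = {E}, where E contains all the examples and is the root.

2) IF all elements e of C belong to the same decision class, create a leaf node YES and stop.

ELSE, following the heuristic criterion, select a feature Fi = {V1, V2, V3, ..., Vn} and create a decision node;

partition C into N mutually disjoint sets C1, C2, C3, ..., Cn.

3) Recurse on each Ci.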
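The recursion in the CLS steps above can be sketched as follows. This is an illustrative sketch only: the name `cls_tree`, the `choose_feature` callback standing in for the heuristic criterion, the dict-based tree, the majority-class fallback when no features remain, and the toy examples are all assumptions, not part of the original text.

```python
from collections import Counter


def cls_tree(examples, features, choose_feature):
    """examples: list of (attribute_dict, label); features: attribute names.

    choose_feature(examples, features) is a hypothetical callback that
    plays the role of the heuristic criterion.
    """
    labels = [label for _, label in examples]
    # Step 2, IF branch: every example has the same class, so make a leaf.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumption (not in the steps above): with no features left,
    # fall back to the majority class.
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 2, ELSE branch: pick a feature and split C into disjoint subsets.
    f = choose_feature(examples, features)
    subsets = {}
    for attrs, label in examples:
        subsets.setdefault(attrs[f], []).append((attrs, label))
    remaining = [g for g in features if g != f]
    # Step 3: recurse on every subset Ci.
    return {f: {v: cls_tree(sub, remaining, choose_feature)
                for v, sub in subsets.items()}}


# Hypothetical toy examples; the criterion here simply takes the first feature.
examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
]
tree = cls_tree(examples, ["outlook", "windy"], lambda ex, fs: fs[0])
```

Passing an entropy-based `choose_feature` instead of the first-feature placeholder would turn this into the kind of tree builder that ID3 calls on each window.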

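2. The ID3 algorithm

1) Randomly select a subset W of C (the window).

2) Call CLS to build a classification tree DT for W (the heuristic criterion is described below).

3) Scan C sequentially and collect the exceptions to DT (that is, the examples DT cannot classify).

4) Combine W with the exceptions found to form a new W.

5) Repeat 2) through 4) until there are no exceptions.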

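Heuristic criterion:

It depends only on a node and its subtree, and uses entropy from information theory as the measure.

Entropy measures the freedom of choice when selecting an event, and is computed as follows: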

P = freq(Cj, S) / |S|;

Info(S) = -SUM(P * LOG(P)); SUM() sums over j from 1 to n.

Gain(X) = Info(T) - Infox(T);

Infox(T) = SUM((|Ti| / |T|) * Info(Ti));

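To keep the generated decision tree as small as possible, ID3 builds each subtree from the feature that minimizes the entropy of the resulting subsets (that is, the feature that maximizes Gain(X)).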
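The entropy and gain formulas of this section can be checked numerically with a short sketch. The function names `info`, `info_x`, and `gain`, the base-2 logarithm, and the toy labels are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(S) = -SUM(P * LOG(P)), P = freq(Cj, S) / |S| (LOG taken base 2)."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def info_x(partitions):
    """Infox(T) = SUM((|Ti| / |T|) * Info(Ti)) over the subsets Ti of a split."""
    total = sum(len(p) for p in partitions)
    return sum((len(p) / total) * info(p) for p in partitions if p)


def gain(labels, partitions):
    """Gain(X) = Info(T) - Infox(T)."""
    return info(labels) - info_x(partitions)


# Hypothetical toy data: class labels of T and two candidate splits of T.
T = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]   # each Ti is pure
useless_split = [["yes", "no"], ["yes", "no"]]   # each Ti mirrors T

gain_perfect = gain(T, perfect_split)
gain_useless = gain(T, useless_split)
```

A split into pure subsets recovers the full entropy of T as gain, while a split whose subsets have the same class mix as T gains nothing, which is why ID3 prefers the first kind of feature.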

§4.3.3:ID3

1..

2..

3..

§4.3.4: Improvements of C4.5 over ID3:

1.,.

Split_Infox(X) = -SUM((|Ti| / |T|) * LOG(|Ti| / |T|));

Gainratio(X) = Gain(X) / Split_Infox(X);

2..

1)

,C4.5ID3

,.

2),?,

3.,.
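The gain ratio from item 1 above can be illustrated with a small sketch. The function names, the base-2 logarithm, the toy splits, and the guard that returns 0.0 when the split information is zero are assumptions made for this example.

```python
from collections import Counter
from math import log2


def info(labels):
    """Info(S) = -SUM(P * LOG(P)), LOG taken base 2."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain(partitions):
    """Gain(X) = Info(T) - Infox(T), where T is the union of the subsets Ti."""
    all_labels = [y for p in partitions for y in p]
    total = len(all_labels)
    info_x = sum((len(p) / total) * info(p) for p in partitions if p)
    return info(all_labels) - info_x


def split_info(partitions):
    """Split_Infox(X) = -SUM((|Ti| / |T|) * LOG(|Ti| / |T|))."""
    total = sum(len(p) for p in partitions)
    return -sum((len(p) / total) * log2(len(p) / total) for p in partitions if p)


def gain_ratio(partitions):
    """Gainratio(X) = Gain(X) / Split_Infox(X)."""
    si = split_info(partitions)
    # Assumption: a split with a single non-empty subset gets ratio 0.0.
    return gain(partitions) / si if si > 0 else 0.0


# Hypothetical toy splits of the same four examples.
two_way = [["yes", "yes"], ["no", "no"]]
four_way = [["yes"], ["yes"], ["no"], ["no"]]

ratio_two = gain_ratio(two_way)
ratio_four = gain_ratio(four_way)
```

Both splits have the same gain, but the four-way split has twice the split information, so its gain ratio is halved. This is how the ratio counteracts the bias of plain gain toward attributes with many values.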

(2) k-means

k-meansa
