Data Mining Tool (Weka Tutorial).ppt
Example

    import java.io.File;
    import weka.classifiers.bayes.NaiveBayesUpdateable;
    import weka.core.Instance;
    import weka.core.Instances;
    import weka.core.converters.ArffLoader;
    ...
    // load data (only the header is read here; instances are read one at a time below)
    ArffLoader loader = new ArffLoader();
    loader.setFile(new File("/some/where/data.arff"));
    Instances structure = loader.getStructure();
    structure.setClassIndex(structure.numAttributes() - 1);

    // train NaiveBayes incrementally
    NaiveBayesUpdateable nb = new NaiveBayesUpdateable();
    nb.buildClassifier(structure);
    Instance current;
    while ((current = loader.getNextInstance(structure)) != null)
        nb.updateClassifier(current);

Example

    import weka.classifiers.Evaluation;
    import java.util.Random;
    ...
    // 10-fold cross-validation of the classifier "tree" on "newData", seed 1
    Evaluation eval = new Evaluation(newData);
    eval.crossValidateModel(tree, newData, 10, new Random(1));

Example

    import weka.core.Instances;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    ...
    Instances train = ...  // from somewhere
    Instances test = ...   // from somewhere
    // train classifier
    Classifier cls = new J48();
    cls.buildClassifier(train);
    // evaluate classifier and print some statistics
    Evaluation eval = new Evaluation(train);
    eval.evaluateModel(cls, test);
    System.out.println(eval.toSummaryString("\nResults\n======\n", false));

Some methods for retrieving results from an Evaluation object:

nominal class
- correct(): number of correctly classified instances (see also incorrect())
- pctCorrect(): percentage of correctly classified instances (see also pctIncorrect())
- kappa(): Kappa statistic

numeric class
- correlationCoefficient(): correlation coefficient

general
- meanAbsoluteError(): the mean absolute error
- rootMeanSquaredError(): the root mean squared error
- unclassified(): number of unclassified instances
- pctUnclassified(): percentage of unclassified instances

The following example loads the unlabeled data /some/where/unlabeled.arff, uses the already trained classifier tree to label each instance, and saves the result as /some/where/labeled.arff.

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.FileReader;
    import java.io.FileWriter;
    import weka.core.Instances;
    ...
    // load unlabeled data
    Instances unlabeled = new Instances(
        new BufferedReader(
            new FileReader("/some/where/unlabeled.arff")));

    // set class at
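To make the Evaluation statistics listed above more concrete, here is a minimal plain-Java sketch (no Weka dependency) that computes comparable numbers from arrays of actual and predicted values. The class `EvalStats` and its methods are hypothetical names chosen for illustration, not Weka API. It assumes nominal classes are coded as integers 0..k-1 and derives correct, pctCorrect and Cohen's kappa from the confusion matrix; for a numeric class it assumes plain point predictions and computes MAE, RMSE and the Pearson correlation coefficient. This is only an approximation of Weka's behavior: for nominal classes, Weka computes meanAbsoluteError() and rootMeanSquaredError() from the predicted class probability distributions, which this sketch does not model.

```java
import java.util.Arrays;

/** Hypothetical helper that mirrors a few Evaluation statistics; not part of Weka. */
public class EvalStats {

    /** Number of positions where the predicted class equals the actual class. */
    public static int correct(int[] actual, int[] predicted) {
        int n = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == predicted[i]) n++;
        }
        return n;
    }

    /** Percentage (0..100) of correctly classified instances. */
    public static double pctCorrect(int[] actual, int[] predicted) {
        return 100.0 * correct(actual, predicted) / actual.length;
    }

    /**
     * Cohen's kappa from the confusion matrix (rows = actual, columns = predicted).
     * Undefined (division by zero) when the expected agreement equals 1.
     */
    public static double kappa(int[] actual, int[] predicted, int numClasses) {
        double[][] cm = new double[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) cm[actual[i]][predicted[i]]++;
        double n = actual.length;
        double observed = 0, expected = 0;
        for (int c = 0; c < numClasses; c++) {
            observed += cm[c][c];
            double rowSum = 0, colSum = 0;
            for (int j = 0; j < numClasses; j++) {
                rowSum += cm[c][j];
                colSum += cm[j][c];
            }
            expected += rowSum * colSum; // chance agreement for class c
        }
        observed /= n;
        expected /= n * n;
        return (observed - expected) / (1 - expected);
    }

    /** Mean absolute error of numeric point predictions. */
    public static double meanAbsoluteError(double[] actual, double[] predicted) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) sum += Math.abs(actual[i] - predicted[i]);
        return sum / actual.length;
    }

    /** Root mean squared error of numeric point predictions. */
    public static double rootMeanSquaredError(double[] actual, double[] predicted) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - predicted[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    /** Pearson correlation coefficient between actual and predicted values. */
    public static double correlationCoefficient(double[] actual, double[] predicted) {
        double meanA = Arrays.stream(actual).average().orElse(0);
        double meanP = Arrays.stream(predicted).average().orElse(0);
        double cov = 0, varA = 0, varP = 0;
        for (int i = 0; i < actual.length; i++) {
            double da = actual[i] - meanA, dp = predicted[i] - meanP;
            cov += da * dp;
            varA += da * da;
            varP += dp * dp;
        }
        return cov / Math.sqrt(varA * varP);
    }

    public static void main(String[] args) {
        // Nominal class, classes coded 0, 1, 2 (made-up example values).
        int[] actualClass = {0, 0, 1, 1, 1, 2};
        int[] predictedClass = {0, 1, 1, 1, 2, 2};
        System.out.println("correct    = " + correct(actualClass, predictedClass));
        System.out.printf("pctCorrect = %.2f%n", pctCorrect(actualClass, predictedClass));
        System.out.printf("kappa      = %.4f%n", kappa(actualClass, predictedClass, 3));

        // Numeric class, point predictions (made-up example values).
        double[] actual = {1, 2, 3, 4};
        double[] predicted = {1.5, 2, 2.5, 5};
        System.out.printf("MAE  = %.4f%n", meanAbsoluteError(actual, predicted));
        System.out.printf("RMSE = %.4f%n", rootMeanSquaredError(actual, predicted));
        System.out.printf("r    = %.4f%n", correlationCoefficient(actual, predicted));
    }
}
```

Kappa corrects the raw agreement for the agreement expected by chance from the row and column totals, which is why it is lower than pctCorrect/100 in the example data.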
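The cross-validation example above calls crossValidateModel(tree, newData, 10, new Random(1)): the data is split into 10 folds, and each fold serves once as the test set while the other nine are used for training; the seeded Random makes the split reproducible. The plain-Java sketch below shows only the partitioning idea. The class `KFold` and method `folds` are hypothetical, not Weka API. Unlike Weka, it does not stratify by class (Weka stratifies the folds when the class is nominal), and it deals shuffled indices to folds round-robin, so its fold membership will not match Weka's for the same seed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical k-fold splitter illustrating the idea; not Weka's exact procedure. */
public class KFold {

    /** Returns numFolds lists of instance indices; each list is one test fold. */
    public static List<List<Integer>> folds(int numInstances, int numFolds, Random random) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) indices.add(i);
        // Shuffle once with the supplied Random so a given seed always gives the same split.
        Collections.shuffle(indices, random);
        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < numFolds; f++) result.add(new ArrayList<>());
        // Deal shuffled indices round-robin, so fold sizes differ by at most one.
        for (int i = 0; i < indices.size(); i++) result.get(i % numFolds).add(indices.get(i));
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> split = folds(23, 10, new Random(1));
        for (int f = 0; f < split.size(); f++) {
            List<Integer> test = split.get(f);
            // The training set for this round is every index outside the test fold.
            List<Integer> train = new ArrayList<>();
            for (int g = 0; g < split.size(); g++) {
                if (g != f) train.addAll(split.get(g));
            }
            System.out.println("fold " + f + ": train=" + train.size() + " test=" + test);
        }
    }
}
```

Every instance is tested exactly once across the 10 rounds, which is why the per-fold results can be pooled into a single set of statistics.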