Machine Learning with WEKA
8/26/2023, University of Waikato

WEKA: the bird
Copyright: Martin Kramer (mkramer@wxs.nl)

WEKA: the software
- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Complements “Data Mining” by Witten & Frank
- Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms, and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions
There are several versions of WEKA:
- WEKA 3.0: “book version”, compatible with the description in the data mining book
- WEKA 3.2: “GUI version”, adds graphical user interfaces (the book version is command-line only)
- WEKA 3.3: “development version”, with lots of improvements
This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4).

WEKA only deals with “flat” files
Flat file in ARFF format:

    @relation heart-disease-simplified

    @attribute age numeric
    @attribute sex { female, male}
    @attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
    @attribute cholesterol numeric
    @attribute exercise_induced_angina { no, yes}
    @attribute class { present, not_present}

    @data
    63,male,typ_angina,233,no,not_present
    67,male,asympt,286,yes,present
    67,male,asympt,229,yes,present
    38,female,non_anginal,?,no,not_present
    ...

- Numeric attribute: declared with the keyword numeric (e.g. age)
- Nominal attribute: declared with a list of allowed values in braces (e.g. sex)
- A “?” in the data section marks a missing value

Explorer: pre-processing the data
- Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
- Data can also be read from a URL or from an SQL database (using JDBC)
- Pre-processing tools in WEKA are called “filters”
- WEKA contains filters for discretization, normalization, resampling, attribute selection, transforming and combining attributes
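To make the ARFF layout concrete, here is a minimal sketch of a reader for the heart-disease file from the slide. It is written in plain Python and is not WEKA's own loader. The function name `parse_arff` is an assumption made for illustration, and the sketch only handles the features used in this example (numeric and nominal attributes, `?` for missing values); quoted values, dates, strings, and sparse data are ignored.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Illustrative only: supports numeric and nominal attributes and "?".
    """
    relation = None
    attributes = []  # (name, "numeric") or (name, [nominal values])
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)          # "?" marks a missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)         # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


EXAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(EXAMPLE)
print(relation)
print(rows[3])  # the cholesterol value of this row is missing
```

The fourth row shows why the distinction between attribute types matters: the missing cholesterol value becomes `None` rather than a number, so any later filter or learner has to decide how to treat it.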

Explorer: building “classifiers”
- Classifiers in WEKA are models for predicting nominal or numeric quantities
- Implemented learning schemes include: decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes nets
- “Meta”-classifiers include: bagging, boosting, stacking, error-correcting output codes, locally weighted learning
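As an illustration of one family named above, the following is a minimal sketch of an instance-based (1-nearest-neighbour) classifier in plain Python. It is not WEKA code. The training instances reuse the age, cholesterol, and class values of the first three rows of the heart-disease example; the query points and the function name `nearest_neighbour_predict` are assumptions made for illustration.

```python
import math


def nearest_neighbour_predict(train, query):
    """Return the label of the training instance closest to `query`.

    train: list of (feature_vector, label) pairs with numeric features.
    """
    best_label, best_dist = None, math.inf
    for features, label in train:
        dist = math.dist(features, query)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# (age, cholesterol) -> class, taken from the example rows
train = [
    ((63, 233), "not_present"),
    ((67, 286), "present"),
    ((67, 229), "present"),
]

print(nearest_neighbour_predict(train, (66, 280)))  # hypothetical query
```

A practical implementation would normally rescale the attributes first, since here cholesterol dominates the distance simply because its values are larger.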

Explorer: clustering data
- WEKA contains “clusterers” for finding groups of similar instances in a dataset
- Implemented schemes are: k-Means, EM, Cobweb, X-means, FarthestFirst
- Clusters can be visualized and compared to “true” clusters (if given)
- Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
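The idea behind k-Means, the first scheme listed, can be sketched in a few lines of plain Python. This is a textbook version, not WEKA's implementation. The initial centroids are supplied by the caller so that the run is deterministic, and the data points and the names `assign` and `k_means` are assumptions made for illustration.

```python
import math


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, centroids, max_iterations=10):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        updated = []
        for old, members in zip(centroids, clusters):
            if members:
                # new centroid = per-dimension mean of the cluster members
                updated.append(tuple(sum(dim) / len(members)
                                     for dim in zip(*members)))
            else:
                updated.append(old)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, [(0, 0), (10, 10)])
print(centroids)
print(clusters)
```

With these two well-separated groups the assignment stabilises after the first update, and each final centroid is the mean of its three points.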

Explorer: finding associations
- WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
- Can identify statistical dependencies between groups of attributes:
  - milk, butter => bread, eggs (with confidence 0.9 and support 2000)
- Apriori can compute all rules that have a given minimum support and exceed a given confidence
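The two measures attached to a rule can be computed directly. The sketch below, in plain Python, evaluates the rule milk, butter => bread, eggs on a handful of made-up shopping baskets. It shows only how support and confidence are defined, not Apriori's candidate-generation search, and the basket data and function names are assumptions made for illustration. Support is reported as a count of transactions, matching the "support 2000" style used in the example rule.

```python
def support_count(transactions, items):
    """Number of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also
    contain the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    ante = support_count(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical baskets
baskets = [
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread", "eggs"},
    {"milk", "butter", "bread"},
    {"milk", "eggs"},
    {"bread", "eggs"},
]

rule_support = support_count(baskets, {"milk", "butter", "bread", "eggs"})
rule_confidence = confidence(baskets, {"milk", "butter"}, {"bread", "eggs"})
print(rule_support, rule_confidence)
```

Here three baskets contain both milk and butter, and two of those also contain bread and eggs, so the rule has support 2 and confidence 2/3.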

Explorer: attribute selection
- Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
- Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared
- Very flexible: WEKA allows (almost) arbitrary combinations of these two
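One of the simplest combinations is a ranking search paired with an information-gain evaluator. The following plain-Python sketch scores each nominal attribute by its information gain with respect to the class and sorts the attributes by that score. It is not WEKA's implementation. The rows reuse the sex, exercise_induced_angina, and class values of the four example instances, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Class entropy minus the weighted class entropy after splitting
    on the nominal attribute at `attr_index`."""
    labels = [r[class_index] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(r[attr_index], []).append(r[class_index])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, names):
    """Return (gain, name) pairs, best attribute first."""
    gains = [(information_gain(rows, i), name) for i, name in enumerate(names)]
    return sorted(gains, reverse=True)


# (sex, exercise_induced_angina, class)
rows = [
    ("male", "no", "not_present"),
    ("male", "yes", "present"),
    ("male", "yes", "present"),
    ("female", "no", "not_present"),
]
ranking = rank_attributes(rows, ["sex", "exercise_induced_angina"])
print(ranking)
```

On these four instances exercise_induced_angina separates the classes perfectly and is ranked first, while sex leaves most of the class uncertainty in place.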

Explorer: data visualization
- Visualization is very useful in practice, e.g. it helps to determine the difficulty of the learning problem
- WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (Xgobi-style)
- Color-coded class values
- “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
- “Zoom-in” function

Performing experiments
- The Experimenter makes it easy to compare the performance of different learning schemes
- For classification and regression problems
- Results can be written to a file or database
- Evaluation options: cross-validation, learning curve, hold-out
- Can also iterate over different parameter settings
- Significance testing built in!
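Cross-validation, the first evaluation option, can be sketched in plain Python as follows. This only illustrates the procedure and does not describe how the Experimenter implements it: there is no shuffling, stratification, or significance test here. The majority-class baseline, the toy dataset, and all function names are assumptions made for illustration, and `k` is assumed to be no larger than the number of instances.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split the indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def majority_class_learner(train):
    """Baseline learner: always predict the most frequent training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: most_common


def cross_validate(data, k, learner):
    """Train on k-1 folds, test on the held-out fold, repeat for each fold.

    Returns (mean accuracy, list of per-fold accuracies).
    """
    accuracies = []
    for test_idx in k_fold_indices(len(data), k):
        held_out = set(test_idx)
        train = [d for i, d in enumerate(data) if i not in held_out]
        model = learner(train)
        correct = sum(1 for i in test_idx if model(data[i][0]) == data[i][1])
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k, accuracies


# Hypothetical dataset: 7 "present" and 3 "not_present" instances
data = [((i,), "present") for i in range(7)] + \
       [((i,), "not_present") for i in range(7, 10)]

mean_accuracy, per_fold = cross_validate(data, 5, majority_class_learner)
print(mean_accuracy, per_fold)
```

Any other learner with the same shape (a function that takes training data and returns a prediction function) can be passed in place of the baseline, which is what makes side-by-side comparison of schemes straightforward.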

The Knowledge Flow GUI
- New graphical user interface for WEKA
- Java-Beans-based interface for setting up and running machine learning experiments
- Data sources, classifiers, etc. are beans and can be connected graphically
- Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
- Layouts can be saved and loaded again later
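The flow of data through connected components can be imitated with Python generators. The sketch below chains a data source, a filter, a classifier, and an evaluator. It has nothing to do with the Java Beans machinery itself. The component names are assumptions made for illustration, and the "classifier" is a fixed cholesterol threshold invented for this example, not a learned model. The rows reuse the age, cholesterol, and class values of the four example instances.

```python
def data_source(rows):
    """Emit instances one at a time."""
    for row in rows:
        yield row


def drop_missing_filter(stream):
    """Pass on only instances without missing (None) values."""
    for row in stream:
        if None not in row:
            yield row


def threshold_classifier(stream, threshold=240):
    """Hypothetical fixed rule: predict "present" above the threshold."""
    for age, cholesterol, actual in stream:
        predicted = "present" if cholesterol > threshold else "not_present"
        yield actual, predicted


def evaluator(stream):
    """Count correct predictions; returns (correct, total)."""
    results = list(stream)
    correct = sum(1 for actual, predicted in results if actual == predicted)
    return correct, len(results)


rows = [
    (63, 233, "not_present"),
    (67, 286, "present"),
    (67, 229, "present"),
    (38, None, "not_present"),  # missing cholesterol value
]

# "data source" -> "filter" -> "classifier" -> "evaluator"
result = evaluator(threshold_classifier(drop_missing_filter(data_source(rows))))
print(result)
```

Each stage only sees what the previous one emits, so a stage can be swapped out without touching the others, which is the property the graphical layout exposes.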

Can continue this.

Conclusion: try it yourself!
- WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
- Also has a list of projects based on WEKA
- WEKA contributors: Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang