6/6/2023, AI&DM

Chapter 3  Basic Data Mining Techniques

3.1 Decision Trees (for classification)

Introduction: Classification Is a Two-Step Process

1. Model construction: build a model that can describe a set of predetermined classes
- Preparation: each tuple/sample is assumed to belong to a predefined class, labeled by the output attribute (class label attribute)
- The set of examples used for model construction is the training set
- The model can be represented as classification rules, decision trees, or mathematical formulae
- Estimating the accuracy of the model: the known label of each test sample is compared with the model's classification; the accuracy rate is the percentage of test-set samples correctly classified by the model
- Note: the test set must be independent of the training set, otherwise over-fitting will occur

2. Model usage: use the model to classify future or unknown objects

Classification Process (1): Model Construction
- Training data feeds a classification algorithm, which outputs a classifier (the model)
- Example model: IF rank = professor OR years > 6 THEN tenured = yes

Classification Process (2): Use the Model in Prediction
- The classifier is evaluated on testing data, then applied to unseen data
- Example query: (Jeff, Professor, 4) → tenured?

1 Example (1): Training Dataset
- An example from Quinlan's ID3 (1986)
- (The 14-example "buys_computer" training table appears as a figure in the original slides)

1 Example (2): Output: A Decision Tree for "buys_computer"
- Root: age?
  - age <= 30 → student? (no → no; yes → yes)
  - age 31…40 → yes
  - age > 40 → credit_rating? (excellent → yes; fair → no)
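The tree above can be represented directly as a data structure. Below is a minimal sketch (my own illustration, not code from the slides) that encodes the "buys_computer" tree as nested dicts and walks it to classify an example, mirroring the "use the model in prediction" step:

```python
# The decision tree from the slide, as nested dicts:
# an internal node maps an attribute name to {branch value: subtree};
# a leaf is the predicted class label (a string).
TREE = {
    "age": {
        "<=30": {"student": {"no": "no", "yes": "yes"}},
        "31…40": "yes",
        ">40": {"credit_rating": {"excellent": "yes", "fair": "no"}},
    }
}

def classify(tree, example):
    """Follow the branch matching the example's attribute value until a leaf."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree  # leaf = predicted class label

print(classify(TREE, {"age": "31…40"}))                  # yes
print(classify(TREE, {"age": "<=30", "student": "no"}))  # no
```

Attributes not on the traversed path (e.g. income) never need to be supplied, which is exactly why each root-to-leaf path later yields a compact IF-THEN rule.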
2 Algorithm for Decision Tree Building

Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all training examples are at the root
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- Examples are partitioned recursively based on selected attributes
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)

Conditions for stopping partitioning:
- All samples at a given node belong to the same class
- There are no remaining attributes for further partitioning (majority voting is then employed to label the leaf)
- There are no samples left
- The pre-set accuracy has been reached

Information Gain (ID3/C4.5)
- Select the attribute with the highest information gain
- Assume there are two classes, P and N, and the set of examples S contains p elements of class P and n elements of class N
- The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

  I(p, n) = -(p/(p+n)) * log2(p/(p+n)) - (n/(p+n)) * log2(n/(p+n))
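The definition of I(p, n) is a direct transcription into code. The sketch below (my own, not from the deck) reproduces the value I(9, 5) = 0.940 used in the attribute-selection example that follows:

```python
from math import log2

def information(p, n):
    """I(p, n): expected bits needed to classify a P-vs-N example.
    The term 0 * log2(0) is taken as 0, as is conventional."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            fraction = count / total
            result -= fraction * log2(fraction)
    return result

# The buys_computer data has 9 "yes" and 5 "no" examples:
print(round(information(9, 5), 3))  # 0.94
```

Note that I(p, n) is maximal (1 bit) when the classes are evenly split and 0 when the set is pure, which is why pure nodes become leaves.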
Information Gain in Decision Tree Building
- Assume that using attribute A, a set S will be partitioned into sets S1, S2, …, Sv
- If Si contains pi examples of P and ni examples of N, the entropy, i.e. the expected information needed to classify objects across all subsets Si, is

  E(A) = sum over i of ((pi + ni) / (p + n)) * I(pi, ni)

- The encoding information that would be gained by branching on A is

  Gain(A) = I(p, n) - E(A)

Attribute Selection by Information Gain Computation
- Class P: buys_computer = "yes"; Class N: buys_computer = "no"
- I(p, n) = I(9, 5) = 0.940
- Compute the entropy for age:
  - age <= 30: pi = 2, ni = 3, I(2, 3) = 0.971
  - age 31…40: pi = 4, ni = 0, I(4, 0) = 0
  - age > 40:  pi = 3, ni = 2, I(3, 2) = 0.971
- Hence E(age) = (5/14) I(2, 3) + (4/14) I(4, 0) + (5/14) I(3, 2) = 0.69
- So Gain(age) = 0.940 - 0.69 = 0.25; the gains of income, student, and credit_rating are computed the same way, and age has the highest gain, so it is chosen as the root split

3. Decision Tree Rules
- Automate rule creation
- Rule simplification and elimination
- A default rule is chosen
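The arithmetic in the age example can be reproduced in a few lines. The sketch below (illustrative, not from the slides) takes the per-subset (yes, no) counts for age from the standard 14-example dataset and recovers Gain(age) = 0.25:

```python
from math import log2

def information(p, n):
    """I(p, n): expected information for a two-class (P/N) sample set."""
    result = 0.0
    for count in (p, n):
        if count:
            fraction = count / (p + n)
            result -= fraction * log2(fraction)
    return result

def gain(class_counts, subset_counts):
    """Gain(A) = I(p, n) - E(A), with E(A) the weighted subset entropy."""
    p, n = class_counts
    total = p + n
    entropy = sum((pi + ni) / total * information(pi, ni)
                  for pi, ni in subset_counts)
    return information(p, n) - entropy

# Subset counts (pi, ni) for age, as on the slide: <=30, 31…40, >40
age_subsets = [(2, 3), (4, 0), (3, 2)]
print(round(gain((9, 5), age_subsets), 2))  # 0.25
```

Swapping in the subset counts of any other attribute gives its gain; the attribute with the largest value wins the split.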
3.1 Extracting Classification Rules from Trees
- Represent the knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Rules are easier for humans to understand
- Example:
  IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "yes"
  IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "no"

3.2 Rule Simplification and Elimination

A rule for the tree in Figure 3.4:
  IF Age <= 43 & Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No (accuracy = 75%, Figure 3.4)

A simplified rule obtained by removing attribute Age:
  IF Sex = Male & Credit Card Insurance = No
  THEN Life Insurance Promotion = No (accuracy = 83.3% (5/6), Figure 3.5)
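"One rule per root-to-leaf path" is a simple tree traversal. The sketch below (my own illustration, not code from the deck) enumerates the paths of the nested-dict "buys_computer" tree and prints each as an IF-THEN rule:

```python
def extract_rules(tree, conditions=()):
    """Collect one IF-THEN rule string per root-to-leaf path."""
    if not isinstance(tree, dict):  # leaf: emit one finished rule
        body = " AND ".join(f'{attr} = "{val}"' for attr, val in conditions)
        return [f'IF {body} THEN buys_computer = "{tree}"']
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules += extract_rules(subtree, conditions + ((attribute, value),))
    return rules

TREE = {"age": {
    "<=30": {"student": {"no": "no", "yes": "yes"}},
    "31…40": "yes",
    ">40": {"credit_rating": {"excellent": "yes", "fair": "no"}},
}}

for rule in extract_rules(TREE):
    print(rule)  # five rules, one per leaf
```

Simplification would then drop conditions (as with Age above) whenever removing them does not lower the rule's accuracy on the training data.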
Figure 3.4  A three-node decision tree for the credit card database
Figure 3.5  A two-node decision tree for the credit card database
4. Further Discussion
- Attributes with many values tend to be favored by information gain (more splits, seemingly higher accuracy); the gain ratio corrects for this: GainRatio(A) = Gain(A) / SplitInfo(A)
- Numerical attributes: binary split
- Stopping condition: more than 2 values
- Other methods for building decision trees: ID3, C4.5, CART, CHAID
5. General Considerations: Advantages of Decision Trees
- Easy to understand
- Map nicely to a set of production rules
- Have been applied to real problems
- Make no prior assumptions about the data
- Able to process both numerical and categorical data

Disadvantages of Decision Trees
- The output attribute must be categorical
- Limited to one output attribute
- Decision tree algorithms are unstable
- Trees created from numeric datasets can be complex
Decision Tree Attribute Selection (Appendix C)

Equation C.1  Computing Gain Ratio:
  GainRatio(A) = Gain(A) / SplitInfo(A)

Equation C.2  Computing Gain(A):
  Gain(A) = Info(I) - Info(I, A)

Equation C.3  Computing Info(I):
  Info(I) = - sum over classes k of p(k) * log2 p(k)

Equation C.4  Computing Info(I, A):
  Info(I, A) = sum over values j of A of (|Ij| / |I|) * Info(Ij)

Equation C.5  Computing Split Info(A):
  SplitInfo(A) = - sum over values j of A of (|Ij| / |I|) * log2(|Ij| / |I|)

Figure C.1  A partial decision tree with root node = income range
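Equations C.1–C.5 fit naturally into a handful of small functions. The sketch below (my own implementation of the standard definitions, not code from the slides) represents a partition as a list of per-subset class-count tuples and checks the age numbers from the lecture:

```python
from math import log2

def info(class_counts):
    """Equation C.3: Info(I) = -sum of p_k * log2(p_k) over class proportions."""
    total = sum(class_counts)
    return -sum(c / total * log2(c / total) for c in class_counts if c)

def info_after_split(partition):
    """Equation C.4: weighted Info of the subsets created by attribute A."""
    total = sum(sum(counts) for counts in partition)
    return sum(sum(counts) / total * info(counts) for counts in partition)

def split_info(partition):
    """Equation C.5: information contained in the partition sizes alone."""
    sizes = [sum(counts) for counts in partition]
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s)

def gain(class_counts, partition):
    """Equation C.2: Gain(A) = Info(I) - Info(I, A)."""
    return info(class_counts) - info_after_split(partition)

def gain_ratio(class_counts, partition):
    """Equation C.1: GainRatio(A) = Gain(A) / SplitInfo(A)."""
    return gain(class_counts, partition) / split_info(partition)

# buys_computer data: 9 yes / 5 no overall; age partition as (yes, no) counts:
age = [(2, 3), (4, 0), (3, 2)]
print(round(gain((9, 5), age), 2))        # 0.25
print(round(gain_ratio((9, 5), age), 2))  # 0.16
```

Because `info` takes a tuple of counts per class, the same functions also work for more than two classes, unlike the two-class I(p, n) form used in the lecture body.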