R语言实验十.doc_三一办公31ppt.com

资源描述

《R语言实验十.doc》由会员分享，可在线阅读，更多相关《R语言实验十.doc（17页珍藏版）》请在三一办公上搜索。

1、精选优质文档-倾情为你奉上实验十判别分析、聚类分析与主成分分析【实验类型】验证性【实验学时】2 学时【实验目的】1、掌握判别分析的基本原理及三种重要的判别法的使用条件；2、掌握聚类分析的基本原理及聚类方法的划分方法；3、掌握主成分分析的使用条件及其与逐步回归、岭估计的区别与联系。【实验内容】1、判别分析的计算及其结果解释；2、聚类分析的计算及其结果解释；3、主成分分析的计算及其结果解释。【实验结果】第一部分、课件例题：（第九章 1-3 节中的所有例题）#例9.1 的求解：# 按矩阵和因子形式输入数据train-matrix(c(24.8, 24.1, 26.6, 23.5, 25.5, 27

2、.4, 22.1, 21.6,22.0, 22.8, 22.7, 21.5, 22.1, 21.4, -2.0, -2.4, -3.0, -1.9, -2.1, -3.1, -0.7, -1.4, -0.8, -1.6, -1.5, -1.0, -1.2, -1.3), ncol=2)sp-factor(rep(1:2, c(6,8), labels=c(Have, No)library(MASS); lda.sol-lda(train, sp) #判别分析tst-c(23.5, -1.6); predict(lda.sol, tst)$class #预测table(sp, predict(l

3、da.sol)$class) #回代预测的结果是无春旱；回代的结果说明，原本有6个春旱的年份只判断对了5个。# 使用二次判别函数qda.sol-qda(train, sp)predict(qda.sol, tst)$class #预测table(sp, predict(qda.sol)$class) #回代二次判别预测的结果是有春旱，这与前面线性判别的预测结果不同；从回代结果来看，可能是有春早更合理一些，因为二次判别的回代正确率是100%。#例9.2 的求解：iris #R自带数据集train -sample(1:150, 100) #随机选取100个作为训练样本library(MASS);z

4、1-lda(Species ., iris, prior = c(1,1,1)/3, subset = train)#Species . 表示对所有自变量；先验概率各为1/3(class1-predict(z1, iris-train, )$class)#iris-train,表示在预测函数中使用其余的50个样本作为待测样本sum(class1=iris$Species-train) #检验预测结果的准确性结果表明：此次预测结果全部正确（随着每次抽取样本的不同，线性判别预测结果的准确率会有相应的变化）# 使用二次判别函数qda()z2-lda(Species ., iris, prior =

5、c(1,1,1)/3, subset = train)(class2-predict(z2, iris-train, )$class) #加()表示显示结果sum(class2=iris$Species-train) #检验预测结果的准确性结果表明：二次判别函数的预测结果全部正确。#例9.3#输入数据, 生成距离结构x-c(1,2,6,8,11); dim(x)-c(5,1); d-dist(x)#生成系统聚类hc1-hclust(d, single); hc2-hclust(d, complete)hc3-hclust(d, median); hc4-hclust(d, average)#绘

6、出所有树形结构图, 并以22的形式绘在一张图上opar - par(mfrow = c(2, 2), omi=rep(0,4)plot(hc1,hang=-1); plot(hc2,hang=-1)plot(hc3,hang=-1); plot(hc4,hang=-1)par(opar)结果表明：不同的聚类方法所得的结果基本一样。# 例9.4 的求解：x-c(1.000, 0.846, 0.805, 0.859, 0.473, 0.398, 0.301, 0.382, 0.846, 1.000, 0.881, 0.826, 0.376, 0.326, 0.277, 0.277, 0.805,

7、0.881, 1.000, 0.801, 0.380, 0.319, 0.237,0.345, 0.859, 0.826, 0.801, 1.000, 0.436, 0.329, 0.327, 0.365, 0.473, 0.376, 0.380,0.436, 1.000, 0.762, 0.730, 0.629, 0.398, 0.326, 0.319, 0.329, 0.762, 1.000, 0.583, 0.577, 0.301, 0.277, 0.237, 0.327, 0.730, 0.583, 1.000, 0.539, 0.382, 0.415, 0.345,0.365, 0.

8、629, 0.577, 0.539, 1.000) #输入相关矩阵names-c( 身高, 手臂长, 上肢长, 下肢长, 体重, 颈围, 胸围, 胸宽)r-matrix(x, nrow=8, dimnames=list(names, names)#作系统聚类分析，as.dist()作用是将普通矩阵转化为聚类分析用的距离结构d-as.dist(1-r); hc-hclust(d); dend-as.dendrogram(hc)#写一段小程序, 其目的是在绘图命令中调用它, 使谱系图更好看nP-list(col=3:2, cex=c(2.0, 0.75), pch= 21:22, bg= c(li

9、ght blue, pink), lab.cex = 1.0, lab.col = tomato)addE - function(n) if(!is.leaf(n) attr(n,edgePar)-list(p.col=plum) attr(n,edgetext)-paste(attr(n,members),members) #画出谱系图.de - dendrapply(dend, addE) #对谱系图的所有节点应用函数par(mai=c(0.5, 0.5, 0.0, 0.0)plot(de, nodePar= nP) #绘制谱系图 #例9.5 的求解：#读取数据X-read.table(F

10、:/文档/大学课程/R语言/ch09/consume.dat)#生成距离结构, 作系统聚类d - dist(scale(X) #为消除数据在数量级的影响，对数据作标准化变换method=c(complete, average, centroid, ward.D)for(m in method) hc-hclust(d, m); class-cutree(hc, k=5) #分成5类 print(m); print(sort(class) #使用sort()排序# 使用plot() 绘制谱系图，用rect.hclust() 将地区分成5类类for(m in method) hc-hclust(

11、d, m) windows() #建立新窗口显示图形 plot(hc, hang=-1) re-rect.hclust(hc, k=5, border=red) print(m); print(re) 结果表明：不同的聚类方法所得的结果有所不同。结果表明：cutree()与rect.hclust()得到的聚类结果是相同的。#例9.6X-read.table(F:/文档/大学课程/R语言/ch09/consume.dat)km-kmeans(scale(X), 5, nstart = 20)kmsort(km$cluster) #对分类情况排序说明：因为kmeans是随机的确定k个点作为初始点

12、，所以动态聚类每一次的运行结果可能是不相同的。#例9.7 的求解：student-read.table(F:/文档/大学课程/R语言/ch09/student_body.dat) #读数据student.pr - princomp(student, cor = TRUE) #用相关矩阵作主成分分析，相当于对数据作标准化变换summary(student.pr, loadings=TRUE)说明：Standard deviation为主成分的标准差；Proportion of Variance为主成分的贡献率；Cumulative Proportion为主成分的累积贡献率；Loadings为载荷

13、矩阵。得到主成分与原始变量的线性关系为：# 作预测predict(student.pr)分析：从第 1 主成分的主成分得分来看：较小得分的样本是 25 号、 3 号和 5号，说明这几个学生身材魁梧，较大得分的样本是 11 号、 15 号和 29号，说明这几个学生身材瘦小；从第 2 主成分的主成分得分来看：较大得分的几个样本是 23 号、 19 号和 4 号，说明这几个学生属于 “ 细高 ” 型，而 17 号、 8 号和 2 号样本的主成分得分较小，说明这几个学生身材属于 “ 矮胖 ” 型。# 画出主成分的的碎石图par(mai=c(0.5, 0.8, 0.4, 0.2)screeplot(st

14、udent.pr,type=lines) #选择折线型结果表明：从第二个主成分之后，图线变化趋于平稳，因此可以选择前两个主成分做分析，与前面的结论一致。# 画出双坐标图par(mai=c(.8, .8, .5, .5)biplot(student.pr, scale =0.5)#例9.8 的求解：x-scan(F:/文档/大学课程/R语言/ch09/man_body.dat, nmax=120)#sum(1:15)=120,读取120个数据names-scan(F:/文档/大学课程/R语言/ch09/man_body.dat, what=character(0), skip=15) #读取变量

15、名R-matrix(1, nrow=16, ncol=16, dimnames=list(names, names)for (i in 2:16) for (j in 1:(i-1) Ri,j-x(i-2)*(i-1)/2+j;Rj,i-Ri,j #生成相关矩阵pr-princomp(covmat=R); load-loadings(pr) #主成份par(mai=c(.8, .8, .2, .1)plot(load,1:2, pch=19) #画散点图text(load,1, load,2, adj=c(-0.4, 0.3)#例9.9 ：对法国经济分析数据(见例7.8)作主成分回归分析。fr

16、ance-data.frame(x1 = c(149.3, 161.2, 171.5, 175.5,180.8, 190.7, 202.1, 212.4, 226.1, 231.9, 239.0), x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7), x3 = c(108.1,114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3, 164.3,167.6), y=c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28

17、.1,27.6, 26.3)france.pr-princomp(x1+x2+x3, data=france, cor=TRUE)summary(france.pr) #作主成份分析结果表明：前两个主成分已达到99%的贡献率，因此选择第1和第2主成分。#例9.9 的求解：# 主成份回归pre-predict(france.pr) #计算样本的主成分得分france$z1-pre,1; france$z2-pre,2 #将第1和第2主成分得分存放在数据框france中lm.sol-lm(yz1+z2, data=france) #回归分析summary(lm.sol)回归系数和回归方程均通过检验

18、，且效果显著，即得到回归方程：#例9.9 的求解：# 作变换, 得到原坐标下的关系表达式beta-coef(lm.sol); A-loadings(france.pr),1:2x.bar-france.pr$center; x.sd-france.pr$scalecoef-A %*% beta2:3/x.sdbeta0 - beta1- x.bar %*% coefc(beta0, coef)得因变量与自变量的回归方程：第二部分、教材例题：#例 10.1.1 student-data.frame( X1=c(148, 139, 160, 149, 159, 142, 153, 150, 151

19、, 139, 140, 161, 158, 140, 137, 152, 149, 145, 160, 156, 151, 147, 157, 147, 157, 151, 144, 141, 139, 148), X2=c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31, 29, 47, 49, 33, 31, 35, 47, 35, 47, 44, 42, 38, 39, 30, 48, 36, 36, 30, 32, 38), X3=c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68, 64, 78, 78, 67, 66, 7

20、3, 82, 70, 74, 78, 73, 73, 68, 65, 80, 74, 68, 67, 68, 70), X4=c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74, 74, 84, 83, 77, 73, 79, 79, 77, 87, 85, 82, 78, 80, 75, 88, 80, 76, 76, 73, 78)student.pr-princomp(student, cor=TRUE)summary(student.pr,loadings=TRUE)#例 10.2.1data(iris)attach(iris)names(iris)lib

21、rary(MASS)iris.lda - lda(Species Sepal.Length + Sepal.Width + Petal.Length + Petal.Width)iris.ldairis.pred=predict(iris.lda) $ classtable(iris.pred, Species)detach(iris)#例 10.3.1x-c(1, 2, 4.5, 6, 8)dim(x)-c(5, 1)d-dist(x)hc1-hclust(d, single)hc2-hclust(d, complete)hc3-hclust(d, median)hc4-hclust(d,

22、ward.D)opar-par(mfrow=c(2, 2)plot(hc1, hang=-1);plot(hc2, hang=-1)plot(hc3, hang=-1);plot(hc4, hang=-1)par(opar)#例 10.3.2data(iris); attach(iris)iris.hc-hclust(dist(iris,1:4)plot(iris.hc, hang = -1)plclust(iris.hc,labels = FALSE, hang=-1)re-rect.hclust(iris.hc,k=3)iris.id - cutree(iris.hc,3)table(ir

23、is.id,Species)detach(iris)第三部分、课后习题：#10.1x-c( 1 , 0.79,0.36,0.96,0.89,0.79,0.76,0.26,0.21,0.26,0.07,0.52,0.77,0.25,0.51,0.21, 0.79,1, 0.31,0.74,0.58,0.58,0.55,0.19,0.07,0.16,0.21,0.41,0.47,0.17,0.35,0.16, 0.36,0.31,1,0.38,0.31,0.3,0.35,0.58,0.28,0.33,0.38,0.35,0.41,0.64,0.58,0.51, 0.96,0.74,0.38,1,0

24、.9,0.78,0.75,0.25,0.2,0.22,0.08,0.53,0.79,0.27,0.57,0.26, 0.89,0.58,0.31,0.9,1,0.79,0.74,0.25,0.18,0.23,-0.02,0.48,0.79,0.27,0.51,0.23, 0.79, 0.58, 0.3, 0.78, 0.79, 1, 0.73, 0.18, 0.18, 0.23, 0, 0.38, 0.69, 0.14, 0.26, 0, 0.76, 0.55, 0.35, 0.75, 0.74,0.73,1,0.24,0.29,0.25,0.1,0.44,0.67,0.16,0.38,0.1

25、2, 0.26,0.19,0.58,0.25,0.25,0.18,0.24,1,-0.04,0.49,0.44, 0.3,0.32,0.51,0.51,0.38, 0.21, 0.07, 0.28,0.2,0.18,0.18,0.29,-0.04,1,-0.34,-0.16,-0.05,0.23,0.21,0.15,0.18, 0.26, 0.16,0.33,0.22,0.23,0.23,0.25,0.49,-0.34,1,0.23,0.5,0.31,0.15,0.29,0.14, 0.07, 0.21,0.38,0.08,-0.02,0,0.1,0.44,-0.16,0.23,1,0.24,

26、0.1,0.31,0.28,0.31, 0.52,0.41,0.35,0.53,0.48,0.38,0.44,0.3,-0.05,0.5,0.24,1,0.62,0.17,0.41,0.18, 0.77,0.47,0.41,0.79,0.79,0.69,0.67,0.32,0.23,0.31,0.1,0.62,1, 0.26,0.5,0.24, 0.25,0.17,0.64,0.27,0.27,0.14,0.16,0.51,0.21,0.15,0.31,0.17,0.26,1,0.63,0.5, 0.51,0.35,0.58,0.57,0.51,0.26,0.38,0.51,0.15,0.29

27、,0.28,0.41,0.5,0.63,1,0.65, 0.21,0.16,0.51,0.26,0.23,0,0.12,0.38,0.18,0.14,0.31,0.18,0.24,0.5,0.65,1)co-matrix(x,nrow=16,ncol=16,dimnames = list(c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16),c(x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16)pr-princomp(co)summary(pr,loadings=TRUE)

28、说明：（1）Standard deviation: 表示主成分的标准差, 即主成分的方差平方根, 即相应特征值的开方; （2）Proportion of Variance: 表示方差的贡献率; （3）Cumulative Proportion: 表示方差的累计贡献率.（4）用summmary函数中loadings=TRUE选项列出了主成分对应原始变量的系数, 因此得到前三个主成分是：#10.2x-read.table(F:/文档/大学课程/R语言/实验/10.2习题.csv,header=TRUE,sep=,)xx1-data.frame(x$mur,x$rape,x$rob,x$ass,x$

29、bur,x$lar,x$aut)x1fa-factanal(x1, factors=3)fa说明：用x1,x2,x3,x4,x5,x6,x7来表示谋杀(MURDER),强暴(RAPE), 抢劫(ROBBERY), 骚扰(ASSAULT), 夜间偷窃(BUR-GLARY), 盗窃(LARCENY)及偷车(AUTO)。Proportion Var 是方差贡献率, Cumulative Var是累计方差贡献率, 检验表明三个因子已经充分#10.5x1-c(228,245,200,170,100,225,130,150,120,160,185,170,165,135,100)x2-c(134,134,

30、167,150,167,125,100,117,133,100,115,125,142,107,117)x3-c(20 ,10 ,12 ,7 ,20 ,7 ,6 ,7 ,10 ,5 ,5 ,6 ,5 ,2 ,7 )x4-c(11 ,40 ,27 ,8 ,14 ,14 ,12 ,6 ,26 ,10 ,19 ,4 ,3 ,12 ,2 )x5-c(1 ,1 ,1 ,1 ,1 ,2 ,2 ,2 ,2 ,2 ,3 ,3 ,3 ,3 ,3 )newdata-data.frame(x1,x2,x3,x4,x5)library(MASS)newdata.lda-lda(x5x1+x2+x3+x4)newdat

31、a.ldanewdata.pred=predict(newdata.lda) $ classtable(newdata.pred, x5)结果：结果说明:1) Group means: 包含了每组的平均向量2) Coefficients of linear discriminants: 线性判别系数3) Proportion of trace: 表明了第i判别式对区分各组的贡献大小4) Species: 表明将原始数据代入线性判别函数后的判别结果, 1组没有错判, 2有一个错判, 3有一个错判.#10.6x-c(1,2,4,6,9,11)dim(x)-c(6, 1)d-dist(x)hc1-h

32、clust(d, single)hc2-hclust(d, complete)hc3-hclust(d, median)hc4-hclust(d, ward.D)opar-par(mfrow=c(2, 2)plot(hc1, hang=-1);plot(hc2, hang=-1)plot(hc3, hang=-1);plot(hc4, hang=-1)par(opar)可见, 四种分类方法结果一致, 都将第5，6个分在一类, 其余在第二类.#10.7x1-c(190.33,135.20,95.21 ,104.78,128.41,145.68,159.37,116.22,221.11,144.9

33、8,169.92,153.11,144.92,140.54,115.84,101.18)x2-c(43.77 ,36.40 ,22.83 ,25.11 ,27.63 ,32.83 ,33.38 ,29.57 ,38.64 ,29.12 ,32.75 ,23.09 ,21.26 ,21.50 ,30.26 ,23.26 )x3-c(9.73 ,10.47 ,9.30 ,6.40 ,8.94 ,17.79 ,18.37 ,13.24 ,12.53 ,11.67 ,12.72 ,15.62 ,16.96 ,17.64 ,12.20 ,8.46 )x4-c(60.54 ,44.16 ,22.44 ,9

34、.89 ,12.58 ,27.29 ,11.81 ,13.76 ,115.65,42.60 ,47.12 ,23.54 ,19.52 ,19.19 ,33.61 ,20.20 )x5-c(49.01 ,36.49 ,22.81 ,18.17 ,23.99 ,39.09 ,25.29 ,21.75 ,50.82 ,27.30 ,34.35 ,18.18 ,21.75 ,15.97 ,33.77 ,20.50 )x6-c(9.04 ,3.94 ,2.80 ,3.25 ,3.27 ,3.47 ,5.22 ,6.04 ,5.89 ,5.74 ,5.00 ,6.39 ,6.73 ,4.94 ,3.85

35、,4.30 )province-c(北京,天津,河北,山西,内蒙古,辽宁,吉林,黑龙江,上海,江苏,浙江,安徽,福建,江西,山东,河南)newdata-data.frame(x1,x2,x3,x4,x5,x6)newdata.hc-hclust(dist(newdata)plot(newdata.hc, hang=-1)re-rect.hclust(newdata.hc,k=3)newdata.id - cutree(newdata.hc,3)table(newdata.id,province)结果：说明: 上图为典型的聚类树枝型分类图(Cluster Dendrogram), 它是将两相近(距离最短)的数据向量连接在一起, 然后进一步组合, 直至所有数据都连接在一起; 函数cuttree( )将数据newdata分类结果newdata.hc编为三组,分别以1, 2, 3表示, 保存在newdata.id中. 将newdata.id与province作比较发现, 1类包含北京、浙江, 2类包含安徽、福建、河北、河南、吉林、江苏、江西、辽宁、内蒙古、山东、山西、天津，3类包含上海. 专心-专注-专业

展开阅读全文