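PCA: Worked Examples of Principal Component Analysis

Example 1

> a = c(177, 179,  95,  96,  53,  32,  -7,  -4,  -3,
+       179, 419, 245, 131, 181, 127,  -2,   1,   4,
+        95, 245, 302,  60, 109, 142,   4,   4,  11,
+        96, 131,  60, 153, 102,  42,   4,   3,   2,
+        53, 181, 109, 102, 137,  96,   4,   5,   6,
+        32, 127, 142,  42,  96, 128,   2,   2,   8,
+        -7,  -2,   4,   4,   4,   2,  34,  31,  33,
+        -4,   1,   4,   3,   5,   2,  31,  39,  39,
+        -3,   4,  11,   2,   6,   8,  33,  39,  48)
> s = matrix(a, ncol = 9)

S is the sample covariance matrix. Find the eigenvalues and eigenvectors of S:

> c = eigen(s)
> c

The coefficients of the first three sample principal components are the first three columns of c$vectors.

# Convert the covariance matrix S into the correlation matrix
> rho = diag(1/sqrt(diag(s))) %*% s %*% diag(1/sqrt(diag(s)))
> rho

Example 2: Principal component analysis of students' body measurements

Thirty students were drawn at random from one grade of a middle school, and their height (X1), weight (X2), chest circumference (X3) and sitting height (X4) were measured. Carry out a principal component analysis of these body measurements.

# Enter the data (four body measurements of 30 middle-school students) as a data frame
> student <- data.frame(
+   X1 = c(148, 139, 160, 149, 159, 142, 153, 150, 151, 139,
+          140, 161, 158, 140, 137, 152, 149, 145, 160, 156,
+          151, 147, 157, 147, 157, 151, 144, 141, 139, 148),
+   X2 = c(41, 34, 49, 36, 45, 31, 43, 43, 42, 31,
+          29, 47, 49, 33, 31, 35, 47, 35, 47, 44,
+          42, 38, 39, 30, 48, 36, 36, 30, 32, 38),
+   X3 = c(72, 71, 77, 67, 80, 66, 76, 77, 77, 68,
+          64, 78, 78, 67, 66, 73, 82, 70, 74, 78,
+          73, 73, 68, 65, 80, 74, 68, 67, 68, 70),
+   X4 = c(78, 76, 86, 79, 86, 76, 83, 79, 80, 74,
+          74, 84, 83, 77, 73, 79, 79, 77, 87, 85,
+          82, 78, 80, 75, 88, 80, 76, 76, 73, 78))

> cor(student)
          X1        X2        X3        X4
X1 1.0000000 0.8631621 0.7321119 0.9204624
X2 0.8631621 1.0000000 0.8965058 0.8827313
X3 0.7321119 0.8965058 1.0000000 0.7828827
X4 0.9204624 0.8827313 0.7828827 1.0000000

> eigen(cor(student))
$values
[1] 3.54109800 0.31338316 0.07940895 0.06610989

$vectors
           [,1]       [,2]       [,3]       [,4]
[1,] -0.4969661  0.5432128 -0.4496271  0.5057471
[2,] -0.5145705 -0.2102455 -0.4623300 -0.6908436
[3,] -0.4809007 -0.7246214  0.1751765  0.4614884
[4,] -0.5069285  0.3682941  0.7439083 -0.2323433

# Principal component analysis
> student.pr <- princomp(student, cor = TRUE)
# Display the results
> summary(student.pr, loadings = TRUE)
Importance of components:
                          Comp.1     Comp.2     Comp.3     Comp.4
Standard deviation     1.8817805 0.55980636 0.28179594 0.25711844
Proportion of Variance 0.8852745 0.07834579 0.01985224 0.01652747
Cumulative Proportion  0.8852745 0.96362029 0.98347253 1.00000000

Loadings:
   Comp.1 Comp.2 Comp.3 Comp.4
X1 -0.497  0.543 -0.450  0.506
X2 -0.515 -0.210 -0.462 -0.691
X3 -0.481 -0.725  0.175  0.461
X4 -0.507  0.368  0.744 -0.232

The princomp call carries out the principal component analysis starting from the correlation matrix. The eigenvalues of the correlation matrix show that the first principal component alone contributes 88.53% of the total variance, and the first two together contribute 96.36%, so two principal components are enough to summarize the data well. In addition, because the third and fourth eigenvalues are close to 0, the four standardized body measurements Xi* (i = 1, 2, 3, 4) are approximately linearly related (so-called collinearity), for example

0.505747 X1* - 0.690844 X2* + 0.461488 X3* - 0.232343 X4* ≈ c (a constant).

The eigenvectors belonging to the two largest eigenvalues give the first and second principal components:

Z1 = -0.4970 X1* - 0.5146 X2* - 0.4809 X3* - 0.5069 X4*
Z2 =  0.5432 X1* - 0.2102 X2* - 0.7246 X3* + 0.3683 X4*

Both are linear combinations of the standardized variables Xi* (i = 1, 2, 3, 4), and the combination coefficients are the components of the eigenvectors.

The components of each eigenvector can be used to interpret the principal components. The components of the first eigenvector (the one belonging to the largest eigenvalue) are all close to 0.5 in absolute value and share the same sign, so it reflects the overall size of a student's build: a big student is large on all four measurements, and a small student is small on all four. We therefore call the first principal component the size factor.

In the eigenvector belonging to the second largest eigenvalue, the first component (the coefficient of height X1) and the fourth (the coefficient of sitting height X4) are positive, whereas the second (the coefficient of weight X2) and the third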
(the coefficient of chest circumference X3) are negative. This component therefore contrasts stout students with slim ones, and we call the second principal component the stout-slim factor.

# Principal component scores of each sample
> predict(student.pr)
           Comp.1      Comp.2      Comp.3       Comp.4
 [1,]  0.06990950 -0.23813701 -0.35509248 -0.266120139
 [2,]  1.59526340 -0.71847399  0.32813232 -0.118056646
 [3,] -2.84793151  0.38956679 -0.09731731 -0.279482487
 [4,]  0.75996988  0.80604335 -0.04945722 -0.162949298
 [5,] -2.73966777  0.01718087  0.36012615  0.358653044
 [6,]  2.10583168  0.32284393  0.18600422 -0.036456084
 [7,] -1.42105591 -0.06053165  0.21093321 -0.044223092
 [8,] -0.82583977 -0.78102576 -0.27557798  0.057288572
 [9,] -0.93464402 -0.58469242 -0.08814136  0.181037746
[10,]  2.36463820 -0.36532199  0.08840476  0.045520127
[11,]  2.83741916  0.34875841  0.03310423 -0.031146930
[12,] -2.60851224  0.21278728 -0.33398037  0.210157574
[13,] -2.44253342 -0.16769496 -0.46918095 -0.162987830
[14,]  1.86630669  0.05021384  0.37720280 -0.358821916
[15,]  2.81347421 -0.31790107 -0.03291329 -0.222035112
[16,]  0.06392983  0.20718448  0.04334340  0.703533624
[17,] -1.55561022 -1.70439674 -0.33126406  0.007551879
[18,]  1.07392251 -0.06763418  0.02283648  0.048606680
[19,] -2.52174212  0.97274301  0.12164633 -0.390667991
[20,] -2.14072377  0.02217881  0.37410972  0.129548960
[21,] -0.79624422  0.16307887  0.12781270 -0.294140762
[22,]  0.28708321 -0.35744666 -0.03962116  0.080991989
[23,] -0.25151075  1.25555188 -0.55617325  0.109068939
[24,]  2.05706032  0.78894494 -0.26552109  0.388088643
[25,] -3.08596855 -0.05775318  0.62110421 -0.218939612
[26,] -0.16367555  0.04317932  0.24481850  0.560248997
[27,]  1.37265053  0.02220972 -0.23378320 -0.257399715
[28,]  2.16097778  0.13733233  0.35589739  0.093123683
[29,]  2.40434827 -0.48613137 -0.16154441 -0.007914021
[30,]  0.50287468  0.14734317 -0.20590831 -0.122078819

(These scores can be used as input to a cluster analysis.)

# Scatter plot of the samples on the first and second principal components
> biplot(student.pr)

The plot shows which students are tall and burly (e.g. student 25) and which are small (e.g. students 11 and 15). Because all loadings of the first component are negative, big students have large negative scores on Comp.1 and small students have large positive ones. Student 23 is an example of a tall, thin build, and student 17 of a short, stout one.

# Scree plot
> screeplot(student.pr)

# Enter the data as a data frame named conomy and print it
> conomy
      x1  x2    x3    y
1  149.3 4.2 108.1 15.9
2  161.2 4.1 114.8 16.4
3  171.5 3.1 123.2 19.0
4  175.5 3.1 126.9 19.1
5  180.8 1.1 132.1 18.8
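6  190.7 2.2 137.7 20.4
7  202.1 2.1 146.0 22.7
8  212.4 5.6 154.1 26.5
9  226.1 5.0 162.3 28.1
10 231.9 5.1 164.3 27.6
11 239.0 0.7 167.6 26.3

# Linear regression
> lm.sol <- lm(y ~ x1 + x2 + x3, data = conomy)
> summary(lm.sol)

Call:
lm(formula = y ~ x1 + x2 + x3, data = conomy)

Residuals:
     Min       1Q   Median       3Q      Max
-0.52367 -0.38953  0.05424  0.22644  0.78313

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -10.12799    1.21216  -8.355  6.9e-05 ***
x1           -0.05140    0.07028  -0.731 0.488344
x2            0.58695    0.09462   6.203 0.000444 ***
x3            0.28685    0.10221   2.807 0.026277 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.4889 on 7 degrees of freedom
Multiple R-squared: 0.9919, Adjusted R-squared: 0.9884
F-statistic: 285.6 on 3 and 7 DF, p-value: 1.112e-07

This gives the regression equation

Y = -10.12799 - 0.05140 X1 + 0.58695 X2 + 0.28685 X3.

Y is the volume of imports and X1 is gross domestic product, yet the coefficient of X1 is negative (and not significant, p = 0.49). Taken literally, the higher the gross domestic product, the lower the imports, which contradicts reality. The reason is collinearity among the three explanatory variables.

# Principal component analysis
> conomy.pr <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)
> summary(conomy.pr, loadings = TRUE)
Importance of components:
                         Comp.1    Comp.2       Comp.3
Standard deviation     1.413915 0.9990767 0.0518737839
Proportion of Variance 0.666385 0.3327181 0.0008969632
Cumulative Proportion  0.666385 0.9991030 1.0000000000

Loadings:
   Comp.1 Comp.2 Comp.3
x1 -0.706         0.707
x2        -0.999
x3 -0.707        -0.707

The first two principal components already reach a cumulative contribution of 99.9%. The first principal component involves gross domestic product (x1) and total consumption (x3), so it is called the production-and-sales factor; the second principal component depends only on the stock level (x2)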
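and is therefore called the storage factor.

# Regress y on the scores of the first two principal components
> pre <- predict(conomy.pr)
> conomy$z1 <- pre[, 1]; conomy$z2 <- pre[, 2]
> lm.sol <- lm(y ~ z1 + z2, data = conomy)
> summary(lm.sol)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  21.8909     0.1658 132.006 1.21e-14 ***
z1           -2.9892     0.1173 -25.486 6.02e-09 ***
z2           -0.8288     0.1660  -4.993  0.00106 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Transform back to obtain the relationship in the original coordinates
> beta <- coef(lm.sol); A <- loadings(conomy.pr)
> x.bar <- conomy.pr$center; x.sd <- conomy.pr$scale
> coef <- (beta[2]*A[, 1] + beta[3]*A[, 2]) / x.sd
> beta0 <- beta[1] - sum(x.bar * coef)
> c(beta0, coef)
(Intercept)          x1          x2          x3
-9.13010782  0.07277981  0.60922012  0.10625939

That is, Y = -9.1301 + 0.0728 X1 + 0.6092 X2 + 0.1063 X3, in which the coefficient of X1 is now positive.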
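The back-transformation to the original coordinates can be cross-checked without princomp. What follows is a minimal sketch, assuming base R only. The helper name pcr_coef is hypothetical (it does not appear in the slides), and the data frame is retyped from the printed table. The sketch standardizes the predictors, takes the eigenvectors of their correlation matrix as loadings, regresses y on the first k score columns, and then undoes the standardization.

```r
# Data as printed in the slides (x1: gross domestic product, x2: stock level,
# x3: total consumption, y: imports).
conomy <- data.frame(
  x1 = c(149.3, 161.2, 171.5, 175.5, 180.8, 190.7, 202.1, 212.4, 226.1, 231.9, 239.0),
  x2 = c(4.2, 4.1, 3.1, 3.1, 1.1, 2.2, 2.1, 5.6, 5.0, 5.1, 0.7),
  x3 = c(108.1, 114.8, 123.2, 126.9, 132.1, 137.7, 146.0, 154.1, 162.3, 164.3, 167.6),
  y  = c(15.9, 16.4, 19.0, 19.1, 18.8, 20.4, 22.7, 26.5, 28.1, 27.6, 26.3)
)

# pcr_coef is a hypothetical helper (an assumption, not from the slides): it
# regresses y on the first k principal components of the correlation matrix
# of X and returns the coefficients expressed in the original units of X.
pcr_coef <- function(X, y, k) {
  X      <- as.matrix(X)
  x.bar  <- colMeans(X)
  x.sd   <- apply(X, 2, sd)   # divisor n - 1; princomp uses n, but the factor cancels below
  Z      <- scale(X, center = x.bar, scale = x.sd)      # standardized variables Xi*
  A      <- eigen(cor(X))$vectors[, 1:k, drop = FALSE]  # loadings of the first k components
  scores <- Z %*% A                                     # component scores z1, ..., zk
  beta   <- coef(lm(y ~ scores))                        # regression on the scores
  slope  <- as.vector(A %*% beta[-1]) / x.sd            # undo the standardization
  c("(Intercept)" = unname(beta[1]) - sum(x.bar * slope), slope)
}

pcr2 <- pcr_coef(conomy[, c("x1", "x2", "x3")], conomy$y, k = 2)
pcr3 <- pcr_coef(conomy[, c("x1", "x2", "x3")], conomy$y, k = 3)
ols  <- coef(lm(y ~ x1 + x2 + x3, data = conomy))
print(round(pcr2, 4))
print(round(cbind(pcr3, ols), 4))
```

With k = 2 the result should agree with the back-transformed coefficients printed above. With k = 3 it coincides with ordinary least squares, which suggests that dropping the nearly degenerate third component is what turns the x1 coefficient positive. The sign of each eigenvector is arbitrary, but the final coefficients do not depend on it.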