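Path Models and PLS
Wu Xizhi (吴喜之)

Assumptions of traditional regression-based methods (e.g., multiple regression analysis, discriminant analysis, logistic regression, analysis of variance)
  Simple model structure: the postulation of a simple model structure (at least in the case of regression-based approaches).
  Observable variables: the assumption that all variables can be considered as observable.
  Exact measurement: the conjecture that all variables are measured without error, which may limit their applicability in some research situations.

Overcoming the weaknesses of first-generation regression-based models: structural equation modeling (SEM)
  Regression-based approaches analyze only one layer of links between independent and dependent variables at a time; SEM allows the simultaneous modeling of relationships among multiple independent and dependent constructs.
  SEM therefore no longer distinguishes dependent from independent variables. It distinguishes exogenous from endogenous latent variables: the former are not explained by the postulated model (they always act as independent variables), the latter are the variables the model explains.
  SEM can construct unobservable variables that are measured by indicators (also called items, manifest variables, or observed measures), and it models the measurement error of the observed variables.

Two kinds of models
  Covariance-based (or maximum-likelihood) SEM. Software: LISREL, EQS, AMOS, SEPATH, COSAN.
  Variance-based (component-based) SEM, represented here by partial least squares (PLS).

[Diagram: relationship between endogenous and exogenous latent variables; relationship between endogenous latent variables, their indicators, and measurement errors; relationship between exogenous latent variables, their indicators, and measurement errors]

Notation
  η (eta) = latent endogenous variable
  ξ (xi) = latent exogenous (i.e., independent) variable
  ζ (zeta) = random disturbance term ("errors in equations")
  γ (gamma) = path coefficient
  φ (phi) = noncausal relationship between two latent exogenous variables
  y_i = indicators of endogenous variables
  ε_i (epsilon) = measurement errors for indicators of endogenous variables
  λ_yi (lambda y) = loadings of indicators of endogenous variables
  x_i = indicators of exogenous variables
  δ_i (delta) = measurement errors for indicators of exogenous variables
  λ_xi (lambda x) = loadings of indicators of exogenous variables

Relationship between endogenous and exogenous latent variables: theoretical equations, representing nonobservational hypotheses and theoretical definitions (structural model).
Relationship between endogenous latent variables, their indicators, and measurement errors: measurement equations (measurement model).
Relationship between exogenous latent variables, their indicators, and measurement errors: measurement equations (measurement model).

Matrix notation: structural model, measurement model.

Three types of unobservable variables
  (1) Variables that are unobservable in principle (e.g., theoretical terms).
  (2) Variables that are unobservable in principle but either imply empirical concepts or can be inferred from observations (e.g., attitudes, which might be reflected in evaluations).
  (3) Unobservable variables that are defined in terms of observables.

Two kinds of indicators
  a) Reflective indicators, which depend on the construct.
  b) Formative indicators (also known as cause measures), which cause the formation of or changes in an unobservable variable.

Difference between the two
  Reflective indicators should have a high correlation (as they are all dependent on the same unobservable variable). Formative indicators of the same construct can have positive, negative, or zero correlation with one another (Hulland, 1999), which means that a change in one indicator does not necessarily imply a similar directional change in others (Chin, 1998a).

Two modeling approaches: covariance-based (SEM-ML) and variance-based (SEM-PLS)
  Covariance-based methods try to minimize the difference between the sample covariances and the covariances predicted by the theoretical model, so parameter estimation attempts to reproduce the observed covariance matrix. (Model parameters are computed first; case values are then obtained by regression.)
  Variance-based methods maximize the variance of the dependent variables explained by the independent variables, rather than reproducing the empirical covariance matrix.
  Besides the structural model and the measurement model, PLS has a third component: the weight relations, which are used to estimate the case values of the latent variables. (Case values are computed first: each unobservable variable is expressed as a linear combination of its indicators, with weights chosen so that the resulting case values capture most of the variance of the dependent variables. These case values serve as estimates of the unobservable variables, and the parameters of the structural model are determined last.)

PLS estimation steps
  The weights (w_i) are determined in two steps.
  Step 1, outside approximation: analogous to principal component analysis for reflective indicators, and to regression for formative indicators.
  Step 2, inside approximation: three schemes are available (centroid, factor, and path weighting).
  This yields updated estimates; the two steps are repeated until convergence.

Advantages of PLS
  PLS makes no assumptions about the population or the scale of measurement, and therefore no distributional assumptions. Some assumptions are still required, for example that the systematic part of the linear regression equals the conditional expectation of the dependent variable.
  According to Monte Carlo simulations, PLS is very robust, and the latent variable scores conform to the true values.
  Because the case values of the latent variables are aggregates of manifest variables, which carry measurement error, these values are inconsistent (although asymptotically consistent).
  Because the sample and the number of indicators per latent variable are finite, PLS tends to underestimate the correlations between latent variables and to overestimate the loadings (the coefficients of the measured variables).

Choosing between covariance-based and variance-based SEM
  When the number of indicators per latent variable becomes too large, covariance-based SEM is no longer workable. Yet no serious path-model study is possible without enough indicators (sometimes as many as 500). With sufficiently many indicators, the choice of weights no longer affects the path coefficients, and consistency ceases to be a problem. Therefore, the researcher would be well advised to use PLS instead of covariance-based SEM in such situations. Recapitulating these arguments by using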
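the words of S. Wold (1993), H. Wold's son, one can say that "the natural domain for LV [latent variable] models such as PLS is where the number of significant LVs is small, much smaller than the number of measured variables and than the number of observations" (p. 137).

Other situations in which PLS has the advantage
  Constructs are measured primarily by formative indicators. In that case covariance-based methods (LISREL) run into severe identification problems.
  LISREL needs at least 100, or even 200, observations, whereas PLS needs only 50 (there is even a case with just 10 observations for 2 latent variables and 27 manifest variables).

The Monte Carlo comparison of Sohn & Park (2001) shows:
  (1) Judged by the mean squared error and the variance of the factor loadings, ML performs worst when the sample is small and the data are slightly non-normal; when the data are normal or approximately normal, there is no significant difference between ML and PLS.
  (2) Judged by the bias of the factor loadings, ML performs worse as non-normality increases, regardless of sample size.
  (3) Judged by the mean squared error of the regression coefficients, PLS performs better than ML.

Customer satisfaction model (diagram nodes): perceived performance,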
customer expected quality, customer satisfaction, customer complaints, customer loyalty.
Of the five latent variables, customer expected quality is the exogenous latent variable; the others are endogenous latent variables.

European Customer Satisfaction Index model (diagram nodes): perceived quality (software), perceived quality (hardware), perceived value, expected quality, image, customer satisfaction, customer loyalty.

American Customer Satisfaction Index model (diagram nodes): perceived quality, perceived value, expected quality, customer satisfaction, customer complaints, customer loyalty.

China Customer Satisfaction Index models (diagrams): durable consumer goods; non-durable consumer goods; service industry.

China Customer Satisfaction Index model for durable consumer goods
  In the structural equations, the matrix B (which contains the β coefficients), η, and ζ are unknown. The form of B is completely determined by the graphical model.
  In the measurement equations, the matrix Λ (which contains the λ loadings) and η are unknown, whereas x is observable. The form of Λ is completely determined by the graphical model.

Solving Path Models with Partial Least Squares (PLS)
Wu Xizhi (package plspm)

Example (ignore the numbers for now)
[Diagrams: path model with reflective indicators, shown once with "loadings" and once with "weights"]

    library(plspm)
    # typical example of PLS-PM in customer satisfaction analysis
    # model with six LVs and reflective indicators
    data(satisfaction)
    IMAG <- c(0,0,0,0,0,0)
    EXPE <- c(1,0,0,0,0,0)
    QUAL <- c(0,1,0,0,0,0)
    VAL  <- c(0,1,1,0,0,0)
    SAT  <- c(1,1,1,1,0,0)
    LOY  <- c(1,0,0,0,1,0)
    sat.mat  <- rbind(IMAG, EXPE, QUAL, VAL, SAT, LOY)
    sat.sets <- list(1:5, 6:10, 11:15, 16:19, 20:23, 24:27)
    sat.mod  <- rep("A", 6)   # reflective indicators
    res2 <- plspm(satisfaction, sat.mat, sat.sets, sat.mod,
                  scheme = "centroid", scaled = FALSE)
    # plot diagram of the inner model
    plot(res2)
    # plot diagrams of both the inner model and outer model (loadings and weights)
    plot(res2, what = "weights")
    plot(res2, what = "loadings")
    plot(res2, what = "all")

Function

    plspm(x, inner.mat, sets, modes = NULL, scheme = "centroid", scaled = TRUE,
          boot.val = FALSE, br = NULL, plsr = FALSE)

Output
  An object of class "plspm". When the function plspm.fit is called, it returns a list with basic results. [list shown on the slide]
  If the function plspm is called, the previous list of results also contains further elements. [list shown on the slide]

The same model (sat.mat, sat.sets, and sat.mod as defined above), fitted with the default scheme:

    res2 <- plspm(satisfaction, sat.mat, sat.sets, sat.mod, scaled = FALSE)
    summary(res2)
    plot(res2)

Components of the result (one per slide):

    res2$unidim
    res2$outer.mod
    res2$out.weights   # column 1 of the outer.mod output
    res2$loadings      # column 2 of the outer.mod output
    res2$inner.mod
    res2$path.coefs
    res2$r.sqr
    res2$inner.sum
    res2$gof
    res2$latents       # latent values for all observations
    res2$scores        # latent scores for all observations
    res2$effects       # i.e., the path coefficients (path.coefs)

Example

    data(arizona)
    ari.inner <- matrix(c(0,0,0,0,0,0,1,1,0), 3, 3, byrow = TRUE)
    dimnames(ari.inner) <- list(c("ENV","SOIL","DIV"), c("ENV","SOIL","DIV"))
    ari.outer <- list(c(1,2), c(3,4,5), c(6,7,8))
    ari.mod <- c("B","B","B")   # formative indicators
    res1 <- plspm(arizona, inner = ari.inner, outer = ari.outer, modes = ari.mod,
                  scheme = "factor", scaled = TRUE, plsr = TRUE)
    res1
    summary(res1)
    plot(res1, what = "all")

Example

    # example of PLS-PM in multi-block data analysis
    # estimate a path model for the wine data set
    # requires package FactoMineR
    library(FactoMineR)
    data(wine)
    SMELL <- c(0,0,0,0)
    VIEW  <- c(1,0,0,0)
    SHAKE <- c(1,1,0,0)
    TASTE <- c(1,1,1,0)
    wine.mat  <- rbind(SMELL, VIEW, SHAKE, TASTE)
    wine.sets <- list(3:7, 8:10, 11:20, 21:29)
    wine.mods <- rep("A", 4)
    # using function plspm.fit (basic pls algorithm)
    res4 <- plspm.fit(wine, wine.mat, wine.sets, wine.mods, scheme = "centroid")
    plot(res4, what = "all", arr.pos = .4, box.prop = .4, cex.txt = .8)

plspm.groups {plspm}: Group Comparison in PLS-PM

    # example with customer satisfaction analysis
    # group comparison based on the segmentation variable gender
    data(satisfaction)
    # IMAG, EXPE, QUAL, VAL, SAT, LOY as defined in the first example
    sat.inner <- rbind(IMAG, EXPE, QUAL, VAL, SAT, LOY)
    sat.outer <- list(1:5, 6:10, 11:15, 16:19, 20:23, 24:27)
    sat.mod   <- rep("A", 6)   # reflective indicators
    pls <- plspm(satisfaction, sat.inner, sat.outer, sat.mod,
                 scheme = "factor", scaled = FALSE)
    # permutation test with 100 permutations
    res.group <- plspm.groups(pls, satisfaction$gender,
                              method = "permutation", reps = 100)
    res.group
    plot(res.group)

nipals {plspm}: Non-linear Iterative Partial Least Squares (principal component analysis)
Principal Component Analysis with NIPALS algorithm

    library(plspm)
    data(wines)
    nip1 <- nipals(wines[, -1], nc = 5)
    plot(nip1)
    # USArrests data vary
    nip2 <- nipals(USArrests)
    plot(nip2)

plsca {plspm}: PLS-CA, Partial Least Squares Canonical Analysis (canonical correlation analysis)

    # example of PLSCA with the vehicles dataset
    data(vehicles); head(vehicles)
    names(vehicles)
    #  [1] "diesel"      "turbo"     "two.doors" "hatchback" "wheel.base"
    #  [6] "length"      "width"     "height"    "curb.weight" "eng.size"
    # [11] "horsepower"  "peak.rpm"  "price"     "symbol"    "city.mpg"
    # [16] "highway.mpg"
    can <- plsca(vehicles[, 1:12], vehicles[, 13:16])
    can
    plot(can)

semPLS

    library(semPLS)
    # how to build a model (using ECSI as the example)
    # getting the path to the .csv file representing the inner Model
    ptf_Struc <- system.file("ECSIstrucmod.csv", package = "semPLS")
    # getting the path to the .csv file representing the outer Models
    ptf_Meas <- system.file("ECSImeasuremod.csv", package = "semPLS")
    sm <- as.matrix(read.csv(ptf_Struc))
    (w = read.csv(ptf_Struc))
    mm <- as.matrix(read.csv(ptf_Meas))

Building a model (using ECSI as the example)
[Diagram nodes: Expectation, Quality, Value, Image, Satisfaction, Complaints, Loyalty]

    data(mobi)
    class(mobi); dim(mobi); head(mobi)   # "data.frame"; [1] 250 24
    ECSI <- plsm(data = mobi, strucmod = sm, measuremod = mm)
    ECSI

    exogen(ECSI)
    endogen(ECSI)
    reflective(ECSI)
    formative(ECSI)
    indicators(ECSI, "Image")
    predecessors(ECSI)

    # sempls
    data(ECSImobi); class(ECSImobi); summary(ECSImobi); names(ECSImobi)   # the same as ECSI above
    # ecsi is the fitted result; the fitting call is not shown on the slide
    # (presumably something like sempls(model = ECSImobi, data = mobi))
    ecsi
    names(ecsi)   # names of the computed results
    #  [1] "coefficients" "path_coefficients" "outer_loadings" "cross_loadings" "total_effects"
    #  [6] "inner_weights" "outer_weights" "blocks" "factor_scores" "data"
    # [11] "scaled" "model" "weighting_scheme" "sum1" "pairwise"
    # [16] "method" "iterations" "convCrit" "tolerance" "maxit"
    # [21] "N" "incomplete"

Components of the result (one per slide):

    ecsi$coe              # latent variable -> manifest variable
    ecsi$path_coe         # latent variable -> latent variable
    ecsi$outer_loadings   # same as coefficients, but in matrix form
    is.matrix(ecsi$outer_loadings)
    ecsi$cross_loadings   # entries present above are the same here, but entries that are 0 above also have values here
    ecsi$total_effects    # latent -> latent; differs from path_coe
                          # Total effects = direct effects + indirect effects
    ecsi$inner_weights    # latent -> latent; these are weights, not the path coefficients (red lines in the figure)
    ecsi$outer_weights    # latent -> manifest; differs from coefficients (the weights sum to 1)
    ecsi$factor_scores    # case values of the latent variables (matrix)
    dim(ecsi$factor_scores)   # 250 7
    ecsi$data             # observed values
    class(ecsi$data)      # data matrix

SEM with R? (four slides)
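The slides name a matrix form for the structural and measurement models but the formulas themselves are not part of the text. A sketch in conventional LISREL-style notation, using the symbols from the notation list, is given here; the matrices Γ (collecting the path coefficients γ) and Φ (collecting the φ) are assumptions of this sketch rather than symbols taken from the slides.

```latex
% Structural model (relations among latent variables)
\eta = B\,\eta + \Gamma\,\xi + \zeta
% Measurement model for the endogenous latent variables
y = \Lambda_y\,\eta + \varepsilon
% Measurement model for the exogenous latent variables
x = \Lambda_x\,\xi + \delta
% Noncausal relationships among the exogenous latent variables
\Phi = \operatorname{Cov}(\xi)
```

Under this reading, the pattern of zero and nonzero entries in B, Γ, Λ_y, and Λ_x is fixed by the arrows of the path diagram, and only the nonzero entries are estimated.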
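The two-step weight estimation described in the PLS estimation steps (outside approximation, inside approximation, repeat until convergence) can be illustrated in base R. This is a minimal sketch on simulated data with two latent variables, reflective (Mode A) blocks, and the centroid scheme; it is not the plspm or semPLS implementation, and every object name (make_block, link, standardize, path_coef) is hypothetical.

```r
# Illustrative only: simulated data, hypothetical names, two latent variables.
set.seed(42)
n   <- 300
xi  <- rnorm(n)                        # exogenous latent variable (unobserved)
eta <- 0.6 * xi + rnorm(n, sd = 0.8)   # endogenous latent variable (unobserved)

# Three noisy reflective indicators per latent variable, standardized
make_block <- function(lv) {
  scale(sapply(1:3, function(k) lv + rnorm(length(lv), sd = 0.5)))
}
blocks <- list(make_block(xi), make_block(eta))

# Inner model: link[i, j] = 1 when latent variables i and j are connected
link <- matrix(c(0, 1,
                 1, 0), 2, 2, byrow = TRUE)

standardize <- function(v) as.vector(scale(v))
w <- lapply(blocks, function(b) rep(1, ncol(b)))   # starting weights

for (iter in 1:100) {
  # Step 1, outside approximation: scores are weighted sums of own indicators
  scores <- sapply(seq_along(blocks),
                   function(j) standardize(blocks[[j]] %*% w[[j]]))
  # Step 2, inside approximation (centroid scheme): each latent variable is
  # replaced by the sign-weighted sum of the scores of its neighbours
  inner <- scores %*% (link * sign(cor(scores)))
  # Mode A update: new weights are covariances of indicators with inner estimate
  w_new <- lapply(seq_along(blocks),
                  function(j) as.vector(cov(blocks[[j]], inner[, j])))
  change <- max(abs(unlist(w_new) - unlist(w)))
  w <- w_new
  if (change < 1e-8) break
}

# Final case values, then the structural model: one path coefficient (xi -> eta)
scores <- sapply(seq_along(blocks),
                 function(j) standardize(blocks[[j]] %*% w[[j]]))
path_coef <- unname(coef(lm(scores[, 2] ~ scores[, 1]))[2])
path_coef
```

Because both score vectors are standardized, the estimated path coefficient here equals the correlation between the two sets of case values, which is the quantity the slides say PLS tends to underestimate in finite samples.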