Chapter 9 Heteroskedasticity: Testing and Correction.ppt

Chapter 9 Heteroskedasticity: Test and Correction

Contents
What is heteroskedasticity?
Why worry about heteroskedasticity?
How to test for heteroskedasticity?
Corrections for heteroskedasticity

What is heteroskedasticity?
Recall that the assumption of homoskedasticity states that, conditional on the explanatory variables, the variance of the unobserved error u is constant:
$\mathrm{Var}(u \mid X) = \sigma^2$ (homoskedasticity)
If this is not true, that is, if the variance of u differs across values of the Xs, then the errors are heteroskedastic:
$\mathrm{Var}(u_i \mid X_i) = \sigma_i^2$ (heteroskedasticity)

(Figure: example of homoskedasticity)
(Figure: example of heteroskedasticity)

Examples
Cross-section data are generally more prone to heteroskedasticity, because different individuals have different characteristics.
Consider a cross-section study of family income and expenditure. It seems plausible that low-income families spend at a rather steady rate, while the spending patterns of high-income families are relatively volatile.
If we examine the sales of a cross section of firms in one industry, the error terms of very large firms may have larger variances than those of smaller firms, because the sales of larger firms may be more volatile.

Patterns of heteroskedasticity
(Figure: the relation between R&D expenditure and sales; scatter plot of R&D expenditure against sales)

Why worry about heteroskedasticity?
The consequences of heteroskedasticity
OLS estimates are still unbiased and consistent even if we do not assume homoskedasticity. Take the simple regression $Y = \beta_0 + \beta_1 X + u$ as an example. The OLS estimator of $\beta_1$ can be written as
$\hat\beta_1 = \beta_1 + \frac{\sum_i (X_i - \bar X) u_i}{\sum_i (X_i - \bar X)^2}$,
so $E(\hat\beta_1 \mid X) = \beta_1$ as long as $E(u_i \mid X) = 0$; the variances of the $u_i$ play no role.

The consequences of heteroskedasticity (cont.)
R² and adjusted R² are unaffected by heteroskedasticity. Because the residual sum of squares (RSS) and the total sum of squares (TSS) are not affected by heteroskedasticity, R² and adjusted R² are not affected either.

The consequences of heteroskedasticity (cont.)
The usual standard errors of the estimates are biased if there is heteroskedasticity.

The consequences of heteroskedasticity (cont.)
The OLS estimates are not efficient: their variances are no longer the smallest among linear unbiased estimators.
If the standard errors are biased, we cannot use the usual t statistics or F statistics for drawing inferences; the t tests, F tests and confidence intervals based on them are not valid. In short, when there is heteroskedasticity we cannot use the t test and F test as usual; otherwise we will get misleading results.

Summary of the consequences of heteroskedasticity
OLS estimates are still unbiased and consistent.
R² and adjusted R² are unaffected by heteroskedasticity.
The usual standard errors of the estimates are biased.
The OLS estimates are not efficient.
Hence the usual t tests, F tests and confidence intervals are not valid.

How to test for heteroskedasticity?
Residual plot
In OLS estimation we use the residual $e_i$ to estimate the random error term $u_i$, so we can check whether $u_i$ is heteroskedastic by examining $e_i$: plot $e_i^2$ against X.
(Figures: residual plots)
If there is more than one independent variable, plot the squared residuals against each independent variable separately. A shortcut when there is more than one independent variable is to plot the squared residuals against the fitted value $\hat Y$, because $\hat Y$ is a linear combination of all the Xs.
(Figure: residual plot, example 9.2)

Park test
If there is heteroskedasticity, the variance of the error term $u_i$, $\sigma_i^2$, may be correlated with some of the independent variables. We can therefore test whether $\sigma_i^2$ is related to any of the explanatory variables: if it is, there is heteroskedasticity; if not, there is no evidence of heteroskedasticity. For example, for the simple regression model Park specifies
$\ln(\sigma_i^2) = \beta_0 + \beta_1 \ln(X_i) + v_i$

Procedure of the Park test
First, regress the dependent variable (Y) on the independent variables (Xs) and obtain the residuals $e_i$ and $e_i^2$.
Then take $\ln(e_i^2)$ as the dependent variable and the logged original independent variables as explanatory variables, and run the new regression
$\ln(e_i^2) = \beta_0 + \beta_1 \ln(X_i) + v_i$
Then test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. If we can reject the null hypothesis, there is heteroskedasticity; if we cannot reject it, we find no evidence of heteroskedasticity (failing to reject does not prove homoskedasticity).

Park test: example
Take example 9.2. First, regress R&D expenditure (rdexp) on sales (sales):
$\widehat{rdexp} = 192.91 + 0.0319\, sales$
se = (991.01) (0.0083)
N = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Second, obtain the residuals $e_i$ of this regression.
Third, regress $\ln(e_i^2)$ on ln(sales):
$\widehat{\ln(e^2)} = 1.216 \ln(sales)$
se = (0.057), p = (0.000)
R² = 0.9637, adj-R² = 0.9615
Finally, test whether the slope of the second regression equals zero. From the p-value of the slope, at the 5% significance level we can reject the null hypothesis. Therefore there is heteroskedasticity in the first regression.
Note: the Park test is not a good test for heteroskedasticity because of the special specification of its auxiliary regression, whose error term $v_i$ may itself be heteroskedastic.

Glejser test
The Glejser test follows the same idea as the Park test, but Glejser suggests regressing the absolute residuals on X to detect heteroskedasticity of u, for example:
$|e_i| = \beta_0 + \beta_1 X_i + v_i$
$|e_i| = \beta_0 + \beta_1 \sqrt{X_i} + v_i$
$|e_i| = \beta_0 + \beta_1 (1/X_i) + v_i$
Again we test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \neq 0$. If we can reject the null hypothesis, there is heteroskedasticity; otherwise we find no evidence of heteroskedasticity.

Glejser test: example 9.2
First, regress R&D expenditure (rdexp) on sales (sales):
$\widehat{rdexp} = 192.91 + 0.0319\, sales$
se = (991.01) (0.0083)
N = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Second, obtain the residuals $e_i$ of this regression.
Third, regress $|e_i|$ on 1/sales:
$\widehat{|e|} = 2273.65 - 1992500 (1/sales)$
se = (604.69) (12300000)
p = (0.002) (0.125)
Finally, test whether the slope is zero. The p-value of the slope (0.125) is larger than the 5% significance level, so we cannot reject the null hypothesis: this form of the Glejser test finds no evidence of heteroskedasticity.

The White test
The White test is a more general test that allows for nonlinearities by using the squares and cross products of all the Xs. For example, with three regressors:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + u$
$e^2 = \delta_0 + \delta_1 X_1 + \delta_2 X_2 + \delta_3 X_3 + \delta_4 X_1^2 + \delta_5 X_2^2 + \delta_6 X_3^2 + \delta_7 X_1 X_2 + \delta_8 X_1 X_3 + \delta_9 X_2 X_3 + v$
Use an F or LM statistic to test whether all the $X_j$, $X_j^2$ and $X_j X_h$ are jointly significant, that is, test $H_0: \delta_1 = \delta_2 = \dots = \delta_9 = 0$ against $H_1$: $H_0$ is not true. If we can reject $H_0$, there is heteroskedasticity.

The White test (cont.)
To test $H_0: \delta_1 = \delta_2 = \dots = \delta_9 = 0$ we can use the F test learned in chapter 4. Let $R^2$ denote the goodness of fit of the auxiliary regression; then
$F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$
We can also use the LM test:
$LM = nR^2 \sim \chi^2(k)$ (asymptotically, under $H_0$)
where n is the number of observations and k is the number of restrictions (the number of regressors in the auxiliary regression, excluding the constant).

The White test: example 9.2
First, regress R&D expenditure (rdexp) on sales (sales) and profits (profits):
$\widehat{rdexp} = -13.93 + 0.0126\, sales + 0.2398\, profits$
se = (991.997) (0.018) (0.1986)
p = (0.989) (0.496) (0.246)
n = 18, R² = 0.5245, adj-R² = 0.4611, F = 8.27
Second, obtain the residuals e from the regression above.
Third, regress $e^2$ on sales, profits, sales², profits² and sales × profits:
$\hat e^2 = 693735.5 + 135.00\, sales - 1965.7\, profits - 0.0027\, sales^2 - 0.116\, profits^2 + 0.050\, sales \times profits$
N = 18, R² = 0.8900, F(5,12) = 19.42, Prob > F = 0.0000
Finally, test $H_0: \delta_1 = \delta_2 = \delta_3 = \delta_4 = \delta_5 = 0$. The p-value of the F test is 0.0000, so we reject $H_0$. The LM test gives $LM = nR^2 = 18 \times 0.89 = 16.02 > \chi^2_{0.05}(5) = 11.07$, so we also reject $H_0$. So there is heteroskedasticity in the first regression.

Alternate form of the White test
The full test quickly becomes unwieldy as the number of regressors grows. Note that the OLS fitted values $\hat y$ are a function of all the Xs, so $\hat y^2$ is a function of their squares and cross products; $\hat y$ and $\hat y^2$ can therefore proxy for all of the $X_j$, $X_j^2$ and $X_j X_h$. So regress the squared residuals on $\hat y$ and $\hat y^2$ and use the R² to form an F or LM statistic. Note that we are now testing only 2 restrictions.

The procedure of the special case of the White test
Regress Y on $X_1, X_2, \dots, X_k$ and obtain the residuals $e_i$.
Calculate $\hat y$ and $\hat y^2$ (in Stata: predict ybar, xb; gen ybarsq = ybar^2).
Regress $e^2$ on $\hat y$ and $\hat y^2$, and test the joint hypothesis that the coefficients on these regressors are zero.
Use the F statistic or the LM statistic to test the null hypothesis of homoskedasticity.

Example: White test in the wage determination equation
First, estimate the model by OLS without considering heteroskedasticity:
$\widehat{wage} = -2.87 + 0.599\, educ + 0.022\, exper + 0.139\, tenure$
Calculate the residuals $e_i$ and the fitted values $\widehat{wage}$, and hence $e_i^2$ and $\widehat{wage}^2$. Regressing $e_i^2$ on $\widehat{wage}$ and $\widehat{wage}^2$ gives
$\hat e^2 = 7.36 - 2.86\, \widehat{wage} + 0.49\, \widehat{wage}^2$
se = (5.62) (1.76) (0.125)
n = 526, R² = 0.0984, F = 28.55, Prob > F = 0.000
Test $H_0: \delta_1 = \delta_2 = 0$. F test: F = 28.55, Prob > F = 0.000. LM test: $LM = nR^2 = 526 \times 0.0984 \approx 51.76 > 5.99 = \chi^2_{0.05}(2)$. Reject $H_0$.

Corrections for heteroskedasticity

Known variances
Suppose $\mathrm{Var}(u_i \mid X) = \sigma_i^2$ is known. The original model is
$Y_i = \beta_0 + \beta_1 X_{i1} + \dots + \beta_k X_{ik} + u_i$
Divide both sides by $\sigma_i$. The new disturbance is $u_i^* = u_i/\sigma_i$, with $\mathrm{Var}(u_i^* \mid X) = \mathrm{Var}(u_i \mid X)/\sigma_i^2 = 1$. So the new model is
$Y_i/\sigma_i = \beta_0 (1/\sigma_i) + \beta_1 X_{i1}/\sigma_i + \dots + \beta_k X_{ik}/\sigma_i + u_i/\sigma_i$,
that is, $Y^* = \beta_0 X_0^* + \beta_1 X_1^* + \dots + \beta_k X_k^* + u^*$ with $X_0^* = 1/\sigma_i$.
We can estimate the new model by OLS (without a separate intercept); this is called weighted least squares (WLS). But usually we do not know the variances.

Case of the form being known up to a multiplicative constant
Suppose the heteroskedasticity can be modeled as $\mathrm{Var}(u \mid X) = \sigma^2 h(X)$, where the trick is to figure out what $h(X) \equiv h_i$ looks like. Then $E(u_i/\sqrt{h_i} \mid X) = 0$, because $h_i$ is only a function of X, and $\mathrm{Var}(u_i/\sqrt{h_i} \mid X) = \sigma^2$, because $\mathrm{Var}(u \mid X) = \sigma^2 h_i$. So if we divide the whole equation by $\sqrt{h_i}$, we obtain a model whose error is homoskedastic.

Case 1: h(X) = X
The simple regression model is $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $u_i$ is heteroskedastic with $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i$.
Divide both sides of the original model by $\sqrt{X_i}$ to get the transformed model
$Y_i/\sqrt{X_i} = \beta_0/\sqrt{X_i} + \beta_1 X_i/\sqrt{X_i} + u_i/\sqrt{X_i}$,
which can be rewritten as
$Y_i/\sqrt{X_i} = \beta_0 (1/\sqrt{X_i}) + \beta_1 \sqrt{X_i} + v_i$   (*)
with $\mathrm{Var}(v_i \mid X_i) = \mathrm{Var}(u_i \mid X_i)/X_i = \sigma^2$, which is homoskedastic. Therefore the new equation (*) can be estimated by OLS.

Example 9.6 (textbook 2e, p. 233)
We have shown that there is heteroskedasticity in the R&D expenditure model. Now assume that the variance of the error term changes with the independent variable sales, that is, $\mathrm{Var}(u_i) = \sigma^2 sales_i$. The original model is
$rdexp_i = \beta_0 + \beta_1 sales_i + u_i$
The transformed model is
$rdexp_i/\sqrt{sales_i} = \beta_0 (1/\sqrt{sales_i}) + \beta_1 \sqrt{sales_i} + v_i$, where $v_i = u_i/\sqrt{sales_i}$.

Example 9.6 (cont.)
The estimate of the transformed model is
$\widehat{rdexp}/\sqrt{sales} = -246.73 (1/\sqrt{sales}) + 0.0368 \sqrt{sales}$, i.e. $\widehat{rdexp} = -246.73 + 0.0368\, sales$
se = (381.16) (0.0071)
t = (-0.65) (5.17)
n = 18, R² = 0.6923, adj-R² = 0.6538, F = 18.00
WLS command (Stata): reg rdexp sales [aweight=1/sales]
The estimate of the original model is
$\widehat{rdexp} = 192.91 + 0.0319\, sales$
se = (991.01) (0.0083)
t = (0.19) (3.83)
N = 18, R² = 0.4783, adj-R² = 0.4457, F(1,16) = 14.67
Compare the results of the two estimations. What do you find?

Case 2: h(X) = X²
The simple regression model is $Y_i = \beta_0 + \beta_1 X_i + u_i$, where $u_i$ is heteroskedastic with $\mathrm{Var}(u \mid X_i) = \sigma^2 h(X_i) = \sigma^2 X_i^2$. Then we divide the
original model by $X_i$ on both sides and get the transformed model
$Y_i/X_i = \beta_0/X_i + \beta_1 X_i/X_i + u_i/X_i$,
which can be rewritten as
$Y_i/X_i = \beta_0 (1/X_i) + \beta_1 + v_i$   (*)
with $\mathrm{Var}(v_i \mid X_i) = \mathrm{Var}(u_i \mid X_i)/X_i^2 = \sigma^2$, which is homoskedastic. Therefore the new equation (*) can be estimated by OLS; note that the slope $\beta_1$ of the original model is now the intercept.

Generalized least squares
Estimating the transformed equation by OLS is an example of generalized least squares (GLS). GLS is BLUE in this case, because the transformed equation satisfies the Gauss-Markov assumptions. Here GLS is a weighted least squares (WLS) procedure in which each squared residual is weighted by the inverse of $\mathrm{Var}(u_i \mid x_i)$.

(Slides: More on WLS)

More on WLS (cont.)
A similar weighting arises when we use per capita data at the city, county, state or country level. If the individual-level equation satisfies the Gauss-Markov assumptions, then the error in the per capita equation has a variance proportional to one over the size of the population. Therefore weighted least squares with weights equal to the population is appropriate.

Summary of WLS
WLS is great if we know what $\mathrm{Var}(u_i \mid x_i)$ looks like.
In most cases we won't know the form of the heteroskedasticity.
An example where we do is when the data are aggregated but the model is at the individual level: the error variance of each aggregate observation is proportional to the inverse of the number of individuals it represents, so we want to weight each aggregate observation by that number of individuals.

Feasible GLS
More typically, we do not know the form of the heteroskedasticity. In this case we need to estimate $h(x_i)$. Typically, we start by assuming a fairly flexible model, such as
$\mathrm{Var}(u \mid x) = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)$
Since we do not know the δs, we must estimate them.

Feasible GLS (continued)
Our assumption implies that
$u^2 = \sigma^2 \exp(\delta_0 + \delta_1 x_1 + \dots + \delta_k x_k)\, v$, where $E(v \mid x) = 1$.
If, in addition, v is independent of x, then
$\ln(u^2) = \alpha_0 + \delta_1 x_1 + \dots + \delta_k x_k + \varepsilon$, where $E(\varepsilon) = 0$ and $\varepsilon$ is independent of x.
Since the residual e is an estimate of u, we can estimate this equation by OLS with $\ln(e^2)$ as the dependent variable.

Feasible GLS (continued)
An estimate of h is then obtained as $\hat h = \exp(\hat g)$, where $\hat g$ is the fitted value from that regression, and the inverse of this is our weight. So, what did we do?
Run the original OLS model, save the residuals e, square them and take the log, that is, $\ln(e^2)$.
Regress $\ln(e^2)$ on all of the independent variables and get the fitted values $\hat g$.
Do WLS using $1/\exp(\hat g)$ as the weight.

Example of FGLS: demand for cigarettes (Smoke.raw)
What determines people's daily cigarette consumption?
Variables:
cigs: cigarettes smoked per day
income: annual income, $
cigpric: cigarette price per pack
age: in years
restaurn: dummy variable = 1 if the state has restaurant smoking restrictions
Model (OLS):
$\widehat{cigs} = -3.64 + 0.88 \log(income) - 0.75 \log(cigpric) - 0.50\, educ + 0.77\, age - 0.009\, age^2 - 2.83\, restaurn$

Example of FGLS: demand for cigarettes (cont.)
Test for heteroskedasticity with the White test: obtain $e^2$ and regress it on all the independent variables. This gives F = 13.69 (p-value = 0), or $LM = 807 \times 0.0329 = 26.55$ (p-value = 0). So there is heteroskedasticity.
Regress $\ln(e^2)$ on all the independent variables and get the fitted values $\hat g$. Divide all the data (including the constant) by $\sqrt{\hat h} = \sqrt{\exp(\hat g)}$, and regress the transformed equation without a constant:
$\widehat{cigs} = 5.63 + 1.295 \log(income) - 2.94 \log(cigpric) - 0.463\, educ + 0.482\, age - 0.0056\, age^2 - 3.461\, restaurn$
The income effect is now statistically significant and larger in magnitude. The estimates changed somewhat, but the basic story is still the same: cigarette smoking is negatively related to schooling, has a quadratic relationship with age, and is negatively affected by restaurant smoking restrictions.

New specifications to correct heteroskedasticity
Sometimes a new specification can correct heteroskedasticity; log-linear models are more often homoskedastic. For example, in example 9.2 we can use the specification
$\ln(rdexp_i) = \beta_0 + \beta_1 \ln(sales_i) + u_i$
The estimated model is
$\widehat{\ln(rdexp)} = -7.37 + 1.32 \ln(sales)$
se = (1.85) (0.17)
t = (-3.99) (7.87)
n = 18, R² = 0.7946, adj-R² = 0.7818, F = 61.91

New specifications to correct heteroskedasticity (cont.)
Using the White test to test for heteroskedasticity:
$\hat e^2 = -5.84 + 0.96 \ln(sales) - 0.034 [\ln(sales)]^2$
se = (15.16) (2.84) (0.13)
t = (-0.39) (0.34) (-0.26)
n = 18, R² = 0.1454, adj-R² = 0.0315, F = 1.28, Prob > F = 0.3078
Test $H_0: \delta_1 = \delta_2 = 0$.
F test: $F = 1.28 < F_{0.05}(2, 15) = 3.68$.
LM test: $LM = nR^2 = 18 \times 0.1454 = 2.62 < \chi^2_{0.05}(2) = 5.99$.
So we cannot reject $H_0$: there is no evidence of heteroskedasticity in the log-linear model.

White heteroskedasticity-robust standard errors
We have just learned that although OLS is still unbiased under heteroskedasticity, it is no longer efficient, and the standard errors from the usual OLS formulas are biased. Therefore the corresponding t tests and F tests no longer work. White designed a method that takes heteroskedasticity into account and calculates valid standard errors, so that the corresponding t tests work. In this case we can use the t test and F test, but the method requires a large sample, be
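The claims in "The consequences of heteroskedasticity" (OLS stays unbiased, but the usual standard errors are wrong) can be checked with a small Monte Carlo experiment. This is an illustrative sketch only: it assumes Python with NumPy, simulated data with Var(u | x) = x², and a hypothetical helper `ols_fit`; none of these come from the slides.

```python
import numpy as np


def ols_fit(y, X):
    """OLS coefficients and the usual standard errors (which assume constant variance)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2_hat = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
    return beta, se


rng = np.random.default_rng(42)
n, reps, beta1 = 200, 2000, 2.0
slopes, usual_se = [], []
for _ in range(reps):
    x = rng.normal(size=n)
    u = x * rng.normal(size=n)        # Var(u | x) = x**2: heteroskedastic errors
    y = 1.0 + beta1 * x + u
    X = np.column_stack([np.ones(n), x])
    b, se = ols_fit(y, X)
    slopes.append(b[1])
    usual_se.append(se[1])

mean_slope = np.mean(slopes)          # average estimate across samples
true_sd = np.std(slopes)              # actual sampling s.d. of the slope
avg_usual_se = np.mean(usual_se)      # average of what the usual formula reports
print(f"mean slope {mean_slope:.3f}; sampling s.d. {true_sd:.3f}; "
      f"average usual s.e. {avg_usual_se:.3f}")
```

The mean slope stays close to the true value 2, while the usual standard error understates the actual sampling spread in this design.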
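A minimal sketch of the Park test procedure described earlier, assuming NumPy and simulated data (not the R&D data of example 9.2). Unlike the slide's auxiliary regression $\ln(e^2) = 1.216 \ln(sales)$, this version keeps an intercept; the helper names `ols` and `park_test` are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, residuals and usual standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se


def park_test(y, x):
    """Park test: regress ln(e^2) on ln(x); return (slope, t statistic for H0: slope = 0)."""
    const = np.ones_like(x)
    _, e, _ = ols(y, np.column_stack([const, x]))       # step 1: original OLS residuals
    Z = np.column_stack([const, np.log(x)])
    b, _, se = ols(np.log(e**2), Z)                      # step 2: auxiliary regression
    return b[1], b[1] / se[1]


# Simulated data (assumption): the error s.d. is proportional to x, so
# ln(sigma_i^2) = const + 2 ln(x_i) and the true Park slope is 2.
rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=500)
y = 5 + 0.5 * x + x * rng.normal(size=500)
slope, t_stat = park_test(y, x)
print(f"Park slope {slope:.2f}, t = {t_stat:.1f}")      # |t| > 1.96: reject H0 at 5%
```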
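The Glejser test described earlier differs from the Park test only in the auxiliary regression: $|e_i|$ is regressed on $X_i$, $\sqrt{X_i}$ or $1/X_i$. The sketch below runs all three forms on simulated data; it assumes NumPy, and the helper names are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients, residuals and usual standard errors."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    s2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, resid, se


def glejser_test(y, x):
    """Regress |e| on x, sqrt(x) and 1/x in turn; return each slope's t statistic."""
    const = np.ones_like(x)
    _, e, _ = ols(y, np.column_stack([const, x]))
    abs_e = np.abs(e)
    forms = {"x": x, "sqrt(x)": np.sqrt(x), "1/x": 1.0 / x}
    t_stats = {}
    for name, z in forms.items():
        b, _, se = ols(abs_e, np.column_stack([const, z]))
        t_stats[name] = b[1] / se[1]        # t statistic for H0: b1 = 0
    return t_stats


# Simulated data (assumption): error s.d. proportional to x.
rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=500)
y = 5 + 0.5 * x + x * rng.normal(size=500)
for name, t in glejser_test(y, x).items():
    print(f"|e| on {name}: t = {t:.1f}")
```

Different forms can lead to different conclusions on the same data, as the Park and Glejser results for example 9.2 illustrate.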
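The full White test and its special form (regressing $e^2$ on $\hat y$ and $\hat y^2$) can be sketched as follows. This assumes NumPy and simulated data with two regressors, so the full auxiliary regression has q = 5 slopes ($x_1, x_2, x_1^2, x_2^2, x_1 x_2$), as in the two-regressor R&D example; the function names are hypothetical.

```python
import numpy as np


def resid_ols(y, X):
    """Return OLS fitted values and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted


def aux_r2(e2, Z):
    """R^2 from regressing squared residuals e2 on Z plus a constant."""
    Zc = np.column_stack([np.ones(len(e2)), Z])
    _, v = resid_ols(e2, Zc)
    return 1 - (v @ v) / np.sum((e2 - e2.mean()) ** 2)


def f_and_lm(r2, q, n):
    """F and LM statistics for H0: all q auxiliary slopes are zero."""
    F = (r2 / q) / ((1 - r2) / (n - q - 1))
    LM = n * r2
    return F, LM


rng = np.random.default_rng(2)
n = 500
x1 = rng.uniform(0, 3, size=n)
x2 = rng.normal(size=n)
u = (0.5 + x1) * rng.normal(size=n)          # error s.d. grows with x1
y = 1 + 2 * x1 + 0.5 * x2 + u
X = np.column_stack([np.ones(n), x1, x2])
yhat, e = resid_ols(y, X)

# Full White test: levels, squares and the cross product (q = 5 restrictions).
Z_full = np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])
F_full, LM_full = f_and_lm(aux_r2(e**2, Z_full), q=5, n=n)

# Special case: yhat and yhat^2 stand in for all of them (q = 2 restrictions).
Z_special = np.column_stack([yhat, yhat**2])
F_sp, LM_sp = f_and_lm(aux_r2(e**2, Z_special), q=2, n=n)

print(f"full:    F = {F_full:.2f}, LM = {LM_full:.2f} (5% chi2(5) critical value 11.07)")
print(f"special: F = {F_sp:.2f}, LM = {LM_sp:.2f} (5% chi2(2) critical value 5.99)")
```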
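As an arithmetic check on the examples, the F and LM statistics can be recomputed from the reported auxiliary R² values with $F = \frac{R^2/q}{(1-R^2)/(n-q-1)}$ and $LM = nR^2$, where q is the number of restrictions (written k on the slide). The R² values, sample sizes and critical values below are taken from the slides; only the helper names are invented.

```python
def white_f(r2, q, n):
    """F statistic for H0: all q slopes of the auxiliary regression are zero."""
    return (r2 / q) / ((1 - r2) / (n - q - 1))


def white_lm(r2, n):
    """LM = n * R^2, compared with a chi-square(q) critical value."""
    return n * r2


# key: (R^2 of auxiliary regression, q restrictions, n, 5% chi-square critical value)
cases = {
    "rd_full": (0.8900, 5, 18, 11.07),   # R&D on sales and profits, full White test
    "rd_log": (0.1454, 2, 18, 5.99),     # ln(R&D) on ln(sales)
    "wage": (0.0984, 2, 526, 5.99),      # wage equation, yhat and yhat^2 version
}
results = {}
for name, (r2, q, n, crit) in cases.items():
    F, LM = white_f(r2, q, n), white_lm(r2, n)
    results[name] = (F, LM, LM > crit)
    print(f"{name}: F = {F:.2f}, LM = {LM:.2f}, reject H0 at 5%: {LM > crit}")
```

The recomputed values agree with the slides up to rounding of the reported R².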

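Case 1, h(X) = X, can be illustrated by checking that OLS on the √X-transformed equation gives the same coefficients as weighted least squares with weights 1/X, which is the weighting the slide's Stata command `reg rdexp sales [aweight=1/sales]` applies. This NumPy sketch uses simulated data rather than the R&D data; rescaling the weights by a constant would not change the WLS coefficients.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400
x = rng.uniform(1, 20, size=n)
u = np.sqrt(x) * rng.normal(size=n)      # Var(u | x) = sigma^2 * x with sigma^2 = 1
y = 3.0 + 0.8 * x + u

# (a) OLS on the transformed model y/sqrt(x) = b0 * (1/sqrt(x)) + b1 * sqrt(x) + v
#     (no separate intercept: the old constant becomes the regressor 1/sqrt(x)).
s = np.sqrt(x)
Xt = np.column_stack([1 / s, s])
b_transformed, *_ = np.linalg.lstsq(Xt, y / s, rcond=None)

# (b) WLS on the original model with weights w = 1/h(x) = 1/x,
#     solving the weighted normal equations (X'WX) b = X'Wy directly.
X = np.column_stack([np.ones(n), x])
w = 1 / x
b_wls = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

print("transformed OLS:", b_transformed)
print("WLS, weights 1/x:", b_wls)
```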
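The FGLS steps summarized under "So, what did we do?" can be sketched as below, assuming NumPy, a hypothetical `fgls` helper, and simulated data generated with Var(u | x) = exp(0.5 + 0.8x). Dividing every column, including the constant, by $\sqrt{\hat h}$ and running OLS without a separate intercept is the same as WLS with weights $1/\hat h$, matching the cigarette example's description.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def fgls(y, X):
    """Feasible GLS assuming Var(u|x) = sigma^2 * exp(d0 + d1 x1 + ... + dk xk)."""
    _, e = ols(y, X)                     # 1. OLS residuals
    delta, _ = ols(np.log(e**2), X)      # 2. regress ln(e^2) on the same regressors
    h_hat = np.exp(X @ delta)            # 3. h_hat = exp(g_hat)
    sw = np.sqrt(1 / h_hat)              # 4. weight 1/h_hat, i.e. divide rows by sqrt(h_hat)
    beta_fgls, _ = ols(y * sw, X * sw[:, None])
    return beta_fgls, delta


rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0, 4, size=n)
u = np.exp((0.5 + 0.8 * x) / 2) * rng.normal(size=n)   # Var(u|x) = exp(0.5 + 0.8 x)
y = 1.0 + 2.0 * x + u
X = np.column_stack([np.ones(n), x])

beta_ols, _ = ols(y, X)
beta_fgls, delta = fgls(y, X)
print("OLS: ", beta_ols)
print("FGLS:", beta_fgls, " estimated d1:", delta[1])
```

The intercept of the ln(e²) regression absorbs σ² and the mean of ln(v), which is why only the slopes δ₁, …, δₖ are interpreted.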
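White's heteroskedasticity-robust standard errors replace $\hat\sigma^2 (X'X)^{-1}$ with $(X'X)^{-1}\left(\sum_i e_i^2 x_i x_i'\right)(X'X)^{-1}$. The sketch below computes this basic form (often called HC0) next to the usual standard errors, assuming NumPy and simulated data with Var(u | x) = x²; statistical packages may apply small-sample adjustments, so their numbers can differ slightly.

```python
import numpy as np


def ols_with_se(y, X):
    """OLS with usual and White (HC0) heteroskedasticity-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    n, k = X.shape
    usual = np.sqrt(np.diag((e @ e / (n - k)) * XtX_inv))
    meat = X.T @ (e[:, None] ** 2 * X)           # sum_i e_i^2 x_i x_i'
    robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, usual, robust


rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)       # Var(u | x) = x**2
X = np.column_stack([np.ones(n), x])
beta, usual_se, robust_se = ols_with_se(y, X)
t_robust = beta[1] / robust_se[1]                # robust t statistic for H0: beta1 = 0
print("beta:", beta)
print("usual s.e.:", usual_se, " robust s.e.:", robust_se)
```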