《多元数据分析》PPT课件.ppt

上传人:牧羊曲112 文档编号:5488915 上传时间:2023-07-12 格式:PPT 页数:25 大小:345.97KB
返回 下载 相关 举报
《多元数据分析》PPT课件.ppt_第1页
第1页 / 共25页
《多元数据分析》PPT课件.ppt_第2页
第2页 / 共25页
《多元数据分析》PPT课件.ppt_第3页
第3页 / 共25页
《多元数据分析》PPT课件.ppt_第4页
第4页 / 共25页
《多元数据分析》PPT课件.ppt_第5页
第5页 / 共25页
点击查看更多>>
资源描述

《《多元数据分析》PPT课件.ppt》由会员分享,可在线阅读,更多相关《《多元数据分析》PPT课件.ppt(25页珍藏版)》请在三一办公上搜索。

1、多元数据分析,刘国庆,Question?,How to determine a fair market value for the property?In building the model,the goal is to predict the market value of the Leslie Salt property,using the information available.,Data,LeslieSaltData.txt,Regression Analysis,It is used to explore the relationship between a set of in

2、dependent variables(Xs)and a single dependent variable(Y).A regression model is a linear combination of independent variables that corresponds as possible to the dependent variable.,Regression for answering questions,How can we describe the relationship between the dependent variable and the indepen

3、dent variables?Is the relationship described by the model statistically significant?Which independent variables are most important?How well does the model generalize to observations outside the sample?,资料预处理,大多数的销售价格低于$10,000,但是有一些价格甚至高于$20,000 or$30,000每亩;一般地,这种价值的差别会带来问题。为此,采用log函数。一些特殊的点需要排除,Vari

4、ables,DescriptionVariables.txt,Correlation Matrix,CorrMatData.txt,分析,从表3中可以看出与log(Price)高度关联的变量是Elevation,Sewer,Date,Flood.从表格中我们也可以看到在独立的变量之间也存在着相关性。比如,Elevation,County.,回归模型,Results of Regression,ResultsRegress.txt,How good is the Fit?,How good is the Fit(continue I),One of the drawbacks of R2 is

5、that whenever an independent variable is added to the model it always increases,no matter how small the contribution in fit.In general,when building models,one wants to make a trade-off between parsimony and improvement in fit.,How good is the Fit(continue II),Is it Significant?,Test the error terms

6、 are normally distributed;that is Testing the model,F-Test,FTest.txtThe critical value for the F-distribution at the 0.01 level is F(4,26)=4.14,which suggests that our model is highly significant.,Detecting problems with the model,One measure of multicollinearity is called the condition index(CI),He

7、teroscedasticity(异方差性),We have assumed that the error terms all have the same variance.This assumption is sometimes called homoscedasticity.When the assumption is violated(i.e.,the variances are not all the same),we have what is called heteroscedasticity.,Weighted least squares(WLS),Influential Obse

8、rvations,Sometimes regression model results can be inordinately influenced by one or a few observations in the data.This type of observation as an outlier.,DFBETAs,We can look at the impact of the observation I on the parameter estimate.,Return to the Leslie model,The observation 2 seems the most ex

9、treme with respect to its lack of fit.Having identified the second observation as a marginally influential observation and a potential outlier,what should we do about it.One thing to remember is that it is never appropriate to discard an observation simply because it seems atypical.,Comparing Models,Forecasting,Model Validation,

展开阅读全文
相关资源
猜你喜欢
相关搜索

当前位置:首页 > 生活休闲 > 在线阅读


备案号:宁ICP备20000045号-2

经营许可证:宁B2-20210002

宁公网安备 64010402000987号