Chapter 5: Data Collection and Sampling

Recall
Statistics is a tool for converting data into information. But:
- Where, then, does data come from?
- How is it gathered?
- How do we ensure it is accurate (正確)?
- Is the data reliable (可靠)?
- Is it representative (代表性) of the population from which it was drawn?
This chapter explores some of these issues.

5.1 Methods of Collecting Data
The reliability and accuracy of the data affect the validity of the results of a statistical analysis. The reliability and accuracy of the data depend on the method of collection. Four of the most popular sources of statistical data are:
- Published data (公開資料)
- Observational studies (觀察)
- Experimental studies (實驗)
- Surveys (調查)

Published Data
Published data is often a preferred source because of its low cost and convenience. It is found as printed material, tapes, disks, and on the Internet.
- Data published by the organization that collected it is called PRIMARY DATA (初級資料). For example: data published by the US Bureau of the Census.
- Data published by an organization different from the one that collected it is called SECONDARY DATA (次級資料). For example: the Statistical Abstract of the United States compiles data from primary sources; Compustat sells a variety of financial data tapes compiled from primary sources.

Observational and Experimental Studies
When published data is unavailable, one needs to conduct a study to generate the data.
- An observational study is one in which measurements representing a variable of interest are observed and recorded without controlling any factor that might influence their values.
- An experimental study is one in which measurements representing a variable of interest are observed and recorded while controlling factors (控制變數) that might influence their values.

Surveys
Surveys solicit information from people, e.g. pre-election polls and marketing surveys. The response rate (回收率), i.e. the proportion of all people selected who complete the survey, is a key survey parameter. Surveys can be conducted by means of:
- personal interview (個人訪談)
- telephone interview (電話訪談)
- self-administered questionnaire (自我管理調查)
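The response rate defined on the Surveys slide is a simple proportion. A minimal Python sketch follows; the function name `response_rate` and the counts used are illustrative assumptions, not figures from the slides.

```python
def response_rate(selected: int, completed: int) -> float:
    """Proportion of all people selected who completed the survey."""
    if selected <= 0:
        raise ValueError("selected must be positive")
    if not 0 <= completed <= selected:
        raise ValueError("completed must be between 0 and selected")
    return completed / selected


# Hypothetical mail survey: 1,200 questionnaires sent, 300 returned complete.
rate = response_rate(selected=1200, completed=300)
print(f"Response rate: {rate:.1%}")  # prints "Response rate: 25.0%"
```

A low rate does not by itself measure bias, but it signals how much of the selected sample is missing from the results.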
Questionnaire Design (問卷設計)
Key design principles of a good questionnaire:
- Keep the questionnaire as short as possible.
- Ask short, simple, and clearly worded questions.
- Start with demographic questions to help respondents get started comfortably.
- Use dichotomous (yes/no) and multiple-choice questions.
- Use open-ended questions cautiously.
- Avoid leading questions.
- Pretest the questionnaire on a small number of people.
- Think about how you intend to use the collected data when preparing the questionnaire.

5.2 Sampling (抽樣)
Recall that statistical inference permits us to draw conclusions about a population based on a sample.
Motivation for conducting a sampling procedure:
- Cost (e.g. it is less expensive to sample 1,000 television viewers than 100 million TV viewers).
- Population size.
- The possibly destructive nature (破壞性) of the sampling process (e.g. performing a crash test on every automobile produced is impractical).
The sampled population (抽樣母體) and the target population (目標母體) should be similar to one another.

5.3 Sampling Plans
A sampling plan is a method or procedure for specifying how a sample will be taken from a population. We will focus on these three methods:
- Simple random sampling (簡單隨機抽樣)
- Stratified random sampling (分層隨機抽樣)
- Cluster sampling (集群抽樣)

Simple Random Sampling
In simple random sampling, all samples of the same size are equally likely to be chosen.
To conduct random sampling:
- Assign a number to each element of the chosen population (or use already given numbers).
- Randomly select the sample numbers (members), using a random numbers table or a software package.

Example 5.1: A government income-tax auditor is responsible for 1,000 tax returns. The auditor will randomly select 40 returns to audit. Use Excel's random number generator to select the returns.
Solution: We generate 50 numbers between 1 and 1,000. We need only 40 numbers, but the extras may be used if duplicate random numbers are generated.
[Table: generated random numbers, with columns "X(100)" and "Round-up"; the rounded-up values begin 383, 101, ...]
The auditor should select the 40 files numbered 383, 101, ...

Stratified Random Sampling
This sampling procedure separates the population into mutually exclusive sets, or strata (互斥的層別), and then draws simple random samples from each stratum.
With this procedure we can acquire information about:
- the whole population
- each stratum
- the relationships among strata
After the population has been stratified, we can use simple random sampling within the strata to generate the complete sample. For example, keep each stratum's share of the sample equal to its proportion in the population.

Cluster Sampling
Cluster sampling is a simple random sample of groups, or clusters, of elements. This procedure is useful when:
- it is difficult and costly to develop a complete list of the population members (making it difficult to develop a simple random sampling procedure);
- the population members are widely dispersed geographically.
Cluster sampling may increase sampling error (抽樣誤差) because of probable similarities among cluster members.

Sample Size (樣本數)
Numerical techniques for determining sample sizes will be described later, but suffice it to say that the larger the sample size, the more accurate we can expect the sample estimates to be.

5.4 Sampling and Non-Sampling Errors
Two major types of error can arise when a sample of observations is taken from a population:
- Sampling error (抽樣誤差) refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
- Nonsampling errors (非抽樣誤差) are more serious and are due to mistakes made in the acquisition of data or to the sample observations being selected improperly.

Sampling Error
Another way to look at sampling error: the differences in results for different samples (of the same size) are due to sampling error.
E.g. take two samples of size 10 from 1,000 households. If we happened to get the highest income levels in the first sample and the lowest income levels in the second, the difference between them is due to sampling error.
Increasing the sample size will reduce this type of error.
[Figure: population income distribution, marking μ (the population mean) and the sampling error.]

Nonsampling Error
Three types of nonsampling errors:
- Errors in data acquisition
- Nonresponse errors (無反應偏差)
- Selection bias (取樣誤差)
Note: increasing the sample size will not reduce this type of error.

Errors in Data Acquisition
Errors in data acquisition arise from the recording of incorrect responses, due to:
- incorrect measurements being taken because of faulty equipment,
- mistakes made during transcription from primary sources,
- inaccurate recording of data due to misinterpretation of terms, or
- inaccurate responses to questions concerning sensitive issues.
[Figure: population and sample; the observed difference is sampling error plus data acquisition error.]

Nonresponse Error
Nonresponse error refers to error (or bias) introduced when responses are not obtained from some members of the sample, i.e. the sample observations that are collected may not be representative of the target population.
As mentioned earlier, the response rate (i.e. the proportion of all people selected who complete the survey) is a key survey parameter. It helps in understanding the validity of the survey and the sources of nonresponse error.
[Figure: population and sample; no response from part of the sample may lead to biased results.]

Selection Bias
Selection bias occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample.
[Figure: population and sample; when parts of the population cannot be selected, the sample cannot represent the whole population.]
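The contrast drawn in Section 5.4, that a larger sample reduces sampling error but not nonsampling error, can be illustrated with a small simulation. The Python sketch below rests on assumptions that are not in the slides: the population of 1,000 household incomes is synthetic (lognormal), the biased sampling frame arbitrarily leaves out the 200 highest incomes to mimic selection bias, and names such as `mean_abs_error` are hypothetical.

```python
import random
import statistics


def mean_abs_error(population, frame, n, trials, rng):
    """Average |sample mean - population mean| over repeated simple
    random samples of size n drawn (without replacement) from frame."""
    mu = statistics.fmean(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(frame, n)
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)


rng = random.Random(5)  # fixed seed so the run is repeatable

# Assumption: synthetic, right-skewed incomes for 1,000 households.
population = [rng.lognormvariate(10.5, 0.6) for _ in range(1000)]

# Assumption: selection bias modeled as a sampling frame from which the
# 200 highest-income households can never be selected.
biased_frame = sorted(population)[:800]

results = {}
for n in (10, 100, 400):
    full_err = mean_abs_error(population, population, n, 500, rng)
    biased_err = mean_abs_error(population, biased_frame, n, 500, rng)
    results[n] = (full_err, biased_err)
    print(f"n={n:>3}  full frame: {full_err:>9.0f}  biased frame: {biased_err:>9.0f}")
```

Under these assumptions, the error from the full frame should shrink as n grows, while the error from the biased frame should level off near the gap between the frame mean and the population mean.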