
Uploaded by: 牧羊曲112 | Document ID: 1325532 | Upload date: 2022-11-09 | Format: PPT | Pages: 35 | Size: 195.50 KB


Basic Statistics

Types of Data

Variable and Attribute Data
- Quantitative data are called variable data. These are measured data that answer questions such as "how long", "what volume" and "how much time".
- Qualitative data are called attribute data. These are counted data that answer questions such as "how many" or "how often".
- Example: traveling to work. Commute time is variable data; the record of late arrivals is attribute data.

[Diagram: Data splits into Variable and Attribute; the leaf categories are Discrete, Continuous, Ordinal and Nominal.]

Continuous and Discrete Data

Continuous variable
- Measured on a scale that is infinitely divisible (decimals, a continuum).
- Data are obtained by measuring.
- Data can be displayed in a trend chart.
- E.g. temperature, time and inner diameter (ID).

Discrete variable
- Cannot be plotted on an infinitely divisible scale.
- Data are obtained by counting.
- Data can be displayed in a bar chart.
- Subdivisions are not meaningful, e.g. 1.5 defects or 1.5 part failures.

Nominal and Ordinal Data

Nominal
- Categorical, e.g. male and female, or on time and late.

Ordinal
- Ranked, e.g. a rating from 1 to 10 in which 10 indicates the higher grading.

Another example: product defects are tabulated as follows: A1 16 B132 C942

Statistics

Statistics is the branch of science that deals with the collection, presentation, analysis and interpretation of data for the purpose of decision-making and problem-solving.

Statistics is a critical skill in quality improvement, as statistical techniques can be used to describe and to understand variability.

Population vs Sample
- Population: the entire set of measurements of interest.
- Sample: a subset of data from the population.
- Parameters: numerical measures of a population.
- Statistics: numerical measures of a sample.

[Figure: Population and Sample]

Parameters vs Statistics

Measure               Parameter    Statistic
Mean                  μ            x̄
Variance              σ²           s²
Standard deviation    σ            s

Numerical Measures

Numerical measures describe the characteristics of the data set. Key numerical measures:
- Measures of location (central tendency)
- Measures of dispersion (variation)
- Measures of shape (distribution)

Measures of Location
- Mean
- Median
- Mode

Mean

The mean is the average of the observations in a sample of size n.

Example: heights of the 10 employees in the SSI room
Mean, $\bar{x}$ = (1.65 + 1.68 + 1.71 + 1.65 + 1.67 + 1.65 + 1.68 + 1.62 + 1.60 + 1.65)/10 = 16.56/10 = 1.656

Median

The median is the middle value in a set of data points sorted in ascending or descending order. If there is an even number of data points, the median is halfway between the two middle data points.

The advantage of the median is that it is not influenced very much by very high or very low values.

1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Median = (1.65 + 1.66)/2 = 1.655

Mean vs Median: Example 1

If the sample observations are 1 3 4 2 7 8 6, the sample mean and median are 4.4 and 4 respectively. Both quantities give a reasonable measure of the central tendency of the data.

If the last observation is changed so that the data are 1 3 4 2 7 8 2450, the sample mean is 353.6 while the sample median remains unchanged.

Mode

The mode is the value that occurs most frequently in a set of data points. The mode may be unique, or there may be more than one mode. Sometimes the mode does not exist.

1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Mode = 1.65

Like the median, the mode is not influenced much by very high or very low values.

Mode: Example 2
- If the sample observations are 3 6 9 3 5 8 3 4 6 3 1 10, the sample mode is 3, since it occurs four times.
- If the sample observations are 3 6 9 3 5 8 3 4 6 3 1 10 6 2 5 6, the sample modes are 3 and 6, since they both occur four times.
- If the sample observations are 1 3 4 2 7 6 8, the sample mode does not exist.
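
The three measures of location above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `modes` and `location_summary` are illustrative and not part of the original slides. Note that the mean calculation in the text lists 1.65 four times, while the sorted list used for the median and mode contains 1.66; the sketch uses the sorted list, so its mean comes out as 1.657 rather than 1.656.

```python
from collections import Counter
from statistics import mean, median


def modes(data):
    """Return every value tied for the highest count, or [] if no value repeats."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: the mode does not exist
    return sorted(value for value, count in counts.items() if count == top)


def location_summary(data):
    """Return the mean, median and mode(s) of a sample as a dict."""
    return {"mean": mean(data), "median": median(data), "modes": modes(data)}


# Heights (m) as given in the sorted list of the median and mode examples.
heights = [1.60, 1.62, 1.65, 1.65, 1.65, 1.66, 1.67, 1.68, 1.68, 1.71]

# Mean vs Median, Example 1: one extreme value moves the mean, not the median.
example_1 = [1, 3, 4, 2, 7, 8, 6]
with_outlier = [1, 3, 4, 2, 7, 8, 2450]

if __name__ == "__main__":
    for label, data in [("heights", heights),
                        ("example 1", example_1),
                        ("with outlier", with_outlier)]:
        summary = location_summary(data)
        print(f"{label}: mean={summary['mean']:.3f} "
              f"median={summary['median']} modes={summary['modes']}")
```

Returning an empty list when no value repeats mirrors the "mode does not exist" case in Example 2, and returning a list allows for more than one mode.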

Measures of Dispersion
- Range
- Variance
- Standard deviation

1) Range

The range is the difference between the largest and the smallest sample observations.

Example: heights of 10 employees in SSI
1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71
Max value = 1.71, min value = 1.60
Range = 1.71 - 1.60 = 0.11

Range: Information Loss

Consider the two samples 1, 3, 5, 8, 9 and 1, 5, 5, 5, 9. Both have the same range (r = 8). However, in the second sample there is variability only in the two extreme values, while in the first sample the middle values vary considerably.

When the sample is small (n < 10), the information loss associated with the range is not too serious.

2) Variance & Standard Deviation

The standard deviation describes how closely the data points are clustered around the mean value of a data set:

$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Data: 1.60 1.62 1.65 1.65 1.65 1.66 1.67 1.68 1.68 1.71, with mean $\bar{x} = 1.656$ m

x       x - mean              (x - mean)^2
1.60    1.60 - 1.656 = -0.056    (-0.056)^2
1.62    1.62 - 1.656 = -0.036    (-0.036)^2
1.65    1.65 - 1.656 = -0.006    (-0.006)^2
1.65    1.65 - 1.656 = -0.006    (-0.006)^2
1.65    1.65 - 1.656 = -0.006    (-0.006)^2
1.66    1.66 - 1.656 =  0.004    (0.004)^2
1.67    1.67 - 1.656 =  0.014    (0.014)^2
1.68    1.68 - 1.656 =  0.024    (0.024)^2
1.68    1.68 - 1.656 =  0.024    (0.024)^2
1.71    1.71 - 1.656 =  0.054    (0.054)^2

Sum of squared deviations: $\sum (x - \bar{x})^2 = 0.00882$
Standard deviation: $s = \sqrt{0.00882/(10 - 1)} \approx 0.0313$
Variance: $s^2 \approx 0.00098$
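
The range, variance and standard deviation calculations above can be reproduced with a short Python sketch. It assumes the sample (n - 1) form of the variance shown in the formula; the function name `dispersion` is illustrative. It is applied to the heights and to the two samples from the range discussion (1, 3, 5, 8, 9 and 1, 5, 5, 5, 9), which share a range of 8 but differ in spread.

```python
import math


def dispersion(data):
    """Return the range, sample variance (n - 1 divisor) and standard deviation."""
    n = len(data)
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    variance = squared_deviations / (n - 1)
    return {
        "range": max(data) - min(data),
        "variance": variance,
        "std_dev": math.sqrt(variance),
    }


heights = [1.60, 1.62, 1.65, 1.65, 1.65, 1.66, 1.67, 1.68, 1.68, 1.71]
sample_a = [1, 3, 5, 8, 9]
sample_b = [1, 5, 5, 5, 9]

if __name__ == "__main__":
    # heights:  range about 0.11, std dev about 0.0313
    # sample A: range 8, variance about 11.2, std dev about 3.35
    # sample B: range 8, variance 8, std dev about 2.83
    for label, data in [("heights", heights),
                        ("sample A", sample_a),
                        ("sample B", sample_b)]:
        d = dispersion(data)
        print(f"{label}: range={d['range']:.2f} "
              f"variance={d['variance']:.6f} std_dev={d['std_dev']:.4f}")
```

The two five-point samples have the same range, yet the standard deviation separates them, which is the information the range loses.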

Measures of Dispersion: Example 3

Variance & Standard Deviation

For the two samples quoted earlier:
Sample A: 1, 3, 5, 8, 9
Sample B: 1, 5, 5, 5, 9

Measures of Shape
- Skewness
- Kurtosis

Skewness

The degree of asymmetry of a distribution around its mean is referred to as its skewness.
- Positive skewness implies a distribution with an asymmetric tail extending towards higher values. It is sometimes referred to as right-handed skew.
- Negative skewness implies a distribution with an asymmetric tail extending towards lower values. It is sometimes referred to as left-handed skew.

Kurtosis

Kurtosis characterizes the relative peakedness or flatness of a distribution compared to a normal (mesokurtic) distribution.
- Positive kurtosis indicates a relatively peaked (leptokurtic) distribution compared to the normal distribution.
- Negative kurtosis indicates a relatively flat (platykurtic) distribution compared to the normal distribution.
- Kurtosis is relevant only for symmetrical distributions.

Graphical Presentations

Graphical tools give a visual interpretation of the data set. Common graphical tools:
- Scatter plot (X/Y plot)
- Pareto chart
- Time series diagram
- Control charts
- Dot plot
- Histogram
- Stem & leaf diagram

1) X/Y Plots (Scatter Plot)
- Shows the relationship between two variables.

[Figure: three example scatter plots with x and y axes]

2) Pareto Diagram
- Shows the relative importance of problems or causes.
- Helps to focus on the priority issues for improvement.
- Cut-off mark at 80%.

3) Time Series Diagram
- A line graph that shows data in time sequence.
- The X-axis represents time.
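
The 80% cut-off described for the Pareto diagram can be sketched as follows. The defect categories and counts below are hypothetical, made up purely for illustration, and `pareto_table` is an illustrative helper name. The idea is to rank categories by count, accumulate their share of the total, and keep the categories needed to reach the cut-off.

```python
def pareto_table(counts, cutoff=0.80):
    """Rank categories by count and flag those needed to reach the cutoff share.

    Returns (rows, vital_few): rows are (category, count, cumulative_share)
    tuples in descending order of count.
    """
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows, vital_few, running = [], [], 0
    for category, count in ranked:
        # A category is a priority if the cutoff was not yet reached before it.
        if running / total < cutoff:
            vital_few.append(category)
        running += count
        rows.append((category, count, running / total))
    return rows, vital_few


# Hypothetical defect counts (not from the original slides).
defects = {
    "scratch": 42,
    "misalignment": 27,
    "contamination": 15,
    "crack": 9,
    "discoloration": 5,
    "other": 2,
}

if __name__ == "__main__":
    rows, vital_few = pareto_table(defects)
    for category, count, share in rows:
        print(f"{category:<15}{count:>4}{share:>8.0%}")
    print("priority issues:", vital_few)
```

With these made-up counts, the first three categories account for 84% of all defects, so they are the ones flagged as priorities.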

4) Control Charts
- Similar to a time series plot, except that an upper control limit (UCL) and a lower control limit (LCL) are added to the chart.
- Used to monitor process performance, including after an improvement.

[Figure: control chart showing the average (AVG), UCL and LCL against time]

5) Dot Plot

The pull strengths of bonding wires from two suppliers are shown below:
A: 16.85 16.40 17.21 16.35 16.52 17.04 16.96 17.15 16.59 16.57
B: 17.50 17.63 18.25 18.00 17.86 17.75 18.22 17.90 17.96 18.15

The dot plot reveals that wires from supplier A seem to result in lower pull strength, but the variability within both groups is about the same.

6) Histogram

Sample size = 100 units
a) Bins = 5, width = 40
b) Bins = 9, width = 20
c) Bins = 18, width = 10

7) Stem & Leaf Diagram

Consider the following set of observed data:
95 71 70 75 92 83 81 77 88 84 84 79 65 78 87 71 64 88 66 63 78 73 61 65 93

One line per stem:
Stem | Leaf
6  | 1 3 4 5 5 6
7  | 0 1 1 3 5 7 8 8 9
8  | 1 3 4 4 7 8 8
9  | 2 3 5

Two lines per stem (* marks leaves 5 to 9):
Stem | Leaf
6  | 1 3 4
6* | 5 5 6
7  | 0 1 1 3
7* | 5 7 8 8 9
8  | 1 3 4 4
8* | 7 8 8
9  | 2 3
9* | 5

Five lines per stem (t = leaves 2 to 3, f = 4 to 5, s = 6 to 7, * = 8 to 9):
Stem | Leaf
6  | 1
6t | 3
6f | 4 5 5
6s | 6
6* |
7  | 0 1 1
7t | 3
7f | 5
7s | 7
7* | 8 8 9
8  | 1
8t | 3
8f | 4 4
8s | 7
8* | 8 8
9  |
9t | 2 3
9f | 5
9s |
9* |

Note: The stem & leaf diagram can serve as a crude histogram without losing the details of the information. However, excessive stems can lead to a loss of information.
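
The stem & leaf displays for the observed data can be generated with a short Python sketch. It assumes two-digit positive integers and supports one, two or five lines per stem, using the suffixes that appear in the slide (`*`, `t`, `f`, `s`); the function name `stem_and_leaf` is illustrative.

```python
def stem_and_leaf(data, lines_per_stem=1):
    """Return the rows of a stem & leaf display for two-digit integers.

    lines_per_stem must be 1, 2 or 5; each stem is split into that many rows.
    """
    suffixes = {1: [""], 2: ["", "*"], 5: ["", "t", "f", "s", "*"]}[lines_per_stem]
    leaves_per_row = 10 // lines_per_stem  # how many leaf digits share a row
    rows = {}
    for value in sorted(data):
        stem, leaf = divmod(value, 10)
        rows.setdefault((stem, leaf // leaves_per_row), []).append(str(leaf))
    lines = []
    for stem in range(min(data) // 10, max(data) // 10 + 1):
        for part, suffix in enumerate(suffixes):
            leaves = " ".join(rows.get((stem, part), []))
            # Pad the suffix to one character so the bars line up.
            lines.append(f"{stem}{suffix:<1} | {leaves}".rstrip())
    return lines


observed = [95, 71, 70, 75, 92, 83, 81, 77, 88, 84, 84, 79, 65,
            78, 87, 71, 64, 88, 66, 63, 78, 73, 61, 65, 93]

if __name__ == "__main__":
    for split in (1, 2, 5):
        print(f"{split} line(s) per stem")
        print("\n".join(stem_and_leaf(observed, split)))
        print()
```

Every observation stays visible as a leaf, which is why the display loses no detail compared with a histogram of the same data.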

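The supplier comparison in the Dot Plot section can also be checked numerically. The sketch below is a rough text approximation of a dot plot, not the slide's actual figure: the axis limits (16.3 to 18.3), the 0.1 bin step and the helper name `dot_row` are assumptions chosen for illustration. It prints one row per supplier on a shared axis, followed by the mean and sample standard deviation of each group.

```python
from statistics import mean, stdev


def dot_row(values, low, high, step=0.1):
    """Text dot plot: one character per bin, '.' if empty, else the count."""
    n_bins = round((high - low) / step) + 1
    counts = [0] * n_bins
    for v in values:
        counts[round((v - low) / step)] += 1  # nearest bin on the shared axis
    return "".join(str(c) if c else "." for c in counts)


supplier_a = [16.85, 16.40, 17.21, 16.35, 16.52, 17.04, 16.96, 17.15, 16.59, 16.57]
supplier_b = [17.50, 17.63, 18.25, 18.00, 17.86, 17.75, 18.22, 17.90, 17.96, 18.15]

if __name__ == "__main__":
    low, high = 16.3, 18.3  # assumed axis limits covering both suppliers
    for label, values in [("A", supplier_a), ("B", supplier_b)]:
        print(f"{label} {dot_row(values, low, high)} "
              f"mean={mean(values):.3f} std_dev={stdev(values):.3f}")
    # A: mean about 16.76, std dev about 0.32
    # B: mean about 17.92, std dev about 0.25
```

The means differ by more than one unit of pull strength, while the two standard deviations are of similar size, in line with the reading of the dot plot given in the text.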
