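Ming-chi Chen, Social Statistics, Page 1
Stata Tutorial, Lecture 4: Comparing Two Samples

Page 2
Open 85q1family.dta, the Stata data file for the family questionnaire of the Social Change Survey (phase 3, wave 2).
Because of Chinese encoding incompatibilities, some text shows up as garbled characters and is hard to read.
Open 85q1_format.txt to look up the variable names and value labels.
Take j2 and j3 as examples.
j2 asks the respondent (section 10, item 2): "On average, about how many hours per week do you usually spend on housework? ___ hours"
j3 asks the respondent (section 10, item 3): "On average, about how many hours per week does your spouse usually spend on housework? ___ hours"

Page 3
The data set contains variable labels, but the encoding problem garbles some of them.
To check for garbled labels: Data > Data Editor.
Click once on the variable name j2; the whole column of values below it is highlighted.
Right-click > Variable > Properties > Label.
The label displayed is 通常您平均牢週大約花多少時間做家務工作, in which 牢 is a garbled 每.
Correct the garbled character, and fix the variable label of j3 in the same way.

Page 4
Checking variables for anomalous values
Close the Data Editor window.
Use a box plot to look for extreme values:
Graphics > Easy graphs > Box plot > Main > type j2 in the Variable box.

Page 5
Using a box plot to look for extreme values (figure: box plot of j2)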
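A box plot draws as separate points the values that lie more than 1.5 interquartile ranges beyond the quartiles. The Python sketch below applies that rule to a small made-up list of weekly hours. The data and the function name `outside_fences` are illustrative and are not taken from 85q1family.dta, and Python's quartile convention may differ slightly from Stata's.

```python
import statistics

def outside_fences(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical weekly housework hours; 998 mimics a "not applicable" code.
hours = [0, 2, 5, 7, 7, 10, 14, 20, 35, 998]
flagged = outside_fences(hours)
print(flagged)
```

In this made-up sample only the non-answer code is flagged, which is the kind of point the box plot of j2 makes visible.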
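Page 6
The same method can be used to check j3 for extreme values.
The command can also be typed directly into the Command window.

Page 7
This is the Command window (figure: screenshot of the Stata Command window).

Page 8
Type the following directly into the Command window and press Enter:
graph box j2

Page 9
summarize varname, detail
In the Command window type
summarize j2, detail
or use Statistics > Summaries, tables, and tests > Summary statistics > Summary statistics.
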
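Page 10
Variable label: On average, about how many hours per week do you usually spend on housework?
      Percentiles   Smallest
 1%         0          0
 5%         0          0
10%         0          0        Obs              1924
25%         2          0        Sum of Wgt.      1924
50%         7                   Mean         50.32692
                    Largest     Std. Dev.    191.1342
75%        20        998
90%        35        998        Variance     36532.28
95%        70        998        Skewness     4.717707
99%       996        999        Kurtosis     23.40378
Slide notes: "Someone really loves housework!" (largest values) and "unreasonably high" (mean).

Page 11
Recoding extreme values
85q1_format.txt shows that for J2 and J3 the codes are 996 = don't know, 998 = not applicable, 999 = refused.
Values of 995 and above therefore have to be set to system missing:
recode j2 (995/max = .)
The period (.) is Stata's system missing value.

Page 12
. summarize j2, detail
Variable label: On average, about how many hours per week do you usually spend on housework?
      Percentiles   Smallest
 1%         0          0
 5%         0          0
10%         0          0        Obs              1849
25%         2          0        Sum of Wgt.      1849
50%         7                   Mean         11.96106
                    Largest     Std. Dev.    15.30762
75%        15        105
90%        28        112        Variance     234.3232
95%        36        168        Skewness     3.208555
99%        70        168        Kurtosis     20.90302
A week has only 168 hours, so the largest values should be converted to something plausible: at 16 waking hours a day, a week has 112 hours.

Page 13
. inspect j2
j2: On average, about how many hours per week do you usually spend on housework
                 Number of Observations
                 Total   Integers   Nonintegers
  Negative           -          -             -
  Zero             305        305             -
  Positive        1544       1544             -
  Total           1849       1849             -
  Missing           75
                  1924
  Range 0 to 168 (47 unique values)
Use inspect to see the rough distribution and the number of missing cases:
Data > Describe data > Inspect variables.

Page 14
recode j2 (168 = 112)

Page 15
. inspect j2
j2: On average, about how many hours per week do you usually spend on housework
                 Number of Observations
                 Total   Integers   Nonintegers
  Negative           -          -             -
  Zero             305        305             -
  Positive        1544       1544             -
  Total           1849       1849             -
  Missing           75
                  1924
  Range 0 to 112 (46 unique values)

Page 16
. sum j2, detail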
Variable label: On average, about how many hours per week do you usually spend on housework?
      Percentiles   Smallest
 1%         0          0
 5%         0          0
10%         0          0        Obs              1849
25%         2          0        Sum of Wgt.      1849
50%         7                   Mean         11.90049
                    Largest     Std. Dev.    14.79188
75%        15        105
90%        28        112        Variance     218.7996
95%        36        112        Skewness     2.632377
99%        70        112        Kurtosis     12.87359

Page 17
. inspect j3
j3: On average, about how many hours per week does your spouse usually spend on housework
                 Number of Observations
                 Total   Integers   Nonintegers
  Negative           -          -             -
  Zero             263        263             -
  Positive        1661       1661             -
  Total           1924       1924             -
  Missing            -
                  1924
  Range 0 to 999 (54 unique values)

Page 18
. summarize j3, detail
Variable label: On average, about how many hours per week does your spouse usually spend on housework?
      Percentiles   Smallest
 1%         0          0
 5%         0          0
10%         0          0        Obs              1924
25%         4          0        Sum of Wgt.      1924
50%        14                   Mean         278.8342
                    Largest     Std. Dev.    436.2336
75%       996        998
90%       998        999        Variance     190299.7
95%       998        999        Skewness      1.03888
99%       998        999        Kurtosis     2.085666

Page 19
Missing values and recode
recode j3 (990/max = .)
recode j3 (168 = 112)

Page 20
. recode j3 (168 = 112)
(j3: 4 changes made)
. inspect j3
j3: On average, about how many hours per week does your spouse usually spend on housework
                 Number of Observations
                 Total   Integers   Nonintegers
  Negative           -          -             -
  Zero             263        263             -
  Positive        1144       1144             -
  Total           1407       1407             -
  Missing          517
                  1924
  Range 0 to 150 (50 unique values)

Page 21
. summarize j3, detail
Variable label: On average, about how many hours per week does your spouse usually spend on housework?
      Percentiles   Smallest
 1%         0          0
 5%         0          0
10%         0          0        Obs              1407
25%         2          0        Sum of Wgt.      1407
50%         7                   Mean         14.49893
                    Largest     Std. Dev.     18.2296
75%        21        112
90%        35        112        Variance     332.3185
95%        49        150        Skewness     2.569526
99%        85        150        Kurtosis     12.65059

Page 22
The maximum is still 150, above the 112-hour ceiling, so cap everything from 112 upward and tabulate the result:
recode j3 (112/max = 112)
tabulate j3

Page 23
Last rows of the tabulate j3 output:
     j3 |   Freq.   Percent     Cum.
     70 |      10      0.71    98.29
     80 |       3      0.21    98.51
     84 |       6      0.43    98.93
     85 |       1      0.07    99.00
     90 |       1      0.07    99.08
     98 |       4      0.28    99.36
    100 |       1      0.07    99.43
    105 |       1      0.07    99.50
    112 |       7      0.50   100.00
  Total |   1,407    100.00

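The recoding applied to j2 and j3 (non-answer codes set to missing, then implausibly long weeks capped at 112 hours) can be mirrored outside Stata. The following is a minimal Python sketch, assuming `None` stands in for Stata's system missing value and using the 995 threshold given for j2. The function name `clean_hours` and the sample list are hypothetical, not survey data.

```python
MISSING_FROM = 995    # codes such as 996, 998, 999 are non-answers
WEEKLY_CAP = 16 * 7   # 16 waking hours a day for 7 days = 112 hours

def clean_hours(value):
    """Mimic `recode x (995/max = .)` followed by `recode x (112/max = 112)`."""
    if value is None or value >= MISSING_FROM:
        return None               # treated as missing
    return min(value, WEEKLY_CAP)  # cap implausibly long weeks

# Hypothetical raw answers, including three non-answer codes.
raw = [0, 7, 105, 150, 168, 996, 998, 999]
cleaned = [clean_hours(v) for v in raw]
valid = [v for v in cleaned if v is not None]
print(cleaned)
print(sum(valid) / len(valid))  # mean over non-missing values only
```

The order matters: if the cap were applied first, the codes 996 to 999 would be turned into 112 and counted as real answers.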
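Page 24
Now look at the difference between men and women.
A1 is the respondent's sex: 1 = male, 2 = female.
Data > Data Editor > find the variable A1 > right-click > Variable > Properties > change Label to "sex".
Value label > Define/Modify > Define > enter the label name gender > OK
> Value: 1, Text: male > OK
> Value: 2, Text: female > OK
> Cancel > Close > under Value label choose gender > OK.
Close the Data Editor window.

Page 25
Do men and women differ in their share of housework?
Statistics > Summaries, tables, and tests > Tables > One/two-way table of summary statistics
In the dialog, enter the independent variable (a1) and the dependent variable (j2).

Page 26
Is the difference large?
            |  Summary of weekly hours spent on housework (j2)
        Sex |       Mean    Std. Dev.    Freq.
       Male |  6.0485537    10.23684       968
     Female |  18.330306   16.287017       881
      Total |  11.900487   14.791877      1849

Page 27
Population variances unknown but assumed equal
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group mean comparison tests
In the dialog, enter the dependent variable, the independent (grouping) variable, and the confidence level.

Page 28
. ttest j2, by(a1) level(99)
Two-sample t test with equal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   968   6.048554    .3290245    10.23684    5.199367   6.897741
  Female |   881   18.33031    .5487235    16.28702    16.91382    19.7468
combined |  1849   11.90049    .3439971    14.79188    11.01349   12.78748
    diff |        -12.28175    .6268771               -13.89815  -10.66535
diff = mean(Male) - mean(Female)    t = -19.5920
Ho: diff = 0    degrees of freedom = 1847
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000

Page 29
Population variances unknown and not assumed equal
The method above assumes that the population variances are unknown but equal. Whatever the sample size, statistical software generally uses the t test.
What should be done if the population variances are unknown and unequal?

Page 30
Population variances unknown and not assumed equal
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group mean comparison tests
Tick "Unequal variances". The degrees of freedom then require a more involved calculation; tick the option for the approximation proposed by Welch.

Page 31
Difference between men's and women's housework hours when the population variances are unknown and unequal
. ttest j2, by(a1) unequal welch level(99)
Two-sample t test with unequal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   968   6.048554    .3290245    10.23684    5.199367   6.897741
  Female |   881   18.33031    .5487235    16.28702    16.91382    19.7468
combined |  1849   11.90049    .3439971    14.79188    11.01349   12.78748
    diff |        -12.28175    .6398083               -13.93195  -10.63155
diff = mean(Male) - mean(Female)    t = -19.1960
Ho: diff = 0    Welch's degrees of freedom = 1456.62
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000

Page 32
Testing whether the variances are equal
The slides call this Levene's test. The command used, sdtest, is the F test of the variance ratio; Levene's test proper is available in Stata as robvar.
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group variance comparison tests
In the dialog, enter the dependent variable and the independent (grouping) variable.

Page 33
Testing whether the variances are equal
. sdtest j2, by(a1) level(99)
Variance ratio test
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   968   6.048554    .3290245    10.23684    5.199367   6.897741
  Female |   881   18.33031    .5487235    16.28702    16.91382    19.7468
combined |  1849   11.90049    .3439971    14.79188    11.01349   12.78748
ratio = sd(Male) / sd(Female)    f = 0.3950
Ho: ratio = 1    degrees of freedom = 967, 880
Ha: ratio > 1: Pr(F > f) = 1.0000
sd(Male)/sd(Female) is not equal to 1. Because the ratio is below 1, the informative tail is the lower one: Pr(F > f) = 1.0000 implies Pr(F < f) = 0.0000, so the null hypothesis of equal variances is rejected.

Page 34
Given the result of the variance test, the unequal-variance assumption is the more appropriate one.
Under it, men spend significantly fewer hours on housework than women.

Page 35
Comparing the housework load of married and unmarried respondents
A5 is the respondent's marital status: 1 = unmarried, 2 = married, 3 = other.
Do married people carry a heavier housework load?

Page 36
Comparing the housework load of married and unmarried respondents
Copying the male/female comparison produces the following error:
. ttest j2, by(a5) level(99)
more than 2 groups found, only 2 allowed
r(420);
This happens because a5 takes three values: unmarried, married, and other.
Use an if condition to restrict the comparison to unmarried and married respondents.

Page 37
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group mean comparison tests

Page 38
Equal variances
. ttest j2 if a5!=3, by(a5) level(99)
Two-sample t test with equal variances
    Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
Unmarried |   306   5.598039    .5156249    9.019752    4.261516   6.934562
  Married |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
 combined |  1837   11.87262    .3434793     14.7216    10.98695   12.75828
     diff |        -7.528675    .9051995               -9.862742  -5.194608
diff = mean(Unmarried) - mean(Married)    t = -8.3171
Ho: diff = 0    degrees of freedom = 1835
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000

Page 39
Unequal variances
. ttest j2 if a5!=3, by(a5) unequal welch level(99)
Two-sample t test with unequal variances
    Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
Unmarried |   306   5.598039    .5156249    9.019752    4.261516   6.934562
  Married |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
 combined |  1837   11.87262    .3434793     14.7216    10.98695   12.75828
     diff |        -7.528675    .6472826                -9.20044   -5.85691
diff = mean(Unmarried) - mean(Married)    t = -11.6312
Ho: diff = 0    Welch's degrees of freedom = 712.885
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000
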
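The two t tests for unmarried and married respondents can be checked by hand from the group sizes, means, and standard deviations printed in the output. The Python sketch below assumes the standard pooled-variance formula for the equal-variance test and, for the `unequal welch` options, Welch's degrees-of-freedom formula as described in the Stata documentation. The function names are illustrative.

```python
import math

def pooled_t(n1, m1, s1, n2, m2, s2):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

def welch_t(n1, m1, s1, n2, m2, s2):
    """Unequal-variance t statistic with Welch's degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors per group
    se = math.sqrt(v1 + v2)
    df = -2 + (v1 + v2) ** 2 / (v1**2 / (n1 + 1) + v2**2 / (n2 + 1))
    return (m1 - m2) / se, df

# (Obs, Mean, Std. Dev.) copied from the ttest output for a5
unmarried = (306, 5.598039, 9.019752)
married = (1531, 13.12671, 15.31029)

t_eq, df_eq = pooled_t(*unmarried, *married)
t_w, df_w = welch_t(*unmarried, *married)
print(round(t_eq, 4), df_eq)
print(round(t_w, 4), round(df_w, 3))
```

The group means and the difference are the same in both tests. Only the standard error of the difference and the degrees of freedom change, which is why the two t statistics differ.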
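Page 40
Testing whether the variances are equal
. sdtest j2 if a5!=3, by(a5) level(99)
Variance ratio test
    Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
Unmarried |   306   5.598039    .5156249    9.019752    4.261516   6.934562
  Married |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
 combined |  1837   11.87262    .3434793     14.7216    10.98695   12.75828
ratio = sd(Unmarried) / sd(Married)    f = 0.3471
Ho: ratio = 1    degrees of freedom = 305, 1530
Ha: ratio > 1: Pr(F > f) = 1.0000
The ratio is far below 1. Pr(F > f) = 1.0000 is the upper-tail probability, so Pr(F < f) = 0.0000 and the null hypothesis of equal variances is rejected. The unequal-variance (Welch) t test is therefore the appropriate one for this comparison.

Page 41
Comparing groups within layers
Is there a difference between married men and married women? Between unmarried men and unmarried women?
Does marriage work to women's disadvantage, at least in terms of time spent on housework?

Page 42
Equal variances
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group mean comparison tests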
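For the variance-ratio output on Page 40, the F statistic is the squared ratio of the two standard deviations, and a two-sided p-value is twice the smaller tail probability. The short Python sketch below shows both steps. The function names are illustrative, and the tail probability is taken from the output as printed (rounded to four decimals) rather than recomputed from the F distribution.

```python
def variance_ratio(s1, s2):
    """F statistic of the variance-ratio test: (s1 / s2) squared."""
    return (s1 / s2) ** 2

def two_sided_from_upper(p_upper):
    """Two-sided p-value given the upper-tail probability Pr(F > f)."""
    p_lower = 1.0 - p_upper
    return 2.0 * min(p_lower, p_upper)

# Standard deviations of j2 for unmarried and married respondents
f = variance_ratio(9.019752, 15.31029)
p_two_sided = two_sided_from_upper(1.0000)
print(round(f, 4))
print(p_two_sided)
```

An upper-tail probability close to 1 is therefore strong evidence against equal variances when the ratio is below 1, not evidence in favor of them.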
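
Page 43
Layered comparison, equal variances
. by a5, sort: ttest j2 if a5!=3, by(a1) level(99)
-> a5 = Unmarried
Two-sample t test with equal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   177   5.316384    .7992975    10.63396    3.234972   7.397796
  Female |   129   5.984496    .5435252    6.173259    4.563295   7.405698
combined |   306   5.598039    .5156249    9.019752    4.261516   6.934562
    diff |        -.6681119     1.04519               -3.377347   2.041123
diff = mean(Male) - mean(Female)    t = -0.6392
Ho: diff = 0    degrees of freedom = 304
Ha: diff != 0: Pr(|T| > |t|) = 0.5232    Ha: diff > 0: Pr(T > t) = 0.7384

Page 44
Layered comparison, equal variances
-> a5 = Married
Two-sample t test with equal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   784   6.095663    .3493023    9.780465    5.193722   6.997605
  Female |   747   20.50602    .6054935    16.54893    18.94238   22.06967
combined |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
    diff |        -14.41036    .6909184               -16.19227  -12.62845
diff = mean(Male) - mean(Female)    t = -20.8568
Ho: diff = 0    degrees of freedom = 1529
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000

Page 45
Layered comparison, unequal variances
. by a5, sort: ttest j2 if a5!=3, by(a1) unequal welch level(99)
-> a5 = Unmarried
Two-sample t test with unequal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   177   5.316384    .7992975    10.63396    3.234972   7.397796
  Female |   129   5.984496    .5435252    6.173259    4.563295   7.405698
combined |   306   5.598039    .5156249    9.019752    4.261516   6.934562
    diff |        -.6681119      .96659               -3.174232   1.838008
diff = mean(Male) - mean(Female)    t = -0.6912
Ho: diff = 0    Welch's degrees of freedom = 292.466
Ha: diff != 0: Pr(|T| > |t|) = 0.4900    Ha: diff > 0: Pr(T > t) = 0.7550

Page 46
Layered comparison, unequal variances
-> a5 = Married
Two-sample t test with unequal variances
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   784   6.095663    .3493023    9.780465    5.193722   6.997605
  Female |   747   20.50602    .6054935    16.54893    18.94238   22.06967
combined |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
    diff |        -14.41036     .699024                -16.2138  -12.60693
diff = mean(Male) - mean(Female)    t = -20.6150
Ho: diff = 0    Welch's degrees of freedom = 1199.87
Ha: diff != 0: Pr(|T| > |t|) = 0.0000    Ha: diff > 0: Pr(T > t) = 1.0000

Page 47
Layered comparison, test of equal variances
. by a5, sort: sdtest j2 if a5!=3, by(a1) level(99)
-> a5 = Unmarried
Variance ratio test
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   177   5.316384    .7992975    10.63396    3.234972   7.397796
  Female |   129   5.984496    .5435252    6.173259    4.563295   7.405698
combined |   306   5.598039    .5156249    9.019752    4.261516   6.934562
ratio = sd(Male) / sd(Female)    f = 2.9673
Ho: ratio = 1    degrees of freedom = 176, 128
Ha: ratio != 1: 2*Pr(F > f) = 0.0000    Ha: ratio > 1: Pr(F > f) = 0.0000

Page 48
Layered comparison, test of equal variances
-> a5 = Married
Variance ratio test
   Group |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
    Male |   784   6.095663    .3493023    9.780465    5.193722   6.997605
  Female |   747   20.50602    .6054935    16.54893    18.94238   22.06967
combined |  1531   13.12671    .3912873    15.31029    12.11757   14.13586
ratio = sd(Male) / sd(Female)    f = 0.3493
Ho: ratio = 1    degrees of freedom = 783, 746
Ha: ratio > 1: Pr(F > f) = 1.0000

Page 49
Comparison using box plots (figure)

Page 50
Do single men and married men differ?
Do single women and married women differ?

Page 51
Paired samples
Is marriage bad for women?
In the earlier analysis we compared the housework hours of married and unmarried respondents and used the difference to infer what may change between life before and after marriage.
The two groups, however, are independent samples made up of different respondents.
If getting married is related to certain personal traits, we cannot tell whether marriage itself changes behavior or whether people with certain behavioral tendencies are more likely to marry.
In other words, the analysis may hide a self-selection problem.

Page 52
Paired samples
To show that the effect of marriage on housework hours does not come from self-selection, a better sample is longitudinal data, which follows the same respondent and records the change in behavior before and after marriage.
Collecting such data, however, takes a great deal of time and effort.

Page 53
Paired samples
Do husbands and wives differ significantly in the time they spend on housework?
There are two ways to analyze this.
The first treats married men and married women as two independent samples and compares the mean for all husbands with the mean for all wives.

Page 54
Paired samples
But the hours a husband and a wife spend on housework are not independent: when the husband does more, the wife can do less.
The comparison should therefore be made within each household, between that husband and that wife, rather than between the mean of all husbands and the mean of all wives.

Page 55
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Mean comparison tests, paired data
The test is on the first variable minus the second.

Page 56
Division of housework between spouses
Paired t test
Variable |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
      j2 |  1380   12.80652    .3971524    14.75356    11.78211   13.83094
      j3 |  1380   14.32391    .4762999    17.69376    13.09535   15.55248
    diff |  1380  -1.517391    .6578304    24.43732   -3.214199   .1794161
mean(diff) = mean(j2 - j3)    t = -2.3067
Ho: mean(diff) = 0    degrees of freedom = 1379
Ha: mean(diff) != 0: Pr(|T| > |t|) = 0.0212    Ha: mean(diff) > 0: Pr(T > t) = 0.9894
The difference is respondent minus spouse, but is that wife minus husband or husband minus wife? The output cannot say.
We learn only that respondents report fewer hours for themselves than for their spouses. The difference is significant at the 5% level (p = 0.0212) but not at the 1% level, since the 99% confidence interval includes 0.

Page 57
Paired samples
How should we analyze whether husbands or wives spend more time on housework?
Handle male and female respondents separately.

Page 58
Generate a new variable and define its formula
generate h_work = j3 - j2
replace h_work = j2 - j3 if a1 == 2
For male respondents the spouse (j3) is the wife; for female respondents (a1 == 2) the respondent (j2) is the wife. h_work is therefore always the wife's hours minus the husband's hours.

Page 59
One-sample mean comparison test
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > One-sample mean comparison test

Page 60
. ttest h_work = 0, level(99)
One-sample t test
Variable |   Obs       Mean   Std. Err.   Std. Dev.   [99% Conf. Interval]
  h_work |  1380   15.90435    .5009808    18.61061    14.61212   17.19658
mean = mean(h_work)    t = 31.7464
Ho: mean = 0    degrees of freedom = 1379
Ha: mean != 0: Pr(|T| > |t|) = 0.0000    Ha: mean > 0: Pr(T > t) = 0.0000
The burden on married women: wives spend on average about 15.9 more hours per week on housework than their husbands.

Page 61
Comparing a qualitative variable (proportions)
K1 asks whether the respondent agrees that "if a mother works outside the home, it is worse for children who have not yet started school."
1 = strongly agree, 2 = agree, 3 = disagree, 4 = strongly disagree, 5 = no opinion, 6 = don't know, 7 = does not understand the question, 9 = refused, 0 = no answer
recode k1 (1 2=1) (3 4=0) (else=.)
This reduces the dependent variable to just two values, 1 and 0.

Page 62
Statistics > Summaries, tables, and tests > Classical tests of hypotheses > Group proportion test

Page 63
. prtest k1, by(a1) level(99)
Two-sample test of proportion      Male: Number of obs = 935
                                 Female: Number of obs = 861
Variable |      Mean   Std. Err.      z    P>|z|   [99% Conf. Interval]
    Male |  .7754011    .0136478                    .7402468   .8105554
  Female |  .7584204    .0145876                    .7208453   .7959956
    diff |  .0169806    .0199765                   -.0344753   .0684366
         | under Ho:    .0199596   0.85    0.395
diff = prop(Male) - prop(Female)    z = 0.8507
Ho: diff = 0
Ha: diff > 0: Pr(Z > z) = 0.1975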
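
The z statistic in the prtest output can be reproduced from the two group proportions and sample sizes. The Python sketch below assumes the usual pooled standard error under Ho. The function name is illustrative, and the counts 725 and 653 are inferred by multiplying each printed proportion by its group size.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z test for equal proportions using the pooled standard error under Ho."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * Pr(Z > |z|)
    return z, se0, p_two_sided

# Inferred counts: 0.7754011 * 935 = 725 men and 0.7584204 * 861 = 653 women
# coded 1 (agree) on the recoded k1.
z, se0, p_two_sided = two_proportion_z(725, 935, 653, 861)
print(round(z, 4), round(se0, 7), round(p_two_sided, 3))
```

With a two-sided p-value of about 0.395, the proportions of men and women who agree with the statement do not differ significantly.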