EMSE 171/271: DATA ANALYSIS For Engineers and Scientists
Session 1: One-Way Analysis of Variance (ANOVA)
Lecture Notes by: J. René van Dorp (1), www.seas.gwu.edu/~dorpjr

(1) Department of Engineering Management and Systems Engineering, School of Engineering and Applied Science, The George Washington University, 1776 G Street, N.W. Suite 110, Washington, D.C. 20052. E-mail: dorpjr@gwu.edu.

EMSE 171/271 - FALL 2006 - J.R. van Dorp - 11/29/06 - dorpjr@gwu.edu

ONE-WAY ANOVA: Summary of Tests

• Univariate $T$-test: $H_0: \mu = \mu_0$, $H_1: \mu \neq \mu_0$ (the scalar $\mu_0$ is specified).
[Diagram: Sample x with observations $x_{11}, x_{12}, \ldots, x_{1n}$; the sample mean $\bar{x}_1$ is compared with $\mu_0$; independence within the sample.]

• Two Sample Univariate $T$-test: $H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \neq \mu_2$.
[Diagram: Sample x with observations $x_{11}, x_{12}, \ldots, x_{1n}$ and sample mean $\bar{x}_1$; Sample y with observations $y_{11}, y_{12}, \ldots, y_{1n}$ and sample mean $\bar{y}_1$; the two sample means are compared; independence between Sample x and Sample y; independence within Samples x and y.]

ONE-WAY ANOVA: Introduction

• Objective of Analysis of Variance (ANOVA):
[Diagram: Samples $1, \ldots, p$, where Sample $i$ has observations $x_{i1}, x_{i2}, \ldots, x_{in}$ and sample mean $\bar{x}_i$; paired comparisons among the samples; independence between samples; independence within Sample $i$.]

Tensile Strength Example: The tensile strength of synthetic fiber used to make cloth for men's shirts is of interest to a manufacturer. It is suspected that the strength is affected by the percentage of cotton in the fiber. Five levels of cotton percentage are of interest: 15%, 20%, 25%, 30%, and 35%. Five observations are to be taken at each level of cotton percentage, and the 25 total observations are to be run in random order.

Total number of paired comparisons: $\binom{5}{2} = 10$.
• It seems that this problem can be solved by performing a two-sample $t$-test on all possible pairs. However, this solution would be incorrect, since it would lead to a considerable distortion in the Type I error.

Tensile Strength Example: There are 10 possible pairs. If the probability of accepting the null hypothesis (there is no difference between a pair) is $1 - \alpha = 0.95$ for each of the 10 tests, then the probability of correctly accepting the null hypothesis for all 10 tests (i.e. there is no difference among the 5 samples) equals

$$(0.95)^{10} \approx 0.60 \iff \text{Type I error} \approx 1 - 0.60 = 0.40,$$

if the tests are independent. Thus a substantial increase of Type I error has occurred. The appropriate procedure for testing equality of several means in the setting above is the ANALYSIS OF VARIANCE.

• It is interesting to note that we are testing equality of means here by analyzing variances. To see how this works we need to study some of the mechanics.

• In ANOVA, samples are referred to as "treatments". Hence, we have:
[Diagram: Treatments $1, \ldots, p$, where Treatment $i$ has observations $x_{i1}, x_{i2}, \ldots, x_{in}$ and sample mean $\bar{x}_i$; paired comparisons among the treatments; independence between treatments; independence within Treatment $i$.]

$$X_{ij} = \mu + \tau_i + \varepsilon_{ij}, \quad i = 1, \ldots, p, \quad j = 1, \ldots, n$$

$\mu$: a parameter common to all treatments, called the overall mean.
$\tau_i$: a parameter unique to the $i$-th treatment, called the treatment effect.
$\varepsilon_{ij} \sim N(0, \sigma^2)$: a random error component, i.i.d. for all $i$ and $j$.
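The Type I error inflation described in the introduction can be checked numerically. The following is a minimal sketch, assuming (as the notes do) that the pairwise tests are independent; the function name `familywise_error` is a hypothetical helper, not something from the notes.

```python
from math import comb

def familywise_error(num_treatments, alpha):
    """Return (number of pairwise tests, P(at least one false rejection)).

    Assumes every pairwise test has level alpha and the tests are independent.
    """
    num_pairs = comb(num_treatments, 2)            # all possible pairs of treatments
    prob_all_accept = (1.0 - alpha) ** num_pairs   # no false rejection in any test
    return num_pairs, 1.0 - prob_all_accept

pairs, fwe = familywise_error(num_treatments=5, alpha=0.05)
print(pairs, f"{fwe:.2f}")  # prints: 10 0.40
```

With five treatments, a nominal 5% level per test grows to roughly 40% over the ten comparisons, which is the distortion that motivates ANOVA.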
ONE-WAY ANOVA: Mechanics

Notation: $\bar{x}_{i\bullet}$ is the mean of treatment $i$ and $\bar{x}_{\bullet\bullet}$ is the overall mean; without a bar, $x_{i\bullet}$ and $x_{\bullet\bullet}$ denote the corresponding totals.

• Total sum of squares:

$$SS_T = \sum_{i=1}^{p}\sum_{j=1}^{n} (x_{ij} - \bar{x}_{\bullet\bullet})^2 = \sum_{i=1}^{p}\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2 + n\sum_{i=1}^{p} (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})^2$$

or

$$SS_T = SS_E + SS_{Treatments}$$

where

$$SS_E = \sum_{i=1}^{p}\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2 = \sum_{i=1}^{p}\left[\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2\right]$$

is the sum of squares within treatment $i$, summed over all treatments, and

$$SS_{Treatments} = n\sum_{i=1}^{p} (\bar{x}_{i\bullet} - \bar{x}_{\bullet\bullet})^2$$

is the sum of squares of the treatment means $\bar{x}_{i\bullet}$ against the overall mean $\bar{x}_{\bullet\bullet}$.

• The sample variance in the $i$-th treatment equals:

$$S_i^2 = \frac{1}{n-1}\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2 \iff (n-1)S_i^2 = \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2$$

• These $S_i^2$'s can be combined to get an estimate of the overall variance as follows (with $N = np$):

$$\frac{1}{p}\sum_{i=1}^{p} S_i^2 = \frac{(n-1)S_1^2 + (n-1)S_2^2 + \ldots + (n-1)S_p^2}{(n-1)p} = \frac{\sum_{i=1}^{p}\sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\bullet})^2}{np - p} = \frac{SS_E}{N - p}$$

• Recalling $\varepsilon_{ij} \sim N(0, \sigma^2)$ and denoting:

$$MS_E = \frac{SS_E}{N - p} \Rightarrow E[MS_E] = E\left[\frac{1}{p}\sum_{i=1}^{p} S_i^2\right] = \frac{1}{p}\sum_{i=1}^{p} \sigma^2 = \sigma^2$$

• Recalling that $\varepsilon_{ij} \sim N(0, \sigma^2)$, observe that the estimators of the $i$-th treatment means

$$\bar{X}_{i\bullet} = \frac{1}{n}\sum_{j=1}^{n} X_{ij}, \quad i = 1, \ldots, p,$$

are all random variables with common variance $\sigma^2 / n$.

• If the treatment means are all equal, we also have that

$$\bar{X}_{\bullet\bullet} = \frac{1}{p}\sum_{i=1}^{p}\left[\frac{1}{n}\sum_{j=1}^{n} X_{ij}\right] = \frac{1}{np}\sum_{i=1}^{p}\sum_{j=1}^{n} X_{ij}$$

is an unbiased estimate of the common treatment mean $\mu$.

• Hence, if the treatment means are all equal,

$$E\left[\frac{1}{p-1}\sum_{i=1}^{p} (\bar{X}_{i\bullet} - \bar{X}_{\bullet\bullet})^2\right] = \frac{\sigma^2}{n} \iff E\left[\frac{n}{p-1}\sum_{i=1}^{p} (\bar{X}_{i\bullet} - \bar{X}_{\bullet\bullet})^2\right] = \sigma^2$$

• ANALYSIS OF VARIANCE (ANOVA) TABLE:

Source                       Sum of Squares      Degrees of Freedom   Mean Square         $F_0$
Between treatments           $SS_{Treatments}$   $p - 1$              $MS_{Treatments}$   $MS_{Treatments} / MS_E$
Error (within treatments)    $SS_E$              $N - p$              $MS_E$
Total                        $SS_T$              $N - 1$

• Convenient calculation formulas (also for the unbalanced case):

$$SS_T = \sum_{i=1}^{p}\sum_{j=1}^{n_i} x_{ij}^2 - \frac{x_{\bullet\bullet}^2}{N}, \quad N = \sum_{i=1}^{p} n_i$$

$$SS_{Treatments} = \sum_{i=1}^{p} \frac{x_{i\bullet}^2}{n_i} - \frac{x_{\bullet\bullet}^2}{N}$$

unbalanced = different number of observations in each treatment

ONE-WAY ANOVA: Example

Tensile Strength Example (continued): five levels of cotton percentage (15%, 20%, 25%, 30%, and 35%) with five observations each; the 25 total observations are run in random order.

Table: Tensile Strength of Synthetic Fiber (lb/in.$^2$)

Percentage     Observations                  Total
of Cotton      1     2     3     4     5     $x_{i\bullet}$
15%            7     7     15    11    9     49
20%            12    17    12    18    18    77
25%            14    18    18    19    19    88
30%            19    25    22    19    23    108
35%            7     10    11    15    11    54
                                             $x_{\bullet\bullet} = 376$

Source of Variation                  Sum of Squares   Degrees of Freedom   Mean Square   $F_0$    $p$-value
Treatments ($SS_{Treatments}$)       475.76           4                    118.94        14.76    9.13E-06
Error ($SS_E$)                       161.2            20                   8.06
Total ($SS_T$)                       636.96           24

• The $p$-value is below $\alpha$ for $\alpha \in \{1\%, 5\%, 10\%\}$ $\Rightarrow$ Reject $H_0$ for all these $\alpha$'s.

Conclusion: At least one of the treatment means differs!

• Estimation of parameters can be done using the least squares approach (similar to linear regression analysis). Recall:

$$X_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

$$\hat{\mu} = \bar{X}_{\bullet\bullet}, \quad \hat{\tau}_i = \bar{X}_{i\bullet} - \bar{X}_{\bullet\bullet}, \quad i = 1, \ldots, p,$$

$$\hat{\mu}_i = \hat{\mu} + \hat{\tau}_i = \bar{X}_{i\bullet}, \quad \hat{\sigma}^2 = MS_E = SS_E / (N - p)$$

[Figure: MINITAB box plot of the tensile strength data for the 15%, 20%, 25%, 30%, and 35% cotton treatments.]
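The ANOVA table for the tensile strength example can be reproduced from the convenient calculation formulas. The sketch below uses only the data from the table in these notes; the function name `one_way_anova` and the dictionary keys are illustrative choices, and the $p$-value is left out because it needs an $F$-distribution routine that the standard library does not provide.

```python
# Tensile strength data (lb/in.^2), one list per cotton percentage.
data = {
    15: [7, 7, 15, 11, 9],
    20: [12, 17, 12, 18, 18],
    25: [14, 18, 18, 19, 19],
    30: [19, 25, 22, 19, 23],
    35: [7, 10, 11, 15, 11],
}

def one_way_anova(groups):
    """One-way ANOVA quantities via the calculation formulas.

    Works for balanced and unbalanced designs (groups may differ in size).
    """
    p = len(groups)
    N = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)          # x_.. (a total, not a mean)
    correction = grand_total ** 2 / N                  # x_..^2 / N
    ss_total = sum(x * x for g in groups for x in g) - correction
    ss_treat = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_error = ss_total - ss_treat                     # from SS_T = SS_E + SS_Treatments
    ms_treat = ss_treat / (p - 1)
    ms_error = ss_error / (N - p)
    mu_hat = grand_total / N                           # estimate of the overall mean
    tau_hat = [sum(g) / len(g) - mu_hat for g in groups]  # treatment effects
    return {
        "ss_total": ss_total, "ss_treat": ss_treat, "ss_error": ss_error,
        "df_treat": p - 1, "df_error": N - p,
        "ms_treat": ms_treat, "ms_error": ms_error,
        "f0": ms_treat / ms_error,
        "mu_hat": mu_hat, "tau_hat": tau_hat,
    }

table = one_way_anova(list(data.values()))
for key in ("ss_treat", "ss_error", "ss_total", "ms_treat", "ms_error", "f0"):
    print(f"{key:9s} {table[key]:.2f}")
```

The printed sums of squares, mean squares, and $F_0$ should agree with the table above (475.76, 161.20, 636.96, 118.94, 8.06, 14.76). The estimated treatment effects in `tau_hat` sum to zero, as expected for deviations of the treatment means from the overall mean in a balanced design.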
ONE-WAY ANOVA: Confidence Intervals

• $100(1-\alpha)\%$ confidence intervals for the difference of treatment means $\mu_i - \mu_k$:

$$\bar{X}_{i\bullet} - \bar{X}_{k\bullet} \pm t_{\alpha/2,\,N-p}\sqrt{2\,MS_E / n}$$

[Figure: Comparison of $\mu_3$ to the other treatment means. Confidence intervals (vertical axis, -10.00 to 15.00) by treatment 1 to 5 (horizontal axis); series: Lower Bound 5%, Mean, Upper Bound 95%.]

Conclusion: We only fail to reject the null hypothesis that $\mu_3 = \mu_2$!

ONE-WAY ANOVA: Contrasts

• Using the $100(1-\alpha)\%$ confidence intervals for differences of the treatment means $\mu_3$ and $\mu_k$, $k = 1, 2, 4$ and $5$, we tested the hypotheses:

$$H_0: \mu_3 = \mu_k, \quad H_1: \mu_3 \neq \mu_k$$

• These hypotheses could be tested by investigating an appropriate linear combination of treatment totals, say:

$$X_{3\bullet} - X_{k\bullet}$$

• If we had suggested that the average of cotton percentages 1 and 3 did not differ from the average of cotton percentages 4 and 5, then the hypothesis would have been

$$H_0: \mu_1 + \mu_3 = \mu_4 + \mu_5; \quad H_1: \mu_1 + \mu_3 \neq \mu_4 + \mu_5$$

which implies the linear combination:

$$X_{1\bullet} + X_{3\bullet} - X_{4\bullet} - X_{5\bullet} = 0$$

• A linear combination of treatment totals $C = \sum_{i=1}^{p} c_i X_{i\bullet}$ such that $\sum_{i=1}^{p} c_i = 0$ is called a contrast.

• Contrast coefficients must be chosen prior to running the experiment and prior to examining the data. Otherwise bias in Type I error may occur.

Tensile Strength Example:

$H_0: \mu_4 = \mu_5$, $\quad C_1 = X_{4\bullet} - X_{5\bullet}$
(Compares the average of Treatment 4 with that of Treatment 5)

$H_0: \mu_1 + \mu_3 = \mu_4 + \mu_5$, $\quad C_2 = X_{1\bullet} + X_{3\bullet} - X_{4\bullet} - X_{5\bullet}$
(Compares the average of Treatments 1 and 3 with that of Treatments 4 and 5)

$H_0: \mu_1 = \mu_3$, $\quad C_3 = X_{1\bullet} - X_{3\bullet}$
(Compares the average of Treatment 1 with that of Treatment 3)

$H_0: 4\mu_2 = \mu_1 + \mu_3 + \mu_4 + \mu_5$, $\quad C_4 = -X_{1\bullet} + 4X_{2\bullet} - X_{3\bullet} - X_{4\bullet} - X_{5\bullet}$
(Compares the average of Treatment 2 with that of Treatments 1, 3, 4 and 5)

Notice that the contrast coefficients are orthogonal!
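The two defining properties of these contrasts (coefficients summing to zero, and pairwise orthogonality) are easy to verify in code. The sketch below also computes a sum of squares per contrast using $SS_C = C^2 / (n \sum_i c_i^2)$; that formula is an assumption here, since it is the usual expression for a balanced design but does not appear in the text above. The helper names are hypothetical.

```python
# Treatment totals x_i. for cotton percentages 15%, 20%, 25%, 30%, 35%,
# and the number of observations per treatment (balanced design).
totals = [49, 77, 88, 108, 54]
n = 5

# Contrast coefficients c_i, in treatment order 1..5.
contrasts = {
    "C1": [0, 0, 0, 1, -1],     # treatment 4 vs treatment 5
    "C2": [1, 0, 1, -1, -1],    # treatments 1, 3 vs treatments 4, 5
    "C3": [1, 0, -1, 0, 0],     # treatment 1 vs treatment 3
    "C4": [-1, 4, -1, -1, -1],  # treatment 2 vs treatments 1, 3, 4, 5
}

def dot(a, b):
    """Inner product of two equal-length coefficient vectors."""
    return sum(x * y for x, y in zip(a, b))

def contrast_sum_of_squares(coeffs, totals, n):
    """SS_C = C^2 / (n * sum c_i^2), with C built from treatment totals.

    Assumed formula for a balanced design (not stated in the notes).
    """
    value = dot(coeffs, totals)
    return value ** 2 / (n * dot(coeffs, coeffs))

names = list(contrasts)
sums_to_zero = all(sum(c) == 0 for c in contrasts.values())
orthogonal = all(
    dot(contrasts[a], contrasts[b]) == 0
    for i, a in enumerate(names)
    for b in names[i + 1:]
)
ss = {name: contrast_sum_of_squares(c, totals, n) for name, c in contrasts.items()}

print(sums_to_zero, orthogonal)
print({name: round(value, 2) for name, value in ss.items()})
print(round(sum(ss.values()), 2))
```

Because the four contrasts are mutually orthogonal and there are $p - 1 = 4$ of them, their sums of squares add up to $SS_{Treatments} = 475.76$, so the between-treatment variation is split into four single-degree-of-freedom pieces.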
Source of Variation    Sum of Squares   Degrees of Freedom   Mean Square   $F_0$    $p$-value
$C_1$                  291.6            1                    291.6         36.18    7.01E-06
$C_2$                  31.25            1                    31.25         3.88     6.30%
$C_3$                  152.1            1                    152.1         18.87    3.15E-04
$C_4$                  0.81             1                    0.81          0.10     75.5%
$SS_{Treatments}$      475.76           4                    118.94        14.76    9.13E-06
$SS_E$                 161.2            20                   8.06
$SS_T$                 636.96           24

Conclusion:
• There are differences between the treatment means.
• Furthermore, differences are observed between Treatment 4 and Treatment 5 ($C_1$), and between Treatment 1 and Treatment 3 ($C_3$).
• No difference is observed between the average of Treatments 1 and 3 and the average of Treatments 4 and 5 ($C_2$).
• No difference is observed between Treatment 2 and the average of Treatments 1, 3, 4 and 5 ($C_4$).

ONE-WAY ANOVA: Model Adequacy Checking

• It was assumed in the model that the error terms $\varepsilon_{ij}$ are normally distributed with mean $0$ and variance $\sigma^2$.
• The normality assumption for the residuals $\varepsilon_{ij}$ can be checked via a normal probability plot.
• It is important to recognize that we are testing the equality of treatment means by testing for the equality of variances.
• The required assumption that allows us to do this is that the variance of the error terms $\varepsilon_{ij}$ is constant across treatments $i = 1, \ldots, p$.
• The assumption of equality of variance may be visually verified by plotting the residuals of each treatment against one another.
• Alternatively, we may also use Bartlett's test to test for equality of variance across treatments.

Bartlett's Test for Equality of Variance across treatments

$$H_0: \sigma_1^2 = \sigma_2^2 = \ldots = \sigma_p^2, \quad H_1: \text{not true for at least one } \sigma_i^2$$

Test Statistic:

$$\chi_0^2 = \frac{q}{c} \sim \chi_{p-1}^2$$

$$q = (N - p)\,\mathrm{Ln}\,S_{pooled}^2 - \sum_{i=1}^{p} (n_i - 1)\,\mathrm{Ln}\,S_i^2, \quad N = \sum_{i=1}^{p} n_i$$

$$S_{pooled}^2 = \frac{\sum_{i=1}^{p} (n_i - 1) S_i^2}{N - p}, \quad c = 1 + \frac{1}{3(p-1)}\left[\sum_{i=1}^{p} \frac{1}{n_i - 1} - \frac{1}{N - p}\right]$$

Tensile Strength Example: $N = 25$, $p = 5$, $S_{pooled}^2 \approx 8.06$, $q \approx 1.03$, $c \approx 1.10$, $\chi_0^2 \approx 0.93$, $p$-value $\approx 0.92$.

Conclusion: Fail to reject the null hypothesis.

[Figure: MINITAB "Test for Equal Variances for Tensile Strength": 95% Bonferroni confidence intervals for the standard deviations of the 15%, 20%, 25%, 30%, and 35% treatments. Bartlett's Test: Test Statistic 0.93, P-Value 0.920. Levene's Test: Test Statistic 0.32, P-Value 0.863.]
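The Bartlett statistic for the tensile strength example can be recomputed directly from the formulas above, using natural logarithms as in the notes. This is a minimal sketch: the function names are hypothetical, and the $p$-value helper relies on the closed-form upper tail of the chi-square distribution that holds only for an even number of degrees of freedom (here $p - 1 = 4$).

```python
from math import exp, log

# Tensile strength data (lb/in.^2), one list per cotton percentage (15% to 35%).
groups = [
    [7, 7, 15, 11, 9],
    [12, 17, 12, 18, 18],
    [14, 18, 18, 19, 19],
    [19, 25, 22, 19, 23],
    [7, 10, 11, 15, 11],
]

def sample_variance(xs):
    """Unbiased sample variance S_i^2 of one treatment."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def bartlett(groups):
    """Return (pooled variance, q, c, chi-square statistic q / c)."""
    p = len(groups)
    sizes = [len(g) for g in groups]
    N = sum(sizes)
    variances = [sample_variance(g) for g in groups]
    pooled = sum((n - 1) * s2 for n, s2 in zip(sizes, variances)) / (N - p)
    q = (N - p) * log(pooled) - sum(
        (n - 1) * log(s2) for n, s2 in zip(sizes, variances)
    )
    c = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (N - p)) / (3 * (p - 1))
    return pooled, q, c, q / c

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x) for EVEN df only: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!."""
    if df % 2 != 0:
        raise ValueError("closed form used here requires an even df")
    half = x / 2
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return exp(-half) * total

pooled, q, c, chi2_0 = bartlett(groups)
p_value = chi2_upper_tail_even_df(chi2_0, df=len(groups) - 1)
print(f"{pooled:.2f} {q:.2f} {c:.2f} {chi2_0:.2f} {p_value:.2f}")
```

The printed values should match the example above ($8.06$, $1.03$, $1.10$, $0.93$, $0.92$). For an odd number of degrees of freedom, a statistics library would be needed for the $p$-value instead of the closed form used here.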