ANOVA in STAT-3610: Understanding One-Way Analysis of Normal Populations, Study Guides, Projects, Research of Design

A part of the STAT-3610 course notes by Carpenter at Auburn University. It introduces the concept of One-Way Analysis of Variance (ANOVA) for examining differences between normal populations with possibly different means but equal variances. It includes an example problem, data sources, and links to related notes and figures. It explains the balanced design single-factor ANOVA, the data structure for ANOVA, and the decomposition of sum of squares.

Typology: Study Guides, Projects, Research

2021/2022

Uploaded on 09/12/2022

MINITAB Tutorial 2.1. April 21, 2015
STAT-3610, Carpenter

Review Source: Chapter 10 - Analysis of Variance (ANOVA).
Example Data Source: Example problem 10.1 (dataset: exp10-1.mtw)
Link to Data: http://www.auburn.edu/~carpedm/courses/stat3610/textbookdata/MINITAB/CH10/
Link to Notes:
http://www.auburn.edu/~carpedm/courses/stat3610/CourseNotesPowerPoint/DevStat8e_10_01.ppt
http://www.auburn.edu/~carpedm/courses/stat3610/CourseNotesPowerPoint/DevStat8e_10_02.ppt
http://www.auburn.edu/~carpedm/courses/stat3610/CourseNotesPowerPoint/DevStat8e_10_03.ppt

1 Introduction to One-way ANOVA

Suppose we wish to examine the differences between $I$ different normal populations with possibly different means $\mu_1, \mu_2, \ldots, \mu_I$, but with all variances equal to $\sigma^2$ (a generalization of the two-sample t-test with equal population variances in Chapter 9). In One-Way Analysis of Variance (ANOVA), we begin with the null hypothesis

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_I$$

with the alternative hypothesis

$$H_a: \mu_l \neq \mu_m \text{ for some } l \neq m,$$

so, if the alternative is true, we say at least two means are different. Figure 1 is a plot of three ($I = 3$) normal distributions, all with variance equal to one ($\sigma^2 = 1$), but with means 100, 110, and 120.

Figure 1: Plot of three different normal densities with means 100, 110 and 120, with common variance equal to 1. This is an illustration of the null hypothesis, $H_0: \mu_1 = \mu_2 = \mu_3$, being false.

To develop a test, we would draw random samples from each population in question, then use these data to draw inferences about the true state of nature in the underlying distributions. Two such designed experiments are referred to as balanced and unbalanced single-factor designs associated with ANOVA.
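To make the sampling step concrete, the following is a minimal sketch that draws equal-sized random samples from the three normal populations of Figure 1 (means 100, 110 and 120, common variance 1). The function name `draw_balanced_samples`, the sample size `J = 10` and the seed are illustrative assumptions, not part of the tutorial.

```python
import random
from statistics import mean


def draw_balanced_samples(means, sigma, J, seed=0):
    """Draw J independent N(mu, sigma^2) observations for each mean in `means`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [[rng.gauss(mu, sigma) for _ in range(J)] for mu in means]


# Population means taken from Figure 1; J = 10 is an arbitrary choice.
true_means = [100.0, 110.0, 120.0]
samples = draw_balanced_samples(true_means, sigma=1.0, J=10)

# With sigma = 1 the sample means should sit close to the population means.
sample_means = [mean(group) for group in samples]
for i, m in enumerate(sample_means, start=1):
    print(f"group {i}: sample mean = {m:.2f}")
```

Because the population means are 10 standard deviations apart, the three sample means separate clearly, which is the situation in which the null hypothesis is false.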
Balanced Design Single Factor ANOVA: In a balanced design, we would draw independent random samples of the same size, say $J$, from each of the $I$ populations. If the sample sizes were not all equal, the design would be said to be unbalanced (note: there is nothing inherently wrong with an unbalanced design). Table 1 (a) illustrates a balanced design with $I$ treatments/groups and $J$ measurements/observations, and Table 1 (b) presents an unbalanced design.

Table 1 (a): Illustration of a balanced single-factor/one-way design. $\bar{X}_{i.} = \sum_{j=1}^{J} X_{ij}/J$, $i = 1, 2, \ldots, I$, are the individual sample means for each treatment/group and $S_1^2, S_2^2, \ldots, S_I^2$ are the individual sample variances from each treatment/group.

    Group or treatment   Random sample                       Sample size    Mean             Var       Assumed distribution
    1                    $X_{11}, X_{12}, \ldots, X_{1J}$    $J$            $\bar{X}_{1.}$   $S_1^2$   $N(\mu_1, \sigma^2)$
    2                    $X_{21}, X_{22}, \ldots, X_{2J}$    $J$            $\bar{X}_{2.}$   $S_2^2$   $N(\mu_2, \sigma^2)$
    ...                  ...                                 ...            ...              ...       ...
    I                    $X_{I1}, X_{I2}, \ldots, X_{IJ}$    $J$            $\bar{X}_{I.}$   $S_I^2$   $N(\mu_I, \sigma^2)$
    Overall                                                  $I \times J$   $\bar{X}_{..}$

where

$$\bar{X}_{..} = \frac{\bar{X}_{1.} + \bar{X}_{2.} + \cdots + \bar{X}_{I.}}{I} = \frac{\sum_{j=1}^{J} X_{1j} + \sum_{j=1}^{J} X_{2j} + \cdots + \sum_{j=1}^{J} X_{Ij}}{I \times J} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} X_{ij}}{I \times J}$$

is the grand mean.

Table 1 (b): Illustration of an unbalanced single-factor/one-way design. $\bar{X}_{i.} = \sum_{j=1}^{J_i} X_{ij}/J_i$, $i = 1, 2, \ldots, I$, are the individual sample means for each treatment/group and $S_1^2, S_2^2, \ldots, S_I^2$ are the individual sample variances from each treatment/group.

    Group or treatment   Random sample                         Sample size                Mean             Var       Assumed distribution
    1                    $X_{11}, X_{12}, \ldots, X_{1J_1}$    $J_1$                      $\bar{X}_{1.}$   $S_1^2$   $N(\mu_1, \sigma^2)$
    2                    $X_{21}, X_{22}, \ldots, X_{2J_2}$    $J_2$                      $\bar{X}_{2.}$   $S_2^2$   $N(\mu_2, \sigma^2)$
    ...                  ...                                   ...                        ...              ...       ...
    I                    $X_{I1}, X_{I2}, \ldots, X_{IJ_I}$    $J_I$                      $\bar{X}_{I.}$   $S_I^2$   $N(\mu_I, \sigma^2)$
    Overall                                                    $J_1 + J_2 + \cdots + J_I$ $\bar{X}_{..}$

where

$$\bar{X}_{..} = \frac{J_1 \bar{X}_{1.} + J_2 \bar{X}_{2.} + \cdots + J_I \bar{X}_{I.}}{J_1 + J_2 + \cdots + J_I} = \frac{\sum_{j=1}^{J_1} X_{1j} + \sum_{j=1}^{J_2} X_{2j} + \cdots + \sum_{j=1}^{J_I} X_{Ij}}{J_1 + J_2 + \cdots + J_I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} X_{ij}}{J_1 + J_2 + \cdots + J_I}$$

is the grand mean. In the unbalanced case the grand mean is the sample-size-weighted average of the group means, which in general differs from their simple average.

Figures 3 (a), (b) and (c):

Analysis of means (ANOM)

From the MINITAB description, ANOM is a graphical analog to ANOVA that tests the equality of population means. The graph displays each factor level mean, the overall mean, and the decision limits. If a point falls outside the decision limits, then evidence exists that the factor level mean represented by that point is significantly different from the overall mean. Figure 4 contains the ANOM for Example 1. Since the second box type mean is above the decision limits and the fourth box type mean is below the decision limits, this suggests the second is significantly above the overall mean and the fourth is significantly below it. The first and third means are not significantly different from each other. We will show later that these conclusions will be confirmed by ANOVA and multiple comparisons using Tukey's method.

Figure 4: Analysis of Means (ANOM) from Example 1 data.

1.1 Sum of Squares (balanced design)

Total Sum of Squares: Total Sum of Squares (SST) would be the numerator of the sample variance if you were to compute the sample variance of all $n = I \times J$ observations without regard to group/treatment,

$$\mathrm{SST} = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{..})^2.$$

This is why SST is referred to as the total amount of variability in the response/measurement variable. The degrees of freedom associated with SST are $I \times J - 1$.

Error Sum of Squares: Error Sum of Squares (SSE) is the numerator of the pooled sample variance,

$$\mathrm{SSE} = \sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{i.})^2.$$

SSE is considered the within treatment/group variability. Note, we can also express SSE as

$$\mathrm{SSE} = (J - 1) S_1^2 + (J - 1) S_2^2 + \cdots + (J - 1) S_I^2.$$

SSE is the unexplained variability, attributed to uncertainty or random variability. The degrees of freedom associated with SSE are $I \times (J - 1)$.
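The two definitions above can be checked numerically. The sketch below uses a small hypothetical balanced data set (three groups of three observations, invented for illustration and unrelated to Example 1) to compute SST and SSE, and to confirm that SST divided by $n - 1$ is the sample variance of all observations and that SSE equals $(J - 1)$ times the sum of the group sample variances.

```python
from statistics import mean, variance

# Hypothetical balanced data: I = 3 groups, J = 3 observations each.
groups = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
I, J = len(groups), len(groups[0])

all_obs = [x for g in groups for x in g]
grand_mean = mean(all_obs)               # X-bar..
group_means = [mean(g) for g in groups]  # X-bar i.

# SST: squared deviations of every observation from the grand mean.
sst = sum((x - grand_mean) ** 2 for x in all_obs)

# SSE: squared deviations of every observation from its own group mean.
sse = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

# Alternative expression for SSE in a balanced design.
sse_from_variances = (J - 1) * sum(variance(g) for g in groups)

print(f"SST = {sst}, df = {I * J - 1}")    # SST = 60.0, df = 8
print(f"SSE = {sse}, df = {I * (J - 1)}")  # SSE = 6.0, df = 6
```

Here most of the total variability (60) is not within groups (6), which already hints that the group means differ.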
Treatment Sum of Squares: Treatment Sum of Squares (SSTr) represents the variability between treatments/groups,

$$\mathrm{SSTr} = \sum_{i=1}^{I} \sum_{j=1}^{J} (\bar{X}_{i.} - \bar{X}_{..})^2 = J \sum_{i=1}^{I} (\bar{X}_{i.} - \bar{X}_{..})^2.$$

If all the population means were equal, SSTr would tend to be small. The bigger the differences between the means, the larger SSTr would tend to be. SSTr is the amount of variability explained by differences between groups/treatments. The degrees of freedom associated with SSTr are $I - 1$.

Decomposition of Sum of Squares: It can be shown that the total variability (SST) can be decomposed into the sum of SSTr and SSE. Also, the degrees of freedom can be decomposed additively. That is,

$$\mathrm{SST} = \mathrm{SSTr} + \mathrm{SSE} \quad \text{and} \quad df(\text{Total}) = df(\text{Error}) + df(\text{Treatments}).$$

So, the overall variability in the response/measurement variable is the sum of the between group/treatment variability and the within group (random) variability. In other words, it is the sum of explained variability and unexplained variability.

Coefficient of Determination ($R^2$): The coefficient of determination, denoted as $R^2$, is the proportion of the total variability (SST) which is explained by the between treatment/group variability (SSTr). Since SST = SSTr + SSE, the coefficient of determination is defined to be

$$R^2 = \frac{\mathrm{SSTr}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.$$

Calculator or algebraic simplification for Sum of Squares:

$$\mathrm{SST} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (x_{ij} - \bar{x}_{..})^2 = \sum_{i=1}^{I} \sum_{j=1}^{J_i} x_{ij}^2 - \frac{1}{n} x_{..}^2$$

$$\mathrm{SSTr} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (\bar{x}_{i.} - \bar{x}_{..})^2 = \sum_{i=1}^{I} \frac{x_{i.}^2}{J_i} - \frac{1}{n} x_{..}^2$$

$$\mathrm{SSE} = \sum_{i=1}^{I} \sum_{j=1}^{J_i} (x_{ij} - \bar{x}_{i.})^2 = \mathrm{SST} - \mathrm{SSTr},$$

where $x_{i.} = \sum_{j=1}^{J_i} x_{ij}$, $x_{..} = \sum_{i=1}^{I} x_{i.}$ and $n = J_1 + J_2 + \cdots + J_I$.

1.3 Mean Square Error

Mean Square Error (balanced design): The Mean-Squared-Error (MSE) is

$$\mathrm{MSE} = \frac{S_1^2 + S_2^2 + \cdots + S_I^2}{I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J} (X_{ij} - \bar{X}_{i.})^2}{I(J - 1)} = \frac{\mathrm{SSE}}{I(J - 1)}.$$

Notice that MSE is an unbiased estimator of $\sigma^2$, since $E(S_i^2) = \sigma^2$, $i = 1, 2, \ldots, I$.

Mean Square Error (unbalanced design): The Mean-Squared-Error (MSE) is

$$\mathrm{MSE} = \frac{(J_1 - 1) S_1^2 + (J_2 - 1) S_2^2 + \cdots + (J_I - 1) S_I^2}{n - I} = \frac{\sum_{i=1}^{I} \sum_{j=1}^{J_i} (X_{ij} - \bar{X}_{i.})^2}{n - I} = \frac{\mathrm{SSE}}{n - I}.$$

Notice that MSE is an unbiased estimator of $\sigma^2$, since $E(S_i^2) = \sigma^2$, $i = 1, 2, \ldots, I$.

Mean Square for Treatments (both balanced and unbalanced design): The Mean Square for Treatments (MSTr) is

$$\mathrm{MSTr} = \frac{\mathrm{SSTr}}{I - 1}.$$

Note: if the null hypothesis is true, $\mu_1 = \mu_2 = \cdots = \mu_I$, then MSTr is also an unbiased estimator of $\sigma^2$. However, if the null hypothesis were false, then $E(\mathrm{MSTr}) > E(\mathrm{MSE}) = \sigma^2$.

F Ratio: If all the normality assumptions hold and the null hypothesis is true, then the ratio of the mean squares follows an F distribution with numerator degrees of freedom $I - 1$ and denominator degrees of freedom $I(J - 1)$,

$$F = \frac{\mathrm{MSTr}}{\mathrm{MSE}} \sim F_{I-1,\, I(J-1)}.$$

ANOVA Table (balanced one-factor design)

    Source of Variation   df       Sum of squares   Mean Square           f
    Treatments            I-1      SSTr             MSTr=SSTr/(I-1)       MSTr/MSE
    Error                 I(J-1)   SSE              MSE=SSE/[I(J-1)]
    Total                 IJ-1     SST

ANOVA Table (unbalanced one-factor design)

    Source of Variation   df       Sum of squares   Mean Square           f
    Treatments            I-1      SSTr             MSTr=SSTr/(I-1)       MSTr/MSE
    Error                 n-I      SSE              MSE=SSE/(n-I)
    Total                 n-1      SST

where $n = J_1 + J_2 + \cdots + J_I$.

Example 1 (continued): Let $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ represent the true mean compression strength for each of box types 1, 2, 3 and 4, respectively. Assuming the populations are normally distributed with common variance $\sigma^2$, use the data provided to test the null hypothesis that all population means are equal. That is, the null hypothesis is

$$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$$

with the alternative hypothesis

$$H_a: \mu_l \neq \mu_m \text{ for some } l \neq m.$$

The completed ANOVA table (produced by MINITAB) is given below.
The instructions for MINITAB are given on the next page in Figure 5.

    ANOVA: strength versus box-type

    Factor    Type   Levels  Values
    box-type  fixed       4  1, 2, 3, 4

    Analysis of Variance for strength

    Source    DF      SS     MS      F      P
    box-type   3  127351  42450  25.10  0.000
    Error     20   33831   1692
    Total     23  161181

    S = 41.1282   R-Sq = 79.01%   R-Sq(adj) = 75.86%

Coefficient of determination ($R^2$): The coefficient of determination is $R^2 = 79.01\%$. This means that 79.01% of the total variability in compression strength is explained by the mean differences between box-types.

p-value and the test: The p-value associated with the "global" hypothesis that all the population means are equal (the null hypothesis) is near zero, which leads us to reject the null hypothesis for any reasonable significance level $\alpha$. Therefore, we reject the null hypothesis and conclude that the mean strengths differ significantly between the box types.

p-value interpretation: If the null hypothesis were true (all the population mean strengths were equal), there would be a near zero chance of observing sample mean differences as large as or larger than those we observed in this experiment/sample.

So, since we concluded that the differences in sample mean strengths between the box-types are statistically significant, and 79.01% of the variability is explained by the differences in strengths between the box-types, we have a great deal of evidence that at least two of the box-types are different, on average. We would like to investigate this further using multiple comparisons.

Since the means for box-types 2, 1 and 3 all share the same letter, these means are not considered significantly different from one another. However, box-type 4 has a different letter than the other three, so we say the sample mean for box-type 4 is significantly different from the other three. This confirms our visual inspection (graphical analysis).
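Returning to the ANOVA table for Example 1, its derived entries can be reproduced from the sums of squares and degrees of freedom alone. The sketch below is illustrative only: the helper names `sums_of_squares` and `table_entries` are hypothetical, the toy data are invented, and the adjusted $R^2$ formula used, $1 - \mathrm{MSE}/(\mathrm{SST}/df(\text{Total}))$, is the standard definition and is assumed to be the one behind the R-Sq(adj) value in the output. Small discrepancies are expected because the printed sums of squares are rounded.

```python
from math import sqrt


def sums_of_squares(groups):
    """Return (SSTr, SSE, SST) from raw samples; group sizes may differ."""
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    return sstr, sse, sst


def table_entries(sstr, sse, df_tr, df_err):
    """Mean squares, F ratio, R^2, adjusted R^2 and S from SSTr and SSE."""
    sst = sstr + sse  # decomposition SST = SSTr + SSE
    mstr, mse = sstr / df_tr, sse / df_err
    return {
        "MSTr": mstr,
        "MSE": mse,
        "F": mstr / mse,
        "R2": sstr / sst,
        "R2_adj": 1 - mse / (sst / (df_tr + df_err)),
        "S": sqrt(mse),
    }


# Hypothetical toy data (not from the tutorial): I = 3, J = 3.
toy = [[4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [1.0, 2.0, 3.0]]
sstr, sse, sst = sums_of_squares(toy)
print(sstr, sse, sst)  # 54.0 6.0 60.0, so SST = SSTr + SSE
toy_table = table_entries(sstr, sse, df_tr=2, df_err=6)

# SS and DF copied from the MINITAB output for Example 1.
example = table_entries(127351, 33831, df_tr=3, df_err=20)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

Up to rounding, the printed values agree with the MS, F, S, R-Sq and R-Sq(adj) entries in the MINITAB output. The p-value is not recomputed here because it requires the F distribution, which the Python standard library does not provide.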
MINITAB also supplied the output for the Tukey Simultaneous Tests for Differences of Means, upon which the above summary was based. I provide this output below:

    Tukey Simultaneous Tests for Differences of Means

    Difference   Difference   SE of                                       Adjusted
    of Levels    of Means     Difference   95% CI              T-Value    P-Value
    2 - 1            43.9       23.7       ( -22.6,  110.4)      1.85      0.280
    3 - 1           -14.9       23.7       ( -81.4,   51.6)     -0.63      0.922
    4 - 1          -151.0       23.7       (-217.5,  -84.5)     -6.36      0.000
    3 - 2           -58.9       23.7       (-125.4,    7.6)     -2.48      0.094
    4 - 2          -194.9       23.7       (-261.4, -128.4)     -8.21      0.000
    4 - 3          -136.0       23.7       (-202.5,  -69.5)     -5.73      0.000

    Individual confidence level = 98.89%

Figure 6: In MINITAB, follow all the steps to produce the ANOVA (as demonstrated previously), but click on the "Comparisons" button and fill in the dialogue box as indicated below.

Figure 7 (a):

Figure 7 (b):

1.5 Checking the Assumptions

In class, I went through an example where we checked the assumptions. Recall, for the F-test in an ANOVA to be valid, we assume that the underlying populations are normally distributed with equal variances (common variance assumption). To examine the normality and common variance assumptions, we produce a histogram, a normal probability plot and a residual plot, all based on the residuals. To get the plots in Figure 8, below, you fill in the "Graphs" dialogue box as indicated in Figure 9, on the next page.

Figure 8: Residual plots based on ANOVA for Example 1

Based on the normal probability plot (upper left corner), the data do not indicate any significant deviations from normality. In the residual plot (upper right hand corner), we would expect random scatter about zero, and the spread of the data points should be constant. Based on this graph, it looks like the constant variance assumption holds.
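The residuals behind these plots are simply each observation minus its own group mean, $e_{ij} = x_{ij} - \bar{x}_{i.}$. As a rough sketch of what MINITAB plots, the code below computes the residuals and per-group sample variances for a small hypothetical data set (the group labels and values are invented, not the Example 1 data). The ratio of largest to smallest sample variance is included only as an informal numeric companion to the residual plot; it is an assumption of this sketch, not a test used in the tutorial.

```python
from statistics import mean, variance

# Hypothetical data: three groups of three observations each.
groups = {
    "A": [10.0, 12.0, 14.0],
    "B": [20.0, 21.0, 25.0],
    "C": [5.0, 6.0, 7.0],
}

# Fitted value for every observation in a group is that group's sample mean.
fitted = {name: mean(obs) for name, obs in groups.items()}

# Residual = observation minus its group mean.
residuals = {
    name: [x - fitted[name] for x in obs] for name, obs in groups.items()
}

# Per-group sample variances; similar values support the common variance assumption.
sample_vars = {name: variance(obs) for name, obs in groups.items()}
spread_ratio = max(sample_vars.values()) / min(sample_vars.values())

for name in groups:
    print(name, fitted[name], residuals[name], sample_vars[name])
print("largest / smallest sample variance:", spread_ratio)
```

Within every group the residuals sum to zero by construction, so the residual plot is always centered at zero; what carries information is whether the vertical spread looks similar across fitted values.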