Notes on One-way Analysis of Variance (ANOVA) | STAT 30100, Study notes of Data Analysis & Statistical Methods

Material Type: Notes; Professor: Sorola; Class: Elementary Statistical Methods; Subject: STAT-Statistics; University: Purdue University - Main Campus; Term: Unknown 1989;


Uploaded on 07/30/2009
CHAPTER 12 One-Way Analysis of Variance (ANOVA)

One-way analysis of variance is used when you want to compare more than two means. It generalizes the two-sample t procedure, which compares two means. Like the two-sample t test, it is robust and useful.

Examples:
1. The presence of harmful insects in farm fields is detected by erecting boards covered with a sticky material and then examining the insects trapped on the boards. To investigate which colors are most attractive to cereal leaf beetles, researchers placed six boards of each of four colors in a field of oats in July.
2. An ecologist is interested in comparing the concentration of the pollutant cadmium in five streams. She collects 50 water specimens from each stream and measures the concentration of cadmium in each specimen.

Note: The first example is an experiment with four treatments (the colors), and the second example is an observational study in which the concentration of cadmium is compared among the five streams. In both cases we can use ANOVA to compare the mean responses.

We use the F statistic to compare the variation among the means of several groups with the variation within the groups. In the ANOVA test, an SRS is drawn from each population, and the data are used to test the null hypothesis that the population means are all equal against the alternative that they are not all equal. If we reject the null hypothesis, we need further analysis to determine which population means differ.

Assumptions of the ANOVA:
1. Each population is normally distributed.
2. The population standard deviations are all equal.

The One-Way ANOVA Model:
The one-way ANOVA model is

$$x_{ij} = \mu_i + \varepsilon_{ij}, \qquad i = 1, \ldots, I, \quad j = 1, \ldots, n_i.$$

The $\varepsilon_{ij}$ are assumed to be independent draws from an $N(0, \sigma)$ distribution.
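A short simulation can make the model above concrete. The following minimal Python sketch (standard library only) generates data from $x_{ij} = \mu_i + \varepsilon_{ij}$; the function name simulate_one_way and the parameter values are hypothetical choices for illustration, not data or code from these notes.

```python
import random

def simulate_one_way(mus, ns, sigma, seed=0):
    """Generate data from the one-way ANOVA model x_ij = mu_i + eps_ij.

    mus:   population means mu_1, ..., mu_I
    ns:    sample sizes n_1, ..., n_I
    sigma: the common standard deviation of the eps_ij
    """
    rng = random.Random(seed)  # seeded so the same data come back each run
    groups = []
    for mu_i, n_i in zip(mus, ns):
        # each observation is its group mean plus independent N(0, sigma) noise
        groups.append([mu_i + rng.gauss(0.0, sigma) for _ in range(n_i)])
    return groups

# Hypothetical parameters (not from the notes): I = 4 groups of n_i = 6.
groups = simulate_one_way(mus=[45.0, 15.0, 30.0, 15.0], ns=[6, 6, 6, 6], sigma=7.0)
I = len(groups)                   # number of groups
N = sum(len(g) for g in groups)   # total sample size
print(I, N)  # 4 24
```

Every group shares the same sigma, which is the equal-standard-deviation assumption; only the means $\mu_i$ differ from group to group.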
The parameters of the model are the population means $\mu_1, \mu_2, \ldots, \mu_I$ and the common standard deviation $\sigma$.

Note: $I$ = the number of groups. $N$ = the total sample size. $n_i$ = the sample size for group $i$.

Example 1: The strength of concrete depends on the formula used to prepare it. One study compared five different mixtures. Six batches of each mixture were prepared, and the strength of the concrete made from each batch was measured.
a) What is the response variable?
b) Give the values of $I$, the $n_i$, and $N$.

Testing Hypotheses in One-Way ANOVA:
• State the null and alternative hypotheses:
  $H_0: \mu_1 = \mu_2 = \cdots = \mu_I$
  $H_a$: not all of the $\mu_i$ are equal (at least one is different).
• Find the test statistic

$$F = \frac{\mathrm{MSG}}{\mathrm{MSE}}.$$

  When $H_0$ is true, the F statistic has the $F(I-1,\, N-I)$ distribution.
• Find the P-value on the printout.
• Compare the P-value to the significance level $\alpha$:
  If P-value ≤ $\alpha$, reject $H_0$.
  If P-value > $\alpha$, fail to reject $H_0$.
• State your conclusions in terms of the problem.

The ANOVA output (see p. 736 for more detail):

Source                    Sum of Squares   Degrees of Freedom   Mean Square              F             Sig.
Groups (Between Groups)   SSG              DFG = I - 1          MSG = SSG/DFG            F = MSG/MSE   P-value
Error (Within Groups)     SSE              DFE = N - I          MSE = SSE/DFE = s_p^2
Total                     SST              DFT = N - 1          MST = SST/DFT

Note: $N$ is the total number of observations (the sum of all the $n_i$).

The coefficient of determination is

$$R^2 = \frac{\mathrm{SSG}}{\mathrm{SST}}.$$

$R^2$ is the fraction (often reported as a percent) of the total variation in the response that is accounted for by the FIT part of the model, that is, by the differences among the group means.

For Example 1, answer the following questions:
c) What are the degrees of freedom for the model, for error, and for the total?
d) State the null and alternative hypotheses.
e) Give the numerator and denominator degrees of freedom for the F statistic.

Example 2: (From Moore and McCabe, 4th edition) The presence of harmful insects in farm fields is detected by erecting boards covered with a sticky material and then examining the insects trapped on the boards.
To investigate which colors are most attractive to cereal leaf beetles, researchers placed six boards of each of four colors in a field of oats in July. The table below gives the number of cereal leaf beetles trapped on each board:

Color           Insects trapped
Lemon yellow    45  59  48  46  38  47
White           21  12  14  17  13  17
Green           37  32  15  25  39  41
Blue            16  11  20  21  14   7

Write the hypotheses:

Using SPSS: Enter the data into SPSS in vertical columns with the labels color, number_trapped, and treatment. All of the counts go in one long column (number_trapped). The “color” column is where you list “lemon yellow,” “white,” “green,” and “blue.” The “treatment” column is a numerical code for each group: code lemon yellow as 1, white as 2, green as 3, and blue as 4. This column is needed because SPSS's One-Way ANOVA procedure expects a numeric variable in its “Factor” box.

1. Identify the response variable, the $n_i$, $N$, and $I$ for this study.
2. Make a table giving the mean and standard deviation for each color group. Make a graph of the means. Is it reasonable to pool the variances?

Using SPSS: Analyze > Compare Means > Means. Move “number_trapped” into the “Dependent List” box. Move “treatment” into the “Independent List” box. Click “Options” to select the summary statistics you want. Click “OK.”

Note: lemon yellow = 1, white = 2, green = 3, and blue = 4.

Report: number_trapped
treatment          Mean    Std. Deviation   Minimum   Maximum   Median   N
1 (lemon yellow)   47.17    6.795           38        59        46.50     6
2 (white)          15.67    3.327           12        21        15.50     6
3 (green)          31.50    9.915           15        41        34.50     6
4 (blue)           14.83    5.345            7        21        15.00     6
Total              27.29   14.948            7        59        21.00    24

Example 3: (From Moore and McCabe, 4th edition) Recommendations regarding how long infants in developing countries should be breast-fed are controversial. If the nutritional quality of the breast milk is inadequate because the mothers are malnourished, then there is a risk of inadequate nutrition for the infant. On the other hand, the introduction of other foods carries the risk of infection from contamination.
Further complicating the situation is the fact that companies that produce infant formula and other foods benefit when these foods are consumed by large numbers of customers. One question related to this controversy concerns the energy intake of infants who have other foods introduced into the diet at different ages. Part of one study compared the energy intakes, measured in kilocalories per day (kcal/d), of infants who were breast-fed exclusively for 4, 5, or 6 months. The data are below.

Energy intake (kcal/d)
Breast-fed for:
4 months   5 months   6 months
  499        490        585
  620        395        647
  469        402        477
  485        177        445
  660        475        485
  588        617        703
  675        616        528
  517        587        465
  649        528
  209        518
  404        370
  738        431
  628        518
  609        639
  617        368
  704        538
  558        519
  653        506
  548

1. Identify the response variable, the $n_i$, $N$, and $I$ for this study.
2. Make a table giving the sample size, mean, and standard deviation for each group of infants. Is it reasonable to pool the variances?
3. Show side-by-side boxplots for the three groups.

Report: Energy
Time    Mean     N    Std. Deviation   Median   Minimum   Maximum
BF4     570.00   19   122.958          609.00   209       738
BF5     483.00   18   112.948          512.00   177       639
BF6     541.88    8    93.963          506.50   445       703
Total   530.20   45   118.906          528.00   177       738

[Figure: side-by-side boxplots of Energy by Time (BF4, BF5, BF6), vertical axis from 100 to 800; three individually plotted points are labeled 23, 10 and 12.]

4. Run the analysis of variance. Report the F statistic and P-value. Write the hypotheses for your test. What do you conclude?
5. What is the estimate of the common population standard deviation $\sigma$?
6. What is $R^2$?

ANOVA: Energy
Source           Sum of Squares   df   Mean Square   F       Sig.
Between Groups   71288.325         2   35644.163     2.718   .078
Within Groups    550810.9         42   13114.545
Total            622099.2         44

Multiple Comparisons:
Multiple comparisons are used when:
• the means differ (the ANOVA's $H_0$ is rejected), and
• we are unable to formulate specific questions in advance of the analysis.
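The entries of the ANOVA table can also be computed directly from the raw data. The following is a minimal Python sketch (standard library only); the function name one_way_anova is our own and is not an SPSS or library routine. It applies the definitions behind the table: SSG measures how far the group means lie from the grand mean, SSE measures how far the observations lie from their own group mean, and SST = SSG + SSE. It does not compute the P-value.

```python
from math import sqrt

def one_way_anova(groups):
    """Return the quantities in a one-way ANOVA table for a list of groups."""
    I = len(groups)                       # number of groups
    N = sum(len(g) for g in groups)       # total sample size
    grand_mean = sum(sum(g) for g in groups) / N
    means = [sum(g) / len(g) for g in groups]

    # SSG: variation of the group means around the grand mean
    ssg = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # SSE: variation of each observation around its own group mean
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    sst = ssg + sse                       # SST = SSG + SSE

    dfg, dfe, dft = I - 1, N - I, N - 1
    msg, mse = ssg / dfg, sse / dfe
    return {
        "SSG": ssg, "SSE": sse, "SST": sst,
        "DFG": dfg, "DFE": dfe, "DFT": dft,
        "MSG": msg, "MSE": mse,
        "F": msg / mse,                   # compare with F(I - 1, N - I)
        "s_p": sqrt(mse),                 # pooled estimate of sigma
        "R2": ssg / sst,                  # fraction of variation explained
    }

# Energy intake (kcal/d) from Example 3, one list per group.
bf4 = [499, 620, 469, 485, 660, 588, 675, 517, 649, 209,
       404, 738, 628, 609, 617, 704, 558, 653, 548]
bf5 = [490, 395, 402, 177, 475, 617, 616, 587, 528, 518,
       370, 431, 518, 639, 368, 538, 519, 506]
bf6 = [585, 647, 477, 445, 485, 703, 528, 465]

table = one_way_anova([bf4, bf5, bf6])
print(table["DFG"], table["DFE"], f"{table['F']:.3f}")  # 2 42 2.718
```

For these data the sketch gives SSG ≈ 71288.3, SSE ≈ 550810.9 and F ≈ 2.718 on 2 and 42 degrees of freedom, matching the SPSS printout above. For the P-value, one option (if SciPy is installed) is scipy.stats.f.sf(F, DFG, DFE), and scipy.stats.f_oneway(bf4, bf5, bf6) runs the whole test at once.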