Understanding ANOVA: A Statistical Approach to Comparing Group Differences in Experiments (Statistics lecture notes)

These notes introduce ANOVA (Analysis of Variance), a statistical method used to compare group differences in experiments. They cover the concept of ANOVA, its advantages, and how it distinguishes systematic from random variation, and include a detailed explanation of how ANOVA works: the variance formula, degrees of freedom, mean squares, and the F-ratio.

One-way Independent-measures Analysis of Variance (ANOVA)
Research Skills, Graham Hole, March 2009

What is "Analysis of Variance"?
Analysis of Variance, or ANOVA for short, is a whole family of statistical tests that are widely used by psychologists. This handout will (a) explain the advantages of using ANOVA; (b) describe the rationale behind how ANOVA works; and (c) explain, step by step, how to do a simple ANOVA.

Why use ANOVA?
ANOVA is most often used when you have an experiment in which there are a number of groups or conditions, and you want to see if there are any statistically significant differences between them.

Suppose we were interested in the effects of caffeine on memory. We could look at this experimentally as follows. We could have four different groups of participants, and give each group a different dosage of caffeine. Group A might get no caffeine (and hence act as a control group against which to compare the others); group B might get one milligram of caffeine; group C five milligrams; and group D ten milligrams. We could then give each participant a memory test, and thus get a score for each participant. Here's the data you might obtain:

Group A (0 mg)   Group B (1 mg)   Group C (5 mg)   Group D (10 mg)
      4                7               11               14
      3                9               15               12
      5               10               13               10
      6               11               11               15
      2                8               10               14
mean = 4         mean = 9         mean = 12        mean = 13

How would we analyse these data? Looking at the means, it appears that caffeine has affected memory test scores. It looks as if the more caffeine that's consumed, the higher the memory score (although this trend tails off with higher doses of caffeine). What statistical test could we use to see if our groups truly differed in terms of their performance on our memory test?

We could perform lots of independent-measures t-tests, in order to compare each group with every other. So, we could do a t-test to compare group A with group B; another t-test to compare group A with group C; yet another to compare group A with group D; and so on. The problem with this is that you would end up doing a lot of t-tests on the same data. With four groups, you would have to do six t-tests to compare each group with every other one: A with B, A with C, A with D; B with C, B with D; C with D. With five groups you would have to do ten tests, and with six groups, fifteen tests! The problem with doing lots of tests on the same data like this is that you run an increased risk of getting a "significant" result purely by chance: a so-called "Type 1" error.
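To see how quickly the number of comparisons mounts up, here is a minimal Python sketch (an illustration added to these notes, not part of the original handout) that counts the pairwise t-tests needed for four, five and six groups:

```python
from math import comb

# Number of pairwise t-tests needed to compare every group with every other:
# k * (k - 1) / 2 for k groups.
for k in (4, 5, 6):
    print(f"{k} groups -> {comb(k, 2)} pairwise t-tests")
# 4 groups -> 6 pairwise t-tests
# 5 groups -> 10 pairwise t-tests
# 6 groups -> 15 pairwise t-tests
```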
Revision of Type 1 and Type 2 errors:
Remember that every time you do a statistical test, you run the risk of making one of two kinds of error:

(a) "Type 1" error: deciding there is a real difference between your experimental conditions when in fact the difference has arisen merely by chance. In statistical jargon, this is known as rejecting the null hypothesis (that there's no difference between your groups) when in fact it is true. (You might also see this referred to as an "alpha" error.)

(b) "Type 2" error: deciding that the difference between conditions is merely due to chance, when in fact it's a real difference. In the jargon, this is known as accepting the null hypothesis (that there's no difference between your groups) when in fact it is false. (You might see this referred to as a "beta" error.)

The chances of making one or other of these errors are always with us, every time we run an experiment. If you try to reduce the risk of making one type of error, you increase the risk of making the other. For example, if you decide to be very cautious, and only accept a difference between groups as "real" when it is a very large difference, you will reduce your risk of making a Type 1 error (accepting a difference as real when it's really just due to random fluctuations in performance). However, because you are being so cautious, you will increase your chances of making a Type 2 error (dismissing a difference between groups as being due to random variation in performance, when in fact it is a genuine difference). Similarly, if you decide to be incautious, and regard even very small differences between groups as "real" ones, then you will reduce your chances of making a Type 2 error (i.e., you won't often discount a real difference as being due to chance), but you will probably make lots of Type 1 errors (lots of the differences you accept as "real" will have arisen merely by chance).

The conventional significance level of 0.05 represents a generally accepted trade-off between the chances of making these two kinds of error. If we do a statistical test and the results are significant at the 0.05 level, what we are really saying is this: we are prepared to regard the difference between groups that has given rise to this result as a real difference, even though, roughly five times in a hundred, such a result could arise merely by chance. The 0.05 refers to our chances of making a Type 1 error.

ANOVA and the Type 1 error:
Hopefully, you should be able to see why doing lots of tests on the same data is a bad idea. Every time you do a test, you run the risk of making a Type 1 error. The more tests you do on the same data, the more chance you have of obtaining a spuriously "significant" result. If you do a hundred tests, five of them are likely to give you "significant" results that are actually due to chance fluctuations in performance between the groups in your experiment. It's a bit like playing Russian roulette: pull the trigger once, and you are quite likely to get away with it, but the more times you pull the trigger, the more likely you are to end up blowing your head off! (The consequences of making a Type 1 error in a psychology experiment are a little less messy, admittedly.)

One of the main advantages of ANOVA is that it enables us to compare lots of groups all at once, without inflating our chances of making a Type 1 error. Doing an ANOVA is rather like doing lots of t-tests all at once, but without the statistical disadvantages of doing so. (In fact, ANOVA and the t-test are closely related tests in many ways.)
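The inflation of the Type 1 error rate can be made concrete with a short back-of-the-envelope sketch (again an added illustration, not from the handout). It runs each test at the conventional 0.05 level and, for simplicity, treats the tests as independent, which pairwise tests on the same data are not, so the figures are approximate:

```python
# Probability of at least one spurious "significant" result (a Type 1 error)
# across m tests, each run at alpha = 0.05. Treating the tests as independent
# is an approximation, since pairwise t-tests on the same data are correlated.
alpha = 0.05
for m in (1, 6, 10, 100):
    p_any_type1 = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests: P(at least one Type 1 error) ~ {p_any_type1:.2f}, "
          f"expected number of false positives = {m * alpha:.2f}")
```

With six pairwise tests the chance of at least one spurious "significant" result is already roughly a quarter, and with a hundred tests it is close to certainty, which is the point the handout makes about expecting about five false positives per hundred tests.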
Other advantages of ANOVA:
(a) ANOVA enables us to test for trends in the data: Looking at the mean scores in our caffeine data, it looks as if there is a trend: the more caffeine consumed, the better the memory performance. We can use ANOVA to see if trends in the data like this are "real" (i.e., unlikely to have arisen by chance). We don't have to confine ourselves to seeing if there is a simple "linear" trend in the data, either: we can test for more complicated trends (such as performance increasing and then decreasing, or performance increasing and then flattening off, etc.). This is second-year stuff, however...

(b) ANOVA enables us to look at the effects of more than one independent variable at a time: So far in your statistics education, you have only looked at the effects of one independent variable at a time. Moreover, you have largely been limited to making comparisons between just two levels of one IV. For example, you might use a t-test or a Mann-Whitney test to compare the memory performance of males and females (two levels of a single independent variable, "sex"). The only tests you have covered that enable you to compare more than two groups at a time are the Friedman and Kruskal-Wallis tests, but even these only enable you to deal with one IV at a time. The real power of ANOVA is that it enables you to look at the effects of more than one IV in a single experiment. So, for example, instead of just looking at "the effects on memory of caffeine dosage" (one IV, one DV), you could look at "sex differences in the effects on memory of caffeine dosage" (two IVs, one DV).

[…]

The variance formula and sums of squares:
Here is the formula for the variance:

variance = Σ(X − X̄)² / N

In English, this means do the following: take a set of scores (e.g. one of the groups from the table), and find their mean. Find the difference between each of the scores and the mean. Square each of these differences (because otherwise they will add up to zero). Add up the squared differences. Normally, you would then divide this sum by the number of scores, N, in order to get an average deviation of the scores from the group mean - i.e., the variance. However, in ANOVA, we will want to take into account the number of participants and the number of groups we have. Therefore, in practice we will only use the top line of the variance formula, called the "Sum of Squares", or "SS" for short:

SS = Σ(X − X̄)²

We will divide this not by the number of scores, but by the appropriate "degrees of freedom" (which is usually the number of groups or participants minus 1). More details on this below.

Earlier, I said that the total variation amongst a set of scores consists of between-groups variation plus within-groups variation. Another way of expressing this is to say that the total sums of squares can be broken down into the between-groups sums of squares and the within-groups sums of squares. What we have to do is work these out, and then see how large the between-groups sums of squares is in relation to the within-groups sums of squares, once we've taken the number of participants and the number of groups into account by using the appropriate degrees of freedom.

Step-by-step example of a One-way Independent-Measures ANOVA:
As mentioned earlier, there are lots of different types of ANOVA. The following example will show you how to perform a one-way independent-measures ANOVA. You use this where you have the following:

(a) one independent variable (which is why it's called "one-way");
(b) one dependent variable (you get only one score from each participant);
(c) each participant takes part in only one condition of the experiment (i.e., they are used as a participant only once).

A one-way independent-measures ANOVA is equivalent to an independent-measures t-test, except that you have more than two groups of participants. (You can have as many groups of participants as you like in theory: the term "one-way" refers to the fact that you have only one independent variable, and not to the number of levels of that IV.) Another way of looking at it is to say that it is a parametric equivalent of the Kruskal-Wallis test.
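Before moving on to what SPSS reports, here is a minimal Python sketch (an added illustration, not part of the handout) of the sums-of-squares breakdown for the caffeine data. The weighting of each group mean's squared deviation by its group size is the standard between-groups SS formula and is assumed here, since the handout's own worked formulas are not shown in this preview:

```python
groups = {
    "A (0 mg)":  [4, 3, 5, 6, 2],
    "B (1 mg)":  [7, 9, 10, 11, 8],
    "C (5 mg)":  [11, 15, 13, 11, 10],
    "D (10 mg)": [14, 12, 10, 15, 14],
}

all_scores = [score for scores in groups.values() for score in scores]
grand_mean = sum(all_scores) / len(all_scores)                  # 9.5

# Total SS: squared deviation of every score from the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

# Between-groups SS: squared deviation of each group mean from the grand
# mean, weighted by the number of scores in that group (standard formula).
group_means = {name: sum(s) / len(s) for name, s in groups.items()}
ss_between = sum(len(groups[name]) * (m - grand_mean) ** 2
                 for name, m in group_means.items())

# Within-groups SS: squared deviation of each score from its own group mean.
ss_within = sum((x - group_means[name]) ** 2
                for name, scores in groups.items() for x in scores)

print(ss_total, ss_between, ss_within)      # 297.0 245.0 52.0
print(ss_total == ss_between + ss_within)   # the arithmetic check below: True
```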
Although some statistics books manage to make hand-calculation of ANOVA look scary, it's actually quite simple. However, since it is so quick and easy to use SPSS to do the work, I'm just going to give you an overview of what SPSS works out and why.

Total SS: This shows us how much variation there is between all of the scores, regardless of which group they belong to.

Total degrees of freedom: This is the total number of scores minus 1. In our example, total d.f. = 20 - 1 = 19.

Between-groups SS: This is a measure of how much variation exists between the groups in our experiment.

Between-groups degrees of freedom: This is the number of groups minus 1. We have four groups, so our between-groups d.f. = 4 - 1 = 3.

Within-groups SS: This tells us how much variation exists within each of our experimental groups.

Within-groups degrees of freedom: This is obtained by taking the number of scores in group A minus 1, adding this to the number of scores in group B minus 1, and so on. Here, we have five scores in each group, and so the within-groups d.f. = 4 + 4 + 4 + 4 = 16.

Arithmetic check: Note that the between-groups SS and the within-groups SS add up to the total SS. Essentially, we break down the total SS into its two components (between-groups variation and within-groups variation), so these two combined cannot come to more than the total SS. Similarly, the within-groups d.f. added to the between-groups d.f. must equal the total d.f. Here, 3 + 16 = 19, so again we are okay. If the numbers don't add up correctly, something is very wrong!

Mean Squares: As mentioned earlier, we are going to compare the amount of variation between groups to that existing within groups, but in order to do this, we need to take into account the number of scores on which each sum of squares is based. This is where the degrees of freedom that we have been calculating come into play. We (well, SPSS anyway!) need to work out things called "Mean Squares" ("MS" for short), which are like "average" amounts of variation. Dividing the between-groups SS by the between-groups d.f. produces the "Between-Groups Mean Squares". Dividing the within-groups SS by the within-groups d.f. gives us the "Within-Groups Mean Squares".

Between-groups MS = Between-groups SS / Between-groups d.f.
Within-groups MS = Within-groups SS / Within-groups d.f.

F-ratio: Now we need to see how large the between-groups variation is in relation to the within-groups variation. We do this by dividing the between-groups MS by the within-groups MS. The result is called an F-ratio.

F = between-groups MS / within-groups MS

The ANOVA summary table: The results of an Analysis of Variance are often displayed in the form of a summary table. Here's the table for our current example:

Source            SS       d.f.    MS       F
Between groups   245.00      3    81.67   25.13
Within groups     52.00     16     3.25
Total            297.00     19

Different statistics packages may display the results in a different way, but most of these principal details will be there somewhere.
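The same numbers can be turned into the summary table programmatically. The sketch below (an added illustration; the handout itself uses SPSS) computes the degrees of freedom, mean squares and F-ratio from the sums of squares worked out above, and cross-checks the result with SciPy's scipy.stats.f_oneway:

```python
from scipy.stats import f_oneway

groups = {
    "A (0 mg)":  [4, 3, 5, 6, 2],
    "B (1 mg)":  [7, 9, 10, 11, 8],
    "C (5 mg)":  [11, 15, 13, 11, 10],
    "D (10 mg)": [14, 12, 10, 15, 14],
}
ss_between, ss_within = 245.0, 52.0                     # from the SS sketch above

df_between = len(groups) - 1                            # 4 groups - 1 = 3
df_within = sum(len(s) - 1 for s in groups.values())    # 4 + 4 + 4 + 4 = 16

ms_between = ss_between / df_between                    # 245 / 3  = 81.67
ms_within = ss_within / df_within                       # 52 / 16  = 3.25
f_ratio = ms_between / ms_within                        # 81.67 / 3.25 = 25.13

print(f"F({df_between}, {df_within}) = {f_ratio:.2f}")
print(f_oneway(*groups.values()))                       # same F-ratio, plus a p-value
```

Both routes give F(3, 16) of about 25.13, matching the summary table.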
The really important bit is the following.

Assessing the significance of the F-ratio: The bigger the value of the F-ratio, the less likely it is to have arisen merely by chance. How do you decide whether it's "big"? You consult a table of "critical values of F". (There's one on my website.) If your value of F is equal to or larger than the value in the table, it is unlikely to have arisen by chance. To find the correct table value against which to compare your obtained F-ratio, you use the between-groups and within-groups d.f. In the present example, we need to look up the critical F-value for 3 and 16 d.f.

Here is an extract from a table of critical F-values, for a significance level of 0.05 (columns give the between-groups d.f., rows the within-groups d.f.):

d.f.      1        2        3        4
 1      161.4    199.5    215.7    224.6
 2       18.51    19.00    19.16    19.25
 3       10.13     9.55     9.28     9.12
 4        7.71     6.94     6.59     6.39
 5        6.61     5.79     5.41     5.19
 6        5.99     5.14     4.76     4.53
 7        5.59     4.74     4.35     4.12
 8        5.32     4.46     4.07     3.84
 9        5.12     4.26     3.86     3.63
10        4.96     4.10     3.71     3.48
11        4.84     3.98     3.59     3.36
12        4.75     3.89     3.49     3.26
13        4.67     3.81     3.41     3.18
14        4.60     3.74     3.34     3.11
15        4.54     3.68     3.29     3.06
16        4.49     3.63     3.24     3.01
17        4.45     3.59     3.20     2.96

Treat the between-groups d.f. and the within-groups d.f. as coordinates: we have 3 between-groups d.f. and 16 within-groups d.f., so we go along 3 columns in the table and down 16 rows. At the intersection of these coordinates is the critical value of F that we seek: with our particular combination of d.f., values of F as large as this one or larger are likely to occur by chance with a probability of 0.05 - i.e., less than 5 times in 100. Therefore, if our obtained value of F is equal to or larger than this critical value of F in the table, our obtained value must have a similarly low probability of having occurred by chance.

In short, we compare our obtained F to the appropriate critical value of F from the table: if our obtained value is equal to or larger than the critical value, there is a significant difference between the conditions in our experiment. On the other hand, if our obtained value is smaller than the critical value, then there is no significant difference between the conditions in our experiment. Here, our obtained F of 25.13 is far larger than the critical value of 3.24 for 3 and 16 d.f., so there is a significant difference between our caffeine groups.
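Instead of a printed table, the critical value can also be obtained from the F distribution directly. The sketch below (an added illustration, not from the handout) uses scipy.stats.f.ppf, the inverse cumulative distribution function, to find the 0.05 critical value for 3 and 16 d.f. and compares it with the obtained F:

```python
from scipy.stats import f

alpha = 0.05
df_between, df_within = 3, 16
f_obtained = 25.13                                      # from the summary table

# ppf is the inverse of the cumulative distribution function, so this is the
# value that an F(3, 16) variable exceeds with probability alpha.
f_critical = f.ppf(1 - alpha, dfn=df_between, dfd=df_within)

print(f"critical F({df_between}, {df_within}) at p = .05: {f_critical:.2f}")  # about 3.24
print("significant" if f_obtained >= f_critical else "not significant")      # significant
```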