Understanding Significance Tests in Biostatistics: Sign Test & Hypothesis Testing, Study notes of Mathematical Methods

These notes introduce the concept of significance tests in the context of applied biostatistics, using the sign test as an example. They discuss the role of hypothesis testing, the null hypothesis and the alternative hypothesis, and the importance of finding the probability of observing data as extreme as those obtained if the null hypothesis were true. They also cover the types of error that can occur in significance tests and give a brief overview of the principles of significance testing.

Health Sciences M.Sc. Programme
Applied Biostatistics

Week 4: Significance tests

Testing a hypothesis

A significance test enables us to measure the strength of evidence which the data supply for or against some proposition of interest. For example, Table 1 shows the results of a crossover trial of pronethalol for the treatment of angina: the number of attacks over four weeks on each treatment. These 12 patients are a sample from the population of all patients. Would the other members of this population experience fewer attacks while using pronethalol? We can see that the number of attacks is highly variable from one patient to another, and it is quite possible that this is true from one occasion to another as well. So it could be that some patients would have fewer attacks while on pronethalol than while on placebo quite by chance. In a significance test, we ask whether the difference observed was small enough to have occurred by chance if there were really no difference in the population. If it were so, then the evidence in favour of there being a difference between the treatment periods would be weak. On the other hand, if the difference were much larger than we would expect due to chance if there were no real population difference, then the evidence in favour of a real difference would be strong.

To carry out the test of significance we suppose that, in the population, there is no difference between the two treatment periods. The hypothesis of ‘no difference’ or ‘no effect’ in the population is called the null hypothesis. We compare this with the alternative hypothesis of a difference between the treatments, in either direction. We do this by finding the probability of getting data as extreme as those observed if the null hypothesis were true. If this probability is large, the data are consistent with the null hypothesis; if it is small, the data are unlikely to have arisen if the null hypothesis were true and the evidence is in favour of the alternative hypothesis.

An example: the sign test

We shall now find a way of testing this null hypothesis, using a method called the sign test. An obvious start is to consider the differences between the number of attacks on the two treatments for each patient, as in Table 1. If the null hypothesis were true, then differences in number of attacks would be just as likely to be positive as negative; they would be random. If we kept on testing patients indefinitely, the proportion of changes which were negative would be equal to the proportion which were positive. Another way of saying this is that the probability of a change being negative would be equal to the probability of it being positive. These would both be 0.5. Then the number of negatives would behave in exactly the same way as the number of heads if we toss a coin 12 times. This is quite easy to investigate mathematically. We can work out the probability that 12 tosses of a coin would show any given number of heads. This is also the proportion of occasions on which 12 tosses of a coin would show the given number of heads. These probabilities are shown in Table 2. We call this the Binomial distribution with n = 12 and p = 0.5.
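As an illustration, the probabilities in Table 2 below can be reproduced directly from the binomial formula P(k heads) = C(12, k) × 0.5^12; the following is a minimal Python sketch using only the standard library.

    from math import comb

    # Binomial distribution with n = 12 and p = 0.5:
    # P(k heads in 12 tosses) = C(12, k) * 0.5**12
    n = 12
    for k in range(n + 1):
        prob = comb(n, k) * 0.5 ** n
        print(f"{k:2d} heads: {prob:.5f}")   # e.g. 6 heads: 0.22559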
Table 1. Trial of pronethalol for the prevention of angina pectoris (data of Pritchard et al., 1963). Number of attacks while on each treatment.

Patient number   Placebo   Pronethalol   Difference (placebo minus pronethalol)   Sign of difference
      1             71          29            42                                        +
      2            323         348           –25                                        –
      3              8           1             7                                        +
      4             14           7             7                                        +
      5             23          16             7                                        +
      6             34          25             9                                        +
      7             79          65            14                                        +
      8             60          41            19                                        +
      9              2           0             2                                        +
     10              3           0             3                                        +
     11             17          15             2                                        +
     12              7           2             5                                        +

Table 2. Probability distribution for the number of heads out of 12 flips of a coin: the Binomial distribution with n = 12 and p = 0.5.

Heads   Probability
  0       0.00024
  1       0.00293
  2       0.01611
  3       0.05371
  4       0.12085
  5       0.19336
  6       0.22559
  7       0.19336
  8       0.12085
  9       0.05371
 10       0.01611
 11       0.00293
 12       0.00024

We can show these probabilities graphically, as in Figure 1. This shows each probability as a vertical line. It is done this way because only the integer values have any probability.

If there were any subjects who had the same number of attacks on both regimes we would omit them, as they provide no information about the direction of any difference between the treatments. In this test, the number of subjects, n, is the number of subjects for whom there is a difference, one way or the other. Those for whom the difference is zero contribute no information. If they were coins which fell on their edge, we would flip them again. In the clinical trial all we can do is exclude them.
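A minimal Python sketch of the sign test described above: count the positive and negative differences in Table 1 (dropping any zeros), and find the probability, under the Binomial distribution with n = 12 and p = 0.5, of a split at least as unbalanced in either direction. This reproduces the two-sided P value of about 0.006 referred to later in the notes; the presentation of the calculation in the part of the notes omitted from this preview may differ in detail.

    from math import comb

    # Differences (placebo minus pronethalol) from Table 1
    diffs = [42, -25, 7, 7, 7, 9, 14, 19, 2, 3, 2, 5]

    # Drop zero differences: they give no information about direction
    signs = [d for d in diffs if d != 0]
    n = len(signs)                          # 12 here
    neg = sum(1 for d in signs if d < 0)    # 1 negative difference
    pos = n - neg                           # 11 positive differences

    def binom_p(k, n, p=0.5):
        return comb(n, k) * p ** k * (1 - p) ** (n - k)

    # Two-sided P: probability of a split at least as unbalanced as 11 vs 1,
    # in either direction, under Binomial(n = 12, p = 0.5)
    extreme = min(pos, neg)
    p_value = (sum(binom_p(k, n) for k in range(0, extreme + 1))
               + sum(binom_p(k, n) for k in range(n - extreme, n + 1)))
    p_value = min(1.0, p_value)             # guard for the perfectly balanced case
    print(f"{pos} positive, {neg} negative, two-sided P = {p_value:.4f}")  # about 0.006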
[…]

… statistically significant. If the data do not support the null hypothesis, it is sometimes said that we reject the null hypothesis, and if the data are consistent with the null hypothesis it is said that we accept it. Such an ‘all or nothing’ decision-making approach is seldom appropriate in medical research. It is preferable to think of the significance test probability as an index of the strength of evidence against the null hypothesis. The probability of such an extreme value of the test statistic occurring if the null hypothesis were true is often called the P value. It is not the probability that the null hypothesis is true. This is a common misconception. The null hypothesis is either true or it is not; it is not random and has no probability.

Significance levels and types of error

We must still consider the question of how small is small. A probability of 0.006, as in the example above, is clearly small and we have a quite unlikely event. But what about 0.06, or 0.1? Suppose we take a probability of 0.01 or less as constituting reasonable evidence against the null hypothesis. If the null hypothesis is true, we shall make a wrong decision one in a hundred times. Deciding against a true null hypothesis is called an error of the first kind, type I error, or alpha error. We get an error of the second kind, type II error, or beta error if we decide in favour of a null hypothesis which is in fact false. These errors are set out in Table 3.

Now the smaller we demand the probability be before we decide against the null hypothesis, the larger the observed difference must be, and so the more likely we are to miss real differences. By reducing the risk of an error of the first kind we increase the risk of an error of the second kind. The conventional compromise is to say that differences are significant if the probability is less than 0.05. This is a reasonable guideline, but it should not be taken as some kind of absolute demarcation. There is not a great difference between probabilities of 0.06 and 0.04, and they surely indicate similar strength of evidence. It is better to regard probabilities around 0.05 as providing some evidence against the null hypothesis, which increases in strength as the probability falls. If we decide that the difference is significant, the probability is sometimes referred to as the significance level.

As a rough and ready guide, we can think of P values as indicating the strength of evidence like this:

P value                  Evidence for a difference or relationship
Greater than 0.1         Little or no evidence
Between 0.05 and 0.1     Weak evidence
Between 0.01 and 0.05    Evidence
Less than 0.01           Strong evidence
Less than 0.001          Very strong evidence

Significant, real and important

If a difference is statistically significant, then it may well be real, but not necessarily important. For example, we may look at the effect of a drug, given for some other purpose, on blood pressure. Suppose we find that the drug raises blood pressure by an average of 1 mm Hg, and that this is significant. A rise in blood pressure of 1 mm Hg is not clinically significant, so, although it may be there, it does not matter. It is (statistically) significant, and real, but not important.

On the other hand, if a difference is not statistically significant, it could still be real. We may simply have too small a sample to show that a difference exists. Furthermore, the difference may still be important. ‘Not significant’ does not imply that there is no effect. It means that we have failed to demonstrate the existence of one.

Presenting P values

Computers print out the exact P values for most test statistics. These should be given, rather than changing them to ‘not significant’, ‘NS’ or P>0.05. Similarly, if we have P=0.0072, we are wasting information if we report this as P<0.01. This method of presentation arises from the pre-computer era, when calculations were done by hand and P values had to be found from tables. Personally, I would quote P=0.0072 to one significant figure, as P=0.007, as figures after the first do not add much, but the first figure can be quite informative. Sometimes the computer prints 0.0000. This may be correct, in that the probability is less than 0.00005 and so equal to 0.0000 to four decimal places. The probability can never be exactly zero, so we usually quote this as P<0.0001. Whatever we do, we should never quote it as P<0.000, as I have seen. This is impossible.

Multiple significance tests

If we test a null hypothesis which is in fact true, using 0.05 as the critical significance level, we have a probability of 0.95 of getting a ‘not significant’ (i.e. correct) decision. If we test two independent true null hypotheses, the probability that neither test will be significant is 0.95 × 0.95 = 0.90. If we test twenty such hypotheses, the probability that none will be significant is 0.95 × 0.95 × … × 0.95 = 0.36. This gives a probability of 1 – 0.36 = 0.64 of getting at least one significant result; we are more likely to get one than not. We expect to get one spurious significant result.
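The arithmetic above is easy to check; a minimal Python sketch:

    # Twenty independent tests of true null hypotheses, each at the 0.05 level
    alpha = 0.05
    n_tests = 20

    p_none_significant = (1 - alpha) ** n_tests   # 0.95 ** 20, about 0.36
    p_at_least_one = 1 - p_none_significant       # about 0.64
    expected_spurious = alpha * n_tests           # 20 * 0.05 = 1 expected spurious result

    print(f"P(no significant results)     = {p_none_significant:.2f}")
    print(f"P(at least one 'significant') = {p_at_least_one:.2f}")
    print(f"Expected spurious significant results = {expected_spurious:.1f}")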
Many medical research studies are published with large numbers of significance tests. These are not usually independent, being carried out on the same set of subjects, so the above calculations do not apply exactly. However, it is clear that if we go on testing long enough we will find something which is ‘significant’. We must beware of attaching too much importance to a lone significant result among a mass of non-significant ones. It may be the one in twenty which we should get by chance alone.

This is particularly important when we find that a clinical trial or epidemiological study gives no significant difference overall, but does so in a particular subset of subjects, such as women aged over 60. If there is no difference between the treatments overall, significant differences in subsets are to be treated with the utmost suspicion.

In some studies, we avoid the problems of multiple testing by specifying a primary outcome variable in advance. We state before we look at the data, and preferably before we collect them, that one particular variable is the primary outcome. If we get a significant effect for this variable, we have good evidence of an effect. If we do not get a significant effect for this variable, we do not have good evidence of an effect, whatever happens with other variables. Other significant effects are only an indication that another study may be justified.

J. M. Bland
15 August 2006