Confidence Intervals, Null Hypothesis - Study Guide | STAT 541, Exams of Biostatistics

Material Type: Exam; Class: Introduction to Biostatistics; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Spring 2008;

Uploaded on 09/02/2009

koofers-user-lms-1

Partial preview of the text

Ismor Fischer, 8/17/2008 Stat 541 / 6-66

6.4 Problems

6-1. Suppose that a random sample of n = 51 children is selected from the population of newborn infants in Mexico. The probability that a child in this population weighs at most 2500 grams is presumed to be π = 0.15. Calculate the probability that six or fewer of the infants weigh at most 2500 grams, using…

(a) the exact binomial distribution,

(b) the normal approximation to the binomial distribution (with continuity correction). [[Technically, an assumption under which this method may be used is violated here. Why?]]

(c) Suppose we wish to test the null hypothesis H0: π = 0.15 versus the alternative HA: π ≠ 0.15, and that in this random sample of n = 51 children, we find six whose weights are under 2500 grams. Calculate the associated p-value (with continuity correction) and 95% confidence interval, and use each to arrive at a decision whether or not to reject H0 at the α = .05 significance level.

6-2. A new “smart pill” is tested on n = 36 individuals randomly sampled from a certain population whose IQ scores are known to be normally distributed, with mean μ = 100 and standard deviation σ = 27. After treatment, the sample mean IQ score is calculated to be x̄ = 109.9, and a two-sided test of the null hypothesis H0: μ = 100 versus the alternative hypothesis HA: μ ≠ 100 is performed, to see if there is any statistically significant difference from the mean IQ score of the original population. Using this information, answer the following.

(a) Calculate the p-value of the sample.

(b) Fill in the following table, concluding with the decision either to reject or not reject the null hypothesis H0 at the given significance level α.

Significance Level α | Confidence Level 1 − α | Confidence Interval | Decision about H0
.10                  |                        |                     |
.05                  |                        |                     |
.01                  |                        |                     |

(c) Extend these observations to more general circumstances.
Namely, as the significance level decreases, what happens to the ability to reject a null hypothesis? Explain why this is so, in terms of the p-value and generated confidence intervals.

6-3. Consider the distribution of serum cholesterol levels for all 20- to 74-year-old males living in the United States. The mean of this population is 211 mg/dL, and the standard deviation is 46.0 mg/dL. In a study of a subpopulation of such males who smoke and are hypertensive, it is assumed (not unreasonably) that the distribution of serum cholesterol levels is normally distributed, with unknown mean μ, but with the same standard deviation σ as the original population.

(a) Formulate the null hypothesis and complementary alternative hypothesis, for testing whether the unknown mean serum cholesterol level μ of the subpopulation of hypertensive male smokers is equal to the known mean serum cholesterol level of 211 mg/dL of the general population of 20- to 74-year-old males.

(b) In the study, a random sample of size n = 12 hypertensive smokers was selected, and found to have a sample mean cholesterol level of x̄ = 217 mg/dL. Construct a 95% confidence interval for the true mean cholesterol level of this subpopulation.

(c) Calculate the p-value of this sample, at the α = .05 significance level.

(d) Based on your answers in parts (b) and (c), is the null hypothesis rejected in favor of the alternative hypothesis, at the α = .05 significance level? Interpret your conclusion: What exactly has been demonstrated, based on the empirical evidence?

(e) Determine the 95% acceptance region and complementary rejection region for the null hypothesis. Is this consistent with your findings in part (d)? Why?

6-4. Consider a random sample of ten children selected from a population of infants receiving antacids that contain aluminum, in order to treat peptic or digestive disorders.
The distribution of plasma aluminum levels is known to be approximately normal; however, its mean μ and standard deviation σ are not known. The mean aluminum level for the sample of n = 10 infants is found to be x̄ = 37.20 μg/l and the sample standard deviation is s = 7.13 μg/l. Furthermore, the mean plasma aluminum level for the population of infants not receiving antacids is known to be only 4.13 μg/l.

(a) Formulate the null hypothesis and complementary alternative hypothesis, for a two-sided test of whether the mean plasma aluminum level of the population of infants receiving antacids is equal to the mean plasma aluminum level of the population of infants not receiving antacids.

(b) Construct a 95% confidence interval for the true mean plasma aluminum level of the population of infants receiving antacids.

(c) Calculate the p-value of this sample (as best as possible), at the α = .05 significance level.

(d) Based on your answers in parts (b) and (c), is the null hypothesis rejected in favor of the alternative hypothesis, at the α = .05 significance level? Interpret your conclusion: What exactly has been demonstrated, based on the empirical evidence?

(e) With the knowledge that significantly elevated plasma aluminum levels are toxic to human beings, reformulate the null hypothesis and complementary alternative hypothesis, for the appropriate one-sided test of the mean plasma aluminum levels. With the same sample data as above, how does the new p-value compare with that found in part (c), and what is the resulting conclusion and interpretation?

Recall that two events A and B are said to be independent if P(A ∩ B) = P(A) P(B). In the same spirit, two random variables X and Y are independent if their joint probability distribution satisfies P(X ≤ x ∩ Y ≤ y) = P(X ≤ x) P(Y ≤ y) for all x, y. It can be shown mathematically that if X and Y are independent, then their covariance Cov(X, Y) = 0.
Repeat the above calculations for the following ordered data sets X and Y, paying special attention to formula 2b. What conclusion can you draw about Var(X − Y) versus Var(X) + Var(Y), if X and Y are independent? (This property is crucial in § 6.2.1.)

X | 0 | 6 | 12 | 18
Y | 3 | 9 | 3  | 5

6-8. The arrival time of my usual morning bus (B) is normally distributed, with a mean ETA at 8:00 AM, and a standard deviation of 4 minutes. My arrival time (A) at the bus stop is also normally distributed, with a mean ETA at 7:50 AM, and a standard deviation of 3 minutes.

(a) With what probability can I expect to catch the bus? (Hint: What is the distribution of the random variable X = A − B, and what must be true about X in the event that I catch the bus?)

(b) How much earlier should I arrive, if I expect to catch the bus with 99% probability?

6-9. In this problem, assume that population cholesterol level is normally distributed.

(a) Consider a small clinical trial, designed to measure the efficacy of a new cholesterol-lowering drug against a placebo. A group of six high-cholesterol patients is randomized to either a treatment arm or a control arm, resulting in two numerically balanced samples of n1 = n2 = 3 patients each, in order to test the null hypothesis H0: μ1 = μ2 vs. the alternative HA: μ1 ≠ μ2. Suppose that the following data are obtained:

Placebo | Drug
220     | 180
240     | 200
290     | 220

Obtain the 95% confidence interval for μ1 − μ2, and the p-value of the data (use R if you wish), and use each to decide whether or not to reject H0 at the α = .05 significance level. Conclusion?

(b) Now imagine that the same drug is tested using another pilot study, with a different design. Serum cholesterol levels of n = 3 patients are measured at the beginning of the study, then re-measured after a six-month treatment period on the drug, in order to test the null hypothesis H0: μ1 = μ2 versus the alternative HA: μ1 ≠ μ2.
Suppose that the following data are obtained:

Baseline | End of Study
220      | 180
240      | 200
290      | 220

Obtain the 95% confidence interval for μ1 − μ2, and the p-value of the data (use R if you wish), and use each to decide whether or not to reject H0 at the α = .05 significance level. Conclusion?

(c) Compare and contrast these two study designs and their results.

6-10. In order to determine whether children with cystic fibrosis have a normal level of iron in their blood on average, a study is performed to detect any significant difference in mean serum iron levels between this population and the population of healthy children, both of which are approximately normally distributed with unknown standard deviations. A random sample of n1 = 9 healthy children has mean serum iron level x̄1 = 18.9 μmol/l and standard deviation s1 = 5.9 μmol/l; a sample of n2 = 13 children with cystic fibrosis has mean serum iron level x̄2 = 11.9 μmol/l and standard deviation s2 = 6.3 μmol/l.

(a) Formulate the null hypothesis and complementary alternative hypothesis, for testing whether the mean serum iron level μ1 of the population of healthy children is equal to the mean serum iron level μ2 of children with cystic fibrosis.

(b) Construct the 95% confidence interval for the mean serum iron level difference μ1 − μ2.

(c) Calculate the p-value for this experiment, under the null hypothesis.

(d) Based on your answers in parts (b) and (c), is the null hypothesis rejected in favor of the alternative hypothesis, at the α = .05 significance level? Interpret your conclusion: What exactly has been demonstrated, based on the sample evidence?

6-11. Methylphenidate is a drug that is widely used in the treatment of attention deficit disorder (ADD). As part of a crossover study, ten children between the ages of 7 and 12 who suffered from this disorder were assigned to receive the drug and ten were given a placebo.
After a fixed period of time, treatment was withdrawn from all 20 children and, after a “washout period” of no treatment for either group, subsequently resumed after switching the treatments between the two groups. Measures of each child’s attention and behavioral status, both on the drug and on the placebo, were obtained using an instrument called the Parent Rating Scale. Distributions of these scores are approximately normal with unknown means and standard deviations. In general, lower scores indicate an increase in attention. It is found that the random sample of n = 20 children enrolled in the study has a sample mean attention rating score of x̄_methyl = 10.8 and standard deviation s_methyl = 2.9 when taking methylphenidate, and mean rating score x̄_placebo = 14.0 and standard deviation s_placebo = 4.8 when taking the placebo.

(a) Calculate the 95% confidence interval for μ_placebo, the mean attention rating score of the population of children taking the placebo.

(b) Calculate the 95% confidence interval for μ_methyl, the mean attention rating score of the population of children taking the drug.

(c) Comparing these two confidence intervals side-by-side, develop an informal conclusion about the efficacy of methylphenidate, based on this experiment. Why can this not be used as a formal test of the hypothesis H0: μ_placebo = μ_methyl, vs. the alternative HA: μ_placebo ≠ μ_methyl, at the α = .05 significance level? (Hint: See next problem.)

6-12. A formal hypothesis test for two-sample means using the confidence interval for μ1 − μ2 is generally NOT equivalent to an informal side-by-side comparison of the individual confidence intervals for μ1 and μ2, for detecting overlap between them.

(a) Suppose that two population random variables X1 and X2 are normally distributed, each with standard deviation σ = 50. We wish to test the null hypothesis H0: μ1 = μ2 versus the alternative HA: μ1 ≠ μ2, at the α = .05 significance level.
Two independent, random samples are selected, each of size n = 100, and it is found that the corresponding means are x̄1 = 215 and x̄2 = 200, respectively. Show that even though the two individual 95% confidence intervals for μ1 and μ2 overlap, the formal 95% confidence interval for the mean difference μ1 − μ2 does not contain the value 0, and hence the null hypothesis can be rejected. (See middle figure below.)

(b) In general, suppose that X1 ~ N(μ1, σ) and X2 ~ N(μ2, σ), with equal σ (for simplicity). In order to test the null hypothesis H0: μ1 = μ2 versus the two-sided alternative HA: μ1 ≠ μ2 at the α significance level, two random samples are selected, each of the same size n (for simplicity), resulting in corresponding means x̄1 and x̄2, respectively. Let CI_μ1 and CI_μ2 be the respective 100(1 − α)% confidence intervals, and let

$d = \dfrac{|\bar{x}_1 - \bar{x}_2|}{z_{\alpha/2}\,\sigma/\sqrt{n}}$.

(Note that the denominator is simply the margin of error for the confidence intervals.) Also let CI_(μ1 − μ2) be the 100(1 − α)% confidence interval for the true mean difference μ1 − μ2. Prove:

If d < √2, then 0 ∈ CI_(μ1 − μ2) (i.e., “accept” H0), and CI_μ1 ∩ CI_μ2 ≠ ∅ (i.e., overlap).

If √2 < d < 2, then 0 ∉ CI_(μ1 − μ2) (i.e., reject H0), but CI_μ1 ∩ CI_μ2 ≠ ∅ (i.e., overlap)!

If d > 2, then 0 ∉ CI_(μ1 − μ2) (i.e., reject H0), and CI_μ1 ∩ CI_μ2 = ∅ (i.e., no overlap).

[Figure: three number-line panels, one for each case above, marking x̄1, x̄2, and x̄1 − x̄2 relative to 0.]

6-14. Consider the following 2 × 2 contingency table taken from a retrospective case-control study that investigates the proportion of diabetes sufferers among acute myocardial infarction (heart attack) victims in the Navajo population residing in the United States.

              | MI: Yes | MI: No | Total
Diabetes: Yes |   46    |   25   |   71
Diabetes: No  |   98    |  119   |  217
Total         |  144    |  144   |  288

(a) Conduct a Chi-squared Test for the null hypothesis H0: π Diabetes | MI = π Diabetes | No MI versus the alternative HA: π Diabetes | MI ≠ π Diabetes | No MI.
Determine whether or not we can reject the null hypothesis at the α = .01 significance level. Interpret your conclusion: At the α = .01 significance level, what exactly has been demonstrated about the proportion of diabetics among the two categories of heart disease in this population?

(b) In the study design above, the 144 victims of myocardial infarction (cases) and the 144 individuals free of heart disease (controls) were actually age- and gender-matched. The members of each case-control pair were then asked whether they had ever been diagnosed with diabetes. Of the 46 individuals who had experienced MI and who were diabetic, it turned out that 9 were paired with diabetics and 37 with non-diabetics. Of the 98 individuals who had experienced MI but who were not diabetic, it turned out that 16 were paired with diabetics and 82 with non-diabetics. Therefore, each cell in the resulting 2 × 2 contingency table below corresponds to the combination of responses for age- and gender-matched case-control pairs, rather than individuals.

                    | MI: Diabetes | MI: No Diabetes | Totals
No MI: Diabetes     |      9       |       16        |   25
No MI: No Diabetes  |     37       |       82        |  119
Totals              |     46       |       98        |  144

Conduct a McNemar Test for the null hypothesis H0: “The number of ‘diabetic, MI case’ - ‘non-diabetic, non-MI control’ pairs, is equal to the number of ‘non-diabetic, MI case’ - ‘diabetic, non-MI control’ pairs, who have been matched on age and gender,” or more succinctly, H0: “There is no association between diabetes and myocardial infarction in the Navajo population, adjusting for age and gender.” Determine whether or not we can reject the null hypothesis at the α = .01 significance level. Interpret your conclusion: At the α = .01 significance level, what exactly has been demonstrated about the association between diabetes and myocardial infarction in this population?

(c) Why does the McNemar Test only consider discordant case-control pairs?
Hint: What, if anything, would a concordant pair (i.e., either both individuals in a ‘MI case - No MI control’ pair are diabetic, or both are non-diabetic) reveal about a diabetes-MI association, and why?

6-15. The following data are taken from a study that attempts to determine whether the use of electronic fetal monitoring (“exposure”) during labor affects the frequency of caesarian section deliveries (“disease”). Of the 5824 infants included in the study, 2850 were electronically monitored during labor and 2974 were not. Results are displayed in the 2 × 2 contingency table below.

                  | Caesarian Delivery: Yes | Caesarian Delivery: No | Totals
EFM Exposure: Yes |          358            |          2492          |  2850
EFM Exposure: No  |          229            |          2745          |  2974
Totals            |          587            |          5237          |  5824

(a) Calculate a point estimate ÔR for the population odds ratio OR, and interpret.

(b) Compute a 95% confidence interval for the population odds ratio OR.

(c) Based on your answer in part (b), show that the null hypothesis H0: OR = 1 can be rejected in favor of the alternative HA: OR ≠ 1, at the α = .05 significance level. Interpret this conclusion: What exactly has been demonstrated about the association between electronic fetal monitoring and caesarian section delivery? Be precise.

(d) Does this imply that electronic monitoring somehow causes a caesarian delivery? Can the association possibly be explained any other way? If so, how?

6-16. The following data come from two separate studies, both conducted in San Francisco, that investigate various risk factors for epithelial ovarian cancer.

Study 1 (Disease Status)
Term Pregnancies | Cancer | No Cancer | Total
None             |   31   |    93     |  124
One or More      |   80   |   379     |  459
Total            |  111   |   472     |  583

Study 2 (Disease Status)
Term Pregnancies | Cancer | No Cancer | Total
None             |   39   |    74     |  113
One or More      |  149   |   465     |  614
Total            |  188   |   539     |  727

(a) Compute point estimates ÔR1 and ÔR2 of the respective odds ratios OR1 and OR2 of the two studies, and interpret.
(b) In order to determine whether or not we may combine information from the two tables, it is first necessary to conduct a Test of Homogeneity on the null hypothesis H0: OR1 = OR2, vs. the alternative HA: OR1 ≠ OR2, by performing the following steps.

Step 1: First, calculate l1 = ln(ÔR1) and l2 = ln(ÔR2), in the usual way.

Step 2: Next, using the definition of s.e. given in the notes, calculate the weights $w_1 = \dfrac{1}{\text{s.e.}_1^2}$ and $w_2 = \dfrac{1}{\text{s.e.}_2^2}$.

Step 3: Compute the weighted mean of l1 and l2: $L = \dfrac{w_1 l_1 + w_2 l_2}{w_1 + w_2}$.

Step 4: Finally, calculate the test statistic $X^2 = w_1 (l_1 - L)^2 + w_2 (l_2 - L)^2$, which follows an approximate χ² distribution, with 1 degree of freedom.

Step 5: Use this information to show that the null hypothesis cannot be rejected at the α = .05 significance level, and that the information from the two tables may therefore be combined.

(c) Hence, calculate the Mantel-Haenszel estimate of the summary odds ratio:

$\widehat{OR}_{\text{summary}} = \dfrac{(a_1 d_1 / n_1) + (a_2 d_2 / n_2)}{(b_1 c_1 / n_1) + (b_2 c_2 / n_2)}$.
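
The steps of problem 6-16 (b) and (c) can be sketched in a few lines of Python as a way to check hand calculations. This is a minimal sketch, not the course's reference solution. It assumes each table is laid out as (a, b, c, d) = (None & Cancer, None & No Cancer, One or More & Cancer, One or More & No Cancer), and it assumes the standard error of ln(OR) is sqrt(1/a + 1/b + 1/c + 1/d), which may differ from the definition in the notes. The function names are hypothetical.

```python
import math

def log_or_and_weight(a, b, c, d):
    """Steps 1-2: return (ln OR, weight) for the 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    # Assumption: Var[ln OR] = 1/a + 1/b + 1/c + 1/d, so w = 1 / s.e.^2.
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, 1 / variance

def homogeneity_test(table1, table2):
    """Steps 3-4: weighted mean L, statistic X^2, and its 1-df p-value."""
    (l1, w1), (l2, w2) = log_or_and_weight(*table1), log_or_and_weight(*table2)
    L = (w1 * l1 + w2 * l2) / (w1 + w2)
    x2 = w1 * (l1 - L) ** 2 + w2 * (l2 - L) ** 2
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(x2 / 2))
    return x2, p_value

def mantel_haenszel_or(tables):
    """Part (c): summary odds ratio, with n = a + b + c + d for each table."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Cell counts copied from the two tables in problem 6-16.
study1 = (31, 93, 80, 379)
study2 = (39, 74, 149, 465)

x2, p = homogeneity_test(study1, study2)
or_mh = mantel_haenszel_or([study1, study2])
print(f"X^2 = {x2:.4f}, p-value = {p:.3f}, Mantel-Haenszel OR = {or_mh:.3f}")
```

Comparing X² with 3.841, the .05 critical value of the χ² distribution with 1 degree of freedom, gives the same decision as comparing the p-value with α = .05.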