Statistical Hypothesis Testing and Linear Regression Practice Test, Lecture notes of Mathematics

Linear Regression, Hypothesis Testing, Analysis of Variance

This practice test covers hypothesis testing, linear regression, and analysis of variance in the context of scientific research. It includes questions about the correct hypotheses to use, P-value interpretation, slope interpretation, association testing, and choosing the appropriate statistical test. The test also covers one-way ANOVA F test statistics, linear regression curve fitting, and lurking variable identification.

What you will learn

  • What hypotheses should be stated to test the theory of a positive linear relationship between HGMF and HEC readings?
  • What is the best interpretation of the slope of the regression line in the context of HGMF and HEC readings?

Typology: Lecture notes

2019/2020

Uploaded on 08/05/2021

dominikiko 🇺🇸

Partial preview of the text

Practice Test for Final Exam

The concentration of Escherichia coli (E. coli) in beef is tested in the laboratory using a method called HGMF. A new method called HEC, which can be done in the field, has been developed. A comparison of the two methods needs to be done. The developers of the HEC method want to test their theory that HGMF readings and HEC readings are positively linearly related. To test their theory, they randomly selected 18 pieces of beef and measured the E. coli concentration on each piece using both the HGMF and HEC methods. Use this to answer the next 3 questions.

[Figure: scatter plot of HEC vs. HGMF]

Predictor    Coef     SE Coef        T              P
Constant     -0.09    [illegible]    -0.78          0.4445
HGMF          0.92    0.285          [illegible]    0.0052

[Handwritten notes, partly illegible. Model: $\text{HEC} = \beta_0 + \beta_1\,\text{HGMF} + \varepsilon$. Fitted line: $\widehat{\text{HEC}} = b_0 + b_1\,\text{HGMF} = -0.09 + 0.92\,\text{HGMF}$.]

1. What hypotheses should be stated to test their claim?
(a) $H_0: b_1 = 0$ vs. $H_a: b_1 \neq 0$
(b) $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$
(c) $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 > 0$
(d) $H_0$: [symbol illegible] $= 0$ vs. $H_a$: [symbol illegible] $\neq 0$
(e) $H_0$: [symbol illegible] $= 0$ vs. $H_a$: [symbol illegible] $> 0$

2. What is the correct P-value the researchers should use to determine whether or not to reject the null hypothesis?
(a) 0.4445
(b) 0.22225
(c) 0.0052
(d) 0.0026
(e) None of the above is correct.

3. Suppose the model assumptions for linear regression are satisfied. What is the best interpretation of the slope of the regression line?
(a) The estimated average change in HGMF when HEC increases by 1 is 0.92.
(b) The estimated average change in HGMF when HEC increases by 1 is -0.09.
(c) The estimated average change in HEC when HGMF increases by 1 is 0.92.
(d) The estimated average change in HEC when HGMF increases by 1 is -0.09.
(e) The estimated average change in HEC when HGMF increases by 1 is 0.83.
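The arithmetic behind questions 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original test: it assumes the P column of the regression output reports a two-sided P-value (as regression software typically does), and the variable names are hypothetical.

```python
# Values read from the HGMF row of the regression output above.
slope = 0.92          # estimated slope b1
se_slope = 0.285      # standard error of b1
p_two_sided = 0.0052  # assumed to be the two-sided P-value
n = 18                # pieces of beef sampled

# t statistic for H0: beta1 = 0, with n - 2 degrees of freedom.
t_stat = slope / se_slope
df = n - 2

# For the one-sided alternative Ha: beta1 > 0, halve the two-sided
# P-value when the estimate lies on the side the alternative predicts
# (t > 0); otherwise the one-sided P-value is 1 - p/2.
if t_stat > 0:
    p_one_sided = p_two_sided / 2
else:
    p_one_sided = 1 - p_two_sided / 2

print(f"t = {t_stat:.2f} on {df} df, one-sided P = {p_one_sided:.4f}")
# prints: t = 3.23 on 16 df, one-sided P = 0.0026
```

The halving step only applies because the sample slope is positive, which is the direction the developers' theory predicts.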

An interdenominational organization wanted to determine if clerics from different religions have different levels of awareness of mental illness. Three random samples were taken: one of 10 Methodist ministers, one of 10 Catholic priests, and a third of 10 Pentecostal ministers. Each of the 30 subjects was given a standardized written test to measure knowledge of mental illness. Below is some of the relevant JMP output. They plan to use this output to test their theory that there is an association between religion and average score on the knowledge of mental illness exam. Use this to answer the next 3 questions.

Level          n     Mean     SD
Catholic       10    26.20    20.6817
Methodist      10    30.50    21.6551
Pentecostal    10    12.40    10.1784

[Figure: residual plots, including a normal quantile plot]

Analysis of Variance
Source      df     SS           MS         F      p-value
Religion    (a)    1788.467     (d)        (e)    (f)
Error       (b)    (c)          333.426
C. Total    29     10790.967

4. Complete the above ANOVA table, entries (a) to (f).
[Handwritten answers, largely illegible in the scan; (a) is written as 2.]

5. Are the conditions met to do a test of ANOVA on this data? Choose the best answer.
(a) Yes, because the subjects were randomly selected.
(b) No, because the standard deviation of the Methodist sample is more than twice the standard deviation of the Pentecostal sample.
(c) No, because there is an extreme outlier in the residual plot above right.
(d) No, because the standard deviation of the Methodist sample is more than twice the standard deviation of the Pentecostal sample and there is an extreme outlier in the box plots of the residuals above right.
(e) Yes, because the sample size is 30 and the subjects were randomly selected.

6. Assume that the conditions are met and that it is appropriate to use an ANOVA to analyze this data. What should their conclusion be if $\alpha = 0.10$?
(a) The data provides no statistical evidence of an association between religion and score on the knowledge of mental illness.
(b) The data provides statistical evidence of an association between religion and score on the knowledge of mental illness.

A new cholesterol treatment is in the developmental stage. The researchers want to test their theory that people taking their drug have lower cholesterol than people not taking the drug. Ten volunteers agree to take the drug. Their cholesterol level is measured, they are given the drug for 2 weeks, and their cholesterol level is measured again. Assume all the conditions are met to analyze this data using a paired t test analysis. Use this scenario to answer the next 2 questions.

Three different sets of numbers are plotted below. One set is everyone's cholesterol level before taking the drug. Another set is everyone's cholesterol level after taking the drug. The third set is everyone's difference in cholesterol level, measured as before minus after.

[Figure: Box Plot Comparison of the Before, After, and Before-After values]

11. The correct test to analyze this data is a paired t test. The paired t test P-value is 0.004. If the data is incorrectly analyzed using a 2-sample t test, the P-value is 0.0901. Explain why the P-value is larger in the 2-sample t test.
(a) The standard deviation (SD) in the paired-t test measures only the variation due to the differences in how patients react to the drug. The SD in the 2-sample t test measures the variation due to differences in how the patients react to the drug plus the variation in cholesterol levels between patients. Consequently, the test statistic for the paired-t test is larger than the test statistic for the 2-sample t test and so the P-value will be smaller for the paired-t test.
(b) The standard deviation (SD) in the paired-t test measures only the variation due to the differences in how patients react to the drug. The SD in the 2-sample t test measures the variation due to differences in how the patients react to the drug plus the variation in cholesterol levels between patients.
Consequently, the test statistic for the paired-t test is smaller than the test statistic for the 2-sample t test and so the P-value will be smaller for the paired-t test.
(c) The 2-sample t test tests a different theory and so the P-value will be different.
(d) It may happen by chance without an explanation.
(e) None of the above.

12. In a matched-pairs design study of the effect of a certain treatment, researchers formed 2-sided hypotheses and found that the mean of the differences in the (before treatment minus after treatment) readings was 4.37. In other words, $\bar{x}_d = 4.37$. What would definitely change if instead the sample mean difference had been $\bar{x}_d = 6.09$? Assume the SD and sample size remained the same.
(a) The conclusion would change.
(b) The power would decrease.
(c) The P-value would decrease.
(d) The P-value would increase.
(e) Nothing would change.
[Handwritten notes, partly illegible: the test statistic will be larger; the area beyond it (the P-value) will be decreased.]

13. A randomized controlled trial was designed to compare the effectiveness of splinting versus surgery in the treatment of carpal tunnel syndrome. In the trial, 200 patients with carpal tunnel syndrome were randomly assigned to receive either splinting or surgery. The number of patients for whom the treatment was successful was noted in both groups of patients. Interest centers on whether there is a relationship between the treatment for carpal tunnel syndrome and the outcome. What is the most appropriate test to analyze this data set?
(a) 2 independent samples t test
(b) Matched-pairs t test
(c) Linear regression t test
(d) 2 independent sample proportions z test
(e) Compare several means: one-way ANOVA F test
[Handwritten note, partly illegible: $H_0: p_1 = p_2$ vs. $H_a: p_1 \neq p_2$.]

14. A study of the effects of smoking on hours of sleeping classifies subjects as nonsmokers, moderate smokers, or heavy smokers. The investigators interview a sample of 100 people in each group. Assume all the conditions are met.
The degrees of freedom for the one-way ANOVA F test statistic for comparing the mean hours of sleep among these groups are:
(a) 2 and 97.
(b) 2 and 297.
(c) 3 and 297.
(d) [illegible]
(e) [illegible]
[Handwritten notes, partly illegible: three groups, each with 100 observations, so $N = n_1 + n_2 + n_3 = 100 + 100 + 100 = 300$. Treatment df $= k - 1 = 3 - 1 = 2$; Error df $= N - k = 300 - 3 = 297$; Total (corrected) df $= N - 1 = 300 - 1 = 299$.]

15. A researcher is interested in testing for a relationship between BMI and number of hours of exercise done per week. A random sample of 100 men aged 20-30 was recruited for the study. The BMI was measured on each and each reported the number of hours of exercise they did the previous week.
(a) Chi-squared test
(b) Linear regression
(c) Paired t test
(d) Independent sample t test
(e) None of the tests we've studied this semester can be used to analyze all of this data.

A study randomly assigned adult subjects to 3 exercise treatments: (1) a single long exercise period five days per week; (2) several ten-minute exercise periods five days per week; and (3) several ten-minute periods five days per week using a home treadmill. The study report contains the following summary statistics about weight loss (in kilograms) after six months of treatment:

Treatment                                 Mean    Std. dev.    n
Long periods                              10.2    4.2          37
Multiple short periods                    9.3     4.5          36
Multiple short periods with treadmill     10.2    5.2          42

Here is a partial ANOVA table based on these data.

Source    df     SS        MS         F-ratio
Group     (1)    20.032    (6)        (7)
Error     (2)    (4)       18.4560
Total     (3)    (5)

[Handwritten notes, mostly illegible; one legible calculation reads (6) $= 20.032 / 2 = 10.016$.]

16. The alternative hypothesis for the ANOVA is that the population mean weight loss after six months is
[Handwritten note: $H_0: \mu_1 = \mu_2 = \mu_3$ vs. $H_a$: not $H_0$ (not all three groups have the same mean; there exists at least one pair of different means).]
(a) highest with multiple short periods of exercise using a treadmill.
(b) different for each of the three exercise conditions.
(c) not the same for the three exercise conditions.
(d) the same for all three exercise conditions.
(e) None of the above.

17. The value of the F statistic is
[Answer choices partly lost in extraction; the surviving values are 0.54 and 1.09.]

24. Bleaching chemicals are used in the pulp and paper industry to increase the brightness of the paper that is produced. Four chemical agents were studied to determine their effect on the brightness of paper produced from pulp treated with the chemical. The null hypothesis $H_0: \mu_A = \mu_B = \mu_C = \mu_D$ was established for the ANOVA. What is the appropriate alternative hypothesis?
(a) $H_a: \mu_A \neq \mu_B \neq \mu_C \neq \mu_D$.
(b) $H_a$: All of the pairs of $\mu_A, \mu_B, \mu_C, \mu_D$ are different.
(c) $H_a$: At least one pair of $\mu_A, \mu_B, \mu_C, \mu_D$ differ from each other.
(d) $H_a$: Not all of the $\mu$'s are equal.
(e) Both (c) and (d) are correct.

The presence of bacteria in the urine has been associated with kidney disease (more bacteria leads to greater susceptibility to kidney disease). There is anecdotal evidence that oral contraceptive (OC) use is associated with kidney disease. A study was done to test if the average bacteria level of OC users differs from the bacteria level of non-OC users. Recognizing that the bacteria level could vary greatly by age group, the researchers first divided the subjects (all volunteers who were non-pregnant premenopausal women under age 50) into 4 age categories. In each age category, the subjects were randomly assigned to take either an OC for 2 months or a placebo for 2 months. At the end of the 2 months, the bacteria level in the urine was measured. Use this scenario to answer the next 2 questions.

25. What are the variables of interest to the researchers?
(a) Explanatory: OC usage. Response: bacteria level
(b) Explanatory: age category. Response: bacteria level
(c) Explanatory: age category. Response: OC usage
(d) Explanatory: OC usage. Response: age
(e) Explanatory: bacteria present. Response: OC usage

26. What type of study was this?
(a) Experimental: completely randomized design
(b) Experimental: block design
(c) Experimental: matched pairs design
(d) Observational: biased survey because the subjects were not randomly selected from a sample frame
(e) Observational: comparative observational study

27. Researchers want to do a linear regression analysis on 22 subjects. Assume each subject's response is only measured once, implying the independence condition is met. What can be concluded from the following plots?

[Figure: three panels: scatter plot (response vs. explanatory), normal quantile plot, and residual by predicted plot]

(a) The explanatory and response are not linearly related
(b) The standard deviation of the response is not constant (variances are not constant)
(c) The response is not normally distributed about the mean function
(d) Both (a) and (b) are true
(e) All the linear regression assumptions are met

28. We randomize the assignment of treatments in an experiment so that:
(a) The effect of external influences differs across treatments
(b) Each person in the study will be equally likely to receive the best treatment
(c) We can decide if there is a correlation between the explanatory and the response variables
(d) The groups of subjects are similar in all respects before the treatments are applied
(e) None of the above

29. An independent research firm conducts a study to compare the taste of four new sports drinks. Five people are randomly assigned to each of the four drinks. Each person tastes the drink and judges it on a scale from one to five. How many degrees of freedom (df1) does the group sum of squares have? How many degrees of freedom (df2) for the error sum of squares?
(a) df1 = 4; df2 = 17
(b) df1 = 16; df2 = 3
(c) df1 = 3; df2 = 16
(d) df1 = 3; df2 = 17
(e) df1 = 4; df2 = 16
[Handwritten notes, partly illegible: $k = 4$, $N = 20$; df1 $= k - 1 = 4 - 1 = 3$; df2 $= N - k = 20 - 4 = 16$.]
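The degrees-of-freedom bookkeeping used in the ANOVA questions above (the clergy table, the smoking study, and the sports drinks) follows one pattern, sketched below in Python. This is an illustrative sketch only: the function names and dictionary keys are hypothetical, and the table completion assumes the standard one-way ANOVA identities SS(total) = SS(group) + SS(error), MS = SS / df, and F = MS(group) / MS(error).

```python
def one_way_anova_df(group_sizes):
    """Return (df_group, df_error, df_total) for a one-way ANOVA."""
    k = len(group_sizes)        # number of groups
    n_total = sum(group_sizes)  # total number of observations N
    return k - 1, n_total - k, n_total - 1


def complete_anova_table(ss_group, ss_total, group_sizes):
    """Fill in the remaining one-way ANOVA entries from two sums of squares."""
    df_group, df_error, df_total = one_way_anova_df(group_sizes)
    ss_error = ss_total - ss_group
    ms_group = ss_group / df_group
    ms_error = ss_error / df_error
    return {
        "df_group": df_group,
        "df_error": df_error,
        "df_total": df_total,
        "ss_error": ss_error,
        "ms_group": ms_group,
        "ms_error": ms_error,
        "f": ms_group / ms_error,
    }


# Sports drinks: 4 drinks, 5 tasters each.
print(one_way_anova_df([5, 5, 5, 5]))      # prints: (3, 16, 19)

# Smoking study: 3 groups, 100 people each.
print(one_way_anova_df([100, 100, 100]))   # prints: (2, 297, 299)

# Clergy study: SS values taken from the JMP output, 3 groups of 10.
table = complete_anova_table(1788.467, 10790.967, [10, 10, 10])
print(round(table["ms_error"], 3), round(table["f"], 2))
```

For the clergy data the computed mean square error agrees with the 333.426 printed in the JMP output, which is a useful check that the error degrees of freedom are 27.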

30. During one of its broadcasts in the 1990s, the ABC program Nightline asked viewers to call an 800 or 900 phone number to state whether they believed the United Nations should continue to be located in the United States. Of the more than 186,000 callers, 67% wanted the United Nations out of the United States. At the same time, a political scientist using a random sample of about 500 Americans estimated the percentage of all Americans who want the United Nations out of the United States to be about 28%. What caused this discrepancy? You should assume the sample frame used by the political scientist was a good sample frame. (D. Horvitz, et al., Chance, vol. 8, pp. 16-25, 1995)
(a) The Nightline poll had 186,000 participants compared to the 500 in the political scientist's poll, so the Nightline poll better estimates the percent of Americans who want the United Nations out of the United States.
(b) The Nightline poll result is biased due to the use of a voluntary response sample and reflects the opinion of Nightline viewers with strong opinions on the subject compared to Americans in general.
(c) The Nightline poll result is biased because it is based on a convenience sample and doesn't accurately reflect the opinion of Americans in general.
(d) The political scientist's study suffers from non-response bias because there were only 500 subjects.
(e) There is no bias and the difference in results of the two studies is likely due to chance.

31. Laughter is often called the best medicine. Studies have shown that laughter can reduce muscle tension and increase oxygenation of the blood. In the International Journal of Obesity (2007), researchers at Vanderbilt University investigated the physiological changes that accompany laughter. Ninety subjects (18-34 years old) watched film clips designed to evoke laughter. During the laughing period, the researchers measured the heart rate (in beats per minute) of each of the n = 90 subjects.
It is well known that the average resting heart rate of adults is 71 beats per minute. The researchers used the data to test the claim that the average heart rate of laughing adults is not 71. The researchers set $\alpha = 0.05$ and correctly calculated a P-value of 0.08. Suppose we have additional information unavailable to the researchers and know that IN REALITY the mean heart rate during laughter is actually 76. Based on the calculations described, what conclusion did the researchers reach and did they make an error?
(a) They will have concluded that the mean heart rate during laughter is different from 71 and made a Type I error
(b) They will have concluded that the mean heart rate during laughter is different from 71 and made a Type II error
(c) They will have concluded that the mean heart rate during laughter is different from 71 and made a correct decision
(d) They will have concluded that they had insufficient statistical evidence to be able to conclude that the mean heart rate during laughter is different from 71 and made a Type II error
(e) None of the above
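The reasoning in the laughter question combines two facts: the decision the researchers reach (from the P-value and $\alpha$) and whether the null hypothesis is actually true. A minimal Python sketch of that logic follows. The function name is hypothetical, and it assumes the common convention of rejecting the null hypothesis when the P-value is at most $\alpha$.

```python
def classify_decision(p_value, alpha, null_is_true):
    """Label the outcome of a significance test given the (usually unknown) truth."""
    reject = p_value <= alpha  # assumed convention: reject H0 when P <= alpha
    if reject and null_is_true:
        return "Type I error"      # rejected a true null hypothesis
    if not reject and not null_is_true:
        return "Type II error"     # failed to reject a false null hypothesis
    return "correct decision"


# Laughter study: H0 says the mean heart rate is 71, but the true mean is 76,
# so H0 is false. With P = 0.08 > alpha = 0.05 the researchers do not reject H0.
print(classify_decision(p_value=0.08, alpha=0.05, null_is_true=False))
# prints: Type II error
```

The `null_is_true` argument is only available here because the question supplies the true mean; in practice the researchers cannot know which outcome occurred.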
Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved