Review Problems on Statistical Analysis and Hypothesis Testing, Study notes of Statistics

Review problems on statistical analysis, hypothesis testing, and confidence intervals, covering topics such as business school rankings, charitable contributions, t-tests, regression analysis, and vaccine virulence testing. The problems call for calculations, interpretations, and explanations of concepts such as sample averages, standard deviations, confidence intervals, P-values, Type I and Type II errors, and hypothesis testing.

Typology: Study notes

2009/2010

Uploaded on 03/28/2010

koofers-user-2t4

Partial preview of the text

Review Problems on Last Part of the Material

1. Business Week’s rankings of business schools are in part based on a survey of students. For example, 150 MBA students at Wharton are randomly selected and asked to rate the quality of teaching on a 0-10 scale. The following summary statistics are calculated from the 90 students who responded to the questionnaire: sample average = 7.3; sample standard deviation = 1.9; and 18 of the 90 students rated Wharton a 10.

a. Find the 95% confidence interval for the true population mean of the rankings of quality of teaching.
b. Find the 95% confidence interval for the true proportion of students who rate Wharton a 10.
c. Do the data support the claim that the population mean μ for the teaching quality at Wharton exceeds 7? Use α = .05
d. Do the data support the claim that the population proportion of ratings that are 10 exceeds 15%? Use α = .01
e. How large a sample would we need for the confidence interval in b. to have a margin of error of .04? How would this change if we were certain that p < .30?
f. Which of the assumptions you made in doing the calculations in parts a.-b. is most suspect?

2. A company called ESP (extra special people) claims that they can predict the behavior of the stock market. A class action suit against ESP was launched by disgruntled former clients. The following experiment was used in the court proceedings. Stocks are randomly chosen on Monday morning. For each stock, ESP predicts whether the price at the end of the trading day on Friday will be above or below the opening price on Monday morning. For simplicity, assume that the probability is zero that the prices on Monday morning and Friday afternoon are the same. It is decided that ESP has to be correct more than 60% of the time to prove its claim.

a.
i) What are the null and alternative hypotheses in a test designed to show ESP’s claim?
ii) What sample size would be required so that a 95% confidence interval for the probability that ESP is correct has a margin of error of .08?
b. If a sample of size 400 stocks is taken, what proportion of them must ESP get correct to prove its claim? Use α = .05
c. Would the error be greater in the test in b. if the true proportion were 63% or 65%? What kind of error would this be?

3. It is argued that a review course for the SAT tests does not increase SAT scores. The following test is performed. Twenty five high school juniors are randomly chosen on March 1. Each of the students takes the SAT exam on March 15, prior to any instruction. The twenty five students are then given the course to improve the verbal score on the SAT. They take the SAT exam after the course. Below are the observed data:

Student Number     Verbal Score Before     Verbal Score After     Difference
1                  460                     630                    -170
2                  610                     630                    -20
3                  510                     520                    -10
4                  600                     640                    -40
5                  390                     440                    -50
6                  610                     590                    20
7                  410                     610                    -200
8                  530                     520                    10
9                  620                     640                    -20
10                 360                     400                    -40
11                 590                     580                    10
12                 550                     600                    -50
13                 640                     700                    -60
14                 510                     590                    -80
15                 430                     500                    -70
16                 380                     460                    -80
17                 600                     610                    -10
18                 590                     560                    30
19                 620                     670                    -50
20                 450                     500                    -50
21                 480                     450                    30
22                 510                     600                    -90
23                 480                     510                    -30
24                 600                     590                    10
25                 630                     650                    -20
Average            526.4                   567.60                 -41.20
Sample Variance    7749.00                 6135.67                3077.67

a. i) Perform an appropriate test to see if the course improved the mean score. Use α = .05
ii) A victory for the course will only be declared if it increases the mean score by more than 20 points. What conclusion would you draw now using α = .05?
b. Do the data support the claim that the proportion of people who improved is greater than 50%? Assume α = .05

4. The government passed a law that receipts must accompany charitable contributions of $250 or more on an individual's 1994 IRS itemized deductions form. A study of 1993 returns shows the following population values.
The mean (1993) and standard deviation (1993) for the percentage of an individual's income that was claimed as charitable contributions were 3% and 2.7% respectively and the proportion (p1993) that didn't claim any deductions for charitable contributions was 80%. Let 1994 1994 and p1994 denote the corresponding values for 1994. A sample of 150 returns for 1994 was randomly chosen. The sample average and Ha .2 .2 .3 .2 .1 a. If a person is classified as having a benign tumor if the ratings are 2,3,4 or 5, what are the values of  and ? What do these values mean in the context of this problem? b. The form of the test is to classify a tumor as benign if the rating is at least c. What value of c minimizes  + ? 8. Consider the regression analysis example discussed in class relating the number of hours a student works on school related functions (X) to the student’s GPA (Y). 2 3 4 G P A 0 10 20 30 40 50 60 70 School Hrs Linear Fit GPA = 1.7644708 + 0.0342453 School Hrs RSquare RSquare Adj Root Mean Square Error Mean of Response Observations (or Sum Wgts) 0.468111 0.466774 0.402178 3.10175 400 Summary of Fit Model Error C. Total Source 1 398 399 DF 56.65629 64.37548 121.03177 Sum of Squares 56.6563 0.1617 Mean Square 350.2763 F Ratio <.0001 Prob > F Analysis of Variance Intercept School Hrs Term 1.7644708 0.0342453 Estimate 0.074228 0.00183 Std Error 23.77 18.72 t Ratio <.0001 <.0001 Prob>|t| Parameter Estimates Linear Fit Bivariate Fit of GPA By School Hrs a. Interpret the effect of increasing the number of school hours by one on GPA. b. What would be the predicted GPA for someone who only spent 20 hours per week on school related functions? Find the range that has a 95% chance of including the GPA for students who work 20 hours on school related functions. c. Consider the row labeled School Hrs in the Parameter Estimates part of the output. 
i) The column labeled Prob > |t| is the P-value for the test of the null hypothesis that the true population slope for school hrs is zero. The actual P-value is less than 1 in ten thousand. Interpret what this means.
ii) The Std Error column is giving the standard deviation of the estimate of the slope (analogous to s/√n for the sample average). Construct a 95% confidence interval for the slope (since n is large you may use z) and interpret this interval in the context of this problem.

Note: We did not cover material in Question 2 in Fall 2004 so omit this question for review.

Statistics 101 Final Exam
April 30, 2004

Notes:
1. The exam is open book and notes. Calculators are permitted, but not computers.
2. Please include all of your work in the blue book.
3. Please indicate the null and alternative hypotheses when appropriate.

1. (15 pts) Television shows are pre-tested by inviting 100 individuals to come to the studio, showing these individuals a pilot of the television show and asking each individual to rate the show on a scale of 0 (do not like it at all) to 100 (would be a faithful viewer of the show). It is decided that if the population mean of the rating exceeds 60 then the show is worth airing. For one particular show the average rating of the 100 individuals who observed the pilot was 65 with a sample standard deviation of 15. Of the 100 ratings, 59 were greater than 60 and the remaining 41 were 60 or below.

A. Are the data sufficient in showing that the population mean exceeds 60? Use α = .01
B. Are the data sufficient in showing that the proportion of people who give the show a rating of above 60 exceeds 50%? Use α = .05
C. Are 100 individuals a sufficient sample size so that a 95% confidence interval for the proportion of individuals who rate a television show to be greater than 60 has a margin of error of .07? If not, what sample size would be required?

2. (20 pts) The television studio has an option of running one of two television shows.
The first television show is the one described in problem 1. You are therefore to use the data on the ratings of television program 1 given in problem 1. The second television show is viewed by a different audience of 100 individuals. This show is the favorite of the professionals of the studio. The average rating that this show gets is 70 with a sample standard deviation of 13. Of the 100 ratings, 69 were greater than 60 and the remaining 31 were 60 or below.

A. Do the data justify that the mean rating for this show exceeds the mean rating for the show described in question 1? Use α = .05
B. Do the data justify that the proportion of people in the population who rate this show higher than 60 is greater for this show than the one in problem 1? Use α = .05
C. Assume each of the show’s ratings has a known population standard deviation of 15. All questions refer to: H0: μ1 = μ2 versus Ha: μ1 ≠ μ2 at α = .05
Situation 1: μ1 = 60 and μ2 = 70
Situation 2: μ1 = 55 and μ2 = 75
For each of the four descriptions below i), ii), iii), and iv): Answer a) Situation 1 is higher, b) Situation 2 is higher, or c) no difference. Also include a one sentence explanation of your answer.
i) The probability of making a Type II error.
ii) The probability of making a Type I error.
iii) The P-value in the test assuming given sample sizes and sample averages.
iv) The sample size (assume equal sample size) for a Type II error of .05

3. (25 points) In a study of mutual funds (based on real data and analysis in the literature), 100 mutual funds are randomly chosen and their returns in 1992 and 1993 are found. Refer to Figure 1 below in answering parts A and B.

A. i) What is the meaning of the R-squared value in the context of the problem?
ii) The P-value for the test of the null hypothesis that the population R-squared value is zero versus the alternative that it is greater than zero is .0399. What does the P-value mean in the context of this problem?
B.
i) What is the predicted return in 1993 for a mutual fund that had a return of 10 percent in 1992?
ii) What would be the range of returns that has a 95% chance of including the true return in 1993 for a mutual fund with a return of 10 percent in 1992 assuming a normal distribution?
C. Refer to either Figure 2 or Figure 3 (whichever you think is more appropriate).
i) Do the data show that the mean return in 1993 is different from the mean return in [...]

[Figure: distribution of 1993-1992 with Normal Quantile Plot]

Moments
Mean               7.2567
Std Dev            15.549585
Std Err Mean       1.5549585
upper 95% Mean     10.342075
lower 95% Mean     4.171325
N                  100

4. (24 points) This problem is based on a real court case, although the data have been changed because of confidentiality. A manufacturer of Polio vaccines is required by law to test the virulence of the vaccine on n animals. The scores on virulence are on a 1-10 scale. We need to feel comfortable (translated to be a type I error of .05) that the mean score is below four for the vaccine to be safe to be released. Assume that σ = 3 in answering all of the parts of question 4.

A. The average score and sample standard deviation from the five animals that were tested were 2.1 and 2.7 respectively. Should the vaccine be released? Use α = .05
B. If the true mean score is 2, what is the probability of correctly releasing the vaccine using the test in Part A?
C. Since we want to release vaccines that have population means of 2 frequently, more animals need to be tested. How many animals would have to be tested so that there is only a 5% chance of releasing vaccines that should not be released and a 10% chance of not releasing vaccines that should be released when the population mean is 2?
D. There is a lawsuit against the pharmaceutical company claiming that they are not following statistical practice. The following data are collected on four lots of vaccine.
Lot     Sample Average     Number of Animals     Released?
1       2.7                18                    Yes
2       2.5                8                     No
3       2.9                16                    Yes
4       2.6                11                    No

Is this consistent in that there is the same α (does not have to be .05) that gives rise to releasing lots 1 and 3, but not lots 2 and 4?

5. (16 points) For purposes of analysis, assume that movies are categorized into one of two groups: H0: Movie is non-profitable (“bust”) or Ha: Movie is profitable (“success”). From past experience it has been determined that the number of stars that a critic gives to a movie is related, although not perfectly, to whether the movie is a “bust” or “success”.

Movies that are H0: respective probabilities of 1, 2, 3 and 4 stars are .4, .39, .16 and .05
Movies that are Ha: respective probabilities of 1, 2, 3 and 4 stars are .1, .2, .32 and .38

You decide to classify a movie as a “success” (i.e., reject the null hypothesis) if it receives at least c stars.

A. i) What is the value of c so that the probability of a type I error is .05?
ii) What is the probability of making a type II error if the rule in A. i) is adopted?
iii) What is the P-value associated with a movie that receives 3 stars?

The following information is available on movies:
1. 25% of the movies are “successes” and the remaining 75% are “busts”.
2. i) Not releasing a movie gives zero profit regardless of whether the movie is a “bust” or a “success”.
ii) Releasing a movie that is a “bust” loses 25 million dollars.
iii) Releasing a movie that is a “success” makes 50 million dollars.

B. i) What is the probability that a movie is a “success” if it receives a 3 star rating?
ii) If a movie is not released it of course has an expected profit of zero. Compute the expected profit if a movie with 3 stars is released to determine if it is profitable on average to release a movie with 3 stars.
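
The quantities asked for in problem 5 can be checked numerically. The following is a minimal sketch in Python, not an official solution from the course: the function names (`error_rates`, `posterior_success`, `expected_profit`) are hypothetical, and it assumes the decision rule "reject H0 when the movie receives at least c stars", with the probabilities, the 25% prior, and the payoffs taken directly from the problem statement.

```python
# Star-rating distributions from problem 5 (index 0 is 1 star, index 3 is 4 stars).
P_BUST = [0.40, 0.39, 0.16, 0.05]     # H0: movie is a "bust"
P_SUCCESS = [0.10, 0.20, 0.32, 0.38]  # Ha: movie is a "success"


def error_rates(c):
    """Return (alpha, beta) for the rule 'reject H0 if stars >= c'."""
    alpha = sum(P_BUST[c - 1:])      # P(stars >= c | bust): Type I error
    beta = sum(P_SUCCESS[:c - 1])    # P(stars < c | success): Type II error
    return alpha, beta


def posterior_success(stars, prior_success=0.25):
    """P(success | observed stars) by Bayes' rule, using the stated 25% prior."""
    joint_success = prior_success * P_SUCCESS[stars - 1]
    joint_bust = (1.0 - prior_success) * P_BUST[stars - 1]
    return joint_success / (joint_success + joint_bust)


def expected_profit(stars, gain=50.0, loss=25.0, prior_success=0.25):
    """Expected profit (millions of dollars) of releasing a movie with this rating."""
    p = posterior_success(stars, prior_success)
    return p * gain - (1.0 - p) * loss


if __name__ == "__main__":
    for c in range(1, 5):
        alpha, beta = error_rates(c)
        print(f"c = {c}: alpha = {alpha:.2f}, beta = {beta:.2f}")
    # The P-value for an observed rating is alpha evaluated at that rating.
    print(f"P-value for 3 stars: {error_rates(3)[0]:.2f}")
    print(f"P(success | 3 stars): {posterior_success(3):.2f}")
    print(f"Expected profit, 3 stars: {expected_profit(3):.2f} million")
```

Because the rating takes only four values, looping over every cutoff c is simpler than solving for it, and the same table of (alpha, beta) pairs covers parts A. i) through A. iii).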