Probability Distributions & Statistical Inference: Earnings, Quantiles, & Hypothesis Testi, Exams of Data Analysis & Statistical Methods

Solutions to various probability and statistical problems. Topics include finding probabilities, expected values, and quantiles for different distributions, as well as hypothesis testing using normal distributions. The problems involve calculating probabilities for continuous random variables, such as the time a light stays green, the earnings from an insurance policy, and customer satisfaction ratings.

Typology: Exams

Pre 2010

Uploaded on 03/10/2009

koofers-user-50n
Partial preview of the text

ISQS 5347 Fall 06 Final. Open notes, no book. Points out of 100 in parentheses.

1. Use the stoplight example from class. Let T = time the light stays green. Draw and annotate the graph of the pdf of T indicating the solution to each of the following problems. Draw a separate graph for each problem.

1.A. (5) Find P(T > 1) and show it on the graph with complete annotation (i.e., do not exclude crucial information from your graph).

Solution: Draw the graph of the U(0,2) pdf. Shade the area to the right of 1.0. Note that the area is 0.5.

1.B. (5) Find P(|T-1| > .5) and show it on the graph with complete annotation (i.e., do not exclude crucial information from your graph).

Solution: Draw the graph of the U(0,2) pdf. Shade the area to the right of 1.5 and to the left of .5. Note that the area is 0.25 + 0.25 = 0.5.

1.C. (5) Find the .975 quantile of the distribution of T and show it on the graph with complete annotation (i.e., do not exclude crucial information from your graph).

Solution: Draw the graph of the U(0,2) pdf. Mark a point near 2.0 on the horizontal axis and shade the area to the left of this point, noting that the area is .975. Note that the point you marked is 1.95, since the pdf has height 0.5 and 0.5*1.95 = .975.

2. Employees who have received training are successful in a task 90% of the time; those who have not received training are successful 40% of the time. 30% of the employees in the company have received training.

2.A. (5) How often are employees, as a whole, successful in the task?

Solution: P(Success) = P(Success|Training)P(Training) + P(Success|No training)P(No training) = .9*.3 + .4*.7 = .55.

2.B. (5) An employee was successful in the task. Did that employee receive training? Answer the question by finding the appropriate probability.

Solution: P(Training|Success) = P(Success|Training)P(Training)/P(Success) = .9*.3/.55 = .27/.55 = .49.
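The arithmetic in 2.A and 2.B can be checked with a few lines of Python. This is only a sketch; the variable names are made up for illustration and are not part of the exam.

```python
# Inputs taken from problem 2 (variable names are hypothetical).
p_train = 0.30                    # P(Training)
p_success_given_train = 0.90      # P(Success | Training)
p_success_given_no_train = 0.40   # P(Success | No training)

# 2.A: law of total probability.
p_success = (p_success_given_train * p_train
             + p_success_given_no_train * (1 - p_train))

# 2.B: Bayes' rule.
p_train_given_success = p_success_given_train * p_train / p_success

print(round(p_success, 2), round(p_train_given_success, 2))  # prints 0.55 0.49
```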
The employee is slightly more likely not to have received training than to have received training.

3. An insurance company sells a certain type of policy. Suppose the company’s earnings on the policy follow the following extremely simple model.

Earnings    Probability
-$15        .2
$10         .8

3.A. (5) Find the expected earnings per policy.

Solution: E(Earnings) = -15*.2 + 10*.8 = $5.

3.B. (5) Find the variance of earnings per policy.

Solution: Var(Earnings) = (-15-5)^2*.2 + (10-5)^2*.8 = 80 + 20 = 100.

3.C. (5) Suppose that 10,000 are insured. Find the expected total earnings on the 10,000 policies.

Solution: E(Total) = E(X1 + … + X10000) = E(X1) + … + E(X10000) = 5 + … + 5 = 10000(5) = $50,000.

3.D. (5) Suppose that 10,000 are insured. Find the variance and standard deviation of the total earnings on the 10,000 policies.

Solution: Var(Total) = Var(X1 + … + X10000) = Var(X1) + … + Var(X10000) (assuming the insured are independent) = 100 + … + 100 = 10000(100) = 1,000,000. StdDev(Total) = sqrt(Var(Total)) = sqrt(1,000,000) = $1,000.

3.E. (5) Draw, in detail, the approximate pdf of the total earnings on the 10,000 policies. Do not exclude crucial information from the graph.

Solution: The distribution is approximately normal by the CLT. So draw a bell curve with peak at $50,000 and limits at $47,000 and $53,000 (three standard deviations on either side of the mean).

4. Prior data suggest that 30% of managers are “very familiar” with the six sigma management paradigm. Prior data are not certain, however: the current percentage might be higher or lower because of incomplete past data, and because things change over time.

4.A. (5) Suggest and defend a specific prior distribution for p, the true current proportion of managers who are “very familiar”, and draw a graph of this prior distribution. The distribution should be completely specified and graphed as carefully as possible according to this specification. In other words, do not be generic and fuzzy in selecting this distribution; instead, be as precise as possible.
Be sure your prior distribution acknowledges the “30%” figure appropriately.

Solution: The prior distribution of choice for the binomial parameter p is the beta distribution. Since the mean of the beta distribution is α1/(α1+α2), you should pick the parameters α1 and α2 so that α1/(α1+α2) = .3. A simple choice is α1=3, α2=7. But another choice is α1=300, α2=700. The higher the values of α1 and α2, the less variance in the pdf, and the higher the prior certainty, since the variance of the beta random variable is µ(1-µ)/(α1+α2+1), where µ is the mean. We are not so certain about the value 0.3, so we should [...]

Solution: Following the identical logic as in 5.B., the graph will be a normal curve centered at 2.0 with limits 1.7 and 2.3. The power is the area under this curve to the left of 2.8 and to the right of 3.2. Since the curve extends from 1.7 to 2.3, the area to the left of 2.8 takes up the entire curve, and the power is extremely close to 1.0.

6. You decide to test a research hypothesis using a 5% Type I error rate.

6.A. (5) Why do you choose 5% and not 50%? Do not answer “Because 5% is standard”; instead give the logic.

Solution: If you test at the 50% Type I error rate, then you have a 50% chance of rejecting the null hypothesis when the null hypothesis is actually true. In this case, the decision to “reject Ho” would not give you much confidence that you have reached the correct decision, since you can make the wrong decision half the time when the null hypothesis is actually true. On the other hand, if the rule is set at 5%, then it is still possible to make the wrong decision of rejecting the null when it is true, but there is much less chance. So, if you reject Ho, you can conclude either (a) Ho is false, or (b) Ho is true but a rare event has occurred. Since the probability of the rare event is small, it is logically reasonable to decide that the alternative is true, even while acknowledging that you still could have made a mistake.

6.B.
(8) If the statistical assumptions required by the test are violated, then the true Type I error rate is no longer 5%. How does one study this issue using simulation? Describe the simulation method in general terms, without specific reference to SAS or EXCEL. What outcome of the simulation study would tell you that the violation of assumptions is a problem?

Solution: The simulation study will involve simulating data from a process where the null hypothesis is true, but the assumptions are violated. For example, if you are testing equality of means, then you simulate data where the means are equal. There are several assumptions that one can evaluate, including equality of variances, independence and normality. So pick an assumption you want to study, e.g., independence, and generate data where the observations are non-independent, e.g., with autocorrelation, but where the means are equal. Generate thousands of such data sets, and find the proportion of data sets for which the null hypothesis is rejected. If the assumptions are all satisfied, this proportion will be near .05. If the assumptions are violated, it will be different from .05. If the proportion determined from simulations is very far from .05, like 0.6 for example, then the violation of assumptions causes serious problems (see 6.A., where the problem with a .50 Type I error rate is discussed). On the other hand, if the Type I error rate is estimated to be very close to .05, e.g., .07, then the violation of the assumptions is not so troublesome.

6.C. (7) Suppose there are two different methods for testing the same research hypothesis, and these methods possibly could give different conclusions. Specifically, in some cases one test might yield the “reject Ho” conclusion and the other might yield the “FTR Ho” conclusion, even though both are used on the same research data. Suppose both tests have Type I error rate approximately 5%.
Again, describe a simulation method in general terms, without specific reference to SAS or EXCEL, to study the two test procedures. What outcome of the simulation study will help you to decide which test to use?

Solution: When tests have nearly equal Type I error rates, then you choose the test with the higher power. So again, simulate data, but this time under the alternative hypothesis. For example, if you are studying the two-sample t test, then generate data where the means are different. Generate thousands of data sets, and find the proportion of data sets for which the null hypothesis is rejected, for each test. Thus you have two estimates of power, one for each test. Use the test with the higher power.
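The simulation studies described in 6.B and 6.C might be sketched in Python as follows. Everything specific here is an assumption made for illustration and does not come from the exam: the sample size of 20 per group, the AR(1) autocorrelation of 0.6, the mean shift of 1.0, the choice of a rank-sum test as the second procedure, and all function names. The critical values 2.024 and 1.96 are approximate .975 quantiles of the t distribution with 38 degrees of freedom and of the standard normal.

```python
import math
import random
from statistics import mean, variance

N = 20              # observations per group (assumed for this sketch)
T_CRIT_38 = 2.024   # approx. .975 quantile of Student t, df = 2*N - 2 = 38
Z_CRIT = 1.96       # approx. .975 quantile of the standard normal


def t_test_rejects(x, y):
    """Pooled two-sample t test at the 5% level (valid only for N = 20)."""
    n, m = len(x), len(y)
    pooled = ((n - 1) * variance(x) + (m - 1) * variance(y)) / (n + m - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / n + 1 / m))
    return abs(t) > T_CRIT_38


def rank_sum_rejects(x, y):
    """Wilcoxon rank-sum test at the 5% level via the normal approximation."""
    n, m = len(x), len(y)
    ranked = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks held by the first group (ties ignored: data are continuous).
    w = sum(rank for rank, (_, group) in enumerate(ranked, start=1) if group == 0)
    mu = n * (n + m + 1) / 2
    sigma = math.sqrt(n * m * (n + m + 1) / 12)
    return abs(w - mu) / sigma > Z_CRIT


def iid_sample(rng, shift):
    """Independent normal observations with mean `shift` (assumptions hold)."""
    return [rng.gauss(shift, 1.0) for _ in range(N)]


def ar1_sample(rng, shift, rho=0.6):
    """Autocorrelated AR(1) observations with mean `shift` (independence violated)."""
    e = rng.gauss(0.0, 1.0)
    out = []
    for _ in range(N):
        e = rho * e + rng.gauss(0.0, 1.0)
        out.append(shift + e)
    return out


def rejection_rate(test, sampler, shift, n_sims=2000, seed=1):
    """Proportion of simulated data sets for which `test` rejects Ho.

    shift = 0 means Ho is true, so the result estimates the Type I error rate;
    shift != 0 means Ho is false, so the result estimates power.
    """
    rng = random.Random(seed)
    rejections = sum(test(sampler(rng, 0.0), sampler(rng, shift))
                     for _ in range(n_sims))
    return rejections / n_sims


if __name__ == "__main__":
    # 6.B: Type I error rate with and without the independence assumption.
    print("Type I, independent data:   ", rejection_rate(t_test_rejects, iid_sample, 0.0))
    print("Type I, autocorrelated data:", rejection_rate(t_test_rejects, ar1_sample, 0.0))
    # 6.C: power of two tests applied to the same simulated data sets.
    print("Power, t test:        ", rejection_rate(t_test_rejects, iid_sample, 1.0))
    print("Power, rank-sum test: ", rejection_rate(rank_sum_rejects, iid_sample, 1.0))
```

Because both power estimates use the same seed, the two tests are applied to identical simulated data sets, which mirrors the situation in 6.C where both procedures are used on the same research data.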
Copyright © 2024 Ladybird Srl - Via Leonardo da Vinci 16, 10126, Torino, Italy - VAT 10816460017 - All rights reserved