STAT 513, FALL, 2007
QUIZ 2

Ground rules: You must work alone on this quiz. Do not collaborate with anyone, either within or outside the class, to obtain answers or even hints. Questions of clarification should be directed to the instructor; in addition, the use of the Internet should be avoided. This quiz contains 5 questions, each of equal weight, making the quiz worth 50 points.

1. The distribution of starting salaries (in $1000s) of USC undergraduates majoring in statistics is well modeled by a Pareto distribution. Suppose that $Y_1, Y_2, ..., Y_n$ is an iid sample of $n$ recent graduates, modeled as arising from a Pareto pdf of the form

\[
f_Y(y; \theta) =
\begin{cases}
\theta \nu^{\theta} y^{-(\theta+1)}, & y > \nu \\
0, & \text{otherwise,}
\end{cases}
\]

where $\theta > 1$. The parameter $\nu$ represents the minimum starting salary; we will assume $\nu = 35$ (known). Find the form of the most powerful level $\alpha$ rejection region to test $H_0: \theta = 2$ versus $H_a: \theta = 3$. Explain, in as much detail as possible, how to find the critical value for the test; you don't have to express the critical value in terms of a well known probability distribution. Just tell me how one would find it.

2. Suppose that $Y_1, Y_2, ..., Y_n$ is an iid sample of size $n$ from $f_Y(y; \theta)$, where

\[
f_Y(y; \theta) =
\begin{cases}
\theta^2 y e^{-\theta y}, & y > 0, \ \theta > 0 \\
0, & \text{otherwise.}
\end{cases}
\]

(a) Derive the UMP level $\alpha$ test for $H_0: \theta = \theta_0$ versus $H_a: \theta < \theta_0$. The rejection region for this test should depend on a $\chi^2$ quantile.

(b) Plot the power function of the test, $K(\theta)$, when $n = 15$, $\alpha = 0.01$ and $\theta_0 = 10.4$. Try to use R to plot the power function.

3. Suppose that $Y_1, Y_2, ..., Y_n$ is an iid sample of size $n$ from a beta distribution with parameters $\alpha = 1$ and $\beta = \theta$.

(a) Derive the level $\alpha$ likelihood ratio test of $H_0: \theta = 1$ versus $H_a: \theta \neq 1$. Explicitly state how all critical values are chosen. Don't confuse the Type I Error probability $\alpha$ here with the "$\alpha$" parameter in the beta model.
(b) For extra credit, derive the power function $K(\theta)$ and plot it when $n = 10$ and the Type I Error probability is 0.10. If you cannot do this, you can still take a guess at what you think the power function would look like.

4. Suppose that we have two independent samples:

\[
\begin{aligned}
\text{Sample 1}: \quad & Y_{11}, Y_{12}, ..., Y_{1n} \sim \text{iid Poisson}(\theta_1) \\
\text{Sample 2}: \quad & Y_{21}, Y_{22}, ..., Y_{2n} \sim \text{iid Poisson}(\theta_2),
\end{aligned}
\]

and that we are interested in comparing the population means $\theta_1$ and $\theta_2$.

(a) Show that the loglikelihood function of $\boldsymbol{\theta} \equiv (\theta_1, \theta_2)'$ is given by

\[
\ln L(\boldsymbol{\theta} \mid \mathbf{y}_1, \mathbf{y}_2) \equiv \ln L(\theta_1, \theta_2 \mid \mathbf{y}_1, \mathbf{y}_2) = \sum_{j=1}^{n} y_{1j} \ln \theta_1 + \sum_{j=1}^{n} y_{2j} \ln \theta_2 - n(\theta_1 + \theta_2) + c,
\]

where $c$ is a constant that depends on neither $\theta_1$ nor $\theta_2$, and $\mathbf{y}_i = (y_{i1}, y_{i2}, ..., y_{in})'$, for $i = 1, 2$.

(b) Derive the form of the rejection region for the level $\alpha$ LRT of $H_0: \theta_1 = \theta_2$ versus $H_a: \theta_1 \neq \theta_2$. Here, the parameter space is $\Omega = \{\boldsymbol{\theta}: \theta_1 > 0, \theta_2 > 0\}$ and the null space is $\Omega_0 = \{\boldsymbol{\theta}: \theta_1 > 0, \theta_2 > 0, \theta_1 = \theta_2\}$. Note that when $H_0$ is true, we can envision $Y_{11}, Y_{12}, ..., Y_{1n}, Y_{21}, Y_{22}, ..., Y_{2n}$ as an iid sample of size $2n$ from a Poisson distribution with mean $\theta$, say, where $\theta = \theta_1 = \theta_2$; that is, finding the MLE over the restricted space requires only a univariate maximisation procedure (over $\Omega$, it requires a two-variate procedure since there are two free parameters). State explicitly how all rejection region critical values are obtained. Do not use the two-sample test based on the large-sample distribution of $\overline{Y}_{1+} - \overline{Y}_{2+}$, as in Section 1.3 of the course notes. Do not use a test based on the $t$ distribution either.

(c) Use the large-sample approximation to the likelihood ratio statistic $\lambda$ (actually to $-2 \ln \lambda$) to test $H_0: \theta_1 = \theta_2$ versus $H_a: \theta_1 \neq \theta_2$ with the following data:

Left:  0 1 1 2 2 1 1 4 0 3
Right: 0 1 4 3 7 2 0 1 3 3

All theoretical regularity conditions are satisfied for the approximation to hold. What is your conclusion?
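The large-sample computation in part (c) can be sketched numerically. The short Python script below is only an illustration under stated assumptions, not a worked solution from the course: it assumes the unrestricted MLEs are the two sample means, the restricted MLE is the pooled mean, and $-2 \ln \lambda$ is compared with a $\chi^2$ distribution with 1 degree of freedom. The function names (`poisson_loglik`, `lrt_statistic`, `chi2_1_upper_tail`) are hypothetical helpers introduced here.

```python
import math

# Weed counts per plot, copied from part (c).
left = [0, 1, 1, 2, 2, 1, 1, 4, 0, 3]
right = [0, 1, 4, 3, 7, 2, 0, 1, 3, 3]


def poisson_loglik(total, n, theta):
    """Poisson log-likelihood for n observations summing to `total`,
    dropping the additive constant c that does not involve theta."""
    return total * math.log(theta) - n * theta


def lrt_statistic(sample1, sample2):
    """Return -2 ln(lambda) for H0: theta1 = theta2.

    Assumes equal sample sizes and at least one nonzero count in each
    sample (otherwise log(0) would be evaluated).
    """
    if len(sample1) != len(sample2):
        raise ValueError("this sketch assumes equal sample sizes")
    n = len(sample1)
    s1, s2 = sum(sample1), sum(sample2)
    theta1_hat = s1 / n                # assumed unrestricted MLE, sample 1
    theta2_hat = s2 / n                # assumed unrestricted MLE, sample 2
    theta0_hat = (s1 + s2) / (2 * n)   # assumed restricted (pooled) MLE
    loglik_full = (poisson_loglik(s1, n, theta1_hat)
                   + poisson_loglik(s2, n, theta2_hat))
    loglik_null = poisson_loglik(s1 + s2, 2 * n, theta0_hat)
    return 2.0 * (loglik_full - loglik_null)


def chi2_1_upper_tail(x):
    """P(X > x) for X ~ chi-square with 1 df, using X = Z^2 with Z ~ N(0, 1)."""
    return math.erfc(math.sqrt(x / 2.0))


stat = lrt_statistic(left, right)
p_value = chi2_1_upper_tail(stat)
print(f"-2 ln(lambda) = {stat:.4f}, approximate p-value = {p_value:.4f}")
```

The upper-tail probability uses only the standard library because a $\chi^2_1$ variable is the square of a standard normal; with more than one degree of freedom a statistics package would be the natural choice.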
Note: These data represent the number of weeds growing in square-foot plots of ground in my left and right "wooded" areas in my back yard. That is, I have two wooded areas in my back yard; one on the left side of the yard, and one on the right side. Ten plots (i.e., their locations) were randomly selected from each side. The counts correspond to the number of weeds per plot. Assume that the Poisson model holds (it's likely reasonable). Do the left and right sides of my back yard have different weed levels? (I know the answer; the question is whether or not the data support it).

5. Oil-drilling technology is improving every day; however, finding productive wells among prospective sites is not an exact science. In region $i$, let $p$ denote the probability of finding a productive well, and let $Y_i$ denote the number of sites in the region that are drilled to find the first productive well in that region. A large excavation study is conducted in $n$ regions. Conditional on $p$, where $0 < p < 1$, we will model the data $Y_1, Y_2, ..., Y_n$ as an iid sample from a geometric distribution with pmf

\[
f_Y(y; p) =
\begin{cases}
p(1-p)^{y-1}, & y = 1, 2, ... \\
0, & \text{otherwise.}
\end{cases}
\]

Furthermore, because of region-to-region variability, we assume that $p$ is best regarded as a random variable with a beta($\alpha$, $\beta$) prior distribution, where $\alpha$ and $\beta$ are both constants larger than zero.

(a) Geologists know that $p$ is not large; in fact, they have suggested that $p$ is close to 0.10. Pick a beta prior that seems reasonable in this situation (that is, give me appropriate values of $\alpha$ and $\beta$).
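One way to explore part (a) of Question 5 is to compare the mean and standard deviation of a few candidate beta priors. The Python sketch below is only an illustration: the candidate $(\alpha, \beta)$ pairs are arbitrary choices made here, not values suggested by the quiz, and `beta_summary` is a hypothetical helper. It relies on the standard facts that a beta($\alpha$, $\beta$) distribution has mean $\alpha/(\alpha+\beta)$ and variance $\alpha\beta / \{(\alpha+\beta)^2(\alpha+\beta+1)\}$.

```python
import math


def beta_summary(alpha, beta):
    """Return (mean, standard deviation) of a beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    variance = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(variance)


# Illustrative candidates only: each has prior mean 0.10 but a different spread.
candidates = [(1, 9), (2, 18), (10, 90)]

for alpha, beta in candidates:
    mean, sd = beta_summary(alpha, beta)
    print(f"beta({alpha}, {beta}): mean = {mean:.3f}, sd = {sd:.3f}")
```

Holding the ratio $\alpha/(\alpha+\beta)$ fixed at 0.10 while increasing $\alpha + \beta$ keeps the prior centered near the geologists' value and makes it more concentrated, so the choice among such candidates reflects how strongly one trusts that value.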