Comparing Estimating Equations for Gamma and Normal Distributions

An analysis of the maximum likelihood estimating equations for the gamma and normal distributions, examining the similarities and differences between the equations, including their linearity in the data and the presence of quadratic terms. The document also discusses the relationship between the mean and variance of the gamma distribution and its implications for the estimating equation.

ST 762, HOMEWORK 1 EXTRA PROBLEM SOLUTIONS, FALL 2007

1. (a) The loglikelihood is
\[
\log L = \sum_{j=1}^n \{\, Y_j \log f(x_j, \beta) - f(x_j, \beta) - \log Y_j! \,\}.
\]
Taking derivatives with respect to $\beta$ and setting equal to zero gives the estimating equation
\[
\partial/\partial\beta \,\log L = \sum_{j=1}^n \{\, Y_j f_\beta(x_j, \beta)/f(x_j, \beta) - f_\beta(x_j, \beta) \,\}
= \sum_{j=1}^n f^{-1}(x_j, \beta)\{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta) = 0.
\]
Note that, under the Poisson distribution, $\mathrm{var}(Y_j \mid x_j) = f(x_j, \beta)$.

(b) The loglikelihood is
\[
\log L = \sum_{j=1}^n [\, Y_j \log f(x_j, \beta) + (k_j - Y_j) \log\{1 - f(x_j, \beta)\} \,].
\]
Thus, taking derivatives with respect to $\beta$ gives
\[
\partial/\partial\beta \,\log L = \sum_{j=1}^n [\, Y_j f_\beta(x_j, \beta)/f(x_j, \beta) - (k_j - Y_j) f_\beta(x_j, \beta)/\{1 - f(x_j, \beta)\} \,]
= \sum_{j=1}^n [\, k_j f(x_j, \beta)\{1 - f(x_j, \beta)\} \,]^{-1} \{ Y_j - k_j f(x_j, \beta) \}\, k_j f_\beta(x_j, \beta) = 0.
\]
Note that, under the binomial distribution, $\mathrm{var}(Y_j \mid x_j) = k_j f(x_j, \beta)\{1 - f(x_j, \beta)\}$.

(c) Both of these estimating equations are linear in the data $Y_j$. In addition, they both have a specific form, that of the GLS-type equation in (3.2) of the notes. That is, they have the form of a deviation (response $-$ mean) times a gradient and a "weight" equal to the inverse of the variance of the response. This is no accident, as we will see: both of these distributions are members of a special class with this property.

2. (a) The likelihood is
\[
L = \prod_{j=1}^n \frac{ Y_j^{1/\sigma^2 - 1} \exp[\, -Y_j/\{\sigma^2 f(x_j, \beta)\} \,] }{ \Gamma(1/\sigma^2)\, \{\sigma^2 f(x_j, \beta)\}^{1/\sigma^2} },
\]
so that
\[
\log L = \sum_{j=1}^n [\, (1/\sigma^2 - 1) \log Y_j - Y_j/\{\sigma^2 f(x_j, \beta)\} - (1/\sigma^2)\log\{\sigma^2 f(x_j, \beta)\} - \log \Gamma(1/\sigma^2) \,].
\]
Taking derivatives with respect to $\beta$ yields the estimating equation
\[
\partial/\partial\beta \,\log L = \sum_{j=1}^n [\, Y_j f_\beta(x_j, \beta)/\{\sigma f(x_j, \beta)\}^2 - (1/\sigma^2) f_\beta(x_j, \beta)/f(x_j, \beta) \,]
= (1/\sigma^2) \sum_{j=1}^n f^{-2}(x_j, \beta)\{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta) = 0.
\]

(b) Now we have
\[
\log L = -n \log(2\pi)^{1/2} - n \log \sigma - \sum_{j=1}^n \log f(x_j, \beta) - \frac{1}{2\sigma^2} \sum_{j=1}^n \frac{[Y_j - f(x_j, \beta)]^2}{f^2(x_j, \beta)}.
\]
Thus, writing $\lambda(x, \beta) = \log f(x, \beta)$, so that $\lambda_\beta(x, \beta) = f_\beta(x, \beta)/f(x, \beta)$,
\[
\partial/\partial\beta \,\log L = -\sum_{j=1}^n \lambda_\beta(x_j, \beta) + (1/\sigma^2) \sum_{j=1}^n \frac{[Y_j - f(x_j, \beta)]^2}{f^2(x_j, \beta)}\, \lambda_\beta(x_j, \beta)
+ (1/\sigma^2) \sum_{j=1}^n f^{-2}(x_j, \beta)\{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta)
\]
\[
= (1/\sigma^2) \sum_{j=1}^n f^{-2}(x_j, \beta)\{ Y_j - f(x_j, \beta) \} f_\beta(x_j, \beta)
+ \sum_{j=1}^n \left( \frac{[Y_j - f(x_j, \beta)]^2}{\sigma^2 f^2(x_j, \beta)} - 1 \right) \lambda_\beta(x_j, \beta) = 0.
\]

(c) It is straightforward to derive the form of the lognormal density given only the information in the problem, which we do here. If $Z = \log Y$, then the Jacobian of the transformation is $1/Y$, and the density of $Y$ is thus $n(\log Y;\, m, \gamma^2)\, Y^{-1}$, where $n(\cdot\,;\, m, \gamma^2)$ is the normal density with mean $m$ and variance $\gamma^2$. Thus, the desired density is
\[
(2\pi)^{-1/2} (\gamma Y)^{-1} \exp\left\{ -\frac{(\log Y - m)^2}{2\gamma^2} \right\}.
\]
We would like this in terms of the mean and variance of $Y$. If $E(Y) = f$, then using the moment generating function of a normal, we have
\[
E(Y) = E(e^Z) = e^{m + \gamma^2/2} = f.
\]
We also have $E(Y^2) = E(e^{2Z}) = e^{2m + 2\gamma^2}$, so that $\mathrm{var}(Y) = (e^{\gamma^2} - 1)\{E(Y)\}^2$. Thus, $\sigma^2 = e^{\gamma^2} - 1$, and we may deduce that
\[
\gamma^2 = \log(\sigma^2 + 1), \qquad m = \log f - (1/2)\log(\sigma^2 + 1).
\]
Applying this to our problem and ignoring constants, we have
\[
\log L = -\sum_{j=1}^n \log Y_j - (n/2) \log\{\log(\sigma^2 + 1)\}
- \sum_{j=1}^n \frac{ [\log Y_j - \log f(x_j, \beta) + (1/2)\log(\sigma^2 + 1)]^2 }{ 2\log(\sigma^2 + 1) }.
\]
Taking derivatives with respect to $\beta$ and simplifying yields the estimating equation
\[
\sum_{j=1}^n [\log Y_j - \log f(x_j, \beta) + (1/2)\log(\sigma^2 + 1)]\, \frac{f_\beta(x_j, \beta)}{f(x_j, \beta)} = 0.
\]
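As a quick numerical check (not part of the original solutions), the sketch below assumes a hypothetical loglinear mean model $f(x, \beta) = \exp(\beta_0 + \beta_1 x)$, simulates gamma responses with mean $f(x, \beta)$ and variance $\sigma^2 f^2(x, \beta)$ as in 2(a), and solves the GLS-type estimating equation with a general-purpose root finder from numpy/scipy.

```python
# Minimal numerical sketch (not from the solutions) of the GLS-type estimating equation
# in 2(a): sum_j f^{-2}(x_j, beta) {Y_j - f(x_j, beta)} f_beta(x_j, beta) = 0.
# The loglinear mean model f(x, beta) = exp(beta_0 + beta_1 x) is a hypothetical choice
# made purely for illustration.
import numpy as np
from scipy import optimize

rng = np.random.default_rng(0)
n, sigma2 = 200, 0.25
beta_true = np.array([1.0, -0.5])
x = rng.uniform(0.0, 3.0, size=n)

def f(x, beta):
    # Mean function.
    return np.exp(beta[0] + beta[1] * x)

def f_beta(x, beta):
    # Gradient of f with respect to beta; rows index observations, columns index parameters.
    fx = f(x, beta)
    return np.column_stack([fx, fx * x])

# Gamma responses with mean f and variance sigma2 * f^2 (shape 1/sigma2, scale sigma2 * f).
Y = rng.gamma(shape=1.0 / sigma2, scale=sigma2 * f(x, beta_true))

def estimating_equation(beta):
    # GLS form: gradient times residual, weighted by the inverse variance
    # (the constant factor 1/sigma2 does not affect the root and is omitted).
    fx = f(x, beta)
    return f_beta(x, beta).T @ ((Y - fx) / fx**2)

sol = optimize.root(estimating_equation, x0=np.zeros(2))
print("solution of the estimating equation:", sol.x)  # should be near beta_true
```

The same code solves the Poisson and binomial equations of Problem 1 after swapping in the appropriate inverse-variance weight, which is the point of 1(c).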
(d) The equation in (a) is of exactly the same "GLS" form as those in Problem 1(a) and (b) -- linear in the data with "weighting" by the inverse of the variance under the distributional assumption.

[...]

\[
= \sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j)\{ Y_j - f(x_j, \beta^*) \} f_\beta(x_j, \beta^*)
- \sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j) f_\beta(x_j, \beta^*) f_\beta^T(x_j, \beta^*)(\beta - \beta^*)
\]
\[
+ \sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j)\{ Y_j - f(x_j, \beta^*) \} f_{\beta\beta}(x_j, \beta^*)(\beta - \beta^*)
+ \sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j) f_\beta^T(x_j, \beta^*)(\beta - \beta^*) f_{\beta\beta}(x_j, \beta^*)(\beta - \beta^*)
\]
\[
+ \sum_{j=1}^n G^T(\beta^*, \theta, x_j)(\beta - \beta^*) \times \text{terms involving } \{ Y_j - f(x_j, \beta^*) \},\ (\beta - \beta^*).
\]
The first two terms involve the data and are linear in $(\beta - \beta^*)$. The third depends on the product of $\{ Y_j - f(x_j, \beta^*) \}$ and $(\beta - \beta^*)$, which is expected to be "smaller" for $\beta^*$ close to $\beta$ than the first two terms. The fourth term is quadratic in $(\beta - \beta^*)$, so should also be "smaller" than the first two. The remaining terms involve at least products of $\{ Y_j - f(x_j, \beta^*) \}$ and $(\beta - \beta^*)$, so are also "smaller." Thus, as in the argument in Section 3.2, we disregard these terms. Note that the presence of $\beta$ in the "weights" really doesn't affect the form of the linear approximation at $\beta^*$. We are left with
\[
\sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j)\{ Y_j - f(x_j, \beta^*) \} f_\beta(x_j, \beta^*)
\approx \sum_{j=1}^n g^{-2}(\beta^*, \theta, x_j) f_\beta(x_j, \beta^*) f_\beta^T(x_j, \beta^*)(\beta - \beta^*),
\]
which may be rewritten in obvious matrix notation as
\[
\{ X^T(\beta^*) W(\beta^*) X(\beta^*) \}(\beta - \beta^*) \approx X^T(\beta^*) W(\beta^*)\{ Y - f(\beta^*) \},
\]
yielding the required updating scheme.

6. (a) If you plot the data, you will notice that there are two distinct phases of decay. It turns out that my algorithm uses the final 6 observations to fit the "second phase"; the remaining 5 observations are used to fit the "first phase." The method I used is based on the hint. I start with the last 3 observations (the farthest out in time) and fit a straight line to them by simple linear regression, using $\log Y$ as the response. This is based on the fact that, in this region, $Y \approx \beta_3 e^{-\beta_4 x}$, or $\log Y \approx \log \beta_3 - \beta_4 x = \beta_3^* + \beta_4^* x$, say. Thus, I obtain estimates for $\beta_3^*$ and $\beta_4^*$. I then construct a 90% prediction interval for $\log Y$ at the value $x_0$ corresponding to the time of the next observation (backward in time). For this, I use the standard prediction interval formula for simple linear regression based on the fit with the $n = 3$ final values. That is, the estimated standard deviation of the prediction error at $x_0$ is
\[
SD = \hat\sigma \left\{ 1 + n^{-1} + \frac{(x_0 - \bar x)^2}{\sum_{j=1}^n (x_j - \bar x)^2} \right\}^{1/2},
\]
where $\bar x$ is the mean of the $n$ $x$ values involved in the fit and $\hat\sigma^2$ is the usual estimator for the constant variance in simple linear regression based on the $n$ observations. I obtain the 90% prediction interval as
\[
(\hat\beta_3^* + \hat\beta_4^* x_0) \pm t_{0.95}\, SD,
\]
where $t_{0.95}$ is the $t$ critical value with $(n - 2)$ degrees of freedom and area 0.95 to its left. I then check whether the value of $\log Y$ corresponding to $x_0$ is contained in the interval. If yes, I repeat this process with $n = 4$, including this observation in the simple linear regression fit. If not, I stop and declare the current estimates $\exp(\hat\beta_3^*)$ and $-\hat\beta_4^*$ to be starting values for $\beta_3$ and $\beta_4$.

Now, I note that the model implies that $Y - \beta_3 e^{-\beta_4 x} \approx \beta_1 e^{-\beta_2 x}$. This suggests that $\log(Y - \beta_3 e^{-\beta_4 x}) \approx \log \beta_1 - \beta_2 x = \beta_1^* + \beta_2^* x$. I thus form "residuals" $r_j = Y_j - e^{\hat\beta_3^* + \hat\beta_4^* x_j}$ for all observations not included in the final second-phase fit (thus thought to be in the first phase) and regress $\log(r_j)$ on $x_j$ in the first phase to obtain estimates for $\beta_1^*$ and $\beta_2^*$, and hence for $\beta_1$ and $\beta_2$. The starting values are thus $(e^{\hat\beta_1^*}, -\hat\beta_2^*, e^{\hat\beta_3^*}, -\hat\beta_4^*)$. This is implemented in the program.
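A rough prototype of this two-phase procedure is sketched below. It is not the author's program; the data arrays `x` and `y` (sorted in increasing time, with positive responses and positive first-phase "residuals" so the logs are defined) are assumptions made only for illustration, and numpy/scipy are used for the simple linear regressions and the $t$ critical value.

```python
# Rough sketch (not the author's program) of the two-phase starting-value strategy in 6(a).
# Assumes hypothetical arrays x, y sorted in increasing time, y > 0, and positive
# first-phase "residuals".
import numpy as np
from scipy import stats

def biexp_start_values(x, y, level=0.90):
    x, y = np.asarray(x, float), np.asarray(y, float)
    n_tot = len(y)
    n = 3                                   # begin with the last 3 observations
    while n < n_tot:
        xs, zs = x[-n:], np.log(y[-n:])
        b4s, b3s = np.polyfit(xs, zs, 1)    # slope (beta4*) and intercept (beta3*)
        resid = zs - (b3s + b4s * xs)
        sigma2_hat = np.sum(resid**2) / (n - 2)
        x0, z0 = x[-(n + 1)], np.log(y[-(n + 1)])
        sd = np.sqrt(sigma2_hat * (1 + 1 / n
                                   + (x0 - xs.mean())**2 / np.sum((xs - xs.mean())**2)))
        tcrit = stats.t.ppf(0.5 + level / 2, df=n - 2)
        if abs(z0 - (b3s + b4s * x0)) <= tcrit * sd:
            n += 1                          # point consistent with the second phase; refit
        else:
            break                           # next point falls outside the prediction interval
    beta3, beta4 = np.exp(b3s), -b4s        # second-phase starting values
    # First phase: regress log "residuals" on x for the remaining observations.
    x1, y1 = x[:-n], y[:-n]
    r = y1 - np.exp(b3s + b4s * x1)
    b2s, b1s = np.polyfit(x1, np.log(r), 1)
    return np.exp(b1s), -b2s, beta3, beta4
```

The returned tuple plays the role of the starting values $(e^{\hat\beta_1^*}, -\hat\beta_2^*, e^{\hat\beta_3^*}, -\hat\beta_4^*)$ described above.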
(b) For these data, I obtained start values $(2.608, 2.743, 0.309, 0.310)$. Using these, I obtained the IRWLS estimate for $\beta$ within 7 iterations. Comparing these values to the final estimates shows that this ad hoc method did a pretty good job of producing starting values that are "in the ballpark."
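For reference, the updating scheme derived in the linearization argument above, $\{ X^T(\beta^*) W(\beta^*) X(\beta^*) \}(\beta - \beta^*) \approx X^T(\beta^*) W(\beta^*)\{ Y - f(\beta^*) \}$, can be iterated as in the generic sketch below. This is not the author's implementation; the mean function `f`, its gradient `f_beta`, and the standard deviation function `g` that defines the weights are placeholders to be supplied by the user.

```python
# Generic sketch of the IRWLS update implied by the linearization above:
#   {X^T(b*) W(b*) X(b*)} (b - b*) = X^T(b*) W(b*) {Y - f(b*)}, iterated to convergence.
# f, f_beta, and g are user-supplied placeholders; this is not the author's program.
import numpy as np

def irwls(y, x, f, f_beta, g, beta0, theta=None, max_iter=50, tol=1e-8):
    beta = np.asarray(beta0, dtype=float)
    for it in range(max_iter):
        resid = y - f(x, beta)              # Y - f(beta*)
        X = f_beta(x, beta)                 # n x p gradient matrix X(beta*)
        w = g(beta, theta, x) ** (-2)       # weights W(beta*): inverse variances
        step = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * resid))
        beta = beta + step                  # new iterate beta* + (beta - beta*)
        if np.max(np.abs(step)) < tol:
            break
    return beta, it + 1
```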