REVIEW OF MAXIMUM LIKELIHOOD ESTIMATION

[1] Maximum Likelihood Estimator

(1) Cases in which θ (the unknown parameter) is a scalar.

Notational Clarification:
• From now on, we denote the true value of θ as θo.
• Then, view θ as a variable.

Definition: (Likelihood function)
• Let {x1, ... , xT} be a sample from a population.
• It does not have to be a random sample.
• xt is a scalar.
• Let f(x1, x2, ... , xT, θo) be the joint density function of x1, ... , xT.
• The functional form of f is known, but not θo.
• Then, LT(θ) ≡ f(x1, ... , xT, θ) is called the "likelihood function".
• LT(θ) is a function of θ given x1, ... , xT.

Definition: (log-likelihood function)
lT(θ) = ln[f(x1, ... , xT, θ)].

Example:
• {x1, ... , xT}: a random sample from a population distributed with f(x, θo).
• f(x1, ... , xT, θo) = Π_{t=1}^T f(xt, θo).
→ LT(θ) = f(x1, ... , xT, θ) = Π_{t=1}^T f(xt, θ).
→ lT(θ) = ln[Π_{t=1}^T f(xt, θ)] = Σt ln f(xt, θ).

Definition: (Maximum Likelihood Estimator (MLE))
The MLE θ̂_MLE maximizes lT(θ) given the data points x1, ... , xT.

Example:
• {x1, ... , xT} is a random sample from a population following a Poisson distribution [i.e., f(x, θ) = e^(-θ)θ^x/x! (suppressing the subscript "o" on θ)].
• Note that E(x) = var(x) = θo for the Poisson distribution.
• lT(θ) = Σt ln[f(xt, θ)] = -Tθ + ln(θ) Σt xt - Σt ln(xt!).
• FOC of max.: ∂lT/∂θ = -T + (1/θ) Σt xt = 0.
• Solving this, θ̂_MLE = (Σt xt)/T = x̄.

(3) Extension to Conditional Density

Definition:
• Conditional density of yt: f(yt | xt, θo), θ = [θ1, θ2, ... , θp]′.
• LT(θ) = Π_{t=1}^T f(yt | xt, θ).
• lT(θ) = Σ_{t=1}^T ln f(yt | xt, θ).

Example:
• Assume that (yt, xt′) is iid and yt | xt ~ N(xt′βo, vo).
• f(yt | xt, β, v) = (1/√(2πv)) exp( -(yt - xt′β)²/(2v) ).
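The Poisson example above can be checked numerically: maximizing lT(θ) over θ should return the sample mean x̄. A minimal sketch (the data vector below is made up for illustration; ln(xt!) is computed via the log-gamma function):

```python
import math

# Hypothetical Poisson sample (assumed data, for illustration only)
x = [2, 3, 1, 4, 2, 0, 3, 2]
T = len(x)

def log_lik(theta):
    # l_T(theta) = -T*theta + ln(theta)*Sum_t x_t - Sum_t ln(x_t!)
    # ln(x!) = lgamma(x + 1)
    return (-T * theta + math.log(theta) * sum(x)
            - sum(math.lgamma(xi + 1) for xi in x))

# Crude grid search over theta in (0, 10]; the maximizer should be
# close to the sample mean, matching the closed-form FOC solution.
grid = [0.01 * k for k in range(1, 1001)]
theta_hat = max(grid, key=log_lik)
xbar = sum(x) / T
```

The grid search only locates the maximizer up to the grid spacing (0.01 here), but it agrees with the analytical result θ̂_MLE = x̄ to that resolution.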
• lT(β, v) = Σt ln f(yt | xt, β, v)
  = -(T/2) ln(2π) - (T/2) ln(v) - (1/(2v)) Σt (yt - xt′β)²
  = -(T/2) ln(2π) - (T/2) ln(v) - (1/(2v)) (y - Xβ)′(y - Xβ).
• This is the log-likelihood function of y.
• FOC:
(i) ∂lT(β, v)/∂β = -(1/(2v))[-2X′y + 2X′Xβ] = 0k×1.
(ii) ∂lT(β, v)/∂v = -(T/(2v)) + (1/(2v²))(y - Xβ)′(y - Xβ) = 0.
• From (i), X′y - X′Xβ = 0k×1 → β̂_MLE = (X′X)⁻¹X′y = β̂ (the OLS estimator).
From (ii), v̂_MLE = SSE/T.
• Thus, we can conclude that β̂ and s² = SSE/(T - k) are asymptotically efficient.

[2] Large Sample Properties of the ML Estimator

Definition:
1) Let g(θ) = g(θ1, ... , θp) be a scalar function of θ, and let gj = ∂g/∂θj. Then,
∂g/∂θ = (g1, g2, ... , gp)′, a p×1 vector (the gradient).
2) Let w(θ) = (w1(θ), ... , wm(θ))′ be an m×1 vector of functions of θ, and let wij = ∂wi(θ)/∂θj. Then,
∂w(θ)/∂θ′ = [wij], the m×p matrix whose (i, j) element is wij (the Jacobian).
3) Let g(θ) be a scalar function of θ with gij = ∂²g(θ)/∂θi∂θj. Then,
∂²g(θ)/∂θ∂θ′ = [gij], a p×p matrix.
→ Called the Hessian matrix of g(θ).

Example 1: Let g(θ) = θ1² + θ2² + θ1θ2. Find ∂g(θ)/∂θ.
∂g(θ)/∂θ = (2θ1 + θ2, 2θ2 + θ1)′.

Example 2: Let w(θ) = (θ1 + θ2², θ1² + θ2)′. Then,
∂w(θ)/∂θ′ = [ 1      2θ2
              2θ1    1   ].

Example 3: Let g(θ) = θ1² + θ2² + θ1θ2. Find the Hessian matrix of g(θ).
∂²g(θ)/∂θ∂θ′ = [ 2  1
                 1  2 ].

Some useful results:
1) c′: 1×p, θ: p×1 (c′θ is a scalar) → ∂(c′θ)/∂θ = c; ∂(c′θ)/∂θ′ = c′.
2) R: m×p, θ: p×1 (Rθ is m×1) → ∂(Rθ)/∂θ′ = R.
3) A: p×p symmetric, θ: p×1 (θ′Aθ is a scalar) →
∂(θ′Aθ)/∂θ = 2Aθ; ∂(θ′Aθ)/∂θ′ = 2θ′A; ∂²(θ′Aθ)/∂θ∂θ′ = 2A.

Example: For {x1, ... , xT} a random sample from N(μo, vo), with θ = (μ, v)′, the negative Hessian of lT(θ) is

-HT(θ) = [ T/v              Σt(xt - μ)/v²
           Σt(xt - μ)/v²    -T/(2v²) + Σt(xt - μ)²/v³ ].

Evaluated at the MLE (where Σt(xt - μ̂_ML) = 0 and Σt(xt - μ̂_ML)² = T·v̂_ML),

-HT(θ̂_ML) = [ T/v̂_ML    0
              0          T/(2v̂_ML²) ].

Hence,

θ̂_ML = (μ̂_ML, v̂_ML)′ ≈ N( (μo, vo)′ , [ v̂_ML/T    0
                                          0          2v̂_ML²/T ] ).
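The regression result above (β̂_MLE equals OLS, and v̂_MLE = SSE/T rather than SSE/(T - k)) can be illustrated with a small sketch. The data are made up; with k = 2 (intercept plus one regressor) the normal equations X′Xβ = X′y have a simple closed form:

```python
# Assumed toy data: y_t = b0 + b1*x_t + e_t, e_t ~ N(0, v)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
T = len(x)

# Solve the normal equations X'X b = X'y for k = 2 in closed form
sx, sy = sum(x), sum(y)
sxx = sum(xi * xi for xi in x)
sxy = sum(xi * yi for xi, yi in zip(x, y))
det = T * sxx - sx * sx
b1 = (T * sxy - sx * sy) / det   # slope: MLE coincides with OLS
b0 = (sy - b1 * sx) / T          # intercept

# Variance estimates: the MLE divides SSE by T, OLS s^2 by T - k
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
v_mle = sse / T        # biased in finite samples, asymptotically efficient
s2 = sse / (T - 2)     # the unbiased OLS estimator (k = 2 here)
```

For this data set the closed-form solution gives b1 = 1.96 and b0 = 0.14, and v_mle is strictly smaller than s2, as the divisors T and T - k imply.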
[3] Testing Hypotheses Based on MLE

General form of hypotheses:
• Let w(θ) = [w1(θ), w2(θ), ... , wm(θ)]′, where wj(θ) = wj(θ1, θ2, ... , θp) is a function of θ1, ... , θp.
• Ho: The true θ (θo) satisfies the m restrictions w(θ) = 0m×1 (m ≤ p).

Definition: (Restricted MLE)
Let θ̄ be the restricted ML estimator, which maximizes lT(θ) s.t. w(θ) = 0.

Wald Test:
WT = w(θ̂)′[W(θ̂) Cov(θ̂) W(θ̂)′]⁻¹ w(θ̂), where W(θ) = ∂w(θ)/∂θ′.
If θ̂ is the (unrestricted) ML estimator,
WT = w(θ̂)′[W(θ̂){-HT(θ̂)}⁻¹W(θ̂)′]⁻¹ w(θ̂).
Note: WT can be computed with any consistent estimator θ̂ and any consistent estimate of Cov(θ̂).

Likelihood Ratio (LR) Test:
LRT = 2[lT(θ̂) - lT(θ̄)].

Lagrangean Multiplier (LM) Test:
Define the score sT(θ) = ∂lT(θ)/∂θ. Then,
LMT = sT(θ̄)′[-HT(θ̄)]⁻¹ sT(θ̄).

Theorem:
Under Ho: w(θ) = 0, WT, LRT, LMT →d χ²(m).

Implication:
• Given a significance level (α), find a critical value c from the χ²(m) table.
• Usually, α = 0.05 or α = 0.01.
• If WT > c, reject Ho. Otherwise, do not reject Ho.

Comments:
1) Wald needs only θ̂; LR needs both θ̂ and θ̄; LM needs only θ̄.
2) In general, WT ≥ LRT ≥ LMT.
3) WT is not invariant to how the restrictions are written. That is, WT for Ho: θ1 = θ2 may not equal WT for Ho: θ1/θ2 = 1.

Example:
(1) {x1, ... , xT}: RS from N(μo, vo) with vo known. So θ = μ. Ho: μ = 0.
• w(μ) = μ.
• lT(μ) = -(T/2)ln(2π) - (T/2)ln(vo) - (1/(2vo)) Σt(xt - μ)².
• sT(μ) = (1/vo) Σt(xt - μ).
• -HT(μ) = T/vo.
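The example above is a useful check on all three test statistics: because w(μ) = μ is linear and vo is known, WT, LRT, and LMT coincide exactly (all equal T·x̄²/vo), not just asymptotically. A minimal sketch on simulated data (the sample size, seed, and parameter values are arbitrary choices):

```python
import math
import random

# RS from N(mu_o, v_o) with v_o known; Ho: mu = 0 (assumed setup from the example)
random.seed(0)
v_o = 4.0
x = [random.gauss(0.5, math.sqrt(v_o)) for _ in range(50)]
T = len(x)
xbar = sum(x) / T       # unrestricted MLE of mu; the restricted MLE is mu = 0

def l(mu):
    # l_T(mu) = -(T/2)ln(2*pi) - (T/2)ln(v_o) - (1/(2 v_o)) Sum_t (x_t - mu)^2
    return (-(T / 2) * math.log(2 * math.pi) - (T / 2) * math.log(v_o)
            - sum((xi - mu) ** 2 for xi in x) / (2 * v_o))

def s(mu):
    # score s_T(mu) = (1/v_o) Sum_t (x_t - mu)
    return sum(xi - mu for xi in x) / v_o

W  = xbar ** 2 / (v_o / T)    # Wald: w(mu) = mu, Cov(mu_hat) = v_o / T
LR = 2 * (l(xbar) - l(0.0))   # likelihood ratio
LM = s(0.0) ** 2 / (T / v_o)  # LM: score at mu = 0, -H_T(mu) = T / v_o
```

With a nonlinear restriction or an estimated variance the three statistics would generally differ in finite samples, which is exactly the point of Comments 2) and 3).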