Logistic Regression: Inference and Goodness of Fit

Methods for testing hypotheses and checking goodness of fit in logistic regression. Topics include the likelihood ratio test, confidence intervals for logit(π(x)), Pearson's chi-square and G² statistics, and the Hosmer-Lemeshow test. The document also covers logit models with categorical predictors and the Cochran-Armitage test for trend.

Chapter 5: Logistic Regression (LR), Lecture 2

Inference for Logistic Regression

To test $H_0: \beta = 0$, the hypothesis of independence of $Y$ and $X$, one uses large-sample results based on the Wald, likelihood ratio, or score test to make inferences about $\beta$ and $\pi(x)$. The likelihood ratio test is preferred over the Wald test because it uses more information (the log-likelihood under $H_0$ as well as at $\hat{\beta}$).

One can also draw inferences about $\pi(x)$ at $x = x_0$. To find a $100(1-\alpha)\%$ confidence interval for $\pi(x_0)$, proceed as follows:
1) Obtain an estimate of $\mathrm{Var}(\widehat{\mathrm{logit}}(\pi(x_0)))$, available from SAS.
2) Obtain the CI for $\mathrm{logit}(\pi(x_0))$; let $L$ and $U$ be its limits.
3) The CI for $\pi(x_0)$ is $\left( \frac{e^L}{1+e^L}, \frac{e^U}{1+e^U} \right)$.
One can also obtain a confidence band around the predicted curve for $\pi(x)$ (a sketch of this back-transformation appears at the end of this section).

For grouped data, one can instead use the simple binomial proportion to get a CI for $\pi_i = \pi(x_i)$ at any given $x_i$. For example, if the total number of successes at $x_i$ is $y_i$ out of $n_i$ trials, then $\hat{\pi}_i = y_i/n_i$, and a $100(1-\alpha)\%$ CI for $\pi_i$ is $\hat{\pi}_i \pm z_{\alpha/2}\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)/n_i}$. This interval is wider than the interval based on the logistic model, provided that the model holds.

Checking Goodness of Fit

Methods to check the goodness of fit of the model:
1) Compare the model with a more complex model involving nonlinear and interaction terms using a likelihood ratio test.
2) For grouped data (especially with categorical predictors), compute the estimated expected numbers of successes and failures at each level of $x$, and compute Pearson's chi-square ($X^2$) or the $G^2$ statistic to compare observed and expected frequencies (sketched below). With a fixed number of levels of $x$, as the fitted frequencies increase, $X^2$ and $G^2$ have asymptotic chi-square distributions with df = (number of levels of $x$) $-$ (number of parameters estimated).
3) Tests based on groupings of the $X$ variable: if $x$ is quantitative, one can group the data into intervals of equal length based on $x$ and obtain observed and expected frequencies for each interval. The total expected number of successes in a group is the sum of $\hat{\pi}(x)$ over all $x$ in that group; similarly, the expected number of failures is the sum of $1 - \hat{\pi}(x)$. The Pearson or $G^2$ statistic can then be calculated, with df = (total number of groups) $-$ (number of parameters estimated). This procedure becomes impractical as the number of predictors increases.
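As an illustration of the three-step interval for $\pi(x_0)$ described above, here is a minimal Python sketch. The text works with SAS output; the simulated data, variable names, and use of the statsmodels package below are assumptions made only for illustration.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

# Hypothetical dose-response data: five dose levels, 40 binary responses each.
rng = np.random.default_rng(0)
x = np.repeat([1.0, 2.0, 3.0, 4.0, 5.0], 40)
true_p = 1 / (1 + np.exp(-(-3.0 + 1.0 * x)))
y = rng.binomial(1, true_p)

# Fit logit(pi(x)) = alpha + beta * x by maximum likelihood.
X = sm.add_constant(x)
fit = sm.Logit(y, X).fit(disp=0)

# Step 1: estimate Var(logit-hat(pi(x0))) = x0' Cov(alpha-hat, beta-hat) x0.
x0 = np.array([1.0, 2.5])                     # (intercept, x = 2.5)
cov = np.asarray(fit.cov_params())
eta_hat = x0 @ fit.params                     # estimated logit(pi(x0))
se_eta = np.sqrt(x0 @ cov @ x0)

# Step 2: Wald CI (L, U) for logit(pi(x0)).
z = norm.ppf(0.975)                           # 95% interval
L, U = eta_hat - z * se_eta, eta_hat + z * se_eta

# Step 3: back-transform the endpoints to get the CI for pi(x0).
ci = (np.exp(L) / (1 + np.exp(L)), np.exp(U) / (1 + np.exp(U)))
print("pi(x0) estimate:", np.exp(eta_hat) / (1 + np.exp(eta_hat)))
print("95% CI for pi(x0):", ci)
```

For grouped data, this model-based interval can be compared with the simple binomial interval $\hat{\pi}_i \pm z_{\alpha/2}\sqrt{\hat{\pi}_i(1-\hat{\pi}_i)/n_i}$, which, as noted above, is wider when the logistic model holds.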
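The grouped-data goodness-of-fit check in item 2) can be sketched in the same way; again, the counts and the statsmodels calls are assumptions for illustration, not the document's own code. At each level of $x$ the observed successes and failures are compared with the fitted expected counts $n_i\hat{\pi}(x_i)$ and $n_i(1-\hat{\pi}(x_i))$ via $X^2$ and $G^2$.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Hypothetical grouped data: at each level x_i there are n_i trials and y_i successes.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = np.array([40, 40, 40, 40, 40])
y = np.array([4, 9, 18, 28, 35])

# Fit the logit model logit(pi_i) = alpha + beta * x_i to the grouped counts.
X = sm.add_constant(x)
fit = sm.GLM(np.column_stack([y, n - y]), X,
             family=sm.families.Binomial()).fit()
pi_hat = fit.fittedvalues                     # fitted success probabilities

# Observed and expected successes/failures at each level of x.
obs = np.column_stack([y, n - y])
exp_ = np.column_stack([n * pi_hat, n * (1 - pi_hat)])

# Pearson X^2 and G^2 comparing observed with fitted frequencies.
X2 = np.sum((obs - exp_) ** 2 / exp_)
G2 = 2 * np.sum(obs * np.log(obs / exp_))     # assumes all observed counts > 0

# df = (# levels of x) - (# parameters estimated) = 5 - 2 = 3.
df = len(x) - 2
print("X^2 =", X2, "p =", chi2.sf(X2, df))
print("G^2 =", G2, "p =", chi2.sf(G2, df))
```

For a grouped binomial fit like this, the $G^2$ value should coincide with the model's reported deviance and $X^2$ with its Pearson chi-square.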
Logit Models with Categorical Predictors

Case 2: When the model has no effects, that is, when $\beta_1 = \beta_2 = \cdots = \beta_I$, then $\pi_1 = \pi_2 = \cdots = \pi_I$; in that case $Y$ and $X$ are independent.

Dummy Variable Representation: The above model can also be written in terms of $I-1$ dummy variables as
$$\mathrm{logit}(\pi_i) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{I-1} x_{I-1},$$
where $x_i = 1$ if an observation is in row $i$ and zero otherwise. This corresponds to the constraint $\beta_I = 0$. One could instead use the constraint $\beta_1 = 0$, which gives a model in terms of the dummy variables $x_2$ through $x_I$ with $x_1 = 0$, or the constraint $\sum_i \beta_i = 0$. Regardless of the constraint on the $\beta$'s, the values of $\{\hat{\alpha} + \hat{\beta}_i\}$ are the same, and hence the $\{\hat{\pi}_i\}$ are the same. For any pair of rows $(a, b)$ of $X$, $\hat{\beta}_a - \hat{\beta}_b$ is the same and represents the estimated log odds ratio; $e^{\hat{\beta}_a - \hat{\beta}_b}$ is the estimated odds ratio of row $a$ to row $b$ of $X$.

Linear Logit Model for an I x 2 Table with Ordered X Categories

Suppose the categories of $X$ are ordinal, and scores $\{x_1, x_2, \ldots, x_I\}$ describe distances between the categories. When there is a monotone effect of $X$ on $Y$, one fits the linear logit model
$$\mathrm{logit}(\pi_i) = \alpha + \beta x_i.$$
Independence corresponds to $\beta = 0$. This is a more parsimonious model.

Cochran-Armitage Test of Trend

Consider an $I \times 2$ table with $I$ independent binomials, $Y_i \sim \mathrm{bin}(n_i, \pi_i)$. Cochran and Armitage fitted the linear probability model
$$\pi_i = \alpha + \beta x_i$$
by least squares; the hypothesis of independence is $H_0: \beta = 0$. Define $\bar{x} = \sum_i n_i x_i / n$, $p_i = y_i/n_i$, and $p = \sum_i y_i / n$. The least-squares fit is $\hat{\pi}_i = p + b(x_i - \bar{x})$ with
$$b = \frac{\sum_i n_i (p_i - p)(x_i - \bar{x})}{\sum_i n_i (x_i - \bar{x})^2}.$$
Let $X^2(I)$ denote Pearson's statistic for testing independence. Then
$$X^2(I) = \frac{1}{p(1-p)} \sum_i n_i (p_i - p)^2 = z^2 + X^2(L),$$
with
$$X^2(L) = \frac{1}{p(1-p)} \sum_i n_i (p_i - \hat{\pi}_i)^2 \quad \text{and} \quad z^2 = \frac{b^2}{p(1-p)} \sum_i n_i (x_i - \bar{x})^2.$$
When the linear probability model holds, $X^2(L)$ has an asymptotic chi-square distribution with $I-2$ df and tests the fit of the model. The statistic $z^2$ tests $H_0: \beta = 0$ with 1 df; it is the test for linear trend in the proportions and is called the Cochran-Armitage test for trend. This test is equivalent to the score test of $H_0: \beta = 0$ in the linear logit model. The statistic also relates to the statistic $M^2$ used earlier to test for trend in an $I \times J$ table.
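Here is a minimal numerical sketch of the decomposition above; the $I \times 2$ table and the use of Python/NumPy are assumptions for illustration. It computes $b$, $X^2(I)$, $z^2$, and $X^2(L)$ directly from the formulas and checks that $X^2(I) = z^2 + X^2(L)$.

```python
import numpy as np
from scipy.stats import chi2

# Hypothetical I x 2 table: scores x_i, trials n_i, successes y_i in each row.
x = np.array([1.0, 2.0, 3.0, 4.0])
n = np.array([50, 50, 50, 50])
y = np.array([8, 14, 21, 30])

p_i = y / n                          # row proportions p_i = y_i / n_i
p = y.sum() / n.sum()                # overall proportion
x_bar = np.sum(n * x) / n.sum()      # weighted mean score

# Least-squares slope of the linear probability model pi_i = alpha + beta * x_i.
b = np.sum(n * (p_i - p) * (x - x_bar)) / np.sum(n * (x - x_bar) ** 2)
pi_hat = p + b * (x - x_bar)         # fitted proportions

# Pearson statistic for independence and its decomposition X^2(I) = z^2 + X^2(L).
X2_I = np.sum(n * (p_i - p) ** 2) / (p * (1 - p))
z2   = b ** 2 * np.sum(n * (x - x_bar) ** 2) / (p * (1 - p))   # trend, 1 df
X2_L = np.sum(n * (p_i - pi_hat) ** 2) / (p * (1 - p))         # lack of fit, I-2 df

I = len(x)
print("X^2(I) =", X2_I, "  z^2 + X^2(L) =", z2 + X2_L)
print("Cochran-Armitage trend test: z^2 =", z2, "p =", chi2.sf(z2, 1))
print("Linear model fit: X^2(L) =", X2_L, "p =", chi2.sf(X2_L, I - 2))
```

A small $z^2$ p-value indicates a linear trend in the proportions, while a small $X^2(L)$ p-value indicates lack of fit of the linear probability model.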