Regression II: Analysis and Diagnostics - Prof. Michael Hudgens, Study notes of Data Analysis & Statistical Methods

A portion of lecture notes from a Regression II course, covering ANOVA, the matrix formulation, the two-sample t-test, diagnostics, and measurement error. It includes formulas, examples, and results from statistical analyses.

Typology: Study notes

Pre 2010

Uploaded on 03/16/2009

koofers-user-8jv

Regression II
Bios 662
Michael G. Hudgens, Ph.D.
mhudgens@bios.unc.edu
http://www.bios.unc.edu/~mhudgens
2007-10-08 13:11

Outline
• ANOVA
• Matrix formulation
• Two-sample t-test
• Diagnostics
• Measurement error

Analysis of Variance
• From Cochran’s theorem: if $\beta = 0$, then $SSR/\sigma^2 \sim \chi^2_1$ and $SSE/\sigma^2 \sim \chi^2_{N-2}$, and the two are independent (cf. Neter et al., 1996, p. 76)
• Thus $t^2 = \frac{SSR/1}{SSE/(N-2)} \sim F_{1,N-2}$
• For $H_0: \beta = 0$ vs $H_A: \beta \neq 0$, we can use F with rejection region $C_\alpha = \{F : F > F_{1-\alpha;\,1,N-2}\}$
• For a two-sided alternative, the F and t tests are equivalent
• For a one-sided alternative, use t
• ANOVA table:

    Source       df      SS     MS            F
    Regression   1       SSR    SSR           MSR/MSE
    Residual     N - 2   SSE    SSE/(N - 2)
    Total        N - 1   SST

Matrix Formulation
• Therefore $\hat\beta = (X'X)^{-1}X'Y$
• One can also show
  $SST = Y'Y - \frac{1}{N}Y'JY$
  $SSR = \hat\beta'X'Y - \frac{1}{N}Y'JY$
  $SSE = Y'Y - \hat\beta'X'Y$
  where $J$ is an $N \times N$ matrix of 1’s

Linear Regression and 2 Sample t-test
• Define $X = 1$ if calcium, $X = 0$ if placebo
• X is called an indicator or dummy variable
• Model: $Y = \alpha + \beta X + \epsilon$
• Suppose we have 2 groups of observations: $Y_{1i}$ for $i = 1, \ldots, n_1$ and $Y_{2i}$ for $i = 1, \ldots, n_2$, with $N = n_1 + n_2$
• Recall the test statistic
  $t = \frac{\bar Y_1 - \bar Y_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{N - 2}$
• Therefore
  $t = \frac{\hat\beta}{\sqrt{s^2_{Y\cdot X}/\sum_i (X_i - \bar X)^2}} = \frac{\bar Y_1 - \bar Y_2}{s_p\sqrt{N/(n_1 n_2)}}$
• Example: Body fat in Native American children
• Percent body fat (PBF) measured by bioelectric impedance and skinfolds
• Two tribes: Apache (mountains) and O’Odham (desert)
• Question: Is the mean PBF the same in Apache and O’Odham children?
• Samples: O’Odham (n = 63); Apache (n = 35)
• Two-sample t-test results:

    Two-sample t test with equal variances
      1: Number of obs = 35
      2: Number of obs = 63
    ------------------------------------------------------------------------------
    Variable |      Mean   Std. Err.         t   P>|t|    [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      Apache |  32.77771   1.163163    28.1798  0.0000    30.41388    35.14154
     O’Odham |  37.92245    1.09047    34.7762  0.0000    35.74263    40.10227
    ---------+--------------------------------------------------------------------
        diff | -5.144741   1.701678   -3.02333  0.0032   -8.522545   -1.766937
    ------------------------------------------------------------------------------
    Degrees of freedom: 96

Diagnostics
• Assumptions for linear regression:
  1. The relation between X and Y is linear
  2. $(Y_i, X_i) \perp (Y_j, X_j)$ for all $i \neq j$
  3. The X’s are fixed constants
  4. $\epsilon_i \sim N(0, \sigma^2)$ independently for all i
• Assumptions checked by the residual plot: linear model and homogeneity of variance
• Residual plot: scatterplot of $(\hat Y_i, r_i) = (\hat Y_i, Y_i - \hat Y_i)$
• If we see lack of homogeneity of variance or of linearity, consider transformations; see Table 10.28 (page 399) of the text
• The following three slides are prototypical residual plots indicating that
  1. the linear regression model is appropriate
  2. the assumption of linearity is questionable
  3. the assumption of constant variance is questionable

Regression: Residuals
[Figure: residuals (r) plotted against fitted values]

Regression: Example
• FEV and age, if sex = 1:

          Source |       SS       df       MS              Number of obs =     336
    -------------+------------------------------           F(  1,   334) =  641.57
           Model |  221.896397     1  221.896397           Prob > F      =  0.0000
        Residual |  115.518401   334  .345863477           R-squared     =  0.6576
    -------------+------------------------------           Adj R-squared =  0.6566
           Total |  337.414798   335  1.00720835           Root MSE      =   .5881

    ------------------------------------------------------------------------------
             FEV |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .2734776   .0107969    25.33   0.000     .2522391    .2947161
           _cons |   .0736006   .1127891     0.65   0.514    -.1482659    .2954671
    ------------------------------------------------------------------------------

[Figure: residuals (r) plotted against fitted values for the FEV and age regression]

Normality Diagnostics
• Assumption: the $\epsilon_i$’s are normally distributed
• This assumption is less important when N is large (central limit theorem)
• Inference is robust to small departures from normality
• Violations of other assumptions can suggest non-normality
• Tests of normality can be applied to the residuals; beware of their lack of power
• Graphical checks: qq-plot, histogram, and boxplot of the residuals

Normality Diagnostics: FEV
[Figure: normal Q-Q plot of the residuals (r) against theoretical quantiles, histogram (density) of the residuals, and boxplot of the residuals]

Normality Diagnostics: log(FEV)
[Figure: the same three panels (normal Q-Q plot, histogram, boxplot) for the residuals when log(FEV) is the response]

Regression: X random
• Now $\beta_{Y\cdot X} = \frac{\mathrm{Cov}(Y, X)}{V(X)}$
• Proof: recall $\mathrm{Cov}(a + bW, U) = b\,\mathrm{Cov}(W, U)$ and $\mathrm{Cov}(W, U + V) = \mathrm{Cov}(W, U) + \mathrm{Cov}(W, V)$
• Thus
  $\mathrm{Cov}(Y, X) = \mathrm{Cov}(\alpha + \beta_{Y\cdot X}X + \epsilon, X) = \beta_{Y\cdot X}\mathrm{Cov}(X, X) + \mathrm{Cov}(\epsilon, X) = \beta_{Y\cdot X}V(X)$,
  since $\mathrm{Cov}(X, X) = V(X)$ and $\mathrm{Cov}(\epsilon, X) = 0$ when $\epsilon$ is independent of X

Measurement Error
• Instead of observing X, we observe $W = X + U$, where U is a random variable with $E(U) = 0$, $V(U) = \tau^2$, $U \perp X$, $U \perp Y$
• Then $\mathrm{Cov}(W, Y) = \mathrm{Cov}(X + U, Y) = \mathrm{Cov}(X, Y) + \mathrm{Cov}(U, Y) = \mathrm{Cov}(X, Y)$
• Thus if X is not measured precisely, we underestimate the strength of association between X and Y
• Reliability coefficient of X: $R_{el} = \frac{\delta^2}{\delta^2 + \tau^2}$
• If $R_{el}$ is known, $\tilde\beta = R_{el}^{-1}\hat\beta_{Y\cdot W}$ is an unbiased estimator of $\beta_{Y\cdot X}$
• Since $V(\tilde\beta) = R_{el}^{-2}V(\hat\beta_{Y\cdot W})$, the t-statistic for testing $H_0: \beta_{Y\cdot X} = 0$ is
  $t_{Y\cdot X} = \frac{\tilde\beta}{\sqrt{V(\tilde\beta)}} = \frac{R_{el}^{-1}\hat\beta_{Y\cdot W}}{\sqrt{R_{el}^{-2}V(\hat\beta_{Y\cdot W})}} = t_{Y\cdot W}$
• Suppose there are k measures of W made on each person in the study
• It can be shown that $V(\bar W_k) = \delta^2 + \tau^2/k$
• Therefore $\beta_{Y\cdot\bar W_k} = \frac{\delta^2}{\delta^2 + \tau^2/k}\beta_{Y\cdot X} \to \beta_{Y\cdot X}$ as $k \to \infty$
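The ANOVA relations can be checked with simple arithmetic: MSE = SSE/(N - 2), F = MSR/MSE, and F = t² for the slope in simple linear regression. The following Python sketch recomputes the summary statistics of the FEV-and-age Stata output from its sums of squares. It uses only numbers printed in that output; the variable names are illustrative. It also uses the standard identities $R^2 = SSR/SST$ and adjusted $R^2 = 1 - MSE/MST$.

```python
import math

# Sums of squares and degrees of freedom copied from the Stata output
# for FEV regressed on age (sex = 1, N = 336).
ss_model, df_model = 221.896397, 1
ss_resid, df_resid = 115.518401, 334
ss_total, df_total = 337.414798, 335

ms_model = ss_model / df_model       # MSR
ms_resid = ss_resid / df_resid       # MSE = SSE / (N - 2)
ms_total = ss_total / df_total       # SST / (N - 1)

f_stat = ms_model / ms_resid         # F = MSR / MSE
r_squared = ss_model / ss_total      # R^2 = SSR / SST
adj_r_squared = 1 - ms_resid / ms_total
root_mse = math.sqrt(ms_resid)

# In simple linear regression, F equals the square of the slope's t statistic
t_age = 0.2734776 / 0.0107969        # Coef. / Std. Err. for age

print(f"F = {f_stat:.2f}, t^2 = {t_age ** 2:.2f}")
print(f"R^2 = {r_squared:.4f}, adj R^2 = {adj_r_squared:.4f}, Root MSE = {root_mse:.4f}")
```

Small discrepancies between t² and F come only from the rounding of the printed coefficient and standard error.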
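The expressions in the Matrix Formulation section map directly onto NumPy. This is a minimal sketch assuming a simple linear regression with an intercept, so X is the N × 2 design matrix with columns 1 and x. The function name `regression_sums_of_squares` is hypothetical. The sketch solves the normal equations $X'X\beta = X'Y$ with `np.linalg.solve` rather than forming $(X'X)^{-1}$ explicitly; this gives the same $\hat\beta$ and is numerically preferable.

```python
import numpy as np

def regression_sums_of_squares(x, y):
    """Fit Y = alpha + beta*X by least squares and return (beta_hat, SST, SSR, SSE)
    using the matrix expressions from the Matrix Formulation section."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    X = np.column_stack([np.ones(n), np.asarray(x, dtype=float)])  # design matrix [1, x]
    J = np.ones((n, n))                                             # N x N matrix of 1's

    # beta_hat = (X'X)^{-1} X'Y, computed by solving X'X b = X'Y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    correction = (y @ J @ y) / n            # (1/N) Y'JY
    sst = y @ y - correction                # Y'Y - (1/N) Y'JY
    ssr = beta_hat @ X.T @ y - correction   # beta_hat' X'Y - (1/N) Y'JY
    sse = y @ y - beta_hat @ X.T @ y        # Y'Y - beta_hat' X'Y
    return beta_hat, sst, ssr, sse

# Illustrative data (not from the notes)
beta_hat, sst, ssr, sse = regression_sums_of_squares([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(beta_hat, sst, ssr, sse)
```

The identity SST = SSR + SSE holds by construction, which is a convenient sanity check on the three formulas.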
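The Linear Regression and 2 Sample t-test section states an identity: the t statistic for $\hat\beta$ in a regression on a 0/1 indicator equals the pooled two-sample t statistic. This can be verified numerically. The sketch below assumes X = 1 for group 1 and X = 0 for group 2, and the function names are hypothetical. The data are made up for illustration, since the individual calcium/placebo and body-fat measurements are not included in these notes.

```python
import numpy as np

def pooled_t(y1, y2):
    """Pooled-variance t statistic (Ybar_1 - Ybar_2) / (s_p * sqrt(1/n1 + 1/n2))."""
    y1, y2 = np.asarray(y1, dtype=float), np.asarray(y2, dtype=float)
    n1, n2 = len(y1), len(y2)
    sp2 = ((n1 - 1) * y1.var(ddof=1) + (n2 - 1) * y2.var(ddof=1)) / (n1 + n2 - 2)
    return (y1.mean() - y2.mean()) / np.sqrt(sp2 * (1 / n1 + 1 / n2))

def dummy_regression_t(y1, y2):
    """t statistic for beta in Y = alpha + beta*X + eps, with X = 1 (group 1) or 0 (group 2)."""
    y = np.concatenate([y1, y2]).astype(float)
    x = np.concatenate([np.ones(len(y1)), np.zeros(len(y2))])
    n = len(y)
    sxx = np.sum((x - x.mean()) ** 2)                          # equals n1*n2/N
    beta_hat = np.sum((x - x.mean()) * (y - y.mean())) / sxx   # equals Ybar_1 - Ybar_2
    alpha_hat = y.mean() - beta_hat * x.mean()                 # equals Ybar_2
    resid = y - (alpha_hat + beta_hat * x)
    s2_yx = np.sum(resid ** 2) / (n - 2)                       # s^2_{Y.X}, equals s_p^2 here
    return beta_hat / np.sqrt(s2_yx / sxx)

# Illustrative data only
group1 = [5.1, 4.8, 6.0, 5.5, 5.9]
group2 = [4.2, 4.0, 4.9, 4.4]
print(pooled_t(group1, group2), dummy_regression_t(group1, group2))
```

The two functions compute the statistic by entirely different routes, so their agreement illustrates why $\sum_i (X_i - \bar X)^2 = n_1 n_2 / N$ turns the regression standard error into $s_p\sqrt{N/(n_1 n_2)}$.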
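The Stata output for the body-fat example is internally consistent with the pooled formula. The sketch below assumes each group's reported Std. Err. is $s_j/\sqrt{n_j}$, the usual definition for this kind of output. Under that assumption it recovers $s_1^2$ and $s_2^2$, forms $s_p^2$ with N - 2 = 96 degrees of freedom, and recomputes the standard error and t statistic of the difference. Only numbers printed in the output are used.

```python
import math

# Summary statistics from the two-sample t-test output (1 = Apache, 2 = O'Odham)
n1, mean1, se1 = 35, 32.77771, 1.163163
n2, mean2, se2 = 63, 37.92245, 1.09047

# Assumed: SE_j = s_j / sqrt(n_j), so s_j^2 = n_j * SE_j^2
s1_sq = n1 * se1 ** 2
s2_sq = n2 * se2 ** 2

# Pooled variance with N - 2 degrees of freedom
N = n1 + n2
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (N - 2)

# SE of the difference: s_p * sqrt(1/n1 + 1/n2) = s_p * sqrt(N / (n1 * n2))
se_diff = math.sqrt(sp_sq * N / (n1 * n2))
diff = mean1 - mean2
t_stat = diff / se_diff

print(f"diff = {diff:.6f}, SE = {se_diff:.6f}, t = {t_stat:.5f}, df = {N - 2}")
```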
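For the Regression: X random section, the sample analogue of $\beta_{Y\cdot X} = \mathrm{Cov}(Y, X)/V(X)$ is the sample covariance divided by the sample variance. It coincides with the least-squares slope, as this small NumPy check shows. The data are illustrative and not from the notes.

```python
import numpy as np

# Illustrative paired observations (not from the notes)
x = np.array([3.0, 5.0, 6.5, 8.0, 9.5, 11.0])
y = np.array([1.2, 1.9, 2.1, 2.8, 3.0, 3.7])

# Sample analogue of beta_{Y.X} = Cov(Y, X) / V(X); same divisor (N - 1) in both
beta_moment = np.cov(y, x, ddof=1)[0, 1] / np.var(x, ddof=1)

# Least-squares fit of y = alpha + beta * x; polyfit returns [slope, intercept]
beta_ols, alpha_ols = np.polyfit(x, y, 1)

print(beta_moment, beta_ols)
```

The divisor cancels in the ratio, so using N instead of N - 1 in both places gives the same slope.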
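A simulation can illustrate two results from the Measurement Error section: the attenuation $\beta_{Y\cdot W} = R_{el}\,\beta_{Y\cdot X}$, and its k-replicate version. This sketch assumes normal distributions throughout and takes $\delta^2 = V(X)$. (δ² is not defined in the surviving slides, but that is the role it plays in $R_{el}$ and in $V(\bar W_k) = \delta^2 + \tau^2/k$.) The function name `simulate_attenuation`, the parameter values, and the seed are hypothetical. The sketch also shows that rescaling by $R_{el}^{-1}$ leaves the t statistic unchanged.

```python
import numpy as np

def simulate_attenuation(beta=2.0, alpha=1.0, delta2=1.0, tau2=1.0, sigma2=1.0,
                         k=1, n=200_000, seed=662):
    """Simulate Y = alpha + beta*X + eps, observe X only through the mean of k
    replicates W_j = X + U_j, and regress Y on that mean (W-bar).

    Assumed for illustration: X ~ N(0, delta2), so delta2 = V(X);
    U_j ~ N(0, tau2); eps ~ N(0, sigma2); all mutually independent.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, np.sqrt(delta2), n)
    y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), n)
    # Each column holds k replicate errors for one person; average over replicates
    w_bar = x + rng.normal(0.0, np.sqrt(tau2), (k, n)).mean(axis=0)

    # Least-squares slope of Y on W-bar and its estimated standard error
    wc = w_bar - w_bar.mean()
    sww = np.sum(wc ** 2)
    slope = np.sum(wc * (y - y.mean())) / sww
    resid = (y - y.mean()) - slope * wc
    se = np.sqrt(np.sum(resid ** 2) / (n - 2) / sww)

    rel_k = delta2 / (delta2 + tau2 / k)     # reliability of the k-replicate mean
    beta_tilde = slope / rel_k               # corrected estimate R_el^{-1} * beta_hat_{Y.W}
    t_naive = slope / se                     # t statistic using W-bar directly
    t_corrected = beta_tilde / (se / rel_k)  # SE also scales by R_el^{-1}
    return slope, rel_k, beta_tilde, t_naive, t_corrected

for k in (1, 4, 16):
    slope, rel_k, beta_tilde, t_naive, t_corr = simulate_attenuation(k=k)
    print(f"k={k:2d}  slope={slope:.3f}  R_el={rel_k:.3f}  corrected={beta_tilde:.3f}  "
          f"t={t_naive:.1f} vs {t_corr:.1f}")
```

With $\delta^2 = \tau^2 = 1$ and $\beta = 2$, the slope on a single measurement should sit near $R_{el}\beta = 1$. As k grows, the reliability $\delta^2/(\delta^2 + \tau^2/k)$ approaches 1 and the naive slope approaches β.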