Lecture Notes on Outliers - General Statistics | PSYC 5741

Material Type: Notes; Professor: Judd; Class: GENERAL STATISTICS; Subject: Psychology; University: University of Colorado - Boulder; Term: Unknown 1989;

Handout from Psych 5741/5751, University of Colorado. Used with Judd, C.M., & McClelland, G.H. (1989), Data Analysis: A Model Comparison Approach, HBJ. April 13, 1996.

Brief Lecture Notes, Chapter 9: Outliers

Until now, DATA have been well-behaved. In Chapter 16 we will deal with ill-behaved data: heterogeneous variances, non-normal distributions, etc.

Here: we noted in Chapter 2 that SSE, and estimators that minimize SSE, are very sensitive to outliers or wild observations, so we had best make sure we don't have any. With outliers, regression estimates can be very misleading.

Outliers are extreme observations that for one reason or another do not belong with the other observations in DATA. (Admittedly a vague definition!)

Why they are a problem:
- they bias, or "grab," parameter estimates
- they inflate SSE, thereby making it difficult to detect reductions in SSE due to other factors
- it is often not obvious that this has happened

Example from Chapter 2 (brackets give the 95% confidence interval for the mean):

  1 3 5 9 14     mean = 6.4    MSE = s^2 = 26.8      [0, 12.8]
  1 3 5 9 140    mean = 31.6   MSE = s^2 = 3680.8    [-43.7, 106.9]

Causes:

1. "Klinkers" (Abelson): data recording or data entry errors. The use of computers makes these more likely (Lou's energy study example). Klinkers should always be fixed, and we need computers to help look for them.

2. Two kinds of cases (or errors drawn from two different bags of error tickets). Math score example from Ex. 5.2, p. 74 (typo: the book says Ex. 4.2). Outliers can provide clues to better MODELs, so we need techniques for finding them so they can be examined with great care.

3. Thick tails of the error distribution. Least-squares analysis is robust to non-normality but not to thick tails (Ex. 9.1, p. 210). With thick tails, extreme observations occur more frequently than they should.

What to do about outliers? CONTROVERSIAL!
- Ignoring them is never acceptable: to do nothing is equivalent to making a decision about their appropriateness in the analysis, and we will often end up with a MODEL that describes essentially none of the DATA, neither the outliers nor the bulk of the observations.
- Report the MODEL both with and without the outliers included.
- Do an analysis to see whether the outliers are significantly different from the other observations in the MODEL.

LEVERAGE is how much an observation influences its own prediction: LEVER = $h_{ii}$.

For the mean, LEVER $= h_{ii} = 1/n$.

For simple regression,

$$h_{ij} = \frac{1}{n} + \frac{(X_{i1} - \bar{X}_1)(X_{j1} - \bar{X}_1)}{SSX}$$

so

$$\mathrm{LEVER} = h_{ii} = \frac{1}{n} + \frac{(X_{i1} - \bar{X}_1)^2}{SSX}$$

For multiple regression with two predictors (assuming mean-deviation form for both predictors),

$$h_{ii} = \frac{1}{n} + \frac{X_{1i}^2 \sum X_2^2 - X_{1i} X_{2i} \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - \left(\sum X_1 X_2\right)^2} + \frac{X_{2i}^2 \sum X_1^2 - X_{1i} X_{2i} \sum X_1 X_2}{\sum X_1^2 \sum X_2^2 - \left(\sum X_1 X_2\right)^2}$$

If there is no redundancy between the predictors ($\sum X_1 X_2 = 0$), this reduces to

$$h_{ii} = \frac{1}{n} + \frac{X_{1i}^2}{\sum X_1^2} + \frac{X_{2i}^2}{\sum X_2^2}$$

(Illustrate with the two-sample t-test?)

Evaluating LEVERs:

$$0 \le h_{ii} \le 1 \qquad\qquad \sum_{i=1}^{n} h_{ii} = PA \;\Leftrightarrow\; \bar{h}_{ii} = \frac{PA}{n}$$

where PA is the number of parameters in MODEL A. Leverage tells us how much of a parameter is dedicated to the prediction of a single observation!
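To make the leverage formulas concrete, here is a minimal sketch (not from the handout: the y values are invented, and the x values reuse the 1, 3, 5, 9, 140 numbers from the Chapter 2 example as predictor scores). It computes $h_{ii}$ both from the simple-regression formula above and from the diagonal of the hat matrix, and checks that the leverages sum to PA:

```python
import numpy as np

# Predictor with one wild value (reusing the handout's 1, 3, 5, 9, 140
# numbers as X scores); the y values are invented for illustration.
x = np.array([1.0, 3.0, 5.0, 9.0, 140.0])
y = np.array([2.0, 4.0, 5.0, 8.0, 120.0])
n = len(x)

# Handout formula for simple regression: h_ii = 1/n + (X_i - Xbar)^2 / SSX
ssx = np.sum((x - x.mean()) ** 2)
h_formula = 1 / n + (x - x.mean()) ** 2 / ssx

# The same quantities from the hat matrix H = X (X'X)^{-1} X', where X is
# the design matrix with an intercept column; leverages are diag(H).
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T
h_hat = np.diag(H)

print(np.allclose(h_formula, h_hat))  # True: the two routes agree
print(h_hat.round(3))                 # [0.264 0.256 0.248 0.235 0.998]
print(h_hat.sum())                    # sums to PA = 2 parameters (b0, b1)
```

The wild X score of 140 has leverage near the maximum of 1, so the MODEL essentially dedicates one of its two parameters to predicting that single observation.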
1/h "equivalent number of observations" involved in the determination of Yhat. (e.g., for two-sample t-test, half obs for Yhats from one group and half the obs for Yhats in the other group) ___________________________________________________ Is Y i Unusual? ei = Y i − ˆ Y i Difficult to interpret 1. need standardization 2. paradox in allowing outlier to determie model really want to ask if Y k is unusual WRT to a MODEL based on all the other observations such a statistic is the studentized deleted residual Handout from Psych 5741/5751 University of Colorado used with Judd, C.M., & McClelland, G.H. (1989). Data Analysis: A Model Comparison Approach. HBJ. Statistics:Grad Stat:Chapt9 — 7 — April 13, 1996 rationale: Outlier Model MODEL A: Y i = 0 + 1X i + i i ≠ k Yi = 0 + 1Xi + 2 + i i = k MODEL C: Y i = 0 + 1 Xi + i ∀i OR, equivalently, X2 = 1, if k - th observation; 0 otherwise A: Y i = 0 + 1X1 + 2 X2 + i C: Y i = 0 + 1X1 + i Example: leaving out 6th obs SAT = 6.71 + .50 HSRANK + 55.49 X[6] SAT = 96.55 - .50 HSRANK PRE = .68, F*[1,10] = 21.4, p < .01 - - - - - - - - - - - - - - - - - - - - - - - leaving out 1st obs SAT = 95.71 - .48 HSRANK - 10.67 X[1] SAT = 96.55 - .50 HSRANK PRE = .096, F*[1,10] = 1.06, n.s. (see Ex 9.6, p. 223)