Statistics: Scales, Descriptive Statistics, and Regression - Prof. Christopher Hertzog, Study notes of Psychology

Various statistical concepts including nominal, interval, and ratio scales, descriptive statistics such as mean, median, mode, variance, and standard deviation, and regression analysis. Topics include central tendency, frequency histograms, confidence intervals, hypothesis testing, and correlation. Also discussed are concepts related to multiple regression, mediational analysis, and moderator analysis.

Typology: Study notes

2011/2012

Uploaded on 02/21/2012

ch01400

• A scale that represents different qualitative types or categories with numbers (for example, 1 = boys, 2 = girls) is a(n) nominal scale.
• An interval scale is one that has the property that a higher scale score represents a greater amount of the attribute than a lower scale score and is a linear transformation of the true attribute dimension.
• For a continuous RV, the probability of occurrence of any exact value of X may be regarded as 0.
• Consider a sample set of data = [0, 0, 0, 25, 50]. The mean of these data is 15.
• If one is concerned about the sensitivity of a central tendency measure to outliers, the best descriptive index to use is the ________
• Which of the following is a measure of dispersion?
    a. harmonic mean
    b. standard deviation
    c. 80th percentile
    d. a frequency histogram
    e. none of the above
• The expected value of deviation scores, $X_i - M$, is 0.
• The SD may be interpreted as the average deviation of a score from the mean.
• After transformation to a z-score, any variable will have a normal distribution.
• The expected value of the sample mean, M, is
    o $M = \frac{\sum x_i}{N}$
• The variance of the sampling distribution of the mean depends on
• The expectation of the sample variance, $S^2$, is
• The standard error of the mean can be interpreted as $\sigma_M = \frac{\sigma}{\sqrt{N}}$

Absolute Deviation
• The absolute value of the data value minus the mean, $|X_i - M|$.

Areas under (percentiles of) the Normal Distribution

Boxplots
• Made up of a box and 2 whiskers.
• The box shows
    o The median
    o The upper and lower quartile
    o The limits within which the middle 50% of scores lie.
• The whiskers show
    o The range of scores
    o The limits within which the top and bottom 25% of scores lie.
[Figure: boxplot annotated with the top 25%, middle 50%, and bottom 25% of scores]

Degrees of freedom
• The concept of how many opportunities there are for data to vary independently.
• Consider a deviation score, $d_i = X_i - M$.
• Because $M = \frac{\sum X_i}{N}$, there is a constraint placed on the deviation scores: once I know N − 1 of them, the last deviation MUST equal $-\sum_{i=1}^{N-1} d_i$, because the deviations sum to 0.
• There are N − 1 opportunities for deviation scores to vary freely about the mean, M.
• Because $s^2$ is based on the sum of the squared deviation scores, where each deviation is computed from the sample mean, M, of the same data, there are N − 1 degrees of freedom associated with $s^2$ (or s).

Descriptive Statistics
• Central Tendency:
    o Mean: arithmetic average of scores
    o Median: 50th percentile score; the middle score when scores are ordered
        - Appropriate for ordinal scales and above.
    o Mode: most frequent score
• Dispersion:
    o Range: difference between high and low scores
    o Interquartile Range: difference between the 75th and 25th percentiles
    o Variance: $\frac{\sum \text{squared deviations}}{N-1}$
    o SD: $\sqrt{\text{Variance}}$
        - $SD = \left[\frac{1}{N-1}\sum (x_i - M)^2\right]^{1/2}$

Discrete and Continuous RV:
• Discrete: if a RV can assume only a particular finite or countably infinite set of values, it is said to be a discrete RV.
• Continuous: the probability distribution of the measured variable X takes a continuous form; probabilities are defined over intervals, $P(a < X < b)$.

Discrete and Continuous Probability Distributions

Expected Values of Sample Test Statistics (Mean, Variance)
• Expected Values:
    o Weighted average of the possible values of x, each weighted by the probability that x assumes that value.
    o If X is continuous, then:
        - $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$

Frequency Histograms
• Floor and ceiling effects are a major problem for psychological scales
• Difficulty effects: if a test is too hard, floor effects; if a test is too easy, ceiling effects
• Skew:
    o 0 if symmetric distribution
    o Negative if left-tailed
    o Positive if right-tailed
• Kurtosis:
    o 0 if peaked as the normal distribution
    o Flatter (platykurtic): more information in tails; negative kurtosis
    o More peaked (leptokurtic): less information in tails, more in the center of the distribution; positive kurtosis.
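The descriptive indices listed above can be checked on the small sample quoted earlier in the notes, [0, 0, 0, 25, 50]. The following is a minimal sketch using only the Python standard library. The skewness formula (third central moment divided by the second central moment raised to 1.5) is one common definition and is an assumption here, since the notes do not give a formula.

```python
import math
import statistics

# Sample quoted in the notes; its mean is stated there as 15.
data = [0.0, 0.0, 0.0, 25.0, 50.0]
n = len(data)

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
variance = statistics.variance(data)  # sum of squared deviations / (N - 1)
sd = math.sqrt(variance)

# Deviation scores d_i = x_i - M always sum to zero.
deviation_sum = sum(x - mean for x in data)

# Moment-based skewness g1 = m3 / m2**1.5 (assumed definition).
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
skewness = m3 / m2 ** 1.5

print(mean, median, mode, variance, round(sd, 3), round(skewness, 3))
```

The mean (15) sits well above the median (0) because of the single large score, and the skewness is positive, which matches the "right-tailed" rule in the list above.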
Logistic Regression
• Used to predict a categorical outcome variable from one or more categorical or continuous predictor variables.
• Used because having a categorical outcome variable violates the assumption of linearity in ordinary (linear) regression.
Odds and log odds
Transformations on dependent variable
Relation to multiple regression
• With multiple predictors, logistic regression is just like multiple regression
Interpretation of regression coefficients
Tests of model fit and individual regression coefficients

Mediational Analysis
Concept of mediator
Tests of mediation in terms of multiple regression tests
Causally spurious correlation
Test of partial regression coefficients with complete mediation
Partial Mediation

Moderator analysis
Tests of moderated regression using hierarchical regression
Variable centering: What is it? Why do it?

Model Comparisons
Compact Model
Augmented Model

Multiple Regression
Regression equation
• Linear Regression Equation
    o $Y_i = B_0 + B_1 X_1 + B_2 X_2 + B_3 X_3 + \epsilon_i$
    o There is a 'slope' for each independent variable; the B's are termed partial regression coefficients, and are interpreted as the effect of each X on Y, controlling for the other X's.
Meaning of 'partial regression coefficient'
• The B coefficients are standardized partial regression coefficients.
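The regression equation above can be made concrete with a small least-squares fit. The sketch below is one possible illustration, not the course's method. It uses two predictors rather than three, hypothetical data generated without error from Y = 1 + 2·X1 + 3·X2, and hypothetical helper names `solve_linear_system` and `fit_ols`. It estimates unstandardized coefficients by solving the normal equations with the standard library only.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so each row operation applies to both sides.
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_ols(xs, ys):
    """Return [b0, b1, ..., bk] minimizing the sum of squared errors."""
    design = [[1.0] + list(row) for row in xs]  # leading 1 is the intercept column
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data (an assumption for illustration): Y = 1 + 2*X1 + 3*X2, no error term.
predictors = [(0, 1), (1, 0), (1, 1), (2, 1), (2, 3), (3, 2)]
outcome = [1 + 2 * x1 + 3 * x2 for x1, x2 in predictors]

b0, b1, b2 = fit_ols(predictors, outcome)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the data contain no error, the fit recovers the generating coefficients. Each slope is the change in Y per unit change in its own X with the other X held constant, which is the "controlling for the other X's" reading given above.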
Tests of Model R² = 0
    How to compute F-test
    Interpretation of F-test
Tests of individual regression coefficients
Confidence intervals for regression coefficients
Power in multiple regression models
Indices of predictor redundancy in multiple regression
Model Testing in multiple regression
    Order of Entry of variables in equation
    Relation of order of entry to standardized partial regression coefficients
    Tests for increments to R² (null hypothesis that increment = 0)
Exploratory regression methods
    Forward inclusion
    Backwards elimination
    Stepwise regression
Multicollinearity
    Consequences of
    Diagnostics of
Regression diagnostics
    Plots of residuals
    Leverage and influence
Sums of Squared Errors
Tests of interaction in regression (moderated regression)
    Order of entry of variables in moderated regression
    Decomposing the significant interaction (simple slopes)
Test of Model with Mean only / relation to one-sample t-test for mean = k

Types of Scales: Nominal, Ordinal, Interval, Ratio:
• Quantitative Scales: ordinal, interval, ratio
    o f(o): functional state of the observed entity, mapped to a variable
    o t(o): "true" but unobservable property of the object
    o m(o): property of the empirical scale or measure of the object [assigned numbers]
• Nominal Scale: categories, kinds (gender, major, religious affiliation)
• Ordinal Scale: numbers can be used to define rank orders, but they do not convey relative differences in the amount of the underlying attribute.
    o $m(o_i) \neq m(o_j)$ implies $t(o_i) \neq t(o_j)$
    o $m(o_i) > m(o_j)$ implies $t(o_i) > t(o_j)$
• Interval Scale: ordinal properties apply, as well as: for any object $o_i$, m is an interval scale iff $t(o_i) = x$ implies $m(o_i) = ax + b$, where $a \neq 0$
    o The numbers assigned tell us about relative differences in the amount of the underlying attribute, and this difference is equivalent across all levels of the interval scale (e.g., temperature)
• In any sample, the deviation of a score from M is $d_i = x_i - M$
    o $\sum d_i = 0$

Variance as Residuals of a Model with a Constant

Likely sample problems (hand calculations from summary statistics; have the formulas ready!)
Compute mean, median, and mode from raw data
Compute variance and standard deviation from raw data
Use normal distribution to calculate region under curve (tables provided)
Use normal distribution to compute 95% or 99% confidence interval for mean, given N, mean, variance (or summary statistics that can be transformed to get the mean and variance)
Compute one-sample t-test given N, mean, and variance (or equivalent)
Compute Pearson correlation from covariance and variances of two variables
Compute partial and semi-partial correlations from summary statistics
Compute test of correlation coefficient = 0 (if provided r to z transformation table)
Compute test of two independent correlations being equal to each other (if provided r to z transformation table)
Compute simple regression equation from summary statistics (means, variances, and the covariance of x and y)
Compute test of slope equal to 0 in simple regression
Compute confidence interval of slope in simple regression
Compute test of model R² in multiple regression if given total sums of squares, residual sums of squares, regression sums of squares, N, and P (in needed combinations).
Compute standard error of a multiple regression coefficient, t-test for a regression coefficient, and 95% confidence interval for the regression coefficient
Compute tests of semipartial correlations as increments to R² in multiple regression equation if given R² at each step (along with N, and number of variables in each step)
Compute tests of increments to R² if given R² at each step
Compute test of interaction (moderated regression) as increment to R²
If given moderated regression equation and sample summary statistics, compute simple slopes and plot interaction

QUIZZES:

STANDARD NORMAL DISTRIBUTION
1. What is the probability of observing a z-score less than or equal to -1.57? (2 points)
   $p(z \le -1.57) = p(z \ge 1.57) = 0.05821$
2. What is the probability of observing a z-score between the values of -1.50 and 0.50?
   $p(-1.50 < z < 0.50) = p(z < 0.50) - p(z \ge 1.50) = 0.69146 - 0.06681 = 0.62465$

CONFIDENCE INTERVALS
Professor Jobs completes an unemployment survey with a sample of 144 persons randomly drawn from the U.S. workforce. The mean satisfaction rating with their job was 3.5 (rated on a 1-10 Likert scale, with 1 being highly dissatisfied, 5 being slightly dissatisfied, 6 being slightly satisfied, and 10 being highly satisfied). The estimated sample variance is 6.
1. Compute a 90% confidence interval for the population mean of job satisfaction on this scale.
   $M \pm z_{criterion}\left(\frac{s}{\sqrt{N}}\right)$
   $3.5 \pm 1.64\left(\frac{\sqrt{6}}{\sqrt{144}}\right) = 3.5 \pm 1.64\left(\frac{2.45}{12}\right) = 3.5 \pm 1.64(0.204) = 3.5 \pm 0.335 = [3.165, 3.835]$
2. Is Professor Jobs justified in stating that, on average, the US worker is not satisfied with his/her job, and why?
   Yes. The entire 90% CI of the mean, [3.165, 3.835], lies in the lower range of the scale, which corresponds to workers being dissatisfied with their jobs.

REGRESSION
The ACME lottery company tracks the number of tickets sold and the estimated size of the jackpot over a one-year period. The mean jackpot size is 18.2 million USD. The mean number of tickets sold is 10.7 million. The following summary statistics are provided to regress the number of tickets sold (Y) on jackpot size (X):
   $s_{tickets} = 8$, $s_{jackpot} = 14$, $s_{jackpot,tickets} = 5.5$
1. What is the regression equation?
   $b_1 = \frac{s_{xy}}{s_x^2} = \frac{5.5}{14^2} = \frac{5.5}{196} = 0.028$
   $b_0 = \bar{Y} - b_1\bar{X} = 10.7 - (0.028 \times 18.2) = 10.7 - 0.5096 = 10.1904$
   $\hat{Y} = 10.1904 + 0.028X$

HOMEWORKS

HOMEWORK #1
1) A researcher collects the following set of data: [12.5 13.5 11.3 15.0 16.7 15.3 15.9 12.6 11.6 10.2]. Show all work by hand:
a) What is the mean of this sample?
   $\frac{10.2 + 11.3 + 11.6 + 12.5 + 12.6 + 13.5 + 15 + 15.3 + 15.9 + 16.7}{10} = 13.46$
b) What is the median of this sample?
   $\frac{12.6 + 13.5}{2} = 13.05$
c) What is the mode of this sample?
   No mode, because each value occurs only once.
d) Is the distribution of this sample skewed positively or negatively?
   This distribution is skewed positively.
e) What is the variance of the sample (use n - 1 as the denominator)?

   X       X-μ      (X-μ)^2
   10.2    -3.26    10.62
   11.3    -2.16    4.67
   11.6    -1.86    3.46
   12.5    -0.96    0.92
   12.6    -0.86    0.74
   13.5    0.04     0.002
   15      1.54     2.37
   15.3    1.84     3.39
   15.9    2.44     5.95
   16.7    3.24     10.50

   $VAR = \frac{\sum (x - \mu)^2}{N - 1} = \frac{42.624}{9} = 4.736$
f) What is the standard deviation of the sample?
   $SD = \sqrt{VAR} = \sqrt{4.736} = 2.176$
2) a) Construct the stem-and-leaf plot of the following values [35, 61, 58, 64, 74, 89, 87, 71, 56, 61, 49, 64, 78, 74, 80, 60, 81, 87, 70, 51, 70]

   STEM   LEAF
   3      5
   4      9
   5      1 6 8
   6      0 1 1 4 4
   7      0 0 1 4 4 8
   8      0 1 7 7 9

b) Identify the Mean, Median, and Mode of the values above
c) Is the distribution of these values skewed positively or negatively?
3) Given data of the following types, state the scale of measurement that each type appears most clearly to represent (nominal, ordinal, interval, ratio)
a) Nationality of an individual's father
b) Memory ability, as measured by the number of words recalled from an initially memorized list
c) Reading ability of fifth-grade children, as shown by their test performance relative to a national norm group
d) Hand pressure, as applied to a flexible bulb (that is, on a dynamometer)
e) Social Security numbers
f) Taking the arithmetic difference between two values
g) Stating that one value represents a higher level of some property than does another value

HOMEWORK #3:
1. An experimenter was interested in the possible linear relationship between the time spent per day in practicing a foreign language and the ability of the person to speak the language at the end of a 6-week period. A random sample of 12 students showed the results as follows.

   Student   X (= Practice, in hours)   Y (= Proficiency score in the foreign language)
   1         .30                        115
   2         .30                        83
   3         .30                        110
   4         .50                        107
   5         .50                        89
   6         .50                        77
   7         1.5                        82
   8         1.5                        99
   9         1.5                        125
   10        2                          140
   11        2                          127
   12        2.25                       109

   In the working table below, Xdif = X(Prac) - Xmean and Ydif = Y(FL) - Ymean.

   Student  X(Prac)  Xmean  Xdif    Xdif^2    Y(FL)  Ymean   Ydif    Ydif^2     Xdif*Ydif
   1        0.3      1.096  -0.796  0.633351  115    105.25  9.75    95.0625    -7.759
   2        0.3      1.096  -0.796  0.633351  83     105.25  -22.25  495.062    17.707
   3        0.3      1.096  -0.796  0.633351  110    105.25  4.75    22.5625    -3.780
   4        0.5      1.096  -0.596  0.355017  107    105.25  1.75    3.0625     -1.043
   5        0.5      1.096  -0.596  0.355017  89     105.25  -16.25  264.062    9.682
   6        0.5      1.096  -0.596  0.355017  77     105.25  -28.25  798.062    16.832
   7        1.5      1.096  0.404   0.163351  82     105.25  -23.25  540.562    -9.397
   8        1.5      1.096  0.404   0.163351  99     105.25  -6.25   39.0625    -2.526
   9        1.5      1.096  0.404   0.163351  125    105.25  19.75   390.062    7.982
   10       2        1.096  0.904   0.817517  140    105.25  34.75   1207.5625  31.420
   11       2        1.096  0.904   0.817517  127    105.25  21.75   473.062    19.666
   12       2.25     1.096  1.154   1.332101  109    105.25  3.75    14.0625    4.328
   SUM:                             6.42                             4342.25    83.11

a. Compute Sum of Squares for X, Y, and XY
   $SS(X) = \sum (X - \bar{X})^2 = 6.42$
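The sums of squares asked for in part (a) can be cross-checked from the raw Homework #3 data. This is a minimal sketch in plain Python, not part of the original hand calculation. The variable names (`practice`, `proficiency`, `ss_x`, `ss_y`, `sp_xy`) are chosen here for illustration.

```python
# Practice hours (X) and proficiency scores (Y) for the 12 students in Homework #3.
practice = [0.30, 0.30, 0.30, 0.50, 0.50, 0.50, 1.5, 1.5, 1.5, 2, 2, 2.25]
proficiency = [115, 83, 110, 107, 89, 77, 82, 99, 125, 140, 127, 109]

n = len(practice)
x_mean = sum(practice) / n
y_mean = sum(proficiency) / n

# Sums of squared deviations for X and Y, and the sum of cross-products for XY.
ss_x = sum((x - x_mean) ** 2 for x in practice)
ss_y = sum((y - y_mean) ** 2 for y in proficiency)
sp_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(practice, proficiency))

print(round(ss_x, 2), round(ss_y, 2), round(sp_xy, 2))
```

Rounded to two decimals, the three values agree with the SUM row of the homework's working table (6.42, 4342.25, and 83.11).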