Statistical Research Methods: Hypothesis Testing and Regression Analysis - Prof. Jeffery A, Study notes of Psychology

An overview of statistical research methods, focusing on hypothesis testing and regression analysis. Topics include nominal and ordinal data, statistical hypotheses, t-tests, correlation and regression analysis, and ANOVA. Research questions cover various scenarios, such as comparing group means, examining relationships between variables, and investigating the effect of interventions on attitudes.


Stats Review: Data Analysis as a Decision-Making Process

I. Levels of Measurement (NOIR)
(See Whitley, 2001, pp. 350-351, for details.)

Nominal = Categories with names; Yes vs. No (don't ask "sometimes" vs. "never"), Sex, Religion, Relationship Status, Political Affiliation, Experimental Group Membership (Control Group, Manipulation Group, Comparison Group). Numbers represent groups (1 = Female, 2 = Male). Order is arbitrary.

Ordinal = Nominal categories with a logical order; Class Rank, Height from tallest to shortest, Responses on a numerical rating scale (1 = strongly agree, 7 = strongly disagree). The intervals between numbers are not standard from unit to unit.

Interval = Numerical scales with logically ordered units that are equidistant, but the zero is artificial. E.g., temperature in Centigrade or Fahrenheit (zero does not represent an absence of temperature), time of day (no 0 o'clock), calendar dates, calendar years.

Ratio = Numerical scales with logically ordered units that are equidistant and have a true zero (the zero represents a lack of that which is being measured). E.g., elapsed time, temperature in Kelvin (0 K = -273.15 degrees Celsius), length, mass. Because it uses a true zero, numerical values can be used to define ratios: 5 inches is five times more length than 1 inch; 10 inches is twice as long as 5 inches.
- Ratio-level measures are quite rare in psychology.

Continuous vs. Discrete Variables

Discrete Variables = Mutually exclusive, exhaustive numerical categories that can't be broken down into finer units (e.g., if sex is represented by 1 = male and 2 = female, there is no 1.5). All nominal and ordinal variables are discrete. However, many ordinal variables will be treated as continuous (e.g., numerical rating scales are often averaged to form a single score, which is treated as continuous).

Continuous Variables = Numerical systems where there are an infinite number of possible points between each unit. The measurements can also be broken down into finer units (e.g., elapsed time: years, months, days, hours, minutes, seconds, milliseconds, nanoseconds, etc.).

II. Choosing Your Statistics

Knowing which statistic to use to test the relationship between variables depends on the type of data you have (and sometimes the type of question you want to answer).
- Note: This is often discussed with respect to the issue of statistical validity, and there are different camps regarding the appropriateness of certain levels of measurement for specific statistics (Michell, 1986). Also, given specific conditions, most parametric statistics can handle non-parametric data.

A. Single Discrete Variable

Goodness-of-Fit χ² = Allows us to test whether the group frequencies differ from chance patterns (base-rate frequencies: the frequency with which instances naturally occur in the environment). df = k - 1, where k = number of groups.

χ² = Σ (Oi - Ei)² / Ei

where Oi = observed frequency for each separate group and Ei = expected group frequency (based on chance).

Statistical Hypotheses: Ho: Of = Ef; Ha: Of ≠ Ef

Example Research Question: Does the number of people who say they like Cheezy Poofs (Yes = 1) vs. those who do not like Cheezy Poofs (No = 0) differ significantly from the number expected by chance alone?
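A minimal sketch of this goodness-of-fit test in Python (scipy), using made-up counts for the Cheezy Poofs question and a 50/50 chance expectation; the numbers and variable names are illustrative only:

```python
from scipy.stats import chisquare

# Hypothetical observed counts: 68 people said "yes" (like Cheezy Poofs), 32 said "no".
observed = [68, 32]

# Expected counts under chance alone (a 50/50 split of the 100 respondents).
expected = [50, 50]

# chi2 = sum((O - E)^2 / E), df = k - 1 = 1
chi2, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")  # reject Ho if p < .05
```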
B. Discrete X Discrete

Pearson's χ² (AKA Test of Independence) = Allows us to test whether the cross-tabulation pattern of two nominal variables differs from the patterns expected by chance. If one variable is ordinal, then t or F is normally used. df = (R - 1)(C - 1), where R = # of rows and C = # of columns.

χ² = Σ (Oij - Eij)² / Eij,  with  Eij = (Ri × Cj) / N

where i = the different groups for Variable 1, j = the different groups for Variable 2, Ri = row total of row i, Cj = column total of column j, and N = total sample size.

Statistical Hypotheses: Ho: Of = Ef; Ha: Of ≠ Ef

Example Research Question: Does the number of people who think they are Eric Cartman (yes = 1, no = 0), relative to whether or not they eat Cheezy Poofs, differ significantly from the frequencies expected by chance alone?

Note: A significant Pearson's chi-square will not tell you which cells are different. The cross-tabulation matrix must be examined to determine this; "eye-balling" the standardized, corrected residuals seems to be the most useful approach.

Limitations on χ²:
1) Responses must be independent, mutually exclusive, and exhaustive. Each case from the sample should fit into one and only one cell of the crosstab matrix.
2) Low expected frequencies limit the validity of χ². If df = 1 (e.g., a 2x2 matrix), then no expected frequency can be less than 5. If df = 2, all expected frequencies should exceed 2. If df = 3 or greater, then all expected frequencies except one should be 5 or greater, and that one cell needs an expected frequency of 1 or greater.

Phi Coefficient (for a 2x2 matrix) = A correlation coefficient that estimates the strength of the relationship between two dichotomous nominal variables. Note: Phi cannot estimate the direction (e.g., positive linear vs. negative linear) of the relationship between two nominal variables because the numerical values are arbitrary (direction is meaningless).
- This correlation coefficient can be calculated exactly like Pearson's r (below) or can be estimated from the χ² statistic. Thus any χ² can be converted to phi and any phi can be converted to χ² (significance should be determined using χ² tables):

φ = sqrt(χ² / N)  and  χ² = φ²N
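A hedged sketch of the test of independence and the phi conversion in Python, using a fabricated 2x2 crosstab for the Cartman/Cheezy Poofs question (Yates' correction is turned off so the result matches the formula above):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 crosstab:
#                        eats cheezy poofs   does not
# thinks he is Cartman           30              10
# does not                       20              40
table = np.array([[30, 10],
                  [20, 40]])

# Pearson's chi-square test of independence, df = (R-1)(C-1) = 1
chi2, p, df, expected = chi2_contingency(table, correction=False)

# Phi coefficient estimated from chi-square: phi = sqrt(chi2 / N)
n = table.sum()
phi = np.sqrt(chi2 / n)

print(f"chi-square({df}) = {chi2:.2f}, p = {p:.4f}, phi = {phi:.2f}")
print("expected frequencies:\n", expected)  # check the low-expected-frequency rule
```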
Independent-Samples t-test = Compares the means of two unrelated groups. df = n1 + n2 - 2.

t = (X̄1 - X̄2) / sqrt{ [((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2)] × (1/n1 + 1/n2) }

Statistical Hypotheses: Ho: μ1 = μ2; Ha: μ1 ≠ μ2

Example Research Question: Does a randomly assigned group exposed to 37 hours of South Park reruns (group = 1) report significantly more positive or negative attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7) compared to a randomly assigned comparison condition exposed to 37 hours of Sally Struthers Feed the Children commercials (group = 2)?

Repeated-Measures / Matched-Sample t-test = Repeated measures: tests the significance of the averaged difference in scores between time 1 and time 2. Matched sample: compares the averaged difference in scores between group 1 and group 2 when the subjects from each group have been matched on some variable (e.g., age, intelligence). df = n - 1.

t = (X̄t1 - X̄t2) / sqrt{ [Σ(Xt1 - Xt2)² - (Σ(Xt1 - Xt2))²/n] / [n(n - 1)] }

Statistical Hypotheses: Ho: μt1 = μt2; Ha: μt1 ≠ μt2

Example Research Question: After being exposed to 37 hours of South Park reruns (Time = 2), do participants report significantly more positive or negative attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7) compared to their pre-exposure scores (Time = 1)?

Biserial vs. Point-Biserial (Artificial Dichotomies vs. Natural Dichotomies)

Point-Biserial Correlation Coefficient rpb = Same formula as Pearson's r (see below), only one variable is a natural dichotomy, often designated by 0 and 1. This correlation coefficient indicates the strength of the relationship between category membership and the continuous score. Note that, like phi, the direction of the relationship (positive vs. negative) is arbitrary, based on the numerical labels assigned to the groups. Examination of the means is necessary to determine the direction of the group differences. df = n - 2.

Statistical Hypotheses: Ho: rpb = 0; Ha: rpb ≠ 0

Example Research Question: What is the strength of the relationship between being randomly assigned to a group exposed to 37 hours of South Park reruns (group = 1) vs. being randomly assigned to a comparison condition exposed to 37 hours of Sally Struthers Feed the Children commercials (group = 0) and self-reported attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7)?

Biserial Correlation Coefficient rb = Used when a dichotomy is developed from a continuous variable (e.g., mean-split or median-split methods). Groups are often designated by 0 and 1. rb is estimated from a normal Pearson's r (see below). This statistic tells you the strength of the association between category membership (based on an artificial dichotomy) and a continuous score. Again, the direction of the relationship will be arbitrary depending on the numerical category labels (however, since they are based on continuous scores, the numerical label with the greater value should be given to the upper end of the continuum, making interpretation easier). df = n - 2.

rb = r_pearson × sqrt(%below cp × %above cp) / h

where %below cp and %above cp are the proportions of cases below and above the cut point, h is the height of the normal curve at z = (Xcp - μ)/σ, and Xcp = the raw-score cut point (the raw score used to split the distribution into two groups).

Statistical Hypotheses: Ho: rb = 0; Ha: rb ≠ 0

Example Research Question: What is the strength and direction of the relationship between watching more than 10 hrs. per week of South Park (group = 1) vs. watching 10 or fewer hours of South Park per week (group = 0) and self-reported attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7), where 10 hrs. per week is the mean of self-reported South Park viewing habits?
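A short illustrative sketch of the independent-samples t-test and the point-biserial correlation on the same made-up data; because the dichotomy here is a natural group code, the two tests are equivalent and give the same p-value:

```python
import numpy as np
from scipy.stats import ttest_ind, pointbiserialr

rng = np.random.default_rng(42)

# Made-up data: group 1 = South Park reruns, group 0 = comparison condition;
# attitude = self-reported attitude toward Research Methods on the 1-7 scale.
group = np.repeat([1, 0], 30)                      # 30 participants per condition
attitude = np.where(group == 1,
                    rng.normal(4.8, 1.0, 60),
                    rng.normal(4.2, 1.0, 60))

# Independent-samples t-test (pooled variance), df = n1 + n2 - 2
t, p_t = ttest_ind(attitude[group == 1], attitude[group == 0])

# Point-biserial correlation between group membership (0/1) and the scores
r_pb, p_r = pointbiserialr(group, attitude)

print(f"t = {t:.2f}, p = {p_t:.4f}")
print(f"r_pb = {r_pb:.2f}, p = {p_r:.4f}")         # same p-value as the t-test
```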
If the Discrete Variable Has More Than 2 Levels (more than 2 groups)

One-Way ANOVA: ANOVA tests whether 2 or more group means are significantly different.
- When using 2 groups, F = t².
- See the ANOVA handout for details on one-way ANOVA.

Statistical Hypotheses: Ho: Mean grp1 = Mean grp2 = Mean grpj (for j groups); Ha: At least one group mean is significantly different from one other group mean.

Example Research Question: Are there any significant differences between three randomly assigned groups (1, exposed to 37 hours of South Park episodes; 2, exposed to 37 hours of Sally Struthers Feed the Children commercials; 3, a no-TV control condition) with respect to their self-reported attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7)?

If Discrete X Discrete X Continuous (where both discrete variables are predictors)

Two-Way ANOVA: If we have 2 independent variables (or 1 IV and a blocking variable), then two-way ANOVA (or factorial ANOVA) is called for. Again, this will tell us if one group mean (or matrix cell mean) is significantly different from one other group mean (or cell mean). Factorial ANOVA can also handle more than 2 IVs (and/or blocking variables). (See the Two-Way ANOVA handout for details.)

Moderation: When we find a significant interaction between two predictor variables, we can say that one predictor moderates the relationship between the other predictor and the outcome (DV). Our decision about which predictor is the IV and which is the moderating variable is based on our theoretical perspective.

Statistical Hypotheses: The two-way ANOVA actually tests several hypotheses at once.
1) Main effects
IV1: Ho: Mean Group 1. = Mean Group i. (for i groups); Ha: At least one group mean is significantly different from one other group mean.
IV2: Ho: Mean Group .1 = Mean Group .j (for j groups); Ha: At least one group mean is significantly different from one other group mean.
2) Interaction (moderation) effect
Ho: Mean grp11 = Mean grp21 = Mean grp12 = Mean grpij (for i and j groups); Ha: At least one cell mean is significantly different from one other cell mean.

Example Research Question: Do males (sex = 0) have significantly more positive or negative attitudes toward Research Methods (measured using a 5-item questionnaire employing a 7-point rating scale; item averages range from 1-7) compared to females (sex = 1)? Also, does a randomly assigned group exposed to 37 hours of South Park reruns (group = 1) report significantly more positive or negative attitudes toward Research Methods compared to a randomly assigned comparison condition exposed to 37 hours of Sally Struthers' Feed the Children commercials (group = 2)? Does participant sex (m = 0, f = 1) moderate the relationship between South Park exposure (IV: random assignment to condition: 1, exposed to 37 hours of South Park episodes; 2, exposed to 37 hours of Sally Struthers' Feed the Children commercials) and attitude toward Research Methods?

ANCOVA (Analysis of Covariance): Sometimes we want to remove the effects of a third variable (covariate; can be continuous or discrete). We may want to remove the effects of a nuisance variable that is correlated with our main variables of interest, or we may want to test a mediational hypothesis (that the third variable explains the relationship between the IV and DV; that is, the third variable accounts for all the shared variance between the IV and DV). (See a good stats book for details.)

Statistical Hypotheses: Ho: Mean Group1 = Mean Groupj (for j groups); Ha: At least one group mean is significantly different from one other group mean.

Example Research Question: Do males (sex = 0) have significantly more positive or negative attitudes toward Cheezy Poofs (measured using a 6-item questionnaire employing a 5-point rating scale; item averages range from 1-5) compared to females (sex = 1) after the effects associated with IQ (range 70-150) are removed?
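A compact sketch of the factorial ANOVA (with the sex × group interaction as the moderation test) and an ANCOVA, using the statsmodels formula interface; the data frame below is fabricated purely for illustration and no real effects are built in:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "sex": rng.integers(0, 2, n),        # 0 = male, 1 = female
    "group": rng.integers(1, 3, n),      # 1 = South Park, 2 = commercials
    "iq": rng.normal(100, 15, n),        # covariate for the ANCOVA
})
# Hypothetical attitude scores on the 1-7 scale
df["attitude"] = rng.normal(4, 1, n)

# Two-way (factorial) ANOVA: main effects of sex and group plus their interaction,
# which is the moderation test described above.
factorial = smf.ols("attitude ~ C(sex) * C(group)", data=df).fit()
print(sm.stats.anova_lm(factorial, typ=2))

# ANCOVA: sex differences in attitude after removing variance shared with IQ.
ancova = smf.ols("attitude ~ C(sex) + iq", data=df).fit()
print(sm.stats.anova_lm(ancova, typ=2))
```

In the first ANOVA table, a significant row for the C(sex):C(group) term is the evidence of moderation; in the second, the C(sex) row reflects the group difference after the covariate is partialled out.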
NOTE - Multiple Regression: Anything ANOVA and ANCOVA can do, multiple regression can do as well through the use of dummy coding, effects coding, and contrast coding. t = b / (standard error of b) is used to test the significance of each regression coefficient.
- The research questions you can ask with multiple regression are quite flexible.
- Single step: identifies the unique association of each predictor with a criterion.
- Hierarchical regression (multiple steps): identifies the unique contribution of a single variable or group of variables to the squared multiple correlation coefficient (R²).
- Mediation analyses ("Because"): a third variable accounts for/explains the relationship between X and Y. Why is X related to Y? Because of Z. This is the ultimate goal of science.
- Moderation analyses / interaction effects ("It depends"): a third variable influences the strength and/or direction of the relationship between an IV and a DV. What influence does X have on Y? It depends on Z.

F. Multiple Dependent Variables

1. Discrete IVs and Multiple Continuous DVs

MANOVA (Multivariate Analysis of Variance): When you have multiple continuous outcome variables that reflect a related set of constructs and you want to test their association with 1 or more discrete IVs, you can use multivariate analysis of variance.
- Returns a single F that indicates whether the IV (or IVs) are significantly associated with the DVs as a group.
- This is most useful for keeping the Type I error rate down when conducting multiple analyses.
- If it is significant, it is usually followed up with univariate tests assessing one dependent variable at a time.
- (See a good stats book for details.)

MANCOVA: MANOVA with a covariate (a variable that is having its influence removed from the test). (See a good stats book for details.)

2. One or More Continuous Independent Variables and Multiple Continuous Dependent Variables

- Canonical Correlation or Set Correlation: Returns a single correlation coefficient that is the best-fitting correlation between set 1 (IVs) and set 2 (DVs), determined through multiple iterations.
- Path Analysis / Structural Equation Modeling: Theory/model-testing procedures. These allow you to determine the degree to which the causal relationships predicted by theory fit the data; goodness of fit is expressed as a chi-square and other fit indices.
- Path Analysis / Causal Modeling: considers only manifest variables, which are directly measured variables.
- Structural Equation Modeling / Latent Variable Models: include latent variables, which are not measured directly. For example, SES is not determined by a single indicator; it is a latent variable made up of manifest variables like income, education, job prestige, and obtained wealth.

Sample Structural Model (figure)

G. Continuous Predictors and Categorical Dependent Variables

- Logistic Regression: Allows you to ask a variety of research questions with minimal statistical assumptions (e.g., assumptions regarding normal distributions).
- The "most important" of these would be strength of association between predictors and outcomes, and prediction (predicting outcomes/group membership for future cases).
- Logistic regression can handle multiple predictors that are either continuous or discrete, and combinations of both.
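A minimal logistic-regression sketch with one continuous and one discrete predictor and a dichotomous outcome; the variable names and data below are hypothetical:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "hours_south_park": rng.normal(10, 4, n),   # continuous predictor
    "sex": rng.integers(0, 2, n),               # discrete predictor (0/1)
})
# Hypothetical dichotomous outcome: likes Cheezy Poofs (1) or not (0)
logit_true = -3 + 0.3 * df["hours_south_park"]
df["likes_cheezy_poofs"] = rng.binomial(1, 1 / (1 + np.exp(-logit_true)))

# Logistic regression: log-odds of liking Cheezy Poofs from both predictors
model = smf.logit("likes_cheezy_poofs ~ hours_south_park + C(sex)", data=df).fit()
print(model.summary())
print(np.exp(model.params))   # exponentiated coefficients = odds ratios
```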
H. Data Reduction

- Reducing a larger number of variables down to a more manageable set.
- Selecting items for scale development.
- Factor Analysis / Principal Component Analysis
  - Exploratory: identifies groups of variables that have been responded to in a similar way when no a priori groups have been identified.
  - Confirmatory: similar to SEM and path analysis; identifies the "goodness of fit" between a priori groups of items/variables and the data.
- Cluster Analysis: like factor analysis, but groups people instead of variables.

I. There Are Others..... Many, Many Others......

- Multilevel Modeling (for group-level independent variables)
- Hierarchical Linear Modeling (kind of a combination of cluster analysis and regression)
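To tie the data-reduction ideas above to something runnable, here is a small exploratory sketch using principal components on a fabricated matrix of questionnaire items (scikit-learn's PCA; a dedicated factor-analysis routine would be used for factor analysis proper):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(7)

# Hypothetical responses: 100 participants x 6 questionnaire items (1-7 scale).
# Items 1-3 and items 4-6 are simulated to hang together so two components emerge.
factor1 = rng.normal(0, 1, (100, 1))
factor2 = rng.normal(0, 1, (100, 1))
items = np.hstack([4 + factor1 + rng.normal(0, 0.5, (100, 3)),
                   4 + factor2 + rng.normal(0, 0.5, (100, 3))])

pca = PCA(n_components=2)
scores = pca.fit_transform(items)            # component scores per participant

print("variance explained:", pca.explained_variance_ratio_)
print("component loadings (items x components):\n", pca.components_.T.round(2))
```

The loadings should roughly show items 1-3 lining up on one component and items 4-6 on the other, which is the kind of grouping an exploratory analysis is meant to uncover.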