Probability & Statistics: Week 7 - Relationships between Variables - Prof. Deborah S. Hosl, Study notes of Statistics

These notes list the learning objectives for Week 7 of a probability & statistics course, which focuses on relationships between variables. Topics include recognizing quantitative versus categorical variables, making appropriate graphical displays, identifying explanatory and response variables, looking for associations between quantitative variables using scatterplots and correlation, and building and assessing least-squares linear regression models. The notes also cover looking for relationships between two categorical variables using two-way tables.

Typology: Study notes

Pre 2010

Uploaded on 08/13/2009

koofers-user-wip

MATH-1530-04/07/15/17 Probability & Statistics
Fall 2004 / Week 7

LEARNING OBJECTIVES (Chapters 4, 5 & 6)

I. Relationships between variables
   A. Recognize the difference between quantitative and categorical variables.
   B. The first step of bivariate analysis is to make an appropriate graphical display.
      1. When you are looking for associations between two categorical variables, either a stacked bar chart or a clustered bar chart will often provide the best display.
      2. On the other hand, a scatterplot is the tool of choice when looking for an association between two quantitative variables.
      3. Identify the explanatory (x) and response (y) variables in situations where one variable may be used to explain or to predict another.

II. Looking for associations between two quantitative variables
   A. Scatterplots
      1. Make a scatterplot for two quantitative variables, always placing the explanatory variable (if any) on the horizontal axis.
      2. Add a categorical variable to the scatterplot by using different plotting symbols (or colors).
      3. Recognize positive vs. negative association, a linear vs. a curved form, and a weak vs. a strong relationship.
      4. From scatterplots, identify linear associations for which further analysis should be carried out (i.e., correlation and linear regression).
   B. Correlation
      1. Compute the linear correlation coefficient r given a set of bivariate data.
      2. Recognize that r gives a numerical indication of the strength and direction of the linear association between two quantitative variables.
      3. Compare values of r, along with the patterns shown in scatterplots, to pick the best explanatory variable out of several alternatives that can be used to predict values of a given response variable.
      4. Know the basic facts about correlation listed and discussed in class and on pages 90-91 of the text.

III. Least-squares linear regression
   A. Build the model
      1. Understand what is meant by slope (b) and y-intercept (a) in the equation of the least-squares regression line.
      2. Calculate the least-squares regression line of y on x from a set of bivariate data; that is, construct the linear model that best describes the data (i.e., the best-fit line: y-hat = a + bx).
      3. Give the equation of the regression line of y on x from the means x-bar and y-bar, the standard deviations s_x and s_y, and the correlation coefficient r. Use the formulas given on page 107 with these 5 statistics to find, first, the slope (b) and then the y-intercept (a).
      4. Know that the regression line of y on x always passes through the point (x-bar, y-bar).
      5. Be aware of the other facts about least-squares regression listed on pages 110 & 111 of the text.
   C. Use the model
      1. Use the least-squares regression equation to draw a graph of the “best-fit” line through the points in a scatterplot.
      2. Predict y for a given value of x either from the equation or from a graph of the fitted line in a scatterplot.
   D. Assess the model
      1. Recognize potential outliers and influential points in a scatterplot with the regression line drawn on it.
      2. Compute a residual when given the observed value and the predicted value of y for some particular value of x.
      3. Calculate all the residuals for a bivariate data set, plot them against the observed values of x, and recognize unusual (non-random or non-linear) patterns when displayed in residual plots.
      4. Use R^2 to describe the extent to which the variation in the response variable can be accounted for by its straight-line relationship with the explanatory variable.
   E. Limitations of correlation and regression
      1. Understand that r and the least-squares regression line are not resistant to the effects of outliers and extreme observations. Both can be strongly influenced by just a few extreme observations.
      2. Understand that even a strong correlation does not necessarily mean that there is a cause-effect relationship between two variables.
      3. Understand the dangers presented by extrapolation and lurking variables.

******************************************
If you haven’t done so already, check out the “Links to Applets in the WWW to explore regression and correlation” located under the material for Chapters 4 & 5 on the course web page (not on your instructor’s page) for Math-1530.
******************************************

IV. Looking for relationships between two categorical variables
   A. Two-Way Tables (a.k.a. contingency tables)
      1. Recognize that two-way tables are useful because they help organize large amounts of data by grouping responses or outcomes into categories.
      2. Notice that the categories pertaining to the row variable label the rows that run across the table.
      3. Likewise, the categories of the column variable provide labels for the columns that run down the table.
   B. Marginal Distributions
      1. Know how to compute the marginal distributions (in either frequencies or relative frequencies) for a two-way table.
      2. Understand that the row totals give the distribution of the row variable and the column totals give the distribution of the column variable.
   C. Conditional Distributions
      1. Realize that associations between two categorical variables shown in a two-way table are explored by examining the conditional distributions within the rows and columns of the table.
      2. Know how to find the conditional distribution of the row variable for one specific category of the column variable by looking only at that one column in the table. Calculate the relative frequency for each cell in the column by dividing the count in the cell by the column total.
      3. Know how to find the conditional distribution of the column variable for one specific category of the row variable by looking only at that one row in the table. Calculate the relative frequency for each cell in the row by dividing the count in the cell by the row total.
      4. Be able to compare the conditional distributions in order to describe the association between variables. Notice that there is a conditional distribution of the row variable for each column in the table and there is a conditional distribution of the column variable for each row in the table.
   D. Limitations of two-way tables
      1. Beware of lurking variables.
      2. Understand that the term “Simpson’s paradox” refers to a change or reversal in the association between two variables that may occur when the influence of a third variable is taken into account.

Warm-up Problems

1. An old study in Iowa produced the following data on corn yield (bushels per acre in 1910-1919) and value per acre (in 1920) for farmland in 10 counties. (This is actual data; times have changed since 1920.)

   County #: 1 2 3 4 5 6 7 8 9 10
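Objectives II.B.1, III.A.3, III.A.4 and III.D can be tried out on a small data set. The sketch below is only an illustration: the notes refer to "the formulas given on page 107" without reproducing them, so it assumes they are the standard ones, b = r * s_y / s_x and a = y-bar - b * x-bar. The five data points are made up for the example (they are not the Iowa data from the warm-up problem), and all function names are hypothetical.

```python
import math


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values, using n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (len(xs) - 1)


def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, built from the five summary statistics.

    Assumed formulas (not quoted in the notes): b = r * s_y / s_x, a = y_bar - b * x_bar.
    """
    r = correlation(xs, ys)
    b = r * sample_sd(ys) / sample_sd(xs)
    a = mean(ys) - b * mean(xs)
    return a, b


# Made-up illustrative data, NOT taken from the course handout.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(xs, ys)
a, b = least_squares_line(xs, ys)

# Predicted values and residuals (observed y minus predicted y).
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

# For a least-squares line with one explanatory variable, R^2 equals r squared.
r_squared = r ** 2

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}, R^2 = {r_squared:.4f}")
print("residuals:", [round(e, 4) for e in residuals])
```

For this toy data the slope works out to about 0.6 and the intercept to about 2.2, so the line passes through (x-bar, y-bar) = (3, 4), and the residuals sum to zero up to rounding. Plotting `residuals` against `xs` would give the residual plot described in III.D.3.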
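The marginal and conditional distributions described in IV.B and IV.C follow directly from the row totals, column totals and cell counts. A minimal sketch, assuming a small two-way table stored as a nested dictionary; the counts, category labels and function names are all invented for illustration and do not come from the handout.

```python
# Hypothetical two-way table of counts.
# Outer keys are categories of the row variable, inner keys are categories
# of the column variable.
table = {
    "row A": {"col 1": 30, "col 2": 10},
    "row B": {"col 1": 20, "col 2": 40},
}


def row_totals(table):
    """Total count in each row (frequencies of the row variable)."""
    return {row: sum(cells.values()) for row, cells in table.items()}


def column_totals(table):
    """Total count in each column (frequencies of the column variable)."""
    totals = {}
    for cells in table.values():
        for column, count in cells.items():
            totals[column] = totals.get(column, 0) + count
    return totals


def relative_frequencies(totals):
    """Turn totals into a marginal distribution of relative frequencies."""
    grand_total = sum(totals.values())
    return {category: count / grand_total for category, count in totals.items()}


def row_given_column(table, column):
    """Conditional distribution of the row variable within one column:
    each cell count divided by that column's total."""
    column_total = sum(cells[column] for cells in table.values())
    return {row: cells[column] / column_total for row, cells in table.items()}


def column_given_row(table, row):
    """Conditional distribution of the column variable within one row:
    each cell count divided by that row's total."""
    row_total = sum(table[row].values())
    return {column: count / row_total for column, count in table[row].items()}


row_marginal = relative_frequencies(row_totals(table))
column_marginal = relative_frequencies(column_totals(table))

print("marginal distribution of row variable:   ", row_marginal)
print("marginal distribution of column variable:", column_marginal)

# One conditional distribution of the row variable per column (IV.C.4).
for column in column_totals(table):
    print(f"row variable given {column}:", row_given_column(table, column))
```

In this invented table the row variable splits 60/40 within "col 1" but 20/80 within "col 2". Conditional distributions that differ from column to column are what suggest an association between the two variables.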