Understanding Measures of Center and Variation in Statistics, Exams of Probability and Statistics

An introduction to measures of center and variation in statistics, including the concepts of arithmetic mean, median, mode, standard deviation, and measures of relative standing. It also covers the use of z-scores and the empirical rule for interpreting data. Examples and exercises.

Typology: Exams

Pre 2010

Uploaded on 08/03/2009

koofers-user-eqi

MATH 2600 Probability and Statistics
Chapter 3

Section 3-1 Overview

___________________ Statistics __________________ or _________________ the important characteristics of a known set of data

___________________ Statistics use sample data to make ____________________ (or ___________________________) about a population

Section 3-2 Measures of Center

Key Concept - When describing, exploring, and comparing data sets, these characteristics are usually extremely important: ______________, variation, distribution, outliers, and changes over time.

Measure of Center - the value at the ______________ or _______________ of a data set

Measure of Center 1
Arithmetic Mean (or just Mean) - the __________________________ obtained by adding the values and dividing the total by the number of values

Notation
Σ denotes the ________ of a set of values.
x is the variable usually used to represent the ___________________ data values.
n represents the ___________________ of values in a _____________.
N represents the ___________________ number of values in a _____________________.

Formula for the mean
_______ is pronounced ‘x-bar’ and denotes the mean of a set of _________ values _____________________
_______ is pronounced ‘mu’ and denotes the mean of all values in a ___________________ _____________________

Measure of Center 2
_________________ the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude
• often denoted by ____ (pronounced ‘x-tilde’)
• is not affected by an _______________ value

Finding the Median
If the number of values is __________, the median is the number located in the ___________ middle of the list.
If the number of values is __________, the median is found by computing the _______________________________ numbers.
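The mean and median procedures described above can be sketched in Python. This is an illustration rather than part of the original handout, which uses the TI-83; the function names `mean` and `median` are hypothetical helpers, and the data are the six-value and seven-value sets from the median examples in these notes.

```python
def mean(values):
    """Arithmetic mean: add the values and divide by the number of values."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the data after sorting.

    With an odd count, the median is the single middle value.
    With an even count, it is the mean of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Data sets taken from the median examples in these notes.
even_set = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
odd_set = even_set + [0.66]

print("mean of six values:    ", round(mean(even_set), 3))
print("median of six values:  ", round(median(even_set), 3))
print("median of seven values:", round(median(odd_set), 3))
```

Note how the single large value 5.40 pulls the mean well above the median, while the median barely moves when 0.66 is added.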
Example
Data set: 5.40 1.10 0.42 0.73 0.48 1.10
Put in order: _______ _______ ________ ________ ________ ________
Median is __________________________________________________________________

Example
Data set: 5.40 1.10 0.42 0.73 0.48 1.10 0.66
_______ _______ ________ ________ ________ ________ ________
Median is __________________________________________________________________

Example
Calculate the mean and median for the following test grades.
95, 24, 90, 88, 92

Measure of Center 3
____________ the value that occurs most frequently
Mode is not always unique. A data set may be:
Bimodal
Multimodal
No Mode
Mode is the only measure of central tendency that can be used with nominal data.

Example: Find the mode of the following data sets.
a. 5.40 1.10 0.42 0.73 0.48 1.10
b. 27 27 27 55 55 55 88 88 99
c. 1 2 3 6 7 8 9 10

Measure of Center 4
______________________ the value midway between the maximum and minimum values in the original data set
Midrange =

Round-off Rule for Measures of Center
Carry _________ more decimal place than is present in the original set of values.

Section 3-3 Measures of Variation

Key Concept
Because this section introduces the concept of ______________, which is something so ______________ in statistics, this is one of the ________ important sections in the ___________ book. Place a high priority on how to ____________ values of standard deviation.
______________________________________________________________________________________________

The _____________ of a set of data is the ____________________ between the _____________ value and the _________________ value.
Range = _________________________________________________

Example: Many banks once required that customers wait in separate lines at each teller’s window, but most have now changed to a single main waiting line. Why did they make that change? The mean waiting time didn’t change, because the waiting line configuration doesn’t affect the efficiency of the tellers.
They changed to the single line because customers prefer waiting times that are more consistent with less variation. Thus thousands of banks made a change that resulted in __________________________ (and happier customers), even though the mean was not affected.

Let’s look at an example of this:
Jefferson Valley Bank (Single waiting line): 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7
Bank of Providence (Multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 7.7 8.5 9.3 10.0

Calculate the mean and median of both banks by using the TI-83.
Jefferson Valley Bank   Mean _______ Median________
Bank of Providence      Mean _______ Median________
Did either bank do better on average? _______________

To SEE the variation difference, let’s create a stem and leaf plot of both data sets:

Jefferson Valley Bank        Bank of Providence
Stem | Leaves                Stem | Leaves
 4   |                        4   |
 5   |                        5   |
 6   |                        6   |
 7   |                        7   |
 8   |                        8   |
 9   |                        9   |
10   |                       10   |

**Our measures of variation should indicate that Bank of Providence waiting times are more spread out or have greater variation. Let’s see if they do!

Jefferson Valley Bank’s range = ____________ - ___________ = _____
Bank of Providence’s range = ____________ - ___________ = _____
The range shows that the Bank of Providence has greater variability. The range is easy to calculate, but can you see a limit to its usefulness in describing a data set’s variation?
_________________________________________________
_________________________________________________
___________________________________________________________________________________________

The ________________________ of a set of sample values is a _____________________________ of values _____________________________.

Notation
             Mean        Standard deviation   Formula
Sample       $\bar{x}$   $s$                  $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population   $\mu$       $\sigma$             $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$

Finding the standard deviation on the TI-83
Use 1-var-stats, the same method you use to find the mean and median, etc.
S = the sample standard deviation
σ = the population standard deviation
S for Jefferson Valley Bank = _________
S for Bank of Providence = _________

Standard Deviation - Important Properties
1. The standard deviation is a measure of variation of ______ values from the ________.
2. The value of the standard deviation s can _____________ dramatically with the inclusion of one or more ________________ (data values far away from all others).
3. The ________ of the standard deviation s are the __________ as the units of the original _____________.

Round-off Rule for Measures of Variation
1. Carry _____ more decimal place than is present in the _______________ set of data.
2. Round only the ______________ answer, not values in the middle of a calculation.

These 2 distributions have the same range. Which data set has greater variation? We need a better variation measurement than the range!

Notice that the standard deviation does show different variations in the two banks. Specifically, that the ________________________ has greater variation in its waiting times.

Finding the standard deviation by hand
The following is an example of finding the standard deviation (by hand) for Jefferson Valley Bank:
6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7 7.7 7.7

Step 1 Find the mean: $\bar{x}$ = ______
Step 2 Use the formula:
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{(6.5 - 7.15)^2 + (6.6 - 7.15)^2 + (6.7 - 7.15)^2 + (6.8 - 7.15)^2 + (7.1 - 7.15)^2 + (7.3 - 7.15)^2 + (7.4 - 7.15)^2 + (7.7 - 7.15)^2 + (7.7 - 7.15)^2 + (7.7 - 7.15)^2}{10 - 1}}$ = _______

Now, you do the same thing for Bank of Providence.

Work the following example:
Consider the following two classes of Mr. Mean. Some say Mr. Mean is not very nice, while yet others say he is just average. Either way, we will take a look at the test results of two of his classes.
1st class: 42, 56, 65, 66, 67, 68, 71, 73, 74, 77, 77, 77, 91, 100
2nd class: 42, 43, 54, 54, 58, 62, 67, 77, 77, 85, 92, 93, 100, 100
Calculate the mean, median and range of both classes by using the TI-83.
1st class   Mean _______ Median________ Range__________
2nd class   Mean _______ Median________ Range__________
Did either class do better on average? _______________
Does the range indicate that one class had grades more spread out than the other? _______________
This is an example of the range doing a poor job describing variability.

Let’s create a stem and leaf plot of both data sets:

1) Which class has a greater variation in the grade distribution? ___________________________
2) Calculate the standard deviation for each data set on the calculator.
3) Does this measure of variation agree with what you see from the stem-and-leaf plots? ______________ Why?

Method 2 to interpret the standard deviation: Chebyshev’s Rule

Chebyshev’s Rule
________________________________________________________________________________________
________________________________________________________________________________________
i) ____________ information is provided on the percentage of data values that fall within __________________________ of the mean.
ii) ________________________ of the data values will fall within ______________________________ of the mean.
iii) ________________________ of the data values will fall within ______________________________ of the mean.

[Figure: four distribution shapes: bell-shaped and symmetric; not bell-shaped and not symmetric; left-skewed (mean to the left of the median); right-skewed (median to the left of the mean)]

Chebyshev’s Rule Example 1: The mean, median, and standard deviation of the following test grades from my Fall 2008 Math Modeling class are given below.
74, 83, 81, 90, 75, 57, 98, 46, 85, 81, 33, 96, 66, 62, 86, 73, 87, 94, 87, 32, 80, 79, 95, 83, 49, 58, 63, 93, 58, 89, 90, 65, 86, 74, 91, 76
Mean _________ Median _________ Standard deviation _________
a) According to the mean and median, is this data set skewed to the right, skewed to the left, or mound-shaped?
b) Interpret the test data using __________________________ Rule:
1 standard deviation away from the mean:
2 standard deviations away from the mean:
3 standard deviations away from the mean:
*Chebyshev’s rule applies to any shape of data

Method 3 to interpret the standard deviation: The Empirical Rule

The Empirical Rule
The empirical rule applies _______ to data sets with frequency distributions that are approximately mound-shaped and symmetric.
i) Approximately 68% of the data values fall within 1 standard deviation of the mean, or within the following intervals:
Samples $(\bar{x} - s, \bar{x} + s)$   Populations $(\mu - \sigma, \mu + \sigma)$
ii) Approximately 95% of the data values fall within 2 standard deviations of the mean, or within the following intervals:
Samples $(\bar{x} - 2s, \bar{x} + 2s)$   Populations $(\mu - 2\sigma, \mu + 2\sigma)$

[Figure: bell curves centered at $\bar{x}$ showing 68% of values within 1 standard deviation (34% on each side of the mean) and 95% within 2 standard deviations (a further 13.5% on each side)]

iii) Approximately 99.7% of the data values fall within 3 standard deviations of the mean, or within the following intervals:
Samples $(\bar{x} - 3s, \bar{x} + 3s)$   Populations $(\mu - 3\sigma, \mu + 3\sigma)$

Empirical Rule Example 1
Test results on a certain brand of battery show that the battery life span averages 60 months with a standard deviation of 10 months. Assume that the distribution of data is mound-shaped. The company gives a warranty for 36 months; let’s see why.
Use the Empirical Rule to answer the following questions, but first fill in the values at the bottom of the diagram.
1. Approximately, what % of batteries last more than 60 months? ___________________
2. Approximately, what % of batteries last from 60 to 70 months? __________________
3. Approximately, what % of batteries last greater than 50 months? __________________
4. Approximately, what % of batteries last greater than 70 months? _________________
5. Approximately, what % of batteries last from 60 to 80 months? __________________
6. Approximately, what % of batteries last less than 40 months? ____________________

[Figure: bell curve with the axis marked at $\bar{x} - 3s$, $\bar{x} - 2s$, $\bar{x} - s$, $\bar{x}$, $\bar{x} + s$, $\bar{x} + 2s$, $\bar{x} + 3s$; the regions from left to right hold 0.15%, 2.35%, 13.5%, 34%, 34%, 13.5%, 2.35%, 0.15% of the data, giving 68% within 1 standard deviation, 95% within 2, and 99.7% within 3]

[Figure: the same bell curve with the seven axis values left blank to fill in for the battery example: =___ =___ =___ =___ =___ =___ =___]

z-scores
A ___________________ is the _______________________________________ that a given value x is above or below the mean
Sample ____________________   Population _________________________

Recall, The Rule of Thumb
All ‘usual’ values fall within 2 standard deviations of the mean.
Whenever a value is less than the mean, its corresponding z-score is negative.
Ordinary ‘usual’ values: z-score between _________________
Unusual Values: ____________________________________

Recall, The Empirical Rule
The empirical rule applies to data sets with frequency distributions that are approximately mound-shaped and symmetric.
1) Approximately 68% of the data values fall within 1 standard deviation of the mean. _____ of the data values will have a z-score between -1 and 1.
2) Approximately 95% of the data values fall within 2 standard deviations of the mean. _____ of the data values will have a z-score between -2 and 2.
3) Approximately 99.7% of the data values fall within 3 standard deviations of the mean. _____ of the data values will have a z-score between -3 and 3.

Z-Scores Example 1
Men’s heights are approximately normal (mound-shaped) with µ = 70 inches and σ = 2.8 inches.
1. What is the z-score of someone whose height is equal to the average? (show calculation) _______________________
2. What is the z-score of a basketball player who is 6’ 10”? ___________________
3. My wife is only 4’ 11”. If a man was the same height, would he be considered unusually short?
How short do you have to be to be considered unusually short? Approximately, how many adults are we talking about?
4. How tall is someone with a z-score of -2?

Z-Scores Example 2
Suppose that Micah scored an 81 on his first Statistics test. His friend Jacob, who is in a different statistics class with a different professor, scored a 77 on his first test. Jacob said that he thinks his grade is better even though it is lower because his professor is known to give hard tests. The grades cannot be compared one to one because they are from different tests from different professors. How can we compare them?
Micah’s class $\bar{x}$ = __________   Jacob’s class $\bar{x}$ = __________
Micah’s class s = ____________   Jacob’s class s = ____________
Micah’s z-score = ___________   Jacob’s z-score = ___________
So who did relatively better? ___________________

Section 3-5 Exploratory Data Analysis (EDA)

Key Concept
This section discusses outliers, then introduces a new statistical graph called a boxplot, which is helpful for visualizing the distribution of data.
________________________________________________

Outliers
An ______________ is a value that is located very far away (see specific class definition below) from almost all of the other values.
***************************************************************************************
IMPORTANT: Our class definition of an _______________ will be a _________________________
____________________________________________________________________
***************************************************************************************
An outlier can have a dramatic effect on the ______________.
An outlier can have a dramatic effect on the _______________________.
An outlier can have a dramatic effect on the _______________________________ so that the true nature of the distribution is ___________________________.

Example 1
Now, suppose that I tell you that the test average on a test in statistics was 69 with a standard deviation of 12.
1. Would it be considered “unusual” if a student made a 47?
2. What about a score of 40?
3. What would you say about a score of 30?

Example 2
From the class survey from the 1st day of class, the average number of siblings was 1.6 and the standard deviation was 1.2.
1. Determine if there were any outliers in the data and list them below. Do you think these are real data values or a mistake?
2. Were there any unusual values in the data? List them.

Chapter 3 Assignment
Section 3-2: 1-5 all, 7, 8, 12, 13, 15, 17, 21, 25, 28, 29, 32
Section 3-3: 1, 2, 3, 5, 8, 9, 10, 13, 15, 17, 21, 31, 33, 36, 37
Section 3-4: 1, 2, 3, 6, 7, 8, 9, 10, 12, 13, 14, 20, 21, 22
Section 3-5: 4

Calculate the mean and standard deviation for the number of keys from the class survey data.
1. Are there any unusual values?
2. Any outliers?
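The unusual-value check used throughout the z-score examples can be sketched in Python. This is an illustration under stated assumptions, not part of the original handout: the function names are hypothetical, the cutoff of 2 standard deviations comes from the rule of thumb in these notes, and because the class survey data are not reproduced here, the sketch uses the summary statistics from Example 1 (mean 69, standard deviation 12) and the Bank of Providence waiting times instead. The class definition of an outlier is left blank in the notes, so the sketch does not try to flag outliers.

```python
import math


def sample_mean(values):
    """Sample mean: sum of the values divided by n."""
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation s, using the n - 1 denominator."""
    center = sample_mean(values)
    squared_deviations = sum((x - center) ** 2 for x in values)
    return math.sqrt(squared_deviations / (len(values) - 1))


def z_score(x, center, spread):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - center) / spread


def is_unusual(x, center, spread, cutoff=2.0):
    """Rule of thumb from the notes: usual values have z between -cutoff and +cutoff."""
    return abs(z_score(x, center, spread)) > cutoff


# Example 1: test average 69 with standard deviation 12.
for score in (47, 40, 30):
    z = z_score(score, 69, 12)
    print(score, round(z, 2), "unusual" if is_unusual(score, 69, 12) else "usual")

# Bank of Providence waiting times, used here as a stand-in for raw survey data.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 7.7, 8.5, 9.3, 10.0]
center = sample_mean(providence)
spread = sample_std(providence)
unusual_times = [x for x in providence if is_unusual(x, center, spread)]
print("mean:", round(center, 2), "s:", round(spread, 2), "unusual:", unusual_times)
```

With these inputs, a score of 47 has a z-score of about -1.83 and counts as usual, while 40 (about -2.42) and 30 (-3.25) count as unusual. The same functions could be applied to the keys or siblings survey data once those values are available.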