Source URL: http://www.comfsm.fm/~dleeling/statistics/text.html#page-091
Saylor URL: http://saylor.org/courses/bus204
Attributed to: [Dana Lee Ling] Saylor.org

09 Confidence Intervals

9.1 Inference and Point Estimates

Whenever we use a single statistic to estimate a parameter, we refer to the estimate as a point estimate for that parameter. When we use a statistic to estimate a parameter, the verb used is "to infer." We infer the population parameter from the sample statistic.

Some population parameters cannot be inferred from the statistic. The population size N cannot be inferred from the sample size n. The population minimum, maximum, and range cannot be inferred from the sample minimum, maximum, and range. Populations are more likely to have single outliers than a smaller random sample. The population mode and median usually cannot be inferred from a smaller random sample. There are special circumstances under which a sample mode and median might be good estimates of a population mode and median, but these circumstances are not covered in this class.

The statistic we will focus on is the sample mean x̄. The normal distribution of sample means for many samples taken from a population provides a mathematical way to calculate a range in which we expect to "capture" the population mean and to state the level of confidence we have in that range's ability to capture the population mean µ.

Point Estimate for the population mean µ and Error

The sample mean x̄ is a point estimate for the population mean µ.

The sample mean x̄ for a random sample will not be the exact same value as the true population mean µ.
The error of a point estimate is the magnitude of the estimate minus the actual parameter (where the magnitude is always positive). The error in using x̄ for µ is (x̄ − µ). To obtain a positive value we need to use either the absolute value |x̄ − µ| or √((x̄ − µ)²). In other words, the error of an estimate is the distance of the statistic from the parameter.

Unfortunately, the whole reason we were using the sample mean x̄ to estimate the population mean µ is that we did not know the population mean µ.

For example, given that the mean body fat index (BFI) of 51 male students at the national campus is x̄ = 19.9 with a sample standard deviation of sx = 7.7, what is the error |x̄ − µ| if µ is the average BFI for male COMFSM students?

We cannot calculate this. We do not know µ! So we say x̄ is a point estimate for µ. That would make the error equal to √((x̄ − x̄)²) = zero. This is a silly and meaningless answer. Is x̄ really the exact value of µ for all the males at the national campus? No, the sample mean is not going to be the same as the true population mean.

Point estimate for the population standard deviation σ

The sample standard deviation sx is a reasonable point estimate for the population standard deviation σ. In more advanced statistics classes concern over bias in the sample standard deviation as an estimator for the

9.18 Confidence Intervals for n > 30 where σ is known

The sampling distribution of the mean is a normal distribution with the standard error replacing the standard deviation.

The diagram above shows the 95% area under the curve. The NORMINV function can find the left and right values for the range in which we expect the mean to be found 95% of the time. This range is called the 95% confidence interval.
In the diagram the ends of the range are indicated by the lower and upper limits.

=NORMINV(p;µ;σ/√(n))

The NORMINV function uses the area to the left of the lower limit to find the lower limit. That area can be determined by noting that the whole area under the curve is 100%. With 95% in the middle, the remaining 5% is distributed in two equal tails. Each tail is half of 5%, which is 2.5% or 0.025 in decimal notation. Thus the lower limit can be found by using the area 0.025.

The upper limit can be found by using the area to the left of the upper limit. The area to the left of the upper limit is 2.5% + 95%. This is 97.5% or 0.975 in decimal notation.

Example

Find the 95% confidence interval for the population mean number of cups of sakau en Pohnpei consumed by a customer. The sample consists of 227 customers who drank an average of 3.65 cups of sakau with a standard deviation of 2.52.

While we lack the population standard deviation, the sample is large enough and the underlying data is sufficiently heap-like that the sample standard deviation is a good point estimate for the population standard deviation.

In this example n = 227, x̄ = 3.65, and sx = 2.52. Note that x̄ and sx are being used to estimate µ and σ.

The lower ("left") limit for the population mean:

=NORMINV(0.025;3.65;2.52/SQRT(227))

The result is 3.32 cups.

The upper ("right") limit for the population mean:

=NORMINV(0.975;3.65;2.52/SQRT(227))

The result is 3.98 cups.

Remember: the p in the NORMINV function is the area to the left of the x-axis value. For 95% of the area under the curve, the amount of area in the "tails" is 5%: half in the left, half in the right. The left tail is 2.5% or 0.025. The right tail is also 2.5%, but the area to the LEFT of the right tail is 97.5% or 0.975.
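The two NORMINV calls above can be reproduced outside a spreadsheet. The following is a minimal Python sketch, assuming the standard library's `statistics.NormalDist` as a stand-in for NORMINV; the function name `normal_confidence_interval` is hypothetical and not part of the original text.

```python
from math import sqrt
from statistics import NormalDist


def normal_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) limits for the population mean.

    Hypothetical helper: models the sampling distribution of the mean as a
    normal distribution centred on the sample mean, with the standard error
    sd / sqrt(n) in place of the standard deviation.
    """
    standard_error = sd / sqrt(n)
    sampling_dist = NormalDist(mu=mean, sigma=standard_error)
    tail = (1 - confidence) / 2               # area in each tail, 0.025 for 95%
    lower = sampling_dist.inv_cdf(tail)       # plays the role of NORMINV(0.025; ...)
    upper = sampling_dist.inv_cdf(1 - tail)   # plays the role of NORMINV(0.975; ...)
    return lower, upper


# Sakau example: n = 227, sample mean 3.65 cups, sample standard deviation 2.52
lower, upper = normal_confidence_interval(3.65, 2.52, 227)
print(f"{lower:.2f} <= mu <= {upper:.2f}")    # 3.32 <= mu <= 3.98
```

As in the spreadsheet version, the sample standard deviation is used here as the point estimate for σ.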
Margin of Error E of the mean

The Margin of Error E for the mean is the distance from the sample mean x̄ to either one of the ends of the confidence interval. The margin of error E is always calculated to come out positive. For the example above:

=3.65 − 3.32
=3.98 − 3.65

Either way, the margin of error E is 0.33. This represents an uncertainty, at a 95% level of confidence, of one third of a cup of sakau.

The confidence interval is often written as:

x̄ − E ≤ µ ≤ x̄ + E

For the sakau cup study the 95% confidence interval would be written 3.32 ≤ µ ≤ 3.98.

Another common notation you will sometimes see is to write the sample mean x̄ ± margin of error. For the example above we could write: 3.65 ± 0.33

A third notation is related to probability notation:

p(3.32 ≤ µ ≤ 3.98) = 0.95

This is related to the first format above and is rarely seen in publications.

Standard Error of the mean, Margin of Error for the mean

Do not confuse these two terms. The Standard Error of the mean is σ/√(n). The Margin of Error for the mean is the distance from either end of the confidence interval to the middle of the confidence interval.

Example: Given that n = 219 CHS students took the TOEFL examination with a sample mean score of x̄ = 369 and a sample standard deviation sx = 50, construct a 90% confidence interval for the population mean TOEFL score for CHS.

The point estimate for the population mean µ is 369.

so large that the estimate is without useful meaning. A basic rule in statistics is "the bigger the sample size, the better."
The spreadsheet function used to find limits from the Student's t-distribution does not calculate the lower and upper limits directly. The function calculates a value called "t-critical," which is written as tc. t-critical multiplied by the Standard Error of the mean SE will generate the margin of error for the mean E.

Do not confuse the standard error of the mean with the margin of error for the mean. The Standard Error of the mean is sx/√(n). The Margin of Error for the mean (E) is the distance from either end of the confidence interval to the middle of the confidence interval. The margin of error is produced from the Standard Error:

Margin of Error for the mean = tc*standard error of the mean
Margin of Error for the mean = tc*sx/√n

The confidence interval will be:

x̄ − E ≤ µ ≤ x̄ + E

Calculating tc

The t-critical value will be calculated using the spreadsheet function TINV. TINV uses the area in the tails to calculate t-critical. The area under the whole curve is 100%, so the area in the tails is 100% − confidence level c. Remember that in decimal notation 100% is just 1.

If the confidence level c is in decimal form, use the spreadsheet function below to calculate tc:

=TINV(1−c,n−1)

If the confidence level c is entered as a percentage with the percent sign, then make sure the 1 is written as 100%:

=TINV(100%−c%,n−1)

Degrees of Freedom

The TINV function adjusts t-critical for the sample size n. The formula uses n − 1. This n − 1 is termed the "degrees of freedom." For confidence intervals of one variable the degrees of freedom are n − 1.

Example 9.2.1

Runners run at a very regular and consistent pace. As a result, over a fixed distance a runner should be able to repeat their time consistently.
While individual times over a given distance will vary slightly, the long-term average should remain approximately the same. The average should remain within the 95% confidence interval.

For a sample of n = 10 runs from the college in Palikir to Kolonia town, I have a sample mean time x̄ of 61 minutes with a sample standard deviation sx of 7 minutes. Construct a 95% confidence interval for my population mean run time.

Step 1: Determine the basic sample statistics

sample size n = 10
sample mean x̄ = 61 [61 is also the point estimate for the population mean µ]
sample standard deviation sx = 7

Step 2: Calculate degrees of freedom, tc, standard error SE

degrees of freedom = 10 − 1 = 9
tc = TINV(1−0.95,10−1) = 2.2622
Standard Error of the mean sx/√n = 7/√10 = 2.2136

Keeping four decimal places in intermediate calculations can help reduce rounding errors. Alternatively, use a spreadsheet and cell references for all calculations.

Step 3: Determine margin of error E

Margin of error E for the mean = tc*sx/√n = 2.2622*7/√10 = 5.01

Step 4: Calculate the confidence interval for the mean

Given that x̄ − E ≤ µ ≤ x̄ + E, we can substitute the values for x̄ and E to obtain the 95% confidence interval for the population mean µ:

61 − 5.01 ≤ µ ≤ 61 + 5.01
55.99 ≤ µ ≤ 66.01

I can be 95% confident that my population mean µ run time is between 56 and 66 minutes.

Example 9.2.2

Jumps
102 66 42 22 24 107 8 26 111
79 61 45 43 10 17 20 45 105
68 69 79 13 11 34 58 40 213

On Thursday 08 November 2007 a jump rope contest was held at a local elementary school festival. Contestants jumped with their feet together, a double-foot jump. The data seen in the table is the number of jumps for twenty-seven female jumpers. Calculate a 95% confidence interval for the population mean number of jumps.
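The basic sample statistics for the jump data can be computed with a few lines of code instead of a spreadsheet. This is a minimal Python sketch using only the standard library; the variable names (`jumps`, `x_bar`, `s_x`, `standard_error`) are illustrative choices rather than notation from the original text.

```python
from math import sqrt
from statistics import mean, stdev

# Number of jumps for the twenty-seven jumpers in the table
jumps = [102, 66, 42, 22, 24, 107, 8, 26, 111,
         79, 61, 45, 43, 10, 17, 20, 45, 105,
         68, 69, 79, 13, 11, 34, 58, 40, 213]

n = len(jumps)                   # sample size
x_bar = mean(jumps)              # sample mean
s_x = stdev(jumps)               # sample standard deviation (divides by n - 1)
standard_error = s_x / sqrt(n)   # standard error of the mean, sx / sqrt(n)

print(n, round(x_bar, 2), round(s_x, 2), round(standard_error, 4))
```

Note that `statistics.stdev` is the sample standard deviation, which is the quantity written sx in this chapter; `statistics.pstdev` would give the population form instead.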
The sample mean x̄ for the data is 56.22 with a sample standard deviation of 44.65. The sample size n is 27. You should try to make these calculations yourself. With those three numbers we can proceed to calculate the 95% confidence interval for the population mean µ:

Step 1: Determine the basic sample statistics

sample size n = 27
sample mean x̄ = 56.22
sample standard deviation sx = 44.65

Step 2: Calculate degrees of freedom, tc, standard error SE

The degrees of freedom are n − 1 = 26
Therefore tc = TINV(1−0.95,27−1) = 2.0555
The Standard Error of the mean SE = sx/√27 = 8.5924

The confidence interval for the population proportion P is:

p − E ≤ P ≤ p + E
0.20 − 0.1137 ≤ P ≤ 0.20 + 0.1137
0.0863 ≤ P ≤ 0.3137

The result is that the expected population proportion for Marshall Islands High School (MIHS) is between 8.6% and 31.4%. The 95% confidence interval does not include the 7% rate of the Chuuk public high schools. While the college entrance test is not a measure of overall academic capability, there are few common measures that can be used across the two nations. The result does not contradict the staffer's assertion that MIHS outperformed the Chuuk public high schools. This lack of contradiction acts as support for the original statement that MIHS outperformed the public high schools of Chuuk in 2004.

Homework: In twelve sumo matches Hakuho bested Tochiazuma seven times. What is the 90% confidence interval for the population proportion of wins by Hakuho over Tochiazuma? Does the interval extend below 50%? A commentator noted that Tochiazuma is not evenly matched. If the interval includes 50%, however, then we cannot rule out the possibility that the two-win margin is random and that the rikishi (wrestlers) are indeed evenly matched.
Hakuho won that night, upping the ratio to 8 wins to 5 losses to Tochiazuma. Is Hakuho now statistically more likely to win, or could they still be evenly matched at a confidence level of 90%?

9.4 Deciding on a sample size

Suppose you are designing a study and you have in mind a particular error E you do not want to exceed. You can determine the sample size n you'll need if you have prior knowledge of the standard deviation sx. How would you know the sample standard deviation in advance of the study? One way is to do a small "pre-study" to obtain an estimate of the standard deviation. These are often called "pilot studies."

If we have an estimate of the standard deviation, then we can estimate the sample size needed to obtain the desired error E.

Since E = tc*sx/√n, solving for n yields

n = (tc*sx/E)²

Note that this is not a proper mathematical solution because tc is also dependent on n. While many texts use zc from the normal distribution in the formula, we have not learned to calculate zc.

In the "real world" what often happens is that a result is found to not be statistically significant as the result of an initial study. Statistical significance will be covered in more detail later. The researchers may have gotten "close" to statistical significance and wish to shrink the confidence interval by increasing the sample size. A larger sample size means a smaller standard error (n is in the denominator!) and this in turn yields a smaller margin of error E. The question is how big a sample would be needed to get a particular margin of error E.

The value for tc from the pilot study can be used to estimate the new sample size n. The resulting sample size n will be slightly overestimated versus the traditional calculation made with the normal distribution.
This overestimate, while slightly unorthodox, provides some assurance that the error E will indeed shrink as much as needed.

In a study of body fat for 51 male students, a sample mean x̄ of 19.9 with a standard deviation of 7.7 was measured. This led to a margin of error E of 2.17 and a confidence interval 17.73 ≤ µ ≤ 22.07.

Suppose we want a margin of error E = 1.0 at a confidence level of 0.95 in this study of male student body fat. We can use the sx from the sample of 51 students to estimate the necessary sample size:

n = (2.0086*7.7/1)² = 239.19 or 239 students.

Thus I estimate that I will need 239 male students to reduce my margin of error E to ±1 in my body fat study. Other texts which use zc would obtain the result of 227.77 or 228 students. The eleven additional students would provide assurance that the margin of error E does fall to 1.0.

That one can calculate a sample size n necessary to reduce a margin of error E to a particular level means that for any hypothesis test (chapter ten) in which the means have a mathematical difference, statistical significance can eventually be attained by sufficiently increasing the sample size. This may sound appealing to the researcher trying to prove a difference exists, but philosophically it leaves open the concept that all things can be proven true for sufficiently large samples.
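The body fat sample-size estimate can be checked with a short calculation. This is a minimal Python sketch, assuming the formula n = (tc*sx/E)² from this section; the function name `required_sample_size` is hypothetical, the tc value 2.0086 is taken as quoted in the text, and zc is obtained from the standard library's `statistics.NormalDist` for comparison.

```python
from statistics import NormalDist


def required_sample_size(critical_value, sd, margin_of_error):
    """Hypothetical helper: n = (critical value * sd / E) squared, unrounded."""
    return (critical_value * sd / margin_of_error) ** 2


s_x = 7.7         # standard deviation from the study of 51 male students
target_e = 1.0    # desired margin of error E
t_c = 2.0086      # t-critical for 95% confidence and n = 51, as quoted in the text
z_c = NormalDist().inv_cdf(0.975)   # normal-distribution critical value, about 1.96

n_t = required_sample_size(t_c, s_x, target_e)
n_z = required_sample_size(z_c, s_x, target_e)
print(round(n_t), round(n_z))   # 239 228
```

Because the rounded inputs differ slightly from the unrounded spreadsheet values, the unrounded results agree with the figures in the text only to about one decimal place; the rounded sample sizes of 239 and 228 are the same.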