
Confidence Intervals for a Proportion: Lecture Notes, Lecture notes of Statistics

Lecture notes on calculating confidence intervals for a proportion using R. It covers the formulas for 95%, 90%, and 85% confidence intervals, as well as ways to write and interpret the intervals. The document also includes examples and critical values for various confidence levels.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022 by explain

Chapter 4. Statistics (LECTURE NOTES 8)

4.5 Confidence Intervals for a Proportion

Let $Z$ be $N(0,1)$ and let $p$ be a number between 0 and 1. The critical z-value $z_p$ satisfies $P(Z > z_p) = 1 - \Phi(z_p) = p$. Let $0 < \alpha < 1$ and let $x$ be the number of successes in $n$ observed trials of a Bernoulli experiment with unknown probability of success $p$. For $\hat{p} = x/n$, the $100(1-\alpha)\%$ confidence interval for the proportion $p$ is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right],$$

where $E = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ is the margin of error, $\sqrt{\hat{p}(1-\hat{p})/n}$ is the standard error (estimated standard deviation) of the sample proportion, and $\alpha$ is the level of significance. We assume that a large random sample is chosen, that both $np \ge 5$ and $np(1-p) \ge 5$, and that the conditions of a binomial distribution are satisfied. One-sided confidence interval estimates for $p$, with a lower bound and an upper bound respectively, are

$$\left[\hat{p} - z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ 1\right], \qquad \left[0,\ \hat{p} + z_{\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$

Exercise 4.5 (Confidence Intervals for a Proportion)

1. Confidence interval (CI) for the proportion, $p$, of purchase slips made with Visa. Of 180 purchase slips randomly selected from all credit card purchase slips, 54 are made with Visa (so $\hat{p} = 54/180 = 0.3$). The conditions of a binomial distribution are satisfied. Calculate a 95% CI for the proportion $p$ of purchase slips made with Visa.

(a) Point estimate. The point estimate of the population (actual, true) proportion of all credit card purchase slips made with Visa, $p$, is $\hat{p}$ = (i) 0.3 (ii) 54 (iii) 180. The statistic $\hat{p} = 0.3$ probably does not exactly equal the unknown parameter $p$.

(b) Check assumptions. A random sample is chosen and the conditions of a binomial distribution are satisfied. Also, $np(1-p) \approx n\hat{p}(1-\hat{p}) = 180(0.3)(0.7) = 37.8 \ge 5$ and $np \approx n\hat{p} = 180(0.3) = 54 \ge 5$. So the assumptions (i) have (ii) have not been satisfied, and it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ to estimate the parameter $p$.
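The count checks in (b) are easy to script. This is a minimal sketch that follows the notes: it replaces the unknown $p$ by $\hat{p} = x/n$ and requires both counts to be at least 5. `check.prop.conditions` is a hypothetical helper, not part of the notes.

```r
# Hypothetical helper (not from the notes): checks n*p.hat >= 5 and
# n*p.hat*(1 - p.hat) >= 5, using p.hat = x/n in place of the unknown p.
check.prop.conditions <- function(x, n, min.count = 5) {
  p.hat <- x / n
  np  <- n * p.hat                  # approximates n p
  npq <- n * p.hat * (1 - p.hat)    # approximates n p (1 - p)
  list(np = np, npq = npq, ok = (np >= min.count) && (npq >= min.count))
}

str(check.prop.conditions(54, 180))   # Visa example: np = 54, npq = 37.8
str(check.prop.conditions(37, 102))   # heights example: np = 37, npq about 23.6
```

The `ok` flag covers only the two counts. Randomness and the binomial conditions still have to be argued from how the data were collected.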
(c) 95% CI using R. The 95% CI for the proportion of all credit card purchase slips made with Visa, $p$, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

```r
# 1-proportion (two-sided) CI for p
prop1.interval <- function(x, n, conf.level) {
  p <- x / n                                   # point estimate p-hat
  z.crit <- -1 * qnorm((1 - conf.level) / 2)   # critical value z_{alpha/2}
  margin.error <- z.crit * sqrt(p * (1 - p) / n)
  ci.lower <- p - margin.error
  ci.upper <- p + margin.error
  dat <- c(p, z.crit, margin.error, ci.lower, ci.upper)
  # the "Mean" entry is the sample proportion p-hat
  names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
  return(dat)
}

prop1.interval(54, 180, 0.95)  # 1-proportion 95% CI for p
##       Mean Critical Value Margin of Error   CI lower   CI upper
## 0.30000000     1.95996398      0.06694551 0.23305449 0.36694551
```

This interval includes its lower limit, 0.233, and its upper limit, 0.367. It also includes every proportion in between, such as the point estimate $\hat{p} = 0.3$. The length of this CI is $L \approx 0.367 - 0.233 = 0.134$. So we are 95% confident that the population parameter $p$ lies in (0.233, 0.367).

(d) 90% CI using R. The 90% CI for the proportion of all credit card purchase slips made with Visa, $p$, is (i) (0.251, 0.349) (ii) (0.244, 0.356) (iii) (0.233, 0.367). The length of this CI is $L \approx 0.356 - 0.244 = 0.112$.

```r
prop1.interval(54, 180, 0.90)  # 1-proportion 90% CI for p
##       Mean Critical Value Margin of Error   CI lower   CI upper
## 0.30000000     1.64485363      0.05618245 0.24381755 0.35618245
```

(e) 85% CI using R. The 85% CI for the proportion of all credit card purchase slips made with Visa, $p$, is (i) (0.251, 0.349) (ii) (0.273, 0.367) (iii) (0.233, 0.367). The length of this CI is $L \approx 0.349 - 0.251 = 0.098$.

```r
prop1.interval(54, 180, 0.85)  # 1-proportion 85% CI for p
##       Mean Critical Value Margin of Error   CI lower   CI upper
## 0.30000000     1.43953147      0.04916936 0.25083064 0.34916936
```

(f) Comparing CI lengths.
The length of the 95% CI for $p$, $L = 0.134$, is (i) longer than (ii) the same length as (iii) shorter than the length of the 90% CI for $p$, $L = 0.112$. That length in turn is (i) longer than (ii) the same length as (iii) shorter than the length of the 85% CI for $p$, $L = 0.098$. Increasing the confidence level increases the CI length.

(g) Margin of error. Half the length, $L$, is the margin of error, $E = L/2$. Consequently, for the 95% CI for $p$,

iii. $0.3 \pm 1.44 \times \sqrt{\dfrac{0.3(1-0.3)}{180}}$

(l) Population, sample, statistic and parameter. Match columns.

| terms | credit card example |
|---|---|
| (a) population | (a) Visa or not, all purchase slips |
| (b) sample | (b) proportion of all slips made with Visa, $p$ |
| (c) statistic | (c) Visa or not, 180 purchase slips |
| (d) parameter | (d) proportion of 180 slips made with Visa, $\hat{p}$ |

| terms | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| credit card example | | | | |

2. 95% CI for the proportion of students over 6 feet tall. Of 102 students chosen at random from PNW, 37 are over 6 feet tall.

(a) Point estimate. The point estimate of the proportion, $p$, of students over 6 feet tall is $\hat{p} = 37/102 \approx$ (i) 0.363 (ii) 0.378 (iii) 0.391.

(b) Check assumptions. We have $np \approx n\hat{p} = 102\left(\frac{37}{102}\right) = 37 \ge 5$ and $np(1-p) \approx n\hat{p}(1-\hat{p}) = 102\left(\frac{37}{102}\right)\left(1 - \frac{37}{102}\right) \approx 23.6 \ge 5$. So the assumptions (i) have (ii) have not been satisfied, and it is appropriate to use $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ to estimate the parameter $p$.

(c) Using R. The 95% CI for $p$ is (i) (0.269, 0.456) (ii) (0.273, 0.367) (iii) (0.233, 0.367).

```r
prop1.interval(37, 102, 0.95)  # 1-proportion 95% CI for p
##      Mean Critical Value Margin of Error  CI lower  CI upper
## 0.3627451      1.9599640       0.0933051 0.2694400 0.4560502
```

(d) Using the formula: critical value using R. The critical value for a $95\% = (1-\alpha)\cdot 100\% = (1-0.05)\cdot 100\%$ CI for $p$ is $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$ = (i) 1.28 (ii) 1.96 (iii) 2.58.

```r
qnorm(0.975)  # critical value z_{0.05/2} for 95% CI
## [1] 1.959964
```

(e) Using the formula: critical value using Table C.1.
The critical value for a $95\% = (1-\alpha)\cdot 100\% = (1-0.05)\cdot 100\%$ CI for $p$ is $z_{\alpha/2} = z_{0.05/2} = z_{0.025}$ = (i) 1.28 (ii) 1.96 (iii) 2.58.

(f) Using the formula. Since $\hat{p} = 37/102$ and $n = 102$, the 95% CI for $p$, $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$, is

(i) $0.36 \pm 1.28 \times \sqrt{\dfrac{0.36(1-0.36)}{102}}$ (ii) $0.36 \pm 1.96 \times \sqrt{\dfrac{0.36(1-0.36)}{102}}$ (iii) $0.36 \pm 2.58 \times \sqrt{\dfrac{0.36(1-0.36)}{102}}$

$\approx (0.269, 0.456)$.

(g) The length of the 95% CI is $L = 0.456 - 0.269$ = (i) 0.176 (ii) 0.187 (iii) 0.354. Half the length is the margin of error, $E = L/2$ = (i) 0.088 (ii) 0.0935 (iii) 0.177. The margin of error can also be computed directly:

$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 1.96 \times \sqrt{\frac{\frac{37}{102}\left(1 - \frac{37}{102}\right)}{102}} \approx 0.0933.$$

The small difference from $L/2 = 0.0935$ comes from rounding the interval endpoints.

(h) Confidence level and sample size. The larger the confidence level (critical value, $z_{\alpha/2}$), the (i) larger (ii) smaller the margin of error. The larger the sample size, $n$, the (i) larger (ii) smaller the margin of error.

(i) One-sided confidence interval with upper bound using R. The 95% CI for $p$ with an upper bound is $\left(0,\ \hat{p} + z_{\alpha}\sqrt{\hat{p}(1-\hat{p})/n}\right)$ = (i) (0.269, 0.456) (ii) (0.273, 1) (iii) (0, 0.441).

```r
# 1-proportion one-sided CI with upper bound for p
# (named differently so it does not overwrite the two-sided prop1.interval above)
prop1.upper.interval <- function(x, n, conf.level) {
  p <- x / n
  z.crit <- -1 * qnorm(1 - conf.level)   # one-sided critical value z_alpha
  margin.error <- z.crit * sqrt(p * (1 - p) / n)
  ci.lower <- 0
  ci.upper <- p + margin.error
  dat <- c(p, z.crit, margin.error, ci.lower, ci.upper)
  names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
  return(dat)
}

prop1.upper.interval(37, 102, 0.95)  # 1-proportion 95% CI with upper bound for p
##       Mean Critical Value Margin of Error   CI lower   CI upper
## 0.36274510     1.64485363      0.07830411 0.00000000 0.44104920
```

4.6 Confidence Intervals for a Mean

Let $\bar{x}$ be the mean of a sample of size $n$ taken from a population with known variance $\sigma^2$ and unknown mean $\mu$, and let $0 < \alpha < 1$. The $(1-\alpha)\cdot 100\%$ confidence interval for $\mu$ is called a z-interval:

$$\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right).$$
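A z-interval function can be written in the same style as `prop1.interval`. This is a sketch only. The name `mean1.z.interval` is hypothetical, and the known $\sigma = 20$ in the usage line is an assumed value for illustration (in Exercise 4.6, $\sigma$ is unknown).

```r
# Hypothetical helper: (1 - alpha)100% z-interval for mu when sigma is known
mean1.z.interval <- function(m, sigma, n, conf.level) {
  z.crit <- qnorm(1 - (1 - conf.level) / 2)   # z_{alpha/2}
  margin.error <- z.crit * sigma / sqrt(n)    # E = z_{alpha/2} * sigma / sqrt(n)
  c("Mean" = m, "Critical Value" = z.crit, "Margin of Error" = margin.error,
    "CI lower" = m - margin.error, "CI upper" = m + margin.error)
}

# assumed known sigma = 20 (illustration only), x-bar = 167, n = 11
mean1.z.interval(167, 20, 11, 0.95)
```

By the symmetry of $N(0,1)$, `qnorm(1 - (1 - conf.level) / 2)` equals the `-1 * qnorm((1 - conf.level) / 2)` used in the notes.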
The $(1-\alpha)\cdot 100\%$ confidence interval for $\mu$ with unknown $\sigma$ is called a t-interval:

$$\bar{x} \pm t_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right).$$

Here $T = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ has a Student-t distribution with $n - 1$ degrees of freedom, $E = t_{\alpha/2}(s/\sqrt{n})$ is the margin of error, $s/\sqrt{n}$ is the standard error of the mean, and $\alpha$ is the level of significance. We assume a random sample where either the underlying distribution is normal with no outliers or the sample size is large ($n > 30$). One-sided confidence interval estimates for $\mu$, with a lower bound and an upper bound respectively, are

$$\left(\bar{x} - t_{\alpha}\frac{s}{\sqrt{n}},\ \infty\right), \qquad \left(-\infty,\ \bar{x} + t_{\alpha}\frac{s}{\sqrt{n}}\right).$$

Exercise 4.6 (Confidence Intervals for a Mean)

1. Estimates for the population average weight of PNW students. The average weight of a simple random sample of 11 PNW students is $\bar{x} = 167$ pounds, with sample SD $s = 20.1$ pounds. Weights are normally distributed, with no outliers.

(a) Point estimate. The point estimate of the population average weight of all students, $\mu$, is $\bar{x}$ = (i) 11 (ii) 20.1 (iii) 167. Notice also that $\sigma$ is unknown and is estimated by $s = 20.1$.

(b) 95% CI

i. Using R. The 95% CI for $\mu$ is (i) (143.5, 182.5) (ii) (151.5, 180.5) (iii) (153.5, 180.5).

```r
# 1-sample t-interval for mu from summary statistics
mean1.t.interval <- function(m, s, n, conf.level) {
  t.crit <- -1 * qt((1 - conf.level) / 2, n - 1)   # t_{alpha/2} with n - 1 df
  margin.error <- t.crit * s / sqrt(n)
  ci.lower <- m - margin.error
  ci.upper <- m + margin.error
  dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)  # m (the argument), not the function mean
  names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
  return(dat)
}

mean1.t.interval(167, 20.1, 11, 0.95)  # m: mean, s: SD, n: sample size, 95% t-interval
```

iv. Using the formula. The 95% CI for $\mu$ is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$ = (i) $21.6 \pm 2.15 \times \dfrac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \dfrac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 3.15 \times \dfrac{2.97}{\sqrt{15}}$.

(c) 99% CI

i. Using R. The 99% CI for $\mu$ is (i) (19.23, 23.45) (ii) (19.96, 23.24) (iii) (19.32, 23.88).
```r
mean1.t.interval(m, s, n, 0.99)  # m: mean, s: SD, n: sample size, 99% t-interval
##      Mean Critical Value Margin of Error  CI lower  CI upper
## 21.600000       2.976843        2.283786 19.316214 23.883786
```

ii. Using the formula: degrees of freedom (df). The df for the 99% CI is (i) the same as (ii) different from the df calculated for the 95% CI above, because the same sample size is used in both cases.

iii. Using the formula: critical value. The critical value for a $99\% = (1-\alpha)\cdot 100\% = (1-0.01)\cdot 100\%$ CI with 14 df is $t_{\alpha/2} = t_{0.01/2} = t_{0.005} \approx$ (i) 1.76 (ii) 2.98.

```r
qt(0.995, 14)  # critical value t, 14 df, for 99% CI
## [1] 2.976843
```

iv. Using the formula. Thus, the 99% CI for $\mu$ is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$ = (i) $21.6 \pm 2.15 \times \dfrac{2.97}{\sqrt{15}}$ (ii) $21.6 \pm 2.15 \times \dfrac{3.97}{\sqrt{15}}$ (iii) $21.6 \pm 2.98 \times \dfrac{2.97}{\sqrt{15}}$. This equals (i) $21.6 \pm 1.29$ (ii) $21.6 \pm 2.29$ (iii) $21.6 \pm 3.29$ $\approx (19.32, 23.88)$.

(d) Some comments

i. (i) True (ii) False. The longer 99% CI is better than the shorter 95% CI in the sense that we are more confident the 99% CI contains, or "captures", the unknown parameter $\mu$. However, the 95% CI is better than the longer 99% CI in the sense that, if the unknown parameter $\mu$ is in the 95% interval estimate, we know its location more precisely.

ii. Since the sample size is small, we (i) can (ii) cannot use the central limit theorem.

iii. Match columns.

| terms | corn example |
|---|---|
| (a) population | (a) average length of 15 plants, $\bar{X}$ |
| (b) sample | (b) average length of all plants, $\mu$ |
| (c) statistic | (c) lengths of all plants |
| (d) parameter | (d) observed lengths of 15 plants |

| terms | (a) | (b) | (c) | (d) |
|---|---|---|---|---|
| corn example | | | | |

3. Population, sample, statistic and parameter: CI for average corn cob length. A simple random sample of 15 corn cobs is taken. Assume the sample SD of length is $s = 2.97$. Although we typically do not know it, assume the population (not sample) mean length is $\mu = 22$ inches. Assume normality.

(a) Population mean length $\mu = 22$. The population mean $\mu = 22$ is a (i) statistic (ii) parameter. The population mean $\mu$ (i) changes (ii) remains the same for every random sample.
The population mean $\mu$ is (usually) (i) known (ii) unknown to us (although for this question we are pretending that we do know it).

(b) Sample mean length $\bar{x}$. The sample mean $\bar{x}$ is a (i) statistic (ii) parameter. The sample mean $\bar{x}$ (i) changes (ii) remains the same for every random sample. The sample mean $\bar{x}$ is (usually) (i) known (ii) unknown to us: it may be $\bar{x} = 21.6$ for one sample but $\bar{x} = 29.8$ for another sample.

(c) If $\bar{x} = 21.6$, a 95% CI for $\mu$ is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 21.6 \pm 2.145 \times \dfrac{2.97}{\sqrt{15}}$ = (i) (19.95, 23.24) (ii) (23.45, 27.80) (iii) (28.16, 31.44). Here $t_{0.025} \approx 2.145$ with $n - 1 = 14$ df.

```r
mean1.t.interval(21.6, 2.97, 15, 0.95)  # m: mean, s: SD, n = 15 cobs (14 df), 95% t-interval
```

With $n = 15$, this gives a critical value of about 2.145, a margin of error of about 1.645, and the interval (19.955, 23.245). This 95% CI (i) contains (ii) does not contain $\mu = 22$.

(d) If $\bar{x} = 29.8$, a 95% CI for $\mu$ is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}} = 29.8 \pm 2.145 \times \dfrac{2.97}{\sqrt{15}}$ = (i) (19.60, 23.60) (ii) (23.45, 27.80) (iii) (28.16, 31.44).

```r
mean1.t.interval(29.8, 2.97, 15, 0.95)  # m: mean, s: SD, n = 15 cobs (14 df), 95% t-interval
```

With $n = 15$, this gives the interval (28.155, 31.445). This 95% CI (i) contains (ii) does not contain $\mu = 22$.

(e) If the sample average length, $\bar{x}$, changes, the corresponding 95% CI, $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$, (i) changes (ii) remains the same. More than this,

i. all possible 95% CIs contain $\mu = 22$.
ii. none of the possible 95% CIs contain $\mu = 22$.
iii. ninety-nine percent of all possible 95% CIs contain $\mu = 22$, and so one percent of all possible 95% CIs do not contain $\mu = 22$.
iv. ninety-five percent of all possible 95% CIs contain $\mu = 22$, and so five percent of all possible 95% CIs do not contain $\mu = 22$.

This is demonstrated in the figure below.

[Figure 4.6: Interpreting confidence intervals. Many 95% CIs for the true unknown average length are drawn on an axis from 20 to 24 around $\mu = 22$, some marked "miss". In the long run, 95% of the CIs capture the true unknown average and 5% miss.]

(f) Choose true or false.

(i) True (ii) False.
95% chance (19.95, 23.24) contains $\mu$.

(i) True (ii) False.
95% chance (19.95, 23.24) contains $\bar{x} = 21.6$.

(i) True (ii) False.
95% confident (19.95, 23.24) contains $\mu$.

(i) True (ii) False.
95% confident (19.95, 23.24) contains $\bar{x} = 21.6$.

4.7 Confidence Intervals for a Variance

Let $s^2$ be the variance of a random sample of size $n$ taken from a normally distributed population with unknown variance $\sigma^2$, and let $0 < \alpha < 1$. The $(1-\alpha)\cdot 100\%$ confidence interval (CI) for $\sigma^2$ is

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right).$$

Here, as with $z_p$, $\chi^2_p$ denotes the critical value with $P(\chi^2 > \chi^2_p) = p$ for a chi-square distribution with $n - 1$ degrees of freedom. One-sided confidence interval estimates for $\sigma^2$, with a lower bound and an upper bound respectively, are

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha}},\ \infty\right), \qquad \left(0,\ \frac{(n-1)s^2}{\chi^2_{1-\alpha}}\right).$$

Exercise 4.7 (Confidence Intervals for a Variance)

1. Estimation for variance: car door and jamb. In a simple random sample of 28 cars, the variance in the gap between door and jamb is $s^2 = 0.7$ mm². Calculate a 95% CI. Assume normality with no outliers.

Let $\bar{x}_1$ and $\bar{x}_2$ be the means of two independent samples of sizes $n_1$ and $n_2$ from two populations with means $\mu_1$ and $\mu_2$. The $(1-\alpha)\cdot 100\%$ CI for $\mu_1 - \mu_2$ with known variances $\sigma_1^2$ and $\sigma_2^2$ (the 2-sample z-interval) is

$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

With unknown variances where it is assumed that $\sigma_1^2 = \sigma_2^2$, the CI is

$$(\bar{x}_1 - \bar{x}_2) \pm s_p \cdot t_{\alpha/2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Here the pooled standard deviation estimate is

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}},$$

and $t_{\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom. With unknown variances where it is assumed that $\sigma_1^2 \ne \sigma_2^2$, the CI is

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},$$

where $t_{\alpha/2}$ has the following $r$ degrees of freedom (rounded down):

$$r = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{1}{n_1-1}\left(\dfrac{s_1^2}{n_1}\right)^2 + \dfrac{1}{n_2-1}\left(\dfrac{s_2^2}{n_2}\right)^2}.$$

For each of these intervals, we assume either that both underlying distributions are normal with no outliers or that both random samples are large ($n_1 \ge 30$, $n_2 \ge 30$).
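Computing the degrees of freedom $r$ by hand is tedious. This is a minimal sketch of the formula above. The helper name `welch.df` is hypothetical. It is illustrated with the progesterone data from Exercise 4.8 and compared with the unrounded df that base R's `t.test()` reports by default (`var.equal = FALSE`).

```r
# Hypothetical helper: Welch-Satterthwaite degrees of freedom r (unrounded)
welch.df <- function(s1, n1, s2, n2) {
  a <- s1^2 / n1   # estimated variance of x-bar_1
  b <- s2^2 / n2   # estimated variance of x-bar_2
  (a + b)^2 / (a^2 / (n1 - 1) + b^2 / (n2 - 1))
}

progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)

r <- welch.df(sd(progesterone), length(progesterone), sd(control), length(control))
r          # just under 6
floor(r)   # rounded down, as the notes require: 5 df
t.test(progesterone, control)$parameter   # the same unrounded df
```

`t.test()` uses the unrounded $r$. Fewer degrees of freedom give a larger critical value, so rounding down as in the notes gives a wider (more conservative) interval than `t.test()`.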
If the two samples are dependent, or paired, the confidence interval for the mean difference $\mu_d$ is

$$\bar{d} \pm t_{\alpha/2}\left(\frac{s_d}{\sqrt{n}}\right),$$

where $t_{\alpha/2}$ has $n - 1$ degrees of freedom. We assume either that the underlying distribution of the differences is normal with no outliers or that the random sample size is large ($n \ge 30$).

Exercise 4.8 (Confidence Intervals for Differences)

1. Inference for $p_1 - p_2$, large independent samples: doctors. Calculate a 95% 2-proportion z-interval for the difference in the proportions of male doctors in military and civilian hospitals.

| | military (1) | civilian (2) |
|---|---|---|
| male doctors | 358 | 6786 |
| total doctors | 407 | 7363 |

From the table, $\hat{p}_1 = 358/407$ and $\hat{p}_2 = 6786/7363$. The critical value for a $95\% = (1-\alpha)\cdot 100\% = (1-0.05)\cdot 100\%$ CI is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} \approx$ (i) 1.65 (ii) 1.96 (iii) 2.09.

```r
qnorm(0.975)  # critical value z, for 95% CI
## [1] 1.959964
```

So the 95% CI for $p_1 - p_2$ is

$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = \left(\frac{358}{407} - \frac{6786}{7363}\right) \pm 1.96\sqrt{\frac{\frac{358}{407}\left(1 - \frac{358}{407}\right)}{407} + \frac{\frac{6786}{7363}\left(1 - \frac{6786}{7363}\right)}{7363}}$$

$\approx$ (i) (−0.054, −0.008) (ii) (−0.064, −0.009) (iii) (−0.074, −0.010).

```r
# 2-proportion z-interval for p1 - p2
prop2.interval <- function(x, n, conf.level) {
  x1 <- x[1]; x2 <- x[2]; n1 <- n[1]; n2 <- n[2]
  p.hat1 <- x1 / n1; p.hat2 <- x2 / n2
  z.crit <- -1 * qnorm((1 - conf.level) / 2)   # z_{alpha/2}
  margin.error <- z.crit * sqrt(p.hat1 * (1 - p.hat1) / n1 + p.hat2 * (1 - p.hat2) / n2)
  ci.lower <- p.hat1 - p.hat2 - margin.error
  ci.upper <- p.hat1 - p.hat2 + margin.error
  dat <- c(p.hat1, p.hat2, z.crit, margin.error, ci.lower, ci.upper)
  names(dat) <- c("p.hat1", "p.hat2", "z crit", "Margin of Error", "CI lower", "CI upper")
  return(dat)
}

prop2.interval(c(358, 6786), c(407, 7363), 0.95)  # 95% 2-proportion z-interval for p1 - p2
##      p.hat1      p.hat2      z crit Margin of Error     CI lower     CI upper
## 0.879606880 0.921635203 1.959963985     0.032205624 -0.074233948 -0.009822699
```

The confidence interval does not include zero; it lies entirely below zero. This indicates that the population proportion of male military doctors (i) is less than (ii) equals (iii) is greater than (iv) is different from the population proportion of male civilian doctors.

2. CI for $\mu_1 - \mu_2$, independent samples, unknown $\sigma_1^2 = \sigma_2^2$: progesterone. A study is conducted to determine the cellular response to progesterone in females. Blood cells from four females are injected with progesterone. For comparison purposes, blood cells from four different females are left untreated. Calculate a 95% CI. Assume normality with no outliers.

| female | progesterone (1) | female | control (2) |
|---|---|---|---|
| 1 | 5.85 | 5 | 5.23 |
| 2 | 2.28 | 6 | 1.21 |
| 3 | 1.51 | 7 | 1.40 |
| 4 | 2.12 | 8 | 1.38 |

```r
progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)
```

From R, $\bar{x}_1 \approx 2.94$, $s_1 \approx 1.97$, $\bar{x}_2 \approx 2.305$ and $s_2 \approx 1.95$.

```r
m1 <- mean(progesterone); m1; s1 <- sd(progesterone); s1   # sd(x) is sqrt(var(x))
## [1] 2.94
## [1] 1.968163
m2 <- mean(control); m2; s2 <- sd(control); s2
## [1] 2.305
## [1] 1.95186
```

So the pooled standard deviation is

$$s_p = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1 + n_2 - 2}} \approx \sqrt{\frac{(4-1)(1.97)^2 + (4-1)(1.95)^2}{4 + 4 - 2}} \approx$$

(i) 1.95 (ii) 1.96 (iii) 1.97. This is not surprising, since $s_1 \approx 1.97$ and $s_2 \approx 1.95$.

```r
n1 <- length(progesterone); n2 <- length(control)
s12 <- var(progesterone); s22 <- var(control)
sp <- sqrt(((n1 - 1) * s12 + (n2 - 1) * s22) / (n1 + n2 - 2)); sp
## [1] 1.96003
```

The critical value for a $95\% = (1-\alpha)\cdot 100\% = (1-0.05)\cdot 100\%$ CI has $n_1 + n_2 - 2 = 4 + 4 - 2$ = (i) 4 (ii) 6 (iii) 8 degrees of freedom, so $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 2.31 (ii) 2.45 (iii) 3.09.

```r
qt(0.975, 6)  # critical t value, 95% CI, n1 + n2 - 2 = 6 df
## [1] 2.446912
```

So the 95% CI for $\mu_1 - \mu_2$ is

$$(\bar{x}_1 - \bar{x}_2) \pm s_p \cdot t_{\alpha/2}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (2.94 - 2.305) \pm 1.96 \cdot 2.45 \cdot \sqrt{\frac{1}{4} + \frac{1}{4}}$$

= (i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.76, 4.03).
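The pooled interval can also be computed directly from the raw data. It can then be cross-checked against base R's `t.test()` with `var.equal = TRUE`, which uses the same pooled formula. `pooled.t.interval` is a hypothetical helper written for this sketch.

```r
# Hypothetical helper: pooled 2-sample t-interval for mu1 - mu2 (assumes sigma1^2 = sigma2^2)
pooled.t.interval <- function(x1, x2, conf.level = 0.95) {
  n1 <- length(x1); n2 <- length(x2)
  sp <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))  # pooled SD
  t.crit <- qt(1 - (1 - conf.level) / 2, df = n1 + n2 - 2)               # t_{alpha/2}
  margin.error <- t.crit * sp * sqrt(1 / n1 + 1 / n2)
  d <- mean(x1) - mean(x2)
  c("Difference" = d, "sp" = sp, "Critical Value" = t.crit,
    "Margin of Error" = margin.error,
    "CI lower" = d - margin.error, "CI upper" = d + margin.error)
}

progesterone <- c(5.85, 2.28, 1.51, 2.12)
control <- c(5.23, 1.21, 1.40, 1.38)

pooled.t.interval(progesterone, control)                    # about (-2.756, 4.026)
t.test(progesterone, control, var.equal = TRUE)$conf.int    # same interval
```

Both give roughly (−2.76, 4.03), which is choice (iii). The interval contains zero.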
```r
gentech <- c(62, 45, 53, 35, 71, 64, 63, 57, 43)
control <- c(54, 43, 55, 39, 65, 62, 56, 50, 52)
diff <- gentech - control; diff
## [1]  8  2 -2 -4  6  2  7  7 -9
```

$\bar{d} \approx$ (i) 1.41 (ii) 1.89 (iii) 2.52, and $s_d \approx$ (i) 5.47 (ii) 5.86 (iii) 6.52.

```r
mean(diff); sd(diff)
## [1] 1.888889
## [1] 5.861835
```

There are $n - 1 = 9 - 1$ = (i) 6 (ii) 7 (iii) 8 degrees of freedom. For a $95\% = (1-\alpha)\cdot 100\% = (1-0.05)\cdot 100\%$ CI, the critical value is $t_{\alpha/2} = t_{0.05/2} = t_{0.025} \approx$ (i) 2.31 (ii) 2.53 (iii) 3.09.

```r
qt(0.975, 8)  # critical t value, 95% CI, n - 1 = 9 - 1 = 8 df
## [1] 2.306004
```

So the 95% CI for $\mu_d$ is $\bar{d} \pm t_{\alpha/2}\dfrac{s_d}{\sqrt{n}} = 1.89 \pm 2.31 \times \dfrac{5.86}{\sqrt{9}}$ = (i) (−2.52, 6.49) (ii) (−2.62, 6.39) (iii) (−2.72, 6.29).

```r
# 1-sample t-interval from summary statistics (here applied to the differences)
mean1.t.interval <- function(m, s, n, conf.level) {
  t.crit <- -1 * qt((1 - conf.level) / 2, n - 1)   # t_{alpha/2} with n - 1 df
  margin.error <- t.crit * s / sqrt(n)
  ci.lower <- m - margin.error
  ci.upper <- m + margin.error
  dat <- c(m, t.crit, margin.error, ci.lower, ci.upper)  # m (the argument), not the function mean
  names(dat) <- c("Mean", "Critical Value", "Margin of Error", "CI lower", "CI upper")
  return(dat)
}

mean1.t.interval(1.889, 5.8618, 9, 0.95)  # m: mean, s: SD, n: sample size, 95% t-interval
##     Mean Critical Value Margin of Error  CI lower  CI upper
## 1.889000       2.306004        4.505778 -2.616778  6.394778
```

Since the confidence interval includes zero, this indicates that the gentech population mean milk yield (i) is less than (ii) equals (iii) is greater than (iv) is different from the control population mean milk yield.

4.9 Sample Size

The length of a confidence interval (equivalently, the margin of error) can be controlled by the sample size: the larger the sample size, the narrower (more precise) the confidence interval. For a confidence interval for a proportion $p$, the sample size needed to achieve a required margin of error, $E$, at a given confidence level is

$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2$$

if a prior estimate $\hat{p}$ is available, and

$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2$$

if a prior $\hat{p}$ is unavailable.
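The sample size must be a whole number that achieves at least the required margin of error, so the value from the formula is rounded up. This is a minimal sketch that does the rounding; the helper name `n.prop.required` is hypothetical. It defaults to $\hat{p} = 1/2$, the value that maximizes $\hat{p}(1-\hat{p})$, which reproduces the no-prior formula $n = \frac{1}{4}(z_{\alpha/2}/E)^2$.

```r
# Hypothetical helper: smallest whole n for a proportion CI with margin of error E
n.prop.required <- function(margin.error, conf.level, p.hat = 0.5) {
  z.crit <- qnorm(1 - (1 - conf.level) / 2)   # z_{alpha/2}
  ceiling(p.hat * (1 - p.hat) * (z.crit / margin.error)^2)
}

n.prop.required(0.01, 0.85, p.hat = 0.25)  # with prior p-hat = 0.25
n.prop.required(0.01, 0.85)                # no prior: p-hat(1 - p-hat) = 1/4

# p(1 - p) never exceeds 1/4 on [0, 1], so the no-prior n is the conservative choice
p <- seq(0, 1, by = 0.01)
max(p * (1 - p))
```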
For a confidence interval for a mean $\mu$, the sample size needed to achieve a required margin of error, $E$, at a given confidence level is

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2,$$

where, if $\sigma$ is unknown, the approximation $\sigma \approx \dfrac{\max - \min}{4}$ is used.

Exercise 4.9 (Sample Size)

1. Sample size for proportion $p$: credit card purchase slips.

(a) With prior $\hat{p}$: purchase slips. In an initial simple random sample, twenty-five (25) of 100 purchase slips chosen are Visa. What sample size, $n$, is required to estimate the proportion of Visa purchase slips, $p$, to within a margin of error of $E = 0.01$ with 85% confidence? Here

$$n = \hat{p}(1-\hat{p})\left(\frac{z_{\alpha/2}}{E}\right)^2 = \left(\frac{25}{100}\right)\left(\frac{75}{100}\right)\left(\frac{1.44}{0.01}\right)^2 \approx$$

(i) 3886 (ii) 5184 (iii) 5470.

```r
# sample size for a proportion CI (before rounding up)
n.prop <- function(p.hat, margin.error, conf.level) {
  z.crit <- -1 * qnorm((1 - conf.level) / 2)   # z_{alpha/2}
  p.hat * (1 - p.hat) * z.crit^2 / margin.error^2
}

n.prop(0.25, 0.01, 0.85)  # n for prior p-hat = 0.25, E = 0.01, 85% confidence
## [1] 3885.47
```

(b) Sample size for proportion $p$ without prior $\hat{p}$: purchase slips. What sample size, $n$, is required to estimate the proportion of Visa purchase slips, $p$, to within a margin of error of $E = 0.01$ with 85% confidence? Here

$$n = \frac{1}{4}\left(\frac{z_{\alpha/2}}{E}\right)^2 = \frac{1}{4}\left(\frac{1.44}{0.01}\right)^2 \approx$$

(i) 4409 (ii) 5181 (iii) 5470.

```r
n.prop(0.5, 0.01, 0.85)  # n with no prior (p-hat = 0.5 maximizes p-hat(1 - p-hat)), E = 0.01, 85% confidence
## [1] 5180.627
```

Without the prior $\hat{p} = 0.25$, the sample size (i) decreases (ii) remains the same (iii) increases, from $n \approx 3886$ to $n \approx 5181$.

2. Sample size for mean $\mu$: corn cob lengths.

(a) What sample size, $n$, is required to estimate the average corn cob length, $\mu$, to within a margin of error of $E = 0.08$ with 95% confidence? Assume $\sigma = 0.25$. Rounding up to a whole number,

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \cdot 0.25}{0.08}\right)^2 \approx$$

(i) 37 (ii) 38 (iii) 39.

```r
# sample size for a mean CI (before rounding up)
n.mean <- function(s, margin.error, conf.level) {
  z.crit <- -1 * qnorm((1 - conf.level) / 2)   # z_{alpha/2}
  s^2 * z.crit^2 / margin.error^2
}

n.mean(0.25, 0.08, 0.95)  # n for sigma = 0.25, E = 0.08, 95% confidence
## [1] 37.51425
```

(b) Increase the margin of error, $E$.
What sample size, $n$, is required to estimate the average corn cob length, $\mu$, to within a margin of error of $E = 0.16$ with 95% confidence? Assume $\sigma = 0.25$.

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{z_{0.025}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \cdot 0.25}{0.16}\right)^2 \approx$$

(i) 9 (ii) 10 (iii) 11.

```r
n.mean(0.25, 0.16, 0.95)  # n for sigma = 0.25, E = 0.16, 95% confidence
## [1] 9.378562
```

When the margin of error is doubled, from $E = 0.08$ to $E = 0.16$, the sample size is (i) quartered (ii) halved (iii) doubled, from $n = 38$ to $n = 10$. Less data gives a less precise, wider CI.
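The corn cob answers round $n = 37.51$ and $n = 9.38$ up to 38 and 10. This is a minimal sketch that does the rounding. When $\sigma$ is unknown, it uses the approximation $\sigma \approx (\max - \min)/4$ from Section 4.9. The helper name `n.mean.required` and the pilot lengths are hypothetical, made up for illustration.

```r
# Hypothetical helper: smallest whole n for a mean CI with margin of error E
n.mean.required <- function(sigma, margin.error, conf.level) {
  z.crit <- qnorm(1 - (1 - conf.level) / 2)   # z_{alpha/2}
  ceiling((z.crit * sigma / margin.error)^2)
}

n.mean.required(0.25, 0.08, 0.95)  # 37.51 rounded up
n.mean.required(0.25, 0.16, 0.95)  # 9.38 rounded up

# sigma unknown: approximate it by (max - min) / 4 using made-up pilot lengths (inches)
pilot <- c(21.2, 22.0, 22.5, 21.7, 22.1)
sigma.guess <- (max(pilot) - min(pilot)) / 4
n.mean.required(sigma.guess, 0.08, 0.95)
```

Because $n$ grows like $1/E^2$, halving $E$ roughly quadruples the required sample size, and doubling $E$ roughly quarters it.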