Statistics Test II: Solutions for Questions on Normal and Binomial Distributions, Exams of Statistics

Solutions for various statistics-related questions from a university test. The questions cover topics such as normal and binomial distributions, hypothesis testing, and confidence intervals. Students studying statistics at the university level may find this document useful for reviewing concepts and preparing for exams.

Typology: Exams

2012/2013

Uploaded on 02/26/2013

aamir
MAT 167: Statistics
Test II
Instructor: Anthony Tanbakuchi
Spring 2008

Name:
Computer / Seat Number:

No books, notes, or friends. Show your work. You may use the attached equation sheet, R, and a calculator. No other materials. If you choose to use R, write what you typed on the test or copy and paste your work into a Word document, labeling the question number it corresponds to. When you are done with the test, print out the document. Be sure to save often on a memory stick just in case. Using any other program or having any other documents open on the computer will constitute cheating.

You have until the end of class to finish the exam; manage your time wisely.

If something is unclear, quietly come up and ask me. If the question is legitimate, I will inform the whole class.

Express all final answers to 3 significant digits. Probabilities should be given as a decimal number unless a percent is requested. Circle final answers; ambiguous or multiple answers will not be accepted. Show steps where appropriate.

The exam consists of 13 questions for a total of 84 points on 9 pages.

This exam is being given under the guidelines of our institution's Code of Academic Ethics. You are expected to respect those guidelines.

Points Earned:      out of 84 total points
Exam Score:

MAT 167: Statistics, Test II p. 1 of 9

1. An experiment consists of randomly sampling 10 students at Pima Community College, recording their heights, and computing their mean height. If we repeat the experiment over and over (assume simple random samples, no human errors, no bias), we observe that the sample mean height varies each time.
   (a) (2 points) What is the name of the error that causes the mean to vary each time?
   (b) (2 points) Explain how it is possible for the sample mean height to vary each time. What is going on?
   (c) (2 points) In words, state what the population distribution represents in this experiment. (Be specific.)
   (d) (2 points) In words, state what the sampling distribution represents in this experiment. (Be specific.)

2. (2 points) Under what conditions can we approximate a binomial distribution as a normal distribution?

3. (2 points) Which distribution (normal, binomial, both, or neither) would be appropriate for describing the following? The distribution for the number of people who wear glasses in a random sample of 20 people, where the probability that an individual person wears glasses is 0.56.

MAT 167: Statistics, Test II p. 4 of 9

8. Engineers must consider the breadths of male heads when designing motorcycle helmets. Men have head breadths that are normally distributed with a mean of 6.0 in and a standard deviation of 1.0 in (based on anthropometric survey data from Gordon, Churchill, et al.).
   (a) (2 points) If 1 man is randomly selected, find the probability that his head breadth is less than 6.2 in.
   (b) (2 points) If 100 men are randomly selected, find the probability that their mean head breadth is less than 6.2 in.
   (c) (2 points) ACME motorcycle company is making a new adjustable helmet. In reality, it is not economical to make a helmet that fits everyone. You must design a helmet that will fit all but the largest 5% of male head breadths. What is the largest male head breadth that your new helmet will fit?

MAT 167: Statistics, Test II p. 5 of 9

9. (2 points) ACME helmet company needs to know the mean head breadth of women for a new helmet design. You conduct a study of 8 randomly selected women (via a simple random sample). Below are the data from the study.
   5.1, 5.7, 5.5, 6.4, 5.7, 5.9, 5.1, 6.2
   Construct a 95% confidence interval for the mean head breadth of women. (Assume σ is unknown.)

10. You believe that the true mean head breadth for women is less than that of men (6.0 in). Using the same study data from the previous question of 8 randomly selected women (shown again below), conduct a hypothesis test to test your claim. Use a significance level of 0.1 and assume σ is unknown and women's head breadths are normally distributed.
   5.1, 5.7, 5.5, 6.4, 5.7, 5.9, 5.1, 6.2
   (a) (2 points) What type of hypothesis test will you use?
   (b) (2 points) What are the test's requirements?
   (c) (2 points) Are the requirements satisfied? State how they are satisfied.

MAT 167: Statistics, Test II p. 6 of 9

   (d) (2 points) What are the hypotheses?
   (e) (2 points) What α will you use?
   (f) (2 points) Conduct the hypothesis test. What is the p-value?
   (g) (2 points) What is your formal decision?
   (h) (2 points) State your final conclusion in words.
   (i) (2 points) What is the actual probability of a Type I error for this study data?
   (j) (2 points) If the researcher had an α of 0.005 and failed to reject H0, have we proven that the mean head breadth of women is 6.0 in?

11. Over the past 55 years, data from the National Oceanic and Atmospheric Administration (NOAA) indicate that the probability of precipitation [1] on a given day in Tucson is 0.146. (Use 365 days in a year.)
   (a) (2 points) Find the mean and standard deviation for the number of days per year with precipitation in Tucson.

[1] Data from http://www.wrcc.dri.edu/cgi-bin/clilcd.pl?az23160. Precipitation defined as 0.01 inches or more.

MAT 167: Statistics, Test II p. 9 of 9

Using the data from the study, you run the analysis in R. Below is the output.

        1-sample proportions test with continuity correction

    data:  78 out of 100, null probability 0.5
    X-squared = 30.25, df = 1, p-value = 1.899e-08
    alternative hypothesis: true p is greater than 0.5
    95 percent confidence interval:
     0.6995942 1.0000000
    sample estimates:
       p
    0.78

   (d) (2 points) What is the point estimate from the study for the proportion of people who support John McCain?
   (e) (2 points) What is the p-value?
   (f) (2 points) What is your formal decision?
   (g) (2 points) State your final conclusion in words based upon the analysis above.
   (h) (2 points) What is wrong with this study?

************ End of exam. Reference sheets follow. ************

Basic Statistics: Quick Reference & R Commands
by Anthony Tanbakuchi. Version 1.8
http://www.tanbakuchi.com
ANTHONY@TANBAKUCHI·COM
Get R at: http://www.r-project.org
R commands: bold typewriter text

1 Misc R

To make a vector / store data: x=c(x1, x2, ...)
Get help on function: ?functionName
Get column of data from table: tableName$columnName
List all variables: ls()
Delete all variables: rm(list=ls())

$\sqrt{x}$ = sqrt(x)   (1)
$x^n$ = x^n   (2)
$n$ = length(x)   (3)
$T$ = table(x)   (4)

2 Descriptive Statistics

2.1 NUMERICAL

Let x=c(x1, x2, x3, ...)

total $= \sum_{i=1}^{n} x_i$ = sum(x)   (5)
min = min(x)   (6)
max = max(x)   (7)
six number summary: summary(x)   (8)
$\mu = \frac{\sum x_i}{N}$ = mean(x)   (9)
$\bar{x} = \frac{\sum x_i}{n}$ = mean(x)   (10)
$\tilde{x} = P_{50}$ = median(x)   (11)
$\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$   (12)
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n-1}}$ = sd(x)   (13)
$CV = \frac{\sigma}{\mu} = \frac{s}{\bar{x}}$   (14)

2.2 RELATIVE STANDING

$z = \frac{x - \mu}{\sigma} = \frac{x - \bar{x}}{s}$   (15)
Percentiles: $P_k = x_i$ (sorted x), $k = \frac{i - 0.5}{n} \cdot 100\%$   (16)
To find $x_i$ given $P_k$, $i$ is:
1. $L = (k/100\%) \cdot n$
2. If L is an integer: i = L+0.5; otherwise i = L and round up.
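
As a minimal sketch of the formulas in sections 2.1 and 2.2, the R lines below compute the sample mean, standard deviation, coefficient of variation, z-scores, and percentile ranks. The head-breadth values are borrowed from question 9 purely as example data, and the variable names (x_bar, s_manual, cv, k) are illustrative; they are not part of the reference sheet.

```r
# Example data only: the 8 head breadths (inches) listed in question 9.
x <- c(5.1, 5.7, 5.5, 6.4, 5.7, 5.9, 5.1, 6.2)

n        <- length(x)                            # eq. (3)
x_bar    <- mean(x)                              # eq. (10)
s        <- sd(x)                                # eq. (13), built-in
s_manual <- sqrt(sum((x - x_bar)^2) / (n - 1))   # eq. (13), written out
cv       <- s / x_bar                            # eq. (14)
z        <- (x - x_bar) / s                      # eq. (15), one z-score per value

# Percentile rank k (in percent) of each sorted value, eq. (16)
x_sorted <- sort(x)
k <- (seq_len(n) - 0.5) / n * 100

print(data.frame(x_sorted, k))
print(c(mean = x_bar, sd = s, cv = cv))
```

Computing s both ways is only a check that sd(x) uses the n−1 denominator of eq. (13) rather than the population form in eq. (12).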

2.3 VISUAL

All plots have optional arguments:
• main="" sets title
• xlab="", ylab="" sets x/y-axis label
• type="p" for point plot
• type="l" for line plot
• type="b" for both points and lines
Ex: plot(x, y, type="b", main="My Plot")

Plot Types:
hist(x)                    histogram
stem(x)                    stem & leaf
boxplot(x)                 box plot
plot(T)                    bar plot, T=table(x)
plot(x,y)                  scatter plot, x, y are ordered vectors
plot(t,y)                  time series plot, t, y are ordered vectors
curve(expr, xmin, xmax)    plot expr involving x

2.4 ASSESSING NORMALITY

Q-Q plot: qqnorm(x); qqline(x)

3 Probability

Number of successes x with n possible outcomes. (Don't double count!)

$P(A) = \frac{x_A}{n}$   (17)
$P(\bar{A}) = 1 - P(A)$   (18)
$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$   (19)
$P(A \text{ or } B) = P(A) + P(B)$ if A, B mut. excl.   (20)
$P(A \text{ and } B) = P(A) \cdot P(B|A)$   (21)
$P(A \text{ and } B) = P(A) \cdot P(B)$ if A, B independent   (22)
$n! = n(n-1) \cdots 1$ = factorial(n)   (23)
${}_nP_k = \frac{n!}{(n-k)!}$   Perm. no elem. alike   (24)
$= \frac{n!}{n_1!\, n_2! \cdots n_k!}$   Perm. n1 alike, ...   (25)
${}_nC_k = \frac{n!}{(n-k)!\, k!}$ = choose(n,k)   (26)

4 Discrete Random Variables

$P(x_i)$: probability distribution   (27)
$E = \mu = \sum x_i \cdot P(x_i)$   (28)
$\sigma = \sqrt{\sum (x_i - \mu)^2 \cdot P(x_i)}$   (29)

4.1 BINOMIAL DISTRIBUTION

$\mu = n \cdot p$   (30)
$\sigma = \sqrt{n \cdot p \cdot q}$   (31)
$P(x) = {}_nC_x\, p^x q^{(n-x)}$ = dbinom(x, n, p)   (32)

4.2 POISSON DISTRIBUTION

$P(x) = \frac{\mu^x \cdot e^{-\mu}}{x!}$ = dpois(x, µ)   (33)

5 Continuous random variables

CDF $F(x)$ gives area to the left of x; $F^{-1}(p)$ expects p to be the area to the left.

$f(x)$: probability density   (34)
$E = \mu = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$   (35)
$\sigma = \sqrt{\int_{-\infty}^{\infty} (x - \mu)^2 \cdot f(x)\,dx}$   (36)
$F(x)$: cumulative prob. density (CDF)   (37)
$F^{-1}(x)$: inv. cumulative prob. density   (38)
$F(x) = \int_{-\infty}^{x} f(x')\,dx'$   (39)
$p = P(x < x') = F(x')$   (40)
$x' = F^{-1}(p)$   (41)
$p = P(x > a) = 1 - F(a)$   (42)
$p = P(a < x < b) = F(b) - F(a)$   (43)

5.1 UNIFORM DISTRIBUTION

$p = P(u < u') = F(u')$ = punif(u', min=0, max=1)   (44)
$u' = F^{-1}(p)$ = qunif(p, min=0, max=1)   (45)

5.2 NORMAL DISTRIBUTION

$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2} \frac{(x - \mu)^2}{\sigma^2}}$   (46)
$p = P(z < z') = F(z')$ = pnorm(z')   (47)
$z' = F^{-1}(p)$ = qnorm(p)   (48)
$p = P(x < x') = F(x')$ = pnorm(x', mean=µ, sd=σ)   (49)
$x' = F^{-1}(p)$ = qnorm(p, mean=µ, sd=σ)   (50)

5.3 t-DISTRIBUTION

$p = P(t < t') = F(t')$ = pt(t', df)   (51)
$t' = F^{-1}(p)$ = qt(p, df)   (52)

5.4 χ2-DISTRIBUTION

$p = P(\chi^2 < \chi^{2\prime}) = F(\chi^{2\prime})$ = pchisq(X2', df)   (53)
$\chi^{2\prime} = F^{-1}(p)$ = qchisq(p, df)   (54)

5.5 F-DISTRIBUTION

$p = P(F < F') = F(F')$ = pf(F', df1, df2)   (55)
$F' = F^{-1}(p)$ = qf(p, df1, df2)   (56)

6 Sampling distributions

$\mu_{\bar{x}} = \mu$, $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$   (57)
$\mu_{\hat{p}} = p$, $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$   (58)

7 Estimation

7.1 CONFIDENCE INTERVALS

proportion: $\hat{p} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\hat{p}}$   (59)
mean (σ known): $\bar{x} \pm E$, $E = z_{\alpha/2} \cdot \sigma_{\bar{x}}$   (60)
mean (σ unknown, use s): $\bar{x} \pm E$, $E = t_{\alpha/2} \cdot \sigma_{\bar{x}}$, $df = n-1$   (61)
variance: $\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$, $df = n-1$   (62)
2 proportions: $\Delta\hat{p} \pm z_{\alpha/2} \cdot \sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$   (63)
2 means (indep): $\Delta\bar{x} \pm t_{\alpha/2} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, $df \approx \min(n_1 - 1,\, n_2 - 1)$   (64)
matched pairs: $\bar{d} \pm t_{\alpha/2} \cdot \frac{s_d}{\sqrt{n}}$, $d_i = x_i - y_i$, $df = n-1$   (65)

7.2 CI CRITICAL VALUES (TWO SIDED)

$z_{\alpha/2} = F_z^{-1}(1 - \alpha/2)$ = qnorm(1-alpha/2)   (66)
$t_{\alpha/2} = F_t^{-1}(1 - \alpha/2)$ = qt(1-alpha/2, df)   (67)
$\chi^2_L = F_{\chi^2}^{-1}(\alpha/2)$ = qchisq(alpha/2, df)   (68)
$\chi^2_R = F_{\chi^2}^{-1}(1 - \alpha/2)$ = qchisq(1-alpha/2, df)   (69)

7.3 REQUIRED SAMPLE SIZE

proportion: $n = \hat{p}\hat{q} \left( \frac{z_{\alpha/2}}{E} \right)^2$ ($\hat{p} = \hat{q} = 0.5$ if unknown)   (70)
mean: $n = \left( \frac{z_{\alpha/2} \cdot \hat{\sigma}}{E} \right)^2$   (71)

8 Hypothesis Tests

Test statistic and R function (when available) are listed for each.

Optional arguments for hypothesis tests:
alternative="two.sided" can be: "two.sided", "less", "greater"
conf.level=0.95 constructs a 95% confidence interval. Standard CI only when alternative="two.sided".
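
The effect of the alternative and conf.level arguments can be seen in a short sketch. It reuses the head-breadth sample and the value 6.0 from questions 9 and 10 as example inputs, and it rebuilds the two-sided interval by hand from eqs. (61) and (67) for comparison. The names two_sided, one_sided, and manual_ci are illustrative, not part of the reference sheet.

```r
# Example data only: the 8 head breadths (inches) listed in question 9.
x <- c(5.1, 5.7, 5.5, 6.4, 5.7, 5.9, 5.1, 6.2)

# Two-sided test: conf.int is the standard 95% confidence interval.
two_sided <- t.test(x, mu = 6.0, alternative = "two.sided", conf.level = 0.95)

# One-sided test: conf.int becomes a one-sided bound, not the standard CI.
one_sided <- t.test(x, mu = 6.0, alternative = "less", conf.level = 0.95)

# The standard interval by hand: eq. (61) with the critical value of eq. (67).
n     <- length(x)
alpha <- 0.05
E     <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
manual_ci <- c(mean(x) - E, mean(x) + E)

print(two_sided$conf.int)
print(manual_ci)
print(one_sided$conf.int)  # lower limit is -Inf when alternative = "less"
```

This is why a confidence interval question is best answered with alternative="two.sided", even when the accompanying hypothesis test is one-sided.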

Optional arguments for power calculations & Type II error:
alternative="two.sided" can be: "two.sided" or "one.sided"
sig.level=0.05 sets the significance level α.

8.1 1-SAMPLE PROPORTION

H0: p = p0
prop.test(x, n, p=p0, alternative="two.sided")
$z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$   (72)

8.2 1-SAMPLE MEAN (σ KNOWN)

H0: µ = µ0
$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$   (73)

8.3 1-SAMPLE MEAN (σ UNKNOWN)

H0: µ = µ0
t.test(x, mu=µ0, alternative="two.sided")
where x is a vector of sample data.
$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, $df = n-1$   (74)
Required sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1−β, type="one.sample", alternative="two.sided")

8.4 2-SAMPLE PROPORTION TEST

H0: p1 = p2 or equivalently H0: ∆p = 0
prop.test(x, n, alternative="two.sided")
where: x=c(x1, x2) and n=c(n1, n2)
$z = \frac{\Delta\hat{p} - \Delta p_0}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}$, $\Delta\hat{p} = \hat{p}_1 - \hat{p}_2$   (75)
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$, $\bar{q} = 1 - \bar{p}$   (76)
Required sample size:
power.prop.test(p1=p1, p2=p2, power=1−β, sig.level=α, alternative="two.sided")

8.5 2-SAMPLE MEAN TEST

H0: µ1 = µ2 or equivalently H0: ∆µ = 0
t.test(x1, x2, alternative="two.sided")
where: x1 and x2 are vectors of sample 1 and sample 2 data.
$t = \frac{\Delta\bar{x} - \Delta\mu_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$, $df \approx \min(n_1 - 1,\, n_2 - 1)$, $\Delta\bar{x} = \bar{x}_1 - \bar{x}_2$   (77)
Required sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1−β, type="two.sample", alternative="two.sided")

8.6 2-SAMPLE MATCHED PAIRS TEST

H0: µd = 0
t.test(x, y, paired=TRUE, alternative="two.sided")
where: x and y are ordered vectors of sample 1 and sample 2 data.
$t = \frac{\bar{d} - \mu_{d0}}{s_d / \sqrt{n}}$, $d_i = x_i - y_i$, $df = n-1$   (78)
Required sample size:
power.t.test(delta=h, sd=σ, sig.level=α, power=1−β, type="paired", alternative="two.sided")

8.7 TEST OF HOMOGENEITY, TEST OF INDEPENDENCE

H0: p1 = p2 = ··· = pn (homogeneity)
H0: X and Y are independent (independence)
chisq.test(D)
Enter table: D=data.frame(c1, c2, ...), where c1, c2, ... are column data vectors.
Or generate table: D=table(x1, x2), where x1, x2 are ordered vectors of raw categorical data.
$\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, df = (num rows − 1)(num cols − 1)   (79)
$E_i = \frac{(\text{row total})(\text{column total})}{(\text{grand total})} = n p_i$   (80)
For 2×2 contingency tables, you can use the Fisher Exact Test:
fisher.test(D, alternative="greater")
(must specify alternative as greater)

9 Linear Regression

9.1 LINEAR CORRELATION

H0: ρ = 0
cor.test(x, y)
where: x and y are ordered vectors.
$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{(n-1) s_x s_y}$, $t = \frac{r - 0}{\sqrt{\frac{1 - r^2}{n-2}}}$, $df = n-2$   (81)

9.2 MODELS IN R

MODEL TYPE             EQUATION                               R MODEL
linear 1 indep var     y = b0 + b1x1                          y~x1
... 0 intercept        y = 0 + b1x1                           y~0+x1
linear 2 indep vars    y = b0 + b1x1 + b2x2                   y~x1+x2
... interaction        y = b0 + b1x1 + b2x2 + b12x1x2         y~x1+x2+x1*x2
polynomial             y = b0 + b1x1 + b2x2^2                 y~x1+I(x2^2)

9.3 REGRESSION

Simple linear regression steps:
1. Make sure there is a significant linear correlation.
2. results=lm(y~x)   Linear regression of y on x vectors
3. results   View the results
4. plot(x, y); abline(results)   Plot regression line on data
5. plot(x, results$residuals)   Plot residuals

$y = b_0 + b_1 x_1$   (82)
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$   (83)
$b_0 = \bar{y} - b_1 \bar{x}$   (84)

9.4 PREDICTION INTERVALS

To predict y when x = 5 and show the 95% prediction interval with the regression model in results:
predict(results, newdata=data.frame(x=5), int="pred")

10 ANOVA

10.1 ONE WAY ANOVA

1. results=aov(depVarColName~indepVarColName, data=tableName)   Run ANOVA with data in tableName, factor data in the indepVarColName column, and response data in the depVarColName column.
2. summary(results)   Summarize results
3. boxplot(depVarColName~indepVarColName, data=tableName)   Boxplot of levels for factor
To find required sample size and power see power.anova.test(...)

11 Loading and using external data and tables

11.1 LOADING EXCEL DATA

1. Export your table as a CSV file (comma separated file) from Excel.
2. Import your table into MyTable in R using: MyTable=read.csv(file.choose())

11.2 LOADING AN .RDATA FILE

You can either double click on the .RData file or use the menu:
• Windows: File→Load Workspace...
• Mac: Workspace→Load Workspace File...

11.3 USING TABLES OF DATA

1. To see all the available variables type: ls()
2. To see what's inside a variable, type its name.
3. If the variable tableName is a table, you can also type names(tableName) to see the column names or type head(tableName) to see the first few rows of data.
4. To access a column of data type tableName$columnName

An example demonstrating how to get the women's height data and find the mean:

    > ls() # See what variables are defined
    [1] "women" "x"
    > head(women) # Look at the first few entries
      height weight
    1     58    115
    2     59    117
    3     60    120
    > names(women) # Just get the column names
    [1] "height" "weight"
    > women$height # Display the height data
     [1] 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
    > mean(women$height) # Find the mean of the heights
    [1] 65
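
Continuing with the same women table, the regression steps of section 9.3 and the prediction interval of section 9.4 can be sketched as follows. This assumes the women data set that ships with R's datasets package, which matches the values shown above. The names ct, b0, b1, and pred and the choice of x = 65 are illustrative. The plotting steps are left out so the sketch runs without a graphics device.

```r
# 'women' is a built-in R data set (height in inches, weight in pounds).
x <- women$height
y <- women$weight

# Step 1: check for a significant linear correlation (section 9.1).
ct <- cor.test(x, y)
print(ct$p.value)

# Step 2: fit the model y = b0 + b1*x, eq. (82).
results <- lm(y ~ x)

# Slope and intercept by hand, eqs. (83) and (84), to compare with lm().
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

print(coef(results))
print(c(b0 = b0, b1 = b1))

# Section 9.4: 95% prediction interval for y when x = 65.
# "interval" is the full argument name that int="pred" abbreviates.
pred <- predict(results, newdata = data.frame(x = 65), interval = "prediction")
print(pred)
```

The hand-computed b0 and b1 should agree with coef(results) up to floating-point rounding.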