Final Exam Study Guide with Answers - Introductory Statistics | STAT 2000, Study notes of Statistics

Final Exam Study Guide with Answers. Material Type: Notes; Professor: Morse; Class: Introductory Statistics; Subject: Statistics; University: University of Georgia; Term: Spring 2014

Typology: Study notes

2013/2014

Uploaded on 05/01/2014

ssarabethh

Important Terms
• Population - Total set of subjects in which we are interested
• Sample - A subset of the population for which we have data
• Subject - Entities we measure (individuals)

Histogram Interpretation (HW 2.2)
- How many total students were sampled? 60 + 80 + 60 + 40 = 240
- Which class has the highest / lowest frequency? What are those frequencies? Highest: "100-109" with 80. Lowest: "120-129" with 40.
- How many students have an IQ between 100 and 119? 80 + 60 = 140

Sampling Methods
• Stratified Sampling - Taking some subjects from all possible groups.
• Cluster Sampling - Taking all subjects from some possible groups.

Skewness
• Symmetric: mean = median
• Skewed left: mean < median (the < looks like an L, as in Left Skewed)
• Skewed right: mean > median (the > looks like part of an R, as in Right Skewed)

Outliers
• The mean is sensitive to outliers.
• The median is resistant to outliers.
• When outliers are present, it is best to use the median as the measure of center.
• Examples:
- Earthquake magnitudes on the Richter Scale (skewed right, since there are some, but very few, big earthquakes)
- Ages of MENSA members at the time they joined (skewed left, since most were adults, but a few children had high enough IQs)

Box Plot (HW 2.5-2.6)
• What proportion of states have taxes ...
- Greater than 31 cents? .75
- Greater than $1.05 (105 cents)? .25
- Between what two values are the middle 50% of the data found? (31, 105)
- What is the range? Range = maximum - minimum = 206 - 2.6 = 203.4

Box Plot Outlier (HW 2.5-2.6)
• Any point lying above Q3 + 1.5 × IQR is an outlier.
• Any point lying below Q1 - 1.5 × IQR is also an outlier.
• Are there any outliers on this box plot?
IQR = Q3 - Q1 = 1105 - 256 = 849
Q1 - 1.5 × IQR = 256 - 1.5 × 849 = -1017.5; we have no lower outliers.
Q3 + 1.5 × IQR = 1105 + 1.5 × 849 = 2378.5; we have an upper outlier.
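The 1.5 × IQR fence rule above is easy to check by machine. This is a minimal Python sketch, not part of the original notes: the function names `outlier_fences` and `find_outliers` are hypothetical, and the list of values passed to `find_outliers` is made-up data, since the box plot's raw values are not given.

```python
def outlier_fences(q1: float, q3: float) -> tuple[float, float]:
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values: list[float], q1: float, q3: float) -> list[float]:
    """Return every value lying strictly outside the 1.5 x IQR fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Quartiles from the box-plot outlier example (Q1 = 256, Q3 = 1105).
lower, upper = outlier_fences(256, 1105)
print(lower, upper)  # -1017.5 2378.5

# Hypothetical data values, for illustration only.
print(find_outliers([300, 900, 2500], 256, 1105))  # [2500]
```

A point exactly on a fence is not flagged here, matching the "above" / "below" wording of the rule.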
Mean & Median (HW 2.3-2.4)
• This chart shows the number of grams of protein in various brands of loaves of bread. Compute the mean and median of the data set. What can you say about the shape of the distribution?

Protein (g)   Count
0             15
1             16
2             21
3             4
Total         56

mean = $\frac{(0)(15) + (1)(16) + (2)(21) + (3)(4)}{56} = \frac{70}{56} = 1.25$
For the median, find half the total count (56/2 = 28): with 56 loaves, the median is the average of loaves #28 and #29, so we need to find where they fall.
It's not in Row 0, since Row 0 holds only the first 15 loaves.
After Row 1, we have 15 + 16 = 31 loaves.
Median = 1, since loaves #28 and #29 both fall in Row 1.
Mean > median → somewhat skewed right

Example (HW 2.3-2.4)
• The weight of a house cat is bell-shaped with mean 14 pounds and standard deviation 2.5 pounds.
- Find an interval within which about 95% of house cat weights will fall. By the Empirical Rule, we go out 2 standard deviations from the mean: (14 − 2 × 2.5, 14 + 2 × 2.5) = (9, 19)
- What z-score represents a house cat that is 2.8 standard deviations to the right of the mean? What weight is that? z = 2.8
$z = \frac{x - \bar{x}}{s} \Rightarrow 2.8 = \frac{x - 14}{2.5} \Rightarrow x = (2.8)(2.5) + 14 = 21$ pounds

Relative Risk (HW 3.1)
$\text{relative risk} = \frac{\text{conditional proportion for first group}}{\text{conditional proportion for second group}}$
* The first group is the one with the larger of the two proportions.
• Relative risk tells us how many times more likely the outcome is for one group than for the other group.
• The following three facts therefore follow:
1. Relative risk ≥ 1.
2. When the numerator and denominator proportions are very similar, relative risk will be very close to 1.
3. When the numerator is quite a bit larger, relative risk will be quite a bit greater than 1.

Relative Risk (HW 3.1)

             HIV+   HIV-   Total
age ≥ 30     39     816    855
age < 30     18     623    641
Total        57     1439   1496

- Find the proportion of people who are at least 30 that are HIV+. 39/855 = 0.0456
- Find the proportion of people under 30 who are HIV+. 18/641 = 0.0281
- Find the relative risk of being HIV+ for both groups.
Look at the HIV+ proportions: Larger / Smaller = 0.0456 / 0.0281 = 1.623 (1.624 if the unrounded proportions are used)
- People who are at least 30 are 1.623 times more likely to be HIV+ than people who are under 30.

Scatter Plots
[Figure: scatter plots illustrating a strong positive correlation and a weak negative correlation]

Least-Squares Regression
ŷ = a + bx
• x = given data point
• ŷ = predicted response
• a = intercept
- Predicted response when x = 0
- May not always have a practical interpretation!
• b = slope
- How much the predicted response increases (or decreases) for every one-unit increase in x
• residual = observed - predicted = y − ŷ

Regression (HW 3.2-3.4)
• Analysis says that we can use the length of an alligator (in feet) to predict its weight (in pounds). The equation is given by ŷ = 10 + 40x.
- Find the expected weight of an alligator that's 10 feet long. ŷ = 10 + 40(10) = 410 pounds
- Suppose an alligator that's 10 feet long actually weighs 402 pounds. Calculate the residual. Observed - Predicted = 402 - 410 = -8 (so we overestimated)
- Interpret the slope. For every additional foot in length, an alligator's weight is expected to increase by 40 pounds.
- Interpret the intercept. Literally, an alligator with length 0 would weigh 10 pounds, which makes no sense! So the intercept has no practical interpretation here.

Discrete Probability Distributions
Two requirements:
1. Each individual p(x) is between 0 and 1, inclusive.
2. All probabilities sum to 1.
The mean of a discrete distribution: MEAN = ∑ x · p(x)
Also called the average or expected value.

Mean of a Distribution (HW 6.1-6.2)
• Here's a table for the probability of the number of annual hurricanes in Miami.

Category   Probability
1          0.15
2          0.32
3          0.36
4          0.12
5          0.05

• Find the missing value and the mean/expected value of this data set.
The table is shown completed: the probabilities must sum to 1, and 0.15 + 0.32 + 0.36 + 0.12 + 0.05 = 1.
The mean/expected value is (1)(0.15) + (2)(0.32) + (3)(0.36) + (4)(0.12) + (5)(0.05) = 2.6.

Normal Distribution (Continuous)
Three rules:
1. Total probability/area under the normal curve is 1.
2. The normal curve is symmetric.
3.
X value / z-score goes in the left box; probability goes in the right box on StatCrunch.

Percentiles
• The Pth percentile is the x that has P% of the distribution below it on the normal curve.
- Example: if 30% of the data falls below x, then x is the 30th percentile.

3 Types of Distributions
1. Population - Distribution of all points in the population
2. Sample Data / Data - Distribution of one specific sample
3. Sampling Distribution of the Sample Means - Distribution of the sample means for samples of a given size n
- If you repeatedly draw samples and take their sample averages, the collection of all the sample averages is the sampling distribution.

Means Problems

                        Mean   Standard Deviation
Population              µ      σ
Sample Data             x̄      s
Sampling Distribution   µ      σ/√n

Mean & Standard Error Properties
• As the sample size n increases ...
- The mean of the sampling distribution does not change.
- The standard error (s.d. of the sampling distribution) decreases.
- Example: σ/√4 > σ/√9 (larger denominator, smaller overall fraction).
• Similarly, as the sample size decreases ...
- The mean of the sampling distribution does not change.
- The standard error increases (the opposite).

Distributions (HW 7.1-7.2)
• The lifetime of a certain brand of plasma television (time until it quits working) has a distribution that is skewed right with a mean of 137 months and an s.d. of 26 months. A sample of 98 televisions is selected, and the sample has a mean of 129.8 and an s.d. of 15.5.
- What is the center and spread of the population? Center is 137 and spread is 26.
- What is the shape of the population? Skewed right
- What is the center and spread of the sample data? Center is 129.8 and spread is 15.5.
- What shape is the sample data? Skewed right

Distributions (HW 7.1-7.2)
• The television data ...
- What is the center and spread of the distribution of the sample mean with a sample size of 98? Center is 137 and spread is 26/√98 = 2.626.
- What is the shape of the distribution of the sample mean?
Approximately normal, since the sample size is greater than 30 (Central Limit Theorem)

Proportions Problems

                        Mean   Standard Deviation
Population              p      $\sqrt{p(1-p)}$
Sample Data             p̂      $\sqrt{\hat{p}(1-\hat{p})}$
Sampling Distribution   p      $\sqrt{p(1-p)/n}$ (the standard error)

The sampling distribution of the sample proportion p̂ is approximately normal when np ≥ 15 and n(1 − p) ≥ 15.

Proportions (HW 7.1-7.3)
• 57% of students at UGA are female. In a random sample of 58 students, 31 of them are female.
- What is the mean and standard error of the distribution of the sample proportion? The mean is .57 and the standard error is $\sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.57(1-.57)}{58}} = .06501$
- Is the sampling distribution approximately normal? Yes, because np = 58 × .57 = 33.06 and n(1 − p) = 58 × (1 − .57) = 24.94, which are both greater than 15.

Proportions (HW 7.1-7.3)
• Now, given that the mean is .57 and the standard error is .06501, find the probability that we would randomly select a sample of size 58 with a sample proportion smaller than what we got (31/58 = .53448).

Notation
- What symbol is used to denote the population standard deviation? σ - population standard deviation
- What symbol is used to estimate the population proportion? p̂ - sample proportion
- What symbol is used to describe the center in one sample? x̄ - sample mean

Confidence Intervals
• Calculate the sample mean/proportion from the data.
• Point Estimate = sample mean/proportion
• Calculate the width based on the level of confidence and the standard error.
• You create a range of plausible values for the true population mean/proportion.

Confidence Intervals for Proportions
(point estimate) ± (multiplier for the confidence level) × (standard error)
$\hat{p} \pm z \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
• p̂ = point estimate
• z depends on the confidence level
• $\sqrt{\hat{p}(1-\hat{p})/n}$ = standard error
• $z \times \sqrt{\hat{p}(1-\hat{p})/n}$ = margin of error (the half-width of the C.I.)

Computing A Confidence Interval
• First get the point estimate (p̂ or x̄).
• Get the margin of error.
- Is it given to you?
- z*(standard error) or t*(standard error)
- Also (upper limit - point estimate); this can only be used if the interval is already provided
• Having obtained the necessary numbers, compute (point estimate ± margin of error).
• For a 95% C.I., you are 95% confident the population mean/proportion is in the interval.

Determining z
• z depends on the level of confidence
- 95% C.I.: z = 1.96 (memorize)
- For others: use at least 5 decimals
• To get these numbers ...
- 95%: 5% is left over
- Half of that is 2.5%
- P(Z ≥ ?) = .025 in StatCrunch
• What about 85%? 15% is left over. Half of that is 7.5%, so .075 goes in the right box.

Determining t
• Only new feature: degrees of freedom
• However, you do not need to specify a mean and s.d.
• DF = n - 1
• Same strategy as before: with 95%
- 5% is left over, and half of that is 2.5%
- P(T ≥ ?) = .025, with DF = 17 for 18 observations

Another Way to Interpret a C.I.
• A 95% C.I. also means that about 95% of all C.I.s constructed this way contain the true population mean/proportion, and about 5% do not.
• A 99% C.I. means that about 99% of all C.I.s constructed this way contain the true population mean/proportion, and about 1% do not.
• Example: 1000 intervals
- At 95%, about 950 (give or take) contain the true proportion
- At 99%, about 990 (give or take) contain the true proportion

Proportions C.I. (HW 8.1-8.2)
• A random sample of 970 people were asked if they owned a rat. 19 said yes.
- Find a point estimate for the proportion of people who said yes. p̂ = 19/970 = .01959
- If the margin of error is .00872, find the 95% confidence interval. .01959 ± .00872 = (.01087, .02831)
- Will we get a valid confidence interval? Yes, because:
np̂ = 970 × .01959 = 19.0023 ≥ 15
n(1 − p̂) = 970 × (1 − .01959) = 950.9977 ≥ 15

Proportions C.I. (HW 8.1-8.2)
• Now suppose another, different sample for owning a rat gives a 95% confidence interval of (.03, .09).
- If possible, find the population proportion and the new sample proportion.
We cannot compute the population proportion: it's unknown.
The sample proportion is in the center: (.03 + .09)/2 = .06 = p̂
- What is the new margin of error? The distance between the center and an endpoint: .09 − .06 = .03
- Can we conclude that fewer than 12% of people own a rat? Yes, because .12 lies above the interval.
- Can we conclude that fewer than 2% of people own a rat? No, because the lowest value in the interval is .03, so every plausible value is above .02.

C.I. Properties
• Increasing the sample size shortens the C.I.
• Decreasing the sample size widens the C.I.
• This is because the standard error decreases as n increases, so the margin of error (and the width) decreases as well:
$\hat{p} \pm z\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ or $\bar{x} \pm t\frac{s}{\sqrt{n}}$
• Intuition: a larger sample size gives a more accurate estimate and allows you to zero in on the true proportion.

Summary of C.I. Width Factors
Confidence Level (z)
• As z or t increases, the C.I. widens
• As z or t decreases, the C.I. shortens
Sample Size (n)
• As n increases, the C.I. shortens
• As n decreases, the C.I. widens
• Assumptions for a C.I.:
- Sample is randomly selected
- For proportions: np̂ ≥ 15 and n(1 − p̂) ≥ 15
- For means: population is normal or n > 30

Proportions Summary
Assumptions for a Valid Confidence Interval
• Random sample
• Need:
- np̂ ≥ 15 and
- n(1 − p̂) ≥ 15
Finding Sample Size
$n = \frac{\hat{p}(1-\hat{p})z^2}{m^2}$ (m = desired margin of error; round n up to the next whole number)
Confidence Interval
• Point Estimate = p̂
• Standard Error = $\sqrt{\hat{p}(1-\hat{p})/n}$
• Level of Confidence = use z
• Margin of Error = $z \times \sqrt{\hat{p}(1-\hat{p})/n}$
• Lower Limit = $\hat{p} - z \times \sqrt{\hat{p}(1-\hat{p})/n}$
• Upper Limit = $\hat{p} + z \times \sqrt{\hat{p}(1-\hat{p})/n}$

Example Hypotheses
• Left-tailed: H0: p = .31, HA: p < .31
• Right-tailed: H0: p = .56, HA: p > .56
• Two-tailed: H0: µ = 11, HA: µ ≠ 11

Hypothesis Testing Steps
Proportions
• H0: p = p0
• HA: p >, <, or ≠ p0
• $z = \frac{\hat{p} - p_0}{se}$, where $se = \sqrt{\frac{p_0(1-p_0)}{n}}$
• p-value
• conclusions
Means
• H0: µ = µ0
• HA: µ >, <, or ≠ µ0
• $t = \frac{\bar{x} - \mu_0}{se}$, where $se = \frac{s}{\sqrt{n}}$
• p-value
• conclusions

Conclusions
If p-value ≤ α (alpha)
- Reject H0
- Test is significant
- There is enough evidence to suggest a change, increase, etc.
- Strong evidence against the null
- Possible Type I Error
If p-value > α
- Fail to reject H0 (but never accept it!)
- Test is not significant
- There is insufficient evidence to suggest a change, increase, etc.
- No strong evidence against the null
- Possible Type II Error

Designed Experimental Study
• Manipulates the subjects somehow
• Can be used to establish causation
• Subjects randomly divided into groups
• Examples:
- Does a coupon attached to a catalogue make recipients more likely to order?
- Does a new medicine reduce the frequency of headaches?

Observational Study
• Measures qualities of subjects without manipulating them
• Cannot be used to prove causation, only that the variables are related.
• Subjects are not randomly assigned to groups
• Examples:
- Whether or not smoking has an effect on heart disease (can't assign groups)
- Are higher SAT scores positively correlated with higher college GPAs?

Designed Experiments
• Experimental Unit (subject) - The person/object that receives the treatment
• Treatment - A condition/drug/etc. applied to the subject
• Response Variable - The categorical/quantitative variable of interest. We believe it's affected by the explanatory variable.
• Explanatory Variable - Variable we believe influences the response

Blinding
• Double Blind
- Neither the subjects nor the researchers know who is getting which treatment (this is preferred, as bias is lowest here)
- The key is only revealed afterwards
• Single Blind
- Subjects don't know, but the researchers do
- We have possible "researcher bias" here

Designed Experiment (HW 4.1-4.4)
• We are testing the effects of a new energy drink on heart rate. 50 subjects are randomly assigned to consume the energy drink, while a different 50 drink a similar-tasting drink that is not an energy drink. The subjects' heart rates are recorded, and the researchers know which drink each subject gets.
- Response: Heart rate
- Explanatory: Type of drink
- Treatments: Energy and generic drinks
- Experimental Units: The 100 subjects
- Is this completely randomized or matched pairs? Completely randomized, since the groups are randomly assigned
- Is this single or double blind? Single, since the researchers know the assignment

Hypothesis Testing: 2 Dependent Samples
• 2 populations and dependent samples
• Ex: Twins used for the two groups, or the same person in both groups → dependent
• Use matched pairs (means only)
• Matched pairs is for dependent samples
H0: µd = 0
HA: µd ≠ 0 or µd < 0 or µd > 0
$t = \frac{\bar{x}_d - 0}{s_d/\sqrt{n}}$

Comparing Two Independent Means
• It's incorrect to choose "Paired" here (since that's for dependent means)
• STAT > T-Statistics > 2 Sample
H0: µ1 = µ2 or µ1 − µ2 = 0
HA: µ1 > µ2 or µ1 − µ2 > 0
HA: µ1 < µ2 or µ1 − µ2 < 0
HA: µ1 ≠ µ2 or µ1 − µ2 ≠ 0

Two Proportions (HW 10.1-10.4)
• We want to determine whether the proportion of fire stations owning a Dalmatian is significantly different in the southeast than in the northeast. In the southeast, 141 out of 200 fire stations owned a Dalmatian, compared with 190 out of 260 in the northeast. Set up the hypotheses.
H0: p1 = p2 or p1 − p2 = 0
HA: p1 ≠ p2 or p1 − p2 ≠ 0

Two Proportions (HW 10.1-10.4)
• Here is the output. We (do / do not) have strong evidence that there are different results. We (don't reject / reject) the null at α = .05. The test (is / is not) significant. The 95% confidence interval for the difference in proportions (will / will not) contain 0. If we make the wrong conclusion, it would be a (Type I / Type II) error.
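Since the StatCrunch output for the Dalmatian question is not reproduced here, the test can be sketched directly. This is a minimal Python sketch assuming the standard pooled two-proportion z-test for H0: p1 = p2 against a two-sided alternative; the function name `two_proportion_z_test` is hypothetical, and StatCrunch's printed numbers may differ slightly in rounding.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Pooled two-proportion z-test of H0: p1 = p2 vs HA: p1 != p2.

    Returns (z statistic, two-tailed p-value).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Under H0 both groups share one proportion, estimated by pooling the counts.
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-tailed: area in both tails beyond |z| under the standard normal curve.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Southeast: 141 of 200 stations; northeast: 190 of 260 stations.
z, p = two_proportion_z_test(141, 200, 190, 260)
alpha = 0.05
print(f"z = {z:.3f}, p-value = {p:.3f}")
print("reject H0" if p <= alpha else "fail to reject H0")
```

With these counts, p̂1 = .705 and p̂2 ≈ .731; this sketch gives z ≈ -0.61 and a p-value of about .54, well above α = .05, which is what the blanks in the output question should be checked against.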