Understanding AP Stats Exam Results: Regression, Confidence Intervals & Probabilities, Slides of Statistics

Solutions and explanations for six questions from the 2017 AP Statistics Exam. Topics covered include regression analysis, constructing and interpreting confidence intervals, calculating normal and conditional probabilities, and making decisions based on data. Students are encouraged to read carefully, use graphs for clarification, and complete calculations.

Typology: Slides

2021/2022

Uploaded on 08/05/2022

jacqueline_nel
Results from the 2017 AP Statistics Exam
Jessica Utts, University of California, Irvine
Chief Reader, AP Statistics
jutts@uci.edu
July 28, 2017

Plan for each of the six questions
• State question
• Present solution
• Describe common student errors
• Suggest teaching tips
• Report average score (all at the end)

Question 1 – Part (a)
a) For the situation described above, explain what is meant by each of the following words.
i. Positive:
ii. Linear:
iii. Strong:

Solution Part (a): In the context of a scatter plot of y = weight and x = length:
A positive relationship means that wolves with higher values of length also tend to have higher weights.
A linear relationship means that when length increases by one meter, weight tends to change by a constant amount, on average.
A strong relationship means that the data points fall close to a line (or curve).

Common Student Errors, Q1(a)
• Defining a positive relationship by simply saying that there is a positive correlation.
• Using “correlation” to define a linear relationship. A correlation coefficient is a measure of the strength of a linear relationship, but it does not by itself explain the meaning of a linear relationship. It is more appropriate to use correlation to discuss a strong relationship.
• Indicating that a relationship is strong when points in the scatterplot are close together or not too scattered. The response should indicate that a relationship is strong when the points in the scatterplot are close to a line, or more generally, a curve.

• Implying that the slope of a least squares regression line corresponds to an exact relationship between changes in observed values of y as x changes.
• Failure to link the increase in the predicted response to an increase of a specific size in the explanatory variable.
For instance, an unacceptable response is “For any change in length, the predicted weight increases by 35.02 kg.”
(Slide: Common Student Errors, Q1(b))

Question 1 – Part (c)
[The data collected from the wolves were used to create the least-squares equation $\hat{y} = -16.46 + 35.02x$.]
c) One wolf in the pack with a length of 1.4 meters had a residual of -9.67 kilograms. What was the weight of the wolf?

Solution Part (c): In general, residual = actual weight - predicted weight, or equivalently, actual weight = predicted weight + residual. In this situation:
Predicted weight = -16.46 + 35.02(1.4) = 32.568 kg
Actual weight = predicted weight + residual
So y = 32.568 - 9.67 = 22.898 ≈ 22.9 kilograms.

Q1 Teaching tips, continued
• Things to reinforce
  – Don’t use deterministic wording in slope interpretation (include “predicted” or “estimated”)
  – Slope is for a 1 unit change in x, not just a “change”
  – “Strength” is a measure of how closely the data follow a pattern (line or curve), not how close the points are to each other
  – Small residuals can be an indication of the strength of the model
• Use “backwards” questions in class
  – Write the regression equation associated with the following interpretation…
  – Explain what is meant when you have high correlation…

Question #2: Asking for a water cup but filling it with soda
Construct and interpret a confidence interval for a proportion; use it to get a confidence interval for the cost to the restaurant

Question 2 Part (a): The manager of a local fast-food restaurant is concerned about customers who ask for a water cup when placing an order but fill the cup with a soft drink from the beverage fountain instead of filling the cup with water. The manager selected a random sample of 80 customers who asked for a water cup when placing an order and found that 23 of those customers filled the cup with a soft drink from the beverage fountain.
(a) Construct and interpret a 95 percent confidence interval for the proportion of all customers who, having asked for a water cup when placing an order, will fill the cup with a soft drink from the beverage fountain.

Solution Part (a) (Step 2 of 3):
Step 2: Correct mechanics
The sample proportion is $\hat{p} = 23/80 = 0.2875$. The confidence interval is:
$$0.2875 \pm 1.96\sqrt{\frac{0.2875\,(1 - 0.2875)}{80}} = 0.2875 \pm 1.96(0.0506) = 0.2875 \pm 0.0992,$$
that is, 0.1883 to 0.3867.

Solution Part (a) (Step 3 of 3):
Step 3: Interpretation
We can be 95% confident that in the population of all customers at this fast-food restaurant who ask for a water cup, the proportion that will fill it with a soft drink is between 0.1883 and 0.3867.

Common Student Errors, Q2(a)
• Identification of procedure missing or incorrect
• Not checking conditions
• Omitting the large sample condition or verifying only one of the two inequalities
• Mislabeling conditions, e.g. “Independence” for the sample size condition
• Vague reference to a normal distribution
• Stating that the sample or population has a normal distribution (for a categorical variable)
• Inappropriate large sample condition: n ≥ 30
• Using incorrect critical value (wrong z* or a t*) or using t and showing df
• Using an incorrect formula for the standard error of a sample proportion

Solution Part (b): Using the confidence interval in Part (a), a 95% interval estimate for the number of customers in June who ask for a water cup but then fill it with a soft drink is 3000 × 0.1883 to 3000 × 0.3867, or 565 to 1160. At a cost of $0.25 per customer, a 95% interval estimate for the cost to the restaurant in June is $141.25 to $290.00.

Common Student Errors, Q2(b)
• Calculating a single value (point estimate) in part (b) rather than an interval
• Not using the interval from part (a) as directed
• Not showing work in part (b)

Q2 Teaching tips
• In inference questions, ask students to identify the population and parameter of interest.
Encourage students to use the language in the stem of the question when defining the parameter.
• Discuss why each condition is being checked for an inference procedure as well as how to check the condition. Use applets and hands-on activities to demonstrate what happens when each condition isn’t met.
• Insist on proper notation throughout the course; refer students to the formula sheet.

Question 3
A grocery store purchases melons from two distributors, J and K. Distributor J provides melons from organic farms. The distribution of the diameters of the melons from Distributor J is approximately normal with mean 133 millimeters (mm) and standard deviation 5 mm.
(a) For a melon selected at random from Distributor J, what is the probability that the melon will have a diameter greater than 137 mm?

Solution – Part (a)
Let X denote the diameter of a randomly selected melon from Distributor J. X has an approximately normal distribution with mean 133 mm and standard deviation 5 mm. The z-score for a diameter of 137 mm is
$$z = \frac{137 - 133}{5} = \frac{4}{5} = 0.8.$$
Therefore
$$P(X > 137) = P(Z > 0.8) = 1 - 0.7881 = 0.2119.$$

Common Student Errors, Q3(a)
• Thinking that a normal probability problem can be solved using the t-distribution
• Thinking that a normal probability problem always involves using x-bar.
• Thinking that normalcdf(137, 1000000, 133, 5) “shows work;” i.e. identifies parameters and boundary conditions.
• Not knowing the difference between notation for parameters and statistics
• Thinking that you need to adjust 137 as if the normal random variable is discrete and using 137.5 or 138 in place of 137
• Trying to use the Empirical Rule and interpolate probabilities for z values between two whole numbers (0, 1, 2, 3)

Solution to Part (b) using a tree diagram
$$P(G) = P(G \text{ and } J) + P(G \text{ and } K) = 0.1483 + 0.2524 = 0.4007$$

• Not being able to properly create a tree diagram, such as putting joint instead of conditional probabilities on the 2nd set of branches.
• Not recognizing that two events are mutually exclusive, particularly the ends of the branches of a tree diagram.
• Not being able to find the appropriate probabilities from a tree diagram. For example, thinking the conditional branch on the tree is actually the intersection probability.
• Generally not knowing how much work is needed to justify probability calculations. It should be clear where each number in the response came from.
(Slide: Common Student Errors, Q3(b))

Question 3, Part (c)
(c) Given that a melon selected at random from the grocery store has a diameter greater than 137 mm, what is the probability that the melon will be from Distributor J?

Q3 Teaching tips
• In solving problems, model good behavior by showing all work/steps in a probability problem (and inference) all the time.
• Emphasize that the Empirical Rule gives only approximations of normal values and only for 1, 2, and 3 standard deviations from the mean; any sort of interpolation will give incorrect answers.
• When introducing Student’s t distribution, emphasize that the only time the z distribution and t distribution are the same is when the t distribution is based on an infinite number of degrees of freedom.
• In general, it is not a good idea to use “calculator speak” in answering any question.

Q3 Teaching tips, continued
• Show the students problems where there are multiple parts and the answers for the later problems depend on the results from the earlier parts.
• If any continuous approximation of a discrete random variable is taught, explain as much as possible the reason for any adjustment or continuity correction, and that they apply to discrete variables only.
• Give students lots of practice with creating and using tree diagrams to solve word problems. Try “working backwards” by giving a tree diagram and asking them to answer probability problems using it.
• Give lots of examples distinguishing conditional and joint probabilities.
Have students practice setting up probability statements for word problems, even if they don’t solve them.

Question #4: Chemical analysis of clay in pottery to assess origin
Compare boxplots; answer contextual questions based on information in boxplots

Question 4, Part (a)
(a) For chemical Z, describe how the percents found in the pieces of pottery are similar and how they differ among the three sites.

SOLUTION to Part (a)
The median value for the percent of chemical Z in the pottery pieces is similar for all three sites, at about 7%. The ranges for the percent of chemical Z are much different for the three sites, with the smallest range being about 2% (from 6% to 8%) at Site II, a much higher range of about 6% (from about 4% to 10%) at Site I and the largest range of about 8% (from about 3% to 11%) at Site III.

Common Student Errors, Q4(a)
• Some students stated that symmetric boxplots indicate that the distribution is normal.
• Some students described many attributes of the boxplots for Chemical Z at each site but never clearly stated what is similar and what is different. This is a “Laundry List.”
• Some students referred to the interval from the minimum to the maximum as the range.
• Some students compared Chemical Z to Chemicals X and Y within each site rather than comparing Chemical Z to itself across sites.

Question 4, Part (b-i)
Consider a piece of pottery known to have originated at one of the three sites, but the actual site is not known.
(i) Suppose an analysis of the clay reveals that the sum of the percents of the three chemicals X, Y, and Z is 20.5%. Based on the boxplots, which site (I, II, or III) is the most likely site where the piece of pottery originated? Justify your choice.

• Some students chose site III based on the sums of the medians instead of the sums of the minimums and maximums.
An example of why that’s not enough: sum of medians is 20.5; Site III is least likely.
• Some students correctly chose site III but did not state why sites I and II were not the best choices.
(Slide: Common Student Errors, Q4(b-i))

Question 4, Part (b-ii)
[Consider a piece of pottery known to have originated at one of the three sites, but the actual site is not known.]
(ii) Suppose only one chemical could be analyzed in the piece of pottery. Which chemical (X, Y, or Z) would be the most useful in identifying the site where the piece of pottery originated? Justify your choice.

SOLUTION (See next slide for boxplot picture): Chemical Y would be most useful, because the distributions of the percentages of total weight at the three sites do not overlap. The distributions of Chemicals X and Z have substantial overlap.

[Figure: boxplots of Percent of Total Weight for Chemical X, Chemical Y, and Chemical Z at Site I, Site II, and Site III]

Question #5: Gender and age at diagnosis of schizophrenia
Chi-square test for independence

Question 5
The table and the bar chart below summarize the age at diagnosis, in years, for a random sample of 207 men and women currently being treated for schizophrenia.
Do the data provide convincing statistical evidence of an association between age-group and gender in the diagnosis of schizophrenia?

SOLUTION, Step 3:
Step 3: Find the value of the test statistic and the p-value
The test statistic is calculated as $\chi^2 = \sum \frac{(O - E)^2}{E}$, or:
$$\chi^2 = 2.093 + 0.395 + 0.817 + 1.322 + 2.830 + 0.534 + 1.105 + 1.788 = 10.884$$
The p-value is $P(\chi^2 \ge 10.884) = 0.012$, based on (4-1)×(2-1) = 3 degrees of freedom.

SOLUTION, Step 4:
Step 4: State the conclusion in context, with linkage to p-value
Because the p-value is very small (for instance much smaller than α = 0.05), we would reject the null hypothesis and conclude that the sample data provide strong evidence that there is an association between age group at diagnosis and gender for the population currently being treated for schizophrenia.
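The arithmetic in Step 3 can be checked with a short Python sketch. It sums the eight components quoted in the solution and evaluates the upper-tail probability with a closed-form expression that is valid only for 3 degrees of freedom. The helper name `chi2_upper_tail_df3` is made up for this illustration and does not come from the slides.

```python
import math

# Contributions (O - E)^2 / E for the eight cells, as quoted in Step 3.
components = [2.093, 0.395, 0.817, 1.322, 2.830, 0.534, 1.105, 1.788]
chi_sq = sum(components)

# Degrees of freedom for a 4 x 2 table: (rows - 1) * (columns - 1).
df = (4 - 1) * (2 - 1)

def chi2_upper_tail_df3(x):
    """P(chi-square >= x) for exactly 3 degrees of freedom (closed form)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

p_value = chi2_upper_tail_df3(chi_sq)
print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value:.3f}")
```

If SciPy is available, `scipy.stats.chi2.sf(chi_sq, df)` should return the same tail probability without the hand-written formula.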
Common Student Errors, Q5
• Using the idea of sufficient evidence (given in the stem of the problem) to state the hypotheses: “H0: there is sufficient evidence of no association” and “Ha: there is sufficient evidence of an association.”
• Listing incorrect conditions like n > 30, or “both samples independent”
• Stating the condition “expected counts > 5” but not verifying by computing them.
• Writing statistical conclusions as definitive statements “we conclude …” or “we prove”
• Stating conclusions about the sample data using the bar graph, but not carrying out any inference.

Question #6: Coin vs chip method for randomizing
Calculate probabilities for each method; decide which is best in a given situation

Question 6
6. Consider an experiment in which two men and two women will be randomly assigned to either a treatment group or a control group in such a way that each group has two people. The people are identified as Man 1, Man 2, Woman 1, and Woman 2. The six possible arrangements are shown below.

Question continued
Two possible methods of assignment are being considered: the sequential coin flip method, as described in part (a), and the chip method, as described in part (b). For each method, the order of the assignment will be Man 1, Man 2, Woman 1, Woman 2.

Question Part (a-ii)
ii) For the sequential coin flip, what is the probability that Man 1 and Man 2 are assigned to the same group?

SOLUTION Part (a-ii)
ii) Man 1 and Man 2 are assigned to the same group for arrangements A and D, so the probability is P(A) + P(D) = 1/4 + 1/4 = 1/2.

b) For the chip method, two chips are marked “treatment” and two chips are marked “control.” Each person selects one chip without replacement.
i) Complete the table below by calculating the probability of each arrangement occurring if the chip method is used.

Arrangement   A   B   C   D   E   F
Probability

part (b) (i)
Let T represent being assigned to the treatment group and C represent being assigned to the control group for each chip drawn.
The process stops when either the treatment group or the control group has two members. The probabilities differ from the coin flip method because chips are drawn without replacement. The outcomes and their probabilities are as follows.

Arrangement   Chip outcomes   Calculation        Probability
A             TT              (2/4)(1/3)         1/6
B             TCT             (2/4)(2/3)(1/2)    1/6
C             TCC             (2/4)(2/3)(1/2)    1/6
D             CC              (2/4)(1/3)         1/6
E             CTC             (2/4)(2/3)(1/2)    1/6
F             CTT             (2/4)(2/3)(1/2)    1/6

(Slide: SOLUTION, Part (b-i))

Common Errors and Teaching Tips: parts (a) and (b)
• In part (ii), when attempting to find the combined probability of arrangements A and D, some students incorrectly multiplied the two probabilities rather than adding them.
  – TIP: Make sure students understand the difference between P(A or D) and P(A and D).
• In part (ii), when attempting to find the combined probability of arrangements A and D, some students incorrectly subtracted P(A)×P(D) from P(A) + P(D).
  – TIP: When using the addition rule, remind students that P(A and D) = 0 when events are mutually exclusive.

Part (c)
Sixteen participants consisting of 10 students and 6 teachers at an elementary school will be used for an experiment to determine lunch preference for the school population of students and teachers. As the participants enter the school cafeteria for lunch, they will be randomly assigned to receive one of two lunches so that 8 will receive a salad, and 8 will receive a grilled cheese sandwich. The students will enter the cafeteria first, and the teachers will enter next. Which method, the sequential coin flip method or the chip method, should be used to assign the treatments? Justify your choice.

Use the chip method (from part b). The chip method gives equal probability to all possible arrangements, but the coin method does not, as shown in the tables from parts (a-i) and (b-i).
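The claim that the chip method treats all six arrangements equally while the coin method does not can be checked by enumeration. The sketch below is illustrative only: the function name `outcome_probabilities` is hypothetical, and the coin rule used here (a fair flip for each person in order, stopping once either group has two members) is an assumption, because the part (a) wording is not reproduced in these slides. Sequence labels follow the chip-outcome table above (A = TT, B = TCT, C = TCC, D = CC, E = CTC, F = CTT).

```python
from fractions import Fraction

def outcome_probabilities(method):
    """Map each assignment sequence (e.g. "TCT") to its probability.

    method is "chip" (two T chips and two C chips, drawn without
    replacement) or "coin" (assumed: fair flip, T and C equally likely).
    Assignment stops once either group has two members.
    """
    results = {}

    def step(seq, prob):
        t_count, c_count = seq.count("T"), seq.count("C")
        if t_count == 2 or c_count == 2:
            results[seq] = prob
            return
        for label, used in (("T", t_count), ("C", c_count)):
            if method == "chip":
                # Chips of this label still in the bag / chips left in the bag.
                p = Fraction(2 - used, 4 - len(seq))
            else:
                p = Fraction(1, 2)
            step(seq + label, prob * p)

    step("", Fraction(1))
    return results

chip = outcome_probabilities("chip")
coin = outcome_probabilities("coin")

# Probability that Man 1 and Man 2 share a group (arrangements A and D).
chip_same = chip["TT"] + chip["CC"]
coin_same = coin["TT"] + coin["CC"]

for seq in ("TT", "TCT", "TCC", "CC", "CTC", "CTT"):
    print(seq, "chip:", chip[seq], "coin:", coin[seq])
print("men together, chip:", chip_same, "coin:", coin_same)
```

Under these assumptions every chip sequence comes out at 1/6, while the coin sequences TT and CC each get 1/4 and the other four get 1/8. The two men therefore end up together with probability 1/3 under the chip method and 1/2 under the coin method.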
Furthermore, the coin method is more likely to result in imbalanced treatment groups with regard to students and teachers, based on the probabilities in parts (a-ii) and (b-ii). If food preferences for teachers are different than for students, this imbalance is a problem. For example, if one treatment group consists entirely of students, it would be impossible to know if a difference in the response variable is due to the treatment (type of meal) or the role of the person at the school (teacher or student).
(Slide: SOLUTION, Part (c))

Question averages
1. 1.72 (wolves, regression)
2. 2.22 (water cups, C.I.)
3. 1.72 (melons, probability)
4. 1.71 (pottery, boxplots)
5. 1.51 (schizophrenia, chi-sq.)
6. 0.99 (coins/chips, random assign)

Thank You.
A pdf version of these slides will be posted on my homepage within the next week. Scroll to “Representative presentations.”
http://www.ics.uci.edu/~jutts/ (or search my name to easily find it)
Presentations from 2014 to 2016 are there already.