Data Analysis: Comparing Graphs and Statistics for Different Scenarios - Prof. Nancy M. Pf, Exams of Data Analysis & Statistical Methods

This practice exam presents several scenarios in which different data displays and statistical summaries are used to compare and summarize data. The scenarios involve comparing percentages, means and standard deviations, five-number summaries, and correlations for data such as exercise times, student preferences, and waiting times. The exam also uses stemplots and boxplots to compare distributions and to determine which arrangement is faster or has more spread. It additionally covers the role of seat position on buses in relation to passenger nausea and the relationship between cereal moisture content and shelf time.

Typology: Exams

Pre 2010

Uploaded on 09/09/2009

koofers-user-78j

Name:

Practice First Midterm Exam
Statistics 1000, Spring 2007 (Pfenning)

This is a closed book exam worth 150 points. You are allowed to use a calculator and a two-sided sheet of notes. There are 9 problems, with point values as shown. If you want to receive partial credit for wrong answers, show your work. Don’t spend too much time on any one problem.

1. (5 pts.) Suppose we are interested in finding out if students tend to sleep less, the older they are.
(a) What would be an appropriate display? (i) bar graph (ii) histogram (iii) side-by-side boxplots (iv) scatterplot
(b) Which of these would provide the best summary? (i) compare percentages (ii) compare means and standard deviations (iii) compare Five Number Summaries (iv) report the correlation

2. (5 pts.) Suppose we are interested in finding out if smokers exercise less than non-smokers. Data values for exercise times include some high outliers.
(a) What would be an appropriate display? (i) bar graph (ii) histogram (iii) side-by-side boxplots (iv) scatterplot
(b) Which of these would provide the best summary? (i) compare percentages (ii) compare means and standard deviations (iii) compare Five Number Summaries (iv) report the correlation

3. (5 pts.) Suppose we are interested in finding out if males are just as likely as females to prefer the color black.
(a) What would be an appropriate display? (i) bar graph (ii) histogram (iii) side-by-side boxplots (iv) scatterplot
(b) Which of these would provide the best summary? (i) compare percentages (ii) compare means and standard deviations (iii) compare Five Number Summaries (iv) report the correlation

4. (10 pts.) Words per minute typed by experienced typists follows a normal distribution with mean 60 and standard deviation 15.
(a) According to the 68-95-99.7 Rule, 95% of experienced typists type between ____ and ____ words per minute.
(b) Suppose an experienced typist can type 78 words per minute. What is his standard (z) score?

5. (20 pts.) Two banks each have three tellers helping customers. One bank requires customers to stand in separate lines for the three tellers, the other has customers stand in a single line and be called to the next available teller. Below are a back-to-back stemplot and side-by-side boxplot for waiting times (stems are minutes) of 10 customers at the bank with separate lines and 11 customers at the bank with a single line.

    Separate |    | Single
           1 |  4 |
         6 2 |  5 |
         7 2 |  6 | 6 7 7 9
         7 7 |  7 | 1 2 3 4 7 8 8
           5 |  8 |
           3 |  9 |
             | 10 |
           0 | 11 |

(a) Judging from the looks of the stemplot, which arrangement seems to be faster? (i) separate lines (ii) single line (iii) both about the same
(b) For which arrangement do the waiting times have more spread? (i) separate lines (ii) single line (iii) both about the same
(c) One fourth of the customers in the bank with separate lines waited ____ minutes or less. (Find Q1.)
(d) The boxplots indicate that both distributions are (i) very left-skewed (ii) fairly symmetric (iii) very right-skewed

7. (30 pts.) Cereal manufacturers looked at the relationship between number of days x that 14 cereal boxes spent on the supermarket shelf, and moisture content y. Scatterplot and regression output are given below.
(a) What is the response variable?
(b) Sitting on the shelf tends to make cereal (i) dryer (ii) soggier (iii) neither
(c) Which of the following is the best guess for r? (i) -.95 (ii) -.55 (iii) -.15 (iv) .15 (v) .65 (vi) .95
(d) If we switched the roles of x and y, then which of the following would change? (i) the value of r (ii) the equation of the regression line (iii) both (iv) neither
(e) Predict the moisture content of a cereal box that sat on the shelf for 10 days.
(f) What is the residual for a shelf time of 10 days, if the actual moisture content was 3.40?
(g) Suppose a supermarket accidentally kept a cereal box on the shelf for 100 days. What can we say about its moisture content?
i. It should equal 7.29.
ii. It should be very close to the predicted value because of the high x value.
iii. It could be far from the predicted value because of extrapolation.
(h) The box which spent 20 days on the shelf is an (i) outlier (ii) influential observation (iii) both (iv) neither
(i) Taste tests indicated that the cereal is unacceptably soggy when the moisture content exceeds 4.1. Judging from the scatterplot, what would be a good time to remove unsold cereal from the shelf? After (i) a day (ii) a week (iii) a month (iv) a year

Regression Analysis: moisture versus days

The regression equation is
moisture = 2.79 + 0.045 days

Predictor      Coef   SE Coef      T      P
Constant    2.78551   0.09485  29.37  0.000
days       0.044620  0.004113  10.85  0.000

S = 0.1962   R-Sq = 90.7%   R-Sq(adj) = 90.0%

Unusual Observations
Obs  days  moisture     Fit  SE Fit  Residual  St Resid
  8  20.0    3.1000  3.6779  0.0525   -0.5779    -3.06R

R denotes an observation with a large standardized residual

[Scatterplot: moisture (vertical axis, ticks at 3, 4, 5) versus days (horizontal axis, ticks at 0, 10, 20, 30, 40)]

8. (20 pts.) 350 students at 18 Seattle schools in high-crime areas participated in a study during the 1980’s. About half of the students took part in a program throughout elementary school which trained them how to earn good grades and get along with others; the other half did not take part in the program. The pregnancy rate for young women in the program, by the time they reached the age of 21, was only 38 percent, compared with 56 percent for the women who had gotten no training.
(a) What kind of study was this? (i) observational study (ii) experiment (iii) anecdotal evidence (iv) multistage sample
(b) Which of these best describes the intended population of interest?
i. 350 students at 18 Seattle schools in high crime areas
ii. all students at Seattle schools
iii. all students at schools in high crime areas
(c) The treatment group’s pregnancy rate was how much lower than the rate for the control group?
(d) Which of the following could be a possible lurking (confounding) variable?
i. if students in one group had a different Health and Sex Ed teacher than those in the other group
ii. if students in one group were trained to get along with others and students in the other group were not
iii. if female students in one group tended to get pregnant and those in the other group did not
(e) What would be the best way for researchers to assign some students to attend the program, others not? (i) put males in one group and females in the other (ii) ask for volunteers (iii) make a random assignment
(f) This problem involves (i) one quantitative and one categorical variable (ii) two quantitative variables (iii) two categorical variables
(g) To summarize differences, we (i) compare percentages (ii) compare means (iii) report the correlation r
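
For the computational parts of the exam (the 68-95-99.7 interval and z-score in Problem 4, and the prediction and residual in Problem 7), the arithmetic can be checked with a short script. The following is only a sketch in Python, which the exam itself does not use. The function names are hypothetical, and the default intercept and slope are copied from the Coef column of the printed regression output rather than from the rounded equation.

```python
# Sketch: arithmetic behind Problems 4 and 7, using only numbers printed in the exam.
# Function names are illustrative assumptions, not part of the original exam.


def z_score(x, mean, sd):
    """Standardized score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def middle_95_interval(mean, sd):
    """68-95-99.7 Rule: about 95% of a normal distribution lies within 2 sd of the mean."""
    return (mean - 2 * sd, mean + 2 * sd)


def predict_moisture(days, intercept=2.78551, slope=0.044620):
    """Fitted moisture from the least-squares line (coefficients from the Coef column)."""
    return intercept + slope * days


def residual(observed, predicted):
    """Residual = observed response minus predicted response."""
    return observed - predicted


if __name__ == "__main__":
    # Problem 4: typing speeds with mean 60 and standard deviation 15.
    print(middle_95_interval(60, 15))
    print(z_score(78, 60, 15))

    # Problem 7(e) and 7(f): shelf time of 10 days, observed moisture 3.40.
    fit_10 = predict_moisture(10)
    print(round(fit_10, 4))
    print(round(residual(3.40, fit_10), 4))

    # Sanity check against the printed "Unusual Observations" row (days = 20).
    fit_20 = predict_moisture(20)
    print(round(fit_20, 4), round(residual(3.1, fit_20), 4))
```

As a consistency check, the fitted value and residual computed at 20 days agree with the Fit (3.6779) and Residual (-0.5779) columns in the printed output. Using the rounded equation moisture = 2.79 + 0.045 days instead gives slightly different values, which is where the 7.29 in part (g) of Problem 7 comes from.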