SOLUTIONS TO BIOSTATISTICS PRACTICE PROBLEMS, Assignments of Biostatistics


Typology: Assignments

2021/2022
Uploaded on 08/01/2022

fioh_ji
BIOSTATISTICS DESCRIBING DATA, THE NORMAL DISTRIBUTION SOLUTIONS

1.

a. To calculate the mean, we add up all 7 values and divide by 7. In fancy statistical notation:

$$\bar{X} = \frac{\sum_{i=1}^{7} X_i}{7} = \frac{12.0 + 9.5 + 13.5 + 7.2 + 10.5 + 6.3 + 12.5}{7} = 10.2 \text{ years}$$

b. To calculate the sample median, first rank the values from lowest to highest: 6.3 7.2 9.5 10.5 12.0 12.5 13.5. Since there are 7 values, an odd number, the sample median is simply the middle value, 10.5.

c. It's a good thing we have calculated the sample mean: we need it to calculate the sample standard deviation! Recall the formula for SD:

$$SD = \sqrt{\frac{\sum_{i=1}^{7} (X_i - \bar{X})^2}{7 - 1}} = \sqrt{\frac{(12.0 - 10.2)^2 + (9.5 - 10.2)^2 + \cdots + (12.5 - 10.2)^2}{6}} = 2.71 \text{ years}$$

d.
1. sample mean: would decrease, as the lowest value gets lower, pulling down the mean.
2. sample median: would remain the same, since the middle value is still 10.5. Replacing the 6.3 with 1.5 does not affect the rank order of the 7 values.
3. sample standard deviation: would increase. Because our minimum value has gotten smaller while the rest of the data points remain unchanged, the spread or variability in our data has increased; since SD is a measure of spread, it too will increase (prove it to yourself!).

e. While the sample mean and sample standard deviations of the 14 [...]

8. D is the correct answer. Remember, whether we calculate the sample SD from a sample of 1,000 or a sample of 3,000, both are estimating the same quantity: the population standard deviation. These two estimates should be about the same, and we cannot predict which will be larger.

BIOSTATISTICS SAMPLING DISTRIBUTIONS, CONFIDENCE INTERVALS SOLUTIONS

QUESTION 1.

a.
It cannot be determined which researcher will get the bigger standard deviation: the sample SDs from the sample with n = 100 and from the sample with n = 1,000 are both estimating the same quantity, the population standard deviation. Therefore, the two estimates should be similar, and it is not possible to tell which will be larger before calculating the values. Standard deviation does not depend on sample size, but will vary from random sample to random sample.

b. Standard error does depend on sample size, however; the larger the sample size, the smaller the standard error of the mean (SEM). Therefore, the SEM calculated from the sample with n = 1,000 will likely be smaller than the SEM calculated from the sample with n = 100.

c. Extreme values are more likely in larger samples; therefore, the investigator with the sample of n = 1,000 is more likely to have the tallest man.

d. Extreme values are more likely in larger samples; therefore, the investigator with the sample of n = 1,000 is more likely to have the shortest man.

QUESTION 2.

a. In this study of 60 year old women with glaucoma, n = 200, $\bar{X}$ = 140 mmHg, and SD = 25 mmHg. Since n is large, we can use the Central Limit Theorem to aid us in constructing a 95% confidence interval for the population mean blood pressure, µ. It's "business as usual" via the formula $\bar{X} \pm 2 \cdot SEM$, where

$$SEM = \frac{SD}{\sqrt{n}} = \frac{25}{\sqrt{200}} = 1.77 \text{ mmHg}$$

Plugging in our sample values gives us $140 \pm 2(1.77)$, or (136.5 mmHg, 143.5 mmHg).

b. If a second study yielded the same sample statistic values, but was done with 100 women, what would happen to the width of the 95% confidence interval? Since this sample is smaller than the previous one, the SEM will be larger, leading to a wider confidence interval. In non-mathematical terms, this sample contains less information than a sample of 200 women, and therefore will yield a less precise (more uncertain) estimate of the population mean.
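The effect of sample size on the SEM and on the interval width in Question 2 can also be checked numerically. The following is a minimal sketch, not part of the original solutions; the helper name `mean_ci` is hypothetical, and the multiplier of 2 is the same large-sample approximation used in the text.

```python
import math

def mean_ci(sample_mean, sd, n, z=2.0):
    """Large-sample CI for a mean: sample_mean +/- z * SD / sqrt(n).

    Hypothetical helper; z=2.0 mirrors the "plus or minus 2 SEM" rule in the text.
    """
    sem = sd / math.sqrt(n)
    return sem, (sample_mean - z * sem, sample_mean + z * sem)

# Values from Question 2: mean 140 mmHg, SD 25 mmHg.
sem_200, ci_200 = mean_ci(140, 25, 200)
sem_100, ci_100 = mean_ci(140, 25, 100)

print(f"n=200: SEM={sem_200:.2f}, CI=({ci_200[0]:.1f}, {ci_200[1]:.1f})")
print(f"n=100: SEM={sem_100:.2f}, CI=({ci_100[0]:.1f}, {ci_100[1]:.1f})")
```

Halving the sample size multiplies the SEM by the square root of 2, so the interval for n = 100 comes out wider than the one for n = 200.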
The proof is as follows: $\bar{X} \pm 2 \cdot SEM$, where

$$SEM = \frac{SD}{\sqrt{n}} = \frac{25}{\sqrt{100}} = 2.5 \text{ mmHg}$$

Plugging in our sample values gives us $140 \pm 2(2.5)$, or (135 mmHg, 145 mmHg).

3. A is the correct answer. Here the sample is of size n = 500, which is large enough to ensure that the Central Limit Theorem kicks in. By the Central Limit Theorem, the sampling distribution of the sample mean from a sample of 500 will be approximately normally distributed.

4. D is the correct answer. No general statement can be made, as we do not know whether the sample of 200 women who agreed to participate from the original random sample of 300 was still representative of all 18 year old females. If these 200 women are inherently different from the other 100 non-participants, the results shown are biased.

5. B is the correct answer. The more confident we want to be, the wider our confidence interval. Ninety-nine percent confidence is higher than ninety-five percent confidence; therefore the 95% confidence interval is narrower than the 99% confidence interval.

6. C is the correct answer. The sample is random, i.e. representative; therefore, the sample distribution should mimic the larger population distribution, which is right-skewed.

7. B is the correct answer. We would expect the two samples to have SD values that are similar. But recall that the standard error (SE) is the standard deviation divided by the square root of the sample size. Because Sample B is much larger (N = 2000) than Sample A (N = 100), we would expect the SE of Sample B to be smaller than the SE of Sample A.

8. A is the correct answer.
This question is asking about the shape of the sampling distribution of the sample mean, based on samples of size 100. As the sample size is large (n = 100), the Central Limit Theorem applies and the sampling distribution should be normal; hence a histogram based on the sample means of 3,000 random samples should be approximately normal. Note that it is not the number of samples that determines whether the Central Limit Theorem "kicks in" but the size of each of the samples.

[...]

"However, because women were not randomized to take the vitamin supplements but were self-selected into the vitamin exposure groups, it is not possible to attribute the higher scores to Vitamin E. It is possible that the women taking Vitamin E differed on multiple factors when compared to the women who were not taking the supplement. The difference in test scores could be attributable, at least partially, to some of these other factors."

3. The correct answer is C. Because the 95% confidence interval does not include zero, we would reject the null hypothesis of a true mean difference of zero at the α = .05 level. Testing Ho: µ2 - µ1 = 0 is equivalent to testing Ho: µ2 = µ1, the equality of the two means.

4. The correct answer is A. The data collected in this example are paired data, and a p-value would be obtained from the paired t-test. The test statistic would be:

$$t = \frac{\text{observed mean difference}}{\text{standard error of mean difference}} = \frac{\bar{X}_{diff}}{se(\bar{X}_{diff})}$$

where $\bar{X}_{diff} = 15$ and $se(\bar{X}_{diff}) = \frac{40}{\sqrt{100}} = \frac{40}{10} = 4$. So t = 15/4 = 3.75. Since t > 2, we know p < .05.

5. The correct answer is B. The standard error of a statistic is a measure of the variability of that statistic across different random samples of the same size, that is, the variability of the sampling distribution. Therefore, the standard error of a statistic is the standard deviation of the sampling distribution.

6. B is the correct answer.
Despite the fact that we are computing before/after differences, we ultimately are comparing these differences between two independent groups: those randomized to the diet program, and those randomized to exercise. Since we are making a comparison of mean changes between two independent groups, the appropriate test is the 2 sample unpaired t-test.

7. The correct answer is E. This is a hard but important question: choice a is just flat-out incorrect, based on the definition of the p-value, and choices b-c are impossible to ascertain from just a p-value, as it imparts no information about the direction, magnitude, or clinical or scientific significance of the results of a study.

8. A is the correct answer. As the 95% confidence interval for the mean difference does not include zero, the resulting p-value would be less than .05.

9. C is the correct answer. The chi-squared test is the correct statistical test for comparing two population proportions based on information from two (large) samples; both samples meet the "large sample" criteria.

BIOSTATISTICS PROPORTIONS SOLUTIONS

Question 1.

(a) To estimate the 95% confidence interval for each group, we need to know the estimated proportion in each group (150/262 = 0.57 in the vaccine group and 83/134 = 0.62 in the placebo group) and their standard errors. Recall that the formula for the standard error of a proportion is

$$se(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{N}}$$

so that the standard error of the estimate $\hat{p}$ is 0.030 in the vaccine group and 0.042 in the placebo group. Now we can implement the formula for the confidence interval for a proportion (for large N):

$$\hat{p} \pm 2\, se(\hat{p})$$

Plugging into this equation for the vaccine group, we have: (0.57 - 2*0.03, 0.57 + 2*0.03) = (0.51, 0.63).

Plugging into this equation for the placebo group, we have: (0.62 - 2*0.04, 0.62 + 2*0.04) = (0.54, 0.70).

These confidence intervals overlap in the range of 0.54 to 0.63, which is a large fraction of each interval.
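The arithmetic in part (a) can be reproduced with a short script. This is a minimal sketch under the same large-sample "plus or minus 2 standard errors" rule used above; the function name `proportion_ci` is hypothetical and not part of the original solutions.

```python
import math

def proportion_ci(events, n, z=2.0):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).

    Hypothetical helper; z=2.0 mirrors the approximation used in the text.
    """
    p_hat = events / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

# Counts from Question 1(a): 150 of 262 (vaccine), 83 of 134 (placebo).
p_vac, se_vac, ci_vac = proportion_ci(150, 262)
p_pla, se_pla, ci_pla = proportion_ci(83, 134)

print(f"vaccine: p={p_vac:.2f}, se={se_vac:.3f}, CI=({ci_vac[0]:.2f}, {ci_vac[1]:.2f})")
print(f"placebo: p={p_pla:.2f}, se={se_pla:.3f}, CI=({ci_pla[0]:.2f}, {ci_pla[1]:.2f})")

# The two intervals overlap if each lower bound lies below the other's upper bound.
overlap = ci_vac[0] <= ci_pla[1] and ci_pla[0] <= ci_vac[1]
print("intervals overlap:", overlap)
```

Because the script carries unrounded proportions through the calculation, its standard errors can differ from the hand-rounded values in the third decimal place.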
(b) To compute the 95% confidence interval for the difference in proportions, we use the general formula for the confidence interval, with the standard error for the difference of proportions provided:

$$(\hat{p}_1 - \hat{p}_2) \pm 2\, se(\hat{p}_1 - \hat{p}_2)$$

Plugging into this equation: (0.57 - 0.62 - 2*0.05, 0.57 - 0.62 + 2*0.05) = (-0.15, 0.05). This 95% CI suggests that, based on our samples, the vaccine could be associated with a decrease in the proportion of children experiencing at least one episode of AOM of at most 15%, but could also be associated with an increase as large as 5%.

(c) The null hypothesis would be that the underlying true proportions of children experiencing at least one episode of AOM are the same for the vaccinated and non-vaccinated children: in other words, there is no relationship between the [...]

BIOSTATISTICS LINEAR REGRESSION SOLUTIONS

1. The correct answer is D. The coefficient for weight is 0.10, indicating that the expected difference in SBP for two children of the same age who differ by one ounce in birth weight is 0.10 mmHg, the heavier child compared to the lighter child. So if we are comparing a child who weighed 120 ounces at birth to a child who weighed 90 ounces at birth, and both children were the same age, the estimated expected (mean) difference in SBP is 30*0.10 mmHg = 3.0 mmHg.

2. The correct answer is B. The coefficient for age is an estimate of the difference in SBP between 2 infants with the same birth weight who differ by one day in age: the older compared to the younger will have SBP 4 mmHg higher, on average (95% CI: 4 ± 2*0.6 = (2.8, 5.2)). To get the corresponding CI for the difference in SBP for infants of equal weight who differ by 2 days in age, we can just double the endpoints of the previously computed CI, giving (5.6, 10.4).

3. C is the correct answer.
All that's being changed is the units in which the weight is measured; the measurements themselves are not being altered, just the units in which the values are expressed. Ergo, the correlation between SBP and a child's age and weight should not be altered.

4. The correct answer is D. Recall, r tells us something about both the strength and the direction of a relationship. It is the appropriately signed square root of $R^2$. Since the slope is negative, we know r must be negative: hence it is $-\sqrt{0.76} = -0.87$.

5. D is the correct answer. This model relates wage to a subject's sex, union membership status, and years of education via the following equation:

$$\hat{y} = -0.3 + (-1.9)\cdot sex + 1.9 \cdot union\_member + 0.76 \cdot years\_education$$

Male, non-union workers with 12 years of education have the following predictor values: sex = 0, union_member = 0, years_education = 12, so the resulting predicted value is

$$\hat{y} = -0.3 + (-1.9)\cdot 0 + 1.9 \cdot 0 + 0.76 \cdot 12 = -0.3 + 9.12 = 8.82 \text{ dollars/hr}$$

6. A is the correct answer. What this is asking for, in more user-friendly terms, is the 95% confidence interval for the coefficient of union_member in a model that also includes sex and years_education. Recall the interpretation of this coefficient: it estimates the adjusted mean difference in hourly wage for union members compared to non-union members of the same sex and same years of education (i.e., adjusted for sex and years of education). This estimated coefficient is 1.9, and its standard error is 0.5. As we have a large sample, we can just employ the $\hat{b}_1 \pm 2\, SE(\hat{b}_1) = 1.9 \pm 2(0.5)$ method to get the 95% CI of (0.90, 2.90), or $0.90 to $2.90 per hour.

7.
(a) Two possible phrasings:
- A 1 year increase in age is associated with a 0.02 liter increase in FEV, on average.
- In two groups of men who differ by one year of age, the older group will have an average FEV 0.02 liters higher than the younger group.

(b) Since we have a sample of 200 men, we need not fuss with pesky t-corrections, and can just employ the general formula $\hat{b}_1 \pm 2\, SE(\hat{b}_1)$, which gives 0.02 ± 2*(0.005), a 95% CI of (0.01, 0.03). So based on this sample of 200 men, the true increase associated with a 1 year increase in age is between 0.01 liters and 0.03 liters (with 95% confidence, etc.).

(c) The strength of the linear association cannot be assessed without viewing a scatterplot and seeing an estimated correlation coefficient.

(d) To find the difference between 60 and 50 year old men, we simply multiply the coefficient for age (representing a 1 year difference) by 10: 0.02*10 = 0.2.

(e) No. These results are based on information from a sample of men aged 20 to 60; the results are not necessarily applicable to men outside this age range.

BIOSTATISTICS SURVIVAL ANALYSIS SOLUTIONS

1. Survival analysis would be used. The outcome variable is "time to AIDS", where some of the times are censored. When we have time-to-event data, the best choice is to use survival analysis, and we could more specifically use the Kaplan-Meier approach to estimate the survival curve and the median time to AIDS.

2. The correct answer is D. The median time (i.e. the time at which S(t) = 0.50) is not shown on the plot. We see that S(t) only ranges from 0.90 to 1.00, meaning that the median time is not within the 180 days.

3. The correct answer is B. At 100 days, the height of the survival curve, S(t), is approximately 0.94.

4. C is the correct answer. By taking the average, we are treating the censored times as observed times of death. But when an observation is censored, we know that the true time of death must be after the censoring time.
So, the censored times are underestimates of the true survival times. As a result, by taking the mean of both the observed times of death and the censored times, we get an underestimate of the true mean survival time.
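The underestimation described in this answer can be illustrated with a small simulation. This is a sketch under loudly stated assumptions, none of which come from the original problem: survival times are drawn from an exponential distribution with a mean of 100 days, and every subject still alive at a hypothetical 180-day end of follow-up is censored at that point.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumption: true survival times are exponential with mean 100 days (illustrative only).
true_times = [random.expovariate(1 / 100) for _ in range(1000)]

# Assumption: follow-up ends at 180 days; anyone still alive is censored there.
follow_up = 180
recorded_times = [min(t, follow_up) for t in true_times]
n_censored = sum(t > follow_up for t in true_times)

true_mean = sum(true_times) / len(true_times)
# Naive mean: treats every censored time as if it were an observed time of death.
naive_mean = sum(recorded_times) / len(recorded_times)

print(f"censored observations: {n_censored}")
print(f"true mean survival:    {true_mean:.1f} days")
print(f"naive mean (censored times treated as deaths): {naive_mean:.1f} days")
```

Each recorded time is at most the true time, so the naive mean can never exceed the true mean, and it is strictly smaller whenever at least one observation is censored.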