Comparing Sunscreen Lotions with Histograms & Hypothesis Testing, Assignments of Statistics

Solutions to Problem 1 of Stat 421 Homework 6, where the goal is to compare two sunscreen lotions, A and B, by examining their histograms and performing a hypothesis test using the average difference in scores. The document also includes the R code to generate the histograms, estimate the reference distribution, and calculate the p-values based on both the reference distribution and the normal approximation.

Typology: Assignments

Pre 2010

Uploaded on 03/18/2009

koofers-user-dqn
Partial preview of the text

Stat 421, Fall 2008
Fritz Scholz
Homework 6 Solutions

Problem 1: It is desired to examine the difference between two sunscreen lotions, called A and B. We have n=100 volunteers willing to try out the lotions. In order to eliminate the natural variation from subject to subject, it is decided to apply both lotions to each subject, one lotion on one arm, the other lotion on the other arm. Which arm gets which lotion is decided by a fair coin flip, and the subjects do not know the lotion identities for their respective arms. This avoids possible bias coming from subjects favoring one lotion over the other by exposing their arms differently. It also avoids biases arising from the left and right arm receiving different sun exposure. (For example, basal cell carcinomas occur more frequently on the left side, which is exposed more for a driver of a car.)

After sufficient exposure each arm is assigned a burn score. The scores for arms treated with lotion A are given below in the X.A vector of length 100, those for lotion B are given below in the X.B vector. The scores in position i of each vector belong to the same person i.

On one page (using par(mfrow=c(3,1)) prior to plotting) give the histograms of X.A, X.B and X.A-X.B (using nclass=20) and indicate by a vertical line the position of the mean for each histogram. Based on these histograms, what would you say about A and B having different effects?

Now test the hypothesis H0: no difference between A and B against the alternative hypothesis that there is a difference, using as test statistic the average Dbar of the D = X.A-X.B scores (which is the same as mean(X.A)-mean(X.B)). As reference distribution for the test statistic we take all 2^n possible assignments of signs to the n=100 absolute differences |D| and find the average D value (Dbar) for each such assignment.
If this looks familiar to you from a previous homework problem, it is meant to be, and you can reuse any code developed there. Of course, the complete reference distribution is out of our reach and you will need to use a simulated reference distribution to get an estimate for the p-value of the observed Dbar. Before getting the reference distribution invoke set.seed(27) and use Nsim=10000 simulations.

The rationale for taking this reference distribution is as follows: Under H0 it does not matter whether A or B is applied to a specific arm. Whether we see for the first person (based on the data below) the difference X.A-X.B = 80.9 – 77.2 = 3.7 or whether we see X.A-X.B = -3.7 (because the assignment of lotions had gone the other way) would have equal chances ½ and ½. The score for the left arm and the score for the right arm would have remained the same (whether A or B was applied). The results for each person are due to all other factors that might have impacted the final scores for that person.

Show the histogram (using nclass=100, xlim=c(-2.5,2.5)) with superimposed normal distribution for this estimated reference distribution and give the estimated p-values based on this estimated reference distribution and based on the normal approximation, either in your text or as part of your histogram plot. Write a function that does all this and provide the code.

    X.A = c(80.9, 73.1, 39.1, 29.6, 37.6, 55.8, 51.3, 69.1, 84.6, 53.7,
            31.3, 55.5, 74, 51.2, 45.4, 38.8, 28.5, 37.6, 49.7, 54.5,
            42.8, 29.3, 48.9, 51.4, 8.3, 31.8, 36.1, 57.2, 72.8, 57.2,
            51.5, 40.1, 78.6, 26.4, 50.4, 66.2, 58.5, 60.3, 49.7, 63.4,
            19.5, 40.5, 65.6, 85.9, 26.2, 30.1, 23, 55.5, 55.2, 50.6,
            39.2, 50.9, 63.9, 29.1, 78.5, 63.7, 58.9, 63.3, 56.5, 70.2,
            36.5, 58.1, 51.1, 92.4, 37.3, 34, 55.5, 53.9, 38.1, 61.5,
            47.4, 60.2, 37.6, 63.4, 58.9, 43.8, 37.9, 48.4, 61.9, 80.2,
            86.4, 58.5, 41.5, 55.2, 41.8, 64.1, 51.9, 68.7, 23.2, 27.4,
            49.3, 45.6, 64.5, 49.2, 43.1, 62.8, 48.8, 69.6, 70.7, 48.2)

and

    X.B = c(77.2, 65.9, 39.1, 25.4, 37.3, 53.4, 47.3, 64.5, 79, 52.5,
            27.3, 45, 68.6, 50.2, 34.9, 32.7, 22.3, 38.8, 52.2, 51.9,
            46.6, 32.2, 55.2, 58, 7.1, 31.5, 35.7, 52, 70.6, 54.8,
            44.6, 38.9, 79.4, 29.1, 44.8, 67, 61.7, 59.6, 50.4, 66.1,
            17.4, 45.6, 74.2, 77.7, 32.9, 33.9, 25.6, 56.9, 59.2, 42.3,
            44.7, 50.5, 65.2, 31, 83.6, 62.9, 52.5, 67.3, 57.5, 62.2,
            29.3, 55.6, 41.3, 89.7, 38.8, 27, 55, 46.2, 34.9, 56.7,
            41.8, 55.7, 32.6, 63.9, 55.3, 44.9, 42.4, 44.6, 52.4, 78.5,
            81, 58.5, 42.5, 58.2, 40, 60.8, 52.2, 71.7, 22.7, 23.8,
            39.5, 45, 74.8, 45.3, 44.2, 59.7, 44.2, 72.5, 63.9, 43.5)

Solution: The histograms for X.A, X.B and X.A-X.B are shown below.
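
A minimal sketch of the procedure described in Problem 1, not the original solution code. It draws the three histograms with their means marked and estimates the sign-flip reference distribution of Dbar. The names three.hists and signflip.test, their arguments, and the small comparison tolerance are assumptions made for illustration. The normal approximation uses the exact moments of the sign-flip distribution: mean 0 and variance sum(D^2)/n^2. The demonstration at the end uses only the first ten pairs of the data above to keep the block short.

```r
# Sketch only: function and argument names are hypothetical.

# Three stacked histograms (X.A, X.B, X.A - X.B), each with its mean marked.
three.hists <- function(X.A, X.B, nclass = 20) {
  old.par <- par(mfrow = c(3, 1))
  on.exit(par(old.par))
  panels <- list("X.A" = X.A, "X.B" = X.B, "X.A - X.B" = X.A - X.B)
  for (nm in names(panels)) {
    hist(panels[[nm]], nclass = nclass, main = nm, xlab = nm)
    abline(v = mean(panels[[nm]]), col = "red", lwd = 2)
  }
}

# Sign-flip randomization test for paired scores.
signflip.test <- function(X.A, X.B, Nsim = 10000, seed = 27, plot = FALSE) {
  D <- X.A - X.B
  n <- length(D)
  Dbar <- mean(D)
  set.seed(seed)
  # Each column is one random assignment of signs (+1 or -1, chance 1/2 each).
  signs <- matrix(sample(c(-1, 1), n * Nsim, replace = TRUE), nrow = n)
  # Under H0 only the signs of |D| are random; abs(D) recycles down columns.
  Dbar.sim <- colSums(signs * abs(D)) / n
  # Two-sided p-value; the tolerance guards against floating-point ties.
  pval <- mean(abs(Dbar.sim) >= abs(Dbar) - 1e-12)
  # Normal approximation: mean 0, standard deviation sqrt(sum(D^2)) / n.
  sigma <- sqrt(sum(D^2)) / n
  pval.normal <- 2 * pnorm(-abs(Dbar) / sigma)
  if (plot) {
    hist(Dbar.sim, nclass = 100, probability = TRUE,
         main = "estimated reference distribution", xlab = "Dbar")
    xx <- seq(min(Dbar.sim), max(Dbar.sim), length.out = 200)
    lines(xx, dnorm(xx, 0, sigma), col = "blue")
    abline(v = Dbar, col = "red")
  }
  list(Dbar = Dbar, pval = pval, pval.normal = pval.normal,
       sigma = sigma, Dbar.sim = Dbar.sim)
}

# Demonstration on the first ten pairs of the data above.
A10 <- c(80.9, 73.1, 39.1, 29.6, 37.6, 55.8, 51.3, 69.1, 84.6, 53.7)
B10 <- c(77.2, 65.9, 39.1, 25.4, 37.3, 53.4, 47.3, 64.5, 79, 52.5)
res <- signflip.test(A10, B10)
print(res[c("Dbar", "pval", "pval.normal")])
```

With the full vectors, three.hists(X.A, X.B) followed by signflip.test(X.A, X.B, plot = TRUE) would produce the requested plots; the xlim=c(-2.5,2.5) setting from the problem statement can be passed to hist in the same way.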
[...]

      if(Dbar>=0){
        text(1.02*Dbar,.97*high,"two-sided p-value",cex=1.3,adj=0)
        text(1.02*Dbar,.9*high,paste("p = ",format(signif(pval,6))),cex=1.3,adj=0)
        text(1.02*Dbar,.80*high,"normal approximation",cex=1.3,adj=0)
        text(1.02*Dbar,.73*high,"two-sided p-value",cex=1.3,adj=0)
        text(1.02*Dbar,.66*high,paste("p = ",format(signif(pval.normal,6))),cex=1.3,adj=0)
      }else{
        text(1.02*Dbar,.97*high,"two-sided p-value",cex=1.3,adj=1)
        text(1.02*Dbar,.9*high,paste("p = ",format(signif(pval,6))),cex=1.3,adj=1)
        text(1.02*Dbar,.80*high,"normal approximation",cex=1.3,adj=1)
        text(1.02*Dbar,.73*high,"two-sided p-value",cex=1.3,adj=1)
        text(1.02*Dbar,.66*high,paste("p = ",format(signif(pval.normal,6))),cex=1.3,adj=1)
      }
    }

Problem 2: As indicated on slide 98 of Stat421NormalPopulation.pdf, modify the function sample.size2 (slide 68, same source) so that it can be used to determine the smallest combined sample size N=m+n (with m=n) necessary to get the desired power beta at an alternative delta0=|muY-muX|/sigmau. The hypothesis to be tested is H0: muY=muX against H1: muY ≠ muX. Here sigmau is a known upper bound to the actual common sigma. It is assumed that you deal with random samples from normal distributions with the same variance. Provide the code for your modified function. Apply this to the concrete case when you want power = .9 at delta0=.5. Take alpha = .05. Show the plots that helped you make your determination of that smallest N. (Note: N should be even!)
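
The smallest even N can also be located by a direct search over the power function. The sketch below assumes the usual noncentral t expression for the power of the two-sided two-sample t-test with m = n = N/2, so that the test has N - 2 degrees of freedom and noncentrality sqrt(N/4)*delta0. The names power.2samp, N.even and N.min are hypothetical.

```r
# Sketch only: helper names are hypothetical.

# Power of the two-sided two-sample t-test with total sample size N (m = n = N/2)
# at standardized difference delta0, computed from the noncentral t distribution.
power.2samp <- function(N, delta0, alpha = 0.05) {
  tcrit <- qt(1 - alpha / 2, N - 2)   # two-sided critical value
  ncp <- sqrt(N / 4) * delta0         # noncentrality parameter
  1 - pt(tcrit, N - 2, ncp) + pt(-tcrit, N - 2, ncp)
}

# Search even total sample sizes for the first one reaching power 0.9.
N.even <- seq(4, 400, by = 2)
pw <- power.2samp(N.even, delta0 = 0.5)
N.min <- N.even[which(pw >= 0.9)[1]]

print(N.min)
print(power.2samp(c(N.min - 2, N.min), delta0 = 0.5))
```

The second print shows the power just below and at the selected N, which is the same comparison one would otherwise read off a magnified power plot.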
Solution: The appropriately modified function is

    sample.size2samp2 = function (delta0 = 1, nrange = 10:100, alpha = 0.05)
    {
      power = NULL
      for (N in nrange) {
        tcrit = qt(1 - alpha/2, N - 2)
        power = c(power, 1 - pt(tcrit, N - 2, sqrt(N/4) * delta0) +
                  pt(-tcrit, N - 2, sqrt(N/4) * delta0))
      }
      plot(nrange, power, type = "l",
           xlab = "total sample size N = N/2+N/2")
      abline(h = seq(0.01, 0.99, 0.01), col = "grey")
      abline(v = nrange, col = "grey")
      title(substitute(abs(mu[Y] - mu[X])/sigma[u] == delta0 ~ ", " ~ alpha == alpha0,
                       list(delta0 = delta0, alpha0 = alpha)))
      lines(nrange, power, col = "red")
    }

sample.size2samp2(delta0=.5,nrange=100:200) produced the following plot

[Plot: power versus total sample size N = N/2+N/2 for N from 100 to 200; title |muY-muX|/sigmau = 0.5, alpha = 0.05]

with magnification given by sample.size2samp2(delta0=.5,nrange=160:180)

[Plot: power versus total sample size N = N/2+N/2 for N from 160 to 180; title |muY-muX|/sigmau = 0.5, alpha = 0.05]

We seem to be slightly below .9 in power at N=170. Thus we should take N=172, or m=n=86, to guarantee power at least .9.

Problem 3: Assume the data situation of Problem 1, but suppose that these scores were obtained as though lotion A had been assigned randomly to 100 of the 200 available arms, while the other arms got lotion B. Thus it would have been possible to have many persons with both arms treated with lotion A. It also is possible (but extremely unlikely) that each person got one of each lotion applied to his/her arms. How unlikely is it? Count the number of ways of splitting 200 into two groups of 100 each (use choose(…? …) ) and count the number of ways of giving each person one of each lotion. Again we use as test statistic Dbar = mean(X.A)-mean(X.B). The full reference distribution is unattainable.
Write a function that (after using set.seed(27)) estimates this reference distribution by generating Nsim=10000 splits of the 200 available numbers Z=c(X.A,X.B), draws the histogram with nclass=100, and gives the estimated two-sided p-value for the observed Dbar using the reference distribution. Show the location of the observed Dbar as a vertical line in the histogram. Also show the approximating normal distribution for this histogram (use probability=T in hist) and get the p-value from it. To find the appropriate normal distribution revisit slide 44 in Stat421DoeFlux.pdf.
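
A minimal sketch of the two parts of Problem 3, offered as one possible approach rather than the course solution. For the counting question, each of the 100 persons can receive lotion A on either arm, so 2^100 of the splits give everyone one of each lotion. For the test, the function perm.split.test (a hypothetical name) resamples random splits of Z. Slide 44 is not reproduced here, so the normal approximation is an assumption: it uses the standard finite-population result that the difference of the two group means under a random split has mean 0 and variance var(Z)*(1/m + 1/n). The demonstration uses only the first ten pairs of the Problem 1 data.

```r
# Sketch only: names are hypothetical; normal approximation is an assumption.

# How unlikely is it that every person gets one of each lotion?
n.splits <- choose(200, 100)   # ways to pick the 100 arms that get lotion A
n.paired <- 2^100              # per person: A on one arm or on the other
prob.paired <- n.paired / n.splits
print(prob.paired)

# Randomization test based on random splits of the pooled scores.
perm.split.test <- function(X.A, X.B, Nsim = 10000, seed = 27, plot = FALSE) {
  Z <- c(X.A, X.B)
  m <- length(X.A)
  n <- length(X.B)
  Dbar <- mean(X.A) - mean(X.B)
  set.seed(seed)
  # Each replicate assigns m randomly chosen scores to "A", the rest to "B".
  Dbar.sim <- replicate(Nsim, {
    idx <- sample(m + n, m)
    mean(Z[idx]) - mean(Z[-idx])
  })
  # Two-sided p-value; the tolerance guards against floating-point ties.
  pval <- mean(abs(Dbar.sim) >= abs(Dbar) - 1e-12)
  # Finite-population moments of the split distribution (var uses N - 1).
  sigma <- sqrt(var(Z) * (1 / m + 1 / n))
  pval.normal <- 2 * pnorm(-abs(Dbar) / sigma)
  if (plot) {
    hist(Dbar.sim, nclass = 100, probability = TRUE,
         main = "estimated reference distribution", xlab = "Dbar")
    xx <- seq(min(Dbar.sim), max(Dbar.sim), length.out = 200)
    lines(xx, dnorm(xx, 0, sigma), col = "blue")
    abline(v = Dbar, col = "red")
  }
  list(Dbar = Dbar, pval = pval, pval.normal = pval.normal,
       sigma = sigma, Dbar.sim = Dbar.sim)
}

# Demonstration on the first ten pairs of the Problem 1 data.
A10 <- c(80.9, 73.1, 39.1, 29.6, 37.6, 55.8, 51.3, 69.1, 84.6, 53.7)
B10 <- c(77.2, 65.9, 39.1, 25.4, 37.3, 53.4, 47.3, 64.5, 79, 52.5)
res3 <- perm.split.test(A10, B10)
print(res3[c("Dbar", "pval", "pval.normal")])
```

Because this reference distribution ignores the pairing, the between-person variation stays in the spread of Dbar, so the same observed Dbar will generally look less extreme here than under the sign-flip scheme of Problem 1.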