Statistical Analysis of Fire Station Aid Response Data and Worker Output Data

Solutions to homework problems on the statistical analysis of fire station aid response data and worker output data. The problems involve pairwise means analyses by several methods, ANOVA testing, variance-stabilizing transformations, and Levene's test for homogeneity of variances. R code is provided to perform the analyses and interpret the results.


Stat 421, Fall 2008
Fritz Scholz
Homework 9 Solutions

Problem 1: For the fire station aid response data from the previous homework, perform pairwise means analyses using 95% confidence intervals, comparing one fire station against another for all choose(4,2) possible comparisons. Do this using the Fisher protected LSD method, the Bonferroni method, the Tukey-Kramer method and Scheffe's method. Write a function that computes these sets of 4 confidence intervals for each of the choose(4,2) comparisons and plots them in groups next to each other, as done in the class slides, with each group identified as to which pair of means is compared. Draw a horizontal line at the zero level. Also give the confidence interval results in tabular form. Suggestion: you may want to use write.csv(out,"out.csv"), where out is the appropriate tabular output from your function; you can then cut and paste the view of out.csv from Excel into Word. Give the plots, state and explain your conclusion, and give your function code.

Solution: The previous homework gave a p-value of .0182 (normal theory F-test) or .0205 (randomization F-test), both below alpha = .05, so we go beyond the first step in Fisher's protected LSD method. The function call out = task1() produced the plot below, comparing the four types of intervals. The code of task1 is given below.

[Figure: 95% confidence intervals for the contrasts (vertical axis from -6 to 6), grouped by comparison ST1-ST2, ST1-ST3, ST1-ST4, ST2-ST3, ST2-ST4, ST3-ST4, with one interval per group for each of Fisher's protected LSD, Tukey-Kramer, Bonferroni and Scheffe, and a horizontal line at zero.]

The tabular output of the confidence intervals, cut and pasted from Excel after exporting out to out.csv via write.csv(out,"out.csv"), with minor editing of the spreadsheet (headers and decimals shown):

Comparison   Diff  lowerLSD  upperLSD  lowerTK  upperTK  lowerBonf  upperBonf  lowerScheffe  upperScheffe
ST1-ST2     -1.00     -3.07      1.07    -3.78     1.78      -3.91       1.91         -4.03          2.03
ST1-ST3     -1.78     -3.86      0.29    -4.57     1.00      -4.69       1.13         -4.82          1.25
ST1-ST4      1.57     -0.51      3.64    -1.22     4.35      -1.34       4.48         -1.47          4.60
ST2-ST3     -0.78     -2.86      1.29    -3.57     2.00      -3.69       2.13         -3.82          2.25
ST2-ST4      2.57      0.49      4.64    -0.22     5.35      -0.34       5.48         -0.47          5.60
ST3-ST4      3.35      1.28      5.42     0.57     6.13       0.44       6.26          0.32          6.38

Conclusion: The difference between Station 3 and Station 4 appears to be mainly responsible for the significant F-test: all four intervals for that comparison miss zero, while (almost) all intervals for the other pairwise mean differences cover zero. The one exception is the Fisher LSD interval for the comparison of Station 2 and Station 4, which also misses zero. Since that method does not fully protect against multiple comparison issues and since the other three intervals for that comparison cover zero, we should not jump to a conclusion there.
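Each interval in the table has the form Diff ± (critical factor) * s * sqrt(1/6 + 1/6), where s is the residual standard deviation from the one-way ANOVA fit; only the critical factor changes from method to method. The following minimal sketch shows how a single row of the table can be reproduced; the helper pairwise.ci is hypothetical and introduced here only for illustration.

pairwise.ci = function(diff, s, n = 6, k = 4, alpha = 0.05) {
  # hypothetical helper, not part of the original solution
  df = k * n - k                           # 20 error degrees of freedom
  se = s * sqrt(1/n + 1/n)                 # standard error of a difference of two station means
  m  = choose(k, 2)                        # number of pairwise comparisons
  fac = c(LSD     = qt(1 - alpha/2, df),                         # Fisher's protected LSD
          TK      = qtukey(1 - alpha, k, df)/sqrt(2),            # Tukey-Kramer
          Bonf    = qt(1 - alpha/(2 * m), df),                   # Bonferroni
          Scheffe = sqrt((k - 1) * qf(1 - alpha, k - 1, df)))    # Scheffe
  cbind(lower = diff - fac * se, upper = diff + fac * se)
}
# e.g. pairwise.ci(3.35, s = sqrt(out.anova[[3]][2])) should reproduce the ST3-ST4 row above

The Tukey-Kramer factor is divided by sqrt(2) because qtukey returns a studentized range quantile, which is calibrated to s*sqrt(1/n) rather than to the standard error of a difference of two means.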
The code of task1:

task1 = function (alpha = 0.05) {
  y = read.csv("aidresponse.csv")
  response = as.vector(as.matrix(y))
  station = c(rep("station1", 6), rep("station2", 6),
              rep("station3", 6), rep("station4", 6))
  out.lm = lm(response ~ as.factor(station))
  out.anova = anova(out.lm)
  s = sqrt(out.anova[[3]][2])                     # residual standard deviation
  Ybar = out.lm$fitted.values[c(1, 7, 13, 19)]    # the four station means
  Diff = NULL
  lower = NULL
  upper = NULL
  lowerTK = NULL
  upperTK = NULL
  lowerBonf = NULL
  upperBonf = NULL
  lowerScheffe = NULL
  upperScheffe = NULL
  tfac = qt(1 - alpha/2, 20)                             # Fisher LSD factor
  qfac = qtukey(1 - alpha, 4, 20)                        # Tukey-Kramer factor
  Ffac = sqrt((4 - 1) * qf(1 - alpha, 4 - 1, 24 - 4))    # Scheffe factor
  delta = tfac * s * sqrt(1/6 + 1/6)
  deltaScheffe = Ffac * s * sqrt(1/6 + 1/6)
  # (the remainder of task1 is cut off in this preview)

Problem 2 (worker output data):

a)

[Figure: side-by-side boxplots of part output per day (roughly 250 to 400) for workers 1 through 10.]

The outputs look quite variable from worker to worker, and we would expect strongly significant results in our ANOVA. Also, the variability seems to be somewhat higher for the higher boxplots.

b)

Analysis of Variance Table

Response: y
                   Df Sum Sq Mean Sq F value    Pr(>F)
as.factor(worker)   9 254380   28264  118.46 < 2.2e-16 ***
Residuals         190  45335     239
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value of < 2.2e-16 is at the underflow level reported by R. Thus the means are significantly different.

c)

[Figure: normal Q-Q plot of the ANOVA residuals (sample quantiles roughly -40 to 40 against theoretical quantiles -3 to 3), with the fitted Q-Q line.]

The normality looks quite good in the middle portion, but the data seem to veer away from the fitted line, especially at the upper end. This could be due to the inhomogeneous variability from worker to worker.

d) As for the modified Levene test: the ANOVA on the absolute deviations of the outputs from their respective group medians produced the table below. Since .025 < .05, the test is significant at level .05, and we should think about a variance-stabilizing transform for the output data.

Analysis of Variance Table

Response: d
                   Df  Sum Sq Mean Sq F value  Pr(>F)
as.factor(worker)   9  1937.2   215.2  2.1798 0.02511 *
Residuals         190 18762.1    98.7
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

e)

[Figure: histogram of the simulated null distribution of Fmin = min(s_1^2, ..., s_k^2)/max(s_1^2, ..., s_k^2), with k = 10, n = 20, Nsim = 10000, a = 1/7, Fmin.observed = 0.16694; the values 0.0021 and 0.0077 are marked on the plot.]

The above distribution is the null distribution of the test statistic Fmin = min(s_1^2, ..., s_k^2) / max(s_1^2, ..., s_k^2). Small values of Fmin indicate evidence that the standard deviations are not all the same. The distribution is simulated using the same variance for each group, so it is indeed the null distribution. Fmin.test shows a p-value of .0077 < .01, less than half of that in the Levene test, thus indicating more strongly that the variability of output is not sufficiently homogeneous from worker to worker. We should therefore think about transforming the response variable output.

f)

g) (repeating e) for the transformed data):

[Figure: histogram of the simulated null distribution of Fmin for the transformed data 1/Y, with k = 10, n = 20, Nsim = 10000, a = 1/7, Fmin.observed = 0.49235; the values 0.0029 and 0.883 are marked on the plot.]

Here the Fmin.test also shows no significance whatsoever.
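The function Fmin.test used above is not shown in this preview. The following is a minimal sketch of the kind of simulation it describes, assuming k independent normal samples of size n with a common variance and a p-value equal to the fraction of simulated Fmin values at or below the observed one; the name Fmin.sim and its arguments are illustrative, and the a.recip argument of the original function is omitted here.

Fmin.sim = function(k = 10, n = 20, Nsim = 10000, Fmin.observed) {
  # simulate the null distribution of Fmin = min(s_1^2,...,s_k^2)/max(s_1^2,...,s_k^2)
  # under k independent normal samples of size n with a common variance
  Fmin.null = replicate(Nsim, {
    s2 = apply(matrix(rnorm(k * n), ncol = k), 2, var)   # k sample variances
    min(s2)/max(s2)
  })
  hist(Fmin.null, xlab = "min(s_i^2)/max(s_i^2)",
       main = "simulated null distribution of Fmin")
  abline(v = Fmin.observed)
  mean(Fmin.null <= Fmin.observed)   # simulated p-value: small Fmin speaks against equal variances
}
# e.g. Fmin.sim(Fmin.observed = 0.16694) should give a p-value near the reported 0.0077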
g) (repeating f) for the transformed data):

[Figure: log-log plot of s~_i, the sample standard deviation of 1/Y for worker i, against Ybar~_i, the mean of 1/Y for worker i (Ybar~_i roughly 0.0030 to 0.0042, s~_i roughly 0.00015 to 0.00021), with a fitted least-squares line of slope 0.067; annotations Y~ = 1/Y and s~ = sample standard deviation of 1/Y.]

After transforming the data to 1/Y and again plotting s~_i versus Ybar~_i on a log-log scale, we no longer see much of a strong positive linear trend, judging the size of the slope visually against the magnitude of the scatter around the line.

h) The F-statistic in the transformed case is even more extreme than in the untransformed case, i.e., 132.91 > 118.46. The analysis on the transformed data is preferable since it does not suffer from the inhomogeneous variability from worker to worker; homogeneous variability is one of the assumptions of a proper ANOVA.

The code producing the above analyses and plots follows. It is assumed that the data frame worker.data was read into the workspace as follows: worker.data = read.csv("workerdata.csv", header = T).

worker.data.analysis = function () {
  y = worker.data[, 2]
  worker = worker.data[, 1]
  # boxplots of part output per day by worker
  boxplot(y ~ worker, xlab = "worker", ylab = "part output per day")
  readline("hit return\n")
  svec = NULL
  mvec = NULL
  svec1 = NULL
  mvec1 = NULL
  d = NULL
  d1 = NULL
  for (i in 1:10) {
    med = median(y[worker == i])
    med1 = median(1/y[worker == i])
    d = c(d, abs(y[worker == i] - med))        # absolute deviations from the group median
    d1 = c(d1, abs(1/y[worker == i] - med1))   # same for the transformed data 1/y
    mvec = c(mvec, mean(y[worker == i]))
    svec = c(svec, sqrt(var(y[worker == i])))
    mvec1 = c(mvec1, mean(1/y[worker == i]))
    svec1 = c(svec1, sqrt(var(1/y[worker == i])))
  }
  # modified Levene tests: ANOVA on the absolute deviations (raw and transformed data)
  out.Levene0 = anova(lm(d ~ as.factor(worker)))
  out.Levene1 = anova(lm(d1 ~ as.factor(worker)))
  # Fmin test on the raw outputs
  Fmin.test(k = 10, n = 20, a.recip = 7,
            Fmin.observed = min(svec^2)/max(svec^2))
  readline("hit return\n")
  # one-way ANOVA on the raw outputs and normal Q-Q plot of the residuals
  out.lm = lm(y ~ as.factor(worker))
  out.anova = anova(out.lm)
  qqnorm(out.lm$resid)
  qqline(out.lm$resid)
  readline("hit return\n")
  # log-log plot of s_i versus Ybar_i with the least-squares slope
  plot(mvec, svec, log = "xy", xlab = expression(bar(Y)[i]), ylab = expression(s[i]))
  out = lsfit(log10(mvec), log10(svec))
  text(min(mvec), max(svec),
       substitute("slope" ~ alpha == alph, list(alph = format(signif(out$coef[2], 3)))),
       adj = 0)
  abline(out)
  readline("hit return\n")
  # the same plot for the transformed data 1/y
  plot(mvec1, svec1, log = "xy", xlab = expression(bar(tilde(Y))[i]),
       ylab = expression(tilde(s)[i]))
  out1 = lsfit(log10(mvec1), log10(svec1))
  text(min(mvec1), max(svec1),
       substitute("slope" ~ alpha == alph, list(alph = format(signif(out1$coef[2], 3)))),
       adj = 0)
  abline(out1)
  text(median(mvec1), max(svec1), expression(tilde(Y) == 1/Y), adj = 0.5)
  text(median(mvec1), 0.97 * max(svec1),
       expression(tilde(s) == ~"sample standard deviation of 1/Y"), adj = 0.5)
  readline("hit return\n")
  # ANOVA and residual Q-Q plot for the transformed data
  out.lm1 = lm(1/y ~ as.factor(worker))
  out.anova1 = anova(out.lm1)
  qqnorm(out.lm1$resid)
  qqline(out.lm1$resid)
  readline("hit return\n")
  # Fmin test on the transformed data
  Fmin.test(k = 10, n = 20, a.recip = 7,
            Fmin.observed = min(svec1^2)/max(svec1^2))
  readline("hit return\n")
  boxplot(1/y ~ worker, xlab = "worker", ylab = "days per part")
  list(out$coef, out1$coef, out.Levene0, out.Levene1, out.anova, out.anova1)
}
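For reference, the choice of the reciprocal transform follows the usual power-transformation rule: if the within-group standard deviation behaves roughly like a power of the group mean, s_i proportional to Ybar_i^alpha, then Y^(1 - alpha) approximately stabilizes the variance (with the log transform for alpha = 1), so a log-log slope near 2 would point to 1/Y; the near-zero slope of 0.067 after transforming is consistent with the variance having been stabilized. A minimal sketch, where suggest.power is a hypothetical helper and mvec, svec are the group means and standard deviations as computed inside the function above:

suggest.power = function(means, sds) {
  # hypothetical helper: least-squares slope of log10(sds) on log10(means)
  alpha.hat = lsfit(log10(means), log10(sds))$coef[2]
  1 - alpha.hat   # near -1 suggests 1/Y, near 0 suggests log(Y), near 1 suggests no transform
}
# e.g. suggest.power(mvec, svec), with mvec and svec from worker.data.analysis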