ANOVA: analysis of variance

Lada Adamic
November 2, 2006
SI 544

1 one-way ANOVA

First we will work with Prof. Karen Markey's data on people's understanding of library subject headings (for complete info see http://www.si.umich.edu/~ylime/NewFiles/morekmd.html#Anchor-Subject-54980). I'll be passing out some of the surveys for you to look at. The surveys were conducted at 3 Michigan public libraries. Each person taking the survey was asked to fill out demographic info such as age, sex, occupation, etc., as well as to try to write a description for 8 library subject headings, e.g. "Cattle -- United States -- Marketing". All subjects at each library were asked to interpret the same 8 headings, for a total of 24 headings across all three libraries. There were three ways the subject headings were presented: just the heading itself, the heading along with the headings directly preceding and following it in alphabetical order, and the heading within a particular book (bibliographic) record. Sometimes the headings were kept in their original order, and sometimes they were rearranged into a recommended standardized order.

# load the demographic information for each person taking the survey
demog = read.table("oclcdemographics.txt", header=TRUE, sep='\t')

# load the results of the survey
surveyresults = read.table("dataoclcpub.txt", header=TRUE, sep='\t', strip.white=TRUE)

# calculate the proportion of subject headings interpreted correctly
scorebysurvnum = tapply((surveyresults$correct == "c"), surveyresults$survynum, mean)

# combine them into one data frame
demogandscore = data.frame(demog, scorebysurvnum)

# get the size of the dataset
> dim(demogandscore)
[1] 308   7

# attach it
attach(demogandscore)

# look at what we have
> summary(demogandscore)
    survynum       sex          age           libuse    eductn      profssn     scorebysurvnum
 Min.   :  1.00   f   :205   Min.   : 9.00   a   : 19   a:52   student  :128   Min.   :0.0000
 1st Qu.: 77.75   m   :101   1st Qu.:14.00   b   :117   b:80   retired  : 21   1st Qu.:0.1250
 Median :154.50   NA's:  2   Median :18.00   c   :119   c:30   homemaker: 15   Median :0.3750
 Mean   :154.50              Mean   :28.31   d   : 41   d:55   teacher  : 10   Mean   :0.3449
 3rd Qu.:231.25              3rd Qu.:42.00   e   : 12   e:79   clerk    :  6   3rd Qu.:0.5000
 Max.   :308.00              Max.   :74.00   NA's: 12          (Other)  : 77   Max.   :0.8750
                             NA's   : 5                        NA's     : 51   NA's   :1

In essence we have 308 people who took the survey, 205 of whom were female and 101 of whom were male. Their ages ranged from 9 to 74, and they had varying levels of library use (a: daily, b: weekly, c: monthly, d: 2 to 3 times/yr, e: < 2 times/yr), with most of them coming to the library on a weekly or monthly basis. There were a lot of students (128), followed by retirees and homemakers, and then a variety of other professions. The score, in terms of the proportion of subject headings that were correctly identified, ranges from 0 (a person who got none of the headings correct) to 0.875 (a person who got 7 out of 8 of them correct). No one got all the headings right.

[Boxplot: proportion correct on the y-axis (0.0 to 0.8), grouped by education level: elementary, JHS, HS, college, BS]

The first thing we'll do is create boxplots for the scores grouped by the education level of the person, ranging from having completed elementary school to having a college degree. From the boxplot, we can see that high school kids (those who have completed JHS, so the JHS box-and-whisker plot) do about as well as those who have had some college education (the boxplot labeled "college").
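The notes don't reproduce the plotting command itself, but a boxplot like the one above can be drawn with R's formula interface. This is a minimal sketch, assuming demogandscore is attached as above; the mapping of the education codes a through e to the labels elementary through BS is inferred from the figure and is an assumption:

# sketch: boxplot of proportion correct, grouped by education level
# (the a-e -> label mapping is assumed from the figure, not given in the data file)
boxplot(scorebysurvnum ~ eductn,
        names = c("elementary", "JHS", "HS", "college", "BS"),
        ylab  = "proportion correct")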
Now we'd like to test whether any of these means is significantly different from any of the others, which means that we will be doing an F-test for the null hypothesis that all the means are equal.

> anova(lm(scorebysurvnum ~ eductn))
Analysis of Variance Table

Response: scorebysurvnum
           Df  Sum Sq Mean Sq F value  Pr(>F)
eductn      4  0.5786  0.1447  2.5262 0.04149 *
Residuals 238 13.6290  0.0573
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We see that we can reject the hypothesis that all the means are equal at the 0.05 level. Great. But which of all the pairs of means are actually different? As we've discussed in class, we can't go and t-test all pairs against each other, because our probability of committing a type I error (rejecting the null hypothesis when it is true) goes up with each additional test we make. So we need to use a correction that multiplies the p-value by a factor corresponding to the number of tests made. The Bonferroni adjustment multiplies all p-values by the number of tests, while the Holm method (applied in R by default) corrects the smallest p-value by the full number of tests, the next smallest by one fewer, and so on.

To see whether rearranging headings into the recommended order affects comprehension, we can compare the scores on headings presented in their original order (original$x) against the scores on headings rearranged into the recommended order (recommended$x):

        Welch Two Sample t-test

data:  original$x and recommended$x
t = 0.3432, df = 45.982, p-value = 0.733
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.0955977  0.1348940
sample estimates:
mean of x mean of y
0.3677402 0.3480920

Notice that neither the paired t-test nor the regular one (which is incorrectly applied here) gives us a significant difference in people's average ability to interpret the subject headings correctly. But the paired t-test does give a lower p-value and a narrower confidence interval, which shows that it is superior. Why is this? Well, there is a lot of variation in the interpretation difficulty of each subject heading. The paired test keeps the variation due to question difficulty separate from the variation due to the two different ways of presenting the subject headings.

I've used the aggregate() command to calculate the proportion of people who answered each question correctly in each order. I did this by creating a TRUE/FALSE vector with (justattempted$correct == "c"), grouping by order and subject heading, and then taking the mean. mean() treats logical (TRUE/FALSE) vectors as 0/1 vectors, which means that we can average them. In any case, since standardizing headings into a prescribed (recommended) order did not seem to impact people's ability (and inability) to interpret subject headings, one of the recommendations resulting from this study was to standardize.

2 two-way ANOVA

We may wish to consider two variables (factors) simultaneously, and for this we would do a two-way ANOVA. The example I am using here is made up. Let's pretend that there's an exam question that asks students to describe in a paragraph what ANOVA is. The students fall into two categories: those who essentially know the correct answer, and those who don't. Students also decide to write answers of different lengths, represented in the data by the variable "verbosity". In grading, the instructor considers both how good the explanation is (a good explanation may need to be lengthy) and whether it is correct. In fact, writing too much while not really having a clue may not necessarily improve the score.
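The notes don't show how this made-up dataset was constructed. As a purely hypothetical sketch, data of the right shape could be simulated as follows; the variable names (scores, verbosity, correctness) come from the notes, but the sample size, coefficients, and noise level are assumptions, chosen only so that extra verbosity helps correct answers and hurts incorrect ones:

# hypothetical simulation of the made-up grading data;
# n = 30 and the effect sizes below are assumptions, not from the notes
set.seed(544)
n <- 30
verbosity   <- sample(1:10, n, replace = TRUE)         # answer length
correctness <- factor(sample(0:1, n, replace = TRUE))  # 0 = wrong, 1 = right
# writing more raises the score when the answer is correct,
# and lowers it when the answer is wrong
scores <- ifelse(correctness == 1,
                 4 + 0.6 * verbosity,
                 6 - 0.4 * verbosity) + rnorm(n, sd = 1)

The choice n = 30 is consistent with the ANOVA table below: 26 residual degrees of freedom plus 4 estimated parameters (intercept, verbosity, correctness, and their interaction) gives 30 observations.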
> anova(lm(scores ~ verbosity * correctness))
Analysis of Variance Table

Response: scores
                      Df Sum Sq Mean Sq  F value    Pr(>F)
verbosity              1   2.44    2.44   1.8319  0.187553
correctness            1 418.77  418.77 314.1236 4.894e-16 ***
verbosity:correctness  1  17.53   17.53  13.1481  0.001230 **
Residuals             26  34.66    1.33
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We've told R that the formula we are considering is verbosity*correctness, because we would like to consider not just the two factors individually, but also how the two factors interact. The interaction term is significant at the 0.01 level. What if we were to consider verbosity and correctness separately?

> anova(lm(scores ~ verbosity + correctness))
Analysis of Variance Table

Response: scores
            Df Sum Sq Mean Sq  F value    Pr(>F)
verbosity    1   2.44    2.44   1.2634    0.2709
correctness  1 418.77  418.77 216.6477 2.032e-14 ***
Residuals   27  52.19    1.93
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that we now have a + rather than a multiplication sign, indicating that we believe the total score depends separately on verbosity and correctness. We can see that the residual sum of squares has increased (meaning that our model does not explain the data as well), and correspondingly the residual mean square as well. This is an indication that there is some interaction occurring between verbosity and correctness. We definitely want to include the interaction term, and the interaction plot will allow us to examine what is going on more clearly:

[Interaction plot: mean of scores vs. verbosity (2 to 10), with one line for correctness = 1 and one for correctness = 0]

Aha! So if a student keeps writing more and more but doesn't know the right answer, their score will actually decrease, because the instructor becomes increasingly certain that the student does not know the answer. On the other hand, if a student knows the correct answer and writes more, their score will increase (presumably because the answer is more complete - it's a made-up example anyway :) ).
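The notes don't give the plotting command, but a plot like the one described can be drawn with base R's interaction.plot(). A minimal sketch, assuming the same scores, verbosity, and correctness variables as above:

# mean score as a function of verbosity, with one trace per correctness level
interaction.plot(verbosity, correctness, scores,
                 xlab = "verbosity", ylab = "mean of scores",
                 trace.label = "correctness")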