Understanding Precision and Recall in Information Retrieval Systems (Computer Science Study Notes)

An overview of precision and recall, two essential metrics used to evaluate the performance of information retrieval systems. It explains how precision and recall are calculated, their significance, and their relationship, and discusses the importance of these metrics in the context of user experience and system usability.

Performance Evaluation
• How do you evaluate the performance of an information retrieval system?
  – Or compare two different systems?

Cleverdon – The Cranfield Experiments (1950s/1960s)
• Time lag. The interval between the demand being made and the answer being given.
• Presentation. The physical form of the output.
• User effort. The effort, intellectual or physical, demanded of the user.
• Recall. The ability of the system to present all relevant documents.
• Precision. The ability of the system to withhold non-relevant documents.

Cleverdon Says: Precision & Recall
• Measure the ability to find the relevant information
• If a system can't identify relevant information, what use is it?

Why Not the Others?
• According to Cleverdon:
  – Time lag
    • a function of hardware
  – Presentation
    • successful if the user can read and understand the list of references returned
  – User effort
    • can be measured with a straightforward examination of a small number of cases

In Reality
• Need to consider the user task carefully
  – Cleverdon was focusing on batch interfaces
  – Interactive browsing interfaces are very significant (Turpin & Hersh)
• Interactive systems
  – User effort and presentation are very important

In Spite of That
• Precision & recall – extensively evaluated
• Usability – not so much

Why Not Usability?
• Usability requires a user study
  – Every new feature needs a new study (expensive)
  – High variance – many confounding factors
• Offline analysis of accuracy
  – Once a dataset is found:
    • Easy to control factors
    • Repeatable
    • Automatic
    • Free
  – If the system isn't accurate, it isn't going to be usable

Precision & Recall
• For a query, documents are relevant or not
  – The relevant set
  – The non-relevant set
• A system retrieves a set of documents
  – The selected set
  – The non-selected set
• Relevance is binary!
  – Either a document is valuable or not, nothing in between

Precision and Recall (contingency table)

                Selected   Not Selected   Total
   Relevant     N_rs       N_rn           N_r
   Irrelevant   N_is       N_in           N_i
   Total        N_s        N_n            N

Precision
• Percentage of documents selected that are relevant
• Probability that a selected document is relevant
• How well does the system filter out non-relevant stuff?
• P = N_rs / N_s

Recall
• Percentage of all relevant documents that are selected
• Probability that a given relevant document will be retrieved
• How complete are the results?
• R = N_rs / N_r
  – (a short code sketch of both measures follows the TREC example topic below)

How Big Is the Selected Set?
• 1? 5? 10? 100?

A TREC Conference
• Split into tracks
  – Each track evaluates a different user task
• 2003 tracks:
  – Cross-language
  – Filtering
  – Genome
  – High accuracy
  – Interactive
  – Novelty
  – Question answering
  – Robust retrieval
  – Digital video
  – Web

TREC Procedure (for the ad-hoc task)
• Each participant is provided with:
  – Information requests, known as topics
  – A document collection
• Each participant must then:
  – Convert the information requests to queries
  – Return ranked results
• NIST then:
  – Identifies the relevant documents (using pooling)
  – Reports the precision and recall of each participating system
• Participants:
  – Write a paper describing their system and analyzing their results
  – Present the paper at the TREC workshop

Example Topic from the TREC-7 Ad-Hoc Track

<top>
<num> Number: 363
<title> transportation tunnel disasters
<desc> Description:
What disasters have occurred in tunnels used for transportation?
<narr> Narrative:
A relevant document identifies a disaster in a tunnel used for trains, motor vehicles, or people. Wind tunnels and tunnels used for wiring, sewage, water, oil, etc. are not relevant. The cause of the problem may be fire, earthquake, flood, or explosion and can be accidental or planned. Documents that discuss tunnel disasters occurring during construction of a tunnel are relevant if lives were threatened.
</top>
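To tie the set-based definitions of precision and recall above to something runnable, here is a minimal Python sketch (the notes themselves do not give code); the function name and the document IDs are invented for illustration.

```python
def precision_recall(selected, relevant):
    """Set-based precision and recall for a single query.

    P = N_rs / N_s : fraction of selected documents that are relevant
    R = N_rs / N_r : fraction of relevant documents that were selected
    """
    selected, relevant = set(selected), set(relevant)
    n_rs = len(selected & relevant)                      # relevant AND selected
    precision = n_rs / len(selected) if selected else 0.0
    recall = n_rs / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: the system selects 4 documents, 3 of which are
# relevant, out of 6 relevant documents in the collection.
selected = ["d1", "d2", "d3", "d4"]
relevant = ["d1", "d2", "d3", "d7", "d8", "d9"]
p, r = precision_recall(selected, relevant)
print(f"P = {p:.2f}, R = {r:.2f}")                       # P = 0.75, R = 0.50
```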
Documents Used in TREC
• Wall Street Journal, Federal Register, Associated Press, Department of Energy abstracts, Financial Times, Congressional Record, Foreign Broadcast Information Service, Los Angeles Times, .GOV web documents, references from MEDLINE

Obtaining TREC Data
• Not free
• Must sign a data use agreement

TREC 2003 Web Track
• Data – 13 GB, 1.25 million documents
• Home page finding
• Topic distillation
  – Relevant home pages, given a broad query
  – Measured by R-precision (precision at N, where N is the number of relevant documents)
• Navigational task
  – Named page location, e.g. find the "USDA home page"
  – 150 queries were for home pages, 150 were not
  – Measured by mean reciprocal rank and the proportion of queries where the answer appears in the top 10

Mean Reciprocal Rank
• Reciprocal of the rank of the first relevant document returned
• If the first relevant document is in the third position:
  – Reciprocal rank = 1/3
• Mean across all queries
  – (a short code sketch follows the "What Worked?" list below)

Top Web Track Performers
• CSIRO
• Hummingbird
• Univ. Amsterdam
• Copernic
• Univ. Sunderland
• Univ. Glasgow
• Neuchatel Univ.

CSIRO
• Documents were scored via a linear combination of link in-degree, anchor-text propagation, URL length, and BM25. The linear combination (and the BM25 parameters) was tuned using a home-page-finding query set (the same tuning as the navigational run csiro03ki02). Stemming improved R-precision by a further 0.0198.

Hummingbird
• Documents were given additional weight if their URL looked like a home-page URL, and also based on query word/phrase occurrences in HTML markup such as the title. There was no use of link counts or anchor text. Stemming had little effect.

Univ. Amsterdam
• Used different representations and retrieval models. Okapi worked well on documents, titles, and anchors. Language modelling worked very well on anchors and less well on documents and titles. Anchor text was important. Snowball stemming was used in all runs.

Copernic
• URL information was important (length and presence of query terms). Representations were each treated differently and included documents, extracted summaries, text with formatting, URL, and title. First results were from a boolean AND query, followed by OR results. Porter stemming was used in all runs.

Univ. Sunderland
• Used a novel document representation based on automatically assigned word senses as opposed to terms. The ranking algorithm consisted of a variation of Kleinberg's model of hubs and authorities in association with a number of vector space techniques, including TF*IDF and cosine similarity.

(Descriptions taken from the TREC 2003 report.)

What Worked?
• Referring anchor text
• Stemming
• URL information and link structure
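As a companion to the Web-track measures described above, the following Python sketch computes mean reciprocal rank and R-precision; the ranked lists and relevance judgments are made up for illustration and are not TREC data.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Mean of the reciprocal ranks over all queries.

    `runs` is a list of (ranked_ids, relevant_ids) pairs, one per query.
    """
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def r_precision(ranked_ids, relevant_ids):
    """Precision at N, where N is the number of relevant documents."""
    n = len(relevant_ids)
    hits = sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant_ids)
    return hits / n if n else 0.0

# Hypothetical runs: query 1 returns its first relevant document at rank 3,
# query 2 at rank 1, so MRR = (1/3 + 1) / 2.
runs = [
    (["d5", "d9", "d2", "d7"], {"d2", "d7"}),
    (["d1", "d4", "d6"], {"d1"}),
]
print(round(mean_reciprocal_rank(runs), 3))                 # 0.667
print(r_precision(["d2", "d5", "d7", "d9"], {"d2", "d7"}))  # top-2 holds 1 of 2 relevant -> 0.5
```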
Other Data Sets

CACM
• Articles about computer science research and practice
• Includes metadata
  – Authors
  – Dates
  – Subject categories
  – Bibliographic linkage and co-citation

ISI
• Documents about information science
• Metadata
  – Bibliographic co-citations

Bibliographic Citation Information in CACM
• Direct references
  – A referenced B
• Bibliographic coupling
  – [a, b, n] (n documents were cited by both a and b)
  – Indicates potential similarity
• Co-citations
  – [a, b, n] (a and b appear together in a bibliography in n different documents)

Many Other Datasets of Interest
• Many other datasets are available besides the ones mentioned
  – Most don't have relevance data
• Others
  – EachMovie – collaborative filtering
  – Machine Learning Repository at the University of California, Irvine
    • Includes a SPAM dataset
• Most don't have text

Query Language Features
• Important to understand the different query capabilities of the popular search engines

Query Elements
• Words (index terms)
  – Generally automatically extracted
  – What defines a word?
    • Separated by spaces?
    • What about other characters?
    • Contractions?
    • Hyphens?

Evaluating Relevance Feedback
• Need to do a user experiment
  – The relevant documents selected by the user depend on the original search results
• Precision and recall
  – Suppose you measure P&R of the results from the expanded query?

Prec. & Recall with Relevance Feedback
• User provides an initial query
• System provides initial results (say 10)
• User identifies two relevant documents at ranks 8 and 10
• System provides updated results based on relevance feedback
• The relevant documents that were originally at ranks 8 and 10 are now at ranks 1 and 2
  – An easy cheat!

Precision and Recall & Relevance Feedback
• Residual collection precision and recall
  – Compute precision and recall after removing all documents used as feedback
• Consider the user interaction
  – Could compare relevance feedback to manual query reformulation

Fully Automatic Query Expansion
• Relevance feedback requires user input
  – User provides examples of relevant and non-relevant documents
  – System extracts words from those documents and adds them to, or removes them from, the query
• What can we do without user input?
  – Statistical thesaurus expansion

Consider
• User searches for "cat"
• Won't see documents that use the word "feline" instead of "cat"
  – Thus terms are not independent
• A potentially better search
  – "cat OR feline"
• If you had a thesaurus...
  – Look up each query term
  – Add all the synonyms to the query

Construct a Thesaurus
• May not have a useful thesaurus available
  – Research has shown that generic thesauri don't fare well
    • Lose too much precision
  – Need a thesaurus specific to the topics in your query
• Can you construct one?

Use the Document Collection to Construct a Thesaurus
• Analyze the text in the document store
• Global
  – Build a thesaurus from all documents
• Local
  – First retrieve documents based on the initial query
  – Build a thesaurus from those documents
• Expand the query using the thesaurus
  – Rerun the query

Co-occurrences
• If two words frequently co-occur
  – They might be related
• For example
  – Any document that mentions "feline" frequently will also mention "cat" frequently

Association Clusters
• Correlation between index terms k_u and k_v, computed over D_l, the set of all retrieved documents:

  c_{u,v} = Σ_{d_j ∈ D_l} f_{k_u,j} × f_{k_v,j}

  where f_{k_u,j} is the frequency of term k_u in document d_j (and likewise f_{k_v,j} for k_v).
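To make the association-cluster formula concrete, here is a minimal Python sketch of building term correlations from a local set of retrieved documents and using them to expand a query. It assumes naive whitespace tokenization, and the documents, function names, and the top_k parameter are invented for illustration rather than taken from the notes.

```python
from collections import Counter
from itertools import combinations

def association_correlations(retrieved_docs):
    """c_{u,v} = sum over retrieved documents d_j of f_{u,j} * f_{v,j},
    where f_{u,j} is the frequency of term u in document d_j."""
    freqs = [Counter(doc.lower().split()) for doc in retrieved_docs]
    vocab = set().union(*freqs) if freqs else set()
    corr = {}
    for u, v in combinations(sorted(vocab), 2):
        c = sum(f[u] * f[v] for f in freqs)
        if c > 0:
            corr[(u, v)] = c
    return corr

def expand_query(query_terms, corr, top_k=2):
    """Add, for each query term, its top_k most strongly correlated terms."""
    expanded = set(query_terms)
    for term in query_terms:
        related = [(pair, c) for pair, c in corr.items() if term in pair]
        related.sort(key=lambda item: item[1], reverse=True)
        for (u, v), _ in related[:top_k]:
            expanded.add(v if u == term else u)
    return expanded

# Hypothetical "local" document set retrieved for the initial query "cat".
docs = [
    "the cat chased a mouse",
    "every feline is a cat and each cat is a feline",
    "feline agility helps the cat hunt",
]
corr = association_correlations(docs)
print(expand_query(["cat"], corr, top_k=1))   # {'cat', 'feline'}: "feline" co-occurs with "cat" most strongly
```

In practice a stop list and normalized correlations would be needed so that frequent function words do not dominate, but the inner summation mirrors the formula above.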