Question Answering in Information Retrieval: From Document Retrieval to Answer Retrieval (Study notes, Computer Science)

These notes trace the evolution of question answering (QA) in information retrieval (IR) from document retrieval to answer retrieval. They cover the history of TREC QA, the challenges of judging answers, and various approaches to answer selection, and also touch on the use of external knowledge bases and the complexity of QA systems.

Information Retrieval (CMPSCI 646, Fall 2007)
Question Answering
James Allan, University of Massachusetts Amherst
All slides copyright © James Allan

Question answering motivation
• IR typically retrieves or works with documents
  – Find documents that are relevant
  – Group documents on the same topic
• People often want a sentence fragment or phrase as the answer to their question
  – Who was the first man to set foot on the moon?
  – What is the moon made of?
  – How many members are in the U.S. Congress?
  – What is the dark side of the moon?
• Move IR from document retrieval to answer retrieval
  – Document retrieval is still valuable
  – Extends the breadth of active IR research

Some TREC history
• QA began in TREC-8 (1999) and was similar in 2000
• First focused on "factoid" questions from an unrestricted domain
  – Now includes other classes of questions (definitions, lists, ...)
• Run against a large collection of newswire
• Guaranteed that the answer exists in the collection
• Return a short text passage that contains and supports the answer
  – 250- or 50-byte passages
• Return 5 "answers" (passages) ranked by the chance of having the answer
• Evaluation based on the mean reciprocal rank of the first correct answer

Judgment issues
• Correctness of an answer is not always obvious
• Applied several rules to simplify the problem
• Lists of possible answers ("answer stuffing")
  – Not considered correct even if the correct answer is in there
• Answer had to be "responsive"
  – If "$500" was the correct answer, then "500" was incorrect
  – If "5.5 billion" was correct, then "5 5 billion" was not
• Ambiguous references refer to the famous one
  – "What is the height of the Matterhorn?" means the one in the Alps
  – "What is the height of the Matterhorn at Disneyland?" is the other

Main task
• 500 questions
  – No "definition" questions (needed a pilot study first)
  – No answers required (49 of 500 ended up with no answer)
  – Taken from MSNsearch and AskJeeves logs donated in 2001
  – Some spelling errors in questions corrected, but not all
    • When to stop: is a misplaced apostrophe a spelling error?
• Requirements on answers
  – Precisely one exact answer required (not five as before)
  – System must indicate confidence in the answer
  – Could optionally submit a justification string
• Evaluation is confidence-weighted average precision
  – Rank answers to all questions by confidence

TREC 2003 QA tasks
• Main task ("factoid" question answering)
  – 413 questions posed against the AQUAINT corpus
  – 54 runs from 25 groups (also did the next two types)
  – Scored by the fraction of responses that were correct (accuracy)
• List task
  – 37 questions with no specification of how many answers belong in the list
    • List the names of chewing gums
    • What Chinese provinces have a McDonald's restaurant?
  – Scored by instance recall/precision and an F1 measure
• Definition task
  – 50 questions
  – Facet-based recall measure, length-based precision measure
• Passages task
  – 250-byte extract containing the answer, or nil if none exists
  – 21 runs from 11 groups
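The slides above name two ranking-based evaluation measures: mean reciprocal rank (TREC-8/2000, five ranked passages per question) and confidence-weighted average precision (one exact answer per question, ranked by confidence). The Python sketch below is my own illustration of measures in that spirit; the official TREC definitions differ in some details (for example, the exact treatment of unsupported answers).

```python
def mean_reciprocal_rank(ranked_judgments):
    """ranked_judgments: one list of booleans per question, in rank order
    (True = the returned passage contains a correct, supported answer).
    Each question scores 1/rank of its first correct passage, or 0 if
    none of the (up to five) passages is correct."""
    total = 0.0
    for judgments in ranked_judgments:
        for rank, correct in enumerate(judgments, start=1):
            if correct:
                total += 1.0 / rank
                break
    return total / len(ranked_judgments)


def confidence_weighted_score(answers):
    """answers: one (confidence, correct) pair per question.
    Answers are sorted by decreasing confidence and the precision of each
    prefix is averaged, so confident mistakes hurt more than mistakes the
    system flagged as uncertain."""
    ranked = sorted(answers, key=lambda a: a[0], reverse=True)
    num_correct = 0
    total = 0.0
    for i, (_, correct) in enumerate(ranked, start=1):
        if correct:
            num_correct += 1
        total += num_correct / i
    return total / len(ranked)


if __name__ == "__main__":
    # Three toy questions, five ranked passages each.
    print(mean_reciprocal_rank([[False, True, False, False, False],
                                [True, False, False, False, False],
                                [False, False, False, False, False]]))
    # Three toy exact answers with confidences.
    print(confidence_weighted_score([(0.9, True), (0.6, False), (0.4, True)]))
```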
General QA approach
• Close to traditional IR
• Question → determine the question type
• Find candidate passages in the text collection
• Extract possible answers
• Rank answers → answer(s)

Key points for success
• Good passage retrieval
  – QA included an evaluation specifically on passage retrieval, too
• Recognizing the question type is critical
  – Requires the ability to recognize those entities
• Some sample entities that a system might find
  – person, organization, location, time, date, money, percent
  – duration, frequency, age, number, decimal, ordinal, equation
  – weight, length, temperature, angle, area, capacity, speed, rate
  – product, software, address, email, phone, fax, telex, www
  – subtypes (company, government agency, school)
• Better performing systems almost always have better entity recognizers and larger numbers of entity types

Passage retrieval
• Not every system depends on this, but most do
• Given a query, find passages likely to contain the answer
• The most successful approaches use question patterns to find alternative ways to phrase things
  – To greatly increase recall
• Start with a question and a known answer
  – When was Bill Clinton elected President? 1992
• Look for all occurrences of that answer and the declarative form of the question throughout the text
  – Bill Clinton was elected president in 1992
  – The election was won by Bill Clinton in 1992
  – Clinton defeated Bush in 1992
  – Clinton won the electoral college in 1992
• Extract patterns that occur frequently
• Now more likely to be able to answer similar questions
  – When did George Bush become president?

Query expansion?
• Question expansion
  – A process that adds related words to a query
  – Improves recall
  – Finds relevant documents that use slightly different vocabulary
• Seems appropriate here, and it does work
• The difficulty is the need for answer justification

Putting those all together
• Want to estimate P(correct | Q, A)
• They did this with a mixture model
• Easy to look up values in tables built from training data

BBN's use of the Web (TREC 2002 and 2003)
• Several systems used the Web to help
  – Huge source of text that might answer the question
• BBN formed two queries
  – One rewrites the question into a declarative sentence
  – Another just uses the content-based words
• Mine the returned snippets rather than the pages (for efficiency) for candidate answers
  – Must be of the correct type
• Select the best answer (next slide)
• To get a justification, find a TREC document that contains the selected answer

Using the Web (cont.)
• The first approach just uses Web results and the question type
• The second approach boosts the scores of candidates that were also retrieved by the non-Web approach in the TREC corpus
  – P(correct | F, in-trec)
  – Clear from the training data that having the answer in the TREC corpus (in-trec true vs. in-trec false) provides useful information
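A rough Python sketch of the idea just described: mine web snippets for candidates of the expected answer type, score them by frequency, and boost candidates that can also be validated (and thus justified) in the TREC corpus. This is my own illustration, not BBN's actual system; the snippet fetcher, type-based extractor, and corpus lookup are hypothetical placeholders passed in as functions.

```python
from collections import Counter

# Hypothetical helpers, assumed to be provided by the caller:
#   fetch_web_snippets(query) -> list[str]          search-engine snippets
#   extract_typed_candidates(snippet, qtype) -> list[str]
#   find_trec_document(candidate) -> str | None     id of a supporting TREC doc

def rank_candidates(declarative_query, keyword_query, qtype,
                    fetch_web_snippets, extract_typed_candidates,
                    find_trec_document, in_trec_boost=2.0):
    """Mine web snippets for candidate answers of the expected type,
    score each candidate by how often it occurs, and boost candidates
    that are also found in the TREC corpus (which doubles as the
    required justification document)."""
    counts = Counter()
    for query in (declarative_query, keyword_query):
        for snippet in fetch_web_snippets(query):
            for cand in extract_typed_candidates(snippet, qtype):
                counts[cand.lower()] += 1

    scored = []
    for cand, freq in counts.items():
        doc_id = find_trec_document(cand)
        score = freq * (in_trec_boost if doc_id else 1.0)
        scored.append((score, cand, doc_id))
    scored.sort(reverse=True)
    return scored  # best (score, candidate, supporting_doc) first
```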
How well did it all work?
• Decent performance (middle of the pack)
• Confidence scores are fairly good
  – An upper bound shows the impact of perfect estimates
• Using the Web made a huge difference
• Validating in the TREC corpus helped some

Some systems are more complex
• U. Waterloo (Canada) incorporates much more (TREC 2002)
  – Stores known facts in a database
  – Includes a corpus of trivia
  – Uses the Web to find candidate answers
• Provides numerous sources of evidence
  – Early answers require justification in the corpus
• Combines candidate answers

Waterloo's AnswerDB
• A collection of tables with information on a bunch of topics

Use of the Cyc knowledge base
• Used for answer "sanity" checking
• Have the system generate an answer and ask Cyc if the answer seems reasonable
• If the answer is within ±10% of Cyc's best guess, then it is "sane"
• Only helped once
  – What is the population of Maryland?
  – Answer: 50,000
  – Justification: "Maryland's population is 50,000 and growing rapidly."
  – Valid on the surface, except that it had to do with something called Nutria (a rodent raised for its fur), not people
  – Cyc knew the answer was about 5.1 million, so the second-best (though less highly ranked) answer was accepted because it was "sane"
• Follow-up work has made better use of Cyc
  – Didn't help at all in TREC 2003, though

Top performing systems at TREC 2002
[Chart: scores of the top systems, showing the impact of confidence weighting]

Ability of systems to estimate confidence
[Chart: system scores bounded by "all right answers first" and "all wrong answers first"; from Voorhees, TREC 2002]

Definition task (TREC 2003)
• Sample questions
  – Who is Colin Powell?
  – What is mold?
• Drawn from search engine logs, so they're "realistic"
  – 50 questions
  – 30 had a "person" as the target (Vlad the Impaler, Ben Hur)
  – 10 had an organization (Freddie Mac, Bausch & Lomb)
  – 10 had something else (golden parachute, feng shui, TB)
• The answer to a definition has an implicit context
  – An adult, native speaker of English, an "average" reader of US news
  – Has come across a term they want more information about
  – Has some basic ideas already (e.g., Grant was a president)
  – Not looking for esoteric details

Judging definitions
• Phase one: creating the truth
  – The assessor created a list of information "nuggets"
  – Used their own question research
  – Combined with judgments of submitted answers
  – Vital nuggets (those that must appear) were selected
• Phase two: judging
  – Look at each system response
  – Note where each nugget appeared
  – If a nugget is returned more than once, only one instance is counted

Example judging
• What is a golden parachute?

Results for definitions
• The table shows results for definitions with β=5
• Also shows what different values of β do
• Note how well the sentence baseline does
  – Return all sentences that mention the target (e.g., "golden parachute")
  – But reduce it slightly by eliminating sentences that overlap too much
  – Provided by BBN
• Does best when recall is heavily weighted
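To make the β weighting concrete, here is a small Python sketch of an F(β)-style definition score that combines nugget recall over vital nuggets with a length-based precision allowance. It is my own approximation, assuming an allowance of 100 characters per returned nugget as in the TREC 2003 definition scoring; treat the details as illustrative rather than the official measure.

```python
def definition_f_score(vital_returned, vital_total, nuggets_returned,
                       response_length, beta=5.0, chars_per_nugget=100):
    """Approximate F(beta) for one definition response.

    vital_returned:   distinct vital nuggets found in the response
    vital_total:      vital nuggets the assessor listed
    nuggets_returned: distinct nuggets of any kind found (sets the allowance)
    response_length:  length of the response in characters

    With beta=5, recall is weighted far more heavily than precision,
    which is why a recall-oriented sentence baseline does well."""
    recall = vital_returned / vital_total if vital_total else 0.0

    # Length-based precision: free up to the allowance, penalized beyond it.
    allowance = chars_per_nugget * nuggets_returned
    if response_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (response_length - allowance) / response_length

    if recall == 0.0 and precision == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Toy example: 4 of 6 vital nuggets, 5 nuggets overall, 800-character response.
print(definition_f_score(4, 6, 5, 800))
```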
BBN's results
• Did okay except for about 10 questions
• Several were the result of a faulty assumption about the target
  – What is Ph in Biology?
    • Assumed "Ph in Biology" was an object
  – Who is Akbar the Great?
    • Assumed "Great" was his last name
• Some errors were caused by redundancy checking
  – Ari Fleischer, Dole's former spokesman who now works for Bush
  – Ari Fleischer, a Bush spokesman
  – This was judged redundant because of the previous kernel

Final scores of systems (TREC 2003)
• Three types: factoid, list, and definition
• The final score is a linear combination
  – ½ factoid score
  – ¼ list score
  – ¼ definition score
  – (a small worked example appears at the end of these notes)
• Doesn't match the balance of questions
  – 413 factoid
  – 37 list
  – 50 definitions
• Reflects a desire to force work on lists and definitions
  – But to keep factoid questions important

Scores of top 15 systems (TREC 2003)

What about that top-performing system?
• LCC (Language Computer Corporation) does a consistently great job at this task
• Very complex system with lots of AI-like technology
  – Attempts to prove candidate answers from the text
  – Lots of feedback loops
  – Lots of sanity-checking that can reject answers or require additional checking
• Attempts to replicate the results have failed
  – The system is so complex it's hard to know where to start
  – LCC is a company and probably isn't telling us everything
• Until their high-quality results are understood, they remain an outlier (albeit a really good outlier)

Summary
• Question answering is a hot area right now
• It has been explored numerous times in the past
  – Perhaps the time was ripe?
• So far the focus has been on simpler questions
  – "Factoid" questions, lists, definitions
  – TREC tries to make things more difficult each year
• Part of the AQUAINT program looking at the problem
  – Much more complex types of questions are being explored in the research program
  – Dialogue situations, cross-language, against rapidly changing data (so the answer might change)
  – Some efforts require heavy knowledge bases (e.g., Cyc)
• An exciting and active area of research at the moment
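As a worked example of the TREC 2003 final-score combination described above, the minimal Python sketch below applies the weights given on the slides; the sample scores are made up purely for illustration.

```python
def trec2003_final_score(factoid_accuracy, list_f, definition_f):
    """Linear combination used for the TREC 2003 QA main task:
    half weight on factoid accuracy, a quarter each on the list and
    definition scores, regardless of how many questions of each type
    were asked (413 factoid, 37 list, 50 definition)."""
    return 0.5 * factoid_accuracy + 0.25 * list_f + 0.25 * definition_f


# Made-up example scores for illustration only.
print(trec2003_final_score(factoid_accuracy=0.70, list_f=0.40, definition_f=0.50))
# 0.5*0.70 + 0.25*0.40 + 0.25*0.50 = 0.575
```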