Information Retrieval: Data vs. Information and Models for Information Retrieval, Study Guides, Projects, Research of Computer Science

This document, from Oregon State University's CS419/519 Information Filtering and Retrieval course, covers the differences between data retrieval and information retrieval, the traditional information need model, and several other models of retrieval. Topics include query languages, relevance, and indexing. The document also discusses the Boolean model and the vector space model.

Uploaded on 08/30/2009

koofers-user-6p7

CS419/519 Information Filtering and Retrieval
Jon Herlocker
Dept. of Computer Science
Oregon State University
Tu/Th

Day 2

Review of Last Lecture
• Tour of a research information retrieval system
  – Traditional IR search engine
  – Collaborative filtering recommendations
  – Crawling, pre-processing, and indexing
  – Personalization
  – User study

Today
• Project idea presentations
• Introduction to IR
• Assignments for next week

Project Idea Presentations

What is Information Retrieval?
• How does it differ from databases (data retrieval)?

Data Retrieval vs. Information Retrieval

                      Data retrieval        Information retrieval
Content               Data                  Information
Data object           Table                 Document, image, other
Matching              Exact match           Partial match, best match
Items wanted          Matching              Relevant
Query language        SQL (artificial)      Natural
Query specification   Complete              Incomplete
Model                 Deterministic         Probabilistic
Structure             Highly structured     Less structured

Table by Xin Xao, Drexel University

Data Retrieval vs. Information Retrieval
• Information retrieval solutions may incorporate data retrieval
  – Data retrieval as a subset of information retrieval
• For this class, data retrieval alone is not interesting

Next: Models of Information Retrieval

Traditional Information Need Model
1. User has an information need
2. User forms a query
3. IR system finds the best-matching documents
4. User evaluates the ranked documents
5. Is the need met?
6a. Yes -> done
6b. No -> reformulate the query

Some Assumptions of the Traditional Model
• It is possible for the user to specify their exact needs
• Document texts are functionally equivalent to information needs
  – Essentially document retrieval
• The information need remains constant throughout the search process
• The user will always recognize relevant documents
(Belkin, Oddy, & Brooks)

Relevance?
• What does relevance mean?
• A document might be relevant for many reasons
  – Answers a question with a fact
  – Gives part of an answer
  – Gives a link to the answer
  – Gives related information
• Relevance is subjective!
  – We'll return to this discussion later

Other Models
• Anomalous States of Knowledge (ASKs) (Belkin, Oddy, and Brooks)
  – A recognized anomaly in a user's state of knowledge that the user is not able to specify precisely
• Berry-Picking Model (Bates 90)
  – Interesting information is scattered like berries on bushes
  – The query is continually shifting

Boolean Query Language
• Terms
  – Words
  – Phrases
• Operators
  – AND
  – OR
  – NOT

Example Boolean Queries
• House
• House AND Corvallis
• House OR Corvallis
• (House OR Condo) AND Corvallis
• House AND Oregon AND NOT Eugene
• (House OR Condo) AND Corvallis AND NOT Eugene

Rules for Boolean Logic
• De Morgan's laws
  – NOT (A AND B) = (NOT A) OR (NOT B)
  – NOT (A OR B) = (NOT A) AND (NOT B)
• Search for "Boolean logic" if you want to know more

Informal Pseudo-Boolean Notation
• Evolved in web search engines
• +house +Corvallis
  – house AND Corvallis
• +house +Oregon –Eugene
  – House AND Oregon AND NOT Eugene
• House +Corvallis
  – ?

Pseudo-Boolean Notation
• House +Corvallis
  – (Corvallis AND House) OR Corvallis
• +House +Condo Corvallis Salem
  – (House AND Condo AND Corvallis) OR (House AND Condo AND Salem) OR (House AND Condo)
• Logically, each expression reduces to its required terms alone (Corvallis; House AND Condo), so the optional terms cannot change which documents match, only how they might be ordered

Ordering of Retrieved Documents
• Pure Boolean has no order
  – All returned documents are equally relevant
• In reality, different approaches can be taken
  – Chronologically
  – Order by the number of times a specified term occurs
  – Other approaches, which get further and further away from Boolean
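The Boolean operators, the pseudo-Boolean notation, and the ordering question above can be made concrete with a small sketch. It assumes a hypothetical four-document collection, represents Boolean queries as nested tuples instead of parsing query strings, and writes the exclusion prefix as an ASCII hyphen. Optional pseudo-Boolean terms are used only to order results by how many of them a document contains; that is one assumed ordering rule, not the documented behavior of any particular engine. All names (`build_index`, `evaluate`, `pseudo_boolean`) are illustrative.

```python
"""Toy Boolean and pseudo-Boolean retrieval over an inverted index.

The documents, the tuple-based query form, and all function names are
hypothetical illustrations, not part of the original course material.
"""
from collections import defaultdict

# Hypothetical toy collection: doc id -> text.
DOCS = {
    1: "house for sale in Corvallis Oregon",
    2: "condo near campus in Corvallis",
    3: "house in Eugene Oregon",
    4: "house and condo listings in Salem Oregon",
}


def build_index(docs):
    """Map each lower-cased term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


INDEX = build_index(DOCS)
ALL_IDS = set(DOCS)


def evaluate(query):
    """Evaluate a term string or a nested tuple:
    ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q)."""
    if isinstance(query, str):
        return set(INDEX.get(query.lower(), set()))
    op, *args = query
    results = [evaluate(arg) for arg in args]
    if op == "AND":
        return set.intersection(*results)
    if op == "OR":
        return set.union(*results)
    if op == "NOT":
        # NOT is taken relative to the whole collection.
        return ALL_IDS - results[0]
    raise ValueError(f"unknown operator: {op}")


def pseudo_boolean(query):
    """Evaluate '+required -excluded optional' syntax.

    Required terms are ANDed and excluded terms are removed. Optional
    terms only affect ranking: documents are ordered by how many optional
    terms they contain (an assumed rule), ties broken by doc id. With no
    required terms, a document must contain at least one optional term
    (also an assumption).
    """
    required, excluded, optional = [], [], []
    for token in query.split():
        if token.startswith("+"):
            required.append(token[1:])
        elif token.startswith("-"):
            excluded.append(token[1:])
        else:
            optional.append(token)

    if required:
        candidates = evaluate(("AND", *required))
    else:
        candidates = evaluate(("OR", *optional)) if optional else set()
    for term in excluded:
        candidates -= evaluate(term)

    def score(doc_id):
        # Number of optional terms this document contains.
        return sum(doc_id in evaluate(term) for term in optional)

    return sorted(candidates, key=lambda d: (-score(d), d))
```

For example, `evaluate(("AND", ("OR", "house", "condo"), "corvallis"))` returns `{1, 2}`, and `pseudo_boolean("House +Corvallis")` returns `[1, 2]`: both Corvallis documents match, and document 1 ranks first because it also contains the optional term house. Set intersection, union, and difference map directly onto AND, OR, and NOT, which is why Boolean retrieval is easy to implement over an inverted index.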
Boolean Searching
• Upsides
  – Easy to implement
  – Simple queries are easy
  – Query language gives significant control over results
• Downsides
  – Binary relevance decision
      • No ordering criteria
      • Usually too much or too little
  – Syntax can be complex

Who Uses Boolean Searches
• Everybody until about ten to fifteen years ago
• Even now, many commercial systems (library catalogs, abstracting services, etc.)

Boolean Model: Data Retrieval or Information Retrieval?
• Very close; hard to distinguish
• Differences
  – Enormous number of attributes
  – A document only has values for a few of those attributes
  – Inefficient to store and search using traditional data retrieval methods
  – Ordering may still be important

Proximity Operators
• NEAR
• WITHIN
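The preview ends before the slide defines NEAR and WITHIN, and systems differ on their exact meaning. The sketch below assumes that `near(a, b, k)` matches when the two terms occur within k word positions in either order, and that `within(a, b, k)` matches only when b follows a by at most k positions. Both semantics, the toy documents, and the function names are hypothetical. The key idea it shows is that proximity operators need a positional index (term positions per document), not just the document sets that plain Boolean operators use.

```python
"""Toy proximity operators over a positional index.

Assumed semantics (hypothetical; real systems define these differently):
  near(a, b, k):   a and b occur within k word positions, in either order
  within(a, b, k): b occurs after a, at most k positions later
"""
from collections import defaultdict

# Hypothetical toy collection: doc id -> text.
DOCS = {
    1: "house for sale in corvallis oregon",
    2: "corvallis has a house near campus",
    3: "oregon house and corvallis condo listings",
}


def build_positional_index(docs):
    """Map term -> {doc id -> ascending list of word positions}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index


INDEX = build_positional_index(DOCS)


def _proximity(a, b, k, ordered):
    """Return ids of docs where some occurrence pair of a and b passes the distance test."""
    postings_a = INDEX.get(a.lower(), {})
    postings_b = INDEX.get(b.lower(), {})
    hits = set()
    # Only documents containing both terms can match.
    for doc_id in postings_a.keys() & postings_b.keys():
        for pa in postings_a[doc_id]:
            for pb in postings_b[doc_id]:
                gap = pb - pa
                ok = 0 < gap <= k if ordered else 0 < abs(gap) <= k
                if ok:
                    hits.add(doc_id)
                    break
            if doc_id in hits:
                break
    return hits


def near(a, b, k):
    """Unordered proximity: terms within k positions of each other."""
    return _proximity(a, b, k, ordered=False)


def within(a, b, k):
    """Ordered proximity: b appears at most k positions after a."""
    return _proximity(a, b, k, ordered=True)
```

With this toy data, `near("house", "corvallis", 3)` returns `{2, 3}`, while `within("house", "corvallis", 3)` returns only `{3}`, because in document 2 corvallis precedes house. The nested loops are quadratic per document; a production index would instead merge the two sorted position lists in one pass.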