
History and Applications of Information Retrieval in Web Systems - Prof. Luo Si, Study notes of Computer Science

An overview of the history and development of the Web, focusing on its role as a platform for information retrieval. Topics include the origins of ARPAnet and NSFnet, the growth of the Web, and applications such as information retrieval, Web services, and multimedia retrieval. The document also discusses the challenges of dealing with unstructured data and the importance of effective query representation and retrieval models.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-ncz

CS490W: Web Information Systems
Luo Si
Department of Computer Science, Purdue University

Overview

Web: Web is a young but BIG thing…

ARPAnet
  ARPA (Advanced Research Projects Agency) was created in 1958 to outrun the Russians in the race to master rocket launching.
  In 1969, ARPA decided to link sensitive computer centers by a network in order to withstand a possible nuclear attack. The idea was to allow centers to communicate even after one center is destroyed.
  It connected government labs, major research centers and universities.
  It existed until 1988 and was dismantled in 1990.
  Backbone network speed: 64 Kbit/s
  Applications: TCP/IP, Domain Name System, E-mail, FTP, Telnet, …

NSFnet
  DARPA still keeps its own network, but the original ARPAnet was integrated into the current Internet.
  The National Science Foundation in the USA funded the NSFnet, which was created in 1985.
  Backbone network speed: T1 (1.5 Mbit/s) to T3 (45 Mbit/s)
  It originally connected 5 major universities with supercomputer centers, but rapidly included other universities, research centers and private companies.
  It replaced ARPAnet as the backbone of the Internet in 1990.

Next Generation of Internet
  The National Science Foundation announced the GENI initiative (Global Environment for Networking Innovations).
  It is expected to take five to seven years to build.
  Higher speed, more capacity (optical network)
  Better support for applications
  More intelligent
  Better security
  …

Web: Growth of the Web
  “… The world produces between 1 and 2 exabytes (10^18 bytes) of unique information per year, which is roughly 250 megabytes for every man, woman, and child on earth. …” (Lyman & Varian 03)

Web
  The Web opened the door for many important applications:
  Information Retrieval
    Web Search
    Information Recommendation by content or by collaborative information
  Web Services
  Semantic Web
  Web 2.0
  XML
  …

Why Information Retrieval:
  Information Retrieval (IR) mainly studies unstructured data.
  Merrill Lynch estimates that more than 85 percent of all business information exists as unstructured data, commonly appearing in e-mails, memos, notes from call centers and support operations, news, user groups, chats, reports, … and Web pages.
  Examples: text in Web pages or e-mails, images, audio, video, protein sequences, …
  Unstructured data:
    No structure: no primary key as in an RDBMS
    Semantic meaning unknown: natural language processing systems try to find the meaning in the unstructured text

IR vs. RDBMS
  Relational Database Management Systems (RDBMS):
    Semantics of each object are well defined
    Complex query languages (e.g., SQL)
    Exact retrieval of what you ask for
    Emphasis on efficiency
  Information Retrieval (IR):
    Semantics of objects are subjective, not well defined
    Usually simple query languages (e.g., natural language queries)
    You should get what you want, even if the query is bad
    Effectiveness is the primary issue, although efficiency is important

IR and other disciplines
  [Diagram. Labels: Information Retrieval; Machine Learning; Pattern Recognition; Statistical Learning; Natural Language Processing; Image Understanding; Theory; Deep Analysis; Information Extraction; Text Mining; Database; Data Mining; Library & Info Science; Security; System; Bioinformatics; Visualization; Applications; System Support; Medical Informatics]

Some core concepts of IR
  [Diagram. Labels: Information Need; Representation; Query; Retrieval Model; Indexed Objects; Representation; Retrieved Objects; Returned Results; Evaluation/Feedback]

Some core concepts of IR
  Multiple representations
  Text summarization for retrieved results

IR Applications: Citation/Link Analysis
  Citation/link: importance

IR Applications: Multimedia Retrieval
  [Diagram. Labels: Query; Pictures; Feature Extraction; Feature Extraction; Retrieval Model; Color Histogram; Wavelet; …]

IR Applications: Information Visualization
  Partial structure of pages from a Web subset, visualized by Mapuccino

Grading Policy:
  Assignments: 30%
  Project: 30%
  Final exam: 30%
  Class attendance: 10%

Grading Policy: Assignments (30%)
  Algorithm design and implementation (about 3 assignments)
    Implement and improve common retrieval algorithms
    Create and compare algorithms for information retrieval applications (web page/e-mail spam classification and recommendation systems)
  Late submission: 90% credit for the next two days, 50% afterwards
  You may help each other through discussion (please indicate so in the submission), but copying/cheating may result in 0 credit
  It is safe to start early…

Grading Policy: Project (30%)
  Goal
    Show your knowledge and creative ideas on real applications
    Leading to a research report/publication (optional)
  Topics
    Suggested by the lecturer, or any related topic proposed by you
  Project progress
    Project proposal
    Project final report and presentation

Grading Policy: Test(s) (30%)
  One or two tests? In class or not?
  Based on lecture contents (more) and required reading materials (less)
  Review session

Grading Policy: Attendance (10%)
  Be interactive: the best way to learn is to ask questions
  Insightful questions and suggestions earn extra credit

Support System:
  Course web page: http://www.cs.purdue.edu/homes/lsi/CS490W_Fall_07/CS490W.html
    Schedule, slides, reading materials, assignments, etc.
  Textbook: Introduction to Information Retrieval. Manning, C.; Raghavan, P.; Schütze, H. Cambridge University Press (2008). Free online version available.
  Other readings: on the course web page
  Office hour: Wednesday 2:00-3:00 PM, or reach me at lsi@cs.purdue.edu

Course Description: The Goal
  Learn the techniques behind Web search engines, E-commerce recommendation systems, etc.
  Get hands-on project experience by developing real-world applications, such as building a small-scale Web search engine, a Web page management system, or a movie recommendation system.
  Learn tools and techniques to do research in the area of information retrieval or text mining.
  Lead to job opportunities at search technology and E-commerce companies such as Google, Microsoft, Yahoo! and Amazon.

Lecture Review:
  Core concepts of information retrieval
    Query representation; document representation; retrieval model; evaluation
  Applications of information retrieval
    Web Search; Text Categorization; Document Clustering; Information Recommendation; Information Extraction; Question Answering; …
  Grade Policy
    Assignments: 30%; Project: 30%; Final Exam: 30%; Class attendance: 10%
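
The core concepts reviewed above (query representation, document representation, retrieval model) can be made concrete with a small sketch. The slides do not name a specific retrieval model, so this example assumes one common choice, TF-IDF term weighting with cosine similarity. The toy documents and all function names (`tokenize`, `build_index`, `cosine`, `search`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection; the strings are illustrative, not course data.
DOCS = [
    "web search engines rank web pages",
    "relational databases answer exact sql queries",
    "recommendation systems suggest movies to users",
]


def tokenize(text):
    """Naive representation: lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Document representation: one TF-IDF weighted term vector per document."""
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log(len(docs) / n) + 1.0 for t, n in doc_freq.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, vectors, idf):
    """Retrieval model: rank documents by similarity to the query vector."""
    # Query representation: same weighting, ignoring terms unseen in the index.
    q_vec = {
        t: tf * idf[t]
        for t, tf in Counter(tokenize(query)).items()
        if t in idf
    }
    scored = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Returned results: only documents with a nonzero score, best first.
    return [(docs[i], score) for score, i in scored if score > 0.0]


VECTORS, IDF = build_index(DOCS)
for doc, score in search("web search", DOCS, VECTORS, IDF):
    print(f"{score:.3f}  {doc}")
```

Unlike an exact-match SQL query, `search` returns a ranked list, so a partially matching or poorly worded query can still surface useful documents. Evaluation and feedback, the remaining core concept, are not covered by this sketch.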