Information Extraction from Text: Natural Disasters and John Cleese's Movie Characters, Exams of Computer Science

Introduces the concepts of information extraction (IE) through two distinct topics: natural disasters and the characters portrayed by John Cleese in movies. The first part covers extracting relevant information from text about natural disasters, using earthquakes as an example. The second part focuses on extracting character information from movie scripts, using John Cleese as the subject. The document also discusses the importance of IE systems and their applications in various fields.

Typology: Exams

Pre 2010

Uploaded on 08/30/2009

koofers-user-7l8-1

CS674 Natural Language Processing

• Introduction to Information Extraction
  – Task definition
  – Evaluation
  – IE system architecture

Monty Python & The Holy Grail

Information needs
• What was the name of the enchanter played by John Cleese in the movie “Monty Python and the Holy Grail”?
  – Ad-hoc IR / Google search
  – Question answering systems
• Describe each character, including the actor who played him or her, in every movie starring John Cleese.
  – Information extraction

Information extraction need
motion-picture
  title:
  date:
  plot:
  characters:
  …
movie-role
  actor:
  character:
  description:

Information Extraction System
[Figure: a text collection feeds an Information Extraction System, which outputs filled templates of the form:]
  Who: _____
  What: _____
  Where: _____
  When: _____
  How: _____

IE system: natural disasters

Document no.: ABC19980530.1830.0342
Date/time: 05/30/1998 18:35:42.49
Disaster Type: earthquake
• location: Afghanistan
• date: today
• magnitude: 6.9
• magnitude-confidence: high
• epicenter: a remote part of the country
• damage:
  • human-effect:
    • victim: Thousands of people
    • number: Thousands
    • outcome: dead
    • confidence: medium
    • confidence-marker: feared
  • physical-effect:
    • object: entire villages
    • outcome: damaged
    • confidence: medium
    • confidence-marker: Details now hard to come by / reports say

Source text:
PAKISTAN MAY BE PREPARING FOR ANOTHER TEST
Thousands of people are feared dead following... (voice-over) ...a powerful earthquake that hit Afghanistan today. The quake registered 6.9 on the Richter scale, centered in a remote part of the country. (on camera) Details now hard to come by, but reports say entire villages were buried by the quake.
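The natural-disaster template above is a nested record, so it can be sketched as a small data structure. This is a minimal illustration only: the class and field names (`DisasterTemplate`, `HumanEffect`, `PhysicalEffect`, `affected_object`) are assumptions made for this example, and the nesting of the damage slots is inferred from the slide layout. The values are copied from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HumanEffect:
    victim: str
    number: str
    outcome: str
    confidence: str
    confidence_marker: str


@dataclass
class PhysicalEffect:
    affected_object: str  # the slide calls this slot "object"
    outcome: str
    confidence: str
    confidence_marker: str


@dataclass
class DisasterTemplate:
    document_no: str
    date_time: str
    disaster_type: str
    location: Optional[str] = None
    date: Optional[str] = None
    magnitude: Optional[str] = None
    magnitude_confidence: Optional[str] = None
    epicenter: Optional[str] = None
    human_effects: List[HumanEffect] = field(default_factory=list)
    physical_effects: List[PhysicalEffect] = field(default_factory=list)


# Slot values taken verbatim from the earthquake example.
quake = DisasterTemplate(
    document_no="ABC19980530.1830.0342",
    date_time="05/30/1998 18:35:42.49",
    disaster_type="earthquake",
    location="Afghanistan",
    date="today",
    magnitude="6.9",
    magnitude_confidence="high",
    epicenter="a remote part of the country",
    human_effects=[
        HumanEffect("Thousands of people", "Thousands", "dead", "medium", "feared")
    ],
    physical_effects=[
        PhysicalEffect(
            "entire villages",
            "damaged",
            "medium",
            "Details now hard to come by / reports say",
        )
    ],
)
```

Most slots are kept as strings copied from the text ("today", "Thousands") rather than normalized values, which mirrors how the slide fills them.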
IE system: terrorism

SAN SALVADOR, 15 JAN 90 (ACAN-EFE) -- [TEXT] ARMANDO CALDERON SOL, PRESIDENT OF THE NATIONALIST REPUBLICAN ALLIANCE (ARENA), THE RULING SALVADORAN PARTY, TODAY CALLED FOR AN INVESTIGATION INTO ANY POSSIBLE CONNECTION BETWEEN THE MILITARY PERSONNEL IMPLICATED IN THE ASSASSINATION OF JESUIT PRIESTS.

"IT IS SOMETHING SO HORRENDOUS, SO MONSTROUS, THAT WE MUST INVESTIGATE THE POSSIBILITY THAT THE FMLN (FARABUNDO MARTI NATIONAL LIBERATION FRONT) STAGED THIS ASSASSINATION TO DISCREDIT THE GOVERNMENT," CALDERON SOL SAID.

SALVADORAN PRESIDENT ALFREDO CRISTIANI IMPLICATED FOUR OFFICERS, INCLUDING ONE COLONEL, AND FIVE MEMBERS OF THE ARMED FORCES IN THE ASSASSINATION OF SIX JESUIT PRIESTS AND TWO WOMEN ON 16 NOVEMBER AT THE CENTRAL AMERICAN UNIVERSITY.

IE system: output
1. DATE                     - 15 JAN 90
2. LOCATION                 EL SALVADOR: CENTRAL AMERICAN UNIVERSITY
3. TYPE                     MURDER
4. STAGE OF EXECUTION       ACCOMPLISHED
5. INCIDENT CATEGORY        TERRORIST ACT
6. PERP: INDIVIDUAL ID      "FOUR OFFICERS" "ONE COLONEL" "FIVE MEMBERS OF THE ARMED FORCES"
7. PERP: ORGANIZATION ID    "ARMED FORCES", "FMLN"
8. PERP: CONFIDENCE         REPORTED AS FACT
9. HUM TGT: DESCRIPTION     "JESUIT PRIESTS" "WOMEN"
10. HUM TGT: TYPE           CIVILIAN: "JESUIT PRIESTS" CIVILIAN: "WOMEN"
11. HUM TGT: NUMBER         6: "JESUIT PRIESTS" 2: "WOMEN"
12. EFFECT OF INCIDENT      DEATH: "JESUIT PRIESTS" DEATH: "WOMEN"

IE vs. IR vs. full NLU
• IE requires more text-understanding capabilities than the bag-of-words approaches provided by IR techniques
• IE systems often presume that a text categorization system has identified documents relevant to the extraction domain
• IE requires more than document classification
• IE requires a more shallow understanding of the text than a natural language understanding system attempting full/deep semantic analysis.
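The first bullet above can be made concrete with a small sketch. The two sentences below are invented for illustration (loosely modeled on the news text) and `naive_roles` is a hypothetical helper, not a real IE component. The point is that a bag-of-words view cannot tell the two sentences apart, while even a crude role assignment can.

```python
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Counter:
    """IR-style representation: word counts, with word order discarded."""
    return Counter(text.lower().split())


def naive_roles(text: str, verb: str = "implicated") -> Dict[str, str]:
    """Toy role assignment: text before the verb accuses, text after is accused."""
    left, _, right = text.lower().partition(f" {verb} ")
    return {"accuser": left, "accused": right}


a = "the president implicated four officers"
b = "four officers implicated the president"

same_bag = bag_of_words(a) == bag_of_words(b)  # identical word counts
roles_a = naive_roles(a)
roles_b = naive_roles(b)  # accuser and accused are swapped
```

Filling a slot such as PERP: INDIVIDUAL ID depends on exactly the distinction that `same_bag` hides.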
IR, TC < IE < NLP, NLU

Issues…
• tension between domain-independent and domain-dependent language processing
  – treating task in a domain-independent way allows the use of general IR/NLP techniques and tools
  – treating task in a domain-dependent way allows for tailoring of techniques for better performance
• IE is generally handled as domain-specific text understanding
  – key system components need to be re-built for each new domain
  – difficult and time-consuming to build if constructed manually
    • Initially, ~6-12 months/system for IE from unstructured text
  – requires the expertise of computational linguists

Machine learning methods
• acquire linguistic knowledge by applying statistical and symbolic learning methods; derive training examples from the texts themselves
• automate the construction of each IE system component
• improve robustness of final systems while maintaining (or at least approaching) the accuracies of handcrafted systems

Information extraction
• Introduction
  – Task definition
  – Evaluation
  – IE system architecture

Natural disasters example
[Figure: IE patterns (syntactico-semantic); labels: disaster, location, damage]

IE system components
Example text: 4 Apr Dallas - Early last evening, a tornado swept through an area northwest of Dallas, causing extensive damage. Witnesses confirm that the twister...
[Figure: the example text annotated by each component in turn]
• sentences, tokens
• part-of-speech tagging (det, n, v, prep, adv, adj, ...)
• semantic class tagging
• named entity identification (date, time, loc, person)
• (partial) parsing (np, vg, pp), including subjects / objects
• coreference resolution

Stages of processing
Input: 4 Apr Dallas - Early last evening, a tornado swept through an area northwest of Dallas, causing extensive damage. Witnesses confirm that the twister...
Stages: Tokenization and Tagging, then Sentence Analysis

Tokenization and tagging output:
Early/adv last/adj evening/noun/time ,/, a/det tornado/noun/weather swept/verb through/prep ...
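The tokenization and tagging stage can be sketched with a lookup table. This is a toy illustration: `LEXICON` is a hypothetical hand-written dictionary covering only the words in the tagged fragment above, whereas a real system would use a trained part-of-speech tagger and a semantic lexicon.

```python
import re
from typing import List

# Hypothetical toy lexicon: token -> tag in the slide's word/pos[/semantic-class] notation.
LEXICON = {
    "early": "adv",
    "last": "adj",
    "evening": "noun/time",
    ",": ",",
    "a": "det",
    "tornado": "noun/weather",
    "swept": "verb",
    "through": "prep",
}


def tokenize(text: str) -> List[str]:
    """Split into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens: List[str]) -> List[str]:
    """Attach the lexicon tag to each token; unknown words get 'unk'."""
    return [f"{tok}/{LEXICON.get(tok.lower(), 'unk')}" for tok in tokens]


tagged = tag(tokenize("Early last evening, a tornado swept through"))
print(" ".join(tagged))
# Early/adv last/adj evening/noun/time ,/, a/det tornado/noun/weather swept/verb through/prep
```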
Sentence analysis output:
Early last evening      adv phrase:time
a tornado               noun group/subj
swept                   verb group
through an area         pp:loc
northwest of Dallas     adv phrase:loc
causing                 verb group
extensive damage.       noun group/obj

Stages of processing
Input: Early last evening, a tornado swept through an area northwest of Dallas, causing extensive damage. Witnesses confirm that the twister...
Stages: Extraction, Merging, Template Generation

Extraction:
tornado swept                    Event: tornado
tornado swept through an area    Loc: “area”
area northwest of Dallas         Loc: “northwest of Dallas”
causing extensive damage         Damage
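The extraction, merging, and template generation stages can be sketched over the sentence-analysis output. Everything here is an assumption made for illustration: the three patterns, the `WEATHER_NOUNS` class list, and the function names are hypothetical and are not the patterns used on the slides. Unlike the slide, this sketch keeps the whole prepositional phrase as the location value.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (text, label) as produced by sentence analysis

CHUNKS: List[Chunk] = [
    ("Early last evening", "adv phrase:time"),
    ("a tornado", "noun group/subj"),
    ("swept", "verb group"),
    ("through an area", "pp:loc"),
    ("northwest of Dallas", "adv phrase:loc"),
    ("causing", "verb group"),
    ("extensive damage", "noun group/obj"),
]

WEATHER_NOUNS = {"tornado", "twister"}  # hypothetical semantic class


def extract(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Extraction: apply toy patterns and return (slot, value) facts."""
    facts = []
    for text, label in chunks:
        words = text.lower().split()
        if label == "noun group/subj" and words[-1] in WEATHER_NOUNS:
            facts.append(("event", words[-1]))
        elif label.endswith(":loc"):
            facts.append(("location", text))
        elif label == "noun group/obj" and "damage" in words:
            facts.append(("damage", text))
    return facts


def merge(facts: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Merging: group facts by slot so both locations attach to one event."""
    template: Dict[str, List[str]] = {}
    for slot, value in facts:
        template.setdefault(slot, []).append(value)
    return template


def generate_template(merged: Dict[str, List[str]]) -> str:
    """Template generation: render one line per filled slot."""
    return "\n".join(f"{slot}: {'; '.join(vals)}" for slot, vals in merged.items())


template = merge(extract(CHUNKS))
print(generate_template(template))
# event: tornado
# location: through an area; northwest of Dallas
# damage: extensive damage
```

A fuller merging step would also use coreference resolution, for example to link "the twister" in the next sentence to the same tornado event.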