A freshman-level, rigorous, non-programming, computer-science intro to NLP, IR, & AI

Lillian Lee
Department of Computer Science
Cornell University
http://www.cs.cornell.edu/home/llee

Lillian Lee - ACL TeachNLP Workshop, 2002

[This presentation was given at the 2002 ACL Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics, and hence was directed towards an NLP, rather than a computer science, audience.]

Computation, Information, and Intelligence

A new computer-science course on CL/NLP, IR, and AI (henceforth “NLP”).

Three main design decisions:
(1) For entering college freshmen. Usually junior+ courses (at least at Cornell).
(2) A rigorous, technical focus, including recent research results. Not “Philosophy of AI” or “The Information Society”.
(3) No programming. Neither required nor taught.

“If you want to truly understand something, try to change it.” – Kurt Lewin

The course format was fairly traditional.

Course material was introduced almost entirely in lecture (no obvious textbook, research papers not suitable; lecture notes available this fall).

Homework involved challenging pencil-and-paper problem sets. Problems typically investigated implications of lecture material rather than simply testing recall, e.g., students explored the consequences of changing definitions, assumptions, or settings.

Exams were similar to the homework, but emphasized the basic concepts.

Syllabus Outline

“Knowledge without appropriate procedures for its use is [mute], and procedure without suitable knowledge is blind.” – Herb Simon, 1977

Computation: [15 lecs] Search; game-playing; perceptron and nearest-neighbor learning; the halting problem.
Used later: graphs, inner products, Turing machines

Information:
• Document retrieval [3 lecs]
• The World Wide Web [4 lecs]
• Language structure [7 lecs]
• Statistical NLP [6 lecs]

Intelligence:
• The Turing Test [2 lecs]

Document Retrieval [3 lecs]

IR was treated as a subfield of NLP using reduced models of language.
• Tighter integration of the syllabus
• Search engines a highly visible “NLP app”.

Topics: Boolean query retrieval, indexing structures (arrays, B-trees, binary search), the vector space model, term weighting.

Notes: inner products and related geometric notions were introduced in the previous perceptron unit.

Statistical NLP [6 lecs]

Explorations of sub-sentential, distributional language structure.

Word counts, Zipf’s law, and Miller’s [1957] monkeys. Same type of argument as the rich-get-richer hyperlink power law derivation.
IBM-style statistical MT. Alignments : translations :: hubs : authorities.
Japanese segmentation [Ando/Lee 2000]: more multilingual considerations
The Federalist Papers [Mosteller/Wallace 1984]: historical applications
Infant statistical segmentation learning [Saffran et al 1996]: cf. Ando/Lee

Notes: The statistical paradigm was introduced in the unit on learning. Kevin Knight’s [1999] tutorial was very helpful.
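
The random-typing argument behind Miller’s monkeys, listed in the Statistical NLP unit, can be tried out with a short simulation. The course itself involved no programming, so this is only a sketch for readers who want to experiment. The four-letter alphabet, the space probability of 0.2, the fixed seed, and the names `monkey_text` and `rank_frequency` are assumptions made here for illustration, not material from the talk.

```python
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcd", p_space=0.2, seed=0):
    """Simulate random typing: each keystroke is a space with probability
    p_space, otherwise a letter drawn uniformly from the alphabet."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    keys = []
    for _ in range(n_chars):
        if rng.random() < p_space:
            keys.append(" ")
        else:
            keys.append(rng.choice(alphabet))
    return "".join(keys)


def rank_frequency(text):
    """Return the word counts ordered from most to least frequent."""
    counts = Counter(text.split())
    return [count for _, count in counts.most_common()]


freqs = rank_frequency(monkey_text(200_000))

# Print the count of the word at a few ranks to see how quickly it decays.
for rank in (1, 10, 100):
    if rank <= len(freqs):
        print(rank, freqs[rank - 1])
```

Even though the "monkey" knows nothing about language, short words come out far more frequent than long ones, so the count drops steeply as rank increases. Plotting rank against count on log-log axes is one way to compare the shape with a Zipf-style power law.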