
Computational Linguistics Lecture Notes 3 - C. Chesi

Computational Linguistics lecture notes (Prof. Cristiano Chesi, 2018), University of Siena, Master's degree in "Language and Mind - Linguistics and Cognitive Studies". Topics: the Generative Lexicon and the structure of the lexicon; two-level morphology with Finite-State Automata (FSA) and Finite-State Transducers (FST); error classification and orthographic correction (T9 and Swipe); the Turing machine; syntactic parsing: top-down and bottom-up parsing algorithms, the left-corner algorithm and the Earley algorithm; plus lab instructions on parsing.


16.11.18 LEXICON, MORPHOLOGY… AND NON-STANDARD ORTHOGRAPHY

Essential references
●Jurafsky, D. & Martin, J. H. (2009) Speech and Language Processing. Prentice-Hall (2nd edition). http://www.cs.colorado.edu/~martin/slp.html - Chapter 3
Extended references
●Koskenniemi, K. (1983) Two-level morphology: A general computational model for word-form recognition and production. Helsinki
●Miller et al. (1993) Introduction to WordNet: An On-line Lexical Database. ms.
●Pustejovsky, J. (1995) The Generative Lexicon. MIT Press
●Levin, B. (1993) English Verb Classes and Alternations. The University of Chicago Press

Index: Lexicon and Morphology
●Organizing lexical entries
●Two-level morphology
●Morphological analysis with Finite-State Automata (FSA) and Finite-State Transducers (FST)
●Some simple applications and psycholinguistic reality: stemming
Input normalization
●Intro to orthographic correction
●Error classification
●Spell-checking methods
●T9 and Swipe

Generative lexicon
Should we include in the lexicon every inflected word as an independent, «atomic» entry?
●It would be inefficient: in Turkish (an agglutinative language) there would be 600×10^6 entries; in Finnish, 10^7.
●It would be non-informative:
- no relations among lexical entries (alphabetical order is not interesting)
- no processing hints (nouns = verbs?)
The Generative Lexicon (GL) was initially developed as a theoretical framework for encoding selectional knowledge in natural language. This in turn required making some changes in the formal rules of representation and composition. Perhaps the most controversial aspect of GL has been the manner in which lexically encoded knowledge is exploited in the construction of interpretations for linguistic utterances. The computational resources available to a lexical item within this theory consist of the following four levels:
1. Lexical Typing Structure: giving an explicit type for a word positioned within a type system for the language;
2. Argument Structure: specifying the number and nature of the arguments to a predicate;
3. Event Structure: defining the event type of the expression and any subeventual structure it may have;
4. Qualia Structure: a structural differentiation of the predicative force for a lexical item.

A computational lexicon can be conceived as:
●a mental lexicon - are the relations among lexical units psycholinguistically plausible?
●a computational lexicon - is the lexical representation efficient?
Rule of thumb:
●the lexical representation must be explicit and independent (with respect to the application that will use it)
●the global structure of lexical entries is as important as their internal structure
●a lexicon must have sufficient domain coverage (consider nearly 400,000 lexical entries)
Computational lexicon evaluation parameters:
• Coverage (sufficient domain extension and depth, also in terms of featural information)
• Extensibility (how easy is it to enrich the lexicon?)
• Utility (benefit for a single application)
! Remember:
●completeness does not ensure correctness (neither psycholinguistic nor computational)
●psycholinguistic plausibility does not guarantee computational effectiveness (and the other way around)

Single entry structure:
●orthographic/phonetic information
●morphological features (inherent features like number, gender...)
●syntactic features (POS and more fine-grained features: mass/countable, animacy, selection…)
●semantic features (semantic relations, information useful for Machine Translation)
CASA ("house"): <C,A,S,A> {N, sing, fem ...} {N com ...} [house]
E.g. XML coding for the "casa" («house») entry:
<word cat="noun" subcat="common.countable" num="sg" gen="f" sem="c12"> casa </word>
Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
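As a quick illustration of how such an entry can be consumed programmatically, here is a minimal Python sketch that reads the feature bundle of the «casa» entry above with the standard xml.etree library; the element and attribute names simply mirror the example, and any real lexicon schema may differ.

```python
import xml.etree.ElementTree as ET

# One-entry fragment in the format of the example above (a real schema may differ).
entry_xml = '<word cat="noun" subcat="common.countable" num="sg" gen="f" sem="c12">casa</word>'

entry = ET.fromstring(entry_xml)
features = dict(entry.attrib)   # {'cat': 'noun', 'subcat': 'common.countable', 'num': 'sg', ...}
form = entry.text.strip()       # 'casa'
print(form, features)
```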
Morphology - the theoretical model
We have a surface form (the token) and we want to retrieve the lexical form (the lemma). There are thus two levels of morphology: the surface form and the deep (lexical) form.
Goal: recognize a well-formed string and decompose it into morphemes.
Theoretical model: recognize the lexicon. Tools: FSA and two-level morphology.

Morphological analysis with FSA
An FSA can be used for recognizing or generating a lexical item, but also for representing the lexicon (figure: FSA recognizing casa and its plural).

FSA and two-level morphology
FSA limits: no memory. It is not possible to associate a structural description with an element recognized as belonging to the lexicon, so simple FSAs are not sufficient: since there is no external memory, there is no way to keep track of the derivation.
Koskenniemi (1983), two-level morphology: a lexical level and a surface level that must be put in a specific relation with each other. We use Finite-State Transducers (FST) to do so.

Morphological analysis with FSTs
A Finite-State Transducer (FST, or simply transducer) is a tuple <Q, Σ, q0, F, δ> where:
●Σ is a finite, non-null alphabet of special (complex) symbols of the form i:o, where i are symbols of the input alphabet I and o are symbols of the output alphabet O; Σ ⊆ I×O. ε (the null element) can be included both in I and in O.
●δ is defined over (q, q′, i:o) and represents a transition matrix relating a state q (start) to a state q′ (arrival) whenever the relation i:o is defined; δ is thus a relation from Q × Σ to Q.
→ FSAs define a formal language (a set of strings);
→ FSTs define relations among languages.
To see the transitions we combine two FSAs, and the result is a transducer (FST).
A problem of FSTs is non-determinism: a word like Martino may be processed as a diminutive form rather than as a (proper) noun.
FSTs can be used as recognizers, generators, translators, and correlators among sets. Some formal properties of FSTs:
• Inversion: given a surface form we can obtain the lexical form, or vice versa. Flipping input with output labels (as in PC-KIMMO) gives this kind of generation, defined as T^-1: input and output labels can be inverted.
• Composition: if T1 maps I1 to O1 and T2 transduces from I2 to O2, then T1∘T2 maps I1 into O2, so a transducer can recognize something and connect it to other transductions. A problem with Italian: there is a default case but also a lot of exceptions.
PRIMING: show a word and measure how fast we process it (pre-activation of lexical items). Related items are stored in our brain and processed together; errors are processed because we recognize the wrong inflection.

Example of an inflectional morphology approach: define an FST describing plural inflection in Italian.
●Problem representation, examples: casa > case; donna > donne; gatto > gatti; ago > aghi; sacco > sacchi...
●Generalizations/intuitions: feminine nouns take «e» as plural inflection, masculine nouns take «i»; c and g become ch and gh respectively.
●Formalization
regular case:
masculine noun > @:@ (c|g|@):(ch|gh|@) o:i
feminine noun > @:@ (c|g|@):(ch|gh|@) a:e
irregular case: uomo > @:@ o:i #:n #:i
●Implementation (analysis direction)
feminine noun > @:@ (c|g|@):(ch|gh|@) e:a #:ε #:+N #:+PL
e.g. case -> casa +N +PL (c:c a:a s:s e:a #:ε #:+N #:+PL)
Approximation of an FST describing plural morphology in Italian (figure).
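A plain-Python approximation of the generation direction of the rules above - not a two-level implementation in the PC-KIMMO sense, just the same generalizations written as a function. The gender argument and the tiny exception list are assumptions made for illustration.

```python
def pluralize(noun: str, gender: str) -> str:
    """Rough approximation of the Italian plural mapping sketched above.
    gender is 'm' or 'f'; handles the velar adjustment (c/g -> ch/gh) and
    the irregular uomo -> uomini; real coverage needs a full lexicon."""
    irregular = {"uomo": "uomini"}           # lexicalized exception
    if noun in irregular:
        return irregular[noun]
    stem, final = noun[:-1], noun[-1]
    if stem.endswith(("c", "g")):            # sacco -> sacchi, ago -> aghi
        stem += "h"
    if gender == "f" and final == "a":       # casa -> case, donna -> donne
        return stem + "e"
    if gender == "m" and final == "o":       # gatto -> gatti
        return stem + "i"
    return noun                              # anything else: leave unchanged

assert pluralize("casa", "f") == "case"
assert pluralize("sacco", "m") == "sacchi"
assert pluralize("uomo", "m") == "uomini"
```

A transducer buys what this one-way function does not: by inversion (T^-1) the very same machine also performs the analysis direction, case -> casa +N +PL.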
Inadequacy of FSTs (and FSAs) for expressing every morphological phenomenon
There are languages that present morphological derivations more complex than the ones described. Such phenomena fall into the class of what we call non-concatenative morphology:
- Tagalog (a language of the Philippines) uses infixes in the middle of a word: um (marking the agent) + hingi («lend») = h-um-ingi («lend to someone»)
- Semitic languages use template morphology: consonantal roots (CCC) such as lmd («learn») + inflection by vocalic schemes (CVCVC) = lamad («learned»), lumad («was learned»)

On the inadequacy of FSTs (and FSAs) - problems:
●Non-determinism (multiple transitions from the same state q might be pursued; ε transitions)
●Inadequacy (e.g. non-concatenative morphology)
●Order of application of the FSAs

Morphological analysis: some applications
- Information extraction (web, unstructured corpora/digital archives)
- Keyword expansion: hotels in Florence = (hotel AND Florence) OR (hotels AND Florence)
- Stemming: by retrieving the word root (stem) we can refine queries and make them more tolerant
- Porter Stemming Algorithm: a simple set of cascaded FST-like rules such as:
ATIONAL -> ATE (e.g. relational -> relate)
ING -> ε (talking -> talk)
Pros and cons:
- overgeneralization (Krovetz 93), e.g. organization > organ, generalization > generic
- exceptions not captured: matrices > matrix or European > Europe
- stemming is useful only with expansive search (not in standard information retrieval)

Morphological analysis: psycholinguistic plausibility
How is the mental lexicon structured?
●Full listing hypothesis: runs and run are two distinct entries in the mental lexicon (no internal morphological structure)
●Minimum redundancy: only morphemes are encoded in the mental lexicon; accessing an inflected lexical item requires accessing distinct morphemes plus combination rules
Evidence for a structured lexicon:
●Priming effects (Stanners et al. 79): irregular forms such as happiness, happily show no priming of the root happy, vs. regular inflections: pouring > pour

Error correction in specific contexts: T9
Linguistic resources needed for T9:
●Dictionary
●Frequencies (e.g. typing 6-6, "ON" will be preferred to "NO"; this is determined on the basis of statistical observations, in this case the British National Corpus. The alternative choice is required about 5% of the time in English T9!)
Non-linguistic resources needed to evaluate the model's efficiency:
●Fitts' Law (modeling rapid, goal-directed movements)
Results (in words per minute, wpm):
●Multi-press: 25-27 wpm
●Two-key: 22-25 wpm
●T9: 41-46 wpm
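A toy sketch of the frequency-based choice T9 makes. The keypad map below is the standard one, but the word list and counts are invented stand-ins for a real frequency lexicon (the notes mention British National Corpus counts).

```python
KEYPAD = {"a": "2", "b": "2", "c": "2", "d": "3", "e": "3", "f": "3",
          "g": "4", "h": "4", "i": "4", "j": "5", "k": "5", "l": "5",
          "m": "6", "n": "6", "o": "6", "p": "7", "q": "7", "r": "7", "s": "7",
          "t": "8", "u": "8", "v": "8", "w": "9", "x": "9", "y": "9", "z": "9"}

FREQ = {"on": 9000, "no": 7000, "good": 1200, "home": 800, "gone": 600, "hood": 50}

def digits(word: str) -> str:
    """Digit sequence a word is typed with on a T9 keypad."""
    return "".join(KEYPAD[c] for c in word)

def t9_candidates(keys: str):
    """All dictionary words matching a digit sequence, most frequent first."""
    matches = [w for w in FREQ if digits(w) == keys]
    return sorted(matches, key=FREQ.get, reverse=True)

print(t9_candidates("66"))    # ['on', 'no'] - 'on' wins on frequency, as in the example above
print(t9_candidates("4663"))  # ['good', 'home', 'gone', 'hood']
```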
Today's key concepts
What a computational lexicon is
●Single entry structure (morpho-syntactic features)
●Global structure (WordNet)
How we deal with morphological analysis
●Two-level morphology and FSTs
●Some applications (stemming)
●The psycholinguistic plausibility of the model
Input normalization and spell-checking
●Error classification
●Standard approaches to spell correction (minimal distance, similarity keys, n-grams)
●The case of T9

29.11.18 SYNTACTIC PARSING

Computability and complexity
●Space/time complexity
●Grammatical complexity
●Psycholinguistic complexity
Parsing algorithms
●Exploring the problem space created by the grammar
●Main algorithms:
• top-down vs. bottom-up
• left-corner
• dynamic programming and the Earley algorithm

References
Essential references (http://www.ciscl.unisi.it/master/materials.htm)
●Jurafsky, D. & Martin, J. H. (2009) Speech and Language Processing. Prentice-Hall (2nd edition). http://www.cs.colorado.edu/~martin/slp.html - Chapters 13, 15
Extended references
●Barton, G. E., Berwick, R. & Ristad, E. S. (1987) Computational Complexity and Natural Language. MIT Press
●Hale, J. T. (2011) What a rational parser would do. Cognitive Science, 35(3), 399-443
●Van de Koot, H. The Computational Complexity of Natural Language Recognition. Ms., University College London

Why have a computational model?
- To predict possible dysfunctions
- To calculate the complexity of certain processes
A → B → C: we have a process in A and pass through B to reach C; if it is impossible to pass through B, we cannot reach C.

Assume we have a problem to solve, say putting some numbers in order: a sorting problem. We pick each element and put it in place, so the complexity is proportional to the number of elements, and hence to the number of comparisons we have to make. The baseline strategy compares each new number with all the numbers already stored; a better strategy splits the input so that fewer comparisons are needed, and this is what a good algorithm does. Now take the same numbers and a different task: connect each number to the others with the shortest possible path. Solving this requires a huge amount of computation, so it can be considered a complex problem, one for which no clever solving strategy is known.

What's computable
Informally speaking, a computation is a relation between an input and an output. This relation can be defined by various algorithms: a series of computational states and transitions among them until the final state is reached. A computation attempts to reach the final state through legal steps admitted by the computational model (the problem space is the set of all possible states the computation can reach). Take the game of tris (tic-tac-toe) as an example: there are many ways to complete the game, but it always ends within 9 moves. The more steps we must take to finish the game/problem, the longer the time of that problem, and therefore the higher its complexity.

Turing-Church thesis (simplified)
Every computation realized by a physical device can be realized by means of an algorithm; if the physical device completes the computation in n steps, the algorithm will take m steps, with m differing from n by, at worst, a polynomial.
Some algorithms may take too much time to find a solution (e.g. years or even centuries), and other algorithms may not terminate at all!

Turing machine
A problem like 3SAT has an exponential-time growth complexity function but, once a solution is found, it can be readily checked: hard to solve, easy to verify!

Quantified Boolean Formula (QBF) problem
Find a value assignment for all propositional letters satisfying the formula Qx1 Qx2 … Qxn φ(x1, x2, …, xn), with Q = ∀ or ∃. This problem is hard to solve, like 3SAT, but it is also hard to verify: 3SAT is the special case of QBF where all quantifiers are existential, while universal quantification requires every assignment of values to be verified.

Complexity of classic problems and reducibility
If a computer could effectively solve a problem like 3SAT, it would have to use an algorithm that is, at worst, polynomial; because of the problem structure/space, such an algorithm would necessarily be non-deterministic. We call the complexity class of this kind of problem NP: Non-deterministic Polynomial time. Problems of complexity P are deterministic and polynomial. Problems of complexity order P are included in (and probably strictly contained in) the problems of complexity order NP: no proof of reducibility from NP to P exists… yet.
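The "hard to solve, easy to verify" asymmetry can be made concrete with a small sketch: checking a candidate assignment against a 3SAT formula is linear in the size of the formula, while the naive search below enumerates up to 2^n assignments. The instance itself is made up purely for illustration.

```python
from itertools import product

# A 3SAT instance as a list of clauses; each literal is (variable, polarity).
clauses = [[("x1", True), ("x2", False), ("x3", True)],
           [("x1", False), ("x2", True), ("x3", True)],
           [("x1", True), ("x2", True), ("x3", False)]]

def satisfied(assignment, clauses):
    """Verification is cheap: one pass over the formula."""
    return all(any(assignment[var] == polarity for var, polarity in clause)
               for clause in clauses)

def brute_force(clauses):
    """Search is the expensive part: 2^n candidate assignments in the worst case."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if satisfied(assignment, clauses):
            return assignment
    return None

print(brute_force(clauses))   # e.g. {'x1': True, 'x2': True, 'x3': True}
```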
Computational complexity theory focuses on classifying computational problems according to their inherent difficulty and on relating these classes to each other. A computational problem is a task solved by a computer; it is solvable by the mechanical application of mathematical steps, such as an algorithm. A problem is regarded as inherently difficult if its solution requires significant resources, whatever the algorithm used. The theory formalizes this intuition by introducing mathematical models of computation to study these problems and by quantifying their computational complexity, i.e. the amount of resources needed to solve them, such as time and storage. Other measures of complexity are also used, such as the amount of communication (in communication complexity), the number of gates in a circuit (in circuit complexity) and the number of processors (in parallel computing). One of the roles of computational complexity theory is to determine the practical limits on what computers can and cannot do. The P versus NP problem, one of the seven Millennium Prize Problems, belongs to this field.

Parsing
Parsing means applying a function P(G, i), i.e. Parsing(Grammar, input). Given a grammar G and an input i, parsing i means applying a function p(G, i) able to:
1. accept/reject i;
2. assign to i an adequate descriptive structure (e.g. a syntactic tree).

Universal Recognition Problem (URP) and reduction
Universal Recognition Problem (URP): given a grammar G (in any grammatical framework) and a string x, does x belong to the language generable by G?
Reduction: to discover the complexity class of a problem we look for an efficient mapping between it and another well-known problem whose complexity we can easily evaluate, a mapping that transforms every instance of the known problem into an instance of the new one while preserving the required results. Is there such a problem here? Yes: the SAT problem. The URP is a generalized parsing problem to which (3)SAT can be reduced in its core critical structure.
In a nutshell: a string x, like a propositional letter a in a SAT formula, can receive an ambiguous value assignment (for instance, "vecchia" in Italian can be either a noun or an adjective, just as a can be true or false). We then need to keep the assignment coherent across x: an agreement check holds both for the string x (all occurrences must agree in the linguistic sense) and for the propositional letters of a SAT formula (consistency of the value assignment), and this is what lets us evaluate the correctness of the final outcome. We conclude that the URP is at least as complex as (3)SAT, that is, NP-hard!
Barton, Berwick and Ristad (1987) focus on grammatical recognition problems of the following form: given a grammar G (in some grammatical framework) and a string α, is α in the language generated by G? In what follows we consider several recognition problems of this form. This kind of problem allows one to study the complexity of an entire class of grammars, namely those specified by some grammatical theory (like Government-Binding theory or Generalized Phrase-Structure Grammar). For this reason it is also known as the Universal Recognition Problem (URP) for a linguistic model.

Chomsky's hierarchy and complexity (figure)

Psycholinguistic complexity
Complexity = difficulty in processing a sentence; we can assume that complexity and difficulty amount to the same thing. There are several hypotheses.
Hypothesis 1: formal complexity = psycholinguistic complexity (regular grammars (RG) vs. context-free grammars (CFG)). In CFGs we find patterns such as a^n b^n…; processing non-context-free structures causes major difficulties (Pullum & Gazdar 1982).
Hypothesis 2: limited processing memory.
●Limited-size stack (Yngve 1960): linguistic processing uses a stack to store partial analyses; the more partial phrases are stored on the stack, the harder the processing.
●Syntactic Prediction Locality Theory (SPLT, Gibson 1998): total memory load is proportional to the sum of the required integrations plus referentiality needs:
1. DPs require VPs (in SVO languages): DP DP DP VP VP VP… is harder than DP VP;
2. a pronoun referring to an already introduced referential entity is less complex than a new referent (pro < full DPs).

The complexity of the URP for a linguistic theory is a direct measure of how difficult it is to parse the languages generated by the class of grammars specified by the theory. Parsing is the process of using a grammar to assign a syntactic structure to a string of words. It is widely assumed that parsing is an integral component of the computations carried out by language users and that, apart from the constraints imposed by performance limitations, human sentence parsing is extremely efficient.
To reduce 3SAT to the URP for some grammatical framework, we must first exhibit a mapping from 3SAT problem instances to sentences for recognition. In the proofs that follow, this is achieved simply by erasing the "Y" and "Z" symbols and the brackets in the 3SAT problem instance to produce a sentence for recognition. We then demonstrate how to construct a grammar in the particular linguistic theory that generates such a mapped sentence iff the corresponding 3SAT instance is satisfiable. For the reduction to be any good, we must show that it can be carried out in polynomial time. Building the grammar boils down to demonstrating how the linguistic framework allows one to realize the three components listed above.

Bottom-Up Parsing Algorithm
Historically the first parsing algorithm (Yngve 1955) and possibly the most common (e.g. in programming-language parsers). It starts from the lexical elements, i.e. the terminal symbols, and builds structure phrase by phrase, up to S.
Which strategy is better? Both give results, but each is incomplete with respect to direction. We assume that parsing proceeds from left to right, although the algorithms could just as easily run from right to left. The top-down strategy does not lose time generating ungrammatical trees, but it generates sentences without considering the input until the very end. The bottom-up strategy is locally consistent with the input, but it generates ungrammatical phrases that cannot be rejoined under the root node S. Both blind strategies are complete, hence roughly equivalent, but:
a. consider starting from the side with the most precise (unambiguous) information;
b. explore the tree guided by the smallest possible branching factor.
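A minimal top-down (recursive-descent) recognizer over a toy grammar makes the blind-prediction behaviour visible: categories are expanded left to right and the input is consulted only when a pre-terminal is reached. The grammar and lexicon below are invented for illustration, and left-recursive rules are deliberately avoided, since this naive expander would loop on them (see the unresolved problems further on).

```python
# Toy grammar and lexicon (assumed for illustration); no left-recursive rules.
GRAMMAR = {"S":  [["DP", "VP"]],
           "DP": [["D", "NP"]],
           "NP": [["N"]],
           "VP": [["V", "DP"], ["V"]]}
LEXICON = {"D": {"the", "a"}, "N": {"dog", "cat"}, "V": {"sees", "sleeps"}}

def recognize(symbols, words):
    """Top-down, depth-first recognition: expand predictions left to right,
    checking against the input only when a pre-terminal is reached."""
    if not symbols:
        return not words                           # success iff the input is used up
    head, rest = symbols[0], symbols[1:]
    if head in LEXICON:                            # pre-terminal: scan one word
        return bool(words) and words[0] in LEXICON[head] and recognize(rest, words[1:])
    return any(recognize(list(expansion) + rest, words)   # non-terminal: predict
               for expansion in GRAMMAR.get(head, []))

print(recognize(["S"], "the dog sees a cat".split()))   # True
print(recognize(["S"], "dog the sees".split()))          # False
```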
More generally, a bottom-up approach is the piecing together of systems to give rise to more complex systems, making the original systems sub-systems of the emergent system. Bottom-up processing is a type of information processing based on incoming data from the environment that are used to form a perception. From a cognitive-psychology perspective, information enters the eyes in one direction (sensory input, the "bottom") and is then turned by the brain into an image that can be interpreted and recognized as a perception (output that is "built up" from processing to final cognition). In a bottom-up approach the individual base elements of the system are first specified in great detail; these elements are then linked together to form larger subsystems, which in turn are linked, sometimes across many levels, until a complete top-level system is formed. This strategy often resembles a "seed" model, in which the beginnings are small but eventually grow in complexity and completeness. However, such "organic" strategies may result in a tangle of elements and subsystems, developed in isolation and subject to local optimization rather than to a global purpose.

LEFT CORNER algorithm
Basic idea: a top-down strategy filtered by bottom-up considerations.
Left-corner rule: every non-terminal category will be rewritten at some point by a word in the input. B is the «left corner» of category A iff A →* B α.
Off-line table of left corners, given a standard grammar (figure). From the table we can conclude that a good left corner for S is D; this lets the parser discard an analysis such as pro → la when it is trying to start an S. We filter our strategy by stating what can or cannot appear in the left corner, thus eliminating many options. Another example: the left corner of VP may be either a V or a Pro. The left-corner filter keeps us from spending time exploring branches that are completely useless. Grammar fragment used in the example: S → DP VP, DP → D NP.
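A sketch of how such an off-line left-corner table could be computed, by transitive closure over the immediate left corners of each rule; the toy grammar here is an assumption, loosely modelled on the fragment above.

```python
GRAMMAR = {"S":  [["DP", "VP"]],
           "DP": [["D", "NP"]],
           "NP": [["N"], ["N", "PP"]],
           "VP": [["Pro", "V"], ["V", "DP"], ["V"]],
           "PP": [["P", "DP"]]}

def left_corner_table(grammar):
    """For every non-terminal A, collect every X such that A ->* X ...,
    i.e. every category that can be A's left corner."""
    table = {a: {rhs[0] for rhs in expansions} for a, expansions in grammar.items()}
    changed = True
    while changed:                            # closure: repeat until a fixed point
        changed = False
        for a in table:
            new = set(table[a])
            for x in table[a]:
                new |= table.get(x, set())    # add the left corners of my left corners
            if new != table[a]:
                table[a] = new
                changed = True
    return table

for cat, corners in left_corner_table(GRAMMAR).items():
    print(cat, "->", sorted(corners))
# S -> ['D', 'DP']: D but not Pro can open an S, so a sentence-initial "la"
# analysed as a clitic pronoun is filtered out before the branch is explored;
# VP -> ['Pro', 'V'] matches the second example above.
```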
30.11.18

Unresolved problems
Left recursion: A →* A α (e.g. DP → DP PP): how do we stop?
Ambiguity:
●PP attachment (I saw a man with the binoculars)
●coordination («papaveri e paperi rossi», red poppies and ducks)
There is an exponential growth of alternatives (Church & Patil 1982) with respect to the number of PPs: with 3 PPs up to 5 possible analyses, with 6 PPs up to 469, with 8 PPs up to 4867!
Inefficiency in sub-tree analysis: backtracking is not needed in certain analyses. For "A flight from Rome to Milan at 7:00 PM with a Boeing 747":
1. NP → D N
2. NP → D N PP
3. NP → D N PP PP
➔ 4. NP → D N PP PP PP
Even if analyses 1-3 were good, they were incomplete from the start; the fourth is the one that fits best, but we carried out the same partial analysis four times and thus wasted time re-applying it.

Dynamic Programming
Dynamic programming reuses useful analyses by storing them in tables (or charts). Once the sub-problems are solved (sub-trees, in parsing), a global solution is attempted by merging the partial solutions together.

Left-corner parsing shares part of its approach with shift-reduce parsing and part with top-down parsing: both existing elements (a word's category or a constituent we have found) and predictions (based on existing elements and the rules of the grammar) have a place on the stack. In this parser we use a "slash notation" on the stack to represent both types of element:
• If an NP has previously been found and the grammar contains S → NP VP, we can predict that the NP may actually be the first part of an S whose VP has not been encountered yet. Therefore, on the stack, we can rewrite NP as S/VP.
• By the same logic, completed elements are elements in which nothing is missing:
- once the sentence's VP is found, the stack will contain [S/VP VP], and the two elements can then be merged to obtain S/[], which in the case of binary grammars is sometimes written simply as S;
- word categories in the lexicon can be considered minimal constituents that are not missing anything: if the next word in the sentence is cat, it can be added to the stack as N/[], or more simply as N.
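Tying together the chart idea from the Dynamic Programming note above, here is a minimal Earley-style recognizer sketch. The toy grammar and lexicon are assumptions (chosen to echo the "flight from Rome" example), and a full parser would also store back-pointers in each state so that the tree can be recovered; only recognition is shown here.

```python
GRAMMAR = {"S":  [["DP", "VP"]],
           "DP": [["D", "NP"], ["NP"]],
           "NP": [["N"], ["N", "PP"]],
           "VP": [["V", "DP"], ["V", "DP", "PP"]],
           "PP": [["P", "DP"]]}
LEXICON = {"the": "D", "a": "D", "pilot": "N", "flight": "N", "Rome": "N",
           "saw": "V", "from": "P", "to": "P"}

def earley(words):
    """Return True iff words can be recognized as an S.
    A state is (lhs, rhs, dot, start); chart[i] holds the states whose dot sits at
    input position i, so partial analyses are stored once and reused rather than recomputed."""
    chart = [set() for _ in range(len(words) + 1)]
    chart[0].add(("GAMMA", ("S",), 0, 0))                  # dummy start state
    for i in range(len(words) + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs):
                nxt = rhs[dot]
                if nxt in GRAMMAR:                         # PREDICTOR: expand the expected category
                    for expansion in GRAMMAR[nxt]:
                        state = (nxt, tuple(expansion), 0, i)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
                elif i < len(words) and LEXICON.get(words[i]) == nxt:
                    chart[i + 1].add((lhs, rhs, dot + 1, start))   # SCANNER: consume one word
            else:                                          # COMPLETER: a constituent is finished,
                for lhs2, rhs2, dot2, start2 in list(chart[start]):  # advance whoever predicted it
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        state = (lhs2, rhs2, dot2 + 1, start2)
                        if state not in chart[i]:
                            chart[i].add(state)
                            agenda.append(state)
    return ("GAMMA", ("S",), 1, 0) in chart[len(words)]

print(earley("the pilot saw a flight from Rome".split()))   # True
print(earley("saw the pilot".split()))                      # False
```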