14.12.18 ADVANCED PARSING: FROM RULES, TO P&P… AND MINIMALISM

Essential references
● Stabler E. (1997) Derivational minimalism. In Retoré (ed.), Logical Aspects of Computational Linguistics. Springer

Extended references
● Chesi C. (2015) On directionality of phrase structure building. Journal of Psycholinguistic Research
● Fong S. (1991) Computational Properties of Principle-Based Grammatical Theories. Ph.D. thesis, MIT

Index
• Principles and Parameters parsers
• Minimalist Grammars
• Phase-based Minimalist Grammars
• Parsing with PMGs

From Rules to Principles and Parameters
Rules: language specific
Principles & Parameters (P&P): linguistic universals + parameter settings
P&P aims at explanatory adequacy, beyond descriptive adequacy. Goal: linguistic universals capture the limited syntactic variability across languages.
Principle-based parsers (Barton 1984, Berwick & Fong 1990, Stabler 1992) are inspired by this intuition:
● Grammatical principles are parser axioms
● The parser operates as a deductive system, inferring grammatical structures by applying the axioms to the input

Advanced parsing
Hierarchical properties allow us to predict certain problems, e.g. where the subject should be placed. Using P&P in a parsing process resembles a deductive system: principles are applied step by step to derive a sentence. This is problematic from a logical point of view, because the principles are so general that we have to compile them out to obtain a usable grammar:

Few principles + few parameters = thousands of rules

With construction-specific rules, a whole bunch of rules is needed to relate a basic declarative sentence to each of its variants (e.g. its focalized counterpart); with few principles, the combinations and interactions among them generate all those rule variants instead.
➔ The rule-by-rule approach recalls the Derivational Theory of Complexity (DTC; cf. Slobin).
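The "few principles + few parameters = thousands of rules" point can be sketched in a few lines. This is a toy illustration, not from the course materials: the category list, the `Comp` placeholder and the rule notation are hypothetical simplifications of the X' schema plus a single head-direction parameter.

```python
# Toy sketch: one X'-style schema plus a single binary head-direction
# parameter "compiles out" into one ordered phrase-structure rule per
# category. Categories and notation are assumed simplifications.
CATEGORIES = ["V", "N", "P", "A"]

def compile_rules(head_initial: bool):
    """Compile the head-direction parameter into ordered rules."""
    return [f"{x}P -> {x} Comp" if head_initial else f"{x}P -> Comp {x}"
            for x in CATEGORIES]

print(compile_rules(head_initial=True))   # head-initial order, e.g. English
print(compile_rules(head_initial=False))  # head-final order, e.g. Japanese
```

With more parameters (specifier direction, V2, pro-drop…) the set of compiled rules multiplies, while the schema itself stays constant.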
P&P T model
The Surface Structure (SS), the sentence as actually pronounced, can be seen from two points of view: its phonetic side (Phonetic Form, PF) and its meaning (Logical Form, LF). Going deeper we find the Deep Structure (DS), where the lexicon is inserted.

X' theory
→ θ-criterion: every argument must receive one and only one thematic (θ) role (and every thematic role is assigned to just one argument)
→ Case filter: every lexical NP must receive case (P and finite V are case assigners)

Generators = principles producing more structures than the ones in input:
● Move α
● Free indexation
● ...
Filters = principles selecting fewer structures than the ones received as input:
● X' theory
● θ-criterion
● Case filter

X-bar theory is a theory of syntactic category formation. It embodies two independent claims: first, that phrases may contain intermediate constituents projected from a head X; second, that this system of projected constituency may be common to more than one category (e.g. N, V, A, P). The letter X signifies an arbitrary lexical category (part of speech); when analyzing a specific utterance, specific categories are assigned: X becomes N for noun, V for verb, A for adjective, or P for preposition.

PMGs
SBO: Merge Right (Phillips 1996)
Proposal: Phase-based MGs

PMG: sample derivation of a wh-question
Structure building operations:
(default) Expand(Lex: CPwh = (+wh +T +S V))
Insert(Lex: (+wh +D N what))
Insert(Lex: (+T did))
Insert(Lex: (+S +D N you))
Insert(Lex: (V =DP =DP see))
Expand((V =DP))
Move((+D N you))
Expand((V =DP))
Move((+D N what))

Two more structure building operations: Internal Merge; successive cyclic A'-movement

Parsing algorithm with PMGs (PMG-pa)
Parsing states: CFG-Earley vs. PMG-pa
CFG-Earley: input position (e.g. the man • runs)
PMG-pa: input position (the man * runs)

CFG-Earley: grammar rule (e.g. S -> NP VP)
PMG-pa: phase inspected (e.g. verbal)

CFG-Earley: dot position (e.g. S -> NP • VP)
PMG-pa: constituent completion status
  (is the phase headed? yes;
  are all thematic requirements satisfied? yes;
  are non-thematic dependencies licensed? yes;
  are non-thematic dependencies unique? yes;
  further non-thematic dependencies available? yes;
  status: potentially complete)

CFG-Earley: leftmost edge of the substring this rule generates (e.g. the man runs)
PMG-pa: constituent prefix (e.g. the man runs)

PMG-pa only: memory buffer status (e.g. one N(ominal), determined, potentially complete phase)

CFG-Earley: Predict: top-down expansion of the non-terminal rules in the grammar. Result: new rules (NP, VP, PP, AP...) added.
PMG-pa: Phase Projection: insertion of new constituents based on the selection features in the lexicon. Result: Rooted Trees (RTrees) decorated with empty NP, VP or AP.

CFG-Earley: Scan: bottom-up inspection of the lexicon given a word. Result: PoS list to be integrated in the rules.
PMG-pa: Move / Lexical insertion: unselected lexicalized sub-trees (moved LTrees) available in this list, plus lexicalized sub-trees projected from a top-down inspection of the processed word-token. Result: ordered list (the pending list) of lexicalized sub-trees (LTrees) to be integrated in the structure.

CFG-Earley: Complete: pending rules whose next non-terminal has just been recognized advance their dot.
PMG-pa: Merge: unification algorithm among pending rooted structures (RTrees) and the lexicalized sub-trees in the pending list (LTrees).
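The Earley side of this comparison can be made concrete with a compact recognizer. This is a standard textbook sketch, not tied to PMGs; the toy grammar is an assumption chosen to match the running example "the man runs".

```python
GRAMMAR = {                     # toy grammar, assumed for illustration
    "S":  [["NP", "VP"]],
    "NP": [["the", "man"]],
    "VP": [["runs"]],
}

def earley_recognize(words):
    """Chart items are (lhs, rhs, dot, start); predict/scan/complete."""
    chart = [set() for _ in range(len(words) + 1)]
    for prod in GRAMMAR["S"]:
        chart[0].add(("S", tuple(prod), 0, 0))
    for i in range(len(words) + 1):
        changed = True
        while changed:
            changed = False
            for (lhs, rhs, dot, start) in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:          # predict
                    for prod in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], tuple(prod), 0, i)
                        if item not in chart[i]:
                            chart[i].add(item); changed = True
                elif dot == len(rhs):                               # complete
                    for (l2, r2, d2, s2) in list(chart[start]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, s2)
                            if item not in chart[i]:
                                chart[i].add(item); changed = True
        if i < len(words):                                          # scan
            for (lhs, rhs, dot, start) in chart[i]:
                if dot < len(rhs) and rhs[dot] == words[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, start))
    return any(lhs == "S" and dot == len(rhs) and start == 0
               for (lhs, rhs, dot, start) in chart[-1])

print(earley_recognize("the man runs".split()))   # True
```

Each item is exactly the state described in the left column: a rule, a dot position, and the leftmost edge (start) of the substring the rule spans.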
Getting asymmetries with PMG-pa
Subject vs. object relatives in head-initial languages:
a. The reporter [who attacked the senator] admitted the error. (subject relative)
b. The reporter [who the senator attacked] admitted the error. (object relative)
W0 = "the"
2. project(default): [VP ], [NP ]
   getMoved(): nothing
   getPOS(the): [DP the]
3. setAttachment([VP ]): [VP ]
   setAttachment([NP ]): [NP ]
4. merge([VP ], [DP the]): [VP [DP the]]
   merge([NP ], [DP the]): [NP [DP the]]
5. move([VP [DP the]]): nothing
   move([NP [DP the]]): nothing
The input and output are symbolic; everything else is sub-symbolic.
● No explicit structure or structure building operations
● System complexity (and its apparent representations) is an emergent property of simple interactions among parts

Which problems? Problems that can hardly be decomposed into sub-problems:
● Problems that are complex to describe
● Partial representation of the problem space (a space problem: tic-tac-toe is feasible, but more complex games make representing all the steps intractable, which is why most people rely on machine learning)
● Complex algorithmic solutions that require approximations
● Rules and/or heuristics hard to define
● High degree of interaction among levels (multiple constraints): it is very hard to derive all the constraints in a linear way

Competence and grammar: two definitions
Symbolic (e.g. phrase structure grammar): a static set of rules and/or principles (an explicit representation of the competence)
Sub-symbolic (e.g. neural networks): the grammar is a processing device (an implicit representation) reacting to contextual input; words are «operators» smoothly moving the system from state to state (more procedural than declarative).

Consider conjunction (AND) in logic: from a symbolic perspective it is a rule; from a sub-symbolic point of view, plotting the inputs in Cartesian coordinates, there are plenty of lines separating the true case from the false ones (the separating line can shift around the points).

The central nervous system
The central nervous system (CNS) is the part of the nervous system consisting of the brain and spinal cord. A neuron, also known as a neurone (British spelling) or nerve cell, is an electrically excitable cell that receives, processes, and transmits information through electrical and chemical signals. These signals between neurons occur via specialized connections called synapses. Neurons can connect to each other to form neural pathways and neural circuits.
Neurons are the primary components of the central nervous system, which includes the brain and spinal cord, and of the peripheral nervous system, which comprises the autonomic nervous system and the somatic nervous system. A typical neuron consists of a cell body (soma), dendrites, and an axon. All neurons are electrically excitable, due to the maintenance of voltage gradients across their membranes by means of metabolically driven ion pumps, which combine with ion channels embedded in the membrane to generate intracellular-versus-extracellular concentration differences of ions such as sodium, potassium, chloride, and calcium. Changes in the cross-membrane voltage can alter the function of voltage-dependent ion channels. If the voltage changes by a large enough amount, an all-or-none electrochemical pulse called an action potential is generated; this change in cross-membrane potential travels rapidly along the cell's axon and activates synaptic connections with other cells when it arrives.

Synaptic transmission:
➔ Electric (cytoplasmic continuity between neurons, direct ion flow, no delay, bidirectional communication)
➔ Chemical (pre-synaptic vesicles at active membrane sites + post-synaptic receptors, chemical transmitters, 0.3–5 ms delay or more, mono-directional)

Neuron typology
The learning activity depends on the number of connections that the neurons establish with each other.

Artificial neurons
Synaptic transmission is the biological process by which a neuron communicates with a target cell across a synapse. Chemical synaptic transmission involves the release of a neurotransmitter from the pre-synaptic neuron and neurotransmitter binding to specific post-synaptic receptors. Electrical synaptic transmission involves the transfer of electrical signals through gap junctions.
Structural classification: polarity
Different kinds of neurons: 1. unipolar, 2. bipolar, 3. multipolar, 4. pseudounipolar.
Most neurons can be anatomically characterized as:
Unipolar: only 1 process
Bipolar: 1 axon and 1 dendrite
Multipolar: 1 axon and 2 or more dendrites
Golgi I: neurons with long-projecting axonal processes; examples are pyramidal cells, Purkinje cells, and anterior horn cells.
Golgi II: neurons whose axonal process projects locally; the best example is the granule cell.

Fundamental idea: simple processing units linked by weighted connections. Their interaction might be extremely complex (an emergent property); brain-like with respect to serial computers.

An «unsolvable» logical problem (Minsky & Papert 69)

Learning in ANNs
There are two ways of learning, supervised and unsupervised (no explicit teaching):
➔ Supervised learning: we inform the network when the output is wrong or correct
➔ Unsupervised learning: implicit learning, no information on the given/expected output
➔ Hebbian learning (Hebb 49): "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" (Hebb 1949:62)
➔ Perceptron Convergence Procedure (PCP, Rosenblatt 59): calculate the difference between real and expected output; minimize the error, rewarding or punishing the weights on incoming connections

Back propagation (simplified)
Proportional redistribution of the errors backward, layer by layer, up to the input nodes.

How to select the best architecture? Heuristics are the best hint (Rumelhart & McClelland 1982). From a purely formal point of view, 3 layers can solve any problem (Hornik, Stinchcombe & White 1989), but how many neurons and which connection pattern?

Coding input (and output)
Information bits > number of input tokens (distributed coding), e.g.
using a binary coding we can use 2 bits for representing 4 elements (a, b, c, d), that is, 2 input neurons (a=00, b=01, c=10, d=11). No similarity among inputs, orthogonal inputs (localist coding): 1 word = 1 node, e.g. 4 input units: a (0001), b (0010), c (0100), d (1000).

Learning linguistic properties: the past tense (Rumelhart & McClelland 86)
Clear linguistic pattern:
• phase 1: few high-frequency verbs (children learn a few crystallized forms)
• phase 2: over-regularization (break > breaked)
• phase 3: irregular verb inflection reconsidered (smooth coexistence of irregular and over-regular forms… until only the correct forms are used)
Human-like performance (some errors still present)
Phonetic input coding (Wickel-features, Wickelgren 69)
Network architecture: 460 input and output units using Wickel-features
Results: after 420 rounds of 200 verb presentations = 84,000 examples

Time in ANNs
• Epochs (sweeps): (discrete) temporal units corresponding to one input processing
• Atemporal processing: activation only depends on input, connections and weights
• Temporal flow simulation trick: input divided into groups; each group is a distinct temporal interval
• Context layer (Elman 1990): the hidden layer activation is copied onto a context layer that is added to the next input activation

Simple Recurrent Networks
Guessing-next-word paradigm, e.g. "the house is red": input = the, output = house («unsupervised» learning: auto-supervised). Psycho/neurological plausibility (Cole & Robbins 92); a sort of priming. Input structure: localist (1 node = 1 concept).
An SRN or Simple Recurrent Network (or Elman network) is a kind of recurrent network. They are useful for discovering patterns in temporally extended data. They are essentially variants of a backprop network trained to associate inputs, together with a memory of the last hidden layer state, with output states. In this way, for example, the network can predict which item will occur next in a stream of patterns.
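The Perceptron Convergence Procedure and the Minsky & Papert «unsolvable» problem described above can be sketched in a few lines. This is a toy illustration under the standard threshold-unit formulation, not code from the course: AND is linearly separable and the procedure converges; XOR is not, so a single unit stays stuck below perfect accuracy.

```python
def train_perceptron(patterns, epochs=100, lr=0.1):
    """PCP: nudge each weight by (expected - actual output)."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in patterns:
            out = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
            err = target - out              # reward (+) or punish (-)
            w1 += lr * err * x1
            w2 += lr * err * x2
            b  += lr * err                  # bias: weight from a fixed "1" input
    return w1, w2, b

def accuracy(patterns, w1, w2, b):
    return sum(((1 if w1 * x1 + w2 * x2 + b > 0 else 0) == t)
               for (x1, x2), t in patterns) / len(patterns)

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
print(accuracy(AND, *train_perceptron(AND)))   # 1.0: linearly separable
print(accuracy(XOR, *train_perceptron(XOR)))   # below 1.0: not separable
```

The XOR failure is exactly the Cartesian picture above: no single line separates the true cases from the false ones.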
Computational Linguistics Lecture 15 (Lab 4)
Cristiano Chesi (c.chesi@unisi.it)
TLEARN AND SIMPLE RECURRENT NETWORKS
Learning the AND operator
(1) Start TLearn
(2) Select "Network" > "New project"
(3) Create a new folder "and" and choose a project name ("and")
(4) Three files are generated:
    1. and.cf (network configuration)
    2. and.data (inputs)
    3. and.teach (outputs corresponding to the inputs in and.data)
(5) Content of and.cf, and.data and and.teach:

    and.cf:
    NODES:
    nodes = 1
    inputs = 2
    outputs = 1
    output node is 1
    CONNECTIONS:
    groups = 0
    1 from i1-i2
    1 from 0
    SPECIAL:
    selected = 1
    weight_limit = 1.00

    and.data:
    distributed
    4
    0 0
    0 1
    1 0
    1 1

    and.teach:
    distributed
    4
    0
    0
    0
    1
(6) Select "Displays" > "Error display" and "Network Architecture"; then "Window" > "Tile"
(7) Select "Network" > "Training options"; press "more..." to set other parameters
(8) Select "Network" > "Train the Network" (shortcut = CTRL+T)
(9) Change training parameters and options and re-"Train the Network"
(10) Select "Displays" > "Node activations" and "Connection Weights"; then "Window" > "Tile"
(11) Substitute <0, 0, 0, 1> in the file "and.teach" with: <0, 1, 1, 0>
(12) Change training parameters and options and re-"Train the Network"
(13) Save the project and close it
(14) XOR network:

    xor.cf:
    NODES:
    nodes = 4
    inputs = 2
    outputs = 1
    output node is 4
    CONNECTIONS:
    groups = 0
    1-3 from i1-i2
    4 from 1-3
    SPECIAL:
    selected = 1-3
    weight_limit = 1.00

    xor.data:
    distributed
    4
    0 0
    0 1
    1 0
    1 1

    xor.teach:
    distributed
    4
    0
    1
    1
    0
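Why the XOR network needs the hidden units 1-3: a single threshold unit cannot compute XOR, but with one hidden layer a solution exists. The hand-set weights below are an assumed textbook solution, not the ones TLearn would learn: two hidden units compute OR and NAND, and the output unit ANDs them.

```python
def step(x):
    return 1 if x > 0 else 0   # hard threshold unit

def xor_net(x1, x2):
    h_or   = step(x1 + x2 - 0.5)       # fires unless both inputs are 0
    h_nand = step(-x1 - x2 + 1.5)      # fires unless both inputs are 1
    return step(h_or + h_nand - 1.5)   # AND of the two hidden units

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", xor_net(a, b))   # 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0
```

The hidden layer re-represents the input so that the originally non-separable problem becomes linearly separable for the output unit.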
(15) Simple Recurrent Network:

    sm.cf:
    NODES:
    nodes = 25
    inputs = 5
    outputs = 5
    output nodes are 21-25
    CONNECTIONS:
    groups = 0
    1-10 from i1-i5
    11-20 from 1-10 = 1. & 1.
    fixed one-to-one
    1-10 from 11-20
    21-25 from 1-10
    SPECIAL:
    linear = 11-20
    weight_limit = 0.1
    selected = 1-10

    sm.data and sm.teach (localist coding, 17 patterns): the data file lists the word sequence of the sentence file ("frasi", built from "John", "kisses", "Mary", "often"), the teach file the same sequence shifted one word ahead (the guessing-next-word paradigm).

    "translation" file:
    MAPPINGS:
    1-5 from frasi
(16) Test with a cluster analysis a new data-set (2,3,4,5) and teach-set (3,4,5,1). Use a "names file" with the words "John", "kisses", "often", "Mary", one per line. For creating a "vector file", go to "Network" > "Probe selected nodes" and save only the numeric part of the output in a file named vector file or similar.
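What the cluster analysis in (16) does with such a vector file can be sketched outside TLearn: words whose hidden-layer vectors lie close together behave alike for the network. The vectors below are made-up placeholders, not real probe output.

```python
import math

VECTORS = {                      # hypothetical hidden-layer activations
    "John":   [0.9, 0.1, 0.2],
    "Mary":   [0.8, 0.2, 0.1],   # close to John: both nouns
    "kisses": [0.1, 0.9, 0.7],
    "often":  [0.2, 0.3, 0.9],
}

def dist(u, v):
    """Euclidean distance between two activation vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

pairs = sorted(
    (dist(VECTORS[w1], VECTORS[w2]), w1, w2)
    for i, w1 in enumerate(VECTORS) for w2 in list(VECTORS)[i + 1:])
print(pairs[0][1:])   # ('John', 'Mary'): the two nouns cluster first
```

A full hierarchical clustering just repeats this nearest-pair merging on the resulting groups.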
Network for learning aⁿbⁿ (counting) recursion
(17) Content of file.cf, file.data and file.teach:

    file.cf:
    NODES:
    nodes = 12
    inputs = 2
    outputs = 2
    output nodes are 11-12
    CONNECTIONS:
    groups = 0
    1-5 from i1-i2
    6-10 from 1-5 = 1. & 1.
    fixed one-to-one
    1-5 from 6-10
    11-12 from 1-5
    SPECIAL:
    linear = 6-10
    weight_limit = 0.1
    selected = 1-5

    file.data and file.teach (localist coding, 14 patterns): sequences over the two tokens (1 and 2, i.e. a and b), the teach file one step ahead of the data file (predicting the next symbol).
(18) Modify file.data and file.teach for testing XX and XXR recursion
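The context-layer trick used by the SRN configurations above (hidden activations copied one-to-one onto linear context units and fed back at the next step) can be sketched as a bare forward pass. The weights are random and nothing is trained; the point is only how the previous hidden state re-enters the next step, giving the network memory.

```python
import math, random

random.seed(0)
N_IN, N_HID = 2, 5                         # sizes chosen arbitrarily
W_ih = [[random.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
W_ch = [[random.uniform(-1, 1) for _ in range(N_HID)] for _ in range(N_HID)]

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def srn_run(inputs):
    """Forward pass over a token sequence, returning the hidden states."""
    context = [0.0] * N_HID                # context layer starts empty
    states = []
    for x in inputs:
        hidden = [sigmoid(sum(W_ih[j][i] * x[i] for i in range(N_IN)) +
                          sum(W_ch[j][k] * context[k] for k in range(N_HID)))
                  for j in range(N_HID)]
        context = hidden[:]                # one-to-one copy, used next step
        states.append(hidden)
    return states

a, b = [1, 0], [0, 1]                      # localist coding of the two tokens
s = srn_run([a, a, b, b])
print(s[0] != s[1])   # True: same input "a", but a different internal state
```

This is exactly what lets an SRN track how many a's it has seen before the b's start, which a feed-forward network cannot do.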
References
TLearn can be freely downloaded from here:
http://crl.ucsd.edu/innate/tlearn.html
Plunkett & Elman (1996) Exercises in Rethinking Innateness: A Handbook for Connectionist Simulations. MIT Press.