Partial preview of the text

Ling 215A/B: proseminar on the prosodic word
Zuraw, 9 Nov. 2006

Processing units vs. prosodic units

(1) First, a loose end from Bybee: morpheme independence—Spanish past participles

Bybee argues that -ado can't be stored as a separate morpheme, because its behavior depends on the frequency of the word in which it appears. But that follows only if storage as a separate morpheme precludes whole-word storage. In Hay's model, we saw that the two coexist.

Bybee's model of lexical representation (whole-word entries only, with connections between related words; no separate affix entries):
• concentrar — concentrado
• condenar — condenado
• reducir — reducido

Hay's model (whole-word entries coexist with entries for affixes and bases):
• affixes: -ado, -ido
• bases: concentr-, conden-, reduc-
• whole words: concentrado, condenado, reducido

(If the bases are bound forms, then what affects their resting activation is not their frequency in isolation, but only, as with affixes, how often they get accessed in derived words.)

It's hard/wrong to have intuitions about these models, as always—we need to write down equations or run simulations¹—but here's how I think the predictions differ.

Bybee: Frequent words get more reduced. Reduction of -ado in one word spreads to other -ados. Why is -ado more reduced than -ido? Either the words that -ado occurs in tend to be frequent (or, it just occurs in a higher number of frequent words, assuming no inhibitory effect from the infrequent words it occurs in), or it's an effect of the difference in deletion rate after a vs. i.
⇒ More reduction in frequent words; more reduction in affixes that occur in lots of frequent words.

Hay: Frequently accessed strings get more reduced. Because the ado and ido strings are contained within the suffix (they don't cross a morpheme boundary—at least under the implied morphological analysis above), whole-word entries for participles that are accessed frequently get more reduced, and so do entries for suffixes that are accessed frequently. (There could also be a phonetic effect of a vs. i.)
⇒ More reduction in words that are more frequent than their bases (that's just a rough measure—really, the increased reduction should be in words that are whole-accessed more frequently than their bases are accessed); more reduction in affixes that occur in lots of words that are less frequent than their bases (same caveat).

¹ For a discussion and illustration of why relying on intuition is bad—and what to do instead—see Partha Niyogi & Robert Berwick (1997). Populations of learners: the case of Portuguese. Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society.

The data on p. 152 are consistent with either theory. If Hay's model is right, then the difference between high- and low-frequency words is coming from the fact that more of the high-frequency words are more frequent than their bases.

(2) Prosodic constituency vs. Bybeean processing units

Bybee proposes that "words that are often used together become processing units" (p. 157) and this leads to "phonological fusion". This is pretty much the Hay-ian view of what happens within words. [Of course, we need to define 'often used together' (or let the model define it implicitly)—do we just mean that the sequence is frequent, or that it's more frequent than would be expected given the frequencies of the components and some assumptions about how things combine?]

How might prosodic units be different from processing units?

(3) I. Grammar-dependence vs. distribution/usage dependence

Here are some possible worlds...

(i) Units determined entirely by the grammar
E.g., ALIGN(LxWd,L,PWd,L) (⇒ compounds = 2 words, prefixes and proclitics left out, suffixes and enclitics folded in); the p-word then acts as a rule domain.

(ii) Units determined entirely by processing
Sequences stored as units (or, accessed in unit stored form, even if decomposed alternatives exist) display phonological fusion internally, and a propensity to alternate at edges.
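Since intuitions about these models are unreliable, a toy simulation may help make the contrast concrete. The sketch below (all items, frequencies, and the threshold are invented for illustration) contrasts a world where unit-hood is read categorically off the grammar, as in (i), with one where it is graded by usage frequency, as in (ii):

```python
# Toy contrast between grammar-determined units (i) and
# processing-determined units (ii). All items, frequencies, and the
# threshold below are invented for illustration.

def fused_by_grammar(affix_type):
    """(i) Categorical: ALIGN(LxWd,L,PWd,L) folds in suffixes and
    enclitics and leaves out prefixes and proclitics, regardless of usage."""
    return affix_type in ("suffix", "enclitic")

def fused_by_processing(seq_frequency, threshold=50):
    """(ii) Usage-based: sequences used together often enough are
    stored (and fused) as units, whatever their morphological type."""
    return seq_frequency >= threshold

items = [
    # (label, affix type, hypothetical usage frequency)
    ("stem+suffix, frequent",    "suffix",    200),
    ("stem+suffix, rare",        "suffix",      5),
    ("proclitic+stem, frequent", "proclitic", 300),
    ("proclitic+stem, rare",     "proclitic",   2),
]

for label, affix_type, freq in items:
    print(f"{label:28s} grammar: {fused_by_grammar(affix_type)!s:5s} "
          f"processing: {fused_by_processing(freq)}")
```

The grammar column patterns cleanly by affix type, while the processing column cross-cuts it by frequency, item by item.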
In general, (i) should predict a cleaner pattern than (ii), with fewer frequency effects on individual items. (i) also predicts tidy interaction with (presumably) non-processing considerations such as prosodic minimality.

(iii) Processing masquerading as grammar
Say that processing privileges left edges in such a way that prefixes and proclitics are, in general, more likely than suffixes and enclitics to get left out of the processing unit. If the tendency is strong enough, it could look like the ALIGN constraint above, perhaps with some lexical exceptions. Similarly, effects of affix length, and differences between compounding and affixation (a given morpheme presumably participates in a wider variety of compounds than it does affixed forms), could come out of a processing model. If strong enough, they could look like grammar (plus exceptions).

(iv) Grammar with processing-grounded constraints
We often appeal to phonetic motivations for constraint rankings—why not appeal to a processing motivation for the tendency ALIGN(LxWd,L,PWd,L) >> ALIGN(LxWd,R,PWd,R)?

(7) What determines activation?

If an entry's weight is still being allowed to increase, the preceding activation is just divided by that node's decay rate (since 0 < δ_w < 1, this increases the activation):

a(w, t) = a(w, t−1) / δ_w

(a stands for 'activation'; w identifies the entry; t is the current timestep; δ_w is the decay rate for that node)

If the entry's weight is determined to have peaked, the activation asymptotes out to the original (resting) activation:

a(w, t) = a(w, 0) + |a(w, t−1) − a(w, 0)| · δ_w

What determines when an entry starts decreasing its activation? If an entry has not yet reached threshold, and it is either edge-aligned with the target* or similar enough† to the target, then it gets to keep increasing.
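The two-regime update can be written down as a minimal executable sketch, assuming 0 < δ_w < 1 and treating "has peaked" as a given flag (the asymptotic-decay step is reconstructed from the described behavior, i.e. geometric movement back toward the resting level):

```python
def update_activation(a_prev, a_rest, delta_w, peaked):
    """One timestep of the activation update in (7) for entry w.

    a_prev:  a(w, t-1), the previous activation
    a_rest:  a(w, 0), the resting activation
    delta_w: the node's decay rate, assumed 0 < delta_w < 1
    peaked:  whether this entry's weight has been determined to have peaked
    """
    if not peaked:
        # Still increasing: dividing by a rate < 1 raises activation.
        return a_prev / delta_w
    # Peaked: decay geometrically back toward the resting activation.
    return a_rest + abs(a_prev - a_rest) * delta_w

# Rising phase: activation grows each step.
a = 1.0
for _ in range(3):
    a = update_activation(a, a_rest=1.0, delta_w=0.5, peaked=False)
# a is now 1.0 / 0.5**3 = 8.0

# Decay phase: activation asymptotes back toward the resting level (1.0).
for _ in range(20):
    a = update_activation(a, a_rest=1.0, delta_w=0.5, peaked=True)
```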
(*or, edge-aligned with a substring of the target that can be formed by stripping off outer affixes that have reached threshold; e.g., in is edge-aligned with uninformed if un has already reached threshold)

(†similarity is the length of the target, if the entry is a superstring of the target; otherwise, similarity is the length of the target minus the Levenshtein edit distance between entry and target; the entry is "similar enough" if its similarity reaches a threshold)

What determines an entry's decay rate? Informally, it's a combination of its length, its resting activation, and how much of the target it matches. [The exact formula is garbled in this preview; it defines δ_w from δ via an expression in L(w), L(T), max(L(w), L(T)), and log(a(w, 0)) when that expression is > 0, and otherwise δ_w = δ.] Here L(i) is the length of i, T is the target word, and there are free parameters α > 0 (spike parameter), ζ > 0 (forest parameter), and 0 < δ < 1 (overall decay rate).

(8) Testing the model's predictions: example from the literature

Hay & Baayen (2002)⁴ find that, for a set of Dutch words in -heid, Matcheck's parsing times are correlated with subjects' lexical-decision reaction times (from a previous study).

⁴ Jennifer Hay & Harald Baayen (2002). Parsing and productivity. In Geert Booij & Jaap van Marle (eds.), Yearbook of Morphology 2001. Kluwer Academic Publishers. Pp. 203–235.

If we split up the words...
• For words where the whole-word parse is faster, only the time to the whole-word parse is significantly correlated with RT.
• For words where the decomposed parse is faster, the time to the whole-word parse and the morphological family size are significantly correlated with RT.

⇒ Why does a large family speed RT in the decomposedly-parsed items?
Assume that, on top of the form-similarity relations implicit in Matcheck, there are explicit connections between words that contain the same morpheme, and they spread activation to each other. Thus, the stem gets activated faster when it occurs in lots of other words.

"Now consider the case in which the parsing route wins the race. The present experimental data on -heid suggest that in this case activation spreads into the morphological family. This makes sense, as initially the comprehension system knows only that it is dealing with a stem that has to be combined with some affix. By allowing the morphological family members to become co-activated, all and only the possibly relevant candidates are made more available for the processes which combine the stem with the derivational suffix and which compute the meaning of their combination.

"In fact, since the derived form is one of the family members of the base, it will be activated more quickly when the base has a large morphological family. This is because it is embedded in a larger network of morphologically related words, and the morphologically related words spread activation to each other. This may explain why log derived frequency remains such a strong predictor of the response latencies even for the words in the P set [i.e., the words where the decomposed parse wins]." (pp. 13–14 of ms. version)

⇒ Why doesn't family size matter for whole-word-parsed items?

This part of the paper is fuzzy to me. I'll just quote:

"Consider what happens when the direct route is the first to provide a complete spanning of the target word, say, snel-heid, "quickness", i.e., "speed". Once the derived form has become available, the corresponding meaning is activated, apparently without activation spreading into the morphological family of its base, "snel".
In other words, family members such as ver-snel-en ("to increase speed") and snel-weg ("freeway") are not co-activated when snel-heid has been recognized by means of the direct route." (p. 13 of ms. version)

It's hard to reconcile this with the idea that words containing the same stem are connected. Ideas?

⇒ Why doesn't the time to the decomposed parse matter?

"We think that derived frequency and family size may conspire to mask an effect of the timestep itself at which the base word itself becomes available, leading to the absence of a