Towards a ‘Converging Theories’ Model of Language Acquisition: Continuing Discontinuity
Joseph Galasso
English Department, California State University, Northridge
joseph.galasso@csun.edu
Incomplete Working Draft: Spring 2003
Abstract
The Dual Mechanism Model credits the Brain/Mind with having two
fundamentally different cognitive modes of language processing—this
dual mechanism has recently been reported as reflecting inherent
qualitative distinctions found between (i) regular verb inflectional
morphology (where rule-based stem+affix forms constitute a large
contingent), and (ii) irregular verb constructions (where full lexical
forms seem to be stored as associative chunks). In this paper, we examine
the Dual Mechanism Model and broaden its scope to cover the
overall grammatical development of Child First Language Acquisition.
Proposal
This paper proposes new accounts of old issues surrounding child
first language acquisition. The general framework of our proposal
is based upon hybrid theories—proposals stemming from recent investigations
in the areas of PDP-style connectionism, as well as from more
naturalistic studies, and sample-based corpora of Child Language
Acquisition. Much of what is sketched out here attempts to converge
the leading tenets of two major schools-of-thought—namely, Associative
Frequency learning and/vs. Symbolic Rule learning. Cast in this
new tenor, proponents of a Dual Mechanism Account have emerged,
advocating a dual cognitive mechanism to deal with the processing
differences found amongst regular and irregular verb inflectional
morphology (inter alia). The main task
of this paper is (i) to broaden and extend the dual mechanism
account—taking it from the current slate of morphology to the
larger syntactic level, and (ii) to spawn some theoretical discussion
of how such a dual treatment might have farther-reaching implications
for more general developmental aspects of language acquisition
(as a whole), namely (though not exclusively), the twin benchmarks
of syntactic development regarding Lexical vs. Functional (staged)
grammar, etc. Our central claim will be that whatever factors
lead to a deficient morphology, say, at a given stage-1 of development—factors
that may potentially lead to the postulation of a non-rule based
account—these same factors are likely to be carried over, becoming
a factor of deficiency in the overarching syntax. Thus, the tone
of the discussion is dualistic throughout. Our main goal is two-pronged:
first, to assert as the leading null hypothesis that language
acquisition is Discontinuous in nature from the adult target grammar,
and that this discontinuity is tethered to maturational factors
which lie deep-seated in the brain—factors which yield fundamental
differences in the actual processing of linguistic material
(a so-called ‘Fundamental Difference Hypothesis’),
and second, to show that this early multi-word non-target stage
can be attributed to the first leg of this dual-mechanism—i.e.,
that side of cognitive/language processing that governs (i) (quasi-)
formulaic structures along with (ii) non-parameterizations. We
attribute the generation of this two-stage development to maturational
scheduling—viz., a Non-Inflectional stage-1 and/vs. an Optional
Inflectional stage-2 (where formal grammatical relations are first
learned in a lexical bottom-up fashion and then later regroup
to generalize across the board in a word class top-down fashion).
It is our understanding that the two-staged development draws
on a relevant associative-style theory of learning
(cf. Skinner/Associative-style learning, for our former stage-1),
while preserving the best of what syntactic rule-driven theories
have to offer (cf. Chomsky/Rule style learning, for our latter
stage-2)—hence the term Converging in our title. By analyzing
much of what is in the literature today regarding child language
acquisition, as well as drawing from the rich body of work presently
being undertaken in connectionism, it is our hope that a new hybrid
converging theory of child language acquisition can be presented
in a way that captures what is inherently good from both schools—an
alternative theory that bears more flavor of truth than camp rhetoric.
Why—I don’t need any ‘rule’
to see this tree here in front of me. My eyes work just fine.
That is, insofar as there exists a single tree. But, how is it
that my ‘tree’ gets destroyed once I move my head ever so slightly
to the east and fall into view of a second tree? The mystery of
it all lies somewhere in the dismantling, between a single torn
branch of lifted foliage, that forces the rule—for how was I ever
to know that this second tree was indeed a tree after all?
(Poem based on Plato’s Forms).
“Humans use stories that
they tell themselves in order to get themselves to work on this
or that. These stories often deal with confrontation between areas
and ideas. From some point of view, it is almost always the case
that these high-level stories are relevant only as motivation
and not really relevant to what eventually happens in terms of
technical understanding”. (Allen Newell)
Sometimes, stories within
a certain school split—e.g., formalist debates on the amount of
functionalism Chomsky can and should afford to surrender (cf.
Pinker & Bloom). Sometimes differing stories converge—Neo-Behaviorists
seeking out an innately based architecture (cf. Elman). In any
event, differing schools-of-thought are prosaic at best, ripe
with unfortunate misunderstandings that lead to the fanfare of
debate. There is no clarion call behind dueling rationales; one
is left merely to one’s own devices, scrambling to gather fuel
for the worthy debate. All reduces to subtle arguments of fine detail—very
rarely is there really any substantial difference. The world as we
see it ultimately provides very little in the way of such dividend:
perhaps ontogeny recapitulates phylogeny in every respect.
0. Overview
Periodically, say every two or three generations, our vows on
science are renewed by a sweeping change of reasoning—cerebral
airs that deliver their own inextricable kind of ‘off-the-beaten-path’
hedonism. These solemn changes are few and far between and constitute
what the philosopher of science Thomas Kuhn called ‘Paradigm Shifts’
(a new way of thinking about an old-something). Unfortunately,
these generational spurts often provide very little in the way
of true original thinking, and much of what is behind the fanfare
quickly reduces to little more than the recasting of old ‘brews’
into new ‘spells’. Perhaps a glimmer of true original thought
(a ‘new-something’) comes our way every two hundred years or so.
We are in luck! One of the greatest breakthroughs in science was
born in the latter half of the last century and has made
its way onto the scene shrouded by questions surrounding how one
should go about rethinking the Human Brain/Mind—questions that
have led to eventualities in Computer Programming, Artificial
Intelligence (AI), Language/Grammar, Symbolic-Rule Programs and
Connectionism. Much of what sits here in front of me, at my desk,
can be attributed in one way or another to this ‘new-something’,
and whenever there is a new-something, whether it be steam locomotives,
transistors, or tampering with DNA, there’s bound to be an earful
of debate and controversy. And so remnants of this debate have
edged their way ever so slowly onto the platform—from the likes
of the psychologist Donald Hebb (1940s-50s) (and his revolutionary
notion of ‘nerve learning’ based on oscillatory frequency), to
the great debates between two great personalities in the AI field,
Marvin Minsky and Frank Rosenblatt (1950s-60s), to those in the
realm of language, Noam Chomsky (1960s-80s). More recently, the
debates have taken on a vibrant life of their own by the great
advances in computer technology. The most clearly articulated
of these recent debates has come to us by two leading figures
in the research group called Parallel Distributed Processing
(PDP)—namely, Jay McClelland and Dave Rumelhart (1980s). Most
recently, the debates have come to carry a portmanteau of claims—chief
among them is the claim that human brain function, and thus human
computation, is not analogous to (top-down) symbolic-based computers,
but rather, the brain and its functional computations should be
considered on a par with what we now know about (bottom-up) nerve
functions and brain cell activations. In other words, the paradigm
shift here occurs the moment one rejects the computer as an antiquated
model of the brain (and language) and instead props up a newer
model of language and thinking based on connections and connectionism
(as understood in neurological studies). In this vein, it is fair
to say that we should no longer view language as a mere gathering
and shaping of atomic particles or logical symbols—much like how
one might view the atomic nature of computer language as it is
composed of a serial string of 0’s and 1’s—rationing out sub-parts
of the structure in more-or-less equal portions in hopes of arriving
at a larger and more cohesive general frame of language. It could
be argued by connectionists that language is not only much more
fluid than what any strict rule-driven/symbolic function could
provide, but also that language requires a greater measure of
freedom and flexibility at the bottom end. Whereas rules originate
top-down, it may likely turn out that bottom-up processes better
reflect what is actually going on, at least in the initial learning
processes of language. (One nontrivial note to remember here is
that there is a fundamental and crucial difference between artificial
computer chips (AI) and living brain cells (neurons): the
latter must secure survival. There is no sense in the notion that
silicon chips need to secure survival, since there is no death
of a chip. Cells are living organisms that must somehow ensure
their survival, and this survival apparatus, certainly for the individual
cell, must be organized in a bottom-up fashion.) Along these lines,
much of what is coming out of West Coast schools-of-thought (connectionism)
affords the old school of Gestalt psychology a new lease on life.
Some connectionists find themselves talking-up the fact that language
can’t simply be a cohesion of atoms put together in very elegant
ways, but that some ‘higher-order’ of fluidness must exist. Human
cognition is more fluid, more context-driven. In a token manner
of speaking, Köhler might carry on here about mysterious magnetic
fields which suddenly arise in the brain and pull sub-particle
visual stimuli together—any notion of a gestalt brain, of course,
has long been disputed (I think). However, it should be noted
that Gestalt psychology continues to pave a way for a serious
return in the contexts of connectionism. (In addition, as a historical
footnote, let’s not forget that while Rosenblatt’s work originated
with visual perception, it is now viewed that his work, if carried
out in today’s climate, would have had potentially serious linguistic
implications.) And so let us turn to language. With specific
regard to grammar, the Word-Perception Model of Rumelhart and
McClelland (1981, 1986) has made a dramatic impact in the field.
Not only has it provided us with a new way of looking at potential
brain processing (a quantitative way of looking, with regard
to weights of connections, thresholds, memory storage, etc.),
it also made rather precise claims about what kinds of material
(qualitative) would be difficult to process in such a model:
(the need for hidden units regarding 2-degree complex structures
and paradigms, recursive complexity and back-propagation, etc.).
Clearly, when one can predict with a fair amount of certainty
where problems will be had, and then attempt to account for the
nature of the problem in terms of the model, then surely the criterion
of explanatory value is close to being met. For example, the now
conceded fact that ‘hidden units’ must be installed (p.c. Jeff
Elman, as part of the innate apparatus) in order for the full
complexity of language to be processed via any PDP network speaks
volumes, I believe, to where we stand today on explanatory value—in
fact, hidden units have now become the main rallying cry for those
who argue for rule-based accounts of language (not to mention
the nativists among us; see Marcus vs. Elman on this matter).
Finally, the typical intransigence that often will shape and define
opposing views has given way to a certain amount of movement leading
to an ultimate compromise between the two leading schools of thought—as
noted by the likes of Steven Pinker and Alan Prince. For instance,
Pinker & Prince’s somewhat tentative and partial acceptance
of a connectionist model regarding specific lexical processes,
if nothing else, has buttressed their own alliances in the pursuit
of upholding counter-claims against proponents for a pure ‘Single
Mechanism Model’ (strictly based on associative learning). And
so out of this twist of faiths, a renewed and rejuvenated interest
in rule-driven processes has been gathering momentum in an attempt
to seek more narrowly confined rule-based analogies for dealing
with specific aspects of language/grammar as a whole. Finally,
as suggested by Newell above, long-standing dichotomies often
provide a variety of clever means to think about a wide range
of topics. It goes without saying that as a pedagogical device
at least, students not only crave a good debate, but more importantly,
they often report that new material introduced in the form of
a debate procures a much higher level of understanding. Well,
this singular debate has been ongoing for centuries, merely masked
under several different labels: nature vs. nurture, innate
vs. learned, hard-wired vs. soft-wired abilities, instinct
vs. learning, genetic vs. environmental, top-down vs. bottom-up
strategies, and as presented herein, the Single vs. Dual
Mechanism Model.
Introduction
-
It is a fact that children do not produce ‘adult-like’ utterances
from the very beginning of their multi-word speech. And so much
of the debate ongoing in child first language acquisition has
been devoted to the nature and extent of ‘What gets missed
out where’. Theory-internal measures have been spawned every
which way in an effort to account for the lack of apparent adult-like
language in young children—Theories abound. Despite some evidence
that would seem to point to the contrary, more robust syntactic
theories from the outset continue to view the very young child
as maintaining an operative level of language closely bound
to abstract knowledge of grammatical categories (Pinker 1984,
Hyams 1986, Radford 1990, Atkinson 1992, Wexler 1996, Radford
& Galasso 1998). For instance, Pinker (1996) has described
early language production in terms of a first-order (general
nativist) cognitive account—suggesting a processing ‘bottleneck’
effect, attributed to limited high-scope memory, to account
for the child’s truncated syntax of Tense/Agr/Transitive errors
(e.g., Her want) and over-application Tense errors (e.g.,
Does it rolls?). Radford (1990) and Radford and Galasso
(1998), on the other hand, have maintained a second-order (special
nativist) maturational account affecting syntactic complexity
in order to explain the same lack of adult-like speech. It should
be noted that these two nativist positions share a common bond
in that they are reactions against much of what was bad coming on
the heels of work done in the 1970s—theories which sought to
account for such errors on a purely semantic level (e.g., Bloom
1975, Braine 1976, and to some extent Bowerman 1973). Steering
away from potentially non-nativist associative/semantic-based
accounts to proper syntactic-based accounts was viewed by most
to be a timely paradigm shift—acting as a safeguard against
what might be construed as bad-science Behaviorism (of the purely
semantic kind). This shift brought us toward a more accurate
‘Nativist’ stance swinging the Plato vs. Aristotle debate back
to Plato’s side, at least for the time being (as witnessed in
the title of Chomsky’s book ‘Cartesian Linguistics’)—a move
keeping in line with what was then coming down the pike in Chomskyan
linguistics. One thing that seems to have caught the imagination
of developmental linguists in recent years has been to question
again the actual infrastructure of the child-brain that produces
this sort of immature grammar—namely, a rejuvenated devotion
has reappeared in the literature, circumscribing new understandings
of age-old questions dealing with the Theory of the Brain/Mind.
-
For instance, proponents of Behavioral/Associationist Connectionism
today (cf. Jeff Elman, Kim Plunkett, Elizabeth Bates, among
others) are more than ready to relinquish the old Chomskyan
perspective over special nativism (‘special’ in that language
is viewed as coming from an autonomous region in the brain,
unconnected to general cognition or other motor skill development,
pace Piaget, and in contrast to general nativism), and have
rather shifted their focus to an innateness hypothesis based
not on natural language (per se) but rather on a type
of innateness based on the actual architecture itself that
generates language (architecture meaning brain/mind: viz., an
innate Architecture, and not an innate Universal Grammar).
-
For Chomsky, it was this autonomous language faculty (that he
refers to as a language organ) that allowed this innate language
knowledge to thrive and generate grammar. For the connectionist
movement, it is the very architecture itself that is of interest—the
input/output language result being a mere product of this perfected
apparatus. So in brief, the debate over innateness has taken
on a whole new meaning—today, perhaps best illustrated by this
more narrow debate over General vs. Special Nativism.
We shall forgo the meticulous details of specific theories at
hand and restrict ourselves to the rather prosaic observation
that the child’s first (G)rammar (G1) is not at all contemporary
with the adult (T)arget grammar (Gt). Notwithstanding myriad
accounts and explanations for this, for the main of this paper,
let’s just simply examine the idea that the two grammars (child
and adult)—and we do consider them as two autonomous and separate
grammars—must partake in some amount of Discontinuity (G1 ≠ Gt;
indeed, G1 < Gt), and that such a discontinuity must be stated
as the null hypothesis, tethered to maturational/biological
differences in the brain. Hence, G1 represents the (B)rain at
B1 (of the maturational sequence B1, B2, B3 … Bt), while Gt
represents the brain at Bt.
-
Discontinuity theories have at their disposal a very powerful
weapon in fighting off Continuity theories—whether it be language
based, or biological based (noting that for Chomsky, the study
of Language, for all intents and purposes, reduces to the study
of biology). This great weapon is the natural occurrence of
maturational factors in learning. In fact, on a biological level,
maturation is taken to be the null hypothesis—whether it be
the emergence and consequent loss of baby teeth, the learning
of how to walk and talk, or the onset of puberty. Whatever the
achievement, it can be attributed to the onset of some kind of
scheduled learning timetable—for language, it is an achievement
mirroring a process in which the nature and level of syntactic
sophistication and its allocation are governed in accordance
with how the brain, at the given stage, is able to handle the
input.
-
It is common knowledge that (abstract) grammatical relations
are frequently a problem for language acquisition systems.
Early reflection on this was made by Brown when he discovered
that one could not explain why some grammatical morphemes were
acquired later than others simply in terms of input. The question
was posed as follows: If all morphemes are equally presented
in the data-stream at roughly the same time—contrary to what
might be believed, parents’ speech toward their children is
seldom censored so as to bring about a reduced mode of grammatical
communication/comprehension—then, what might account for the
observed asymmetrical learning? Similarly, Pienemann (1985,
1988) has made claims for a grammatical sequencing of learning
second language based on complexity of morphology. This question
led to early notions of a linguistic maturational timetable
much like what Piaget would have talked about regarding the
child’s staged-cognitive development—maturation being the only
way to address such a staged development. Likewise, a Chomskyan
position would have it that there must be something intervening
in the child’s (inner) brain/mind (albeit not tied to cognition)
that brings about the asymmetrical learning since there’s no
change in the (outer) input. Well, one of the first observations
uncovered by Brown was that a child’s linguistic stage-1 (with
a mean length of utterance in words (MLUw) lower than 2) went
without formal functional grammar. In other words, Brown noted
that the telegraphic stage of learning was devoid of abstract
grammar such as Inflection,
Case and/or Agreement.
-
One consequence of this style of learning was that children were
considered to learn by rote-methods, associative means similar
to what Skinner had earlier advanced in (his ‘bad science’)
Behaviorism. It was somewhat tentatively suggested, regarding
a very early stage-1, that children didn’t start learning language
as a set of rules of logic (as Chomsky would have us believe
in his notion of generative grammar), but that children would
first grapple with the linguistic input by gathering and constructing
formulaic chunks. Children would only later on, say at a stage-2
of language acquisition, start to employ Chomskyan rules to
generate a target grammar (as a consequence, see ‘U-shape learning’
discussed below). For example, Bellugi (1967), Klima and Bellugi
(1966), Bellugi (1971), initially allowed for a certain amount
of formulaic misanalysis to enter into the accounting of non-adult-like
stage-1 structures. More recently, Rowland and Pine (2000)
have similarly suggested that, e.g., early Subject-Auxiliary
inversion errors such as *What he can ride in? (inter
alia) (alongside optional target structures showing
inversion, What can he ride in?) cannot be accounted for
by a rule-driven theory—viz., if the child has access to the
rule, the theory would then have to explain why the child sometimes
applies the rule, and sometimes fails to apply it. Rowland &
Pine rather suggest an alternative account by saying that as
a very early strategy for dealing with complex grammar (e.g.,
Aux. Inversion, Wh-fronting) children learn these semi-grammatical
slots as lexical chunks—a sort of lexicalized grammar—whereby
they establish formulaic word combinations: e.g., Wh-word
+ Auxiliary as opposed to Auxiliary + Wh-word combinations.
It was shown that aspects of error rate and optionality (as
opposed to rule-driven mechanisms) highly correlated to high
vs. low frequency rates of certain combinations in the child’s
input. This early non-rule-based strategy was then able to account
for the vast array of the child data—viz., where the number
of non-inverted Auxiliaries (vs. inverted Auxiliaries) was at
a significantly higher rate at the initial stage-1 of development.
As an example of a non-rule-based account here, they show that
when inversions did occur, they typically involved only a certain
select few Wh-words, and not the entire class. Hyams (1986,
p. 85) somewhat agrees with such a reduced structure when she
asserts the following: ‘By hypothesis, the modals (or
Aux. Verbs) are unanalyzable during this period’. Such
overall claims strongly support Stromswold’s (1990) statistical
data analysis which clearly demonstrated that children at a
very early stage-1 might not productively realize an utterance
string containing [don’t, can’t] in e.g., I/me [don’t]
want, You [can’t] play as the syntactic elements
[{Aux} + clitic{n’t}], but that such strings were more limitedly
realized as quasi-formulaic representations of a negative element.
In other words, the claim could be extended to mean that
for the child at this stage-1, the lexical items don’t/can’t
reduce to the one-to-one sound-meaning of not: e.g.,
Robin [don’t] [=no(t)] play with pens
(Adam28), where the verbal inflection {-s} goes missing since
it isn’t analyzed as an Aux Verb. Likewise, Brown came to similar
tentative conclusions by recognizing that (i) verbal inflection
seemed not to be generalized across all verbs in the initial
stages, and therefore, that (ii) children didn’t really start
with rules, but rather employed a strategy of ‘lexical-learning’.
Early stage-1 inflected verbs might then be learned as separate
verbs (chunks) thus explaining observable optionality: since,
as the story was then told, either you knew a rule (and so you
always applied it) or you didn’t. Optionality of verbal inflection
was then seen as a dual process of word acquisition in the brain:
both uninflected and inflected words were stored as two different
items in the lexicon. (See Bloom 1980 for comments). This notion
of a stage-1 learning via non-rule-based means implied that
the stage was a formulaic stage, and set-up in such a way as
to learn by associative processes buttressed by frequency learning.
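Rowland and Pine’s frequency logic lends itself to a toy simulation. The sketch below is a caricature under invented assumptions (the input counts, the threshold, and the function name are not taken from their study): a ‘child’ who stores whole Wh-word + Auxiliary bigrams can invert only where a sufficiently frequent stored chunk exists, yielding inversion for a select few high-frequency combinations and non-inversion elsewhere.

```python
from collections import Counter

# A caricature of chunk-based (non-rule) learning of Wh + Aux frames:
# the 'child' stores whole Wh-word + auxiliary bigrams and can invert
# only where a stored chunk exists. The counts below are invented for
# illustration, not drawn from any corpus.
caregiver_input = (
    ["what can"] * 40 + ["what do"] * 35 + ["where is"] * 30 +
    ["why can"] * 2 + ["how could"] * 1
)
chunks = Counter(caregiver_input)
THRESHOLD = 5   # hypothetical: chunks heard this often become frames

def child_inverts(wh_word: str, aux: str) -> bool:
    """Inversion succeeds only via a sufficiently frequent stored chunk."""
    return chunks[f"{wh_word} {aux}"] >= THRESHOLD

print(child_inverts("what", "can"))   # True  -> 'What can he ride in?'
print(child_inverts("why", "can"))    # False -> *'Why he can ride in?'
```

On this caricature, optionality falls out for free: inversion is restricted to the handful of Wh + Aux frames the input supports, rather than generalizing across the word class.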
The Dual Mechanism Model
-
It has recently been hypothesized that the language faculty consists
of a dualistic modular structure made up of two basic components:
(i) a Lexical component—which has to do with formulating
lexical entries (words), and a (ii) Computational component—which
is structured along the lines of algorithmic logic (in a Chomskyan
sense of being able to generate a rule-based grammar). It is
argued that these two very different modes of language processing
reflect the ‘low-scope’ (1st-order) vs. ‘high-scope’
(2nd-order) dichotomy that all natural languages
share. Low/high scope would be described here in terms of
how and where certain aspects of language get processed in the
brain (see also section # below on brain studies). In addition
to newly enhanced CT brain imaging devices, multidisciplinary
data (e.g. linguistic, psychological, biological) are starting
to trickle in providing evidence that a dual mechanism is at
work in processing language. Results of experiments indicate
that only a dual mechanism can account for distinct processing
differences found amongst the formulations of irregular inflected
words (e.g., go>went, foot>feet) and regular inflected
words (e.g., stop>stopped, hand>hands). The former (lexical)
process seems to generate its structure in terms of stored memory
and is taken from out of the mental lexicon itself in mere associative
means: these measures are roughly akin to earlier Behavioristic
ideas on frequency learning, etc., fashionable in the 1940s-1960s
and made notable by the experimental work of D. Hebb and B.F.
Skinner. The latter regular mode of generating structure is
tethered to a Chomskyan paradigm of (regular) rule-driven grammar—the
more creative, productive aspect of language/grammar generation.
Such regular rules can be expressed as [Stem]+[affix] representations,
where the stem constitutes any variable word <X> (old
or novel) that must fit within the proper (parts-of-speech)
categorization. For instance, using a simplified version of Aronoff’s
realization pair format (1994, as cited in Clahsen 2001, p.
11), the cited differences in parsing found between e.g., (i)
a regular [Stem + affix] (decomposed) construction vs. (ii)
an irregular copular ‘Be’ [Stem] (full-form) lexical item can
be notated as follows:
a. <[V, 3sg, pres, ind], X+s>
b. <[V, 3sg, pres, ind, BE], is>
The regular 3Person/Singular/Present rule in (a) spells out
the bracketed functional INFLectional features of Tense/Agreement
by adding the exponent ‘s’ to the base variable stem ‘X’. The
features in (b) likewise get spelled out; but rather than in the
form of an exponent, the features are built into the lexeme
‘BE’ by the constant form is. Once the more specific
irregular rule is activated, the default regular rule-based spell-out
is blocked, preventing the overgeneralization of *bes.
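The blocking relation between (a) and (b) amounts to a lookup-then-default procedure. A minimal sketch follows (the function name and the toy irregular list are ours for illustration): the stored full form is consulted first, and only if no such entry exists does the combinatorial X+s rule fire.

```python
# A minimal sketch of the Dual Mechanism blocking logic for 3sg present:
# a stored irregular full form pre-empts (blocks) the default X+s rule.
IRREGULAR_3SG = {"be": "is", "have": "has", "do": "does"}  # toy stored chunks

def third_singular(stem: str) -> str:
    # Route 1 (lexical/associative): retrieve a stored full form, if any.
    if stem in IRREGULAR_3SG:
        return IRREGULAR_3SG[stem]     # blocks the default rule: never *bes
    # Route 2 (computational/true rule): <[V, 3sg, pres, ind], X+s>
    return stem + "s"

print(third_singular("be"))      # 'is', not *'bes'
print(third_singular("stop"))    # 'stops' (default rule over variable X)
print(third_singular("wug"))     # 'wugs': novel stems get the rule too
```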
-
INFLection. Recent research conducted by Pinker (MIT)
and Clahsen et al. (Essex), among others, has shown that a dual
learning mechanism might be at work in the acquisition of a first
language. The research first focuses on terminology. It is said
that there are two kinds of rules for Inflection: an Inflection
based on lexical rules, and an Inflection based on combinatory
rules. In short, the types of rules are described as follows:
(i) Lexical Rules:
Lexical rules (or lexical redundancy rules) are embedded
in the lexical items themselves (‘Bottom-up’). Lexical
rules may be reduced to being simple sound rules somewhat
akin to statistical learning; for instance, associative
regularities are built up from out of the sequencing
of lexical items—e.g., the sing>sang>sung,
ring>rang>rung sequencing of an
infix (vowel-change) inflection (presented below)
(ii) True Rules:
Word inflection of the former type (i.e., lexical rules)
is cited as an inflection not based on rules, but rather
encoded in the very lexical item itself. True Rule (or
affixation), on the other hand, would be a combinatory
symbolic process based on variables—a creative endeavor
not bound by associative input (‘Top-down’). Whereas
lexical-based inflections are exclusively triggered
by frequency and associative learning methods—i.e.,
they are not prone to deliver the creative learning
of novel words with inflection—novel-word inflection
is generated (by default) once the true rule-based grammar
is in place. One simple example that Pinker and Clahsen
give in illustrating lexical/associative Inflection
is the irregular verb construction below:
-
Irregular Verb Constructions: The #ing>#ang>#ung paradigm
Table 1
a. sing > sang > sung
b. ring > rang > rung
c. *bring > *brang > *brung
The cause of this commonly made error in (c) is the fact
that the phonological patterning of the rhyme #ing>#ang>#ung—as
a quasi-past-tense infix (lexical-rule) form—is so strong that
it often over-rides and outstrips the default regular (true-rule)
form of V+{ed} inflection for past tense. (Spanish offers many
similar examples where the frequency of irregular verbs affects the
paradigm, such as the irregular (incorrect) *Roto (=Broke)
over-generalization generated from the regular inflection Romp-ido.)
(* marks ungrammatical structures.)
-
The erroneously over-generated patterns of *bring>brang>brung
(for English) and *Roto (for Spanish) are heavily based
on statistical frequency learning in the sense that the sound
sequences of other irregular patterns (e.g., ring>rang>rung)
contribute to the associative patterning. Recall that structured
lexical/associative learning merely generalizes, by analogy,
to those novel words that are similar to existing ones. Regular
grammatical rules (true rules), on the other hand, based on
affixation, may apply across the board to any given (variable)
syntactic category (such as Verb, Noun). In one sense, the ultimate
character of ‘true rules’ is that which breaks the iconic representation
of more primitive, associative-based processes, whether it be
a neuropsychological process or some other process.
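The associative route can be caricatured in the same fashion. In the minimal sketch below, the exemplar set, the crude three-letter rhyme check, and the function name are all invented for illustration: a stem that rhymes with the stored ablaut exemplars is inflected by analogy to the family, producing precisely the attested *brang-type error, while non-rhyming stems fall through to the V+{ed} default.

```python
# A caricature of rhyme-driven (associative) past-tense formation: stems
# sharing the '-ing' rhyme with stored irregular exemplars receive the
# ablaut pattern by analogy, overriding the V+{ed} default. The exemplar
# set and the crude rhyme check are invented for illustration.
ABLAUT_FAMILY = {"sing", "ring", "spring"}   # stored i>a>u exemplars

def past_by_analogy(stem: str) -> str:
    # Analogical route: rhyme match against a stored exemplar family.
    if any(stem[-3:] == exemplar[-3:] for exemplar in ABLAUT_FAMILY):
        return stem[:-3] + "ang"             # copy the family's i -> a change
    return stem + "ed"                       # default (true-rule) route

print(past_by_analogy("bring"))   # 'brang': the attested over-generalization
print(past_by_analogy("walk"))    # 'walked' (default rule)
```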
-
The point that the actual over-generalized strings (bring>brang>brung)
are not found in the input demonstrates that there is some
aspect of a rule evoked here—albeit, a rule based on rhyme association,
and thus not a ‘pure rule’ where true (non-associative) variables
would be at work. In other words, these lexical rules are to
be generalized as a form of associative pattern learning, and
not as a true rule, since they are associated with sound sequencing
only. One crucial implication of an Inflection generated by
a true-rule is that such inflection could be easily applied
to novel or unusual words: viz., words never before heard in
the input (contrary to the frequency learning of lexical rules discussed
above—cf. Brown (1957), Berko (1958)).
-
Expanding on previous studies which examined differences in priming
effects between Derivational and Inflectional morphology,
Clahsen concludes that difference in priming effects can only
be accounted for by a dual mechanism of learning—interpreting
the data to show that high priming effects were
connected with productive inflectional forms not listed in the
mental lexicon, whereas low priming effects were connected
to productive derivational forms associated with stem entries.
-
With regard to German forms of pluralization, Clahsen et al.
(p. 21) note that the same argument can be made for a dual mechanism
process—viz., the high priming of the regular (default) plural ‘-s’
(auto-s) contrasts with the low priming of the irregular
plural ‘-er’ (kind-er). The raw findings here suggest
that certain irregular inflections in German (e.g., participle
{-n}, plural {-er}) might be stored in the lexicon as undecomposed
form chunks and that these two processes of storage are activated
in very different places and manners in the brain—viz., the
findings that irregular inflections spawn reduced priming as
compared to regular inflection suggest that regular inflections
are built forms based on rules that contain variables which
make the basic unmarked stem/root available for priming.
It is clear from the table below that regular inflected word
forms such as {-t} participles and {-s} plurals produce full
priming and no word-form frequency effects. For irregular inflected
forms such as {-n} participles, {-er} plurals and (irregular)
{-n} plurals, the opposite pattern appears. The data suggest
that irregular forms are stored as undecomposed full forms—hence
the emergence of full-form frequency effects. Regular forms
are captured by the full rule process and are stored in a computational
manner that works off of variable+stem algorithms—hence, the
lack of full-form frequency effects. These differences in German
morphology seem to parallel what we find between English (i)
Inflectional morphology and (ii) Derivational morphology where
the former seeks out specific rule formulations—e.g., V + {ed}
= Past, or N + {s} = Plural, etc.—and where the latter seeks
out associative-style sound-to-meaning learning approaches (as
in irregular verbs/nouns, e.g., go>went, tooth>teeth, etc.).
Applying fMRI brain imaging techniques, a consensus has begun
to emerge suggesting that the lexical storing of derived stems
+ suffixes (e.g., teach+{er}) may actually be processed as one
single word chunk in the otherwise lexical (word/recognition)
temporal-lobe areas of the brain, and not, as intuition would
have us believe, as a dual segmented [stem + suffix] lexical
structure which has undergone a process much like a morpho-syntactic
string. This may be an apparent economical move, keeping in
line with the classic one-sound-one-meaning association. In
noting this, there seems to be a natural tendency in the diachronic
study of language to move from (i) rule-driven Inflectional
morphology—with more complex rule-driven infrastructures
[+Comp] (Comp=complex) to less complex [-Comp] structures—to
(ii) association-driven Derivational morphology. This
tendency can be easily captured by looking into the way words
have evolved over time—e.g., Break|fast /bre:kfaest/ has
evolved from a twin-morpheme structure [[Verb Break] + [Noun Fast]]
to Breakfast /bre:kfIst/ [Noun Breakfast], composed of a
single-morpheme chunk.
Table 2. Summary of experimental effects (taken from Clahsen et al. 2001: p. 26)

Representation | Full priming effect? | Full-form frequency effect? | Source
-t participles: ge[kauf]-t | yes | no | Sonnenstuhl et al. (1999), Clahsen et al. (1997)
-s plurals: [auto]-s | yes | no | Sonnenstuhl & Huth (2001), Clahsen et al. (1997)
-er plurals: [kinder] | no | yes | Sonnenstuhl & Huth (2001), Clahsen et al. (1997)
-n participles: [gelogen] | no | yes | Sonnenstuhl et al. (1999), Clahsen et al. (1997)
-n plurals I: [bauern] | no | yes | Sonnenstuhl & Huth (2001)
-ung nominalizations: [[stift]ung] | yes | yes | Clahsen et al. (2001)
diminutives: [[kind]chen] | yes | yes | Clahsen et al. (2001)
-n plurals II: [[tasche]n] | yes | yes | Sonnenstuhl & Huth (2001)
-
In sum, Pinker and Clahsen assume that the language faculty has
a dual architecture comprising (i) a combinatory rule-based
component (leading to the lack of full-form effects) and (ii)
a structured non-rule-based lexicon (leading to full-form effects).
Questions on specifics will surface in the following sections—namely:
How are these two methods represented in the brain?
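The frequency-effect diagnostic underlying Table 2 can also be put in sketch form (the latency formula and all frequency values below are invented for illustration and are not taken from Clahsen et al.): if a form is stored whole, recognition speed tracks its full-form frequency; if it is assembled by rule, access runs through the stem, and full-form frequency is inert.

```python
import math

# Sketch of the frequency-effect diagnostic behind Table 2: stored
# (undecomposed) forms are accessed by their own frequency; rule-built
# forms are accessed via their stem, so full-form frequency is inert.
# The latency formula and all frequencies are invented for illustration.
def latency_ms(form_freq: float, stem_freq: float, stored: bool) -> float:
    effective = form_freq if stored else stem_freq
    return 1000 - 100 * math.log(effective)   # higher frequency = faster

# Two rule-built plurals sharing a stem differ in full-form frequency
# yet show identical latencies (no full-form frequency effect)...
print(latency_ms(form_freq=5, stem_freq=50, stored=False))    # ~608.8
print(latency_ms(form_freq=500, stem_freq=50, stored=False))  # ~608.8
# ...while two stored irregulars do show the full-form effect:
print(latency_ms(form_freq=5, stem_freq=50, stored=True))     # ~839.1
print(latency_ms(form_freq=500, stem_freq=50, stored=True))   # ~378.5
```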
-
A Stage-1 Language Acquisition. There is a huge
and ever-growing body of data today being collected by developmental
linguists in the field which suggests that the brain of a child
matures in incremental ways which, among other things, reflects
the types of ‘staged’ language development produced by the child
for a given maturational stage. The collected data suggest that
children’s early multi-word speech demonstrates ‘Low-Scope’
lexical-specific knowledge, and not abstract true-rule
formulations attributed to grammar. This is somewhat akin to Piagetian
notions of language development (see general nativism below),
one difference being that it need not be tied here, exclusively,
to a cognitive apparatus. This staged, maturational theory of
language development accounts for the lack of specific linguistic
properties by suggesting that the brain is not yet ready to
conceptualize higher and more abstract (High-Scope) forms of
linguistic conceptualization.
-
The idea behind ‘What gets missed out where’ in child
speech production has given those linguists interested in morphology
and syntax a particularly good peek at how the inside of a child’s
brain might go about processing linguistic information—and other
information for that matter. As stated above, research initially
carried out by Brown and his team (1973), working under a Chomskyan
paradigm of linguistic theory, and consequent work by others
(cf. Radford) suggests that there is a stage-1 in language acquisition
that tightly constrains the child’s speech to simple one-to-two
word utterances with no productive forms of verb or noun inflection.
One child that appears in the early studies, Allison, provides
transcripts between 16-19 months showing no signs of the onset
of formal inflectional grammar—only later on, close to two years
of age (22-24 months), does inflectional grammar/syntax emerge,
and then only in what could be called a sporadic, optional
manner.
-
This stage-1 is considered to be a grammatical stage
with an MLUw (Mean Length of Utterance in words) of 2 words or less.
More specifically, in the sense of the apparent lack of formal
grammar, this shouldn’t be confused with the idea of an earlier
a-grammatical stage well before the onset of multi-word speech.
(Surely, there can be no grammar or syntax of which to speak
if there aren’t multi-word constructions). This grammatical
stage-1 therefore differs from the notion of a one-word stage
(MLU=1) where supposedly no grammar/syntax at all is at
work. The grammatical stage-1 is said to begin roughly with
the onset of multi-words at about the age of 18 months (+/-20%).
It is reasonable to suppose that such a stage would have target
semantic meaning—even though, say, the arbitrary ‘one-to-one
sound-to-meaning’ relationship is not of the target type (e.g.,
onomatopoeic forms /wuwu/=dog, /miau-miau/=cat, etc.).
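Since MLUw figures carry much of the staging argument, it may help to spell the measure out. A minimal sketch (the sample utterances are invented): MLUw is the mean number of words per utterance across a transcript sample.

```python
# Minimal MLUw (Mean Length of Utterance in words) computation over a
# transcript sample; the utterances are invented for illustration.
def mluw(utterances: list[str]) -> float:
    return sum(len(u.split()) for u in utterances) / len(utterances)

sample = ["want cookie", "daddy go", "no", "mommy sock"]
print(mluw(sample))   # 1.75: below 2, i.e., within Brown's stage-1 range
```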
-
The above notions raise the question: At what point do we have
evidence of grammatical categorization? For example, the traditional
distributional criterion that defines the Noun class as that
category which may follow Determiners (a/the/many/my/one) may
not be available to us if, say, Determiners have yet to emerge.
Hence, distributional evidence may be lacking in such cases.
One way around the dilemma has been to suggest that early stage-1
grammar is categorical in nature simply owing to a default assumption
that categorization is part of the innate ability to acquire
language (in Chomskyan terms, part of the richly endowed LAD
or Language Faculty) and that words are both inherently categorical
and semantic in nature. Pinker (1984) claims that the categorization
of early stage-1 words should be roughly pegged to their inferred
semantic properties. Radford (1990), in a slightly different
approach, prefers to consider such early multi-words at stage-1
as lexical in the sense that (i) they have built-in default
lexical categorization abilities (forming classes of Nouns,
Verbs, Adjectives, Adverb, and Prepositions), but, at the same
time, (ii) rely heavily on their semantic-thematic properties.
In any event, either description starkly contrasts with a connectionist
view—which claims that, e.g., the class ‘subject’ emerges through
rote-learning of particular framed constructions. Subject-hood
is learned as a category via rote associative learning of thematic
relations. Now, it remains unclear to me precisely how close
such thematic links to category-hood get to Radford’s 1990 interpretation.
I would only venture to say that both views share the belief
that semantics hold the central cognitive underpinnings upon
which syntax can later be built.
-
This account of stage-1 has been labeled the Lexical-Thematic
stage-1 in language acquisition (Radford 1990). It is unclear
how far Radford would like to go in accepting his stage-1 as
cognitively based: the labeling here of lexico-thematic
(the term thematic referring to argument structures pegged to
semantics) certainly permits some amount of semantics to leak
into the discussion. Nevertheless, Radford emphatically rejects
the notion that a stage-1 syntax could be exclusively based
on semantics. It is here that Radford gets full mileage out
of his two-pronged converging Lexical-Thematic stage-1 grammar:
a stage-1 that is both—
(i) ‘thematic’ in the sense that it
leans towards general nativism since simple utterance
types at the earliest MLU get directly mapped onto their
thematic argument structure; while,
(ii) ‘lexical’ in the sense that children seem to
be fully aware that they are dealing with words based
on lexical grammatical categories, and not semantics.
This is made apparent by how children know the morphological
range of category (e.g., Noun, Verb) selectiveness along
with inflection distribution.