Towards a 'Converging Theories' Model of Language Acquisition:

 

Continuing Discontinuity

 

Joseph Galasso

 

California State University, Northridge

joseph.galasso@csun.edu

 

(2003b)

            Introduction

            There was a time when the classical split between behaviorism and nativism was easily identifiable, each rationale breaking down along its traditional fault lines. On one hand, you had the 'behaviorists-folk' who believed, more or less, that all forms of learning, language included, could somehow be reduced to and extracted from ambient input found in the environment. If there were to be any talk of innate structure leading to such learning, it would be relegated to innate structure comprising more general cognitive mechanisms which underpinned associative-style learning--perhaps something along the lines of an innate memory capacity or an associative linking component of the brain/mind that allowed semantics to link to syntax, thus solving any linking problem (cf. Pinker), or perhaps something along the lines of an innate architectural structure that paved the way for frequency learning (cf. Elman). On the other hand, while 'nativists-folk' agreed that there was something of interest to be said about such accounts of learning (i.e., artificial intelligence and connectionist strands of Computational Theories of the Mind (CTM)), the strong nativists among them saw through the clever guise of CTM and never let themselves be taken in by what appeared to be simply another attempt to reduce true language (a syntactic structure) to being a simple by-product of mere computation (cf. Fodor).

 

                        This working paper, the broad second segment of 'Twin Working Papers',[1] attempts to review the literature surrounding the two sides and to bring to light reasons why I believe we have made very little progress in understanding/explaining how a 'rule-based' equation of language actually arises as a computation in the brain. (The problem that belies 'explanation' is well compounded: Darwin's theory of evolution even fails on this test. So, I suppose we are in good company). Having said this, there is good reason nevertheless to promote the Dual Mechanism Model (DMM) as the best possible candidate to eventually bridge the gap between the two sides of the traditional divide. A caveat here follows: As I hope to show, while the DMM may do well in accounting for a number of phenomena, as it is presently understood, it ultimately fails to provide us with any new, comprehensive model towards an explanation of true language. On one side of the argument, the DMM at best simply refashions the same problems the behaviorists were plagued with some half a century ago--namely, the overwhelming 'mystery' of how the brain/mind creates rule-driven syntax (top-down) from mere cognitive capacity (bottom-up) (the 'bootstrapping' dilemma). To my mind, while the DMM succeeds in descriptively carving out the data roughly into these two distinctive processes (root-based vs. affix-based) (or frequency vs. rule driven), it does little to explain the distinctions outright or to make any sense of how/why the two processes converge (when they do converge) and/or why they don't (when they don't).
(Examples of such convergence have recently been reported by Clahsen (2001) who suggests that not only does derivational morphology, indeed a morphological process, actually show processing similarities akin to lexical retrieval tasks, but so too does high frequency regular rule-based inflectional morphology show similarities akin to lexical retrieval tasks--the two processes may actually converge in becoming rote-learned incorporations of otherwise decomposed morpho-phonetic structures).[2] In the ensuing pages, we examine the role the Dual Mechanism Model has in language acquisition while keeping an eye on how it will ultimately fail in offering any viable complete picture of linguistic knowledge. However, having started on this rather pessimistic note, I proceed in good faith to make clear that the DMM is at the moment our best and most promising tool in sorting through the many complexities language has to offer.

 

                        The Dual Mechanism Model credits the Brain/Mind with having two fundamentally different cognitive modes of language processing--this dual mechanism has recently been reported as reflecting inherent qualitative distinctions found between (i) regular verb inflectional morphology (where rule-based stem+affix forms make up a large contingent), and (ii) irregular verb construction (where full lexical forms seem to be stored as associative chunks). In this paper, we examine the DMM and broaden its scope as a means of covering the overall grammatical development of Child First Language Acquisition.
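
                        As a purely illustrative aside, the two modes just described can be caricatured in a few lines of Python. This is a minimal sketch under stated assumptions, not an implementation of any published model: the tiny lexicon, the function names (`regular_past`, `past_tense`) and the spelling-based '-ed' rule are all hypothetical stand-ins. Route (ii) is a stored-form lookup; route (i) is a default stem+affix rule that applies whenever lookup fails.

```python
# Toy dual-route past-tense generator (illustrative assumptions throughout).

# Route (ii): irregular forms stored whole, as associative chunks.
# This three-word lexicon is a hypothetical stand-in for lexical memory.
IRREGULAR_PAST = {
    "go": "went",
    "sing": "sang",
    "bring": "brought",
}


def regular_past(stem: str) -> str:
    """Route (i): rule-based stem + affix (a crude spelling approximation)."""
    if stem.endswith("e"):
        return stem + "d"
    return stem + "ed"


def past_tense(stem: str, lexicon: dict = IRREGULAR_PAST) -> str:
    """Try retrieval of a stored form first; otherwise apply the default rule."""
    stored = lexicon.get(stem)
    if stored is not None:
        return stored
    return regular_past(stem)


if __name__ == "__main__":
    for verb in ["walk", "go", "bake", "blick"]:
        print(verb, "->", past_tense(verb))
    # With an empty lexicon, retrieval fails and the rule over-applies.
    print("go ->", past_tense("go", lexicon={}))
```

                        In this sketch a novel stem such as 'blick' receives the default affix because nothing is stored for it, while an irregular whose stored form is unavailable (the empty lexicon in the last call) is over-regularized. The sketch only carves the data into two routes; it says nothing about how or why the routes arise, which is precisely the limitation of the DMM noted above.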

 

            Converging Theories and the Brain as Self-Referent

            The one major theme behind much of what is expressed within the notes comes to be centered on a driving notion called 'Converging Theories'. The term 'converging', though used more-or-less as a device to merge the two major theories in the field of language acquisition, equally serves a second purpose having to do with a converging of brain processing. Perhaps the leading motivation behind my compiling the notes for the 'Twin Working Papers' sits with trying to understand the brain, its modular aspects, and how the brain comes to bootstrap itself and becomes a mind worthy of producing language.

 

                        Let's start by saying that the brain is self-referent, meaning it takes in only that input (external to itself) which has already been generated in the brain in the first place (internal to itself). Contrary to this, one is often tempted into thinking that the brain processes such information as if the input were truly novel to the brain in some way or other, as if the input were truly objective, and that the brain then takes this novel input and makes sense out of it (viz., to the extent that there exists an anthropic principle behind man's capacity to reason). This doesn't seem to be the case at all. The brain rather first creates, churns out, takes back in, reexamines, and creates anew again and again. That which we are inclined to perceive and thus understand in our environments is exactly that and only that which has already been born to the brain. The brain is not only self-referent in its processing of knowledge, but modular in its allocation of the processing. The modular aspect of the brain, simply put, could best be summed up by cutting the brain into two halves (frontal vs. temporal): (i) the temporal sensori-motor brain (the 'animal brain'), and (ii) the frontal abstract-brain (the 'human brain'). Each half has its own processing tasks. Each half can only understand/process that form of knowledge (externalized to the outside) which it originally conceived (internalized from the inside). The sensori-motor brain is instinctively 'knee-jerk-like' in nature in that it solely responds to a kind of self-preserving behavior. The outward manifestation of this behavior is first generated from the animal brain itself. The sensori-brain works in a 'bottom-up' cognitive manner; it easily runs with a neo-Darwinian story of evolutionary adaptation and accounts for much of what we know resides behind more concrete processing: namely, the inputs-outputs of man's sensory world (visual/auditory, etc.).
The abstract-brain is a curiosity of sorts; it is rather non-self-preserving in nature and works in a 'top-down' manner of exaptation in the sense that it caters to no known Darwinian adaptive reasoning. The converging of these two modular aspects of the brain allows for the allocation of specific types of knowledge to enter into specific domains. The dual modes gather and identify only those select forms of the input which they first produced--hence, therein lies a kind of circular loop between (i) the subjective preconceived internalization of behavior/mental processing, (ii) the objective release of the behavior/mental processing in the form of output, and (iii) the return to an internalization of the output.

 

                        There exists a long linguistic tradition concerning such lines of reasoning. For instance, the inquiry into how children might eventually 'notice' similarities in the form of frequency-driven input (bottom-up) in both represented utterances and encoded events could be reinterpreted into questioning how the very young child is able to 'notice' such input in the first place. The 'noticing problem' has likewise spun off into other areas of linguistics having to do with word learning and taxonomy, semantic bootstrapping analogies and innate assumptions leading to morphology and syntax. Unfortunately, the noticing problem often suffers either from circularity in one respect or paradox in another: viz., if one means to say children notice in adult-like terms from the outset of their speech, then surely one must advocate an (adult-like) innate mechanism for such noticing in the first place (citing Plato's problem in general along with the specific linguistic problem of poverty of stimulus). However, contrary to the above citation, noticing hypotheses tend to rely on bottom-up sensori-brain methods for dealing with such learning, not nativist top-down assertions of abstraction. For example, stage-1 language development tends to be described as utterance-event pairings iconic in representation, a Stimulus & Response one-to-one association, as opposed to a later-developed stage-2 which tends to be described by saying that the child notices non-iconic abstract representations and similarities having to do with imperfections of rule-based paradigms. Clearly, if the first stage of noticing is correct, and, to a degree we believe it is, then surely one must obtain some means of getting a hold on the knowledge (if not via a priori epistemology, then perhaps at least via some biological module of brain processing).


            Proposal

          This paper proposes new accounts of old issues surrounding child first language acquisition. The general framework of our proposal is based upon hybrid theories--proposals stemming from recent investigations in the areas of PDP-style connectionism, as well as from more naturalistic studies, and sample-based corpora of Child Language Acquisition. Much of what is sketched out here attempts to converge the leading tenets of two major schools-of-thought--namely, Associative Frequency learning and/vs. Symbolic Rule learning. Cast from this new tenor, proponents calling for a Dual Mechanism Account have emerged advocating a dual cognitive mechanism in dealing with processing differences found amongst regular and irregular verb inflectional morphology (inter alia). The main task of this paper is (i) to broaden and extend the dual mechanism account--taking it from the current slate of morphology to the larger syntactic level, and (ii) to spawn some theoretical discussion of how such a dual treatment might have further reaching implications behind more general developmental aspects of language acquisition (as a whole), namely (though not exclusively), the twin benchmarks of syntactic development regarding Lexical vs. Functional grammar. Our central claim will be that whatever factors lead to a deficient morpho-phonology, say, at a given stage-1 of development--factors that may potentially lead to the postulation of a non-rule based account--these same factors are likely to be carried over, becoming a factor of deficiency in the overarching syntax. Thus, the tone of the discussion is dualistic throughout.
Our main goal is two-pronged: first, to assert as the null hypothesis that language acquisition is Discontinuous in nature from that of the adult target grammar, and that this discontinuity is tethered to maturational factors which lie deep-seated in the brain--factors which yield fundamental differences in the actual processing of linguistic material (a so-called 'Fundamental Difference Hypothesis'), and second, to show that this early multi-word non-target stage can be attributed to the first leg of this dual-mechanism--i.e., that leg of cognitive/language processing that governs (i) (quasi-) formulaic structures along with (ii) non-parameterizations. We attribute the generation of this two-stage development to maturational scheduling--viz., a Non-Inflectional stage-1 and/vs. an Optional Inflectional stage-2 (where formal grammatical relations are first learned in a lexical bottom-up fashion and then later regroup to generalize across the board in a word class top-down fashion). It is our understanding that the two-staged development involves and shares both a relevant associative style theory of learning (Associative-style Constructive Learning for our former stage-1), while preserving the best of what syntactic rule-driven theories have to offer (Rule-based Generative Acquisition for our latter stage-2)--hence, the entitled term Converging. By analyzing much of what is in the literature today regarding child language acquisition, as well as drawing from the rich body of work presently being undertaken in connectionism, it is our hope that a new hybrid converging theory of language acquisition can be presented in a way that captures what is inherently good from both schools--an alternative theory that bears more flavor of truth than camp rhetoric.

                                                <>

Why--I don't need any 'rule' to see this tree here in front of me. My eyes work just fine. That is, insofar as there exists a single tree. But, how is it that my 'tree' gets destroyed once I move my head ever so slightly to the east and fall into view of a second tree? The mystery of it all lies somewhere in the dismantling, between a single torn branch of lifted foliage, that forces the rule--for how was I ever to know that this second tree was indeed a tree after all? (JG).

                                                                  <>

"Humans use stories that they tell themselves in order to get themselves to work on this or that. These stories often deal with confrontation between areas and ideas. From some point of view, it is almost always the case that these high-level stories are relevant only as motivation and not really relevant to what eventually happens in terms of technical understanding". (Allen Newell)         

                                                <>                  

Sometimes, stories within a certain school split--e.g., formalist debates on the amount of functionalism Chomsky can and should afford to surrender (cf. Pinker & Bloom). Sometimes differing stories converge--Neo-Behaviorists seeking out an innately based architecture (Jeff Elman).

 

0. Overview

Periodically, say every two or three generations, our vows on science are renewed by a sweeping change of reasoning--cerebral airs that deliver their own inextricable kind of 'off-the-beaten-path' hedonism. These solemn changes are few and far between and constitute what the philosopher of science Thomas Kuhn called 'Paradigm Shifts' (a new way of thinking about an old-something). Unfortunately, these generational spurts often provide very little in the way of true original thinking, and much of what is behind the fanfare quickly reduces to little more than the recasting of old 'brews' into new 'spells'. Perhaps a glimmer of true original thought (a 'new-something') comes our way every two hundred years or so. We are in luck! One of the greatest breakthroughs in science has been born in the latter half of the last century and has made its way onto the scene shrouded by questions surrounding how one should go about rethinking the Human Brain/Mind--questions that have led to eventualities in Computer Programming, Artificial Intelligence (AI), Language/Grammar, Symbolic-Rule Programs and Connectionism.

      Much of what sits here in front of me, at my desk, can be attributed in one way or another to this 'new-something', and whenever there is a new-something, whether it be steam-locomotives to transistors to tampering with DNA, there's bound to be an earful of debate and controversy. And so remnants of this debate have edged their way ever so slowly onto the platform--from the likes of the psychiatrist Warren McCulloch and mathematician Walter Pitts and their pioneering work on early 'neuron-like' networks (leading to connectionism), to the psychologist Donald Hebb (1940s-50s) (and his revolutionary notion of 'nerve learning' based on oscillatory frequency), to the seminal debates between two great personalities in the AI field, Marvin Minsky and Frank Rosenblatt (1950s-60s), to those in the realm of language, Noam Chomsky (1960s-80s). More recently, the debates have taken on a vibrant life of their own by the advances in computer technology. The most clearly articulated of these recent debates has come to us by two leading figures in the research group called Parallel Distributed Processing (PDP)--namely, Jay McClelland and Dave Rumelhart (1980s).

      Most recently, the debates have come to carry a portmanteau of claims--chief among them is the claim that human brain function, and thus human computation, is not analogous to (top-down) symbolic-based computers (from Chomsky 1980), but rather, the brain and its functional computations should be considered on a par with what we now know about (bottom-up) nerve functions and brain cell activations (to Hebb 1940)--as you see, our time-table has been inverted. In other words, the paradigm shift here occurs the moment one rejects the computer as an antiquated model of the brain (and language), and instead, props up a newer model of language and thinking based on older models of connections and connectionism (as presently understood in neurological studies). In this vein, it is fair to say that we should no longer view language as a mere gathering and shaping of atomic particles or logical symbols--much like how one might view the atomic nature of computer language as it is composed of a serial string of 0's and 1's--rationing out sub-parts of the structure in more-or-less equal portions in hope of arriving at a larger and more cohesive general frame of language. It could be argued by connectionists that language is not only much more fluid than what any strict rule-driven/symbolic function could provide, but also that language requires a greater measure of freedom and flexibility at the bottom end. Whereas rules originate top-down, it may likely turn out that bottom-up processes better reflect what is actually going on, at least in the initial learning processes of language. (One nontrivial note here to remember is that there is a fundamental and crucial difference between (AI) artificial computer (chips) and living brain cell (neurons): the latter must secure survival. There is no sense in the notion that silicon chips need to secure survival, since there is no death of a chip.
Cells are living organisms that must somehow ensure their survival, and this survival apparatus, certainly for the individual cell, must be organized in a bottom-up fashion). Along these lines, much of what is coming out of West Coast schools-of-thought (connectionism) affords the old school of Gestalt psychology a new lease on life. Some connectionists find themselves talking up the fact that language can't simply be a cohesion of atoms put together in very elegant ways, but that some 'higher-order' of fluidness must exist. Human cognition is more fluid, more context driven. In a token manner of speaking, Köhler might carry on here about mysterious magnetic fields which suddenly arise in the brain which pull sub-particle visual stimuli together--any notion of a gestalt brain, of course, has long been disputed (I think, and notwithstanding notions of a 'quantum gravity brain' as advocated by the great mathematician Roger Penrose). However, it should be noted that Gestalt psychology continues to pave a way for a serious return in the contexts of connectionism. (In addition, as a historical footnote, let's not forget that while Rosenblatt's work originated with visual perception, it is now viewed that his work, if carried out in today's climate of connectionism, would have had potentially serious linguistic implications.)

      And so let us turn to language. With specific regards to grammar, the Word-Perception Model of Rumelhart and McClelland (1981, 1986) has made a dramatic impact in the field. Not only has it provided us with a new way of looking at potential brain processing (a quantitative way of looking with regards to weights of connections, thresholds, memory storage, etc.), it also has made rather precise claims about what kinds of material (qualitative) would be difficult to process in such a model (the need for hidden units regarding 2-degree complex structures and paradigms, recursive complexity and back-propagation, etc.). Clearly, when one can predict with a fair amount of certainty where problems will be had, and then attempt to account for the nature of the problem in terms of the model, then surely the criterion of explanatory value is close to being met. For example, the now conceded fact that 'hidden units' must be pre-installed (p.c. Jeff Elman, as part of the innate apparatus) in order for the full complexity of language to be processed via any PDP, I believe, speaks volumes to where we stand today in explanatory value--in fact, hidden units have now become the main rallying cry for those who argue for rule-based accounts of language (not to mention the nativists among us. See the contentious debates between Marcus vs. Elman on this matter).

      Finally, the typical intransigence that often shapes and defines opposing views has given way to a certain amount of movement leading to a partial compromise between the two leading schools of thought--as called upon by Steven Pinker and Alan Prince. Specifically speaking, Pinker & Prince's somewhat tentative and partial acceptance of a connectionist model regarding only certain types of lexical processes, if nothing else, has in turn buttressed their own allegiances in the pursuit of upholding counter-claims against proponents for a pure 'Single Mechanism Model' (strictly based on associative learning). And so out of this twist of fates, a renewed and rejuvenated interest in rule-driven processes has been gathering momentum in attempting to seek more narrowly confined rule-based analogies for dealing with specific aspects of language/grammar as a whole.

      As suggested by Newell in the quote above, long-standing dichotomies often provide a variety of clever means to think about a wide range of topics. It goes without saying that as a pedagogical device at least, students not only crave a good debate, but more importantly, they often report that new material introduced in the form of a debate procures a much higher level of understanding. Well, this singular debate has been ongoing for centuries, masked under several different labels: nature vs. nurture, innate vs. learned, hard-wire vs. soft-wire abilities, instinct vs. learning, genetic vs. environment, top-down vs. bottom-up strategies, and as presented herein, the Single vs. Dual Mechanism Model.

 

 

[1].      It is a fact that children do not produce 'adult-like' utterances from the very beginning of their multi-word speech. And so much of the debate ongoing in child first language acquisition has been devoted to the nature and extent of 'What gets missed out where'. Theory internal measures have been spawned every which way in an effort to account for the lack of apparent adult-like language in young children--theories abound. Despite some evidence that would seem to point to the contrary, more robust syntactic theories from the outset continue to view the very young child as maintaining an operative level of language closely bound to abstract knowledge of grammatical categories (Pinker 1984, Hyams 1986, Radford 1990, Wexler 1996). For instance, Pinker (1996) has described early language production in terms of a first order (general nativist) cognitive account, suggesting a processing 'bottleneck' effect which is attributed to limited high-scope memory to account for the child's truncated syntax of Tense/Agr/Transitive errors (e.g., Her want), and over-application Tense errors (e.g., Does it rolls?). Radford (1990) on the other hand, has maintained a second order (special nativist) maturational account affecting syntactic complexity in order to explain the same lack of adult-like speech. It should be noted that these two nativist positions share a common bond in that they are reactions to much of what was bad coming on the heels of work done in the 1970s--theories which sought to account for such errors on a purely semantic level, e.g., Bloom (1975), Braine (1976) and to some extent Bowerman (1973). Steering away from potentially non-nativist associative/semantic-based accounts to proper syntactic-based accounts was viewed by most to be a timely paradigm shift--acting as a safeguard against what might be construed as bad-science Behaviorism (of the purely semantic kind). This shift brought us toward a more accurate 'Nativist' stance swinging the Plato vs.
Aristotle debate back to Plato's side, at least for the time being (as witnessed in the title of Chomsky's book 'Cartesian Linguistics')--a move keeping in line with what was then coming down the pike in Chomskyan linguistics. One thing that seems to have caught the imagination of developmental linguists in recent years has been to question again the actual infrastructure of the child-brain that produces this sort of immature grammar--namely, a rejuvenated devotion has reappeared in the literature circumscribing new understandings of age-old questionings dealing with Theory of the Brain/Mind.

[2].      For instance, proponents of Behavioral/Associationist Connectionism today (cf. Jeff Elman, Kim Plunkett, Elizabeth Bates, among others) are more than ready to relinquish the old Chomskyan perspective over special nativism ('special' in that language is viewed as coming from an autonomous region in the brain, unconnected to general cognition or other motor skill development, pace Piaget and vs. general nativism), and have rather shifted their focus to an innateness hypothesis based not on natural language (per se) but rather on a type of innateness based on the actual architecture itself that generates language (architecture meaning brain/mind: viz., an innate Architecture, and not an innate Universal Grammar).

 

[3].      For Chomsky, it was this autonomous Language Faculty (that he refers to as a language organ) that allowed this innate language knowledge to thrive and generate grammar. For the connectionist movement, it is the very architecture itself that is of interest--the input/output language result being a mere product of this perfected apparatus. So in brief, the debate over innateness has taken on a whole new meaning--today, perhaps best illustrated by this more narrow debate over General vs. Special Nativism. We shall forgo the meticulous details of specific theories at hand and restrict ourselves to the rather prosaic observation that the child's first (G)rammar (G1) is not at all contemporary with the adult (T)arget grammar (Gt). Notwithstanding myriad accounts and explanations for this, for the main of this paper, let it suffice to simply examine the idea that the two grammars (child and adult)--and we do consider them as two autonomous and separate grammars--must partake in some amount of Discontinuity: (Gt is less than equal to G1, or Gt<G1) and that such a discontinuity must be stated as the null hypothesis tethered to maturational/biological differences in the brain. Hence, G1 represents the (B)rain at B1 ... (B2 ... B3 ... Bt), while Gt represents the brain at Bt.

           

[4].      Discontinuity theories have at their disposal a very powerful weapon in fighting off Continuity theories--whether it be language based, or biological based (noting that for Chomsky, the study of Language, for all intents and purposes, reduces to the study of biology). This great weapon is the natural occurrence of maturational factors in learning. In fact, on a biological level, maturation is taken to be the null hypothesis--whether it be e.g., the emergence and consequent loss of baby teeth, to learning how to walk-talk, to the onset of puberty. In much the way the adult achieves, the achievement can be attributed to the onset of some kind of scheduled-learning timetable--for language, it's an achievement mirroring a process in which the nature and level of syntactic sophistication and its allocation is governed in accordance to how the brain, at the given stage, is able to handle the input.

 

[5].      It is common knowledge that (abstract) grammatical relations are frequently a problem for language acquisition systems. Early reflection on this was made by Brown when he discovered that one could not explain why some grammatical morphemes were acquired later than others simply in terms of input. The question was posed as follows: If all morphemes are equally presented in the ambient input at roughly the same time--contrary to what might be believed, parents' speech toward their children is seldom censored so as to bring about a reduced mode of grammatical communication/comprehension--then, what might account for the observed asymmetrical learning? Similarly, Pienemann (1985, 1988, 1989) has made claims for a grammatical sequencing of second language learning based on complexity of morphology. This question led to early notions of a linguistic maturational timetable, much like what Piaget would have talked about regarding the child's staged-cognitive development--maturation being the only way to address such a staged development. Likewise, a Chomskyan position would have it that there must be something intervening in the child's (inner) brain/mind (albeit not tied to cognition) that brings about the asymmetrical learning since there's no change in the (outer) input. Well, one of the first observations uncovered by Brown was that a child's linguistic stage-1 (with a mean length of utterance (MLU) lower than 2) went without formal functional grammar. Brown noted that an initial telegraphic stage of learning ensued absent of abstract grammatical markers such as Inflection, Case and/or Agreement.


[6].      Constructivism vs. Generativism: A Brief Summary

            Constructivists' accounts assume that children's grammatical knowledge initially consists of constructions based on high frequency forms in the input. Their models assume polysemy in representation since lexemes are viewed as being stored in a distributional network in order to encode different meanings: sound-to-meaning links are therefore made based on similar phonological to semantic distributions. Furthermore, it is their general claim that such a correlation is strictly associative, and that it holds between the quantity and quality of the exemplars obtained of particular constructions with the constructions of more general schemes that underlie language use. The constructivist model assumes a 'bottom-up' cognitive scaffolding of language learning (somewhat akin to what Piaget had earlier claimed regarding a cognitive underpinning to language development).

           

                        Generativists' accounts, on the other hand, differ from constructivist models in one very simple respect--their models credit children (very early on in their speech development) with tacit syntactic knowledge, unrelated in any way to the frequency-based, data-driven constructivist claims which define language as being tethered in some way to cognition. Generativists in this sense draw on parameter-setting mechanisms (as opposed to data-driven mechanisms) to account for language growth. Generativists maintain two versions of a general language development model; both versions speak to a more innatist (top-down) account of language acquisition. The first version is represented herein as Wexler's O(ptional) I(nfinitive) model (ibid). The OI model credits children from the very earliest stages of development with the abstract knowledge of morphological inflection. According to OI accounts, children have access to inflection. The fact that inflections may optionally project (at stage-1) speaks to matters of specific feature spell-outs of the phrasal projections (i.e., all inflectional phrases project, it is rather the features pertaining to the phrases that may go un(der)specified and thus not project). The second model associated with Radford (Radford & Galasso ibid.) claims that children may initially produce some early inflection, but that there is evidence that the child may not be processing such attested inflection in a true syntactic way (children at this early stage may in fact be treating inflections in a non-syntactic/derivational manner). In addition to this claim, the general idea here is that a very early grammatical stage indeed exists where one finds no true syntactic processing in the child's speech (i.e., there is a 'No-Inflectional' stage-1). What is of interest to us here regarding Radford's 'No Functional stage' model (Radford 1990) is that it readily overlaps with constructivists' claims for their stage-one as well.
Specifically speaking, it has become customary for constructivists to say that although they believe there is no syntax at their early stage-1, the development of children's grammar is indeed protracted and those 'abstract rules' which underwrite syntax proper eventually do emerge at a later stage in the course of the child's language development. Hence, it would seem that Radford's version and the constructivists' version might converge and agree regarding the earliest stage of development. Both models predict similar stages of development (viz., a stage-1 void of any inflection). Though this concord of predictions appears to be true empirically, theoretical concerns are real and would continue to weigh heavily on the minds of linguists, thus undercutting any feeble attempt to reconcile the two positions.

            Constructivism, and beyond.              One consequence of this style of learning was that children were considered to learn by rote-methods, associative means similar to what Skinner had earlier advocated in Behaviorism. It was somewhat tentatively implied here regarding a very early stage-1 that children didn't start learning language as a set of abstract rules of logic (as Chomsky would have us believe in his notion of generative grammar), but that children would first grapple with the linguistic input by gathering data-driven patterns and constructing broad-range syntactic templates based on such distributional analyses of the patterns (a kind of first order frequency learning). Children would only later on, say at a stage-2 of language acquisition, start to employ Chomskyan style rules to generate a target grammar (as a consequence, see 'U-shape learning' discussed in §60). Benchmarks of development thus followed: (i) Recognition of patterns comes first (no attested phonological/morpho-syntactic over-regularizations) (ii) Abstractions of the patterns come after (attested phonological/morpho-syntactic over-regularizations). Data-driven analogies fit well with recently proposed computational models of syntactic acquisition, a model in which children initially form syntactic templates on the basis of distribution analyses of linguistic input (Cartwright & Brent: 1997). Data-driven models trace their antecedents back to the 1960s. For example, Bellugi (1967), Klima and Bellugi (1966), Braine (1963), initially allowed for a certain amount of formulaic misanalysis to enter into the accounting of non-adult-like stage-1 structures. In a contemporary about-face from much of what had been advocated in the Parameter-theory of the 1980s, Rowland and Pine (2000), among others, have returned to the aforementioned 1960s by similarly calling on first bottom-up, data-driven procedures in securing potential syntactic paradigms. 
According to such constructivist terms, children do not have any general (rule-driven) knowledge of syntactic categories, at least not until they have acquired enough similar templates from which they can abstract a general pattern. This model would readily explain why over-regularizations tend not to occur very early on in children's speech: if the stage in question employs no rules, then, by definition, no over-regularizations of rules can occur. (It is suggested in this context that the onset of over-regularization as attested in the data indicates the later rule-based stage-2 of development). It has been suggested that what one means by 'until they have acquired enough similar templates' is that there may be a frequency-based storage threshold at work that converts an overburdened data-driven analysis into rule-based abstraction: i.e., a kind of Critical Mass Hypothesis which speaks to the notion that an eventual rule-driven grammar requires a certain quantitative 'tipping point' to be reached between (i) the precise number of patterns and (ii) the general abstraction of patterns. Without a compilation of data, no abstraction can be achieved: children must acquire a sufficient number of exemplars before abstracting general patterns from them can be productive. (See §§26, 27 'Less is More hypothesis').

 

[7].      For instance, Rowland & Pine (op. cit) suggest that e.g., early Subject-Auxiliary inversion errors such as *What he can ride in? (along with the optional target structures showing correct inversion What can he ride in?) cannot be accounted for by a rule-driven theory--viz., if the child has access to the rule, the theory would then have to explain why the child sometimes applies the rule, and sometimes fails to apply it. Rowland & Pine rather suggest an alternative account by saying that as a very early strategy for dealing with complex grammar (e.g., Aux. Inversion, Wh-fronting) children learn these semi-grammatical slots as lexical chunks--a sort of lexicalized grammar--whereby they establish formulaic word combinations: e.g., Wh-word + Auxiliary as opposed to Auxiliary + Wh-word combinations. It was shown that aspects of error rate and optionality (versus rule-driven mechanisms) highly correlated to high vs. low frequency rates of certain combinations in the child's input. This early non-rule-based strategy was then able to account for the vast array of the child data--viz., where the number of non-inverted Auxiliaries vs. inverted Auxiliaries was at a significantly higher rate at the initial stage-1 of development. As an example of a non-rule-based account here, they show that when inversions did occur, they typically involved only a certain select few Wh-words, and not the entire class. Hyams (1986, p.85) somewhat agrees with such a reduced structure when she asserts the following: By hypothesis, the modals (or Aux. Verbs) are unanalyzable during this period.


[8].      Moreover, such claims strongly support Stromswold's (1990) statistical data analyses which clearly demonstrate that children at a very early stage-1 might not productively realize an utterance string containing [don't, can't] in e.g., I/me [don't] want, You [can't] play as the syntactic elements [{Aux} + clitic{n't}], but that such strings were more limitedly realized as quasi-formulaic representations of a negative element. In other words, the claim could be extended to mean that for the child at this stage-1, the lexical item don't/can't reduced to the one-to-one sound-meaning of not: e.g., Robin [don't] [=no(t)] play with pens (Adam28) where the verbal inflection {-s} goes missing since it isn't analyzed as an Aux Verb. (Though see Schütze (2001) for some arguments against this position). Likewise, Brown came to similar tentative conclusions by recognizing that (i) verbal inflection seemed not to be generalized across all verbs in the initial stages, and therefore, that (ii) children didn't really start with rules, but rather employed a strategy of 'lexical-learning'. Early stage-1 inflected verbs might then be learned as separate verbs (chunks) thus explaining observable optionality: since, as the story was then told, 'either you know a rule, and so you always apply it, or you don't'. Optionality of verbal inflection was seen as two singular processes of word acquisition in the brain: both uninflected and inflected words were stored as two different items in the lexicon. (See Bloom 1980 for comments). This notion of a stage-1 learning via non-rule-based means implied that the stage was a formulaic stage, and set-up in such a way as to learn by associative processes buttressed by frequency learning.

           

[9].      Having spelled out some of the issues surrounding Constructivism vs. Generativism, one major question seems to prevail throughout: How might it be possible to bridge the gap between associative/semantic relations and abstract/formal categories? One way to answer the question might be to stipulate that, whatever mechanism generativists cling to in their account of syntactic development, proponents of a Converging Theories Model (based on the Dual Mechanism Model) likewise invoke a similar generativist stance: in accepting a strong maturational perspective, we are able to take the best of both positions (i.e., no other explanation needs to be posited outside of what remains the traditional generative stance). What the converging theories model offers is a middle-of-the-road theory which suggests that a maturational stage-1 of development is universally maintained, irrespective of whether one adheres to a generative or a constructivist stance. Theory-internal measures aside, a universal biological account of brain development spreads equally across both models.

 

            The Dual Mechanism Model

[10].    It has recently been hypothesized that the language faculty consists of a dualistic modular structure made up of two basic components: (i) a Lexical component--which has to do with formulating lexical entries (words), and (ii) a Computational component--which is structured along the lines of algorithmic logic (in a Chomskyan sense of being able to generate a rule-based grammar). It is argued that these two very different modes of language processing reflect the 'low-scope' (1st order) vs. 'high-scope' (2nd order) dichotomy that all natural languages share. Low/High scope would be described here in terms of how and where certain aspects of language get processed in the brain (see also section [§64] on brain studies). In addition to newly enhanced CT brain imaging devices, multidisciplinary data (e.g. linguistic, psychological and biological) are starting to trickle in, providing evidence that a dual mechanism is at work in processing language. Results of experiments indicate that only a dual mechanism can account for distinct processing differences found amongst the formulations of irregular inflected words (e.g., go>went, foot>feet) and regular inflected words (e.g., stop>stopped, hand>hands). The former (lexical) process seems to generate its structure in terms of stored memory and is drawn out of the mental lexicon itself by merely associative means: these measures are roughly akin to earlier Behaviorist ideas on frequency learning. The latter regular mode of generating structure is tethered to a Chomskyan paradigm of (regular) rule-driven grammar--the more creative, productive aspect of language/grammar generation. Such regular rules can be expressed as [Stem]+[affix] representations, where a stem constitutes any variable word <X> (old or novel) that must fit within the proper (part-of-speech) category. For instance, using a simplified version of Aronoff's realization pair format (1994, as cited in Clahsen 2001, p. 11), the cited differences in parsing found between e.g., (i) a regular [Stem + affix] (decomposed) construction vs. (ii) an irregular copular 'Be' [Stem] (full-form) lexical item can be notated as follows:

 

                        a. <[V, 3sg, pres, ind], X+s>

                        b. <[V, 3sg, pres, ind, BE], is>

           

            The regular 3Person/Singular/Present rule in (a) spells out the bracketed functional INFLectional features of Tense/Agreement by adding the exponent 's' to the base variable stem 'X'. The features in (b) likewise get spelled out; but rather than in the form of an exponent, the features are built into the lexeme 'BE' by the constant form is. Once the more specific, irregular rule is activated, the default regular rule-based spell-out is blocked, preventing the overgeneralization of *bes.
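
            The blocking relation between the two realization pairs can be made concrete with a minimal sketch. It is an illustration only, not Aronoff's or Clahsen's formalism: the names RealizationPair, PAIRS and realize are hypothetical, the stem 'blick' is an invented novel verb, and English spelling adjustments (e.g., go>goes) are ignored.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class RealizationPair:
    features: FrozenSet[str]           # morphosyntactic features to be spelled out
    lexeme: Optional[str]              # None stands for the variable stem X
    spell_out: Callable[[str], str]    # maps a stem to its inflected form


PRES_3SG = frozenset({"V", "3sg", "pres", "ind"})

# Hypothetical two-entry inventory mirroring pairs (a) and (b) in the text.
PAIRS = [
    # (a) <[V, 3sg, pres, ind], X+s>: default rule over any stem X
    RealizationPair(PRES_3SG, None, lambda stem: stem + "s"),
    # (b) <[V, 3sg, pres, ind, BE], is>: constant form tied to one lexeme
    RealizationPair(PRES_3SG, "be", lambda stem: "is"),
]


def realize(stem: str, features: FrozenSet[str]) -> str:
    """Spell out `features` on `stem`; a lexeme-specific pair blocks the default."""
    matching = [
        p for p in PAIRS
        if p.features <= features and p.lexeme in (None, stem)
    ]
    if not matching:
        return stem  # no applicable pair: the bare stem surfaces
    # The more specific (lexeme-bound) pair outranks the variable-stem default.
    winner = max(matching, key=lambda p: p.lexeme is not None)
    return winner.spell_out(stem)


print(realize("be", PRES_3SG))     # is (the default *bes is blocked)
print(realize("walk", PRES_3SG))   # walks
print(realize("blick", PRES_3SG))  # blicks (novel stem, default rule)
```

            The design choice worth noting is that the default pair never inspects the identity of the stem, which is what allows it to extend to novel items, whereas the irregular pair applies only to the one lexeme it is stored with.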

           

[11].    INFLection.     Recent research conducted by Pinker (MIT), Clahsen (et al.) (Essex), among others, has shown that a dual learning mechanism might be at work in acquisition of a first language. The research first focuses on terminology. It is said that there are two kinds of rules for Inflection: an Inflection based on lexical rules, and an Inflection based on combinatory rules. In short, the types of rules are described as follows:

                        (i) Lexical Rules: Lexical rules (or lexical redundancy rules) are embedded in the lexical items themselves ('bottom-up'). Lexical rules may be reduced to being simple sound rules somewhat akin to statistical learning; for instance, associative regularities are built up out of the sequencing of lexical items--e.g., the sing>sang>sung -> ring>rang>rung sequencing of an infix (vowel change) inflection (presented below).

                        (ii) True Rules: Word inflection of the former type (i.e., lexical rules) is cited as an inflection not based on rules, but rather encoded in the very lexical item itself. True Rule (or affixation), on the other hand, would be a combinatory symbolic process based on variables ('top-down')--a creative endeavor not bound by associative input. Whereas lexical-based inflections are exclusively triggered by frequency and associative learning methods--i.e., they are not prone to deliver the creative learning of novel words with inflection--novel word inflection is generated (by default) once the true rule-based grammar is in place. One simple example that Pinker and Clahsen give in illustrating lexical/associative Inflection is the irregular verbs construction below:

 

           

[12].   Irregular Verb Constructions: The #ing>#ang>#ung paradigm

 

 

Table 1

            a).  sing   >   sang   >   sung
            b).  ring   >   rang   >   rung
            c). *bring  >  *brang  >  *brung

 

 

 

            The commonly made error in (12c) is due to the fact that the phonological patterning of the rhyme #ing>#ang>#ung--as a quasi-past-tense infix (lexical-rule) form--is so strong that it often over-rides and outstrips the default regular (true-rule) form of V+{ed} inflection for past tense. (Spanish offers many similar examples where the frequency of regular verbs affects the paradigm, such as the (incorrect) regular inflection *Romp-ido over-generalized in place of the irregular (correct) Roto (=Broke).) (* marks ungrammatical structures.)

           

[13].    The erroneously over-generated patterns of *bring>brang>brung (for English) and *Romp-ido (for Spanish) are heavily based on statistical frequency learning in the sense that the sound sequences of other patterns (e.g., ring>rang>rung, and infinitive verb V-{er} respectively) contribute to the associative patterning (a frequency effect forming the sound pattern irregular-rule in the former example and a default regular-rule in the latter example). Recall that structured lexical/associative learning merely generalizes, by analogy, to those novel words that are similar to existing ones. Regular grammatical rules (true rules), on the other hand, based on affixation, may apply across the board to any given (variable) syntactic category, be it similar or otherwise. In one sense, the ultimate character of 'true rules' is that which breaks the iconic representation of more primitive, associative-based processes, whether it be a neuropsychological process or some other process.

 

[14].    The fact that the actual over-generalized strings (bring>brang>brung) are not found in the input demonstrates that there is some aspect of a rule evoked here--albeit a rule based on rhyme association, and thus not a 'pure rule' where true (non-associative) variables would be at work. In other words, these lexical rules attributed to irregular formations are to be generalized as a form of associative pattern learning, and not as a true rule, since they are associated with sound sequencing only. One crucial implication of an Inflection generated by a true-rule is that such inflection could be easily applied to novel or unusual words: viz., words never before heard in the input (contrary to frequency learning of lexical rules discussed above--cf. Brown (1957), Berko (1958)).
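
            The contrast drawn in [13]-[14] between rhyme-based analogy and a default true rule can be sketched as a toy program. Everything here is an assumption made for illustration: the two stored exemplars, the crude spelling-based split into onset and rime, and the function names are hypothetical, and the sketch lets analogy win whenever a rhyme is found, whereas the text claims only that it often over-rides the default.

```python
from typing import Optional, Tuple

VOWELS = "aeiou"

# Stored irregular exemplars (the 'lexical rule' side); a toy lexicon.
STORED_PAST = {"sing": "sang", "ring": "rang"}


def split_onset_rime(word: str) -> Tuple[str, str]:
    """Split a spelled word at its first vowel letter: 'bring' -> ('br', 'ing')."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i], word[i:]
    return word, ""


def analogical_past(word: str) -> Optional[str]:
    """Generalize by rhyme to words that resemble a stored exemplar."""
    if word in STORED_PAST:
        return STORED_PAST[word]
    onset, rime = split_onset_rime(word)
    for present, past in STORED_PAST.items():
        if rime and split_onset_rime(present)[1] == rime:
            # Reuse the exemplar's past-tense rime with the new onset.
            return onset + split_onset_rime(past)[1]
    return None  # no similar exemplar: analogy has nothing to say


def default_past(word: str) -> str:
    """True rule over a variable stem: V + {ed}, regardless of similarity."""
    return word + "ed"


def past(word: str) -> str:
    """Associative pattern first, default rule otherwise (a simplification)."""
    return analogical_past(word) or default_past(word)


print(past("bring"))   # brang (over-generalized by rhyme, as in (12c))
print(past("spling"))  # splang (invented verb that rhymes with the exemplars)
print(past("blick"))   # blicked (invented verb with no rhyme: default rule)
```

            The analogical route returns nothing for a word that resembles no stored item, while the default route applies to any stem whatsoever; this is the sense in which only the latter operates over a true variable.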

 

[15].    Expanding on previous studies which examined differences in priming effects between Derivational and Inflectional morphology, Clahsen concludes that the difference in priming effects can only be accounted for by a dual mechanism of learning--interpreting the data to show that high priming effects were connected with productive inflectional forms not listed in the mental lexicon, whereas low priming effects were connected to productive derivational forms associated with stem entries.

 

[16].    With regard to German forms of pluralization, Clahsen et al. (p. 21) note that the same argument can be made for a dual mechanism process--viz., the high priming of the regular (default) plural '-s' (auto-s) contrasts with the low priming of the irregular plural '-er' (kind-er). The raw findings here suggest that certain irregular inflections in German (e.g., participle {-n}, plural {-er}) might be stored in the lexicon as undecomposed form chunks and that these two processes of storage are activated in very different places and manners in the brain--viz., the finding that irregular inflections yield reduced priming as compared to regular inflections suggests that regular inflections are built forms based on rules that contain variables which make the basic unmarked stem/root available for priming. It is clear from the table below that regular inflected word forms such as {-t} participles and {-s} plurals produce full priming and no word-form frequency effects. For irregular inflected forms such as {-n} participles, {-er} plurals and (irregular) {-n} plurals, the opposite pattern appears. The data suggest that irregular forms are stored as undecomposed stems--hence the emergence of full-form frequency effects. Regular forms are captured by the full rule process and are stored in a computational manner that works off of variable+stem algorithms--hence, the lack of full-form frequency effects. These differences in German morphology seem to parallel what we find between English (i) Inflectional morphology and (ii) Derivational morphology, where the former seeks out specific rule formulations--e.g., V + {ed} = Past, or N + {s} = Plural, etc.--and where the latter seeks out associative-style sound-to-meaning learning approaches (as in irregular verbs/nouns, e.g., go>went, tooth>teeth, etc.).
From studies applying fMRI brain imaging techniques, a consensus has begun to emerge suggesting that the lexical storing of derived stems + suffixes (e.g., teach+{er}) may actually be processed as one single word chunk in the otherwise lexical (word/recognition) temporal-lobe areas of the brain, and not, as intuition would have us believe, as a dual segmented [stem + suffix] lexical structure which has undergone a process much like a morpho-syntactic string. This may be an apparent economical move keeping in line with the classic one-sound-one-meaning association. In noting this, there seems to be a natural tendency in the diachronic study of language to move from (i) rule-driven Inflectional morphology--with more complex rule-driven infrastructures [+Comp] (Comp=complex) to less complex [-Comp] structures--to (ii) association-driven Derivational morphology. This tendency can be easily captured by looking into the way words have evolved over time--e.g., Break|fast /bre:kfaest/ has evolved from a twin morpheme structure [[Verb Break] + [Noun Fast]] to Breakfast /bre:kfIst/ [Noun Breakfast], composed of a single morpheme chunk.

 

 

Table 2     Summary of experimental effects (Taken from Clahsen et al. 2001: p.26)

Representation                        Full priming effect?   Full-form frequency effect?   Source
-t participles: ge[kauf]-t            yes                    no                            Sonnenstuhl et al. (1999), Clahsen et al. (1997)
-s plurals: [auto]-s                  yes                    no                            Sonnenstuhl & Huth (2001), Clahsen et al. (1997)
-er plurals: [kinder]                 no                     yes                           Sonnenstuhl & Huth (2001), Clahsen et al. (1997)
-n participles: [gelogen]             no                     yes                           Sonnenstuhl et al. (1999), Clahsen et al. (1997)
-n plurals I: [bauern]                no                     yes                           Sonnenstuhl & Huth (2001)
-ung nominalizations: [[stift]ung]    yes                    yes                           Clahsen et al. (2001)
diminutives: [[kind]chen]             yes                    yes                           Clahsen et al. (2001)
-n plurals II: [[tasche]n]            yes                    yes                           Sonnenstuhl & Huth (2001)

 

 

[17].    In sum, Pinker and Clahsen assume that the language faculty has a dual architecture comprising (i) a combinatory rule-based lexicon (leading to the lack of full-form effects) and (ii) a structured non-rule-based lexicon (leading to full-form effects). Questions on specifics will surface in the following sections--namely: How are these two methods represented in the brain?


[18].    A Stage-1 Language Acquisition.       There is a huge and ever-growing body of data today being tallied by developmental linguists in the field which suggests that the brain of a child matures in incremental ways which, among other things, reflect the types of 'staged' language development produced by the child for a given maturational stage. The collected data suggest that children's early multi-word speech demonstrates 'Low-Scope' lexical-specific knowledge, and not the abstract true-rule formulations attributed to grammar. This is somewhat akin to Piagetian notions of language development (see general nativism [§31] below), one difference being that it need not be tied here, exclusively, to a cognitive apparatus. This maturational theory of language development accounts for the lack of specific linguistic properties by suggesting that the brain is not yet ready to handle higher and more abstract (High-Scope) forms of linguistic conceptualization.


[19].    The idea behind 'What gets missed out where' in child speech production has given those linguists interested in morphology and syntax a particularly good peek at how the inside of a child's brain might go about processing linguistic information--and other information for that matter. As stated above, research initially carried out by Brown and his team (1973), working under a Chomskyan paradigm of linguistic theory, and subsequent work by others (cf. Radford) suggest that there is a stage-1 in language acquisition that tightly constrains the child's speech to simple one-to-two word utterances with no productive forms of verb or noun inflection. One child that appears in the early studies, Allison, provides transcripts between 16-19 months showing no signs of the onset of formal inflectional grammar--only later on, close to two years of age (22-24 months), does inflectional grammar/syntax emerge, and then only in what could be described as a sporadic, optional manner.

     

[20].    This stage-1 is considered to be a grammatical stage with an MLUw (Mean Length of Utterance in words) of 2 words or less. More specifically, despite the apparent lack of formal grammar, this shouldn't be confused with the idea of an earlier a-grammatical stage well before the onset of multi-word speech. (Surely, there can be no grammar or syntax of which to speak if there aren't multi-word constructions). This grammatical stage-1 therefore differs from the notion of a one-word stage (MLU=1) where supposedly absolutely no grammar/syntax is at work. The grammatical stage-1 is said to begin roughly with the onset of multi-words at about the age of 18 months (+/-20%). It is reasonable to suppose that such a stage would have target semantic meaning--even though, say, the arbitrary 'one-to-one sound-to-meaning' relationship is not of the target type (e.g., onomatopoeic forms /wuwu/=dog, /miau-miau/ =cat, etc.).
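
            Since the MLUw measure does real work in delimiting stage-1, a minimal sketch of how it is computed may be useful. The function name mlu_w and the sample utterances are invented for illustration and come from no cited transcript; words are counted by simple whitespace splitting, which ignores the transcription conventions a real corpus analysis would follow.

```python
from typing import List


def mlu_w(utterances: List[str]) -> float:
    """Mean Length of Utterance in words: total words / number of utterances."""
    counts = [len(u.split()) for u in utterances if u.strip()]
    if not counts:
        raise ValueError("at least one non-empty utterance is required")
    return sum(counts) / len(counts)


# Hypothetical sample, invented for illustration only.
sample = ["daddy go", "more juice", "doggie", "want that"]

value = mlu_w(sample)
print(value)       # 1.75
print(value <= 2)  # True: within the 'MLUw of 2 words or less' range of stage-1
```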

 

 

[21].    The above notions raise the question: At what point do we have evidence of grammatical categorization? For example, the traditional distributional criterion that defines the Noun class as that category which may follow Determiners (a/the/many/my/one) may not be available to us if, say, Determiners have yet to emerge. Hence, distributional evidence may be lacking in such cases. One way around the dilemma has been to suggest that early stage-1 grammar is categorical in nature simply owing to a default assumption that categorization is part of the innate ability to acquire language (in Chomskyan terms, part of the richly endowed LAD or Language Faculty) and that words are both inherently categorical and semantic in nature. Pinker (1984) claims that the categorization of early stage-1 words should be roughly pegged to their inferred semantic properties. Radford (1990), in a slightly different approach, prefers to consider such early multi-words at stage-1 as lexical in the sense that (i) they have built-in default lexical categorization abilities (forming classes of Nouns, Verbs, Adjectives, Adverbs, and Prepositions), but, at the same time, (ii) rely heavily on their semantic-thematic properties. In any event, either description starkly contrasts with a connectionist view which claims that e.g., the class 'subject' emerges through rote-learning of particular framed constructions. Subject-hood is learned as a category via rote associative learning of thematic relations. Now, it remains unclear to me precisely how close such thematic links to category-hood get to Radford's 1990 interpretation. I would only venture to say that both views share the belief that semantics holds the central cognitive underpinnings upon which syntax can later be built.

 

[22].    This account of stage-1 has been labeled as the Lexical thematic stage-1 in language acquisition (Radford 1990). It is unclear how far Radford would like to go in accepting his stage-1 as cognitively based: the labeling here of lexico-thematic (the term thematic referring to argument structures pegged to semantics) certainly permits some amount of semantics to leak into the discussion. Nevertheless, Radford emphatically rejects the notion that a stage-1 syntax could be exclusively based on semantics. It is here that Radford gets full mileage out of his two-prong converging Lexical-Thematic stage-1 grammar: a stage-1 that is both--

 

(i) 'thematic' in the sense that it leans towards general nativism since simple utterance types at the earliest MLU get directly mapped onto their thematic argument structure; while,

 

                    (ii) 'lexical' in the sense that the child seems to be fully aware that they are dealing with words based on lexical grammatical categories, and not semantic. This is made apparent by how children know the morphological range of category (e.g., Noun, Verb) selectiveness along with inflection distribution.

 

[23].    One argument against a semantically based stage-1 was that from the very beginning, children's productive multi-word speech (MLU= 2+) yielded Inflectional plurals {+s} and gerund {+ing} endings--the first two morphemes to be acquired according to Brown's morpho-sequencing list. These endings were only attached to syntactic categorial word-classes: e.g., {s} to nouns, {ing} to verbs, etc. There seemed to be no attempt by the young child to generalize such inflections onto pure semantic categories. In other words, if children's word classes at this stage-1 were thematic, rather than syntactic in nature, we would expect that specific inflections would be distributed along semantico-thematic lines: e.g., plural {s} to agent, gerund {ing} to action words, etc. (Radford 1990, p. 41). Such findings are not reported in the data. It was this absence of semantically based grammars which led to discussions about possible a priori innate grammatical categories, a grammar based on a syntax (without meaning) rather than a syntax based on semantics (meaning) (cf. general vs. special nativism). Although it is indeed correct to suggest that there seem to be no purely semantically based Inflections at stage-1, one argument against the conclusion of the claim, and seemingly in support of a semantically-based stage-1, would be to suggest that, in fact, most utterances at this stage are instances of formulaic constructions. Only at a later stage-2 would we find instances of real productive inflection--viz., even though on the surface inflection appears to be utilized at stage-1, the surface structure only mimics input-driven phonological patterns.

 

[24].    This 'mixed bag' of a grammatical stage is indeed an argument against a 'too-strong-of-a-claim' syntactic-based model of early grammar (assuming that a syntactic version holds as a buttress for Continuity). We shall take some comfort in it, however, due to the fact that the strong claim we make will be short-lived and relegated to the very earliest of grammatical stages (=MLU below 2). There is a caveat here. One argument against interpreting from an absence of evidence--namely, the observation that no inflection shows up on argument-themes--might be the following: If our stage-1 were in fact formulaic, and not rule-based, then there indeed would be no utterance of an improper formulaic inflection attached to a semantic category, simply because this would not have been available in the phonological input. Formulaic constructions come out of the input in a highly regular manner--based on high frequency and saliency--and are churned out as formulaic un-analyzable chunks. (See §42 for an account of apparently correct parameterized word order found at an otherwise non-parameterized stage of acquisition).

 

[25].    The argument could run as follows. The fact that children at stage-1 never attach e.g., the action-inflection '-ing' to semantically classed action-words like *up-ing/down-ing/over-ing/on-ing, etc. merely indicates that such strings are not part of the available input (particularly noteworthy given that our stage-1 is semi-formulaic in nature). It will be argued that the very earliest of stages (stage-1), addressed herein, is indeed the very earliest of staged developmental grammar--what may even have been termed a-grammatical in previous theories (viz., the one word stage) (cf. Atkinson, 1992; Radford, 1990; among others). Let it be known that I am all too ready to acknowledge and agree that language is indeed built upon pure syntax (and not on semantics) at our stage-2 of development: the classic evidence for a syntactic-based language at the earliest stages has been taken from the child's inflectional system at work on the basis of grammatical categories. Notwithstanding early attempts to extend syntactic analyses to early stages of language, there have been attempts in the child language acquisition literature to construct a dual model for stage-1 based on (i) semantico-thematic relations on one hand, and (ii) categorial syntax on the other. This hybrid model has been considered as a lexical-thematic stage-1 of child language acquisition where mere semantic properties tie together those lexical syntactic categories void of any functional material (as related to the functional categories IP & CP). The most fully articulated version of this hybrid theory can be found in Radford (1990):


            


[26].    The question is then put to us in the following form: Is there any evidence at the earliest phases of stage-1 (say MLU<2) that the child actually analyzes strings as a syntactic structure--as opposed to a formulaic speech-utterance (i) which may be tethered to a variety of gradient meanings, and (ii) which may reduce to mere surface-level syntactic phenomena? In other words, what may appear on the surface as syntax proper may in all actuality simply be a result of the surface formulae learned, with no real tacit syntactic knowledge represented. There seems to be little that hinges on the possible alternatives:

                        If, on the one hand, we consider such semi-formulae as syntax proper--making our stage-1 (MLU<2) a syntactic stage--then so be it. We are then forced to reconcile our syntactic stage-1 with the one word stage as previously thought and nothing is lost.

                        If, on the other hand, a lexical-thematic stage-1 involved itself with bridging this narrowing gap between formula and syntax--then so be it. The benefit we gain by adopting this measure is that it allows us a nice continuity bridge onto the later phases of stage-1 (MLU +2).

           

[27].    One interesting by-product of such a lexical-thematic stage-1 is that it doesn't specify Word Order: word order being traditionally tied to functional parameterization (see Travis, 1984; Atkinson, 1992; Tsimpli 1992; and Galasso, 1999/2003). Coming on the heels of such semantic-based models of language acquisition, claims have been made suggesting that the cause of a semantic stage-1 is due to memory deficits. As part of a Maturational time-table, the child starts off with a very limited memory attention span--this memory deficit (maturational based) triggers the more 'robust & primitive' semantic-lexical level of language (since the lexical component is more salient) to kick start productive communication (see Newport's 'Less-is-More Hypothesis', S. Felix's non-UG/cognitive approach to L2 learning, as well as J. Elman's work in relation to connectionism. For evolutionary accounts, see Bickerton's Proto-language, 1990).

 

Less-is-More Hypothesis.       According to Newport's 'Less-is-More' Hypothesis, a Radfordian style maturational time-table--dividing our stage-1 from stage-2--would be linked to 'working memory' deficits: Stage-1 starts with early limited memory and thus can rely solely on the more primitive and robust rote-learned and formulaic structures. (One needn't say that all possible structures at stage-1 are rote or formula--let it suffice to say that the flavor of the stage suggests little if any evidence for 'true-rule' formations or parameterizations, citing stage-1 variant Word Orders and null INFLections). This handicap of low memory actually works as an advantage for the child in that it serves to constrain the perceived input to basic degree-0 SV(X) structures--the structures are ready-made by the lower-level cognitive processes and made available to the stage-1 child. Lower-level memory seeks out idiomatic lexical-based categories or lexical-based morphemes as opposed to functional, syntactic-based morphemes/categories (termed 'l-morphemes' vs. 'f-morphemes' respectively by Pesetsky (1995), as understood in Distributed Morphology (see [§54])). (N.B. Felix (1981) as well as Krashen claim that it is precisely this over-production of the cognitive apparatus/high memory that makes second language learning so fraught with difficulty--having to 'learn' language overtly instead of 'acquiring' it in a natural setting.)

 

[28].    We can better frame arguments that claim a cognitive/memory dependence for language acquisition by addressing the very nature of syntax. First, syntax requires much more in the way of computational memory. (Or perhaps the question is better framed conversely--viz., more memory forces the computation to reorganize itself by way of syntax.) The emergence of syntax coincides with the onset of greater amounts of language material--i.e., a higher number of memorized words/strings leading to longer and more complex sentences, etc. For instance, Degree-0 structures (say, basic SV sentences, order irrelevant) come at a low cost with regard to memorization, while, vice versa, Degree-1 structures (embeddings, binding, recursiveness) come at a much higher cost. Why is that? Well, in one manner of speaking the reason is self-serving: in order to have a degree-1 sentence, the empirical (maturational) data dictate that a child must have, at some prior time, gone through a degree-0 stage, a process that mirrors memorization capacity. But more to the point, the reason for this mental/computational juggling has to do with how our brains go about making the most out of our limited memory capacity. The very nature of these high amounts of material forces a shift in how the brain can process (parse) the material. It is believed in the neuro-linguistic community that the shift here--both in the quantity and quality of language--relieves the already overburdened process of rote-learning and memorization, with its share of the burden taken over by rule-based processes (variables, categories, etc.). Such rule-based learning frees up space in the lexical component of the brain (say, the list of words stored) and allows new routes to be mapped. In other words, such a huge volume of material forces new ways of organizing the input (hence, categorization). 
In sum, the two-prong development as sketched out above might proceed as follows:

 

 

(i)        At the Micro-Development level (stage-1) the data-stream is reduced for the child in terms of its cognitive saliency (the data-output is not changed; rather, it's the intervening deficiency of the child's mental processing that overall affects these data). The child, working with a primary memory 'tool-kit', admits only a small subset-a of language input; this in turn allows the child to ultimately deal with less data, enabling rote-learning to take place. (N.B. It is generally acknowledged that any memory deficit or trauma resulting in language attrition would first affect the more abstract levels of language/syntax).

           

 

(ii)       At the Macro-Development level (stage-2) the data stream is affected by the upsurge in memorization that in turn expands what becomes salient for the child. Perhaps having to do with the triggering of hidden units at the end of stage-1, the child is now in a position to take the data and apply paradigmatic structures--all of which lead to formal (stage-2) grammar. Thus, Macro development makes available more memory which in turn spawns new ways of handling the material--the initial process of stage-1 rote association and memory is no longer adequate and syntax proper emerges as a way of handling both the quantity and quality of this newfound material.


[29].    What syntax allows the brain to do is categorize and form analogies based on the vast amount of input, rather than memorize and store all input as meaningful chunks (with an associative sound-to-meaning relationship imposed). This results ultimately in a finite array of neuro-linguistic networks in the brain. Hence, in a basic input-output model--similar to what we understand to be happening in behaviorist stimulus and response associative models--quantity of input equates to quality of brain processing. As is evident, the classic enigma (the chicken-and-the-egg scenario) remains: Is it this newly wired brain, which now seeks out the formations of paradigms and variable rules, that is responsible for the quantum leap in quality of language, or is it this quality leap in language that somehow drives the changes in the brain? This is tantamount to the classic Nature vs. Nurture debate. My hunch here is that (i) the nature of the raw Data as it is (ii) tied to cognitive processing may be the driving force behind any structural changes that occur in the brain--in other words, language changed the brain and not the other way around. (It may ultimately be impossible to separate the one from the other). But this is only a hunch, and again, it reduces to the same catch-22 scenario (if it is the data that is the driving force behind the change, how do we account for a maturational protracted development; and secondly, surely how the brain handles and processes the data must be part of the equation for any theory that attempts to account for developmental stages of language). In a certain sense, Newport's 'less-is-more' hypothesis simply restates this same paradox. Regarding architecture and the nature vs. nurture debate, clearly all linguists suppose now that some connection must be made between genes and environment. Thus, a two-staged development follows:

 

(i)        Stage-1 comes with low-level memory with strong correlates to semantics and rote-learning. As a consequence, a one-to-one sound-to-meaning correspondence ensues, explained by more prosaic economic constraints placed on cognition.

 

(ii)       Stage-2 comes with increased memory that (for reasons having to do with processes of parsing, etc.) triggers high-level categorization and syntax. One-to-many/many-to-one relations are evoked, triggering a highly rich paradigmatic grammar.

 

[30].    Radford (2000) more recently has gone on the record as saying that the Language Faculty specifies a universal set of features--namely, that a child acquiring language has to learn which subset of these features are assembled into the lexical items as +universal (all other features awaiting parameterization via a maturational timetable). The problem for the child is assembling the features into lexical items. To a certain degree, the child needs to build up lexical items one feature at a time (see Clahsen's Lexical Learning Hypothesis). Thus, the issue for Radford is that there are innate architectural principles--loosely referred to as an Innate Grammar Construction Algorithm--which determine how lexical items project into syntactic structures. This raises the following question: How much of the initial learning deficit cited for our lexical stage-1 is owed to the child's protracted language development being exclusively tied to a maturational-based low-scope cognitive template--a potentially semantic-based template upon which later formal abstract categories (such as functional categories) can be mapped? It is clear at least that more abstract functional categories come on-line later in the course of development.

           

[31].    General vs. Special Nativism.             This is a nice place to pause and examine the role that our lower-scope cognitive processes might play in distinguishing between stage-1 and stage-2 grammar. In brief, there are two schools of thinking on this, both of which could maintain general ties to a Chomskyan paradigm. One school takes an evolutionary stance (Pinker & Bloom) and basically claims that lexical learning leading to grammaticalization is heavily based on preexisting cognitive constraints (much in the manner of former Piagetian models of language development). Such linguists would disagree with the notion that a special module in the brain must exist in order for language to manifest. Recall, Chomsky in his strongest claims suggests that the Language Faculty (LF) is an independent autonomous organ found somewhere in the mind/brain (similar to, say, the liver or the stomach) and that this LF organ shares very little in the way of general cognitive processes--a language module all its own and without common lineages to other regions or modules of the brain. This notion is referred to in the language acquisition literature as a Double Disassociation Hypothesis (disassociation between formal language and cognition) (see Smith and Tsimpli for some discussion). The second, anti-Neo-Darwinian position suggests that a special module in the brain is required for language, and that language learning can be accounted for by reduced/non-cognitive means.

 

[32].    Regarding the debate over General vs. Special Nativism, it is still unclear how the debate should be viewed. Much of the argument quickly degenerates into the classic aforementioned 'chicken-and-the-egg' dilemma of being circular in nature: e.g., (i) The Special Nativist claims that the child first needs syntax to uncover the underlying semantics (syntactic-bootstrapping), while (ii) The General Nativist insists that in order to properly construct a syntactic category in the first place, general properties of (inherent) cognitive-semantics must be observed (semantic-bootstrapping). (Interestingly, Chomsky's most recent work on Minimalism suggests that there may be economical constraints on language processing (from out of Logical Form). While it is still unclear how to interpret the wide range of claims on the minimalist table, and Chomsky himself often remains agnostic at these levels of inquiry, such economic constraints could be interpreted as indeed not pertaining to considerations of pure syntax, and rather adhering to more cognitive levels of processing: e.g., Minimalist notions of shortest move, minimal amount of rules, and to a certain degree, the objective essence behind the (PF) phonological form of language as opposed to the (LF) logical form, etc.). On the one hand, however, it seems to me that a dualist approach to acquisition (as presented herein) would initially favor a first-order semantic-bootstrapping view, given that semantics seems to play an essential role in language acquisition early on before the onset of syntax. (There is no conclusion drawn here, as nothing argued in this paper hinges on that debate).

 

[33].                Why--I don't need any 'rules' to see this tree. My eyes work just fine. That is, insofar as there exists a single tree. How is it that my 'tree' gets destroyed once I move my head ever so slightly to the east and fall into view of a second tree? The mystery of it all lies somewhere in the dismantling, between a single torn branch of lifted foliage, that forces the rule--for how was I ever to know that this second tree was indeed a tree after all?

 

            Well, the above passage makes for a nice analogy, but it merits a closer look. When I look at this cup of coffee in front of me, reach out for it, and drink its contents, it certainly appears to me that I do little more than what my own cognitive abilities let me achieve--I don't perform any 'abstract rule' formulations or procedures as such: although, I do agree that one could possibly uncover all of the aforementioned procedural content coming together, such as e.g., Gestalt psychology, visual cortex processing, contextual/meta-linguistic background of say [+liquid] => drink => mouth, along with muscle motor coordination that allows me to see into space, reaching and holding the cup without breaking the glass (etc.). In the face of all this possible 'theory' nonetheless, it remains somewhat natural for me to maintain the idea that when I 'see' a tree, I just 'see' a tree (period). But much has come out of Gestalt theory in the past (being somewhat reframed here in the present context of connectionism) that suggests there may be something to this very natural notion of just seeing after all. Gestalt theory on perception states that there are first-order perceptions in which, say, a child might see a line or a slope in a strict iconic representation of the visual field. No rules apply--and there is a strict Stimulus and Response (S&R) equation involved. Regarding language acquisition, this first-order representation could be illustrated by the early onset of vowel recognition (i.e., environmental sound)--and not sound as filtered through assimilation processes, etc. (as seen in the u-shaped model [§61] below). At a later stage of perception, second-order perceptions allow the child to break iconic mappings and allow lines, slopes, etc. to begin to be seen (with less vividness) as e.g., a chair--now, a larger, somewhat more generic unit, which embodies the lower level visual stimuli. 
It seems to be the case that the role of second-order perceptions is to pull and frame larger aspects of Objects and Events--in linguistic terms, forming Nouns (out of the former) and Verbs (out of the latter). So regarding language, we should be clear that by the time a child reaches the very first stages of language development--where a child is said to begin producing single word utterances--s/he has already moved from the first-order perceptual field into a second-order field. So, the idea that children may have some means to rules, perhaps bootstrapped from Gestalt psychology (the General Nativist Position) may not be totally implausible. However, and more to our point, Newport's 'Less-is-More' hypothesis just as well could be interpreted to fit Gestalt findings: when memory/cognitive capacity is low, children see in a fixed iconic manner, and when memory/cognitive capacity increases, the child reorganized t