Towards a 'Converging Theories' Model of Language Acquisition:
Continuing Discontinuity
Joseph Galasso
California State University, Northridge
joseph.galasso@csun.edu
(2003b)
Introduction
There was a time when the classical split between behaviorism and nativism was easily identifiable, each rationale breaking down along its traditional fault lines. On one hand, you had the 'behaviorists-folk', who believed more or less that all forms of learning, language included, could somehow be reduced to and extracted from ambient input found in the environment. If there were to be any talk of innate structure leading to such learning, it would be relegated to innate structure underlying more general cognitive mechanisms which underpinned associative-style learning--perhaps something along the lines of an innate memory capacity, or an associative linking component of the brain/mind that allowed semantics to link to syntax, thus solving any linking problem (cf. Pinker), or perhaps something along the lines of an innate architectural structure that paved the way for frequency learning (cf. Elman). On the other hand, while 'nativists-folk' agreed that there was something of interest to be said about such accounts of learning (i.e., artificial intelligence and connectionist strands of Computational Theories of the Mind (CTM)), the strong nativists among them saw through the clever guise of CTM and never let themselves be taken in by what appeared to be simply another attempt to reduce true language (a syntactic structure) to a simple by-product of mere computation (cf. Fodor).
This working paper, the broad second segment of 'Twin Working Papers',[1] attempts to review the literature surrounding the two sides and to bring to light reasons why I believe we have made very little progress in understanding / explaining how a 'rule-based' equation of language actually arises as a computation in the brain. (The problem that besets 'explanation' is a compounded one: even Darwin's theory of evolution fails this test. So, I suppose we are in good company.) Having said this, there is nevertheless good reason to promote the Dual Mechanism Model (DMM) as the best possible candidate to eventually bridge the gap between the two sides of the traditional divide. A caveat follows: as I hope to show, while the DMM, as it is presently understood, may do well in accounting for a number of phenomena, it ultimately fails to provide us with any new, comprehensive model towards an explanation of true language. On one side of the argument, the DMM at best simply refashions the same problems the behaviorists were plagued with more than half a century ago--namely, the overwhelming 'mystery' of how the brain/mind creates rule-driven syntax (top-down) from mere cognitive capacity (bottom-up) (the 'bootstrapping' dilemma). To my mind, while the DMM succeeds in descriptively carving the data roughly into these two distinct processes (root-based vs. affix-based, or frequency-driven vs. rule-driven), it does little to explain the distinctions outright or to make any sense of how and why the two processes converge (when they do converge) and/or why they don't (when they don't). (Examples of such convergence have recently been reported by Clahsen (2001), who suggests not only that derivational morphology, itself a morphological process, actually shows processing similarities akin to lexical retrieval tasks, but also that high-frequency, regular, rule-based inflectional morphology shows similarities akin to lexical retrieval tasks--the two processes may actually converge in becoming rote-learned incorporations of otherwise decomposed morpho-phonetic structures.)[2] In the ensuing pages, we examine the role the Dual Mechanism Model plays in language acquisition while keeping an eye on how it ultimately fails to offer any viable, complete picture of linguistic knowledge. However, having started on this rather pessimistic note, I proceed in good faith to make clear that the DMM is at the moment our best and most promising tool for sorting through the many complexities language has to offer.
The Dual Mechanism Model credits the Brain/Mind with two fundamentally different cognitive modes of language processing--this dual mechanism has recently been reported as reflecting inherent qualitative distinctions found between (i) regular verb inflectional morphology (where rule-based stem+affix forms make up a large contingent), and (ii) irregular verb construction (where full lexical forms seem to be stored as associative chunks). In this paper, we examine the DMM and broaden its scope as a means of covering the overall grammatical development of Child First Language Acquisition.
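The two routes just described can be sketched as a lookup-then-rule procedure, roughly in the spirit of words-and-rules accounts: a stored irregular form, when it is retrieved, blocks the regular stem+affix rule, and otherwise the rule applies by default, including to novel verbs. This is a hypothetical illustration, not the author's model: the four-verb irregular list, the orthographic '-ed' rule (which ignores consonant doubling) and the `lexicon` parameter are all assumptions made for the sketch.

```python
from collections.abc import Mapping

# Route (ii): a tiny, illustrative store of whole irregular forms
# (associative memory). A real lexicon would hold many more entries.
IRREGULAR_PAST: dict[str, str] = {
    "go": "went",
    "sing": "sang",
    "bring": "brought",
    "hold": "held",
}


def regular_rule(stem: str) -> str:
    """Route (i): the symbolic stem+affix rule, spelled orthographically.
    Consonant doubling (stop -> stopped) is deliberately ignored."""
    return stem + "d" if stem.endswith("e") else stem + "ed"


def past_tense(stem: str, lexicon: Mapping[str, str] = IRREGULAR_PAST) -> str:
    """Try whole-form retrieval first; a retrieved irregular blocks the rule.
    If nothing is retrieved, the default rule applies."""
    stored = lexicon.get(stem)
    if stored is not None:
        return stored
    return regular_rule(stem)


if __name__ == "__main__":
    for verb in ["walk", "bake", "sing", "blick"]:  # 'blick' is a novel verb
        print(verb, "->", past_tense(verb))
    # Simulated retrieval failure: no stored form for 'go', so the rule applies.
    print("go ->", past_tense("go", lexicon={}))
```

Passing an empty lexicon is one way to mimic a failed retrieval: the rule then applies to *go* and yields the over-regularized *goed*, which shows how a dual-route design can in principle model such errors without any change to the rule itself.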
Converging Theories and the Brain as Self-Referent
The major theme behind much of what is expressed within these notes centers on a driving notion called 'Converging Theories'. The term 'converging', though used more or less as a device to merge the two major theories in the field of language acquisition, equally serves a second purpose having to do with a converging of brain processing. Perhaps the leading motivation behind my compiling the notes for the 'Twin Working Papers' lies in trying to understand the brain, its modular aspects, and how the brain comes to bootstrap itself and become a mind capable of producing language.
Let's start by saying that the brain is self-referent, meaning that it takes in only that input (external to itself) which has already been generated in the brain in the first place (internal to itself). Contrary to this, one is often tempted into thinking that the brain processes such information as if the input were truly novel to the brain in some way or other, as if the input were truly objective, and that the brain then takes this novel input and makes sense out of it (viz., to the extent that there exists an anthropic principle behind man's capacity to reason). This doesn't seem to be the case at all. Rather, the brain first creates, churns out, takes back in, reexamines, and creates anew, again and again. That which we are inclined to perceive, and thus understand, in our environment is exactly that and only that which has already been born to the brain. The brain is not only self-referent in its processing of knowledge, but also modular in its allocation of that processing. The modular aspect of the brain, simply put, could best be summed up by cutting the brain into two halves (temporal vs. frontal): (i) the temporal sensori-motor brain (the 'animal brain'), and (ii) the frontal abstract-brain (the 'human brain'). Each half has its own processing tasks. Each half can only understand/process that form of knowledge (externalized to the outside) which it originally conceived (internalized from the inside). The sensori-motor brain is instinctively 'knee-jerk-like' in nature in that it responds solely in the service of a kind of self-preserving behavior. The outward manifestation of this behavior is first generated from the animal brain itself. The sensori-brain works in a 'bottom-up' cognitive manner; it fits easily with a neo-Darwinian story of evolutionary adaptation and accounts for much of what we know resides behind more concrete processing: namely, the inputs-outputs of man's sensory world (visual/auditory, etc.). The abstract-brain is a curiosity of sorts; it is rather non-self-preserving in nature and works in a 'top-down' manner of exaptation, in the sense that it caters to no known Darwinian adaptive reasoning. The converging of these two modular aspects of the brain allows specific types of knowledge to be allocated to specific domains. The dual modes gather and identify only those select forms of the input which they first produced--hence, therein lies a kind of circular loop between (i) the subjective, preconceived internalization of behavior/mental processing, (ii) the objective release of the behavior/mental processing in the form of output, and (iii) the return to internalization of that output.
There exists a long linguistic tradition concerning such lines of reasoning. For instance, the inquiry into how children might eventually 'notice' similarities in the form of frequency-driven input (bottom-up) in both represented utterances and encoded events could be recast as the question of how the very young child is able to 'notice' such input in the first place. The 'noticing problem' has likewise spun off into other areas of linguistics having to do with word learning and taxonomy, semantic bootstrapping analogies, and innate assumptions leading to morphology and syntax. Unfortunately, the noticing problem often suffers either from circularity in one respect or from paradox in another: viz., if one means to say that children notice in adult-like terms from the outset of their speech, then surely one must advocate an (adult-like) innate mechanism for such noticing in the first place (citing Plato's problem in general, along with the specific linguistic problem of the poverty of the stimulus). However, contrary to the above, noticing hypotheses tend to rely on bottom-up sensori-brain methods for dealing with such learning, not on the nativist top-down assertion of abstraction. For example, stage-1 language development tends to be described as utterance-event pairings that are iconic in representation, a one-to-one Stimulus & Response association, as opposed to a later-developing stage-2, which tends to be described by saying that the child notices non-iconic abstract representations and similarities having to do with imperfections of rule-based paradigms. Clearly, if the first stage of noticing is correct (and, to a degree, we believe it is), then surely one must have some means of getting hold of the knowledge (if not via a priori epistemology, then perhaps at least via some biological module of brain processing).
Proposal
This paper proposes new accounts of old issues surrounding child first language acquisition. The general framework of our proposal is based upon hybrid theories--proposals stemming from recent investigations in the areas of PDP-style connectionism, as well as from more naturalistic studies and sample-based corpora of Child Language Acquisition. Much of what is sketched out here attempts to converge the leading tenets of two major schools of thought--namely, Associative Frequency learning and/vs. Symbolic Rule learning. Cast in this new tenor, proponents of a Dual Mechanism Account have emerged, advocating a dual cognitive mechanism for dealing with processing differences found between regular and irregular verb inflectional morphology (inter alia). The main task of this paper is (i) to broaden and extend the dual mechanism account--taking it from the current slate of morphology to the larger syntactic level, and (ii) to spawn some theoretical discussion of how such a dual treatment might have further-reaching implications for more general developmental aspects of language acquisition (as a whole), namely (though not exclusively), the twin benchmarks of syntactic development regarding Lexical vs. Functional grammar. Our central claim will be that whatever factors lead to a deficient morpho-phonology at, say, a given stage-1 of development--factors that may potentially lead to the postulation of a non-rule-based account--these same factors are likely to be carried over, becoming a factor of deficiency in the overarching syntax. Thus, the tone of the discussion is dualistic throughout. Our main goal is two-pronged: first, to assert as the null hypothesis that language acquisition is Discontinuous in nature with respect to the adult target grammar, and that this discontinuity is tethered to maturational factors which lie deep-seated in the brain--factors which yield fundamental differences in the actual processing of linguistic material (a so-called 'Fundamental Difference Hypothesis'); and second, to show that this early multi-word non-target stage can be attributed to the first leg of this dual mechanism--i.e., that leg of cognitive/language processing that governs (i) (quasi-)formulaic structures along with (ii) non-parameterizations. We attribute the generation of this two-stage development to maturational scheduling--viz., a Non-Inflectional stage-1 and/vs. an Optional Inflectional stage-2 (where formal grammatical relations are first learned in a lexical, bottom-up fashion and then later regroup to generalize across the board in a word-class, top-down fashion). It is our understanding that the two-staged development involves and shares a relevant associative-style theory of learning (Associative-style Constructive Learning for our former stage-1), while preserving the best of what syntactic rule-driven theories have to offer (Rule-based Generative Acquisition for our latter stage-2)--hence the term Converging in our title.
By analyzing much of what is in the
literature today regarding child language acquisition, as well as drawing from
the rich body of work presently being undertaken in connectionism, it is our
hope that a new hybrid converging theory of language acquisition can be
presented in a way that captures what is inherently good from both schools--an
alternative theory that bears more flavor of truth than camp rhetoric.
<>
Why--I don't need any 'rule' to see this tree here
in front of me. My eyes work just fine. That is, insofar as there exists a
single tree. But, how is it that my 'tree' gets destroyed once I move my head
ever so slightly to the east and fall into view of a second tree? The mystery
of it all lies somewhere in the dismantling, between a single torn branch of
lifted foliage, that forces the rule--for how was I ever to know that this
second tree was indeed a tree after all?
(JG).
<>
"Humans use stories that they tell themselves in
order to get themselves to work on this or that. These stories often deal with
confrontation between areas and ideas. From some point of view, it is almost
always the case that these high-level stories are relevant only as motivation
and not really relevant to what eventually happens in terms of technical
understanding". (Allen Newell)
<>
Sometimes, stories within a certain school
split--e.g., formalist debates on the amount of functionalism Chomsky can and
should afford to surrender (cf. Pinker & Bloom). Sometimes differing
stories converge--Neo-Behaviorists seeking out an innately based architecture (Jeff Elman).
0. Overview
Periodically, say every two or three generations, our vows to science are renewed by a sweeping change of reasoning--cerebral airs that deliver their own inextricable kind of 'off-the-beaten-path' hedonism. These solemn changes are few and far between and constitute what the philosopher of science Thomas Kuhn called 'Paradigm Shifts' (a new way of thinking about an old-something). Unfortunately, these generational spurts often provide very little in the way of truly original thinking, and much of what is behind the fanfare quickly reduces to little more than the recasting of old 'brews' into new 'spells'. Perhaps a glimmer of truly original thought (a 'new-something') comes our way every two hundred years or so. We are in luck! One of the greatest breakthroughs in science was born in the latter half of the last century and has made its way onto the scene shrouded by questions surrounding how one should go about rethinking the Human Brain/Mind--questions that have led to developments in Computer Programming, Artificial Intelligence (AI), Language/Grammar, Symbolic-Rule Programs and Connectionism.
Much of what sits here in front of me, at my desk, can be attributed in one way or another to this 'new-something', and whenever there is a new-something, whether it be steam locomotives, transistors, or tampering with DNA, there's bound to be an earful of debate and controversy. And so remnants of this debate have edged their way ever so slowly onto the platform--from the likes of the psychiatrist Warren McCulloch and the mathematician Walter Pitts and their pioneering work on early 'neuron-like' networks (leading to connectionism), to the psychologist Donald Hebb (1940s-50s) (and his revolutionary notion of 'nerve learning' based on oscillatory frequency), to the seminal debates between two great personalities in the AI field, Marvin Minsky and Frank Rosenblatt (1950s-60s), to those in the realm of language, Noam Chomsky (1960s-80s). More recently, the debates have taken on a vibrant life of their own thanks to advances in computer technology. The most clearly articulated of these recent debates has come to us from two leading figures in the research group called Parallel Distributed Processing (PDP)--namely, Jay McClelland and Dave Rumelhart (1980s).
Most recently, the debates have come to carry a portmanteau of claims--chief among them the claim that human brain function, and thus human computation, is not analogous to (top-down) symbolic-based computers (from Chomsky 1980), but rather that the brain and its functional computations should be considered on a par with what we now know about (bottom-up) nerve functions and brain cell activations (to Hebb 1940)--as you see, our time-table has been inverted. In other words, the paradigm shift here occurs the moment one rejects the computer as an antiquated model of the brain (and language), and instead props up a newer model of language and thinking based on older models of connections and connectionism (as presently understood in neurological studies). In this vein, it is fair to say that we should no longer view language as a mere gathering and shaping of atomic particles or logical symbols--much as one might view the atomic nature of computer language as a serial string of 0's and 1's--rationing out sub-parts of the structure in more-or-less equal portions in the hope of arriving at a larger and more cohesive general frame of language. It could be argued by connectionists not only that language is much more fluid than any strict rule-driven/symbolic function could provide, but also that language requires a greater measure of freedom and flexibility at the bottom end. Whereas rules originate top-down, it may well turn out that bottom-up processes better reflect what is actually going on, at least in the initial learning processes of language. (One nontrivial note to remember here is that there is a fundamental and crucial difference between artificial computer chips (AI) and living brain cells (neurons): the latter must secure survival. There is no sense in the notion that silicon chips need to secure survival, since there is no death of a chip. Cells are living organisms that must somehow ensure their survival, and this survival apparatus, certainly for the individual cell, must be organized in a bottom-up fashion.) Along these lines, much of what is coming out of West Coast schools of thought (connectionism) affords the old school of Gestalt psychology a new lease on life. Some connectionists find themselves talking up the fact that language can't simply be a cohesion of atoms put together in very elegant ways, but that some 'higher order' of fluidity must exist. Human cognition is more fluid, more context-driven. In a token manner of speaking, Köhler might carry on here about mysterious magnetic fields which suddenly arise in the brain and pull sub-particle visual stimuli together--any notion of a gestalt brain, of course, has long been disputed (I think, notwithstanding notions of a 'quantum gravity brain' as advocated by the great mathematician Roger Penrose). However, it should be noted that Gestalt psychology continues to pave the way for a serious return in the context of connectionism. (In addition, as a historical footnote, let's not forget that while Rosenblatt's work originated with visual perception, it is now held that his work, if carried out in today's climate of connectionism, would have had potentially serious linguistic implications.)
And so let us turn to language. With specific regard to grammar, the Word-Perception Model of Rumelhart and McClelland (1981, 1986) has made a dramatic impact in the field. Not only has it provided us with a new way of looking at potential brain processing (a quantitative way of looking at weights of connections, thresholds, memory storage, etc.), it has also made rather precise claims about what kinds of (qualitative) material would be difficult to process in such a model (the need for hidden units regarding 2-degree complex structures and paradigms, recursive complexity and back-propagation, etc.). Clearly, when one can predict with a fair amount of certainty where problems will arise, and then attempt to account for the nature of the problem in terms of the model, then surely the criterion of explanatory value is close to being met. For example, the now conceded fact that 'hidden units' must be pre-installed (p.c. Jeff Elman, as part of the innate apparatus) in order for the full complexity of language to be processed by any PDP model speaks volumes, I believe, about where we stand today in explanatory value--in fact, hidden units have now become the main rallying cry for those who argue for rule-based accounts of language (not to mention the nativists among us; see the contentious debates between Marcus and Elman on this matter).
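Why hidden units matter can be illustrated with the textbook case Minsky and Papert raised against Rosenblatt's perceptron: exclusive-or (XOR) is not linearly separable, so no single layer of weights plus a threshold can compute it, whereas AND can be learned by simple error correction. The sketch below is that standard illustration, not the Rumelhart and McClelland model; treating XOR as a stand-in for the text's '2-degree complex structures' is our assumption. The hidden-unit network has its weights set by hand rather than learned by back-propagation, which loosely mirrors the idea of 'pre-installed' hidden units.

```python
# Hypothetical illustration (not from the source): a single-layer perceptron
# versus a hand-wired network with two hidden units.


def step(net: float) -> int:
    """Threshold unit: fires (1) only if its net input exceeds zero."""
    return 1 if net > 0 else 0


def train_perceptron(data, epochs=50, lr=1.0):
    """Rosenblatt-style error-correction learning with no hidden units."""
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            err = target - step(w1 * x1 + w2 * x2 + b)
            if err != 0:
                # Nudge each weight in the direction that reduces the error.
                w1 += lr * err * x1
                w2 += lr * err * x2
                b += lr * err
                errors += 1
        if errors == 0:  # every pattern classified correctly: stop early
            break
    return w1, w2, b


def accuracy(weights, data):
    """Fraction of patterns the trained single-layer unit gets right."""
    w1, w2, b = weights
    hits = sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in data)
    return hits / len(data)


def xor_with_hidden_units(x1: int, x2: int) -> int:
    """Hand-set weights: hidden unit h1 computes OR, h2 computes NAND,
    and the output unit computes the AND of the two hidden units."""
    h1 = step(x1 + x2 - 0.5)
    h2 = step(-x1 - x2 + 1.5)
    return step(h1 + h2 - 1.5)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    print("AND, no hidden units:", accuracy(train_perceptron(AND), AND))
    print("XOR, no hidden units:", accuracy(train_perceptron(XOR), XOR))
    print("XOR, two hidden units:", [xor_with_hidden_units(*x) for x, _ in XOR])
```

However many epochs are allowed, the single-layer unit cannot reach full accuracy on XOR, while adding just two hidden units makes the mapping computable.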
Finally, the typical intransigence that often shapes and defines opposing views has given way to a certain amount of movement leading to a partial compromise between the two leading schools of thought--as called for by Steven Pinker and Alan Prince. Specifically, Pinker & Prince's somewhat tentative and partial acceptance of a connectionist model for only certain types of lexical processes has, if nothing else, in turn buttressed their own allegiances in upholding counter-claims against proponents of a pure 'Single Mechanism Model' (strictly based on associative learning). And so, out of this twist of fate, a renewed and rejuvenated interest in rule-driven processes has been gathering momentum, seeking more narrowly confined rule-based analogies for dealing with specific aspects of language/grammar as a whole.
As suggested by Newell in the quote above, long-standing dichotomies often provide a variety of clever means to think about a wide range of topics. It goes without saying that, as a pedagogical device at least, students not only crave a good debate, but, more importantly, often report that new material introduced in the form of a debate yields a much higher level of understanding. Well, this singular debate has been ongoing for centuries, masked under several different labels: nature vs. nurture, innate vs. learned, hard-wired vs. soft-wired abilities, instinct vs. learning, genetic vs. environment, top-down vs. bottom-up strategies, and, as presented herein, the Single vs. Dual Mechanism Model.
[1]. It is a fact that children do not produce 'adult-like' utterances from the very beginning of their multi-word speech. And so much of the ongoing debate in child first language acquisition has been devoted to the nature and extent of 'What gets missed out where'. Theory-internal measures have been spawned every which way in an effort to account for the apparent lack of adult-like language in young children--theories abound. Despite some evidence that would seem to point to the contrary, more robust syntactic theories continue to view the very young child as maintaining, from the outset, an operative level of language closely bound to abstract knowledge of grammatical categories (Pinker 1984, Hyams 1986, Radford 1990, Wexler 1996). For instance, Pinker (1996) has described early language production in terms of a first-order (general nativist) cognitive account, suggesting a processing 'bottleneck' effect, attributed to limited high-scope memory, to account for the child's truncated syntax of Tense/Agr/Transitive errors (e.g., Her want) and over-application Tense errors (e.g., Does it rolls?). Radford (1990), on the other hand, has maintained a second-order (special nativist) maturational account affecting syntactic complexity in order to explain the same lack of adult-like speech. It should be noted that these two nativist positions share a common bond in that they are reactions to much of what was bad coming on the heels of work done in the 1970s--theories which sought to account for such errors on a purely semantic level, e.g., Bloom (1975), Braine (1976) and, to some extent, Bowerman (1973). Steering away from potentially non-nativist associative/semantic-based accounts toward proper syntactic-based accounts was viewed by most as a timely paradigm shift--acting as a safeguard against what might be construed as bad-science Behaviorism (of the purely semantic kind). This shift brought us toward a more accurate 'Nativist' stance, swinging the Plato vs. Aristotle debate back to Plato's side, at least for the time being (as witnessed in Chomsky's book 'Cartesian Linguistics')--a move in keeping with what was then coming down the pike in Chomskyan linguistics. One thing that seems to have caught the imagination of developmental linguists in recent years has been to question again the actual infrastructure of the child-brain that produces this sort of immature grammar--namely, a rejuvenated devotion has reappeared in the literature, circumscribing new understandings of age-old questions dealing with Theory of the Brain/Mind.
[2]. For instance, proponents of Behavioral/Associationist Connectionism today (cf. Jeff Elman, Kim Plunkett, Elizabeth Bates, among others) are more than ready to relinquish the old Chomskyan perspective on special nativism ('special' in that language is viewed as coming from an autonomous region in the brain, unconnected to general cognition or other motor skill development, pace Piaget, and as opposed to general nativism), and have rather shifted their focus to an innateness hypothesis based not on natural language (per se) but on a type of innateness based on the actual architecture itself that generates language (architecture meaning brain/mind: viz., an innate Architecture, and not an innate Universal Grammar).
[3]. For Chomsky, it was this autonomous Language Faculty (which he refers to as a language organ) that allowed innate language knowledge to thrive and generate grammar. For the connectionist movement, it is the very architecture itself that is of interest--the input/output language result being a mere product of this perfected apparatus. So, in brief, the debate over innateness has taken on a whole new meaning--today perhaps best illustrated by the narrower debate over General vs. Special Nativism. We shall forgo the meticulous details of the specific theories at hand and restrict ourselves to the rather prosaic observation that the child's first (G)rammar (G1) is not at all congruent with the adult (T)arget grammar (Gt). Notwithstanding myriad accounts and explanations for this, for the main part of this paper let it suffice simply to examine the idea that the two grammars (child and adult)--and we do consider them two autonomous and separate grammars--must partake in some amount of Discontinuity (G1 is not equal to Gt, or G1 ≠ Gt), and that such a discontinuity must be stated as the null hypothesis, tethered to maturational/biological differences in the brain. Hence, G1 represents the (B)rain at B1 (along a maturational sequence B1, B2, B3 ... Bt), while Gt represents the brain at Bt.
[4]. Discontinuity theories have at their disposal a very powerful weapon in fighting off Continuity theories--whether language-based or biology-based (noting that for Chomsky, the study of Language, for all intents and purposes, reduces to the study of biology). This great weapon is the natural occurrence of maturational factors in learning. In fact, on a biological level, maturation is taken to be the null hypothesis--whether it be the emergence and consequent loss of baby teeth, learning how to walk and talk, or the onset of puberty. Much like these milestones, what the adult achieves can be attributed to the onset of some kind of scheduled-learning timetable--for language, it's an achievement mirroring a process in which the nature and level of syntactic sophistication, and its allocation, is governed in accordance with how the brain, at the given stage, is able to handle the input.
[5]. It is common knowledge that (abstract) grammatical relations are frequently a problem for language acquisition systems. An early reflection on this was made by Brown when he discovered that one could not explain why some grammatical morphemes were acquired later than others simply in terms of input. The question was posed as follows: if all morphemes are equally present in the ambient input at roughly the same time (contrary to what might be believed, parents' speech toward their children is seldom censored so as to bring about a reduced mode of grammatical communication/comprehension), then what might account for the observed asymmetrical learning? Similarly, Pienemann (1985, 1988, 1989) has made claims for a grammatical sequencing of second language learning based on morphological complexity. This question led to early notions of a linguistic maturational timetable, much like what Piaget would have talked about regarding the child's staged cognitive development--maturation being the only way to address such staged development. Likewise, a Chomskyan position would have it that there must be something intervening in the child's (inner) brain/mind (albeit not tied to cognition) that brings about the asymmetrical learning, since there's no change in the (outer) input. Well, one of the first observations uncovered by Brown was that a child's linguistic stage-1 (with a mean length of utterance (MLU) lower than 2) went without formal functional grammar. Brown noted that an initial telegraphic stage of learning ensued, absent abstract grammatical markers such as Inflection, Case and/or Agreement.
[6]. Constructivism vs. Generativism: A Brief Summary
Constructivists' accounts assume that children's grammatical knowledge initially consists of constructions based on high-frequency forms in the input. Their models assume polysemy in representation, since lexemes are viewed as being stored in a distributional network in order to encode different meanings: sound-to-meaning links are therefore made on the basis of similar phonological-to-semantic distributions. Furthermore, it is their general claim that such a correlation is strictly associative, and that it holds between the quantity and quality of the exemplars obtained for particular constructions and the more general schemas that underlie language use. The constructivist model assumes a 'bottom-up' cognitive scaffolding of language learning (somewhat akin to what Piaget had earlier claimed regarding a cognitive underpinning to language development).
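The frequency-driven, bottom-up side of this claim can be made concrete with a toy sketch, assuming that a 'construction' is simply a two-word frame with one open slot (in the spirit of item-based 'pivot' schemas such as *more _* or *_ gone*), and that a frame is retained once it has been heard a minimum number of times. The frame format, the threshold of 2 and the invented input utterances are all hypothetical; the sketch models only form, not the semantic side of the sound-to-meaning links described above.

```python
from collections import Counter

SLOT = "_"  # marks the open position in a frame


def learn_frames(utterances, threshold=2):
    """Count one-slot, two-word frames ('more _', '_ gone') in the input and
    keep those heard at least `threshold` times as item-based constructions."""
    counts = Counter()
    for utterance in utterances:
        words = utterance.lower().split()
        for i, word in enumerate(words):
            if i + 1 < len(words):  # word has a right neighbour: 'word _'
                counts[(word, SLOT)] += 1
            if i > 0:  # word has a left neighbour: '_ word'
                counts[(SLOT, word)] += 1
    return {frame: n for frame, n in counts.items() if n >= threshold}


def produce(frame, filler):
    """Fill the open slot of a stored frame with a new word."""
    return " ".join(filler if part == SLOT else part for part in frame)


# Invented child-directed input, for illustration only.
INPUT = ["more juice", "more milk", "want more juice", "daddy gone", "juice gone"]

if __name__ == "__main__":
    frames = learn_frames(INPUT)
    print(frames)
    print(produce(("more", SLOT), "cookie"))
```

On this input only *more _* (3 tokens), *_ juice* and *_ gone* (2 tokens each) survive the threshold. A retained frame can then be extended to an unheard filler (*more cookie*) without any category-general rule, which is the kind of lexically specific generalization constructivists attribute to stage-1.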
Generativists' accounts, on the other hand, differ from constructivist models in one very simple respect--their models credit children (very early on in their speech development) with tacit syntactic knowledge, unrelated in any way to the frequency-based, data-driven constructivist claims which define language as being tethered in some way to cognition. Generativists in this sense draw on parameter-setting mechanisms (as opposed to data-driven mechanisms) to account for language growth. Generativists maintain two versions of a general language development model; both versions speak to a more nativist (top-down) account of language acquisition. The first version is represented herein by Wexler's O(ptional) I(nfinitive) model (ibid). The OI model credits children, from the very earliest stages of development, with abstract knowledge of morphological inflection. According to OI accounts, children have access to inflection. The fact that inflections may optionally project (at stage-1) speaks to matters of specific feature spell-outs of the phrasal projections (i.e., all inflectional phrases project; it is rather the features pertaining to the phrases that may go un(der)specified and thus not project). The second model, associated with Radford (Radford & Galasso ibid.), claims that children may initially produce some early inflection, but that there is evidence that the child may not be processing such attested inflection in a truly syntactic way (children at this early stage may in fact be treating inflections in a non-syntactic/derivational manner). In addition to this claim, the general idea here is that a very early grammatical stage indeed exists in which one finds no true syntactic processing in the child's speech (i.e., there is a 'No-Inflectional' stage-1). What is of interest to us here regarding Radford's 'No Functional stage' model (Radford 1990) is that it readily overlaps with constructivists' claims for their stage-1 as well. Specifically, it has become customary for constructivists to say that although they believe there is no syntax at their early stage-1, children's grammar is indeed protracted, and that those 'abstract rules' which underwrite syntax proper eventually do emerge at a later stage in the course of the child's language development. Hence, it would seem that Radford's version and the constructivists' version might converge and agree regarding the earliest stage of development. Both models predict similar stages of development (viz., a stage-1 void of any inflection). Though this concord of predictions appears to be true empirically, theoretical concerns are real and would continue to weigh heavily on the minds of linguists, thus undercutting any feeble attempt to reconcile the two positions.
Constructivism, and beyond. One consequence of this style of learning was that children were considered to learn by rote methods, by associative means similar to those Skinner had earlier advocated in Behaviorism. For a very early stage-1, it was somewhat tentatively implied that children did not start learning language as a set of abstract rules of logic (as Chomsky would have us believe in his notion of generative grammar), but that they would first grapple with the linguistic input by gathering data-driven patterns and constructing broad-range syntactic templates based on distributional analyses of those patterns (a kind of first-order frequency learning). Only later, say at a stage-2 of language acquisition, would children start to employ Chomskyan-style rules to generate a target grammar (for a consequence of this, see 'U-shape learning' discussed in §60). The benchmarks of development thus followed: (i) recognition of patterns comes first (no attested phonological/morpho-syntactic over-regularizations); (ii) abstraction of the patterns comes after (attested phonological/morpho-syntactic over-regularizations). Data-driven analogies fit well with recently proposed computational models of syntactic acquisition, in which children initially form syntactic templates on the basis of distributional analyses of linguistic input (Cartwright & Brent 1997). Data-driven models trace their antecedents back to the 1960s. For example, Bellugi (1967), Klima and Bellugi (1966), and Braine (1963) initially allowed a certain amount of formulaic misanalysis to enter into the account of non-adult-like stage-1 structures. In a contemporary about-face from much of what had been advocated in the Parameter theory of the 1980s, Rowland and Pine (2000), among others, have returned to these 1960s accounts by similarly calling first on bottom-up, data-driven procedures for securing potential syntactic paradigms. In such constructivist terms, children do not have any general (rule-driven) knowledge of syntactic categories, at least not until they have acquired enough similar templates from which they can abstract a general pattern. This model would readily explain why over-regularizations tend not to occur very early in children's speech: if the stage in question employs no rules, then, by definition, no over-regularization of rules can occur. (It is suggested in this context that the onset of over-regularization, as attested in the data, marks the later rule-based stage-2 of development.) It has been suggested that what one means by 'until they have acquired enough similar templates' is that a frequency-based storage threshold may be at work, converting an overburdened data-driven analysis into rule-based abstraction: i.e., a kind of Critical Mass Hypothesis, which holds that an eventual rule-driven grammar requires a certain quantitative 'tipping point' to be reached in moving from (i) a precise number of patterns to (ii) a general abstraction of those patterns. Without a compilation of data, no abstraction can be achieved: children must acquire a sufficient number of exemplars before abstracting general patterns from them can be productive. (See §§26, 27 'Less is More hypothesis'.)
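The Critical Mass Hypothesis can be made concrete with a toy learner. What follows is a minimal Python sketch under stated assumptions, not a model proposed by any of the authors cited here: the threshold value of 5, the class and method names, and the restriction to English past-tense -ed are all hypothetical. It shows the predicted logic of the two stages: below the threshold the learner can only retrieve stored chunks, so no over-regularization is possible; once the threshold is crossed, the abstracted rule applies to any unstored stem, including irregulars such as throw.

```python
# Hypothetical threshold: how many distinct stems must show the same
# stem + "ed" pattern before the learner abstracts a productive rule.
# The value 5 is an illustrative assumption, not an empirical estimate.
CRITICAL_MASS = 5


class CriticalMassLearner:
    """Toy learner: stores heard forms as rote chunks (stage-1) and abstracts
    a stem + "ed" rule only once a critical mass of exemplars is stored (stage-2)."""

    def __init__(self, threshold=CRITICAL_MASS):
        self.threshold = threshold
        self.memory = {}  # stem -> rote-learned past-tense form

    def hear(self, stem, past):
        """Memorize an input pair as an unanalyzed exemplar."""
        self.memory[stem] = past

    def rule_is_productive(self):
        """True once enough distinct stored stems share the stem + "ed" pattern."""
        regular = sum(1 for stem, past in self.memory.items() if past == stem + "ed")
        return regular >= self.threshold

    def past(self, stem):
        """Return a past-tense form, or None if the learner has nothing to produce."""
        if stem in self.memory:
            return self.memory[stem]  # rote retrieval of a stored chunk
        if self.rule_is_productive():
            return stem + "ed"        # abstracted rule applies to any unstored stem
        return None                   # no rule yet, so nothing can be over-regularized


learner = CriticalMassLearner()
learner.hear("go", "went")
for verb in ["walk", "jump", "play"]:
    learner.hear(verb, verb + "ed")
print(learner.past("blick"), learner.past("go"))     # None went
for verb in ["kick", "push"]:
    learner.hear(verb, verb + "ed")
print(learner.past("blick"), learner.past("throw"))  # blicked throwed
```

The sketch deliberately leaves out any blocking mechanism: a stored irregular such as went is always retrieved, so over-regularization here arises only for stems the learner has never stored.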
[7]. For instance, Rowland & Pine (op. cit.) suggest that early Subject-Auxiliary inversion errors such as *What he can ride in? (alongside optional target structures showing correct inversion, What can he ride in?) cannot be accounted for by a rule-driven theory: if the child has access to the rule, the theory would then have to explain why the child sometimes applies the rule and sometimes fails to apply it. Rowland & Pine instead propose that, as a very early strategy for dealing with complex grammar (e.g., Aux. Inversion, Wh-fronting), children learn these semi-grammatical slots as lexical chunks (a sort of lexicalized grammar), whereby they establish formulaic word combinations: e.g., Wh-word + Auxiliary as opposed to Auxiliary + Wh-word combinations. It was shown that error rate and optionality (rather than rule-driven mechanisms) correlated highly with the high vs. low frequency of certain combinations in the child's input. This early non-rule-based strategy was then able to account for a vast array of the child data, viz., the significantly higher rate of non-inverted than of inverted Auxiliaries at the initial stage-1 of development. As an example of a non-rule-based account, they show that when inversions did occur, they typically involved only a select few Wh-words, and not the entire class. Hyams (1986, p. 85) somewhat agrees with such a reduced structure when she asserts the following: 'By hypothesis, the modals (or Aux. Verbs) are unanalyzable during this period.'
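The chunk-based account of inversion can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions, not Rowland & Pine's actual model: the input counts are invented, and the threshold, the function names stored_chunks and ask, and the rule that a Wh+Aux pair is either stored or not are all hypothetical simplifications. A frequent pair (where + can) is stored as a chunk and so yields the target inverted order, while an infrequent pair (what + can) is not, so the same child produces the attested error *What he can ride in?

```python
from collections import Counter

# Hypothetical counts of Wh-word + Auxiliary pairs in the child's input.
# These numbers are invented for illustration; they are not data.
INPUT_PAIRS = Counter({("where", "can"): 40, ("what", "can"): 3})

# Assumed frequency threshold for storing a Wh+Aux pair as a lexical chunk.
CHUNK_THRESHOLD = 10


def stored_chunks(pairs, threshold=CHUNK_THRESHOLD):
    """Wh+Aux pairs frequent enough in the input to be stored as chunks."""
    return {pair for pair, count in pairs.items() if count >= threshold}


def ask(wh, aux, subject, rest, chunks):
    """Build a question; the Aux precedes the subject only inside a stored chunk."""
    if (wh, aux) in chunks:
        words = [wh, aux, subject, rest]  # chunk yields the target 'inverted' order
    else:
        words = [wh, subject, aux, rest]  # no chunk: non-inverted order
    question = " ".join(words)
    return question[0].upper() + question[1:] + "?"


chunks = stored_chunks(INPUT_PAIRS)
print(ask("where", "can", "he", "ride in", chunks))  # Where can he ride in?
print(ask("what", "can", "he", "ride in", chunks))   # What he can ride in?
```

No inversion rule appears anywhere in the sketch; the apparent optionality falls out of which pairs happen to cross the frequency threshold.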
[8]. Moreover, such claims accord strongly with Stromswold's (1990) statistical data analyses, which demonstrate that children at a very early stage-1 might not productively realize an utterance string containing [don't, can't], e.g., I/me [don't] want, You [can't] play, as the syntactic elements [{Aux} + clitic{n't}], but rather realize such strings in a more limited way, as quasi-formulaic representations of a negative element. In other words, the claim could be extended to mean that for the child at this stage-1, the lexical item don't/can't reduces to the one-to-one sound-meaning of not: e.g., Robin [don't] [=no(t)] play with pens (Adam28), where the verbal inflection {-s} goes missing since don't isn't analyzed as an Aux Verb. (Though see Schütze (2001) for some arguments against this position.) Likewise, Brown came to similar tentative conclusions by recognizing that (i) verbal inflection seemed not to be generalized across all verbs in the initial stages, and therefore that (ii) children didn't really start with rules, but rather employed a strategy of 'lexical learning'. Early stage-1 inflected verbs might then be learned as separate verbs (chunks), thus explaining the observable optionality, since, as the story was then told, 'either you know a rule, and so you always apply it, or you don't'. Optionality of verbal inflection was thus seen as the outcome of two separate processes of word acquisition in the brain: uninflected and inflected words were stored as two different items in the lexicon. (See Bloom 1980 for comments.) This notion of stage-1 learning via non-rule-based means implied that the stage was formulaic, and set up in such a way as to learn by associative processes buttressed by frequency learning.
[9]. Having spelled out some of the issues surrounding Constructivism vs. Generativism, one major question seems to prevail throughout: How might it be possible to bridge the gap between associative/semantic relations and abstract/formal categories? One way to answer the question might be to stipulate that whatever mechanism generativists cling to in their account of syntactic development, proponents of a Converging Theories Model (based on the Dual Mechanism Model) likewise invoke: in accepting a strong maturational perspective, we are able to take the best of both positions (i.e., no explanation need be posited outside of what remains the traditional generative stance). What the converging theories model offers is a middle-of-the-road theory which suggests that a maturational stage-1 of development is universally maintained, irrespective of whether one adheres to a generative or a constructivist stance. Theory-internal measures aside, a universal biological account of brain development applies equally to both models.
The Dual Mechanism Model
[10]. It has recently been hypothesized that the language faculty consists of a dualistic modular structure made up of two basic components: (i) a Lexical component, which has to do with formulating lexical entries (words), and (ii) a Computational component, which is structured along the lines of algorithmic logic (in the Chomskyan sense of being able to generate a rule-based grammar). It is argued that these two very different modes of language processing reflect the 'low-scope' (1st order) vs. 'high-scope' (2nd order) dichotomy that all natural languages share. Low/High scope is described here in terms of how and where certain aspects of language get processed in the brain (see also section [§64] on brain studies). In addition to newly enhanced CT brain imaging devices, multidisciplinary data (e.g., linguistic, psychological and biological) are starting to trickle in, providing evidence that a dual mechanism is at work in processing language. Experimental results indicate that only a dual mechanism can account for the distinct processing differences found between the formation of irregular inflected words (e.g., go>went, foot>feet) and of regular inflected words (e.g., stop>stopped, hand>hands). The former (lexical) process seems to generate its structure in terms of stored memory, drawing on the mental lexicon itself by mere associative means; these measures are roughly akin to earlier Behaviorist ideas on frequency learning. The latter, regular mode of generating structure is tethered to a Chomskyan paradigm of (regular) rule-driven grammar: the more creative, productive aspect of language/grammar generation. Such regular rules can be expressed as [Stem]+[affix] representations, where a stem constitutes any variable word <X> (old or novel) that must fit within the proper categorization (part of speech). For instance, using a simplified version of Aronoff's realization pair format (1994, as cited in Clahsen 2001, p. 11), the cited differences in parsing between (i) a regular [Stem + affix] (decomposed) construction and (ii) an irregular copular 'Be' [Stem] (full-form) lexical item can be notated as follows:
a. <[V, 3sg, pres, ind], X+s>
b. <[V, 3sg, pres, ind, BE], is>
The regular 3rd-person/singular/present rule in (a) spells out the bracketed functional INFLectional features of Tense/Agreement by adding the exponent 's' to the base variable stem 'X'. The features in (b) are likewise spelled out, but rather than in the form of an exponent, they are built into the lexeme 'BE' by the constant form is. Once the more specific, irregular rule is activated, the default regular rule-based spell-out is blocked, preventing the overgeneralization *bes.
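The blocking relation between (a) and (b) can be sketched as a lookup in which the more specific realization pair is consulted before the default. This minimal Python sketch assumes feature bundles are represented as tuples and stored irregular pairs as a dictionary; the function name spell_out_3sg and the extra HAVE entry are illustrative additions, not part of Aronoff's or Clahsen's notation, and orthographic adjustments (goes, watches) are ignored.

```python
# Regular realization pair (a): <[V, 3sg, pres, ind], X+s>
REGULAR_FEATURES = ("V", "3sg", "pres", "ind")

# Irregular realization pairs in the style of (b): the lexeme is part of the
# feature bundle, so these entries are more specific than (a).
IRREGULAR_PAIRS = {
    REGULAR_FEATURES + ("BE",): "is",
    REGULAR_FEATURES + ("HAVE",): "has",
}


def spell_out_3sg(stem):
    """Spell out [V, 3sg, pres, ind] for a verb stem, letting the more
    specific irregular pair block the default regular rule."""
    specific = REGULAR_FEATURES + (stem.upper(),)
    if specific in IRREGULAR_PAIRS:
        return IRREGULAR_PAIRS[specific]  # (b) wins: blocks *bes, *haves
    return stem + "s"                     # (a): X+s for any stem X, old or novel


for verb in ["walk", "be", "have", "wug"]:
    print(verb, "->", spell_out_3sg(verb))
# walk -> walks, be -> is, have -> has, wug -> wugs
```

Because the default applies to any variable stem X, a novel verb such as wug is inflected as readily as a familiar one, which is the productivity the text attributes to true rules.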
[11]. INFLection. Recent research conducted by Pinker (MIT) and Clahsen et al. (Essex), among others, has shown that a dual learning mechanism might be at work in the acquisition of a first language. The research first focuses on terminology. It is said that there are two kinds of rules for Inflection: an Inflection based on lexical rules, and an Inflection based on combinatory rules. In short, the types of rules are described as follows:
(i) Lexical Rules: Lexical rules (or lexical redundancy rules) are embedded in the lexical items themselves ('bottom-up'). Lexical rules may be reduced to simple sound rules somewhat akin to statistical learning; for instance, associative regularities are built up from the sequencing of lexical items, e.g., the sing>sang>sung -> ring>rang>rung sequencing of an infix (vowel-change) inflection (presented below).
(ii) True Rules: Word inflection of the former type (i.e., lexical rules) is cited as an inflection not based on rules, but rather encoded in the lexical item itself. True Rule (or affixation), on the other hand, is a combinatory symbolic process based on variables ('top-down'): a creative endeavor not bound by associative input. Whereas lexical-based inflections are triggered exclusively by frequency and associative learning methods (i.e., they do not lend themselves to the creative inflection of novel words), novel word inflection is generated (by default) once the true rule-based grammar is in place. One simple example that Pinker and Clahsen give to illustrate lexical/associative Inflection is the irregular verb construction below:
[12]. Irregular Verb Constructions: The #ing>#ang>#ung paradigm
Table 1
| | Present | Past | Past participle |
| a. | sing | sang | sung |
| b. | ring | rang | rung |
| c. | bring | *brang | *brung |
The cause of this commonly made error in (12c) is that the phonological patterning of the rhyme #ing>#ang>#ung, as a quasi-past-tense infix (lexical-rule) form, is so strong that it often overrides and outstrips the default regular (true-rule) V+{ed} inflection for past tense. (Spanish offers many similar examples where the frequency of regular verbs affects the paradigm, such as the over-generalization of the (incorrect) regular inflection *romp-ido in place of the irregular (correct) roto (=broken).) (* marks ungrammatical structures.)
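The overriding effect of the rhyme pattern can be illustrated with a toy model of associative generalization. The following is a minimal Python sketch, not Pinker's or Clahsen's model: the stored verb list, the letter-based notion of 'rime' (an orthographic stand-in for phonology), and the ordering in which analogy is consulted before the default -ed rule are all assumptions made for illustration. Because analogy is tried first, bring surfaces as brang/brung, reproducing (12c), while a verb with no rhyming neighbor, such as walk, falls through to the regular V+{ed} rule.

```python
VOWELS = set("aeiou")

# Hypothetical associative memory: stored irregular (present, past, participle)
# triples sharing the #ing>#ang>#ung rhyme pattern of Table 1.
IRREGULAR_MEMORY = [
    ("sing", "sang", "sung"),
    ("ring", "rang", "rung"),
    ("spring", "sprang", "sprung"),
]


def rime(word):
    """Final vowel group plus following consonants, e.g. 'bring' -> 'ing'."""
    i = len(word) - 1
    while i >= 0 and word[i] not in VOWELS:
        i -= 1
    if i < 0:
        return word  # no vowel letter: treat the whole word as the rime
    while i > 0 and word[i - 1] in VOWELS:
        i -= 1
    return word[i:]


def past_forms(verb):
    """(past, participle) by retrieval, then by rhyme analogy, then by default -ed."""
    for present, past, participle in IRREGULAR_MEMORY:
        if verb == present:
            return past, participle  # direct retrieval of a stored form
    for present, past, participle in IRREGULAR_MEMORY:
        if rime(verb) == rime(present):
            onset = verb[: len(verb) - len(rime(verb))]
            # analogy: keep the new verb's onset, copy the stored vowel change
            return onset + rime(past), onset + rime(participle)
    return verb + "ed", verb + "ed"  # default regular (true-rule) V+{ed}


for verb in ["sing", "bring", "walk"]:
    print(verb, past_forms(verb))
```

Letting analogy always win exaggerates the effect for clarity; a fuller model would weight the analogy by the frequency and similarity of the stored neighbors rather than applying it categorically.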
[13]. The erroneously over-generated patterns bring>*brang>*brung (for English) and *romp-ido (for Spanish) are heavily based on statistical frequency learning, in the sense that the sound sequences of other patterns (e.g., ring>rang>rung, and the infinitive verb class V-{er}, respectively) contribute to the associative patterning (a frequency effect forming the irregular sound-pattern rule in the former example and a default regular rule in the latter). Recall that structured lexical/associative learning merely generalizes, by analogy, to novel words that are similar to existing ones. Regular grammatical rules (true rules), on the other hand, based on affixation, may apply across the board to any given (variable) member of a syntactic category, be it similar or otherwise. In one sense, the ultimate character of 'true rules' is that they break the iconic representation of more primitive, associative-based processes, whether neuropsychological or otherwise.
[14]. The fact that the actual over-generalized strings (bring>brang>brung) are not found in the input demonstrates that some aspect of a rule is evoked here, albeit a rule based on rhyme association, and thus not a 'pure rule' in which true (non-associative) variables would be at work. In other words, these lexical rules attributed to irregular formations are to be generalized as a form of associative pattern learning, and not as a true rule, since they are associated with sound sequencing only. One crucial implication of an Inflection generated by a true rule is that such inflection can easily be applied to novel or unusual words, viz., words never before heard in the input (contrary to the frequency learning of lexical rules discussed above; cf. Brown (1957), Berko (1958)).
[15]. Expanding
on previous studies which examined differences in priming effects between Derivational and Inflectional morphology,
Clahsen concludes that the difference in priming effects can only be accounted
for by a dual mechanism of learning--interpreting the data to show that high
priming effects were connected with productive inflectional forms not
listed in the mental lexicon, whereas low priming effects were connected to productive derivational forms
associated with stem entries.
[16]. With regard to German forms of pluralization, Clahsen et al. (p. 21) note that the same argument can be made for a dual mechanism process: the high-priming regular (default) plural '-s' (auto-s) contrasts with the low priming of the irregular plural '-er' (kind-er). The raw findings here suggest that certain irregular inflections in German (e.g., participle {-n}, plural {-er}) might be stored in the lexicon as undecomposed chunks, and that these two processes of storage are activated in very different places and manners in the brain. That is, the finding that irregular inflections spawn reduced priming compared to regular inflection suggests that regularly inflected forms are built by rules containing variables, which make the basic unmarked stem/root available for priming. It is clear from the table below that regularly inflected word forms such as {-t} participles and {-s} plurals produce full priming and no full-form frequency effects. For irregularly inflected forms such as {-n} participles, {-er} plurals and (irregular) {-n} plurals, the opposite pattern appears. The data suggest that irregular forms are stored as undecomposed stems, hence the emergence of full-form frequency effects. Regular forms are captured by the full rule process and are computed in a manner that works off variable+stem algorithms, hence the lack of full-form frequency effects. These differences in German morphology seem to parallel what we find between English (i) Inflectional morphology and (ii) Derivational morphology, where the former seeks out specific rule formulations (e.g., V + {ed} = Past, or N + {s} = Plural) and the latter seeks out associative-style sound-to-meaning learning approaches (as in irregular verbs/nouns, e.g., go>went, tooth>teeth). From fMRI brain-imaging studies, a consensus has begun to emerge suggesting that derived stems + suffixes (e.g., teach+{er}) may actually be stored and processed as one single word chunk in the lexical (word-recognition) temporal-lobe areas of the brain, and not, as intuition would have us believe, as a dual segmented [stem + suffix] lexical structure which has undergone a process much like a morpho-syntactic string. This may be an economical move in keeping with the classic one-sound-one-meaning association. In noting this, there seems to be a natural tendency in the diachronic study of language to move from (i) rule-driven Inflectional morphology, with its more complex rule-driven infrastructure [+Comp] (Comp = complex), to (ii) association-driven Derivational morphology, with its less complex [-Comp] structure. This tendency can easily be captured by looking at the way words have evolved over time: e.g., Breakfast /breːkfæst/ has evolved from a twin-morpheme structure [[Verb Break] + [Noun Fast]] to Breakfast /breːkfɪst/ [Noun Breakfast], a single morpheme chunk.
Table 2. Summary of experimental effects (taken from Clahsen et al. 2001, p. 26)
| Representation | Full priming effect? | Full-form frequency effect? | Source |
| -t participles: ge[kauf]-t | yes | no | Sonnenstuhl et al. (1999), Clahsen et al. (1997) |
| -s plurals: [auto]-s | yes | no | Sonnenstuhl & Huth (2001), Clahsen et al. (1997) |
| -er plurals: [kinder] | no | yes | Sonnenstuhl & Huth (2001), Clahsen et al. (1997) |
| -n participles: [gelogen] | no | yes | Sonnenstuhl et al. (1999), Clahsen et al. (1997) |
| -n plurals I: [bauern] | no | yes | Sonnenstuhl & Huth (2001) |
| -ung nominalizations: [[stift]ung] | yes | yes | Clahsen et al. (2001) |
| diminutives: [[kind]chen] | yes | yes | Clahsen et al. (2001) |
| -n plurals II: [[tasche]n] | yes | yes | Sonnenstuhl & Huth (2001) |
[17]. In sum, Pinker and Clahsen assume that the language faculty has a dual architecture comprising (i) a combinatory rule-based lexicon (leading to the lack of full-form effects) and (ii) a structured non-rule-based lexicon (leading to full-form effects). Questions on specifics will surface in the following sections, namely: How are these two methods represented in the brain?
[18]. A Stage-1 in Language Acquisition. There is a huge and ever-growing body of data being tallied by developmental linguists suggesting that the brain of a child matures in incremental ways which, among other things, are reflected in the types of 'staged' language development the child produces at a given maturational stage. The collected data suggest that children's early multi-word speech demonstrates 'Low-Scope' lexical-specific knowledge, and not the abstract true-rule formulations attributed to grammar. This is somewhat akin to Piagetian notions of language development (see general nativism [§31] below), one difference being that it need not be tied here exclusively to a cognitive apparatus. This maturational theory of language development accounts for the lack of specific linguistic properties by suggesting that the brain is not yet ready to handle higher and more abstract (High-Scope) forms of linguistic conceptualization.
[19]. The idea behind 'What gets missed out where' in child speech production has given linguists interested in morphology and syntax a particularly good peek at how a child's brain might go about processing linguistic information (and other information, for that matter). As stated above, research initially carried out by Brown and his team (1973), working under a Chomskyan paradigm of linguistic theory, and subsequent work by others (cf. Radford) suggest that there is a stage-1 in language acquisition that tightly constrains the child's speech to simple one-to-two-word utterances with no productive forms of verb or noun inflection. One child who appears in the early studies, Allison, provides transcripts between 16 and 19 months showing no signs of the onset of formal inflectional grammar; only later, close to two years of age (22-24 months), does inflectional grammar/syntax emerge, and then only in what could be described as a sporadic, optional manner.
[20]. This stage-1 is considered to be a grammatical stage with an MLUw (Mean Length of Utterance in words) of 2 words or less. More specifically, given the apparent lack of formal grammar, this stage shouldn't be confused with the idea of an earlier a-grammatical stage well before the onset of multi-word speech. (Surely, there can be no grammar or syntax to speak of if there are no multi-word constructions.) This grammatical stage-1 therefore differs from the notion of a one-word stage (MLU=1), where supposedly no grammar/syntax at all is at work. The grammatical stage-1 is said to begin roughly with the onset of multi-word speech at about 18 months (+/-20%). It is reasonable to suppose that such a stage would have target semantic meaning, even though, say, the arbitrary 'one-to-one sound-to-meaning' relationship is not of the target type (e.g., onomatopoeic forms /wuwu/ = dog, /miau-miau/ = cat, etc.).
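The MLUw cut-offs above amount to simple arithmetic: total words divided by number of utterances. This minimal Python sketch assumes whitespace-separated tokens count as words; the function names and the sample utterances are invented for illustration. Note that Brown's own MLU was counted in morphemes rather than words, so the sketch follows the word-based MLUw used in the text, with the stage labels taken from the cut-offs given there.

```python
def mlu_words(utterances):
    """Mean Length of Utterance in words (MLUw): total words / number of utterances."""
    if not utterances:
        raise ValueError("MLUw needs at least one utterance")
    return sum(len(u.split()) for u in utterances) / len(utterances)


def stage(utterances):
    """Rough staging by MLUw, following the cut-offs given in the text."""
    mlu = mlu_words(utterances)
    if mlu == 1:
        return "one-word stage"  # MLU = 1: no multi-word combinations at all
    if mlu <= 2:
        return "stage-1"         # MLUw of 2 words or less
    return "beyond stage-1"


sample = ["more juice", "daddy go", "ball", "mommy sock"]  # invented sample
print(mlu_words(sample), stage(sample))  # 1.75 stage-1
```

An MLUw of exactly 1 arises only when every utterance is a single word, which is why the one-word stage is checked before the stage-1 ceiling.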
[21]. The above notions raise the question: At what point do we have evidence of grammatical categorization? For example, the traditional distributional criterion that defines the Noun class as the category which may follow Determiners (a/the/many/my/one) may not be available to us if, say, Determiners have yet to emerge. Hence, distributional evidence may be lacking in such cases. One way around the dilemma has been to suggest that early stage-1 grammar is categorical in nature simply owing to a default assumption that categorization is part of the innate ability to acquire language (in Chomskyan terms, part of the richly endowed LAD or Language Faculty) and that words are both inherently categorical and semantic in nature. Pinker (1984) claims that the categorization of early stage-1 words should be roughly pegged to their inferred semantic properties. Radford (1990), in a slightly different approach, prefers to consider such early multi-word utterances at stage-1 as lexical in the sense that (i) they have built-in default lexical categorization (forming classes of Nouns, Verbs, Adjectives, Adverbs, and Prepositions), but, at the same time, (ii) they rely heavily on their semantic-thematic properties. In any event, either description starkly contrasts with a connectionist view which claims that, e.g., the class 'subject' emerges through rote learning of particular framed constructions: subject-hood is learned as a category via rote associative learning of thematic relations. Now, it remains unclear to me precisely how close such thematic links to category-hood come to Radford's 1990 interpretation. I would only venture to say that both views share the belief that semantics provides the central cognitive underpinnings upon which syntax can later be built.
[22]. This account of stage-1 has been labeled the Lexical-thematic stage-1 of language acquisition (Radford 1990). It is unclear how far Radford would go in accepting his stage-1 as cognitively based: the label lexico-thematic (the term thematic referring to argument structures pegged to semantics) certainly permits some amount of semantics to leak into the discussion. Nevertheless, Radford emphatically rejects the notion that stage-1 syntax could be exclusively based on semantics. It is here that Radford gets full mileage out of his two-pronged, converging Lexical-Thematic stage-1 grammar: a stage-1 that is both
(i) 'thematic', in the sense that it leans towards general nativism, since simple utterance types at the earliest MLU get directly mapped onto their thematic argument structure; and
(ii) 'lexical', in the sense that the child seems to be fully aware of dealing with words based on lexical grammatical categories, and not semantic ones. This is made apparent by children's knowledge of the morphological range of each category (e.g., Noun, Verb) and of the distribution of inflection across categories.
[23]. One argument against a semantically based stage-1 was that, from the very beginning, children's productive multi-word speech (MLU = 2+) yielded inflectional plural {+s} and gerund {+ing} endings, the first two inflectional morphemes to be acquired according to Brown's morpheme-sequencing list. These endings were attached only to syntactic categorial word classes: e.g., {s} to nouns, {ing} to verbs, etc. There seemed to be no attempt by the young child to generalize such inflections onto purely semantic categories. In other words, if children's word classes at this stage-1 were thematic, rather than syntactic, in nature, we would expect specific inflections to be distributed along semantico-thematic lines: e.g., plural {s} on agents, gerund {ing} on action words, etc. (Radford 1990, p. 41). Such findings are not reported in the data. It was this absence of semantically based grammars which led to discussions of possible a priori innate grammatical categories: a grammar based on a syntax (without meaning) rather than a syntax based on semantics (meaning) (cf. general vs. special nativism). Although it is indeed correct to suggest that there seem to be no purely semantically based Inflections at stage-1, one argument against the conclusion of the claim, and seemingly in support of a semantically based stage-1, would be that most utterances at this stage are in fact instances of formulaic constructions. Only at a later stage-2 would we find instances of truly productive inflection; viz., even though inflection appears on the surface to be used at stage-1, the surface structure merely mimics input-driven phonological patterns.
[24]. This 'mixed bag' of a grammatical stage is indeed an argument against a 'too-strong' syntax-based model of early grammar (assuming that a syntactic version holds as a buttress for Continuity). We shall take some comfort in it, however, since the strong claim we adopt will be short-lived and relegated to the very earliest grammatical stages (MLU below 2). There is a caveat here. One argument against interpreting from the absence of evidence (namely, the observation that no inflection shows up on argument themes) might be the following: if our stage-1 were in fact formulaic, and not rule-based, then there would indeed be no utterance of an improper formulaic inflection attached to a semantic category, simply because such a form would not have been available in the phonological input. Formulaic constructions come out of the input in a highly regular manner: driven by high frequency and saliency, they are churned out as formulaic, unanalyzable chunks. (See §42 for an account of apparently correct parameterized word order found at an otherwise non-parameterized stage of acquisition.)
[25]. The argument could run as follows. The fact that children at stage-1 never attach, e.g., the action inflection '-ing' to semantically classed action words, as in *up-ing/down-ing/over-ing/on-ing, merely indicates that such strings are not part of the available input (particularly noteworthy given that our stage-1 is semi-formulaic in nature). It will be argued that the very earliest of stages (stage-1), addressed herein, is indeed the very earliest staged developmental grammar, one that may even have been termed a-grammatical in previous theories (viz., the one-word stage; cf. Atkinson 1992; Radford 1990; among others). Let it be known that I am all too ready to acknowledge and agree that language is indeed built upon pure syntax at our stage-2 of development (and not on semantics): the classic evidence for a syntax-based language at the earliest stages has been taken from the child's inflectional system at work on the basis of grammatical categories. Notwithstanding early attempts to apply syntactic analyses to early stages of language, there have been attempts in the child language acquisition literature to construct a dual model for stage-1 based on (i) semantico-thematic relations on the one hand, and (ii) categorial syntax on the other. This hybrid model has been considered a lexical-thematic stage-1 of child language acquisition, in which mere semantic properties tied together lexical syntactic categories void of any functional material (as related to the functional categories IP & CP). The most fully articulated version of this hybrid theory can be found in Radford (1990).

[26]. The question is then put to us in the following form: Is there any evidence at the earliest phases of stage-1 (say MLU<2) that the child actually analyzes strings as syntactic structure, as opposed to formulaic speech utterances (i) which may be tethered to a variety of gradient meanings, and (ii) which may reduce to mere surface-level syntactic phenomena? In other words, what may appear on the surface as syntax proper may in actuality simply be the result of learned surface formulae, with no real tacit syntactic knowledge represented. There seems to be little that hinges on the possible alternatives:
If, on the one hand, we consider such semi-formulae as syntax proper, making our stage-1 (MLU<2) a syntactic stage, then so be it. We are then forced to reconcile our syntactic stage-1 with the one-word stage as previously thought, and nothing is lost.
If, on the other hand, a lexical-thematic stage-1 is involved in bridging this narrowing gap between formula and syntax, then so be it. The benefit we gain by adopting this measure is that it gives us a nice continuity bridge onto the later phases of stage-1 (MLU 2+).
[27]. One interesting by-product of such a lexical-thematic stage-1 is that it does not specify Word Order, word order being traditionally tied to functional parameterization (see Travis, 1984; Atkinson, 1992; Tsimpli, 1992; and Galasso, 1999/2003). Coming on the heels of such semantic-based models of language acquisition, claims have been made that a semantic stage-1 is caused by memory deficits. As part of a maturational timetable, the child starts off with a very limited memory and attention span; this (maturationally based) memory deficit triggers the more 'robust & primitive' semantic-lexical level of language (since the lexical component is more salient) to kick-start productive communication (see Newport's 'Less-is-More' Hypothesis, S. Felix's non-UG/cognitive approach to L2 learning, and J. Elman's work on connectionism; for evolutionary accounts, see Bickerton's Proto-language, 1990).
Less-is-More Hypothesis. According to Newport's 'Less-is-More' Hypothesis, a Radfordian-style maturational timetable dividing our stage-1 from stage-2 would be linked to 'working memory' deficits: stage-1 starts with limited memory and thus can rely solely on the more primitive and robust rote-learned and formulaic structures. (One need not say that all possible structures at stage-1 are rote or formulaic; let it suffice to say that the flavor of the stage suggests little if any evidence for 'true-rule' formations or parameterizations, given the variant Word Orders and null INFLections of stage-1.) This handicap of low memory actually works as an advantage for the child in that it serves to constrain the perceived input to basic degree-0 SV(X) structures: these structures are ready-made by lower-level cognitive processes and made available to the stage-1 child. Lower-level memory seeks out idiomatic lexical-based categories or lexical-based morphemes, as opposed to functional, syntactic-based morphemes/categories (termed 'l-morphemes' vs. 'f-morphemes', respectively, by Pesetsky (1995), as understood in Distributed Morphology; see [§54]). (N.B. Felix (1981) as well as Krashen claim that it is precisely this over-production of the cognitive apparatus/high memory that makes second language learning so fraught with difficulty: the learner has to 'learn' language overtly instead of 'acquiring' it in a natural setting.)
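The filtering effect that the 'Less-is-More' account attributes to a small memory window can be made concrete with a toy simulation. The sketch below is only an illustration under stated assumptions, not Newport's own model: it assumes that a learner with a memory window of `window` words perceives each utterance as consecutive, non-overlapping chunks of at most that length, and the function names `perceived_units` and `mean_recurrence`, as well as the miniature degree-0 corpus, are hypothetical. The point it shows is that a narrow window yields units that each recur more often, the kind of reduced, repetitive input that favours rote-learning.

```python
from collections import Counter
from itertools import product


def perceived_units(utterances, window):
    """Chunk each utterance into the units a learner could hold at once.

    Hypothetical assumption: a memory window of `window` words means the
    utterance is perceived as consecutive, non-overlapping chunks of at
    most `window` words.
    """
    units = Counter()
    for utterance in utterances:
        words = utterance.split()
        for start in range(0, len(words), window):
            units[" ".join(words[start:start + window])] += 1
    return units


def mean_recurrence(units):
    """Average number of times each distinct unit is heard (tokens / types)."""
    return sum(units.values()) / len(units)


# Toy degree-0 SV(X) input: every combination of subject, verb and object.
corpus = [" ".join(words) for words in product(
    ["mommy", "daddy"], ["eat", "want"], ["cookie", "apple"])]

for window in (1, 2, 3):
    units = perceived_units(corpus, window)
    print(window, len(units), mean_recurrence(units))
```

Run as written, the one-word window perceives 6 distinct units, each heard 4 times on average, whereas the three-word window perceives all 8 utterances as distinct wholes, each heard only once.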
[28]. We can better frame arguments for a cognitive/memory dependence of language acquisition by addressing the very nature of syntax. First, syntax requires much more in the way of computational memory. (Or perhaps the point is better framed conversely: more memory forces the computation to reorganize itself by way of syntax.) The emergence of syntax coincides with the onset of greater quantities of language material, i.e., a higher number of memorized words/strings leading to longer sentences of richer complexity, etc. For instance, Degree-0 structures (say, basic SV sentences, order irrelevant) come at a low cost in memorization, while, conversely, Degree-1 structures (embeddings, binding, recursiveness) come at a much higher cost with regard to memorization. Why is that? Well, in one manner of speaking the reason is self-serving: in order to have a degree-1 sentence, the empirical (maturational) data dictate that a child must have, at some prior time, gone through a degree-0 stage, a process that mirrors memorization capacity. But more to the point, the reason for this mental/computational juggling has to do with how our brains go about making the most of our limited memory capacity. The sheer volume of this material forces a shift in how the brain can process (parse) it. It is believed in the neurolinguistic community that this shift, in both the quantity and the quality of language, lifts part of the burden from the already overburdened processes of rote-learning and memorization and hands it over to rule-based processes (variables, categories, etc.). Such rule-based learning frees up space in the lexical component of the brain (say, the list of stored words) and allows new routes to be mapped. In other words, such a huge volume of material forces new ways of organizing the input (hence, categorization). In sum, the two-pronged development sketched out above might proceed as follows:
(i) At the Micro-Development level (stage-1), the data stream is reduced for the child in terms of its cognitive saliency (the data themselves are not changed; rather, it is the intervening deficiency of the child's mental processing that affects these data overall). The child, working with a primary memory 'tool-kit', admits only a small subset (subset-a) of the language input; this in turn allows the child to deal with less data, enabling rote-learning to take place. (N.B. It is generally acknowledged that any memory deficit or trauma resulting in language attrition would first affect the more abstract levels of language/syntax.)
(ii) At the Macro-Development level (stage-2), the data stream is affected by the upsurge in memorization, which in turn expands what becomes salient for the child. Perhaps owing to the triggering of hidden units at the end of stage-1, the child is now capable of taking the data and applying paradigmatic structures, all of which lead to formal (stage-2) grammar. Thus, macro-development makes more memory available, which in turn spawns new ways of handling the material: the initial stage-1 process of rote association and memorization is no longer adequate, and syntax proper emerges as a way of handling both the quantity and quality of this newfound material.
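The claim in [28] that rule-based learning 'frees up space in the lexical component' can be illustrated with simple arithmetic. The sketch below is a hypothetical toy, not a model proposed in this paper: it assumes that a rote (stage-1 style) lexicon stores every inflected form as an unanalysed chunk, whereas a rule-based (stage-2 style) lexicon stores stems and affixes once each and combines them by concatenation. The names `rote_lexicon`, `rule_lexicon`, `rule_size` and `generate` are illustrative only. With s stems and a affixes (all forms distinct), rote storage grows as s × a entries while rule storage grows as s + a, and only the rule-based store can inflect a stem it has never heard inflected.

```python
from itertools import product


def rote_lexicon(stems, affixes):
    """Stage-1 style store: each inflected form is memorised whole."""
    return {stem + affix for stem, affix in product(stems, affixes)}


def rule_lexicon(stems, affixes):
    """Stage-2 style store: stems and affixes kept as separate categories."""
    return {"STEM": set(stems), "AFFIX": set(affixes)}


def rule_size(lexicon):
    """Number of entries the rule-based store has to hold."""
    return sum(len(members) for members in lexicon.values())


def generate(lexicon, stem, affix):
    """Apply the (hypothetical) concatenation rule STEM + AFFIX.

    The stem need not be stored: a rule generalises to novel items,
    whereas a rote store can only return forms it has already memorised.
    """
    if affix not in lexicon["AFFIX"]:
        raise ValueError(f"unknown affix: {affix!r}")
    return stem + affix


stems = ["walk", "jump", "play"]
affixes = ["", "s", "ed", "ing"]  # "" stands for the bare (zero-affix) form

rote = rote_lexicon(stems, affixes)
rule = rule_lexicon(stems, affixes)
print(len(rote), rule_size(rule))     # 12 7
print(generate(rule, "blick", "ed"))  # blicked
print("blicked" in rote)              # False: the rote store lacks the form
```

Scaling the toy up to 100 stems with the same four affixes gives 400 rote entries against 104 rule-based ones, which is the sense in which categorization relieves the burden on memorization.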
[29]. What syntax allows the brain to do is categorize and form analogies over the vast amount of input, rather than memorize and store all input as meaningful chunks (with an associative sound-to-meaning relationship imposed). This ultimately results in a finite array of neurolinguistic networks in the brain. Hence, in a basic input-output model (similar to what we understand to be happening in behaviorist stimulus-and-response associative models), quantity of input equates to quality of brain processing. As is evident, the classic enigma (the chicken-and-egg scenario) remains: Is it this newly wired brain, now seeking out the formation of paradigms and variable rules, that is responsible for the quantum leap in the quality of language, or is it this leap in quality that somehow drives the changes in the brain? This is tantamount to the classic Nature vs. Nurture debate. My hunch here is that (i) the nature of the raw data, as (ii) tied to cognitive processing, may be the driving force behind any structural changes that occur in the brain; in other words, language changed the brain and not the other way around. (It may ultimately be impossible to separate the one from the other.) But this is only a hunch, and again, it reduces to the same catch-22 scenario: if the data are the driving force behind the change, how do we account for a protracted, maturational development? And, surely, how the brain handles and processes the data must be part of the equation for any theory that attempts to account for developmental stages of language. In a certain sense, Newport's 'less-is-more' hypothesis simply restates this same paradox. Regarding architecture and the nature vs. nurture debate, clearly all linguists now suppose that some connection must be made between genes and environment. Thus, a two-staged development follows:
(i) Stage-1 comes with low-level memory with strong correlates to semantics and rote-learning. As a consequence, a one-to-one sound-to-meaning correspondence ensues, explained by more prosaic economic constraints placed on cognition.
(ii) Stage-2 comes with increased memory that (for reasons having to do with processes of parsing, etc.) triggers high-level categorization and syntax. One-to-many/many-to-one relations are evoked, triggering a richly paradigmatic grammar.
[30]. Radford (2000) has more recently gone on record as saying that the Language Faculty specifies a universal set of features; namely, a child acquiring language has to learn which subset of these features is assembled into lexical items as +universal (all other features awaiting parameterization via a maturational timetable). The problem for the child is assembling the features into lexical items. To a certain degree, the child needs to build up lexical items one feature at a time (see Clahsen's Lexical Learning Hypothesis). Thus, the issue for Radford is that there are innate architectural principles, loosely referred to as an Innate Grammar Construction Algorithm, which determine how lexical items project into syntactic structures. This raises the following question: How much of the initial learning deficit cited for our lexical stage-1 is owed to the child's protracted language development being exclusively tied to a maturationally based, low-scope cognitive template, a potentially semantic-based template upon which later formal abstract categories (such as functional categories) can be mapped? It is clear at least that more abstract functional categories come on-line later in the course of development.
[31]. General vs. Special Nativism. This is a nice place to pause and examine the role that our lower-scope cognitive processes might play in distinguishing stage-1 from stage-2 grammar. In brief, there are two schools of thought on this, both of which could maintain general ties to a Chomskyan paradigm. One school takes an evolutionary stance (Pinker & Bloom) and basically claims that lexical learning leading to grammaticalization is heavily based on preexisting cognitive constraints (much in the manner of earlier Piagetian models of language development). Such linguists would disagree with the notion that a special module in the brain must exist in order for language to manifest. Recall that Chomsky, in his strongest claims, suggests that the Language Faculty (FL) is an independent, autonomous organ found somewhere in the mind/brain (similar to, say, the liver or the stomach), and that this organ shares very little in the way of general cognitive processes: a language module all its own, without common lineage to other regions or modules of the brain. This notion is referred to in the language acquisition literature as the Double Dissociation Hypothesis (dissociation between formal language and cognition) (see Smith and Tsimpli for some discussion). The second, anti-Neo-Darwinian position suggests that a special module in the brain is required for language, and that language learning can be accounted for by reduced/non-cognitive means.
[32]. Regarding General vs. Special Nativism, it is still unclear how the debate should be viewed. Much of the argument quickly degenerates into the aforementioned classic 'chicken-and-egg' dilemma, circular in nature: e.g., (i) the Special Nativist claims that the child first needs syntax to uncover the underlying semantics (syntactic bootstrapping), while (ii) the General Nativist maintains that in order to properly construct a syntactic category in the first place, general properties of (inherent) cognitive semantics must be observed (semantic bootstrapping). (Interestingly, Chomsky's most recent work on Minimalism suggests that there may be economy constraints on language processing (arising from Logical Form). While it is still unclear how to interpret the wide range of claims on the minimalist table, and Chomsky himself often remains agnostic at these levels of inquiry, such economy constraints could be interpreted as pertaining not to pure syntax but rather to more cognitive levels of processing: e.g., Minimalist notions of shortest move, a minimal number of rules, and, to a certain degree, the objective essence behind the phonological form (PF) of language as opposed to its logical form (LF), etc.) It seems to me, however, that a dualist approach to acquisition (as presented herein) would initially favor a first-order semantic-bootstrapping view, given that semantics seems to play an essential role in language acquisition early on, before the onset of syntax. (No conclusion is drawn here, as nothing argued in this paper hinges on that debate.)
[33]. Why--I
don't need any 'rules' to see this tree. My eyes work just fine. That is,
insofar as there exists a single tree. How is it that my 'tree' gets destroyed
once I move my head ever so slightly to the east and fall into view of a second
tree? The mystery of it all lies somewhere in the dismantling, between a single
torn branch of lifted foliage, that forces the rule--for how was I ever to know
that this second tree was indeed a tree after all?
Well, the above passage makes for a nice analogy, but it merits a closer look. When I look at this cup of coffee in front of me, reach out for it, and drink its contents, it certainly appears to me that I do little more than what my own cognitive abilities let me achieve: I perform no 'abstract rule' formulations or procedures as such. I do agree, however, that one could possibly uncover all of the procedural content coming together here, e.g., Gestalt psychology, visual cortex processing, the contextual/meta-linguistic background of, say, [+liquid] => drink => mouth, along with the motor coordination that allows me to judge the space, reach for the cup, and hold it without breaking the glass (etc.). In the face of all this possible 'theory', it nonetheless remains somewhat natural for me to maintain the idea that when I 'see' a tree, I just 'see' a tree (period). But much has come out of Gestalt theory in the past (somewhat reframed here in the present context of connectionism) to suggest that there may be something to this very natural notion of just seeing after all. Gestalt theory of perception holds that there are
first-order perceptions in which, say, a child might see a line or a slope as a strictly iconic representation of the visual field. No rules apply, and a strict Stimulus and Response (S&R) equation is involved. Regarding language acquisition, this first-order representation could be illustrated by the early onset of vowel recognition (i.e., as environmental sound), and not by sound as filtered through assimilation processes, etc. (as seen in the u-shaped model [§61] below). At a later stage of perception, second-order perceptions allow the child to break iconic mappings, so that lines, slopes, etc. begin to be seen (with less vividness) as, e.g., a chair: a larger, somewhat more generic unit that embodies the lower-level visual stimuli. The role of second-order perceptions seems to be to pull out and frame larger aspects of Objects and Events; in linguistic terms, forming Nouns (out of the former) and Verbs (out of the latter). So, regarding language, we should be clear that by the time a child reaches the very first stages of language development (where a child is said to begin producing single-word utterances), s/he has already moved from the first-order perceptual field into a second-order field. The idea that children may have some access to rules, perhaps bootstrapped from Gestalt perception (the General Nativist position), may therefore not be totally implausible. However, and more to our point, Newport's 'Less-is-More' hypothesis could just as well be interpreted to fit Gestalt findings: when memory/cognitive capacity is low, children see in a fixed iconic manner, and
when memory/cognitive capacity increases, the child reorganized t