42. Looking at the data (Galasso: 1999), we indeed find a strong correlation between SAS strings and mixed word order, alongside DAS strings and fixed order.

Table 5: Word Order (Files 8-16, Age 2;4-2;8)

           SAS: SV   SAS: VS   DAS: SVX   Other
n =        87        78        290        5
I. Some token examples include:
(a) SV: Daddy cooking. Him go.
(b) OV: Dog kick (= I kick dog). A egg cook (= I cook egg).
(c) VS: Open me (= I open). Work bike (= Bike works).
In terms of structure, before the onset of DASs, a Proto-XP could be assigned to our SAS stage, providing the variable word orderings:
43. In addition to general word order variability,
Wh-word order patterns emerge in our early files (age 2;4-3;0)
showing semi-formulaic consistencies when examined in light
of the general acquisition of complex structure—as mentioned
above regarding SAS vs. DAS complexity. Our data evidence
a pattern showing Non CSV (Non Comp Subject Verb) ordering
which could be interpreted as formulaic in nature. This stage
roughly overlaps with our SAS stage mentioned above. Like Kayne on word order, Cinque (1990) has formulated a strong universal position, claiming that all Wh-elements universally position within Spec-of-CP. Recall that CP is a functional
category that should have a delayed onset time under any maturational
theory (cf. Radford: 1990). Here too we need to weaken the
strong position by adding the stipulation that, in order for this Spec-CP analysis to hold, the subject must simultaneously surface, forcing the Wh-element to raise and prepose into Spec-CP. Otherwise, very early (stage-1) Wh-arguments (e.g., What, Who) seemingly get misanalyzed initially as base-generated 3rd-person pronouns/quantifiers placed in a superficial subject Spec-VP position. This miscategorization often results in Agreement errors where the Wh-word, incorrectly taking the thematic role of the subject, agrees (by default) with the verb. Consider the two following CP-structures below:
Table 6: Wh-word order

                  Non-CSV   Wh in Spec-CP (CSV)
Files 1-21, n =   78        0
Files 22-25, n =  120       80

44. In sum, arguments could be devised suggesting
that early Wh-structures are prime examples of semi-formulaic strings base-generated (VP in-situ). A later second
stage (or even overlapping stage) may thus be seen as converting
formulaic processes into rule driven processes whereby syntactic
manifestations of Wh-movement occur with or without Auxiliary
inversions. (See Stromswold (§6) above for Non-Aux inversions).
Regarding formulaicy, Pine & Lieven (1997), Pine et
al. (1998) claim that a non-rule based account is what
is behind the formation of early correct wh-questions (a U-shape
learning take on the data). While adopting a constructivist
account in explaining the high rate of correctly inverted
Wh + Aux combinations, they go on to predict that correctly inverted questions in the child's stage-1 data would be
produced by those wh + aux combinations that had occurred
with high frequency in the child’s input. They go on to specify
that there is evidence that the earliest wh-questions produced
with an Aux. can be explained with reference to three formulaic
patterns that begin with a limited range of wh-word + aux combinations (e.g., "whydon't" you/she) (Rowland &
Pine, 2000). Such findings on early formulaic structures parallel
what Tomasello (1992) and Newport (op. cit.)
suggest regarding an initial stage-1 that reflects a processing
deficit tied to functional grammar. In other words, child
stage-1 processing, which shows a bias toward the modeled high-frequency lexical input (vs. rule-driven analogy), may arise due to constraints imposed by the low-memory bottleneck of distributional learning (Braine 1987, 1988).
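The distributional-learning account sketched above can be illustrated with a minimal toy model (the corpus, threshold, and `formulaic_chunks` helper are hypothetical illustrations, not drawn from Rowland & Pine's materials): a learner that simply counts wh-word + aux pairs in the input and rote-stores the high-frequency ones as unanalyzed chunks.

```python
# Toy distributional learner (illustrative assumptions throughout):
# high-frequency wh + aux pairs in the input are stored as unanalyzed
# formulaic chunks; low-frequency pairs are not stored at all.
from collections import Counter

input_corpus = [
    ("why", "don't"), ("why", "don't"), ("why", "don't"),
    ("what", "is"), ("what", "is"), ("where", "can"),
]

def formulaic_chunks(corpus, threshold=2):
    """Return the wh+aux pairs frequent enough to be rote-stored as chunks."""
    counts = Counter(corpus)
    return {pair for pair, n in counts.items() if n >= threshold}

print(formulaic_chunks(input_corpus))
```

On this caricature, the child's "correct" early inversions fall out of stored chunks rather than any rule of Aux-inversion, which is the constructivist point at issue.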
Lexical Stage-1: A Recap
45. In light of the above data, and collections of data elsewhere, it could be argued for our stage-1 that
the child’s utterances involve pure projections of thematic
argument relations. In Minimalist terms, the operation ‘Merge’
would directly reflect thematic properties and that this operation
is innately given by the Language Faculty: Verbs directly
theta-mark their arguments as in predicate logic expressions:
Table 7: Argument/Predicate Structure

Token utterance:   (d)addy work   |   (m)ommy see daddy
Predicate logic:   work(d)        |   see(m,d)
The above word order/syntax includes (SV) and (SVO) patterns and is structured below:
[vP [N Dad] [v0 [VP [V work]]]]   [vP [N Mom] [v0 [VP [V see] [N dad]]]]
(vP = light-verb Phrase).
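The 'pure Merge' reduction in Table 7 can be sketched as follows (an illustrative toy; the `merge` helper and its output format are hypothetical, not an implementation from the text):

```python
# Illustrative sketch of stage-1 'pure Merge': verbs directly theta-mark
# their arguments, yielding predicate-logic expressions with no Tense,
# Agreement, or Case features involved.
def merge(predicate, *arguments):
    """Project a verb's thematic arguments as a predicate-logic form."""
    return f"{predicate}({','.join(arguments)})"

# Stage-1 token utterances and their predicate-logic reductions (Table 7):
print(merge("work", "d"))      # "daddy work"      -> work(d)
print(merge("see", "m", "d"))  # "mommy see daddy" -> see(m,d)
```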
46. In both examples above, the Nouns (Daddy
& Mommy) contain no formal features (such as person
or case) and so don’t agree with the verb. The verb likewise
carries no Tense or Agreement features. In this sense, theta-marking
directly maps onto the semantics of lexical word classes—viz.,
‘pure merger’ involves only theta-marked lexical items. It is
therefore claimed that there is no Indirect theta-marking capacity
at stage-1 such that oblique or prepositional markers would
enter into the syntax: for example, the PP ‘to work’ in Daddy
goes to work, would be thematically reduced in the operation
Merge as Daddy go work (work = noun and not infinitive verb). Such utterances are widespread for our stage-1, as was
revealed in the section above. In addition to seemingly direct
thematic based syntax/grammar, numerous other studies have shown
that, indeed, children inappropriately overextend semantic (causative)
alternations of verbs such as giggle vs. tickle
by indiscriminately giving them identical thematic argument structures (thematic role 'Patient') in their intransitive forms:
e.g., don’t giggle me! vs. don’t tickle me! (Bowerman,
1973). If we wish to make claims that such overgeneralizations
are a result of some innate linking rule, then clearly some
sort of default semantic-based linking rule must be up for discussion.
In any event, the lack of non-semantic [-Interpretable] formal
features certainly dispels the notion of syntax and leads us
to look at such early stage-1 lexical items as being stripped
of their formal features, and projecting quasi-semantic information on a class of their own, perhaps to the point that each lexical item is learned and projected in isolation.
47. In conjunction with an isolative lexicon, and much in the same spirit as Pine et al. above, Morris et al. (ms. 1999) have sketched out a theoretical proposal (based on PDP-style connectionism) that relegates verb-argument structures in children's stage-1 grammar to individual 'mini-grammars'; that is, each word
is learned (‘bottom-up’) in isolation in that there are no overarching
abstractions (‘top-down’) that link one verb’s argument
structure to another. In other words, there are no argument rules, only isolated word meanings; each argument structure is a separate grammar unto itself (p. 6). It is only at
a second stage-2 that the child is seen as carrying the semantics as well as the syntax over from one word to another.
For example, the verbs eat and drink, hit
and kick, etc. will merge at stage-2 in ways that will
project this overarching abstract structure regarding transitivity,
thematic structure, etc. Hence, stage-2 is defined as the benchmark
in emergence of true syntax and rule formation.
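The stage-1/stage-2 contrast described here might be caricatured as follows (a hypothetical sketch; the lexicon entries and the `abstract_classes` helper are illustrative, not Morris et al.'s actual model):

```python
# Hypothetical sketch of the 'mini-grammar' proposal: at stage-1 each verb's
# argument structure is an isolated entry (no link between "eat" and
# "drink"); at stage-2 shared frames are abstracted into overarching
# classes such as transitivity.
stage1_lexicon = {
    "eat":   {"frame": ("Agent", "Patient")},  # learned in isolation
    "drink": {"frame": ("Agent", "Patient")},  # no link to "eat" yet
    "sleep": {"frame": ("Agent",)},
}

def abstract_classes(lexicon):
    """Stage-2: group verbs whose argument frames match, yielding
    top-down classes (e.g., transitive) in place of isolated entries."""
    classes = {}
    for verb, entry in lexicon.items():
        classes.setdefault(entry["frame"], []).append(verb)
    return classes

print(abstract_classes(stage1_lexicon))
# {('Agent', 'Patient'): ['eat', 'drink'], ('Agent',): ['sleep']}
```

The grouping step stands in for the "overarching abstraction" that, on this proposal, marks the emergence of true syntax at stage-2.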
48. In sum, what the above sketch has to offer us
is the proposal that children start off (stage-1) with rote-learned
items and then strive to find commonalities—the child builds up
this lexicon from brute memory and only later (stage-2) does
she slowly start to form levels of abstraction. The claim is
that children learn grammatical relations over time—the bottom-up
processes mimic the maturational processes behind language acquisition
(viz., first a stage-1 ‘bottom-up’ lexical learning followed
by a stage-2 ‘top-down’ rule formation).
49.
—Insert Interpretable features Radford here—
50. Distributed Morphology. A second but similar line of reasoning, likewise motivated by outcomes in Chomsky's Minimalist Program (see Marantz 1995, 1997), calls for morphology to be the all-encompassing aspect of grammar—doing away altogether with the lexicon as maintained under so-called 'lexicalist hypotheses', as well as dispensing, to a certain degree, with traditional notions of syntax that sought to derive a syntactic model outside of the lexicon in a seemingly top-down manner. The theory's basic core calls on a number of assumptions:
viz., (i) that syntactic hierarchical structures 'resonate all the way down to the word' (or, perhaps more accurately, 'are essentially derived from the word'); (ii) that the
notion of 'word' is broken up into two properties—the word shell of phonology (or, as it is termed in DM, the Idiom), and
the word’s selectional morphological features. The distinctions
are articulated in terms of morphology by the following labeling:
the ‘l’-morpheme—which pertains to the idiom aspect of the sound-meaning
relation—and the ‘f’-morpheme—which correlates to the abstract
morphological features. These two labels may be seen as correlating
to Radford’s usage of +/-Interpretable features (above) where
the [+Interp] feature distinction pertains to a lexical item's semantic properties (part of which would be the Idiomatic aspect of the word as used in DM, along with its phonological make-up, i.e., the 'l'-morpheme), and where [-Interp] would correlate to
the more formal and abstract syntactic properties (i.e., the
‘f’-morpheme). The two-prong theory today is seen as part and
parcel of a formal language system. Traditional parts of speech
such as ‘Noun’ are redefined as a bundle of features that make-up
a single l-morpheme type (called Root). The Noun root or ‘l’-morpheme
is defined by how the root entertains certain local relations
or governing conditions which it imposes on its complement hosts—e.g.,
how the Noun root might c-command or license its Determiner
(in a local Specifier position) or a Verb (in a local Complement position). A classic example here would be how the same lexical
item Destroy appears as a ‘noun’ Destru(ction) when
its nearest adjacent licenser is a Determiner (The destruction),
or how the item takes on the role of a verb when its nearest
adjacent licensers are Tense/Agreement and Aspect (Destroy-(s), (is) destroy-ing, (have) destroy-ed) (marking Tense and
Participle respectively). This model now places the burden of
syntax not with exterior stipulations, but rather with interior
conditions that seem to flow upward from the lexical item itself and into the relevant projecting phrase. In this new
definition (taken right out of MP, ‘Bare Phrase Structure’),
the ‘phrase’ is reorganized as simply the sum of the total interacting
'f'-morpheme parts; the 'word' is thus redefined as nothing more than a 'bundle-of-features' that projects out of the phonological
shell. This new analysis will hold a number of consequences
for how we come to understand language acquisition. For starters,
much of what is being spelled out here concerns a two-stage
acquisition of language development, and that this dual stage can be accounted for by the dual mechanism model as advanced in
this paper. What I am on about here can be summarized as follows
regarding language acquisition:
(i) Syntax, as understood in Chomsky’s Pre-Minimalist’s
terms, may for all intents and purposes reduce to specific
bundle-of-features that are encoded in ‘parts-of-speech’
words, (rendering a seemingly bottom-up learning mechanism
where ‘meaning’ governs not only how words are learned,
but how their syntactic properties project).
(ii) Syntax may no longer be considered as a top-down generator
of sentence types, and so words have the capacity to emerge in an early stage of language merely encoded with 'l'-morphology or [+Interp] features. In this way, one may be able to define
an early stage-1 word as exhibiting more or less only the
phonological shell of the word void of its otherwise embedded
syntax. If this is indeed the case, a viable maturational
story can likewise hold for the onsets of ‘f’-morphology
[-Interp] features for the given word. Much in the manner
of Roger Brown's observation leading to a sequence of morphological development (starting with -ing and ending with the
Aux. Clitic etc.), a similar story could likewise
hold regarding how certain features mature and then merge
in a word—a maturation of features however which would not
delay the onset of the word in phonological terms (or ‘l’-morpheme
values), but would only delay the relevant selectional properties
(or ‘f’-morpheme values, etc.) associated with its functional
grammar.
The twin notions above would ultimately buttress any theory
which would see language development as a maturational interplay
of features—as captured here in our discussion of a Converging
Theories Model.
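The licensing idea behind the 'l'/'f'-morpheme split—e.g., the Destroy example above—can be caricatured as a lookup from a category-neutral Root plus its nearest local licenser to a surface form (a hypothetical sketch; the `SPELLOUT` table and licenser labels are illustrative, not DM machinery):

```python
# Hypothetical sketch of the DM idea that a category-neutral Root takes on
# 'noun' or 'verb' status from its nearest adjacent licenser.
SPELLOUT = {
    ("DESTROY", "D"):   "destruction",  # licensed by a Determiner -> noun
    ("DESTROY", "T"):   "destroys",     # licensed by Tense/Agr -> finite verb
    ("DESTROY", "Asp"): "destroying",   # licensed by Aspect -> participle
}

def realize(root, licenser):
    """Return the surface form of a Root given its local licenser."""
    return SPELLOUT[(root, licenser)]

print(realize("DESTROY", "D"))  # destruction
print(realize("DESTROY", "T"))  # destroys
```

On the acquisition story above, a stage-1 word would amount to the 'l'-morpheme side alone (the phonological shell), with the 'f'-morpheme lookups maturing later.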
51.
--Insert DP-analysis stage-1 here—
52. A typical Chomskyan syntactic tree asserts
that functional features (features having to do with M(ood),
T(ense) and Agr(eement)) are assumed to be projected in a
top-down way: these functional features are understood to
be what is behind the notion of movement—lexical items move
up the tree in order to acquire and check-off these features.
The following question certainly could be formulated in Chomskyan
terms: ‘why can’t lexical items have such features embedded
in their sub-categorical entries, and if they can, what then
would motivate movement other than some ad hoc stipulation
requiring features to be checked-off in an overall top-down environment'? Chomsky's interpretations are clear here—some
top-down (deductive) measure must be constructed in order
to establish a proper rule-driven syntax. Well, this may in all likelihood be correct, but where/what is it in the system that says the syntax must start out in this way? Consider the tree below (reduced, showing only M and T/Agr features):
[Tree diagram not reproduced.]
The tree above positions the T/Agr features,
along with their specific phrases, as having a top-down representation.
If such a tree is completely available early-on in language
acquisition—as the Continuity view would maintain—then there should be no reason why a child would exhibit 100% omission of, say, a top-down Agr feature in a way that would affect only certain words and not others. (When only certain words show individual residual effects, e.g., regarding subcategorization,
syntax etc., then a strong claim can be made that the overarching
phrase structure is not what is behind the phenomenon, but
rather specific lexical-parameterizations may be involved.)
(See J. Fodor 1997, Baker 2002 for a seemingly bottom-up treatment
of lexical parameterization). In other words, if the structure
is in place (from top-down) to deliver the feature of Agr (as with Case), then it would be hard to explain away the
fact, if observed in the data, that some words could maintain
Case while others (which should maintain Case in the target
language) do not. Guasti and Rizzi (2001) say: ‘When a feature
is not checked in the overt syntax UG makes it possible to
leave its morphological realization fluctuating’. Fine. But,
this is seemingly a bottom-up problem. It seems that such
optionality would have nothing to do with the phrase (per
se). What do we say when the feature itself (as projected
from the tree top-down) seems to select some words over others
regarding inflection? Surely, if this is a top-down venture,
then the features should project onto all verbs (for the appropriate
phrase), and not just a select few. But this selective inflection is in fact what we find at our stage-2 of language development—some words may (optionally) inflect/project the specific feature while others by-pass it entirely.
53.
--Insert data from Radford & Galasso here--
54. This gives us the flavor of specific words
(and not word classes) taking on functional features (bottom-up).
The problem here is how one maintains the higher-order structure of functional grammar, originating from the two upper layers of the tree, while selecting the functional projection on only a select handful of words. One way around
the dilemma may be to suggest that the lexical word itself
has part of the (upper-branching) tree embedded in the very
lexical item itself (as in sub-categorization). In this way,
a specific word may reflect a specific functional feature
or parameter while another word may not (on a specific lexeme
by lexeme basis)—in all actuality, what we are talking about
here is that (i) the initial process of the acquisition of
functional grammar involves one word at a time (in a bottom-up
way), and that (ii) only at a later more developed stage does
such feature projection extend to the overall class of words
(which then extent to phrases). Following in the spirit of
Lexical Parameterization (Borer), Janet Dean Fodor
in a similar vein has tentatively suggested in some recent
work that parameterization may affect certain words (as in
lexical feature specificity) and not others (outside of the
scope of its word class) (talk presented at the University
of Essex, 1997). One outcome of this would assume that children
establish parameter values (perhaps piecemeal) and not grammars as wholes. An example of such bottom-up parameterization, or say feature specificity (selecting only [+/-Nom] Case marking here), might then be diagrammed in the following manner:
[Diagram not reproduced.]
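One way to picture such word-by-word feature specificity is as per-lexeme entries that either carry a Case value or fall back to a default (a hypothetical sketch; the entries and the `case_of` helper are illustrative, not a proposed formalism):

```python
# Hypothetical sketch of lexeme-by-lexeme parameterization: some words carry
# a functional (Case) feature specification while others in the same word
# class do not—the feature lives in the word's own entry, not in a phrase
# projected top-down.
lexicon = {
    "I":   {"case": "+Nom"},  # this pronoun has acquired Nominative
    "me":  {"case": "-Nom"},  # objective form, specified as non-Nominative
    "him": {},                # no Case feature specified yet (stage-1)
}

def case_of(word):
    """Bottom-up lookup: absent entries fall back to a default Case."""
    return lexicon.get(word, {}).get("case", "default")

print(case_of("I"))    # +Nom
print(case_of("him"))  # default
```

The point of the sketch is simply that nothing forces all members of a class to behave alike: each entry is parameterized on its own.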
Such an exclusively bottom-up parameterization
method would however obscure correlations often found in the
data regarding Case and/or Agreement—such as a seemingly top-down
holistic correlation which seeks to link (i) [+Nom] Case if
in an agreement relation with a Verbal INFL, (ii) [+Gen] Case
if in an agreement relation with a nominal INFL, (iii) Default
Case otherwise. It may be that such correlations do come on-line
after an initial ‘non-phrase’ parameterization stage—hence,
an initial, not fully-fledged, parameterized stage would merely work with individual words, delaying class-parameterization to a slightly later stage.
55. A growing body of research recently undertaken by developmental linguists suggests that children's (stage-1)
multi-word speech may in fact reflect low-scope lexical specific
knowledge rather than abstract categorical-based knowledge.
As discussed above, this distinction clearly points to a possible
language acquisition processes as proceeding from out of a dual
mechanism in the brain. For example, regarding verb inflection,
studies (Tomasello & Olguin, Olguin & Tomasello, Pine
& Rowland) have shown that the control children have over
morphological inflection very early in the multi-word stage
is largely individually rote-learned—that is, there is no systematic relationship between stem and inflection, nor is there any transfer from 'supposed' knowledge of an inflection to other stems. In other words, at the very earliest stages of multi-word speech, there is little or no productivity in transferring the knowledge
of one verb to another. This may suggest a stage-1 based not
on complete paradigm formation, but rather on (semi)-formulaicy.
56. Rowland suggests that a distributional learning
mechanism capable of learning and reproducing the lexical-specific
patterns that are modeled in the input may be able to account
for much of what we find in the early stage-1 data. Input of
a high frequency nature will then trigger rote learning associations
and patterns that will manifest in the speech production of
young children. This notion of rote-learned vs. rule-based
or non-systematic vs. systematic behavior (respectively) can
be further investigated by looking into what has become known
as the U-shape learning curve. For instance, indications
of systematic (rule-based) behaviors can be seen in overgeneralization.
In other words, if overgeneralizations appear with, say, the
morphological inflection {s} as in the portmanteau forms for
either Verb or Noun—e.g., I walk-s, feet-s (respectively)—then a sound argument could be made that rules have been employed, albeit rules which have erroneously over-generated. (In fact, if children
in the process of their early language acquisition are never
seen to over-generalize rule-like formations, this is very often
a sign of potential Specific Language Impairment (SLI), a result
of some neo-cortical brain malfunction which has disturbed the
normal syntactic structuring of rules and paradigms.) And so,
we rightly extend the argument that if rules are being applied at a given stage, then a rule-based grammar has been activated. Right, you say. Well, as it turns out, there are some very interesting findings which suggest that apparent 'look-alike' rules at stage-1 are in fact imposters and don't really behave as 'true' rules.
57. U-Shaped Learning. One of the most striking features of language acquisition is the so-called U-shaped Learning Curve found straddling the two stages of language acquisition. In brief, the U-shaped curve is understood in the following way:
(i) Inflection. Children's earliest inflected/derivational word types are, in fact, initially correct—that is, it appears to be the case amongst very early MLU that children have correct formulation of rules. (It goes without saying that
to speak of (cf. Wexler & Radford’s Maturational Theory).
The point here is that whenever a small sampling of Tense
does appear in early MLU speech, it always appears correctly).
An example of this is the early emergence in the data of
the past tense and participle affixes [ed] and [en] e.g.,
talked/gone (respectively). The initial Past Tense
and Plural forms are correct, regardless of whether or not
these forms are regular (talked/books) or irregular (went/sheep).
However, and what is at the heart of this striking development,
it also appears that this initially correct performance
stage is then followed by a period of attrition during which
the children actually regress—that is, at this slightly
later stage in development, they not only lose some forms of affixation but, in addition, produce incorrect over-generalizations
in tense forms (go>goed>wented), and plural
forms (sheeps), as well as non-inflected tensed forms
e.g., talk-ø/go-ø (= past tense). To recap, the first occurrence of inflectional overgeneralization (roughly at age 2) that supports a rule-based grammar is preceded by a phase without any errors at all.
(ii) Phonology. Similar to what one observes regarding a u-shaped grammatical/inflectional development, children also appear to follow a u-shaped learning curve with regard to phonology. An example of this is the often-cited early productions of, e.g., (i) slept /slept/, cooked /kʊkt/, played /ple:d/ > to (ii) sleeped /slipId/, cooked /kʊkId/, played /ple:Id/ > and back to (iii) slept /slept/, cooked /kʊkt/, played /ple:d/ (respectively), completing a U-shaped morpho-phonetic curve yielding /t/, /d/ > /Id/ > /t/, /d/.
What appears to be good examples of ‘rule-based’ inflection
and assimilation in (i) and (ii) (above respectively) is
in all actuality nothing more than the product of a ‘parrot-like’
imitation sequence—more akin to iconic pattern processing
derived from stimulus and response learning. The child can
be said to engage in segmental, phonetic-based rules only
when s/he appears to process the rules yielding an incorrect
overgeneralization of past marker {ed} typically pronounced
as the default /Id/ which forms the middle-dip portion of
the u-shaped curve. Recall, in terms of phonology, the child has three allophonic variations to choose from:
a. {ed} => /t/   "walked" /wa:kt/
b. {ed} => /d/   "played" /ple:d/
c. {ed} => /Id/  "wanted" /wantId/
It seems that a default setting with regard to phonology (place & manner of articulation) is minus Comp(lex), where [-Comp] denotes a one-feature distinction as against two or more features (for instance, bilabials /b/ /m/ would have a [-Comp] feature whereas labio-dentals and inter-dentals /f/ /θ/ (respectively) would have [+Comp] since both lip and tooth are involved). In addition, it seems that plus voicing [+V] typically wins out over minus voicing [-V]. By using these default settings, we naturally get the voiced plosives /b/ /d/ /g/ and nasals /m/ /n/ as our very first sequence of consonants, along with [+V] vowels.
By taking this default status, /Id/ should be the allophone of choice, and it often is. In this manner of speaking, adherence to the default setting suggests at least some formation of the rule: defaults work within rule-based paradigms and so should be considered as quasi-rule-based generation, as opposed to a pure imitation sequence.
58. The first two stages of development that form this apparent u-shaped curve have been interpreted as manifesting
the application of qualitatively different processes in the
brain—representing different modes or stages in the course
of language acquisition. This u-shaped curve arguably provides
some support for our stage-1 to be defined in terms of a formulaic
stage rather than as a syntactic and true-rule learning stage.
The second up-side of the u-shaped curve is found to coincide with an independent syntactic development—the emergence of a Finiteness marker, one which only emerges at our functional stage-2 (see Clahsen). In sum, the
three stages could be described in the following way:
(i) The first period of the first up-side curve (correct
production) correlates with a style of rote-learning. This
more primitive mode of learning suggests that the mental lexicon
is bootstrapped by mere behaviorist-associative means of learning.
In such a rote-learning stage, lexical items (either regular or irregular inflections) are stored in an independent mental lexicon heavily based on memorization of formulaic chunks and associations, and are processed in a different part of the brain. It is of no surprise that irregular verb past inflections (go>went) outnumber regular verb past inflections (talk>talk-ed): the former being stored in the lexicon as formulaic chunks, while the latter indicate the morphological rule formation [V + {ed}]. Hence, our dual converging theories
model postulates a sharp contrast and dissociation between regular vs. irregular inflection. This seemingly early correct
production is therefore due to a low-scope, phonological ‘one-to-one
& sound-to-meaning’ relationship with no relevance to
rules. Hence, our formulaic past tense inflection is
not realized as [stem + affix] [talk-{ed}], but rather
as one unanalyzable chunk [talked].
(ii) The second stage then marks the onset of a rule process (albeit not necessarily the mastery of it). Here, the child is seen as letting go of the formulaic lexical representation in favor of rule formations: i.e., patterns of concatenated stems appear alongside inflectional affixes.
Thus, irregular forms often get over-generalized with the
application of the rule resulting in e.g., goed/wented/sheeps.
This overgeneralization stage maps onto a chronological
functional categorical stage of language acquisition where
rule-based mechanisms are becoming operative. Thus, the over-generalized down-swing of the u-shaped curve is linked to children's syntactic
development: over-generalization of inflection appears when
the child ceases using bare-stems (as in stage-1) to refer
to past events.
(iii) The third and final stage marks the second up-side
swing of the u-shaped curve and represents the correct target
grammar.
59. It is thus proposed that this tri-staged learning process—from correct to incorrect to correct again—can more properly be accounted for by a dual learning mechanism in the brain: (i) an initial mechanism that has no bearing on rules and is pinned to a type of process best suited for behavioristic-associative learning, such as base lexical learning, irregular verb learning, lexical redundancy formations, etc.
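The dual mechanism at issue is often caricatured as two routes for the past tense: an associative lexicon of stored irregular chunks, and a combinatorial rule [stem + {ed}] (a hypothetical sketch; the table and the on/off flags are illustrative devices, not a claim about actual processing):

```python
# Illustrative dual-route sketch: a rote lexicon for stored irregular
# chunks, and a combinatorial [stem + {ed}] rule for everything else.
IRREGULAR = {"go": "went", "sleep": "slept"}  # rote-stored formulaic chunks

def past_tense(verb, lexicon_active=True, rule_active=True):
    """Lexical route first; rule route second; bare stem as last resort."""
    if lexicon_active and verb in IRREGULAR:
        return IRREGULAR[verb]
    if rule_active:
        return verb + "ed"   # rule route: yields "goed" when lookup fails
    return verb              # bare stem (the stage-1 default)

print(past_tense("go"))                        # went   (lexical route)
print(past_tense("go", lexicon_active=False))  # goed   (overgeneralization)
print(past_tense("talk"))                      # talked (rule route)
```

The U-shape then falls out of the timing: lexicon-only at stage-1 (all correct or bare), rule newly active and occasionally beating the lexicon mid-curve (goed), and both routes properly ordered at the final stage.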
Brain Related Studies
60. Much of the theory behind a dual model of language has been buttressed by recent developments in brain-related studies. There is now an ongoing stream of data coming in that
tells us the brain does indeed process different linguistic
input in very different ways. Some of the first analyses using
fMRI (functional Magnetic Resonance Imaging), and other brain-related
measures show that irregular inflection processes (go>went)
seem to be located and produced in the temporal lobe/motor-strip area of the brain (a processing area strictly associated with basic word learning, referred to as the lexical component, or Lexicon). On the other hand, regular inflection processes,
e.g., (stop>stopped), where the rule [stem]+[affix]
is applied, point to areas of the brain which generate rule
formations, i.e., the computational component. In other
words, there seems to be a clear indication that the two types
of linguistic processes are dissociated. This same dissociation seems to hold between how one processes derivational morphology—here being equated to irregular and/or whole lexical word retrieval—and
inflectional morphology.
61. Wakefield and Wilcox (=W&W) (1994: 643-654)
have recently concluded that a discontinuity theory—along the
lines proposed by Radford—may have an actual physiological reality
as based on a biological ‘maturation’ of brain development.
Their work consists of two segments: the first being a theory
of the relationship between certain aspects of brain maturation
and certain transitions in grammatical representation during
the course of language acquisition, the second being a preliminary
investigation to access the validity of the theory by testing
some of the specific hypothesis that it generates. In their
model, it is the left posterior aspect of the brain, at the
junction of the parietal, occipital, and temporal lobes (POT)
that generates semantically relevant, modality-free mental representations
by allowing signals from all neocortically-represented sensory
modalities to converge in a single processing region. In turn,
the linguistically relevant contributions of Broca's area, located in the inferior portion of the left frontal lobe, impart abstract structure to those representations with which it interacts—including
(functional) grammatical components as well as the semantic
components. The idea here is that we can now tentatively spot
functional abstract grammar within the frontal lobe areas of
the brain, and show how such grammatical aspects relate to the
more primitive, prosaic elements of lexical-semantics (as spotted
in the temporal lobe regions). The trick here is to see if the
two regions are initially talking to one another (as in neuro-connectivity),
say at our grammatical stage-1. Using PET/ERP-language studies,
a sketchy two-prong picture emerges suggesting that the neural
mechanism(s) involved split along lexical and functional grammatical
stages of language development. It is clear that Broca’s area
is involved not only with the generation of abstract hierarchical
structure, but, with the representation of lexical items belonging
to functional categories. However, the studies reveal that in
order for Broca’s area to work at this highly abstract level
of representation, the frontal lobe which houses Broca’s area
must also connect to the POT region of the brain—in this sense,
a real conversation must be carried out between the (first order)
semantic properties of language (POT) and their functional counterparts.
This relationship parallels the lexical-functional dichotomy
found in all language.
62. The W&W study suggests that the maturational
development of language follows from brain development—and
can be summarized below:
a. The lexical stage-1 of language acquisition naturally
arises from a disconnect between the more primitive POT
(temporal-lobe/lexical-grammar region) and the hierarchical
Broca’s area (frontal-lobe/functional grammar).
b. This disconnect has to do with the biological development
of myelination in the bundle of axons that connect the two
areas together. Myelination of axons is then said to mature
at roughly that chronological stage where we find a lexical
(staged) grammar merging with a functional (staged) grammar.
c. With respect to the brain/language relationships
in the child, it is important to recognize that during the
period of time typically associated with the initial stages
of language acquisition, the brain is still in a relatively
immature state. Neural plasticity begins with the sensory
motor-strip temporal area (POT), and then proceeds to move
to secondary areas (Broca’s area) related to the frontal
lobe region.
Conclusion: A Converging Theories Model
63.
In the history of science, it has traditionally
been the case that inquiry proceeds and develops via different
methods and theories. Converging approaches strive to expose
the inherent weaknesses in their opposing theories. It goes
without saying that convergence methods go far in peeling away
biased assumptions, which often lead to half-correct assertions:
taking what is good from one theory and discarding what is not
is simply common-sense science. For example, on one 'converging'
hand, Chomsky has asserted that syntax is the result of the
creative human brain, set up in such a way as to manipulate
true rules: it creates, from nothing external to itself, the
structure of language (see special nativism above). Restricting
ourselves to the point at hand, Chomsky has assimilated much
of his argument from the long line of rationalist philosophy
and has converged such reasoning into how he believes an
autonomous (internal) language structure might be construed.
His belief that syntax is autonomous directly paves the way
for him to distinguish between species-specific (human/hominid)
language and other modes of cognitive-based primitive
communication (animal/pongid). His now famous debates, first
with Skinner (Behaviorism) and later with Piaget (Constructivism),
can readily be reduced to converging methodologies (between
philosophy and cognition) which sought to return language to
seventeenth-century nativist assumptions. Later, he would go
on to extend such arguments against purely pragmatic/socio-linguistic
pursuits of linguistics, saving the study of language from
becoming strictly a 'humanities' field of study emphasizing
social phenomena with little if any analytical worth (cf.
Quine, Rorty pace Chomsky). Taking his notion of an
autonomous syntax further, the natural next step would be
to say that all other aspects of language (whatever they
may be) that cannot fall under this autonomous rule-based
syntactic realm might conversely be tethered to behaviorism
and associationism as part of an underlying cognitive mechanism.
Chomsky himself has expressed the possibility that general
mundane concepts, many of which contain inherent sub-categorial
features that are extremely convoluted and abstract, yet to
which we readily attach labels (= words), may be preconceived
and innate; however, he goes on to suggest that such conceptual
innateness may be tethered to cognition as a universal ability
to get at meaning (Chomsky 2000: pp. 61-62):
These conceptual structures appear to yield semantic
connections of a kind that will, in particular, induce an
analytic-synthetic distinction, as a matter of empirical
fact.
These elements (he cites concepts such as locational
nature, goal, source of action, object moved, etc.) enter
widely into lexical structure… and are one aspect
of cognitive development.
64.
On one hand, what Chomsky seems to be saying
is that (i) Functional Grammar, or Syntax par excellence,
is autonomous and dissociated from all other aspects of the
mind/brain, including meaning and/or cognition. Thus, syntax
is created out of the mind's creative and independent eye
(with all the aforementioned nativist trappings). However,
and to the point of this section, Chomsky does not hesitate
to attribute the non-syntactic aspects of language, say word
learning (based on frequency learning and associationism),
to cognition. This, I believe, goes to the heart of the matter:
a form of converging theories has been evoked here, and it
can be summarized as follows:
Chomsky and Converging Theories
1. Syntax proper (labeled herein as Functional Grammar)
is creatively formed by a true-rule process via an innately
given Language Acquisition Device (LAD) (more recently called
the Language Faculty), comprising the initial grammatical
default settings known as Universal Grammar.
For example, this is where the more abstract inflectional
rules are housed: the functional features of number/person/case/agreement/tense,
e.g., Plural [N + {s}], Past Tense [V + {ed}], etc. Of course,
Berko's 'Wug Test' falls directly under this category.
Meaning is detached from syntax.
2. Word learning (labeled herein as Lexical Grammar) is
formed via a one-to-one iconic association between sound
and meaning. This process of word learning on (i) a
phonological level and (ii) a semantic/conceptual level
is more akin to past behavioristic notions of learning.
Very young children (at our stage-1) may exploit and over-extend
such processes: this is apparently what we find regarding
formulaic-type utterances, irregular verb/noun lexical learning
and retrieval, as well as derivational morphology.
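The division of labor in (1)-(2) can be made concrete with a minimal dual-route sketch in the spirit of Pinker's dual-mechanism proposal (the word lists and function name below are illustrative assumptions, not drawn from the sources cited): stored irregular forms are retrieved by item-specific associative lookup (Lexical Grammar), while any novel stem, such as a Berko-style nonce word, falls through to the symbolic default rule [V + {ed}] (Functional Grammar).

```python
# Illustrative dual-route model of English past-tense formation.
# Route 1 (Lexical Grammar): rote-learned, item-specific associative memory.
# Route 2 (Functional Grammar): a symbolic rule that applies blindly to any
# stem, including novel ones -- the 'Wug Test' behavior.

IRREGULAR_PAST = {"go": "went", "sing": "sang", "hit": "hit"}  # stored forms

def past_tense(verb: str) -> str:
    # Route 1: associative lookup for memorized irregulars.
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    # Route 2: the default symbolic rule V + {ed}.
    if verb.endswith("e"):
        return verb + "d"
    return verb + "ed"

print(past_tense("walk"))   # walked  (rule)
print(past_tense("go"))     # went    (lookup)
print(past_tense("blick"))  # blicked (novel stem: rule still applies)
```

A child over-extending Route 2 to a stored form yields exactly the over-regularization errors ("goed") noted in the acquisition literature.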
65. Connectionism
In view of Chomsky's assertion that Syntax is autonomous, there
can be, by definition, no primitive lower-level capacities at
work in syntax: nothing that hinges on perception, sound,
object movement, spatio-temporal relations, etc. Although we
share such low-scope abilities with our fellow primates, it is
our ability to work with abstract rules, more than anything else,
that creates the unsurpassable and ever-widening gap between
human language and animal communication: the former based on
true rules and syntax, the latter on more primitive behavioristic
modes of learning. Regarding the higher-level processes having
to do with syntax/grammar, the bootstrapping problem discussed
above does provide a way for the lower-level processes associated
with connectionism to serve as a springboard for later rule-based
grammar. For instance, it is now widely assumed (cf. Plunkett,
Elman, among others) that something like a connectionist system
must provide the neurological foundations for the apparent symbolic
mind. In other words, a symbol-processing system might sit on
top of a connectionist implementation of the neurological system.
Such a heterarchical, layered approach to language is similar
to stating that in order to talk about Darwinian biology, one
must first acknowledge the underlying universals of physics.
However, having said this, and more to the point of Chomsky's
reference to autonomous syntax, a symbol-processing system would
operate according to its own set of principles. Recently, the
notion of hidden units/rules providing crucial feedback loops
in connectionist processors has been interpreted (much to the
chagrin, and potential demise, of the pure connectionist group)
as a form of quasi-innate symbolic device, cleverly hidden in
the network architecture itself (see the ongoing debates between
Marcus and Elman on this). Nonetheless, it is now becoming
commonly accepted in connectionist circles that a number of
local architectural constraints are indeed necessary in order
to bring about a sufficiently qualitative approximation of
computation worthy of language: constraints such as the right
number of units (hidden and overt), layers, types of connections,
etc. Notwithstanding camp rhetoric and the inevitable spin
involved (again, arguments tantamount to the old nature vs.
nurture debate), there may be something to the notion that such
hidden units serve as a bridge between the two systems (and,
for that matter, the two schools of thought). Moreover, there
is a certain degree of truth to the analogy that hidden-unit
tabulations spawn symbolic rule paradigms. From this dualistic
approach, it is possible to tentatively sketch out what a
'Converging Theories Model' might look like in the face of
the aforementioned assertions:
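As an aside, the special role of hidden units can be illustrated with a toy network. The sketch below (plain Python, purely illustrative; it is not drawn from the Plunkett/Elman or Marcus models themselves) trains a small two-layer perceptron by backpropagation on the XOR mapping, which no single-layer associative net can learn: only once the hidden layer re-codes the input does the correct input-output pattern become representable at all.

```python
import math
import random

random.seed(1)

# XOR is not linearly separable, so direct input-output association fails;
# a hidden layer must first re-code the inputs.
DATA = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# 2 inputs -> 2 hidden units -> 1 output; last weight in each row is a bias.
w_h = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w_o = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    y = sigmoid(w_o[0] * h[0] + w_o[1] * h[1] + w_o[2])
    return h, y

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

start = total_error()
lr = 0.5
for _ in range(20000):                  # plain backpropagation
    for x, t in DATA:
        h, y = forward(x)
        d_y = (y - t) * y * (1 - y)     # output-unit delta
        for j in range(2):              # hidden-unit deltas (use old w_o)
            d_h = d_y * w_o[j] * h[j] * (1 - h[j])
            w_h[j][0] -= lr * d_h * x[0]
            w_h[j][1] -= lr * d_h * x[1]
            w_h[j][2] -= lr * d_h
        w_o[0] -= lr * d_y * h[0]
        w_o[1] -= lr * d_y * h[1]
        w_o[2] -= lr * d_y

# Error drops as the hidden units discover an internal re-coding of the input.
print(round(start, 3), "->", round(total_error(), 3))
```

The learned hidden-unit activations function like an internal re-description of the stimuli, which is precisely the sense in which hidden units have been read as a quasi-symbolic layer smuggled into the architecture.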
66.
[Diagram of the Converging Theories Model omitted here: it contrasts
the lexical categories with the functional categories discussed below.]
67.
The above lexical categories are substantive in
meaning and are akin to more behavioristic processes such as
rote learning, formulaic mimicry, and frequency-based memorization,
as discussed in a Piaget-style cognitive learning model. The above
functional categories are non-substantive in meaning and are
akin to truly autonomous syntactic theory (Chomsky), and to the
creative ability to carry out preconceived rules with novel
items (Brown, Berko). From a neurological standpoint, however,
Chomsky's idea of an autonomous syntax is treated as a quintessential
impossibility, the obvious objection being that there must be a syntax
positioned in the brain, and thus buttressed by some or all of the
cognitive apparatus. Our Converging Theories Model as sketched above
partially grants this argument to the neuroscience position by claiming
that one aspect of language is indeed tethered to a low-scope
cognitive apparatus. Where we beg to differ is in holding that
'higher-scope' rule-driven processes are not cognition-bound, relying
instead on their own set of structure-dependency conditions to survive: a
structure dependency that has no bearing on the exterior cognitive
realm. This follows in the wake of Steven Pinker's hypothesis
that a dual mechanism is at work in language
acquisition: (i) one based on cognitive universals in relation
to the lexical component of the brain (say, our stage-1 child
grammar), and (ii) one based on true-rule formation related
to the computational component of the brain (say, our stage-2
child grammar). (Recall our brief discussion of Radford (§19ff)
in defining a functional stage-2 grammar based on purely syntactic
(high-scope) [+/- Interpretable] features.)
68-69. A Final Note

70. References (In Preparation)