Why did early Indo-European languages seem to be morphologically complex?

Apparently there is a general trend that languages lose morphological marking over time. For example, according to this question PIE had 8 noun cases (nominative, accusative, genitive, etc), Latin 5, Romance 2 or even 1. Doesn't this show that less inflection is more natural?

My question is why the early languages had so many morphological distinctions in the first place. I mean, why did the old Proto-Indo-European folks invent 8 different words to call "a lion", or dozens of words for "to eat", etc. To me, it sounds more natural to call a lion a lion regardless of whether it's nominative or accusative. What analyses have been made about this?

This puzzling mystery is completely resolved when you realize that recursive embedded grammar is a feature that is not present in ancient languages, and appears only well after the evolution of writing. When you need to handle recursion, the case systems and complex morphology of pre-literate languages become unnatural.

All modern fully embedded grammars are essentially the same--- they are described by a context-free replacement grammar which allows adjectives, adverbs, and verb arguments to be replaced by multi-word phrases which serve the same role. The reason is not that this is a fundamental defining property of language. The reason is that the qualitative ideas behind context-free grammars were invented in Greek and Roman times, and Cicero and Aristotle explicitly and prescriptively advocated writing this way.
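As a rough illustration of what "replacement" means here, the following Python sketch fills the object slot of a sentence either with a single noun or with a multi-word phrase that plays the same role. The rule it implements (NP -> "the" N, or "the" N "that" V NP) and the function names `noun_phrase` and `sentence` are assumptions made up for this example, not a grammar of any real language.

```python
def noun_phrase(nouns, verbs):
    """Build a noun phrase from the toy rule:
    NP -> "the" N   |   "the" N "that" V NP
    nouns[0] is the head; each following noun is embedded one level deeper."""
    head = "the " + nouns[0]
    if len(nouns) == 1:
        # Base case: a plain, non-embedded noun phrase.
        return head
    # Recursive case: the rest of the phrase is itself a noun phrase.
    return head + " that " + verbs[0] + " " + noun_phrase(nouns[1:], verbs[1:])


def sentence(subject, verb, obj):
    """The object slot accepts any noun phrase, simple or embedded."""
    return f"{subject} {verb} {obj}"


simple = sentence("the dog", "saw", noun_phrase(["cat"], []))
embedded = sentence("the dog", "saw", noun_phrase(["cat", "mouse"], ["chased"]))
print(simple)    # the dog saw the cat
print(embedded)  # the dog saw the cat that chased the mouse
```

The point of the sketch is only that the same slot takes a one-word or a many-word filler without any other part of the sentence changing.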

This type of embedded recursive grammar is extremely successful at producing convenient written expressions of complex ideas in a short, but not unduly taxing, form. Due to its convenience, all of the old-world languages adopted the recursive grammar of Cicero et al., one by one, as they acquired bi-lingual speakers of European languages and translations of European recursive works. Once you have multiple embedding, it is very difficult to stop doing it, and you can easily invent a way to do it in any language.

This is the reason that in modern European languages, and in those of India, Asia, or Africa, recursive clausal embedding works in almost the exact same way, a way described well by a context free grammar, with potentially unlimited center, initial, and final embedding. This is like a virus, spreading via bilingual speakers, and only languages which were isolated from Europe by oceans were immune.

Thankfully, a few languages maintained their non-embedded form, due to cultural isolation, most notably Piraha (which has no center embedding, as described in the revolutionary work of Everett) and Warlpiri (which has no full recursive grammar either). Native American languages as a rule did not have a full context-free structure, and neither did ancient Sanskrit, ancient Chinese, ancient Hebrew, or any ancient language other than (remarkably) ancient Greek and Latin.

This idea is explicitly described and argued by Fred Karlsson in "Constraints on multiple center-embedding of clauses".

Cicero's Remarkable Invention

Embedding with context-free recursive structure became so ubiquitous, that every literate person learns this structure before adolescence, and forgets that it did not come naturally. This structure was invented, not discovered, and it was invented by structure-conscious writers in Greek and Roman times. It spread by emulation to other languages, sometimes by conscious effort of literary folks to popularize this form of expression.

This means that scholars, who for obvious structural reasons tend to be the most highly literate members of society, all see that every language that they learn has a roughly isomorphic recursive grammar that describes how to produce complex sentences, a grammar which is fundamentally based on a context-free replacement generative grammar. This comes as a shock--- it is a jarring realization which begs for an explanation.

I am a native Hebrew speaker, and I remember learning English as a child. I remember that I was miserable for a while, because everything was new. When I finally learned enough vocabulary to make complex sentences, I was immediately struck by the fact that these sentences, unlike simple constructions, are word-for-word identical to Hebrew complex sentences. I didn't have to learn anything more! I knew immediately how to produce any complex sentence without effort.

It is the same as when you learn a new computer language. After learning a few function words and idioms, the structure of the complex expressions is immediately apparent, if you already know another programming language. The reason is that computer languages are all based on the notion of context-free grammars, explicitly abstracted from natural language by Chomsky and Schützenberger. In the 1950s, Noam Chomsky gave a definition of a language grammar which made the embedding structure the primary ingredient. A language grammar is context free when it allows an arbitrarily deep center-embedding, and Chomsky hypothesized that all the world's languages are described by context-free grammars because the original human language was described by a context-free grammar.
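To make "arbitrarily deep center-embedding" concrete, here is a minimal Python sketch that nests clauses inside one another, so that each subject is separated from its verb by the whole embedded clause. The function name `center_embed` and the example words are invented for illustration; this is a toy under those assumptions, not a claim about how any parser or speaker actually works.

```python
def center_embed(nouns, verbs):
    """Nest clauses so that nouns[i] is the subject of verbs[i].
    Toy rule: CLAUSE -> "the" N [CLAUSE] V
    The inner clause sits between a subject and its verb (center embedding)."""
    if not nouns:
        return ""
    inner = center_embed(nouns[1:], verbs[1:])
    parts = ["the " + nouns[0]]
    if inner:
        parts.append(inner)   # embedded clause goes in the middle
    parts.append(verbs[0])    # the verb closes the clause
    return " ".join(parts)


depth1 = center_embed(["rat"], ["died"])
depth3 = center_embed(["rat", "cat", "dog"], ["died", "bit", "chased"])
print(depth1)  # the rat died
print(depth3)  # the rat the cat the dog chased bit died
```

The rule can be applied to any depth, which is the formal property at issue, even though the depth-3 output is already hard for a reader to follow.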

This is true of all old-world languages, and without a historical appreciation, just by looking at the structure of the languages, one can mistakenly come to believe that this structure is very ancient, and the common source is in prehistoric times. This fallacy is so compelling, that it was unchallenged dogma until Everett's work of 2005.

A Language evolution fallacy

If you see that all birds share a hole in the hip-bone, and all dinosaurs do, you are justified in concluding that birds and dinosaurs have a common ancestor which had a hole in the hip-bone. The reason is Darwin's evolution--- this was the main prediction of the theory. The characteristics of the common ancestor are preserved by all descendants, and if you see two species with a common trait, you can be pretty sure that it was because they evolved in the same family.

This explains why life-forms organize in a hierarchical cladistic tree. Languages also come in a cladistics-like tree, and this is because the transmission of language is much like the transmission of genes: it preserves certain word-sounds and structures in a diverging, evolving form.

But unlike evolution, bilingual speakers can transmit nontrivial structure horizontally between very distantly related languages. So that in languages, you find creoles, which in biology would be like an oak-tree/lizard hybrid. You find languages like English whose vocabulary is split almost 50/50 between Germanic and Latin roots, and which are clearly Germanic with enormous Latin influence. You find completely alien loan-words in English like "Kimono" and "Feng-Shui" which come from some of the most distantly related languages in the world.

But most significantly, grammatical constructions are also shared. The fact that all languages recurse the same way suggests one of two things:

1. The common ancestor of all languages recursed this way.

2. Recursion was invented at one spot, and spread horizontally.

Experience with Darwinian evolution suggests the first option, and this is Chomsky's hypothesis. It's dead wrong. The correct answer is number 2.

This means that every one of the world's languages (except for Greek and Latin and their descendants) has a grammatical discontinuity, the moment when it became recursive. This is usually something you can see--- it is a sharp revolutionary advance on the past, and it leads to a golden age of literature in the coming centuries.

Morphological pressures in pre-recursive and post-recursive languages

In pre-recursive languages, there is no fundamental reason to put the preposition marker before the word, and a very good reason to put it after--- there is already a definite/indefinite marker before the word taking up space.

If you say, in Hebrew, "the mountain", you say "ha-har", which in syllable terms puts a syllable before the word. Now if you say "I walked to the mountain", "halachti la-har", you are putting two syllables before the word. In Hebrew, the two syllables are merged into one, "la", which is much like "to-the" becoming "t'a", as in "I walked t'a mountain". But ignore that.

A word has two ends, and it is much clearer to put the definite marker on one end, and the case-marker (the preposition) on the other end, so that they don't have to fight. This makes best use of the phoneme space, and this is the preferred solution.

"I walked the-mountain-ward" (Halachti ha-har-a)

But this solution is the casing solution, and it interferes with embedding in a way described in the body of this question: "Did case systems disappear to make embedding easier?" When you replace "the mountain" by an embedded phrase, the case marker puts a syllable in the middle of the embedded phrase in such a way that it is difficult to shear off.

This puts pressure on languages to shed case systems and other morphological transformations in favor of stand-alone function words with a common syntax, beginning at the date that recursive embedding becomes common among all speakers of the language.
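The pressure described above can be sketched with a toy Python example. It uses invented English glosses (a "-ward" suffix standing in for a case ending, "to" for a stand-alone preposition), and the helper names `mark_with_preposition` and `mark_with_suffix` are hypothetical. It is not an analysis of Hebrew or of any real case system.

```python
def mark_with_preposition(prep, phrase_words):
    """Stand-alone marker: goes at the edge of the phrase,
    whatever the phrase contains."""
    return " ".join([prep] + list(phrase_words))


def mark_with_suffix(suffix, phrase_words, head_index):
    """Case-like marker: must be attached to the head word,
    so the caller has to know where the head is."""
    words = list(phrase_words)
    words[head_index] = words[head_index] + suffix
    return " ".join(words)


short = ["the", "mountain"]
long_ = ["the", "mountain", "that", "I", "saw"]

print(mark_with_preposition("to", short))   # to the mountain
print(mark_with_preposition("to", long_))   # to the mountain that I saw
print(mark_with_suffix("-ward", short, 1))  # the mountain-ward
print(mark_with_suffix("-ward", long_, 1))  # the mountain-ward that I saw
```

With the stand-alone word, replacing the short phrase by the long one is plain substitution. With the suffix, the marker ends up buried in the middle of the longer phrase, and the rule needs to locate the head first.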

Ancient Embedding styles

Just so that I am clear--- all languages embed conceptually, they only don't embed grammatically. The concepts in a non-recursive language are not simpler than in a recursive language, they are just expressed more verbosely.

I want something. Namely, to be clear. All languages have conceptual embedding. They do not need grammatical embedding, not necessarily. If a language has no embedding, then the concepts are not simpler. You just say things more verbosely.

So there is no implication that speakers of Piraha are somehow less than human, or not fully capable of philosophizing, or anything like that. These ideas only come if you associate grammatical recursion with language, an association which is false.

dude, you really like long posts, don't you? :p

@Louis Rhys: that's nothing: physics.stackexchange.com/questions/14056/…

"In pre-recursive languages, there is no fundamental reason to put the preposition marker before the word, and a very good reason to put it after--- there is already a definite/indefinite marker before the word taking up space." Wrong! There are lots of languages where there is no "definite/indefinite marker before the word."

@RonMaimon, I am afraid I don't quite understand what you mean. And please don't "dude" me.

@GastonÜmlaut: Relative clauses are present in ancient Hebrew, ancient Sanskrit and (I am sure) ancient Chinese. I'm sure you can find them in some form in Piraha too. Recursion is not relative clauses, it is nested relative clauses: "This is the house that covered the fly that flew on the barn that Jack built". This you won't find until later.

Ron, this is an excellent point that I started thinking about a long time ago. At the time, I guess I figured we just don't have enough data from prehistory (by definition) to draw a conclusion. But even just contrasting the epic of Gilgamesh with the Iliad, there's a huge difference. It seems that this isn't mentioned often enough in linguistics.

@Ron Maimon: Your doubt is misplaced, evidently based on secondary sources, which are apparently making incorrect claims about the primary sources. The example that I have cited is from the Rigveda (the earliest Sanskrit), which scholarly consensus takes to have been composed between 1200 and 1000 BCE, and this particular example is from the oldest stratum of the Rigveda (cf. Oldenberg 1888 and Renou 1947). Thus, as a matter of fact, all of the oldest records of Indo-European languages (the Rigveda and Atharvaveda, Old Hittite, Homeric Greek) attest to languages that exhibit recursion.

Right. Clause recursion has nothing to do with writing. Language has nothing to do with writing. It only looks that way because the oldest examples of recursion are written. Naturally; unwritten words leave no fossils.

Interesting response from Ron Maimon, although I am not sure I understand it fully. How can we be sure that proto-Indoeuropeans did not have elements embedded within other elements in a syntactic structure? Nor do I think that English is more "complex" syntactically than Greek or Latin, maybe less regular. Where there are analytic rules, they are obscure and differ spuriously between British and US speakers, as well as within British or US speakers. The word "complexity" sounds self-gratifying enough. A better word for that might be sloppiness. The fact is that proto-Indoeuropeans in a pre-literate age, had a highly inflected grammar, whose origin seems a mystery. Counter-intuitively, with the advent of schooling, languages started to become grammatically simplified. One interesting study I read was that illiterate reciters of epics from India could recite long epics from memory, whereas their children who went to school had to make notes and write everything down and had trouble in reciting these epics. So the explanation may be that illiteracy frees up certain brain resources that can be allocated to an inflected grammar among other things. That still would not explain how proto-Indoeuropean grammar became so highly inflected.

Culturally, it would cursorily appear that languages became less inflected with the introduction of large numbers of foreign speakers, e.g. in the case of ancient Latin being succeeded by less inflected Romance languages or in the creation of creoles (e.g. modern English). Overall, however, there seems a tendency, even in the emerging less-inflected languages, for scholars to use a more regular and more inflected speech than non-scholars or non-experts. There is, in contrast, also a view or trend among non-scholars that speaking in a non-inflected way is more expressive, but this sounds like some kind of conceit. There is a neurological condition when damage in the speech area induces patients who struggle to express themselves to invent new words of no known meaning. I can see how some people with natural or clinical deficits in the grammar and vocabulary of a language perceived themselves as more expressive in avoiding convention and coming up with the occasional new word, using a foreign word or an irregular expression. Occasionally it might have even sounded like poetic license of a sort and could catch on. Bizarre words like bloke, dude, funk, etc seem to belong to that department.

Ancient writers can be very much enjoyed today. Their style has been often imitated and certainly they had no problem with expressiveness, just because their language was more inflected - that is self-deception.

I didn't say that the modern languages are more expressive, I said they are more easily compatible with recursion. When you have inflected languages, you don't do a simple textual replacement for a word with a phrase which occupies the same structural role in the sentence, you need to do extra fiddling around. So when you have recursion there is suddenly a pressure to jettison all dongles. I noticed this as a child, in Hebrew, where all the books advocate "Shmi Ron" (name-o-mine [is] Ron), while EVERYONE really says "Ha-shem sheli Ron" (The name of me [is] Ron), even though it is longer!

In this case, I knew from direct experience why this was happening--- you sometimes say "Ha-shem shel ha-ach sheli hagadol ze Ron" (The name of the brother of-mine the-big it [is] Ron), and the "shel" construction allows for easy recursion, and makes one universal rule for all such constructions, recursive or not, while the "i" at the end doesn't work with phrases, and only works to modify single words. The general rule is easy to induct from this--- when you have people recursing all the time, they want stand-alone words and no casing or attached possessives, these make recursion hard.

Someone else claimed that English is an expressive language, perhaps because it is non-inflected, and my criticism had nothing to do with what you had said. You make some good points but it remains counter-intuitive how you go the other way around from recursion to inflection, assuming that was how the proto-Indoeuropean language developed.

You never go the other way around. There's no way proto-Indo-European was any more recursive than anything else.