New York Times bestselling author and renowned linguist John McWhorter explores the complicated and fascinating world of languages. From Standard English to Black English, from obscure tongues spoken by only a few thousand people to big ones like Mandarin, What Language Is celebrates the history and curiosities of languages around the world and smashes our assumptions about "correct" grammar.
An eye-opening tour for all language lovers, What Language Is offers a fascinating new perspective on the way humans communicate. From vanishing languages spoken by a few hundred people to major tongues like Chinese, with copious revelations about the hodgepodge nature of English, John McWhorter shows readers how to see and hear languages as a linguist does. Packed with Big Ideas about language alongside wonderful trivia, What Language Is explains how languages across the globe (the Queen's English and Surinam creoles alike) originate, evolve, multiply, and divide. Raising provocative questions about what qualifies as a language (so-called slang does have structured grammar), McWhorter also takes readers on a marvelous journey through time and place, from Persian to the languages of Sri Lanka, to deliver a feast of facts about the wonders of human linguistic expression.
Page through a grand old book on what was once known as natural history—as we all do so often, of course—and you’ll find that almost all drawings of marine life are rendered from the perspective of someone standing on the shore.
There will be some fish bobbing around out in the waves, and maybe some flying fish doing what they do. But clams, squid, sea anemones, and such will be lying on the beach, or artfully positioned on conveniently placed rock formations, or even just dangling from the margins of the picture. This was standard procedure in illustration until past the middle of the nineteenth century.
It all looks nice enough. But wouldn’t it seem more natural to draw a squid swimming in the water, springy, fierce, and alert, instead of putrefying on a rock?
But then, what was “natural” to someone in 1840? Even if they were a naturalist? One thing that wasn’t natural, if you think about it, was imagining what an underwater scene looked like—for a simple reason. People then didn’t have the technology to ever be underwater for very long, and certainly not to be able to see much while even making a stab at it.
There were no diving bells or submarines. You might take a deep breath, hold your nose, and dive under for a look, but water is often muddy and it’s hard to see through it when it’s moving.
Plus, you can only hold your breath for so long—and certainly not long enough to plunge a mile down and get a peep at anglerfishes and such.
In England, it was only after a home aquarium mania in the 1850s that people started to get a sense of what aquatic creatures looked like in life, such that illustrators began drawing underwater marine scenes. Before this, as modern as the British were in so many ways, even those with advanced educations, three names, and salad forks had no way of picturing undersea life in the “Jacques Cousteau” style that is second nature to us. To start thinking of sea creatures that way, you had to see them that way. And in many ways, quite often, to be a linguist is to feel like you’re underwater in 1840 while everybody else is up on the beach laying jellyfish out on rocks.
IT’S BECAUSE SO MUCH about language is so hard to see. Or hear.
So from what it’s easy to see and hear, we learn that there are languages, and then in many parts of the world there are assorted “dialects.” These “dialects” are, in some sense, lesser than languages. Part of the difference would seem due to the fact that, as one typically supposes, a language is a collection of words. English has enough words to fill a doorstop like The Oxford English Dictionary (actually the printed version could practically serve as a garden wall). Some “dialect” out there in the rain forest does not—and therefore qualifies as something different from, well, a language language. And then there’s the writing issue: if a language isn’t fixed on the page, then surely, we suppose, it has not achieved its full power. A certain transiency hangs about it; it’s just a “dialect,” in other words.
Because of this it becomes natural that if asked which was more complex, French or the language of a tiny group in New Guinea called the Nasioi, most people would immediately suppose that the answer was French—a “developed” language, after all. The truth, however, begins with the observation that if you thought French’s two genders were annoying, imagine having to deal with Nasioi’s one hundred!
Down underwater, what we see is a world with six thousand languages, period, whether or not they ever see the printed page and even if their vocabularies number only in the tens of thousands. If anything, the languages that are a little “sub-ordinary,” a little “special,” as we might designate a certain congenitally ungifted sort of person, are typically rock-star ones like English, French, and Mandarin Chinese.
But who’d know? We’ll never meet a Nasioi, much less have any reason to learn the language. Besides, we’re too busy attending to other notions about our own language, such as that one of the gravest flaws of the Anglophone is a noisome propensity to use the language “illogically.” We are taught that a language is sensible, tidy—such that we treat it as an oddity that English is shot through with random inconsistencies. Richard Lederer has heightened the festivity quotient of many an e-mail inbox via excerpts that get around from his Crazy English book along the lines of “Why are loosen and unloosen the same?” or “If we conceive a conception and receive at a reception why don’t we grieve a greption?” or observations such as that there’s no egg in eggplant and no ham in a hamburger.
This stuff is, in fact, but the tip of an iceberg of nonsensicality in English—underwater you can see the rest, but we humans are terrestrial. Not to mention territorial—don’t even get most of us started on what happens when languages mix together. Spanish full of English words is “Spanglish,” reviled by many and thought of as “an issue” by others. And there was even a time when more than a few had a serious problem with English having taken on so many words from French and Latin. After all, a real language is “pure.”
HERE’S WHAT IT FEELS LIKE to be underwater.
One reads a perfectly pleasant newspaper article about people in the Caucasus Mountains, a patch of a region home to several dozen languages. The one closest to famous is Georgian. One of the other ones mentioned in the article, spoken by only about twelve hundred people in a few villages, is called Archi. And in the article, what do we learn about Archi? Only that it is a language “of unknown origin.” Otherwise the article is about jokes Archi people and nearby Caucasian language–speaking groups tell about one another.
Of course I’m not waiting for a newspaper writer to give a linguistics lesson about Archi. But given the “on the beach” perspective on language that reigns, it’s hard not to feel like something has happened when a language like this is flagged in passing, especially as some orphan. Attention must be paid—if not in the article, then somewhere. That “unknown origin” business, for example, with the quiet implication that as a kin-less sort of thing, Archi is all alone, as unclassable as it is unknown, outside of the light, less than something—a “dialect,” perhaps?
To be sure, if the idea is that a language’s “origins” must be on paper, then Archi is lost indeed—it has been a spoken language rather than a written one, like all but about two hundred of the world’s six thousand languages. But paper isn’t the only way to tell where a language came from.
A group of similar languages, such as French, Spanish, Italian, Portuguese, and Romanian, begins as one language, which splits off into several when populations become separated over time. Linguists can compare the word for something in related languages and deduce what that word was in the parent language. We know that the method works pretty well in cases where even the parent language was written down for posterity. For example, hand is main in French, mano in Spanish and Italian, mão in Portuguese, and mână in Romanian. No linguist is surprised, based on the techniques of what is called comparative reconstruction, that the Latin word for hand is manus.
In the same way, Archi is one of a passel of kittens, a language family called Northeast Caucasian. If the word for, say, tongue is mac in Archi—which it is—and mott in Chechen (the one language of this family whose speakers are today known to the outside world), maz in Lak, mez in Lezgian, mic in Bezhta, mott’ in Batsbi, muz in Udi, and mici in Tindi and so on, then linguists can use these words to roll back the tape and see that the original word in the parent language was maʒi, even though that language was never written down.
So Archi is not of “unknown origin” at all—it has kin, sprung from a paterfamilias language probably spoken about six thousand years ago. If anything, it is of better-known origin than most of us are as people. Most of us have no records of our ancestors further back than four or five generations and certainly couldn’t reconstruct them from our DNA and our relatives’. The world is bursting with language families of this kind, whose ancestral languages can be reconstructed in the same way as the Caucasian word above. In some cases the ancestral languages can be shown to possibly be related to one another.
Or, while we’re on “tongue”—the newspaper writer is hardly the first person to write of languages of the Caucasus with a subtle sense that they have not quite “arrived” in the sense that, say, German has. The grand old author of language books for the general public, Mario Pei, though a god of mine, repeatedly termed the Caucasus languages “tongues,” as opposed to the “languages” he refers to elsewhere. “Among the widespread languages, Arabic is the one having the greatest variety of guttural sounds; but the tongues of the Caucasus are generally conceded to be the ones having the richest assortment of consonant-sounds,” he had it. Or, in the very first issue of the flagship journal of linguistics, Language, wouldn’t you know that one of the articles was titled “The Influence of Caucasian Idioms on Indo-European Languages.” Mind you, the writer here did not mean “idiom” in the sense of kick the bucket or “Whatever!” but just as a speech variety different from a language such as an Indo-European one. Spoken by small numbers of people often preliterate until recently and almost never written down—surely these are tongues, idioms. And whatever those are, it’s something different from a language.
That plays into the on-the-beach view of language—Archi, as it were, drying out on some piece of driftwood. But what a marvel is Archi when you can see it alive! For one thing, while English has a couple dozen consonants, Archi has over seventy-five! Just enunciating this “tongue” would leave most of ours in knots. And Pei even slipped a bit in designating Caucasian consonantal richness a world record: the Khoi-San “click” languages of southern Africa, also obscure and unwritten, are the ones with the biggest consonant inventories on earth. Pei squeezed by this with “The Hottentot-Bushman languages of southwest Africa use grunts and clicks as normal parts of their speech-sounds,” apparently thinking that the “grunts and clicks” are not consonants. But they are, as central in indicating meaning as b and h are in distinguishing a bat from a hat: for example, the symbols ! and | indicate certain click sounds, and in the Nama click language of Namibia, !hara means “examine” while |hara means “dangle.” But at least Pei, for some reason, called these “tongues” languages—and in any case, Archi is vastly richer in sounds than what an English speaker has any reason to think of as normal.
And then Archi’s grammar steams and jangles with so much more “stuff” than English that it can be hard to imagine that people actually speak it casually, courting and cooking and dozing off in it, as opposed to the occasional gifted adolescent being sequestered to spend decades mastering it as a stunt to show off at festivals.
The following chart shows “When they were going up to Tura, they saw a bear” in Archi. The translation of each word is lined up underneath, with dashes between roots and their prefixes and suffixes. The double letters mean that you pronounce the consonant with special vigor. The double letter alone makes a difference in words’ meanings just like b and h in bat and hat—χat is “scratch” but χχat is “beam.”
The word order comes out as “Tura to up going they, bear saw by them,” but that’s just the beginning. The word for went has a they shoved right in the middle of it—they is just one sound (!), b, and χatti becomes χa(b)tti, as if in English we said not “they travel” but “tra-they-vel.” Then χatti is an irregular form, and not just for went but something more specific, what I translated as “went towards”: as we have went for go in the past, Archi has an irregular go form for when the going hasn’t finished yet and is just towards the end, a shade of meaning that verbs take suffixes for just as they do for pastness (and of course the ordinary past form of go is irregular, too!). And then you have to have they yet again, as the -ttib at the end—but that is a form that you use only in a subordinate clause!
And in real life χa(b)tti-ttib isn’t uttered with helpful parentheses and dashes: it just goes by in a flash—χabttittib—and as part of a sentence, not on display by itself, and nobody puts spaces between words when they talk, so—Turališijattišiχabttittibχχamsbakkulijijmes . . . and that’s just one sentence in a whole story! And in case you were wondering, χ is made by trilling your uvula—which sounds a little less exotic if compared to the similar gurgly r in French, but still.
A “tongue” indeed—and never mind that (get your uvula ready again) if the word χχams for bear were a subject, then often it would need a special ending, or that those endings are highly irregular, or that what I marked as “I heard” is a suffix that indicates that it’s something you heard rather than experienced—and you have to use it.
And finally, there’s so much irregularity in Archi that the rules almost seem like the exceptions. The plural marker in English is -s. But from these words, can you tell what the plural marker is in Archi?
Me either. Sure, we have man and men—but in Archi those are bošor and kɬele! Yet this is a mere humble “idiom”—in which, between verb prefixes and suffixes plus a breathtaking muchness else, a verb can occur in 1,502,839 different forms. To those of us underwater, reading an article highlighting Archi as what people tell jokes in is like watching someone get a prize for building a card house out of little paintings that are, despite no one considering it worthy of mention, Titians.
IT’S NOT ONLY THE OBSCURE “TONGUES” that harbor so much more than meets the eye (or ear), but ones more familiar, such as English. A language is a fecund, redolent buzzing mess of a thing, in every facet, glint, and corner, even in single words. For our purposes, let’s take that word idiom as in the “Caucasian idioms.” A bread-and-butter etymology like “From the Greek idioma, ‘peculiarity, peculiar phraseology’” is fine, but there’s more.
Just as we know that Northeast Caucasian’s word for tongue was maʒi, from comparing words in the Romance, Germanic, and Slavic languages, plus Lithuanian, Irish and Welsh, Greek, Albanian, Armenian, Hindi, Persian, and plenty of others, we know that there was once a single ancestor to all of these languages in which a word for self was swe. Or better, swe-, because that language, which we call Proto-Indo-European, was very heavy on suffixes, and we can assume that swe came dressed in them much of the time.
Now, one of the most easily perceived ways that swe comes down to us is in that little se word that pops up in French, Spanish, and the gang in reflexive constructions, still meaning “self.” Recall: French Il se lave or Spanish Él se lava, “He washes himself.” For the record it’s also where our own self comes from, with an antique suffix frozen on, the meaning of which has been long lost.
In other cases, swe wended down different paths we’d never think of now. If you’re by yourself or within yourself, you’re apart from others, and that was the meaning of a swe rendition in Latin, sed, with the w dropped out and another one of those suffixes, -d, stuck on: “apart.” Hence when sed joined with cura, “care,” as in “worry,” the result was sed-cura “apart from worry,” which became securus, which English inherited as secure.
In Greek, swe went its own “separate” way. Your self is particular to you, such that the swe rendition swed (with another one of those mysterious frozen suffixes) took on that connotation. But not quite in that form. For one thing, Proto-Indo-European made swed into an adjective with a -yo suffix, and thus swed-yo, “self-ic,” as it were. And then there was a Greek “particularity”—s and w at the beginning of words had a way of flaking off, like h did in many regional British varieties (’orse, ’ouse, and so on). That’s why the Proto-Indo-European root sreu, “flow,” lives on in English as stream, but in Greek as the s-less rhythmos (the source of our rhythm), and it’s why the root werg became work in English but in Greek became organon (with no w)—and thus words like organic in English.
This meant that swedyo became the Greek word edyo and before long, with the “eh” sound changing to an “ee” as it often does in languages over time, idio. One result: a person with a plethora of particular qualities may possibly be a weirdo, and there’s a short step from weirdo to idiot. But then another result: one thing particular to a person could be their language, especially as language is so deeply tied to psychology and identity. Naturally, then, idio could start to refer to one of the most particular particularities of being human, one’s idiom. The m began as the first sound in one of Greek’s first-person-singular endings, à la Spanish’s -o in hablo, “I speak”: idiou-mai meant “I make my own.” Latin took that on as idioma, but as a noun, with the m having no meaning of its own. Then English borrowed that, and never gave it back.
Which means, though, that idiom is the tail end of an imperceptibly gradual process of change in sound, meaning, and suffixation, such that the first i is all that’s left of the original swe—the sw- flaked off and the e morphed into an i. The d is the remnant of that ancient Proto-Indo-European suffix, the io of that other one that used to make a noun into an adjective and is now just a glide of the tongue, and the m is left over from a chunk of Greek we English speakers wouldn’t recognize as a suffix if it bit us on the leg.
That’s how words are in any language. Good old swe, for the record, is also frozen into sullen (how you might feel when you’re by yourself), ethnic (that is, your own people—with that Greek sw- lost again), and boatswain (the swain is “your own man” up there on the boat). It’s even in the Irish organization name Sinn Fein, and not the Sinn but the Fein. The name means “We Ourselves”—Proto-Indo-European w, as in swe, became f in Irish, which is why werə-o-, “true,” became Latin’s vir, “man” (source of English’s virile), but is fíor in Irish today. Swe is even fossilized like an insect in amber in sober—Latin’s sed, “apart,” again, plus ebrius, familiar to us from inebriated, and thus “not drunk” was sedebrius, “apart from drunk.” Later sedebrius merged into sebrius . . . and then vowels fidgeted over time as always, and so sobrius, and you can imagine the rest.
And that’s just the spawn of swe. Every word we utter—within unbroken strings like Turališijattišiχabttittibχχamsbakkulijijmes, all day every day, carries baggage from merry morphing just as idiom does. The way this morphing happens shows that language is a very different thing than it tends to seem.
Namely, in honor of the word idiom and its muttly history, with individual sounds tracing to different sources, I’d like to split the word up in a similar way so that it can show us the nature of human language more directly. In this book we will see that language, whether a “language,” “tongue,” or “idiom,” is:
I: Ingrown. All speech varieties have to indicate things that are pretty obvious even if left unmentioned. The “tongues,” however, have a navel-gazing way of taking this further than English speakers would imagine. Archi and its “I heard” marker is an example, as is the fact that if you do something to a bear it’s a χχams, but if the bear does something to someone or something it’s a χχamssi. In fact, unwritten languages tend to be especially ingrown, because the big-dude globe-striding languages have usually been streamlined by earnest but semicompetent adults learning them out of necessity, unable to do it as well as they could have as children. If Archi were spoken by countless millions instead of a count-ful twelve hundred, it would probably be a lot less dazzlingly complicated.
D: Dissheveled. All speech varieties are messy, full of illogical things that Richard Lederer could write books about and then some. Archi plurals are crazier than person versus people as often as not. Or, another sed-story: Latin for “to separate” was cernere. Sed + cernere, “to separate apart,” became secernere, from whose participial form secretus English got secret. Note that “separate apart” was the kind of thing that would be decried in Comments sections today as “redundant,” like irregardless with its ir-, and yet it was the source of a word now considered quite correct. Quests to make language usage “logical” look, from underwater, quizzical given that all languages are, at heart, jerry-rigged splotches doing the best they can despite countless millennia of unguided, slow-but-sure kaleidoscopic distortion.
I: Intricate. Despite the kaleidoscopic accretions and destructions, anything a human being speaks is a coherent system with rules, unless the human being is a toddler, brain damaged, or making their way in a foreign language. This is even true of speech varieties deeply marked by adult learning, such as Black English, created by adult slaves making their way in English by hook or crook and never mastering some of the rules of Standard English. In Black English, new rules have been born that make Black English more like Archi in some ways than the language The Wall Street Journal is written in. Rules are about much more than tables of endings and issues about whether something is a subject or an object.
O: Oral. A speech variety is not primitive just because no one writes it. Writing is merely a scratching down of what speaking sounds like, and the speaking is ingrown, dissheveled, and intricate just as the written reflection of it is. The now common concern about the looseness of writing online and in texting is a distraction born of an “onshore” perspective on what a language is. Writing is not “the language” itself, and how someone writes an e-mail has nothing to do with the marvelous complexity of how the same person talks, channeling their inner Archi in whichever language they speak. This, in turn, shows the fallacy in the alarm at speech increasingly coloring how we write. What has been strange is the separate development of writing from speech, and the blending of the two that modern communications technology allows is, in its way, creating a less artificial culture of language worldwide.
M: Mixed. “Pure” languages don’t exist. Likely as many humans speak more than one language as do not, and in the same mouth, languages are no more likely to stay separate than two liquids. As former members of the Soviet Union, Archi speakers learn Russian in school and use it as the main language of communication with people beyond the twelve hundred Archis. As such, they are unlikely to speak Archi for longer than a few minutes without using a Russian word, just as Latin speakers used Greek words like idiom and English speakers started taking so many of those same words from Latin for themselves. Linguists have encountered no language that isn’t penetrated with words, and even grammatical constructions, from other languages. Some readers will recall my showing in Our Magnificent Bastard Tongue that English’s use of do in questions and negative sentences—don’t you know?—is a steal from Celtic languages like Welsh.
AT THE CLOSE OF his On the Origin of Species, Charles Darwin famously wrote after his argument for natural selection as the source of the variety of the world’s animals and plants:
There is grandeur in this view of life, with its several powers, having been originally breathed into a few forms or into one; and that, whilst this planet has gone cycling on according to the fixed law of gravity, from so simple a beginning endless forms most beautiful and most wonderful have been, and are being, evolved.
In the same way, there is grandeur in this view of language.
How and why human language emerged is hotly disputed, but we know that at some point it happened, as “so simple a beginning.” And today we have “endless forms most beautiful and most wonderful.” Not a few “languages” and a bunch of evanescent, rootless “idioms,” but six thousand awesomely ingrown, messy, intricate, oral, and mixed creations.
To get a sense of what I mean, allow me to show you what the languages of the world look like from down here underwater.