Chinese Language Facts

About one-fifth of the world speaks some form of Chinese as its native language, making it the language with the most native speakers. The Chinese language (spoken in its standard Mandarin form) is the official language of the People's Republic of China and the Republic of China, one of four official languages of Singapore, and one of six official languages of the United Nations.

The Chinese language is a member of the Sino-Tibetan family of languages. Although most Chinese view the many varieties of spoken Chinese as a single language, the variations in spoken language are comparable to those of Romance languages; the written language has also changed over time, though far more slowly than the spoken language, and hence has been able to transcend much of the variation in spoken language.

There is considerable controversy over the terminology used to describe the subdivisions of Chinese, with some preferring to call Chinese a language and its subdivisions dialects, and others preferring to call Chinese a language family and its subdivisions languages. There is more on this debate later on. Conversely, even though Dungan is very closely related to Mandarin, few people consider it Chinese, because it is written in Cyrillic and spoken by people outside of China who are not considered Chinese in any sense.

Cantonese is unique among non-Mandarin regional languages in having a widely used written standard. The other regional languages do not have widely used alternative written standards, but many have local characters or use characters that are archaic in baihua.

It is common for speakers of Chinese to be able to speak several varieties of the language. Typically in southern China, a person will be able to speak the official Putonghua and the local dialect, and occasionally to speak or understand another regional dialect, such as Cantonese. Such polyglots will frequently code-switch between Putonghua and the local dialect, depending on the situation. Sometimes a dialect also incorporates elements of other dialects, depending on geographical influence. A person living in Taiwan, for example, will commonly mix pronunciations, phrases, and words from Mandarin and Minnan, and this mixture is considered socially appropriate under many circumstances.

The complex interaction between the Chinese written and spoken languages can be illustrated with Cantonese. There are two standard forms used in writing Cantonese: formal written Cantonese and colloquial written Cantonese. Formal written Cantonese is very similar to written Mandarin and can be read by a Mandarin speaker without much difficulty. However, formal written Cantonese is rather different from spoken Cantonese. Colloquial written Cantonese is more similar to spoken Cantonese but is largely unreadable by an untrained Mandarin speaker.

The terms and concepts used by Chinese to think about language are different from those used in the West, partly because of the unifying effects of the Chinese characters used in writing, and partly because of differences in the political and social development of China in comparison with Europe. Whereas after the fall of the Roman Empire, Europe fragmented into small nation-states, the identities of which were often defined by language, China was able to preserve cultural and political unity through the same period.

The relationship between the Chinese spoken and written languages is somewhat complex. This complexity is compounded by the fact that the numerous varieties of spoken Chinese have gone through centuries of evolution since at least the late Han dynasty. However, written Chinese has changed much less than the spoken language.

In the field of software and communications internationalization, CJK is a collective term for Chinese, Japanese, and Korean, and the rarer CJKV a collective term for the same plus Vietnamese. These are often described as double-byte languages, because their writing systems use far more than the 256 characters that a single byte can represent. The computerized processing of Chinese characters raises special issues in both input and character encoding, as the standard 100+ key keyboards of today's computers do not allow that many characters to be entered with a single key-press.
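The point about one byte being too small can be checked directly. The following is a minimal sketch in Python, assuming the standard codecs that ship with the interpreter; the function name encoded_lengths and the choice of sample character are illustrative and do not come from the text above.

```python
# Count how many bytes one character occupies in several encodings.
# A single byte has only 256 possible values, so legacy Chinese encodings
# such as GBK and Big5 use two bytes for a Han character.

def encoded_lengths(ch, encodings=("ascii", "gbk", "big5", "utf-8", "utf-16-le")):
    """Return {encoding name: byte length}, or None where ch cannot be encoded."""
    lengths = {}
    for name in encodings:
        try:
            lengths[name] = len(ch.encode(name))
        except UnicodeEncodeError:
            # The encoding has no representation for this character.
            lengths[name] = None
    return lengths


if __name__ == "__main__":
    for sample in ("A", "中"):  # Latin letter vs. Han character U+4E2D
        print(sample, encoded_lengths(sample))
```

For the Han character, ASCII has no representation at all, GBK and Big5 each need two bytes, and UTF-8 needs three, whereas the Latin letter fits in one byte in every encoding except UTF-16.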

In Japan and Korea, Han characters were adopted and integrated into their languages and became Kanji and Hanja, respectively. Japan still uses Kanji as an integral part of its writing system; however, Korea's use of Hanja has diminished (indeed, it is not used at all in North Korea).

Until the 20th century, most formal Chinese writing was done in wenyan, translated as Classical Chinese or Literary Chinese, which was very different from any of the spoken varieties of Chinese in much the same way that Classical Latin is different from modern Romance languages. Chinese characters that are closer to the spoken language were used to write informal works such as colloquial novels.

Relationship between spoken and written Chinese

Spoken Chinese is a tonal language related to Tibetan and Burmese, but genetically unrelated to other neighbouring languages, such as Korean, Vietnamese, Thai, and Japanese. However, these languages were strongly influenced by Chinese in the course of history, linguistically and also extralinguistically. Korean and Japanese both have writing systems employing Chinese characters, which are called Hanja and Kanji, respectively. In North Korea, Hanja has been completely discontinued and Hangul is the sole way to express their language, while in South Korea, Hanja is used as a form of bold face. Along with those two languages, Vietnamese also contains many Chinese loanwords and formerly used Chinese characters.

The Chinese written language employs the Han characters, which are named after the Han culture to which they are largely attributed. Chinese characters appear to have originated in the Shang dynasty as pictograms depicting concrete objects. The earliest known examples of Chinese characters are inscriptions on oracle bones, which are occasionally sheep scapulae but mostly turtle plastrons (lower shells) used for divination. Over the course of the Zhou and Han dynasties, the characters became more and more stylized. In addition, components were added so that many characters contain one element that gives (or at least once gave) a fairly good indication of the pronunciation, while another component (the so-called radical) indicates the general category of meaning to which the character belongs. In the modern Chinese languages, the majority of characters are phonetically based rather than logographically based.

The Chinese writing system is mostly logographic, i.e., each character expresses a monosyllabic word part, also known as a morpheme. This is helped by the fact that more than 90% of Chinese morphemes are monosyllabic. The majority of modern words, however, are multisyllabic and written with more than one character, with a separate logogram for each syllable. Some, but not all, Han characters are ideographs, but most Han characters have forms that were based on their pronunciation rather than their meanings, so they do not directly express ideas.

In addition to the previously noted divisions, there are also Putonghua and Guoyu, the official languages of the People's Republic of China and the Republic of China, respectively. Both are based on the dialect of Mandarin as spoken in Beijing, and are intended to serve all of China as a common language of communication. It is therefore the common Chinese language (as these are often called) that is the language of government, of the media, and of instruction in schools.

[Map: Chinese linguistics map 2]

The maps above depict the subdivisions (languages or dialect groups) within Chinese. The seven main groups are Mandarin (represented by the lines drawn from Beijing), Wu, Xiang, Gan, Hakka, Cantonese (Yue), and Min (which linguists further divide into 5 to 7 subdivisions of its own, all mutually unintelligible). Linguists who distinguish ten instead of seven major groups separate Jin from Mandarin, Pinghua from Yue, and Hui from Wu. There are also many smaller groups that confound efforts at classification, such as: Dungan, a dialect of northwestern Mandarin spoken among Chinese-descended Muslims in Kyrgyzstan; Danzhou-hua, spoken on Hainan Island; Xiang-hua 乡话 (not to be confused with Xiang 湘), spoken in western Hunan; and Shaozhou-Tuhua, spoken in northern Guangdong.

One major difference between Chinese concepts of language and Western concepts is that Chinese makes a sharp distinction between written language (wen) and spoken language (yu). This distinction extends to the distinction between written word (zi) and spoken word (hua). The concept of a distinct and unified combination of both written and spoken forms of language is much weaker in Chinese than in the West. There are many varieties of spoken Chinese, the most prominent of which is Mandarin. There is, however, only one uniform written script. (See section below).

Since the May Fourth Movement (1919), the formal standard for written Chinese has been baihua, or Vernacular Chinese, the grammar and vocabulary of which are similar, but not identical, to the grammar and vocabulary of modern spoken Mandarin. Although few new works are written in Classical Chinese, it is taught in middle and high school and forms part of college entrance examinations.

[Map: China language map]

Most linguists classify all of the varieties of Chinese as part of the Sino-Tibetan language family and believe that there was an original language, analogous to Proto-Indo-European, from which the Sinitic and Tibeto-Burman languages descended. The relations between Chinese and the other Sino-Tibetan languages are still unclear and an area of active research, as is the attempt to reconstruct proto-Sino-Tibetan. The main difficulty in this effort is that, while there is very good documentation that allows us to reconstruct the ancient sounds of Chinese, there is no written documentation concerning the division between proto-Sino-Tibetan and Chinese. In addition, many of the languages that would allow us to reconstruct proto-Sino-Tibetan are very poorly documented or understood.

Chinese characters are understood as morphemes that are independent of phonetic change. Thus, although the number one is yi in Mandarin, yat in Cantonese and tsit in Hokkien, they derive from a common ancient Chinese word and still share an identical character: 一. Nevertheless, the orthographies of Chinese dialects are not identical. The vocabularies used in the different dialects have also diverged. In addition, while literary vocabulary is often shared among all dialects (at least in orthography; the readings are different), colloquial vocabularies are often different.
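The idea of one written character carrying several spoken readings can be modelled as a simple lookup table. This is only an illustrative sketch in Python: the names READINGS, reading, and characters_read_as are hypothetical, and the spellings are the informal romanizations used in the paragraph above rather than any standard scheme.

```python
# One character, several readings: the character is the key, and each
# spoken variety supplies its own pronunciation of the same morpheme.
READINGS = {
    "一": {"Mandarin": "yi", "Cantonese": "yat", "Hokkien": "tsit"},  # U+4E00, "one"
}


def reading(char, variety):
    """Return the recorded reading of char in variety, or None if absent."""
    return READINGS.get(char, {}).get(variety)


def characters_read_as(variety, sound):
    """Return every recorded character whose reading in variety equals sound."""
    return [ch for ch, by_variety in READINGS.items()
            if by_variety.get(variety) == sound]


if __name__ == "__main__":
    for variety in ("Mandarin", "Cantonese", "Hokkien"):
        print(variety, reading("一", variety))
```

Keying the table by character rather than by sound mirrors the text's point: the written form stays fixed while the pronunciation varies from one variety to another.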

