Post by Thoithoi O'Cottage on Sept 7, 2019 10:34:08 GMT 5.5
What dictionary makers do for languages that have been widely written for a millennium (not to say more), or at least half of one (English, for example), is include or exclude words depending on whether the words occur in written text and on how frequently they occur. Linguistics in general and the principles of lexicography in particular, as we know them now, are built on the experience and practices of linguists and lexicographers that proved most convenient in their business. These linguists and practitioners being mostly from the West, especially from among the languages most prominent all over the world in speech, writing and printing (at least at the time when dictionary making began), the guiding principles were born and brought up in the linguistic, social and cultural environment of the West, particularly of England, an environment that has now broadened to cover the United States. The primary guiding principle of contemporary lexicography is to ransack the entire body of written texts for material, or rather printed texts now (your private writings, such as notes on your fridge door or a shopping list on your phone, don't count because they are not published).
Unfortunately, however, these well-meaning principles quite often get in the way of lexicography in less-written or less-printed languages rather than guide it. If the lexicography of such lexicographically late languages is reduced to what can be sourced from print (let's generously include the written in this case), it would be tantamount to linguists recognizing these languages in an emaciated form, reduced to their bones, which is unfair and outright wrong. The literature of no language (to include everything that is written and printed) has in it all the words and idioms spoken by its native speakers, and the body of words and idioms found in written and printed texts is usually what is considered the standard version or dialect of that language. Moreover, the literature of a language that has only recently taken to writing, and more recently to printing, does not cover even a quarter of the language's "standard" dialect, let alone more.
It is essential here to note that the volume and frequency of writing, and in recent times of printing, matter particularly for these languages. Languages like Manipuri that are claimed to have been written for at least a millennium don't have a body of written and printed words and idioms large enough to keep communication between their speakers from derailing if they were to discard all the words and idioms not found in this printed source. All speakers of Manipuri, except a meager handful, were illiterate until only recently, and written texts were of no practical use to them linguistically--the texts were not distributed and did not influence the speakers of the language. For written texts to be linguistically meaningful to the contemporary speakers of the language in which they are written, the texts have to be available to the speakers, and the speakers must be able to interact with them (meaning that they should at least be literate). In the case of Manipuri, the volume of written texts was too meager to be linguistically consequential to its speakers, and the frequency with which texts were produced was too low to sustain the language (at least not in the way writing alone has sustained Sanskrit without many speakers).
Manipur had its first printing press in the later part of the second decade of the twentieth century (it was very rarely used) and saw journalism for the first time in the early 1920s: Meitei Chanu (1922) was handwritten and ran for only four or five issues. I would not consider the very short-lived handwritten magazine Meitei Leima of 1917-18 linguistically significant in its own time, though it is historically important, not because it lasted only a few issues but because it was published outside Manipur and its significance in Manipur was only historical. Manipur saw its first printed newspaper in the early 1930s (Tongjam Gokulchandra's Dainik Manipur Patrika, 23 March 1933) and its first monthly, Lalit Manjuri Patrika, edited by Arambam Dorendrajit (September 1933). These were only the beginning, and the entire volume of Manipuri's written and printed text is not yet large enough for a fairly comprehensive dictionary to be sourced from.
This points to the unfortunate fact that the lexicography of a language with a small amount of written and printed text cannot do justice to the language if it sources its material only from written and printed texts. This brings us face to face with a whole new problem--if lexicography opens the door to speech, where do we draw the line between what to include and what to exclude? Is every distinct chunk of speech sounds in a meaningful order a word to be included in a dictionary? If yes, how old must the word be? Are dashanihe, mei tai (probably a literal translation of the English hot), neiba, touraroige and pot kappa words? If these are words, isn't olong (a humorous clipping of hoirong) a word too? Where do you draw the line?
Since the lexicography of a language with a small volume of written and printed texts cannot be guided entirely by the established principles of lexicography, as followed by the makers of the Oxford English Dictionary and the Merriam-Webster Dictionary, there is a pressing need for a new set of principles to guide the practical lexicography of smaller languages. Building a body of principles would take time, and it cannot be fully arbitrary--we cannot form organizations to determine this for the entire language and impose their determinations on other organizations, though there can be small houses with their own deliberations. Linguists should ideally collate and compare such practices and develop a general set of principles without committing the mistake of imposing them. During this process there will be a lot of disagreements, which can get nasty, but that is not necessarily bad. The speaking and writing community will spontaneously do its own linguistic selection over time, and eventually non-arbitrary, agreed-on best practices will emerge, from which a more evolved set of guiding principles can surface (and these will also keep evolving).