Zipf’s Law and Statistical Data on Modern Tibetan, pdf


Post by kirtu »

A demonstration that modern Tibetan fits Zipf's Law.
In this paper, a large-scale modern Tibetan text corpus is built, comprising about 190 thousand documents, 67.21 million words, and 93.66 million syllables in total. Based on the corpus, statistics are computed for several language units at different granularities. The statistical data show that a syllable has 3.26 letters or 2.20 super characters on average, while a sentence has 75.40 letters or 63.14 super characters. The top 10 super characters, syllables, and words account for 66.3156%, 16.5556%, and 24.6415% of the corpus, respectively. Curves for the n-gram frequency-rank lists of super characters, syllables, and words are plotted. They show that when all the n-grams for n = 1, 2, ..., 5 are pooled and sorted by frequency in descending order, the frequency-rank curves on log-log axes are well fitted by a straight line for both syllables and words. For super characters, however, we did not find a curve that is fitted well enough by a straight line, even when all the n-grams for n = 1, 2, ..., 10 are combined.
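
For anyone who wants to try the same kind of check on their own text, here is a rough Python sketch of the two measurements the abstract describes: the share of the corpus covered by the top-k items, and a straight-line fit to the pooled n-gram frequency-rank curve on log-log axes. This is not the authors' code. The function names (pooled_ngram_counts, top_k_share, loglog_fit) are made up for illustration, the input is assumed to be an already segmented list of units (syllables or words), and the toy tokens are placeholders rather than Tibetan data.

```python
from collections import Counter
import math


def pooled_ngram_counts(units, max_n=5):
    """Count every n-gram of `units` for n = 1..max_n in one shared Counter."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    return counts


def top_k_share(counts, k=10):
    """Fraction of all occurrences covered by the k most frequent items."""
    total = sum(counts.values())
    return sum(c for _, c in counts.most_common(k)) / total


def loglog_fit(counts):
    """Least-squares line through the (log rank, log frequency) points.

    Returns (slope, intercept, r_squared). Needs at least two distinct items.
    A slope near -1 with r_squared near 1 is the classic Zipf pattern.
    """
    freqs = sorted(counts.values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # If every frequency is identical the line is flat and fits exactly.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Placeholder tokens standing in for segmented syllables or words.
    sample = "ka kha ga ka nga ka kha ka ga ka".split()
    pooled = pooled_ngram_counts(sample, max_n=5)
    print("top-10 share:", top_k_share(pooled, k=10))
    print("slope, intercept, R^2:", loglog_fit(pooled))
```

On a corpus of realistic size, the R^2 value gives a crude way to compare how well a straight line fits each unit type, which is the comparison the abstract makes between syllables, words, and super characters. The abstract does not say how goodness of fit was judged in the paper, so treat R^2 here as one possible choice.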