IJLLL 2018 Vol.4(4): 272-275 ISSN: 2382-6282
DOI: 10.18178/IJLLL.2018.4.4.186

Higher Order Character Frequency Distribution in Modern Chinese Texts: Application of Zipf's Law

Si Xiaolian
Abstract—To investigate the distribution of Chinese characters in modern written Chinese, the higher order character frequency distributions of the Selected Works of Deng Xiaoping and Ordinary World were analyzed using Zipf's law. The results show that higher order character frequencies in modern Chinese written texts are consistent with Zipf's law, although there are a significant number of low-frequency characters. Most of the coefficients of determination (R²) of the fitted straight lines are greater than 0.9, indicating excellent goodness of fit. Character frequency and higher order character frequency distribution patterns are important for establishing statistics-based computational language models for modern Chinese.

Index Terms—Zipf's law, character frequency, higher order character frequency, Chinese texts.
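
The kind of analysis summarized in the abstract can be illustrated with a minimal Python sketch. It assumes that "higher order character frequency" refers to counts of overlapping character n-grams (n = 2, 3, ...) and that the fitted straight line is an ordinary least-squares fit of log frequency against log rank; the abstract states neither detail explicitly. The function names `ngram_counts` and `zipf_fit` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngram_counts(text, n):
    """Count overlapping character n-grams, ignoring whitespace.

    Assumption: an order-n "higher order character" is a run of n
    consecutive characters. The paper may define it differently.
    """
    chars = [c for c in text if not c.isspace()]
    return Counter("".join(chars[i:i + n]) for i in range(len(chars) - n + 1))


def zipf_fit(counts):
    """Fit log10(frequency) = intercept + slope * log10(rank).

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination of the least-squares line. Zipf's law predicts a
    slope close to -1.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct items to fit a line")
    xs = [math.log10(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log10(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple linear regression, R^2 equals the squared correlation.
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r2
```

For a corpus held in a string `text`, `zipf_fit(ngram_counts(text, 2))` would return the slope, intercept, and R² for two-character sequences. A long tail of items that occur only once or twice, like the low-frequency characters mentioned in the abstract, tends to pull the fitted line away from the ideal slope.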

Si Xiaolian is with the College of Chinese Language and Literature, Northwest Normal University, Lanzhou, Gansu 730070, PR China (e-mail: sixiaolian1979@163.com).

Cite: Si Xiaolian, "Higher Order Character Frequency Distribution in Modern Chinese Texts: Application of Zipf's Law," International Journal of Languages, Literature and Linguistics, vol. 4, no. 4, pp. 272-275, 2018.