Abstract—This paper identifies issues that arise when English sentences are interpreted by Hindi speakers. A sentence may be grammatically correct, yet because it may have no equivalent construct in Hindi, NLP processes may find it difficult to interpret as accurately as a human reader does. Bridging this gap in knowledge transfer from one language to another would require an additional knowledge base. NLP systems often use such a knowledge base either as a rule base or as empirical formulations derived by statistical methods from a large bilingual corpus. A bilingual parallel corpus, though essential, is not easily available, and mapping the grammar of one language onto another is also difficult. Structures in a sentence that lack a proper mapping can be viewed as noise. From a 460,000-word corpus, 1000 unique English sentences were identified as representative sentences. These sentences were translated both manually and with a Machine Translation (MT) system, and the outputs were compared to find the most common cases in which the MT system did not interpret a sentence as correctly as a human translator. Such misinterpretation by the NLP system has been marked as noise. This paper identifies ten categories of such noise.
Index Terms—NLP processes, knowledge base, bilingual corpus, grammar mapping, noise, machine translation, recursive transition networks (RTN), finite state transducers (FST).
The authors are with the Linguistics Dept., Lucknow University, Lucknow, India (e-mail: seemashukla@jssaten.ac.in).
Cite: Seema Shukla and Usha Sinha, "Noise Issues in Sentence Structure for Morphological Analysis of English Language Sentences for Hindi Language Users," International Journal of Languages, Literature and Linguistics, vol. 1, no. 1, pp. 56-59, 2015.