Parsing Bengali Text - an Intelligent Approach
Goutam Kumar Saha
Scientist-F, Centre for Development of Advanced Computing (CDAC), Kolkata
Mail to: CA- 2 / 4 B, Baguiati, Deshbandhu Nagar, Kolkata 700059, INDIA
sahagk@gmail.com
1. Introduction
Syntax deals with the structure of natural language sentences. A parser produces a structural description by applying a given computational grammar to the input sentence. In general, a parser is expected to identify the structural and thematic relations in the sentence. The structural description assigned to words depends on the grammar, that is, on a description language and a set of structural constraints. A parser attempts to analyze the sequence of symbols presented to it on the basis of that grammar [1,2,3,4]. Intelligent parsing through semantic analysis, using tagged lexicons and each word's major class (from an ontology), helps to make machine translation more accurate. A human language like Bangla is very rich in inflections, vibhaktis (suffixes) and karakas, and these are often ambiguous, which makes Bangla parsing very difficult. It is also not easy to supply the semantic, pragmatic and world knowledge that humans routinely use when parsing and understanding Bangla sentences. Bangla has a total of eighty-nine part-of-speech (POS) tags [6]. Bangla sentences generally follow subject-object-verb (S-O-V) order. Inflections identified during morphological parsing also provide useful POS information. We describe here an intelligent parsing algorithm for Bangla sentences that relies on semantic analysis of the sentence (through an ontology and rules) during parsing. Intelligent parsing is also very important for machine translation [5,7].
2. Intelligent Approach to Bengali Parsing
Bangla sentences also often omit the verb. As another example, the parse of the Bangla sentence RAM BHALO (good) CHHELE (boy) (in English, "Ram is a good boy"), in which the missing verb is denoted by E, is shown below.
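As a minimal sketch of the idea, assuming the verb slot is sentence-final (S-O-V), the first word is the subject, and using hypothetical tag names and a hypothetical bracketing, a placeholder E can be added when no word is tagged as a verb:

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, POS tag) pairs; tag names are assumptions


def add_empty_verb(tagged: Tagged) -> Tagged:
    """Append the placeholder ('E', 'Verb') when no word is tagged as a verb."""
    if any(tag == "Verb" for _, tag in tagged):
        return list(tagged)
    return list(tagged) + [("E", "Verb")]


def bracket(tagged: Tagged) -> str:
    """Build a flat S-O-V bracketing: first word = subject, last = verb.

    This layout is an illustrative assumption, not the article's parse tree.
    """
    tokens = add_empty_verb(tagged)
    subject = tokens[0][0]
    verb = tokens[-1][0]
    complement = " ".join(word for word, _ in tokens[1:-1])
    return f"[S [NP {subject}] [VP [NP {complement}] [V {verb}]]]"


sentence = [("RAM", "Noun-Proper"), ("BHALO", "Adjective"), ("CHHELE", "Noun-Common")]
print(bracket(sentence))
# prints: [S [NP RAM] [VP [NP BHALO CHHELE] [V E]]]
```

A sentence that already contains a word tagged Verb is returned unchanged by add_empty_verb, so the placeholder appears only in verbless sentences.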
3. Intelligent Bangla Parsing Algorithm
The following steps explain how the proposed intelligent parser works.
Step 1. Start parsing.
Step 3. Start annotating each Bangla word with its POS.
Step 4. If a word is not in the lexicon, tag it as Noun-Proper.
Step 5. If there is POS ambiguity, resolve it with Bangla grammar rules.
Step 6. If there is word-sense ambiguity, apply an N-gram model to assign the correct word sense. (Bigrams or trigrams are used, i.e., the 2-3 surrounding words are scanned so that an ambiguous word is identified by its context.)
Step 7. End While
Step 8. Repeat steps 2 through 7 until the end of the sentences in the text.
Step 9. End
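Steps 3 through 6 can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy lexicon, the tag names, the single disambiguation rule (a word that may be an adjective is taken as one when a noun follows) and the sense cue words are all hypothetical, and a simple cue-word overlap within a two-word window stands in for a trained N-gram model.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: word -> candidate POS tags (not from the article).
LEXICON: Dict[str, List[str]] = {
    "BHALO": ["Adjective", "Adverb"],  # POS-ambiguous in this toy lexicon
    "CHHELE": ["Noun-Common"],
    "KAL": ["Adverb"],
    "JABE": ["Verb"],
    "GELO": ["Verb"],
}

# Hypothetical sense cues: ambiguous word -> sense -> nearby cue words.
SENSE_CUES: Dict[str, Dict[str, Set[str]]] = {
    "KAL": {"tomorrow": {"JABE"}, "yesterday": {"GELO"}},
}


def candidates(word: str) -> List[str]:
    """Step 4: a word missing from the lexicon is tagged Noun-Proper."""
    return LEXICON.get(word, ["Noun-Proper"])


def resolve_pos(words: List[str], i: int) -> str:
    """Step 5: resolve POS ambiguity with a (toy) grammar rule."""
    tags = candidates(words[i])
    if len(tags) == 1:
        return tags[0]
    # Assumed rule: a possible adjective directly before a noun is an adjective.
    if "Adjective" in tags and i + 1 < len(words):
        if any(t.startswith("Noun") for t in candidates(words[i + 1])):
            return "Adjective"
    # Otherwise prefer a non-adjective reading if there is one.
    others = [t for t in tags if t != "Adjective"]
    return others[0] if others else tags[0]


def resolve_sense(words: List[str], i: int, window: int = 2) -> Optional[str]:
    """Step 6: pick the sense whose cue words occur within `window` words."""
    senses = SENSE_CUES.get(words[i])
    if not senses:
        return None
    context = set(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    best = max(senses, key=lambda s: len(senses[s] & context))
    return best if senses[best] & context else None


def annotate(sentence: str) -> List[Tuple[str, str, Optional[str]]]:
    """Step 3: annotate every word with a POS tag and, if needed, a sense."""
    words = sentence.split()
    return [(w, resolve_pos(words, i), resolve_sense(words, i))
            for i, w in enumerate(words)]


print(annotate("RAM BHALO CHHELE"))
print(annotate("RAM KAL JABE"))
```

In the first sentence RAM is absent from the toy lexicon and so falls back to Noun-Proper, while BHALO is resolved to Adjective because CHHELE follows it. In the second, the cue word JABE within the window selects the "tomorrow" sense of KAL.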
4. Conclusion
References
2. D. Yarowsky, "Decision lists for lexical ambiguity resolution: Application to accent restoration in Spanish and French," Proc. 32nd Annual Meeting of the ACL, Las Cruces, NM, 1994.
3. Robert C. Berwick, et al., Principle Based Parsing: Computation and Psycholinguistics, Kluwer Academic, Boston, 1991.
4. D. Jurafsky and J. Martin, Speech and Language Processing, Pearson, 2003.
5. Goutam Kumar Saha, "English to Bangla Translator: The BANGANUBAD," International Journal-CPOL, Vol. 18 (4), pp. 281-290, USA, 2005.
6. Goutam Kumar Saha, et al., "Computer Assisted Bangla Words POS Tagging," Proc. International Symposium on Machine Translation NLP and TSS (iSTRANS-2004), New Delhi, pp. 248-251, 2004.
7. Goutam Kumar Saha, "The E2B Machine Translation: A New Approach to HLT," ACM Ubiquity, Vol. 6(32), 2005, ACM Press, USA.
Author's Biography
Source: Ubiquity Volume 7, Issue 13 (April 4, 2006 - April 10, 2006) www.acm.org/ubiquity