Identification of POS Tags for the Khasi Language based on Brill’s Transformation Rule-Based Tagger
DOI:
https://doi.org/10.13053/cys-26-2-4058
Keywords:
Natural language processing (NLP), computational linguistics, part-of-speech (PoS), PoS tagging, Khasi language, Khasi corpus, lexical morphology, transformation rule-based tagging
Abstract
Khasi is a Mon-Khmer language belonging to the Austro-Asiatic language family. It is spoken by the indigenous people of the state of Meghalaya in North-East India. The main purpose of this paper is to develop a Part-of-Speech (PoS) tagger for the Khasi language using a rule-based approach. PoS tagging requires a grammatically tagged corpus; however, Khasi does not have a standard corpus for PoS tagging. A further aim of this paper is therefore to develop a Khasi lexicon, or PoS corpus, and to use Brill's rule-based transformation approach to automatically tag a given Khasi text. Anticipating the challenges of building such a corpus, this paper presents an analysis based on a Khasi corpus of around 1,03,998 words in its initial phase. We also show how the Khasi corpus was created. Using Brill's transformation-based learning on the created corpus in this preliminary study, accuracies of 97.73% and 95.52% were obtained on the validation data and test data, respectively. This work is the first attempt to investigate PoS tagging using a rule-based model with the designed Khasi PoS corpus.
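The abstract names Brill's transformation-based learning without spelling out the procedure, so a rough illustration of the general idea may help. In this minimal sketch, an initial-state tagger gives each word its most frequent tag from the training lexicon; the learner then repeatedly picks the contextual rule ("change tag A to B when the previous/next tag is Z") whose score, errors fixed minus correct tags broken, is highest, and stops when no rule scores at least `min_score`. This is not the authors' implementation: the two rule templates, the fallback tag `NN` for unknown words, the tag names, the toy English-like tokens standing in for Khasi words, and all helper names (`build_lexicon`, `learn_rules`, `tag`, `accuracy`) are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Sentence = List[Tuple[str, str]]   # [(word, gold_tag), ...]
Rule = Tuple[str, str, str, str]   # (from_tag, to_tag, "PREV" | "NEXT", context_tag)
DEFAULT_TAG = "NN"  # hypothetical fallback tag for words missing from the lexicon

def build_lexicon(corpus: List[Sentence]) -> Dict[str, str]:
    # Map each word to its most frequent tag in the training corpus.
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sent in corpus:
        for word, t in sent:
            counts[word][t] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def initial_tags(words: List[str], lexicon: Dict[str, str]) -> List[str]:
    # Initial-state annotator: most frequent tag, or DEFAULT_TAG if unseen.
    return [lexicon.get(w, DEFAULT_TAG) for w in words]

def rule_fires(rule: Rule, tags: List[str], i: int) -> bool:
    frm, _, where, ctx = rule
    if tags[i] != frm:
        return False
    j = i - 1 if where == "PREV" else i + 1
    return 0 <= j < len(tags) and tags[j] == ctx

def apply_rule(rule: Rule, tags: List[str]) -> List[str]:
    # Contexts are read from the tags as they were before this pass.
    return [rule[1] if rule_fires(rule, tags, i) else t for i, t in enumerate(tags)]

def learn_rules(corpus: List[Sentence], lexicon: Dict[str, str],
                max_rules: int = 10, min_score: int = 1) -> List[Rule]:
    gold = [[t for _, t in s] for s in corpus]
    current = [initial_tags([w for w, _ in s], lexicon) for s in corpus]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Instantiate both templates at every currently mistagged position.
        candidates = set()
        for tags, gs in zip(current, gold):
            for i, (t, g) in enumerate(zip(tags, gs)):
                if t != g:
                    if i > 0:
                        candidates.add((t, g, "PREV", tags[i - 1]))
                    if i + 1 < len(tags):
                        candidates.add((t, g, "NEXT", tags[i + 1]))
        # Score = errors fixed minus correct tags broken; keep the best rule.
        best: Optional[Rule] = None
        best_score = min_score - 1
        for rule in sorted(candidates):
            score = 0
            for tags, gs in zip(current, gold):
                for i in range(len(tags)):
                    if rule_fires(rule, tags, i):
                        if rule[1] == gs[i]:
                            score += 1
                        elif tags[i] == gs[i]:
                            score -= 1
            if score > best_score:
                best, best_score = rule, score
        if best is None:
            break  # no rule reaches min_score: stop learning
        rules.append(best)
        current = [apply_rule(best, tags) for tags in current]
    return rules

def tag(words: List[str], lexicon: Dict[str, str], rules: List[Rule]) -> List[str]:
    tags = initial_tags(words, lexicon)
    for rule in rules:
        tags = apply_rule(rule, tags)
    return tags

def accuracy(corpus: List[Sentence], lexicon: Dict[str, str], rules: List[Rule]) -> float:
    correct = total = 0
    for sent in corpus:
        pred = tag([w for w, _ in sent], lexicon, rules)
        correct += sum(p == g for p, (_, g) in zip(pred, sent))
        total += len(sent)
    return correct / total

# Toy, hypothetical training data: "run" is ambiguous between NN and VB.
train = [
    [("the", "DET"), ("run", "NN")],
    [("we", "PRON"), ("run", "VB")],
    [("they", "PRON"), ("run", "VB")],
    [("a", "DET"), ("run", "NN")],
    [("we", "PRON"), ("run", "VB")],
]
lexicon = build_lexicon(train)
rules = learn_rules(train, lexicon)
print(rules)                                # [('VB', 'NN', 'PREV', 'DET')]
print(tag(["the", "run"], lexicon, rules))  # ['DET', 'NN']
print(accuracy(train, lexicon, []), accuracy(train, lexicon, rules))  # 0.8 1.0
```

In this sketch the learned rule lifts toy training accuracy from 0.8 to 1.0. A practical tagger would use a richer template set (for example word-based contexts and lexical rules for unknown words) and would measure accuracy on held-out validation and test splits; the figures reported in the abstract come from the authors' own system and corpus, not from this code.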
Published
2022-06-15
Section
Articles
License
Hereby I transfer exclusively to the Journal "Computación y Sistemas", published by the Computing Research Center (CIC-IPN), the Copyright of the aforementioned paper. I also accept that these rights will not be transferred to any other publication, in any other format, language or other existing medium. I certify that the paper has not been previously disclosed or simultaneously submitted to any other publication, and that it does not contain material whose publication would violate the Copyright or other proprietary rights of any person, company or institution. I certify that I have permission from the institution or company where I work or study to publish this work. The representative author accepts responsibility for the publication of this paper on behalf of each and every one of the authors.
This transfer is subject to the following conditions:
- The authors retain all ownership rights (such as patent rights) of this work, except for the publishing rights transferred to the CIC through this document.
- Authors retain the right to publish the work in whole or in part in any book of which they are the authors or publishers. They may also use this work in conferences, courses, personal web pages, and so on.
- Authors may include the work as part of their thesis, for non-profit distribution only.