Named Entity Recognition on Code-Mixed Cross-Script Social Media Content
DOI:
https://doi.org/10.13053/cys-21-4-2850Keywords:
Named entity recognition, code-mixed cross-script, Bengali-English social media contentAbstract
Focusing on the current multilingual scenario in social media, this paper reports automatic extraction of named entities (NE) from code-mixed cross-script social media data. Our prime target is to extract NE for question answering. This paper also introduces a Bengali-English (Bn-En) code-mixedcross-script dataset for NE research and proposes domain specific taxonomies for NE. We used formal as well as informal language-specific features to prepare the classification models and employed four machine learning algorithms (Conditional Random Fields, Margin Infused Relaxed Algorithm, Support Vector Machine and Maximum Entropy Markov Model) for the NE recognition (NER) task. In this study, Bengali is considered as the native language while English is considered as the non-native language. However, the approach presented in this paper is generic in nature and could be used for any other code-mixed dataset. The classification models based on CRF and SVM performed well among the classifiers.Downloads
Published
2017-12-23
Issue
Section
Articles of the Thematic Issue
License
Hereby I transfer exclusively to the Journal "Computación y Sistemas", published by the Computing Research Center (CIC-IPN),the Copyright of the aforementioned paper. I also accept that these
rights will not be transferred to any other publication, in any other format, language or other existing means of developing.I certify that the paper has not been previously disclosed or simultaneously submitted to any other publication, and that it does not contain material whose publication would violate the Copyright or other proprietary rights of any person, company or institution. I certify that I have the permission from the institution or company where I work or study to publish this work.The representative author accepts the responsibility for the publicationof this paper on behalf of each and every one of the authors.
This transfer is subject to the following conditions:- The authors retain all ownership rights (such as patent rights) of this work, except for the publishing rights transferred to the CIC, through this document.
- Authors retain the right to publish the work in whole or in part in any book they are the authors or publishers. They can also make use of this work in conferences, courses, personal web pages, and so on.
- Authors may include working as part of his thesis, for non-profit distribution only.