A Framework that Uses the Web for Named Entity Class Identification: Case Study for Indian Classical Music Forums

Authors

  • Joe Cheri Ross, Indian Institute of Technology Bombay
  • Aditya Joshi, Indian Institute of Technology Bombay
  • Pushpak Bhattacharyya, Indian Institute of Technology Bombay

DOI:

https://doi.org/10.13053/cys-20-3-2464

Keywords:

Named Entity Recognition, Named Entity Class Identification, Music Data

Abstract

Identification of the named entity (NE) class (semantic class) is crucial for NLP problems such as coreference resolution, where semantic compatibility between entity mentions is imperative to the coreference decision. Short and noisy text containing the entity makes it challenging to extract the entity's NE class from context. We introduce a framework that uses the web to identify the named entity class of a given entity when the entity boundaries are known. The proposed framework is beneficial for specialized domains where data and class label challenges exist. We demonstrate its benefit through a case study of Indian classical music forums. Apart from person and location, which are included in standard semantic classes, we also consider raga, song, instrument, and music concept. Our baseline approach follows a heuristic-based method that makes use of Freebase, a structured web repository. The search-engine-based approaches acquire context for an entity from the web and perform named entity class identification. These approaches improve on the baseline, and the hierarchical classification we introduce improves performance further. In summary, our framework is a first-of-its-kind validation of the viability of the web for NE class identification.

Author Biographies

Joe Cheri Ross, Indian Institute of Technology Bombay

He is a PhD student in the Department of Computer Science and Engineering. His primary area of research is music information retrieval. His current focus is on extracting information from music-related text using natural language processing methods.

Aditya Joshi, Indian Institute of Technology Bombay

He is a PhD student at IITB-Monash Research Academy, a joint PhD program between Indian Institute of Technology Bombay, India, and Monash University, Australia. His primary area of research is sentiment analysis.

Pushpak Bhattacharyya, Indian Institute of Technology Bombay

He is the Vijay and Seeta Vashee Chair Professor at Indian Institute of Technology Bombay and also the Director of Indian Institute of Technology Patna. With over 25 years of research experience, he has conducted innovative research in several disciplines of NLP. He has also authored a book titled ‘Machine Translation’.

Published

2016-09-30