Concept Discovery through Information Extraction in Restaurant Domain

Authors

  • Nadeesha Pathirana University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Sandaru Seneviratne University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Rangika Samarawickrama University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Shane Wolff University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Charith Chitraranjan University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Uthayasanker Thayasivam University of Moratuwa, Faculty of Engineering, Department of Computer Science and Engineering
  • Tharindu Ranasinghe CodeGen International, Trace Expert City

DOI:

https://doi.org/10.13053/cys-23-3-3277

Keywords:

Word embedding, word2vec, GloVe, hierarchical clustering

Abstract

Concept identification is a crucial step in understanding, and building a knowledge base for, any particular domain. However, it is not a simple task in very large domains such as restaurants and hotels. In this paper, a novel approach is presented for identifying a concept hierarchy related to the restaurant domain and for classifying unseen words into the identified concepts. Manually sorting, identifying, and classifying domain-related words is tedious; therefore, the proposed process is largely automated. Word embeddings, hierarchical clustering, and classification algorithms are combined to obtain concepts related to the restaurant domain. Further, this approach can be extended to build an ontology for the restaurant domain semi-automatically.
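The abstract names the components of the pipeline (word embeddings, hierarchical clustering, classification of unseen words) without showing how they fit together. The following is a minimal sketch under stated assumptions, not the authors' implementation: the 3-D vectors are invented stand-ins for word2vec or GloVe embeddings, clustering is plain agglomerative clustering with average linkage over cosine distance, and unseen words are assigned by a nearest-centroid rule. The function names (`agglomerate`, `classify`) and the cluster count are hypothetical choices for illustration; the paper's actual linkage criterion and classifiers may differ.

```python
import math

# Hypothetical toy vectors standing in for trained word2vec/GloVe embeddings.
EMBEDDINGS = {
    "pizza":   [0.9, 0.1, 0.0],
    "burger":  [0.8, 0.2, 0.1],
    "pasta":   [0.85, 0.15, 0.05],
    "waiter":  [0.1, 0.9, 0.1],
    "chef":    [0.2, 0.8, 0.0],
    "cashier": [0.15, 0.85, 0.1],
    "cheap":   [0.0, 0.1, 0.9],
    "pricey":  [0.1, 0.0, 0.95],
}

def cosine_distance(u, v):
    """1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def average_linkage(c1, c2, vecs):
    """Mean pairwise cosine distance between the members of two clusters."""
    total = sum(cosine_distance(vecs[a], vecs[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vecs, n_clusters):
    """Bottom-up clustering: start with singletons, merge the closest pair
    until n_clusters remain (the merge order forms the hierarchy)."""
    clusters = [[w] for w in sorted(vecs)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], vecs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters

def centroid(cluster, vecs):
    """Component-wise mean of the vectors of the words in a cluster."""
    dim = len(next(iter(vecs.values())))
    return [sum(vecs[w][k] for w in cluster) / len(cluster) for k in range(dim)]

def classify(vector, clusters, vecs):
    """Assign an unseen word's vector to the cluster with the nearest centroid."""
    dists = [cosine_distance(vector, centroid(c, vecs)) for c in clusters]
    return clusters[dists.index(min(dists))]

if __name__ == "__main__":
    concepts = agglomerate(EMBEDDINGS, n_clusters=3)
    for c in concepts:
        print(c)
    # A made-up vector for a hypothetical unseen food word.
    print(classify([0.88, 0.12, 0.02], concepts, EMBEDDINGS))
```

With these toy vectors the three clusters correspond to food items, staff roles, and price descriptors, and the unseen vector lands in the food cluster. Stopping the merge loop at a fixed cluster count is only one way to cut the hierarchy; cutting at a distance threshold is an equally plausible choice.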

Published

2019-09-25