SNEIT: Salient Named Entity Identification in Tweets

Priya Radhakrishnan, Ganesh Jawahar, Manish Gupta, Vasudeva Varma

Abstract


Social media is a rich source of information and opinion, with data growing at an exponential rate. However, social media posts are difficult to analyze because they are brief, unstructured and noisy. Interestingly, many social media posts are about one or more entities. Understanding which entity is central to a post (the Salient Entity) helps in analyzing the post. In this paper we propose a model that aids such analysis by identifying the Salient Entity in a social media post, tweets in particular. We present a supervised machine-learning model that identifies the Salient Entity in a tweet, and we propose that the tweet is most likely about that entity. To build a dataset of tweets and salient entities, we use the premise that when an image accompanies a text, the text is most likely about the entity in that image. We trained our model on this dataset. Note that this does not restrict the applicability of our model: we use tweets with images only to obtain objective ground-truth data, while the features for the model are derived from the tweet text. Our experiments show that the model identifies the Salient Named Entity with an F-measure of 0.63. We show the effectiveness of the proposed model for tweet-filtering and salience identification tasks. We have made the human-annotated dataset and the source code of this model publicly available.

Keywords


Entity salience, named entity recognition, semantic search, named entity extraction
