Mining a Trending Topic: U.S. Immigration in the Context of Social Media

Esteban Castillo, Ofelia Cervantes

Abstract

This paper presents a text mining approach for extracting valuable patterns from social media documents in the context of U.S. immigration. The paper focuses on uncovering statistical features alongside linguistic elements based on graph techniques. Graphs provide rich data structures for representing the lexical and syntactic aspects of texts, allowing the discovery of complex patterns that could give experts valuable insight. The proposed method is applied to a Twitter-Reddit dataset comprising English and Spanish samples from 2016 to 2019. Experimental results showed that our interpretation of classic statistical techniques provides a baseline understanding of the topic, while a more robust graph-based analysis makes it possible to uncover and predict hidden patterns across a large number of samples. In particular, a co-occurrence graph helped to obtain relevant words, phrases, and sentences, while a user-interaction graph made it possible to detect important users, communities, and the interactions among them.
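The abstract does not say how the two graphs are built or ranked. Purely as an illustration, the sketch below assumes a sliding-window word co-occurrence graph ranked by weighted degree, and a directed user-interaction graph (for example replies or mentions) ranked by weighted in-degree. It uses weakly connected components as a crude stand-in for community detection. The function names, the window size, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def build_cooccurrence_graph(docs, window=2):
    """Undirected weighted graph: the weight of (u, v) counts how often v
    appears within `window` tokens after u (or vice versa) in some document."""
    graph = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window + 1, len(tokens))):
                other = tokens[j]
                if other == word:
                    continue  # skip self-loops
                graph[word][other] += 1
                graph[other][word] += 1
    return graph


def top_words(graph, k=3):
    """Rank words by weighted degree (sum of edge weights); ties broken alphabetically."""
    strength = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    return sorted(strength, key=lambda n: (-strength[n], n))[:k]


def build_interaction_graph(interactions):
    """Directed weighted graph: the weight of (a, b) counts how often user a
    interacted with (e.g. replied to or mentioned) user b."""
    graph = defaultdict(Counter)
    for src, dst in interactions:
        graph[src][dst] += 1
    return graph


def top_users(graph, k=2):
    """Rank users by weighted in-degree; ties broken alphabetically."""
    in_strength = Counter()
    for nbrs in graph.values():
        for dst, weight in nbrs.items():
            in_strength[dst] += weight
    return sorted(in_strength, key=lambda u: (-in_strength[u], u))[:k]


def weak_components(graph):
    """Group users into weakly connected components (a simple proxy for communities)."""
    adj = defaultdict(set)
    for src, nbrs in graph.items():
        for dst in nbrs:
            adj[src].add(dst)
            adj[dst].add(src)
    seen, components = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the Twitter-Reddit dataset.
    docs = ["border wall funding vote", "border wall debate", "daca vote border"]
    print(top_words(build_cooccurrence_graph(docs)))  # ['border', 'wall', 'vote']

    replies = [("ana", "luis"), ("carl", "luis"), ("luis", "ana"), ("dora", "eve")]
    users = build_interaction_graph(replies)
    print(top_users(users))        # ['luis', 'ana']
    print(len(weak_components(users)))  # 2
```

Weighted degree and connected components are deliberately simple choices. A fuller analysis would more likely use centrality measures such as PageRank and a modularity-based community detection algorithm.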

Keywords

Text mining; Statistics; Graph mining; Social network analysis; Natural language processing; Big data
