Mining Ethnic Content Online with Additively Regularized Topic Models

Authors

  • Murat Apishev, Moscow State University - Yandex
  • Sergei Koltcov, National Research University Higher School of Economics
  • Olessia Koltsova, National Research University Higher School of Economics
  • Sergey Nikolenko, National Research University Higher School of Economics - Steklov Institute of Mathematics at St. Petersburg
  • Konstantin Vorontsov, Yandex - Moscow Institute of Physics and Technology

DOI:

https://doi.org/10.13053/cys-20-3-2473

Keywords:

Topic modeling, additive regularization of topic models, computational social science

Abstract

Social studies of the Internet have adopted large-scale text mining for unsupervised discovery of topics related to specific subjects. A recently developed approach to topic modeling, additive regularization of topic models (ARTM), provides fast inference and, through a wide variety of possible regularizers, more control over the topics than developing LDA extensions does. We apply ARTM to mining ethnic-related content from the Russian-language blogosphere, introduce a new combined regularizer, and compare models derived from ARTM with LDA. Human evaluations show that ARTM is better suited to mining topics on specific subjects: it finds more relevant topics of higher or comparable quality.

Author biographies

Murat Apishev, Moscow State University - Yandex

Murat Apishev is an M.Sc. student at Moscow State University and a Junior Developer at the Search Department, Yandex, Moscow, Russia. He received his B.Sc. degree from Moscow State University in 2015. His research interests include machine learning, parallel algorithms, and topic modeling.

Sergei Koltcov, National Research University Higher School of Economics

Sergei Koltcov is the Deputy Director of the Laboratory for Internet Studies and an Associate Professor at the Department of Applied Mathematics and Computer Science at the National Research University Higher School of Economics, St. Petersburg. He received his Ph.D. in physics from the Institute for Analytical Instrumentation of the Russian Academy of Sciences at St. Petersburg in 2000. His research interests include mathematical modeling in various fields: topic modeling, sentiment analysis, electronic/ionic optics, mass spectrometry, gas dynamics, and statistical physics.

Olessia Koltsova, National Research University Higher School of Economics

Olessia Koltsova is the Director of the Laboratory for Internet Studies and an Associate Professor at the Department of Sociology at the National Research University Higher School of Economics, St. Petersburg. As an academic committed to interdisciplinary, data-driven research, she leads various collective projects in the sphere of Internet and society, as well as in methods of large-scale automatic analysis of Internet data for social science. In recent years, she has published on online community structure, the topical composition and sentiment of user content, the relation of the Internet to protests, electoral preferences, entrepreneurial success, and other topics. She is also the author of News Media and Power in Russia, Routledge, 2006.

Sergey Nikolenko, National Research University Higher School of Economics - Steklov Institute of Mathematics at St. Petersburg

Sergey Nikolenko is a Senior Researcher at the Laboratory for Internet Studies, National Research University Higher School of Economics, and at the Laboratory of Mathematical Logic at the Steklov Institute of Mathematics at St. Petersburg. He received his M.Sc. summa cum laude from St. Petersburg State University in 2005 and his Ph.D. from the Steklov Institute of Mathematics at St. Petersburg in 2009. His research interests include networking algorithms and systems, machine learning and probabilistic inference, bioinformatics, and theoretical computer science.

Konstantin Vorontsov, Yandex - Moscow Institute of Physics and Technology

Konstantin Vorontsov is the Head of the Intelligent Systems Department at the Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, Professor at the Moscow Institute of Physics and Technology (State University), and Professor of the Russian Academy of Sciences. He received his Sc.D. from the Computing Center of RAS in 2010. His research interests include machine learning, information retrieval, generalization bounds, topic modeling, and exploratory search.

Published

2016-09-30