Spotting Fake Reviews using Positive-Unlabeled Learning

Authors

  • Huayi Li, University of Illinois at Chicago
  • Bing Liu, University of Illinois at Chicago
  • Arjun Mukherjee, University of Houston
  • Jidong Shao, Dianping Inc., Shanghai

DOI:

https://doi.org/10.13053/cys-18-3-2035

Keywords:

Fake reviews, Positive-Unlabeled learning, PU-learning.

Abstract

Fake review detection has been studied by researchers for several years. However, all studies reported so far are based on English reviews. This paper reports a study of detecting fake reviews in Chinese. Our review dataset comes from the Chinese review hosting site Dianping, which has built a fake review detection system. Dianping is confident that its algorithm has very high precision, but it does not know the recall. This means that the reviews the system flags are almost certainly fake, but the remaining reviews may not all be genuine. This paper first reports a supervised learning study of two classes, fake and unknown. However, since the unknown set may contain many fake reviews, it is more appropriate to treat it as an unlabeled set. This calls for the model of learning from positive and unlabeled examples (or PU-learning). Experimental results show that PU-learning not only significantly outperforms supervised learning but also detects a large number of potentially fake reviews hidden in the unlabeled set that Dianping fails to detect.
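The abstract does not say which PU-learning algorithm the study uses. As a rough illustration only, the sketch below implements one well-known PU-learning scheme: a simplified version of the two-step "spy" technique from earlier PU-learning research by Liu and colleagues. The scheme has three stages:

1. Hide a few known positives (here, reviews already flagged as fake) in the unlabeled set as "spies."
2. Use the spies' classifier scores to pick unlabeled reviews that are almost certainly negative (genuine).
3. Train a final classifier on positives versus those reliable negatives.

Everything here is an assumption for illustration:

- A hand-rolled logistic regression replaces the naive Bayes/EM classifiers of the original technique.
- The two-dimensional feature vectors are synthetic stand-ins for real review features.
- The function names (`pu_learn`, `spy_reliable_negatives`) and parameters (`spy_frac`, `noise`) are hypothetical.

This is neither the paper's model nor Dianping's filter.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(model, x):
    """Estimated probability that feature vector x belongs to class 1."""
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logreg(xs, ys, epochs=500, lr=0.1):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = score((w, b), x) - y  # derivative of log-loss w.r.t. the logit
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def spy_reliable_negatives(positives, unlabeled, spy_frac=0.15, noise=0.05, seed=0):
    """Step 1 (hypothetical helper): hide some positives ("spies") in U and use
    their scores to pick unlabeled examples that are very likely negative."""
    pos = list(positives)
    random.Random(seed).shuffle(pos)
    k = max(1, int(len(pos) * spy_frac))
    spies, rest = pos[:k], pos[k:]
    # Class 1 = remaining positives; class 0 = unlabeled set plus the spies.
    model = train_logreg(rest + list(unlabeled) + spies,
                         [1] * len(rest) + [0] * (len(unlabeled) + len(spies)))
    # Spies should score like the positives hidden in U, so pick the threshold
    # so that only about a `noise` fraction of spies fall below it.
    spy_scores = sorted(score(model, s) for s in spies)
    t = spy_scores[min(int(noise * len(spy_scores)), len(spy_scores) - 1)]
    return [u for u in unlabeled if score(model, u) < t]


def pu_learn(positives, unlabeled):
    """Step 2 (hypothetical helper): train positives vs. reliable negatives,
    then flag the unlabeled examples that look positive (likely fake)."""
    negatives = spy_reliable_negatives(positives, unlabeled)
    model = train_logreg(list(positives) + negatives,
                         [1] * len(positives) + [0] * len(negatives))
    return [score(model, u) >= 0.5 for u in unlabeled]


def demo(seed=42):
    """Synthetic toy data: 2-D points stand in for real review features."""
    rng = random.Random(seed)

    def cluster(cx, cy, n):
        return [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)] for _ in range(n)]

    flagged_fake = cluster(2, 2, 40)   # fakes the filter already caught (P)
    hidden_fake = cluster(2, 2, 10)    # fakes the filter missed (inside U)
    genuine = cluster(-2, -2, 30)      # genuine reviews (inside U)
    flags = pu_learn(flagged_fake, hidden_fake + genuine)
    return sum(flags[:10]), sum(flags[10:])


if __name__ == "__main__":
    recovered, false_flags = demo()
    print(f"hidden fakes recovered: {recovered}/10")
    print(f"genuine reviews flagged: {false_flags}/30")
```

Run as a script, the sketch reports how many of the ten planted "hidden" fakes it recovers from the unlabeled set and how many genuine reviews it wrongly flags. The spies provide a data-driven threshold without any labeled negatives. Known positives hidden in the unlabeled set should score like the unknown positives there, so unlabeled reviews that score below nearly all spies are safe to treat as genuine.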

Author Biographies

Huayi Li, University of Illinois at Chicago

Huayi Li is a direct PhD student at the University of Illinois at Chicago, working with Professor Bing Liu. He received his B.S. degree in Computer Science from Nanjing Normal University, China. Previously, he held internships with the Analysis and Experimentation team at Microsoft, the Map Analysis Center of Excellence at Nokia HERE Map, and the Chinese Academy of Sciences. In these internships he worked on various big data projects ranging from sentiment analysis and text mining to machine learning and A/B testing. He is currently also a research specialist at the Health Media Collaboratory at UIC. His Ph.D. thesis is about collective classification in heterogeneous networks and topic modeling.

Bing Liu, University of Illinois at Chicago

Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was a faculty member at the National University of Singapore. His current research interests include sentiment analysis and opinion mining, data mining, machine learning, and natural language processing (NLP). He has published extensively in top conferences and journals. He is also the author of two books: Sentiment Analysis and Opinion Mining (Morgan and Claypool) and Web Data Mining: Exploring Hyperlinks, Contents and Usage Data (Springer). In addition to its research impact, his work has also had important social impact; some of it has been widely reported in the press, including in a front-page article in The New York Times. In professional service, Liu has served in three kinds of roles:
  • program chair of many leading data mining conferences of ACM, IEEE, and SIAM (KDD, ICDM, CIKM, WSDM, SDM, and PAKDD);
  • associate editor of several leading data mining journals (e.g., TKDE, TWEB, and DMKD);
  • area/track chair or senior technical committee member of numerous NLP, data mining, and Web technology conferences.
He currently also serves as Chair of ACM SIGKDD and is an IEEE Fellow.

Arjun Mukherjee, University of Houston

Arjun Mukherjee is currently an Assistant Professor at the University of Houston. He obtained his Ph.D. from the University of Illinois at Chicago. He has been an intern fellow at Microsoft Research and the Indian Statistical Institute. He has been supported by various scholarships and fellowships, such as Dean's Scholar, Chancellor's Fellow, and Provost and Deiss Fellow. His research spans several areas, including Bayesian inference, statistical data mining, natural language processing, machine learning, and the social and information sciences. It places a particular emphasis on solving big-data problems in social media and the Web. His work has addressed a wide variety of social media problems, including (1) modeling trust, reputation, opinion spam, deception, and user behaviors (e.g., collusion, social interactions, and burstiness); (2) fine-grained latent variable modeling of sentiments expressed in online communications such as debates, reviews, and comments; and (3) market prediction and financial modeling using social sentiment. His work has been published in leading Computer Science venues such as KDD, WWW, ACL, EMNLP, CIKM, IJCAI, and AAAI-ICWSM.

Jidong Shao, Dianping Inc., Shanghai

Jidong Shao is currently the director of the Credibility Group at Dianping.com, which mainly focuses on anti-spam, anti-fraud, and business rating. He received his Ph.D. from Zhejiang University, China. Before joining Dianping, he was a senior data mining expert at Alibaba.com. His main research interests include data mining, machine learning, and their applications in Internet business.

Published

2014-09-29