Online opinions now play a pivotal role in decision making and influence a wide spectrum of our lives. Choices of restaurants at which to dine, places to stay, universities to attend, books to read, doctors to consult, and even political candidates to vote for are largely influenced by crowdsourced opinions. However, it is estimated that up to 30% of reviews on websites are fake. As a larger part of the US economy becomes driven by social opinions, fake reviews pose a serious risk to the general public (e.g., by misleading people into investing in low-quality products, services, or doctors). The Federal Trade Commission may soon consider online opinion fraud unlawful and a legal offense. Detecting fake online opinions is therefore an urgent research area; otherwise, fraud in online social media might continue to spread undetected. This project aims to develop novel deception detection algorithms to identify fraudulent behavior. It integrates techniques from computational linguistics, behavioral modeling, and statistical machine learning to advance knowledge in this area.
The project consists of a four-pronged research effort: 1) novel methods to learn deception classifiers from large-scale noisy crowd data and small-scale data coded by domain experts, 2) unsupervised models that treat the "spamicity" of reviewers as a latent variable with observed behavioral footprints, 3) a relational architecture for jointly modeling reviews, reviewers, and their linguistic and behavioral patterns by leveraging their inherent reinforcement relations, and 4) an ensemble scoring mechanism that blends cues from all approaches, together with an end-to-end validation framework. The techniques developed in the project can (1) reduce marketing, consumer, and economic risk in e-commerce; (2) improve user profiling and the detection of online harassment, bigotry, trolls, and other social media fraud of major relevance to national security; and (3) be transitioned into courses and tutorials and help attract underrepresented students, including minorities and women. The result is a suite of novel, principled, and scalable techniques to filter opinion spam at large scale.
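
As a purely illustrative sketch of the fourth prong, blending cues from several approaches can be pictured as a weighted average of per-approach spam scores. The project does not specify its ensemble method; the function name `blend_scores`, the approach names, the scores, and the weights below are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict


def blend_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of per-approach spam scores.

    Assumption: each score lies in [0, 1], where higher means more likely
    to be opinion spam. This is a simple stand-in, not the project's method.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical scores for one review from three hypothetical approaches.
review_scores = {"linguistic": 0.8, "behavioral": 0.6, "relational": 0.7}
# Hypothetical weights; in practice these would be tuned on validation data.
approach_weights = {"linguistic": 1.0, "behavioral": 1.0, "relational": 2.0}

blended = blend_scores(review_scores, approach_weights)
```

A weighted average is only one possible choice; a learned meta-classifier over the individual scores would be another, and the weights here would normally be set using labeled validation data.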