As social media permeates daily life, its use to humiliate, bully, and threaten others has risen sharply, with harmful consequences such as emotional distress, depression, and suicide. The October 2014 Pew Research survey found that 73% of adult Internet users have observed online harassment and 40% have experienced it. The prevalence and serious consequences of online harassment present both social and technological challenges. This project identifies harassing messages in social media by combining text analysis with other clues available in social media (e.g., indications of power relationships between the sender and receiver of a potentially harassing message). The project will develop prototypes to detect harassing messages on Twitter; the proposed techniques can be adapted to other platforms, such as Facebook, online forums, and blogs. An interdisciplinary team of computer scientists, social scientists, urban and public affairs professionals, and educators, together with college and high school students participating in the research, will ensure that the research has a wide impact in supporting safe social interactions.
This project combines social science theory and human judgment of potential harassment examples from social media, in both school and workplace contexts, to operationalize the detection of harassing messages and offenders. It develops comprehensive and reliable context-aware techniques (using machine learning, text mining, natural language processing, and social network analysis) to glean information about the people involved and their interconnected network of relationships, and to determine and evaluate potential harassment and harassers. The key innovations of this work include: (1) identification of the generic language of insult, characterized by profanities and other general patterns of verbal abuse, and recognition of target-dependent offensive language involving sensitive topics that are personal to a specific individual or social circle; (2) prediction of the harassment-specific emotion evoked in a recipient by reading messages, leveraging conversation history as well as the sender's emotions; (3) recognition of a sender's malicious intent behind messages based on the aspects of power, truth (approximated by trust), and familiarity; (4) a harmfulness assessment of harassing messages that fuses the aforementioned language, emotion, and intent factors; and (5) detection of harassers from their aggregated behaviors, such as harassment frequency, duration, and coverage measures, for effective prevention and intervention.
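As an illustration of item (5), the following minimal Python sketch shows one way per-sender aggregates could be computed from messages that have already been flagged as harassing. The abstract does not define these measures, so everything here is an assumption made for illustration: frequency is taken to be the number of flagged messages, duration the number of days between a sender's first and last flagged message, and coverage the number of distinct recipients. The names `FlaggedMessage` and `aggregate_by_sender` and the sample data are hypothetical, not part of the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class FlaggedMessage:
    """Hypothetical record for a message already flagged as harassing."""
    sender: str
    recipient: str
    sent_at: datetime


def aggregate_by_sender(messages):
    """Compute assumed frequency, duration, and coverage measures per sender."""
    grouped = defaultdict(list)
    for msg in messages:
        grouped[msg.sender].append(msg)

    summary = {}
    for sender, msgs in grouped.items():
        times = [m.sent_at for m in msgs]
        summary[sender] = {
            # Assumption: frequency = count of flagged messages.
            "frequency": len(msgs),
            # Assumption: duration = whole days between first and last message.
            "duration_days": (max(times) - min(times)).days,
            # Assumption: coverage = number of distinct recipients targeted.
            "coverage": len({m.recipient for m in msgs}),
        }
    return summary


# Made-up sample data, for illustration only.
sample = [
    FlaggedMessage("user_a", "user_x", datetime(2015, 1, 1)),
    FlaggedMessage("user_a", "user_y", datetime(2015, 1, 11)),
    FlaggedMessage("user_a", "user_x", datetime(2015, 1, 4)),
    FlaggedMessage("user_b", "user_x", datetime(2015, 1, 2)),
]
summary = aggregate_by_sender(sample)
```

In this sketch, `summary["user_a"]` holds a frequency of 3, a duration of 10 days, and a coverage of 2. How such aggregates would be thresholded or combined to identify harassers is left open here, since the abstract does not specify it.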