Large-scale Affective Content Analysis: Combining Media Content Features and Facial Reactions

Title: Large-scale Affective Content Analysis: Combining Media Content Features and Facial Reactions
Publication Type: Conference Paper
Year of Publication: 2017
Authors: McDuff, D., Soleymani, M.
Conference Name: 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017)
Date Published: May
Publisher: IEEE
ISBN Number: 978-1-5090-4023-0
Keywords: AU-2, automated facial action measurements, Automated Response Actions, composability, content management, deep visual-sentiment descriptors, encoding, face recognition, facial reactions, facial responses, feature extraction, image classification, large-scale affective content analysis, Media, media clips, media content features, multimodal fusion model, pubcrawl, Resiliency, sentiment analysis, Software, tagging, Videos, visualization
Abstract

We present a novel multimodal fusion model for affective content analysis, combining visual, audio, and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and media content. The stimulus videos were validated to be informative, inspiring, persuasive, sentimental, or amusing. By combining the features, we obtained a classification accuracy of 63% (weighted F1-score: 0.62) on a five-class task, a significant improvement over using the media content features alone. By analyzing the feature sets independently, we found that the informed and persuaded states were difficult to differentiate from facial responses alone, owing to the similar sets of action units present in each state (AU 2 occurring frequently in both cases). Facial actions were beneficial in differentiating between the amused and informed states, whereas media content features alone performed less well due to similarities in the visual and audio make-up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on the reactions of tens of thousands of people.
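The record gives no implementation details, so the sketch below is purely an illustration of the kind of fusion-and-classification setup the abstract describes, not the authors' method. It assumes pre-extracted per-clip feature matrices for each modality (media_features and face_features are hypothetical names filled with toy random data), fuses them by simple concatenation, and trains a five-class linear classifier with scikit-learn, reporting accuracy and weighted F1. Concatenation is only one plausible fusion scheme; the paper's actual fusion model may differ.

    # Illustrative sketch only -- not the authors' implementation.
    # Assumed inputs (hypothetical names, toy random data):
    #   media_features: (n_clips, d_media) visual/audio/sentiment descriptors
    #   face_features:  (n_clips, d_face)  aggregated facial action unit stats
    #   labels: one of five classes (informative, inspiring, persuasive,
    #           sentimental, amusing)
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC
    from sklearn.metrics import accuracy_score, f1_score

    rng = np.random.default_rng(0)
    n_clips, d_media, d_face = 384, 128, 32           # toy dimensions
    media_features = rng.normal(size=(n_clips, d_media))
    face_features = rng.normal(size=(n_clips, d_face))
    labels = rng.integers(0, 5, size=n_clips)         # five affect classes

    # Feature-level fusion: concatenate the per-clip modality vectors.
    fused = np.hstack([media_features, face_features])

    X_train, X_test, y_train, y_test = train_test_split(
        fused, labels, test_size=0.2, random_state=0)

    # Standardize features, then fit a linear five-class classifier.
    clf = make_pipeline(StandardScaler(), LinearSVC())
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)

    print("accuracy:", accuracy_score(y_test, pred))
    print("weighted F1:", f1_score(y_test, pred, average="weighted"))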

URL: https://ieeexplore.ieee.org/document/7961761/
DOI: 10.1109/FG.2017.49
Citation Key: mcduff_large-scale_2017