Large-scale Affective Content Analysis: Combining Media Content Features and Facial Reactions
Title | Large-scale Affective Content Analysis: Combining Media Content Features and Facial Reactions |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | McDuff, D., Soleymani, M. |
Conference Name | 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017) |
Date Published | May |
Publisher | IEEE |
ISBN Number | 978-1-5090-4023-0 |
Keywords | AU-2, automated facial action measurements, Automated Response Actions, composability, content management, deep visual-sentiment descriptors, encoding, face recognition, facial reactions, facial responses, feature extraction, image classification, large-scale affective content analysis, Media, media clips, media content features, multimodal fusion model, pubcrawl, Resiliency, sentiment analysis, Software, tagging, Videos, visualization |
Abstract | We present a novel multimodal fusion model for affective content analysis that combines visual, audio, and deep visual-sentiment descriptors from the media content with automated facial action measurements from naturalistic responses to the media. We collected a dataset of 48,867 facial responses to 384 media clips and extracted a rich feature set from the facial responses and the media content. The stimulus videos were validated as informative, inspiring, persuasive, sentimental, or amusing. By combining the features, we obtained a classification accuracy of 63% (weighted F1-score: 0.62) on the five-class task, a significant improvement over using the media content features alone. Analyzing the feature sets independently, we found that the informed and persuaded states were difficult to differentiate from facial responses alone because similar sets of action units occurred in each state (AU 2 appearing frequently in both). Facial actions were beneficial in differentiating the amused and informed states, whereas media content features alone performed less well due to similarities in the visual and audio make-up of the content. We highlight examples of content and reactions from each class. This is the first affective content analysis based on the reactions of tens of thousands of people. |
URL | https://ieeexplore.ieee.org/document/7961761/ |
DOI | 10.1109/FG.2017.49 |
Citation Key | mcduff_large-scale_2017 |
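The abstract describes the approach only at a high level: media content features (visual, audio, and deep visual-sentiment descriptors) are fused with aggregated facial action measurements to classify clips into five affect categories, evaluated with accuracy and weighted F1. The record does not specify the paper's exact fusion architecture, so the following is a minimal, hypothetical sketch in Python assuming simple feature concatenation (early fusion) and a generic off-the-shelf classifier; the feature block names, dimensions, and placeholder data are all assumptions, not the authors' implementation.

```python
# Minimal early-fusion sketch: concatenate per-clip media content features
# with aggregated facial-action features, then train a five-class classifier
# and report accuracy and weighted F1, the metrics cited in the abstract.
# All dimensions and the choice of classifier are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_clips = 384  # number of media clips in the paper's dataset
classes = ["informative", "inspiring", "persuasive", "sentimental", "amusing"]

# Hypothetical per-clip feature blocks (dimensions are assumptions).
visual = rng.normal(size=(n_clips, 128))     # e.g., color/shot descriptors
audio = rng.normal(size=(n_clips, 64))       # e.g., spectral descriptors
sentiment = rng.normal(size=(n_clips, 100))  # deep visual-sentiment descriptors
facial = rng.normal(size=(n_clips, 17))      # AU activations aggregated over viewers

y = rng.integers(0, len(classes), size=n_clips)  # placeholder labels

# Early fusion: one concatenated feature vector per clip.
X = np.hstack([visual, audio, sentiment, facial])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print("accuracy:", accuracy_score(y_te, pred))
print("weighted F1:", f1_score(y_te, pred, average="weighted"))
```

Under this sketch, the content-only baseline the abstract compares against would correspond to training the same classifier with the `facial` block dropped from the concatenation, making the contribution of the facial reactions directly measurable.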