Bibliography
This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are scarce, we first construct 3D facial data from available 2D datasets using recent advances in 3D face reconstruction. A 3D facial geometry representation is then extracted with deep learning techniques. In addition, we take advantage of manipulating the 3D face, for example by using 2D projected images of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network. By doing so, despite using common approaches, we achieve competitive recognition accuracy on the Real-World Affective Faces (RAF) database and on Static Facial Expressions in the Wild (SFEW 2.0) compared with state-of-the-art reports. To the best of our knowledge, this is the first time such a deep learning combination of 3D and 2D facial modalities has been presented in the context of in-the-wild FER.
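As an illustration of the kind of two-branch fusion described above, the sketch below combines features from a 2D face image and a 2D projection of a reconstructed 3D face; the tiny convolutional branches and the fusion layer are assumptions for illustration, not the paper's architecture.

```python
# Minimal sketch (not the paper's network): late fusion of a 2D-image branch and
# a branch fed with a projected rendering of a reconstructed 3D face.
import torch
import torch.nn as nn

class TwoBranchFER(nn.Module):
    def __init__(self, num_classes=7):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
        self.branch_2d = branch()   # original 2D face image
        self.branch_3d = branch()   # 2D projection of the reconstructed 3D face
        self.classifier = nn.Linear(32 * 2, num_classes)

    def forward(self, img_2d, img_3d_proj):
        feats = torch.cat([self.branch_2d(img_2d), self.branch_3d(img_3d_proj)], dim=1)
        return self.classifier(feats)

if __name__ == "__main__":
    model = TwoBranchFER()
    logits = model(torch.randn(1, 3, 224, 224), torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 7])
```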
Humans frequently use non-verbal cues (e.g., facial expressions) to convey information or emotions, and countless facial gestures are produced throughout the day. These expressions and emotions can be conveyed through several channels, including activities, postures, behaviors, and facial expressions. Extensive research has revealed a strong relationship between these channels and emotions, which warrants further investigation. In this work, an Automatic Facial Expression Recognition (AFER) framework is proposed that can predict the seven universal expressions. To evaluate the proposed approach, the Japanese Female Facial Expression (JAFFE) frontal-face image database is used as input. The images are processed with a frequency-domain technique, the Discrete Cosine Transform (DCT), and then classified using Artificial Neural Networks (ANN). To assess the robustness of the strategy, randomized trials of K-fold cross validation, leave-one-out, and person-independent evaluation are repeated many times to give an overview of the recognition rates. The experimental results demonstrate the promising performance of this application.
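A minimal sketch of the DCT-plus-ANN pipeline, assuming small grayscale face crops and a low-frequency coefficient block; random arrays stand in for the JAFFE images, and the block size and network size are illustrative choices rather than the paper's settings.

```python
# Hedged sketch: 2-D DCT features fed to a small neural network with K-fold CV.
import numpy as np
from scipy.fft import dctn
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def dct_features(img, keep=8):
    """2-D DCT of a face image, keeping only the top-left (low-frequency) block."""
    coeffs = dctn(img, norm="ortho")
    return coeffs[:keep, :keep].ravel()

rng = np.random.default_rng(0)
images = rng.random((70, 64, 64))           # placeholder for 64x64 grayscale faces
labels = np.repeat(np.arange(7), 10)        # seven balanced expression classes

X = np.stack([dct_features(img) for img in images])
clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
scores = cross_val_score(clf, X, labels, cv=5)   # K-fold cross validation
print("fold accuracies:", scores)
```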
In this paper, a merged convolutional neural network (MCNN) is proposed to improve the accuracy and robustness of real-time facial expression recognition (FER). Although there are many ways to improve FER performance, revising the training framework and the image preprocessing yields better results in applications. When the camera captures images at high speed, however, image characteristics may change at certain moments due to lighting and other factors, and such changes can cause facial expressions to be recognized incorrectly. To address this problem, we propose a statistical method that aggregates the recognition results of previous images instead of relying solely on the current recognition output. Experimental results show that the proposed method can satisfactorily recognize seven basic facial expressions in real time.
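The temporal-smoothing idea can be sketched as a majority vote over recent per-frame labels; the window size and the voting rule below are assumptions, not details taken from the paper.

```python
# Minimal sketch: report the majority label over the last N frames instead of
# trusting the current frame alone.
from collections import Counter, deque

class TemporalVote:
    def __init__(self, window=15):
        self.history = deque(maxlen=window)

    def update(self, label):
        self.history.append(label)
        return Counter(self.history).most_common(1)[0][0]

smoother = TemporalVote(window=5)
stream = ["happy", "happy", "surprise", "happy", "happy"]   # noisy per-frame outputs
print([smoother.update(lbl) for lbl in stream])             # spurious 'surprise' is suppressed
```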
Automatic emotion recognition using computer vision is significant for many real-world applications such as photojournalism, virtual reality, sign language recognition, and Human-Robot Interaction (HRI). Psychological research findings suggest that humans rely on the combined visual channels of face and body to understand emotional behaviour. A plethora of studies have analysed human emotions using facial expressions, EEG signals, speech, and other cues, but most of this work is based on a single modality. Our objective is to efficiently integrate emotions recognized from facial expressions and from the upper-body pose of humans in images. Our bimodal emotion recognition thus benefits from the accuracy of both modalities.
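A minimal late-fusion sketch, assuming each modality outputs class probabilities; the fusion weights and class set are illustrative, not the authors' values.

```python
# Hedged sketch of bimodal late fusion: weighted average of class probabilities
# from a face model and a body-pose model.
import numpy as np

CLASSES = ["angry", "happy", "sad", "surprised"]

def fuse(p_face, p_body, w_face=0.6):
    p = w_face * np.asarray(p_face) + (1.0 - w_face) * np.asarray(p_body)
    return CLASSES[int(np.argmax(p))], p / p.sum()

label, probs = fuse([0.1, 0.7, 0.1, 0.1], [0.2, 0.4, 0.3, 0.1])
print(label, probs.round(2))
```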
Deep learning based facial expression recognition (FER) has received a lot of attention in the past few years. Most existing deep learning based FER methods do not make good use of domain knowledge and thereby fail to extract representative features. In this work, we propose a novel FER framework, named Facial Motion Prior Networks (FMPN). In particular, we introduce an additional branch that generates a facial mask so as to focus on the regions where facial muscles move. To guide the facial mask learning, we incorporate prior domain knowledge by using the average differences between neutral faces and the corresponding expressive faces as training guidance. Extensive experiments on three facial expression benchmark datasets demonstrate the effectiveness of the proposed method compared with state-of-the-art approaches.
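The prior used for mask guidance can be illustrated by averaging per-pixel differences between aligned expressive and neutral faces; the arrays below are random placeholders and the normalisation is an assumption, not the paper's exact procedure.

```python
# Sketch of a pseudo ground-truth "motion mask" built from average differences
# between expressive faces and neutral counterparts (placeholder data).
import numpy as np

def motion_mask(expressive, neutral):
    """Average per-pixel absolute difference, normalised to [0, 1]."""
    diff = np.abs(expressive.astype(float) - neutral.astype(float)).mean(axis=0)
    return (diff - diff.min()) / (diff.max() - diff.min() + 1e-8)

rng = np.random.default_rng(0)
neutral_faces = rng.random((100, 64, 64))   # aligned neutral faces
happy_faces = rng.random((100, 64, 64))     # aligned expressive faces
mask = motion_mask(happy_faces, neutral_faces)   # guidance for the mask branch
print(mask.shape, float(mask.min()), float(mask.max()))
```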
In the past few years, there has been growing interest in machine perception of human expressions and mental states, and Facial Expression Recognition (FER) has attracted increasing attention. Facial Action Units (AUs) are an early proposed way to describe facial muscle movements and can effectively reflect changes in people's facial expressions. In this paper, we propose a high-performance facial expression recognition method based on facial action units that can run on low-end computers and supports both video and real-time camera FER. Our method consists of two parts. In the first part, 68 facial landmarks and image Histograms of Oriented Gradients (HOG) are obtained, and the feature values of the action units are computed from them. In the second part, three classification methods are used to map the AUs to facial expressions. We have conducted extensive experiments on popular FER benchmark datasets (CK+ and Oulu-CASIA) to demonstrate the effectiveness of our method.
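A hedged sketch of the first stage, pairing HOG descriptors with a couple of landmark-distance cues and a classical classifier; the landmark indices, AU-like features, and classifier choice are assumptions, and real 68-point landmark extraction (e.g. with dlib) is omitted in favour of placeholder coordinates.

```python
# Sketch: hand-crafted descriptors (HOG + simple landmark distances) mapped to
# expression labels with a classical classifier, on placeholder data.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def frame_features(face_img, landmarks):
    hog_vec = hog(face_img, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
    mouth_open = np.linalg.norm(landmarks[66] - landmarks[62])   # AU-like mouth cue
    brow_raise = np.linalg.norm(landmarks[19] - landmarks[37])   # AU-like brow cue
    return np.concatenate([hog_vec, [mouth_open, brow_raise]])

faces = rng.random((40, 64, 64))               # placeholder grayscale face crops
landmark_sets = rng.random((40, 68, 2)) * 64   # placeholder 68-point landmarks
labels = rng.integers(0, 7, size=40)

X = np.stack([frame_features(f, l) for f, l in zip(faces, landmark_sets)])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X[:5]))
```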
Building natural and conversational virtual humans is a task of formidable complexity. We believe that, especially when building agents that interact affectively with biological humans in real time, a cognitive-science-based, multilayered sensing and artificial intelligence (AI) systems approach is needed. For this demo, we show a working version (through human interaction with it) of our modular system: a natural, conversational 3D virtual human built from AI and sensing layers. These layers include sensing the human user via facial emotion recognition, voice stress, the semantic meaning of the words, eye gaze, heart rate, and galvanic skin response. These inputs are combined with AI sensing and recognition of the environment using deep learning natural language captioning and dense captioning. All of these are processed by our AI avatar system, enabling an affective and empathetic conversation through NLP topic-based dialogue that uses facial expressions, gestures, breath, eye gaze, and voice in two-way conversations with the sensed human. Our lab has been building these systems in stages over the years.
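Purely as an illustration of the sensing-layer fusion described above, the sketch below bundles a few of the listed signals into one per-frame state with a crude arousal estimate; the field names, weights, and fusion rule are hypothetical, not the authors' system or API.

```python
# Illustrative data structure for combining sensed signals before they reach a
# dialogue/avatar module (all names and weights are assumptions).
from dataclasses import dataclass

@dataclass
class SensedState:
    facial_emotion: str       # e.g. output of a FER model
    voice_stress: float       # assumed range 0..1
    heart_rate_bpm: float
    gsr_microsiemens: float
    gaze_on_agent: bool

    def arousal_estimate(self) -> float:
        """Crude fused arousal cue used to modulate the avatar's response style."""
        hr = min(max((self.heart_rate_bpm - 60.0) / 60.0, 0.0), 1.0)
        gsr = min(self.gsr_microsiemens / 10.0, 1.0)
        return 0.5 * self.voice_stress + 0.3 * hr + 0.2 * gsr

state = SensedState("happy", voice_stress=0.2, heart_rate_bpm=72,
                    gsr_microsiemens=3.0, gaze_on_agent=True)
print(state.arousal_estimate())
```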
Affective facial expression is a key feature of non-verbal behavior and is considered a symptom of an internal emotional state. Emotion recognition plays an important role in social communication, both human-human and human-robot interaction. This work aims to develop a framework able to recognise human emotions through facial expressions for human-robot interaction. Simple features based on distances and angles between facial landmarks are extracted to feed a dynamic probabilistic classification framework. The public online Karolinska Directed Emotional Faces (KDEF) dataset [12] is used to learn seven different emotions (angry, fearful, disgusted, happy, sad, surprised, and neutral) performed by seventy subjects. Offline and on-the-fly tests were carried out: leave-one-out cross-validation tests on the dataset and on-the-fly tests during human-robot interactions. Preliminary results show that the proposed framework can correctly recognise human facial expressions, with potential to be used in human-robot interaction scenarios.
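The feature step can be sketched with a few distances and angles computed from 68-point landmarks; the specific indices and ratios below are assumptions, and the dynamic probabilistic classifier itself is not shown.

```python
# Sketch of simple landmark-based distance/angle features (placeholder points).
import numpy as np

def landmark_features(pts):
    """pts: (68, 2) array of landmark coordinates."""
    def angle(a, b, c):                      # angle at vertex b, in radians
        v1, v2 = pts[a] - pts[b], pts[c] - pts[b]
        cosang = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
        return np.arccos(np.clip(cosang, -1.0, 1.0))
    eye_dist = np.linalg.norm(pts[39] - pts[42])     # inner eye corners (scale)
    mouth_w = np.linalg.norm(pts[48] - pts[54])      # mouth corners
    mouth_h = np.linalg.norm(pts[51] - pts[57])      # upper/lower lip centres
    corner_angle = angle(51, 48, 57)                 # opening angle at a mouth corner
    return np.array([mouth_w / eye_dist, mouth_h / eye_dist, corner_angle])

pts = np.random.default_rng(0).random((68, 2)) * 100
print(landmark_features(pts))
```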
In the proposed method, traditional elevators are upgraded so that any alarming situation in the elevator can be detected and reported to a main center, where further action can be taken accordingly. Different emergency situations can be handled by the system. The smart elevator works by installing modules inside the elevator, such as speed sensors that detect speed variations above or below a certain threshold. When such an event occurs, the system sends a message to the emergency response center and places an automated call as well. The smart system also includes an emotion detection algorithm that detects the emotions of individuals in the elevator based on their expressions, and a whisper detection system to determine whether someone stuck inside the elevator is alive during a hazardous situation. A broadcast signal is used as a check to verify that every part of the system is in a stable state. The proposed system can completely replace current elevator systems and become part of smart homes.
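As a rough illustration of the speed-monitoring module, the sketch below flags readings outside an assumed nominal band and calls an alert hook; the thresholds and notification channel are placeholders, not values from the paper.

```python
# Illustrative threshold check on elevator speed readings (assumed values).
NORMAL_SPEED_RANGE = (0.0, 2.5)   # m/s, assumed nominal elevator speed band

def check_speed(readings, send_alert=print):
    """readings: iterable of (timestamp_s, speed_m_per_s)."""
    for t, v in readings:
        if not (NORMAL_SPEED_RANGE[0] <= abs(v) <= NORMAL_SPEED_RANGE[1]):
            send_alert(f"t={t}s: abnormal speed {v:.2f} m/s, notifying response center")

check_speed([(0, 1.2), (1, 1.3), (2, 4.8), (3, 1.1)])
```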
This work presents a novel method to estimate naturally expressed emotions in speech through binary acoustic modeling. Standard acoustic features are mapped to a binary representation, and a support vector regression model is used to correlate them with three continuous emotional dimensions. Three different sets of speech features, two based on spectral parameters and one on prosody, are compared on the VAM corpus, a set of spontaneous dialogues from a German TV talk show. The regression analysis, in terms of correlation coefficient and mean absolute error, shows that binary key modeling is able to successfully capture speaker emotion characteristics. The proposed algorithm obtains results comparable to those reported in the literature while relying on a much smaller set of acoustic descriptors. Furthermore, we also report preliminary results based on the combination of the binary models, which brings further performance improvements.
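A hedged sketch of the regression step, assuming a simple per-feature median threshold as the binarisation and one support vector regressor per emotional dimension; random arrays stand in for the VAM features and annotations, so the printed metrics carry no meaning beyond showing the evaluation.

```python
# Sketch: binarised acoustic features regressed onto one continuous emotional
# dimension with SVR, evaluated by correlation and mean absolute error.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
acoustic = rng.random((200, 40))            # placeholder utterance-level features
valence = rng.uniform(-1, 1, size=200)      # one of the three dimensions

binary = (acoustic > np.median(acoustic, axis=0)).astype(float)   # binarisation
model = SVR(kernel="rbf").fit(binary[:150], valence[:150])
pred = model.predict(binary[150:])

corr = np.corrcoef(pred, valence[150:])[0, 1]
mae = np.mean(np.abs(pred - valence[150:]))
print(f"corr={corr:.3f}  mae={mae:.3f}")    # meaningless here: data is random
```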