Biblio
In this paper, we provide insights towards achieving more robust automatic facial expression recognition in smart environments, based on our benchmark of three labeled facial expression databases. These databases are selected to cover desktop, 3D, and smart-environment application scenarios. This work is meant to provide a neutral comparison and guidelines for developers and researchers who want to integrate facial emotion recognition technologies into their applications and to understand their limitations, adaptation, and enhancement strategies. We also introduce and compare three different metrics for finding the primary expression in a time window of a displayed emotion. In addition, we outline facial emotion recognition limitations and enhancements for smart environments and non-frontal setups. By providing our comparison and enhancements, we hope to build a bridge from affective computing research and solution providers to application developers who wish to enhance new applications with emotion-based user modeling.
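As an aside, selecting a single primary expression over a time window can be done in several ways. The sketch below shows three plausible aggregation rules (majority vote over per-frame winners, highest mean probability, and single most confident frame); these are illustrative assumptions, not necessarily the three metrics defined in the paper.

```python
import numpy as np

def primary_expression(frame_probs, labels):
    """Pick the primary expression in a time window of per-frame class probabilities.

    frame_probs: (n_frames, n_classes) array of per-frame probabilities.
    labels: list of class names, one per column.
    The three aggregation rules are hypothetical examples for illustration.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    per_frame_winner = frame_probs.argmax(axis=1)

    majority = labels[np.bincount(per_frame_winner, minlength=len(labels)).argmax()]
    mean_prob = labels[frame_probs.mean(axis=0).argmax()]      # highest average probability
    peak = labels[np.unravel_index(frame_probs.argmax(), frame_probs.shape)[1]]  # most confident single frame

    return {"majority_vote": majority, "mean_probability": mean_prob, "peak_confidence": peak}
```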
Facial expression recognition (FER) for emotion inference has become one of the most important research fields in human-computer interaction. Existing studies on FER mainly focus on visible images, whose performance may be affected by varying lighting conditions. Recent studies have demonstrated the advantages of infrared thermal images, which reflect temperature distributions and are robust to lighting changes. In this paper, a novel infrared image-sequence-based FER method is proposed using spatiotemporal feature analysis and deep Boltzmann machines (DBM). Firstly, a dense motion field over the infrared image sequence is generated using an optical flow algorithm. Then, PCA is applied for dimension reduction, and a three-layer DBM structure is designed for final expression classification. Finally, the effectiveness of the proposed method is demonstrated through several experiments conducted on the NVIE database.
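A minimal sketch of the first two stages (dense optical flow, then PCA) is shown below, using OpenCV's Farneback flow and scikit-learn's PCA as stand-ins; the three-layer DBM classifier from the paper is not reproduced and any classifier could consume the reduced features.

```python
import cv2
import numpy as np
from sklearn.decomposition import PCA

def flow_features(frames, n_components=50):
    """frames: list of equally sized 8-bit grayscale (infrared) frames.
    Returns PCA-reduced dense motion features, one row per frame pair.
    n_components must not exceed the number of flow fields."""
    flows = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        flows.append(flow.reshape(-1))            # flatten the (dx, dy) motion field
    X = np.stack(flows)
    return PCA(n_components=n_components).fit_transform(X)
```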
When a complex scene such as in-plane rotation is encountered, the recognition rate of facial expressions drops considerably. A facial expression recognition algorithm based on CNN and LBP feature fusion is proposed in this paper. Firstly, to address the limited feature-expression ability of CNNs in expression recognition, a CNN model was designed. The model is composed of structural units consisting of two successive convolutional layers followed by a pooling layer, which improves the expressive ability of the CNN. Then, the designed CNN model was used to extract facial expression features, and local binary pattern (LBP) features with rotation invariance were fused with them. To a certain extent, this compensates for the CNN's lack of invariance to in-plane rotation. The experimental results show that the proposed method improves the expression recognition rate under in-plane rotation to a certain extent and has better robustness.
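The fusion step can be illustrated with rotation-invariant uniform LBP histograms concatenated to CNN features, as sketched below with scikit-image; the CNN itself is not reproduced here and `cnn_features` is a placeholder for its output.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def fused_features(gray_face, cnn_features, P=8, R=1):
    """Concatenate a rotation-invariant uniform LBP histogram with CNN features.

    gray_face: 2D grayscale face crop.
    cnn_features: feature vector from the (separately trained) CNN model.
    """
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")   # rotation-invariant uniform patterns
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return np.concatenate([np.asarray(cnn_features).ravel(), hist])
```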
Affective facial expression is a key feature of non-verbal behavior and is considered a symptom of an internal emotional state. Emotion recognition plays an important role in social communication, both human-human and human-robot interaction. This work aims at developing a framework able to recognise human emotions through facial expressions for human-robot interaction. Simple features based on facial landmark distances and angles are extracted to feed a dynamic probabilistic classification framework. The public online dataset Karolinska Directed Emotional Faces (KDEF) [12] is used to learn seven different emotions (angry, fearful, disgusted, happy, sad, surprised, and neutral) performed by seventy subjects. Offline and on-the-fly tests were carried out: leave-one-out cross-validation tests using the dataset, and on-the-fly tests during human-robot interactions. Preliminary results show that the proposed framework can correctly recognise human facial expressions, with potential to be used in human-robot interaction scenarios.
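Distance and angle features of this kind can be computed directly from a standard 68-point landmark layout; the sketch below uses the iBUG/dlib indexing and a hand-picked feature set purely for illustration, not the exact features used in the paper.

```python
import numpy as np

def landmark_features(pts):
    """pts: (68, 2) array of 2D facial landmarks in the iBUG/dlib layout."""
    pts = np.asarray(pts, dtype=float)
    eye_dist = np.linalg.norm(pts[36] - pts[45])            # outer eye corners, used for scale normalisation

    def d(i, j):
        return np.linalg.norm(pts[i] - pts[j]) / eye_dist   # scale-normalised distance

    def angle(i, j):
        v = pts[j] - pts[i]
        return np.arctan2(v[1], v[0])                        # orientation of segment i -> j

    return np.array([
        d(48, 54),      # mouth width
        d(51, 57),      # mouth opening
        d(62, 66),      # inner lip gap
        d(21, 22),      # inner eyebrow distance
        angle(48, 54),  # mouth corner slope
        angle(17, 21),  # left eyebrow slope
    ])
```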
Fuzzy density is an important part of the fuzzy integral and is used to describe the reliability of classifiers in the fusion process. Most fuzzy density assignment methods are based on prior knowledge from classifier training and ignore differences among the testing samples themselves. To better describe the real-time reliability of a classifier during fusion, the dispersion of the classifier is calculated from the decision information output by the classifier. The divisibility of the classifier is then obtained from the information entropy of this dispersion. Finally, the divisibility and the prior knowledge are combined to obtain a fuzzy density that can be dynamically adjusted. Experiments on the JAFFE and CK databases show that, compared with traditional fuzzy integral methods, the proposed method can effectively improve the decision performance of the fuzzy integral and reduce the interference of unreliable output information on decisions, making it an effective multi-classifier fusion method.
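One way to realise such a dynamically adjusted fuzzy density is sketched below: the normalised entropy of the classifier's posterior serves as a per-sample divisibility term, which is mixed with the prior (training) reliability. The specific combination rule and weighting are assumptions for illustration and may differ from the paper's formula.

```python
import numpy as np

def dynamic_fuzzy_density(posterior, prior_accuracy, alpha=0.5):
    """Dynamically adjusted fuzzy density for one classifier (illustrative).

    posterior: classifier output probabilities for the current test sample.
    prior_accuracy: the classifier's prior (training) reliability in [0, 1].
    alpha: assumed mixing weight between prior knowledge and divisibility.
    """
    p = np.clip(np.asarray(posterior, dtype=float), 1e-12, None)
    p = p / p.sum()
    entropy = -np.sum(p * np.log(p)) / np.log(len(p))   # normalised entropy in [0, 1]
    divisibility = 1.0 - entropy                          # peaked outputs -> high divisibility
    return alpha * prior_accuracy + (1 - alpha) * divisibility
```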
Facial expression recognition is a challenging problem in the field of computer vision. In this paper, we propose a deep learning approach that learns joint low-level and high-level features of the human face to address this problem. Our deep neural networks use convolution and downsampling to extract abstract, local features of the face while simultaneously reconstructing the raw input images to learn global features as supplementary information. We also add an adjustable weight in the networks when combining the two kinds of features for the final classification. The experimental results show that the proposed method achieves good results, with an average recognition accuracy of 93.65% on the test datasets.
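A minimal PyTorch sketch of this architecture is given below: a convolution/pooling branch for local features, a reconstruction branch for global features, and a learnable weight combining the two before classification. Layer sizes (48x48 grayscale input, 7 classes, 128-dimensional features) are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class JointFeatureNet(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.conv = nn.Sequential(                       # local features via convolution + pooling
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.Linear(32 * 12 * 12, 128), nn.ReLU(),
        )
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(48 * 48, 128), nn.ReLU())
        self.decoder = nn.Linear(128, 48 * 48)            # reconstructs the raw input (global branch)
        self.alpha = nn.Parameter(torch.tensor(0.5))      # adjustable combination weight
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, x):                                 # x: (batch, 1, 48, 48)
        local_feat = self.conv(x)
        global_feat = self.encoder(x)
        recon = self.decoder(global_feat)
        fused = self.alpha * local_feat + (1 - self.alpha) * global_feat
        return self.classifier(fused), recon

# Training would minimise cross-entropy on the logits plus an MSE
# reconstruction loss between `recon` and the flattened input.
```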
A machine translation system that can convert South African Sign Language video to English audio or text, and vice versa, in real time would be immensely beneficial to the Deaf and hard of hearing. Sign language gestures are characterised and expressed by five distinct parameters: hand location, hand orientation, hand shape, hand movement, and facial expressions. The aim of this research is to recognise facial expressions and to compare three feature descriptors, namely local binary patterns (LBP), compound local binary patterns (CLBP), and histograms of oriented gradients (HOG), in two testing environments: a subset of the BU3D-FE dataset and the CK+ dataset. The overall accuracy, accuracy across facial expression classes, robustness to test subjects, and ability to generalise of each feature descriptor within the context of automatic facial expression recognition are analysed as part of the comparison. Overall, HOG proved to be a more robust feature descriptor than LBP and CLBP. Furthermore, CLBP can generally be considered superior to LBP, but LBP has greater potential in terms of its ability to generalise.
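A descriptor comparison of this kind can be sketched with off-the-shelf HOG and LBP implementations and cross-validated accuracy, as below. The linear SVM and the HOG/LBP parameters are assumptions for illustration, and CLBP is omitted because it has no standard implementation in these libraries.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def hog_descriptor(gray_face):
    return hog(gray_face, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def lbp_descriptor(gray_face, P=8, R=1):
    lbp = local_binary_pattern(gray_face, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def compare_descriptors(faces, labels):
    """Cross-validated accuracy per descriptor on equally sized grayscale face crops."""
    results = {}
    for name, fn in [("hog", hog_descriptor), ("lbp", lbp_descriptor)]:
        X = np.array([fn(f) for f in faces])
        results[name] = cross_val_score(LinearSVC(), X, labels, cv=5).mean()
    return results
```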
Recently, emotion recognition has gained increasing attention in various applications related to Social Signal Processing (SSP) and human affect. Existing research mainly focuses on six basic emotions (happy, sad, fear, disgust, angry, and surprise). However, humans express many kinds of emotions, including mixed emotions, which have not been explored due to their complexity. We model recognition of 12 types of mixed emotion from facial expressions in image sequences using two-stage learning that combines Support Vector Machines (SVM) and Conditional Random Fields (CRF) as sequence classifiers. The SVM classifies each image frame and produces an emotion label, which subsequently becomes the input for the CRF, which in turn yields the mixed-emotion label of the corresponding observation sequence. We evaluate our proposed model on modified image frames of the Cohn-Kanade+ dataset and on our own mixed-emotion dataset. We also compare our model with the original CRF model, and our model shows superior performance.
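The two-stage pipeline can be sketched with scikit-learn and sklearn-crfsuite as below: the per-frame SVM prediction becomes the categorical observation fed to the CRF. Data layout, feature encoding, and hyperparameters are assumptions for illustration.

```python
from sklearn.svm import SVC
import sklearn_crfsuite  # pip install sklearn-crfsuite

def train_two_stage(frame_features, frame_labels, sequences, sequence_labels):
    """frame_features, frame_labels: per-frame training data for the SVM.
    sequences: list of sequences, each a list of per-frame feature vectors.
    sequence_labels: list of per-frame target label sequences (strings) for the CRF.
    """
    svm = SVC().fit(frame_features, frame_labels)

    # Stage 2: each frame's SVM label becomes a categorical CRF observation.
    X_crf = [[{"svm_label": str(svm.predict([f])[0])} for f in seq] for seq in sequences]
    crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=100)
    crf.fit(X_crf, sequence_labels)
    return svm, crf
```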
This paper outlines a demonstration of the work carried out in the SoCoRo project, which investigates how well a neuro-typical population recognises facial expressions designed to show approval and disapproval on a non-naturalistic robot face. RFID-tagged objects are presented to an Emys robot head (called Alyx), and Alyx reacts to each with a facial expression. Participants are asked to put the object in a box marked 'Like' or 'Dislike'. This study is being extended to include assessment of participants' Autism Quotient using a validated questionnaire, as a step towards using a robot to help train high-functioning adults with an Autism Spectrum Disorder in social signal recognition.
Human face detection plays an essential role in the first stage of face processing applications. In this study, an enhanced face detection framework is proposed that improves the detection rate based on skin color and provides a validation process. A preliminary segmentation of the input images based on skin color can significantly reduce the search space and accelerate human face detection. The primary detection is based on Haar-like features and the AdaBoost algorithm. A validation process is introduced to reject non-face objects that might occur during face detection; it is based on two-stage Extended Local Binary Patterns. Experimental results on the CMU-MIT and Caltech 10000 datasets demonstrate successful face detection over a wide range of facial variations in color, position, scale, and lighting conditions.
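The first two stages (skin-colour pre-segmentation followed by Haar/AdaBoost detection) can be sketched with OpenCV as below; the Cr/Cb thresholds are common rule-of-thumb values, and the two-stage Extended LBP validation step is not reproduced here.

```python
import cv2

FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(bgr_image):
    """Skin-colour masking in YCrCb space, then Haar cascade detection on the
    masked image. Returns candidate face bounding boxes (x, y, w, h)."""
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))   # rule-of-thumb skin range
    candidate = cv2.bitwise_and(bgr_image, bgr_image, mask=skin_mask)
    gray = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    return FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```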
Recognizing Families In the Wild (RFIW) is a large-scale, multi-track automatic kinship recognition evaluation, supporting both kinship verification and family classification on scales much larger than ever before. It was organized as a Data Challenge Workshop held in conjunction with ACM Multimedia 2017 and was made possible by the largest image collection that supports kin-based vision tasks. In this manuscript we summarize the evaluation protocols, the progress made, the technical background and performance of the algorithms used, and promising directions for researchers and engineers to pursue next in this line of work.
Face recognition has attained greater importance in biometric authentication due to its non-intrusive ability to identify individuals at varying stand-off distances. Face recognition based on multi-spectral imaging has recently gained prominence due to its ability to capture spatial and spectral information across the spectrum. Our first contribution in this paper is to apply extended multi-spectral face recognition to two different age groups. The second contribution is to show empirically the performance of face recognition for the two age groups. To this end, we developed a multi-spectral imaging sensor to capture a facial database for two different age groups (≤ 15 years and ≥ 20 years) at nine different spectral bands covering the 530 nm to 1000 nm range. We then collected new facial images for the two age groups, comprising 168 individuals. Extensive experimental evaluation is performed independently on the two age-group databases using four different state-of-the-art face recognition algorithms. We evaluate the verification and identification rates across individual spectral bands and the fused spectral band for both age groups. The obtained results show a higher recognition rate for the ≥ 20 years group than for the ≤ 15 years group, indicating variation in face recognition performance across age groups.
The face is the most dominant and distinct communication tool of human beings. Automatic analysis of facial behavior allows machines to understand and interpret a human's states and needs for natural interaction. This research focuses on developing advanced computer vision techniques to process and analyze facial images for the recognition of various facial behaviors. Specifically, this research consists of two parts: automatic facial landmark detection and tracking, and facial behavior analysis and recognition using the tracked facial landmark points. In the first part, we develop several facial landmark detection and tracking algorithms for facial images under varying conditions, such as varying facial expressions, head poses, and facial occlusions. First, to handle facial expression and head pose variations, we introduce a hierarchical probabilistic face shape model and a discriminative deep face shape model to capture the spatial relationships among facial landmark points under different facial expressions and head poses and thereby improve facial landmark detection. Second, to handle facial occlusion, we improve upon the effective cascade regression framework and propose a robust cascade regression framework for facial landmark detection, which iteratively predicts landmark visibility probabilities and landmark locations. The second part of this research applies our facial landmark detection and tracking algorithms to facial behavior analysis, including facial action recognition and head pose estimation. For facial action recognition, we introduce a novel regression framework for joint facial landmark detection and facial action recognition. For head pose estimation, we are working on a robust algorithm that can perform head pose estimation under facial occlusion.
Recognizing the authenticity of facial expressions is quite difficult for humans. It is therefore an interesting topic for the computer vision community, as algorithms for estimating facial expression authenticity may be used as indicators of deception. This paper discusses the state-of-the-art methods developed for smile veracity estimation and proposes a plan for developing and validating a novel approach to automated discrimination between genuine and posed facial expressions. The proposed fully automated technique is based on extending high-dimensional Local Binary Patterns (LBP) to the spatio-temporal domain and combining them with the dynamics of facial landmark movements. The proposed technique will be validated on several existing smile databases and on a novel database created with a high-speed camera. Finally, the developed framework will be applied to the detection of deception in real-life scenarios.
Automatic face recognition techniques applied to particular groups or mass databases introduce error cases, and error prevention is crucial for the court. Reranking recognition results based on anthropological analysis can significantly improve the accuracy of automatic methods. Previous studies focused on manual facial comparison. This paper proposes a weighted facial similarity computation method based on morphological analysis of component characteristics. The search sequence of face recognition is reranked according to this similarity, while interference terms can be removed. Within this research project, standardized photographs, surveillance videos, 3D face images, and identity card photographs of 241 male subjects from China were acquired. Sequencing results were modified by modeling selected individual features from the DMV atlas. The improved method raises the accuracy of face recognition through anthropological and morphological theory.
The face is crucial to human identity, and face identification has become crucial to information security. It is important to understand and address the problems and challenges across the different aspects of facial feature extraction and face identification. In this tutorial, we identify and discuss four research challenges in current face detection/recognition research and related research areas: (1) Unavoidable Facial Feature Alterations, (2) Voluntary Facial Feature Alterations, (3) Uncontrolled Environments, and (4) Accuracy Control on Large-scale Datasets. We also present several applications (spin-offs) of facial feature studies in the tutorial.
Augmented reality is poised to become a dominant computing paradigm over the next decade. With promises of three-dimensional graphics and interactive interfaces, augmented reality experiences will rival the very best science fiction novels. This breakthrough also brings unique challenges in how users can authenticate one another to share rich content between augmented reality headsets. Traditional authentication protocols fall short when there is no common central entity or when access to a central authentication server is not available or desirable. Looks Good To Me (LGTM) is an authentication protocol that leverages the unique hardware and context provided by augmented reality headsets to bring innate human trust mechanisms into the digital world, solving authentication in a usable and secure way. LGTM works over point-to-point wireless communication so users can authenticate one another in a variety of circumstances, and it is designed with usability at its core, requiring users to perform only two actions: one to initiate and one to confirm. Users intuitively authenticate one another using, seemingly, only each other's faces; under the hood, LGTM uses a combination of facial recognition and wireless localization to bootstrap trust from a wireless signal, to a location, to a face, for secure and usable authentication.
Heterogeneous face recognition aims to identify or verify a person's identity by matching facial images of different modalities. In practice, its performance is highly influenced by modality inconsistency, appearance occlusions, illumination variations, and expressions. In this paper, a new method, an ensemble of sparse cross-modal metrics, is proposed to tackle these challenging issues. In particular, a weak sparse cross-modal metric learning method is first developed to measure distances between samples of two modalities. It learns to adjust rank-one cross-modal metrics to satisfy two sets of triplet-based cross-modal distance constraints in a compact form. Meanwhile, a group-based feature selection is performed to enforce that features in the same position of the two modalities are selected simultaneously. By neglecting features attributable to "noise" in face regions (eyeglasses, expressions, and so on), the performance of the learned weak metrics can be markedly improved. Finally, an ensemble framework is incorporated to combine the results of the individually learned sparse metrics into a strong one. Extensive experiments on various face datasets demonstrate the benefit of such feature selection, especially when heavy occlusions exist. The proposed ensemble metric learning is shown to be superior to several state-of-the-art methods in heterogeneous face recognition.
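To make the rank-one cross-modal metric and the triplet constraints concrete, the sketch below uses a deliberately simplified formulation: each modality is projected onto its own direction before comparison, and violations are penalised with a hinge loss. This is an illustrative reading, not the paper's exact parameterisation or optimisation.

```python
import numpy as np

def rank_one_cross_modal_dist(x, y, u, v):
    """Distance under a simplified rank-one cross-modal metric: modality A sample x
    is projected onto u, modality B sample y onto v, and the projections compared."""
    return (np.dot(u, x) - np.dot(v, y)) ** 2

def triplet_hinge_loss(anchor, positive, negative, u, v, margin=1.0):
    """Hinge loss on one cross-modal triplet (anchor from modality A,
    positive and negative from modality B)."""
    d_pos = rank_one_cross_modal_dist(anchor, positive, u, v)
    d_neg = rank_one_cross_modal_dist(anchor, negative, u, v)
    return max(0.0, margin + d_pos - d_neg)
```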
Machine learning is enabling myriad innovations, including new algorithms for cancer diagnosis and self-driving cars. The broad use of machine learning makes it important to understand the extent to which machine-learning algorithms are subject to attack, particularly when used in applications where physical security or safety is at risk. In this paper, we focus on facial biometric systems, which are widely used in surveillance and access control. We define and investigate a novel class of attacks: attacks that are physically realizable and inconspicuous, and that allow an attacker to evade recognition or impersonate another individual. We develop a systematic method to automatically generate such attacks, which are realized by printing a pair of eyeglass frames. When worn by an attacker whose image is supplied to a state-of-the-art face-recognition algorithm, the eyeglasses allow her to evade being recognized or to impersonate another individual. Our investigation focuses on white-box face-recognition systems, but we also demonstrate how similar techniques can be used in black-box scenarios, as well as to avoid face detection.
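The core idea of confining an adversarial perturbation to an eyeglass-shaped region can be illustrated with a generic masked, iterative gradient attack in PyTorch, as below. This is not the paper's printable-eyeglass optimisation (which adds smoothness and printability constraints); the model, mask, and step schedule are placeholders.

```python
import torch

def masked_adversarial_example(model, image, target_label, mask, steps=40, step_size=1/255):
    """Perturb only the pixels under a binary eyeglass-shaped mask so that
    `model` outputs `target_label` (impersonation). image: CHW tensor in [0, 1]."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(
            model(adv.unsqueeze(0)), torch.tensor([target_label]))
        loss.backward()
        with torch.no_grad():
            adv = adv - step_size * adv.grad.sign() * mask   # update only masked pixels
            adv = adv.clamp(0, 1)
        adv = adv.detach()
    return adv
```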
In the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention due to its huge potential for applications. For the fourth challenge, aimed at video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, where the face image from each frame is first fed into a fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN to capture dynamic changes in facial texture. To attain more accurate facial actions, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Further, audio signals are also modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost the performance of emotion recognition. Our proposed MCEF achieves an overall accuracy of 56.66%, an improvement of 16.19% over the baseline.
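The CNN-RNN branch of such a framework can be sketched in PyTorch as a bidirectional recurrent network over per-frame face features; the VGG-Face extractor itself is not included, and the feature dimension, hidden size, and use of a GRU are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FaceSequenceRNN(nn.Module):
    """Bidirectional RNN over per-frame face features (e.g. taken from a
    fine-tuned VGG-Face network, not reproduced here)."""
    def __init__(self, feat_dim=4096, hidden=256, n_classes=7):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, frame_feats):            # frame_feats: (batch, n_frames, feat_dim)
        out, _ = self.rnn(frame_feats)
        return self.classifier(out[:, -1])     # classify from the final time step
```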
We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captivates the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool, designed to mark complex hierarchical relationships and local label information in a quick and efficient manner. We include several benchmarks for two image-based tasks, kinship verification and family recognition, incorporating several visual features and metric learning methods as baselines. We also demonstrate that a pre-trained Convolutional Neural Network (CNN) used as an off-the-shelf feature extractor outperforms the other feature types. Results were further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss function was learned on top of the pre-trained network weights; (2) for family recognition, a family-specific softmax classifier was added to the network.
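Triplet-loss fine-tuning on top of a pre-trained backbone can be sketched in PyTorch as below; the backbone, optimizer, and triplet batches are placeholders (the paper fine-tunes on FIW kinship triplets).

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=1.0)

def fine_tune_step(backbone, optimizer, anchor, positive, negative):
    """One fine-tuning step: pull embeddings of kin pairs together and push
    non-kin apart by at least the margin."""
    optimizer.zero_grad()
    loss = triplet_loss(backbone(anchor), backbone(positive), backbone(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```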
Automation systems are gaining popularity around the world. The use of these powerful technologies for home security has been proposed, and some systems have been developed. Other implementations see the user taking a central role in providing and receiving updates to the system. We propose a system that uses an Android-based smartphone as the user control point. Our Android application allows for dual-factor (facial recognition and secret PIN) authentication in order to protect the privacy of the user. The system successfully implements facial recognition on the limited resources of a smartphone by making use of the Eigenfaces algorithm. The system was designed for home automation but makes use of technologies that allow it to be applied within any environment. This opens the possibility for more research into dual-factor authentication, and the architecture of our system provides a blueprint for the implementation of home-based automation systems. With minimal modifications, the system can also be applied in an industrial setting.
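Eigenfaces enrolment and recognition can be sketched with the OpenCV contrib face module (requires opencv-contrib-python), as below; the Android-specific, on-device constraints of the described system are not modelled here, and the PIN factor is only noted in a comment.

```python
import cv2
import numpy as np

def train_eigenfaces(face_images, labels):
    """face_images: list of equally sized grayscale face crops (np.uint8).
    labels: integer identity labels, one per image."""
    recognizer = cv2.face.EigenFaceRecognizer_create()
    recognizer.train(face_images, np.array(labels, dtype=np.int32))
    return recognizer

# Usage: label, confidence = recognizer.predict(probe_gray_face)
# Full authentication would additionally require the secret PIN (second factor).
```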