Biblio
Human emotion recognition plays a vital role in interpersonal communication and in human-machine interaction. Emotions are expressed through speech, hand gestures, movements of other body parts, and facial expressions. Facial expressions are among the most important factors in human communication, helping us understand what another person is trying to communicate. People understand only about one-third of a message verbally; the other two-thirds is conveyed through non-verbal means. Many facial emotion recognition (FER) systems exist today, but they do not perform efficiently in real-life scenarios, even though many claim near-perfect performance, which they achieve only under favourable, optimal conditions. The wide variety of expressions people show and the diversity of facial features across individuals make it difficult to build a system that is definitive. Hence, developing a reliable system free of the flaws shown by existing systems is a challenging task. This paper aims to build an enhanced system that can analyse a user's exact facial expression at a given moment and output the corresponding emotion. The JAFFE and FER2013 datasets were used for performance analysis. Pre-processing methods such as facial landmark detection and histograms of oriented gradients (HOG) were incorporated into a convolutional neural network (CNN), which achieved good accuracy compared with existing models.
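The abstract names HOG as a pre-processing step but gives none of its parameters. As a rough, stdlib-only sketch of what a HOG descriptor computes, the function below builds the gradient-orientation histogram for a single cell. The 9 unsigned bins follow the common Dalal and Triggs setting; the function name and the simplifications (hard binning, no bin interpolation, no block normalization) are assumptions, not the paper's pipeline.

```python
import math

def cell_orientation_histogram(cell, n_bins=9):
    """Return a HOG-style histogram of gradient orientations for one cell.

    `cell` is a 2-D list of grayscale intensities. Orientations are unsigned
    (0-180 degrees) and each interior pixel votes with its gradient magnitude
    into a single bin (simplified: no interpolation, no block normalization).
    """
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for r in range(1, rows - 1):          # skip the border so central differences exist
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal gradient
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # unsigned orientation
            # The extra "% n_bins" guards against rounding that lands exactly on 180.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist
```

For a 4x4 patch with a vertical edge (left half 0, right half 100), all four interior pixels have a purely horizontal gradient of magnitude 100, so the whole mass of 400.0 falls in bin 0. A full descriptor would concatenate such histograms over a grid of cells and normalize them in overlapping blocks before feeding them to the classifier.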
We investigate a deep learning model for action recognition that simultaneously extracts spatio-temporal information from raw RGB input data. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by combining multiple timescale recurrent dynamics with a conventional convolutional neural network model. The architecture of the proposed model imposes both spatial and temporal constraints simultaneously on its neural activities, and the scales of these constraints differ from layer to layer. Following the principle of upward and downward causation, we assume that the network can develop a functional hierarchy through these constraints during training. To evaluate and observe the characteristics of the proposed model, we use three human action datasets consisting of different primitive actions and different levels of compositionality. The performance of the MSTRNN model on these datasets is compared with that of other representative deep learning models used in the field. The results show that the MSTRNN outperforms the baseline models while using fewer parameters. We further characterize the model by analyzing the properties of its internal representations. This analysis clarifies how the spatio-temporal constraints of the MSTRNN model help it extract critical spatio-temporal information relevant to its tasks.
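The abstract does not give the MSTRNN equations. The sketch below assumes the leaky-integrator update used in multiple timescale RNNs, u_t = (1 - 1/tau) * u_{t-1} + (1/tau) * drive_t, and, as a loose stand-in for the convolutional part, computes the drive with a 1-D "valid" convolution over each input frame. The function names, the 1-D setting, and the tau values are illustrative assumptions; the actual model is 2-D, has recurrent connections, and is trained end to end.

```python
import math

def conv1d_valid(frame, kernel):
    """1-D 'valid' cross-correlation, as computed by a convolutional layer."""
    k = len(kernel)
    return [sum(frame[i + j] * kernel[j] for j in range(k))
            for i in range(len(frame) - k + 1)]

def leaky_conv_step(u_prev, frame, kernel, tau):
    """Leaky-integrator update of a feature map with time constant tau.

    tau = 1 makes the state equal the current drive; larger tau blends in
    more of the previous state, so the map changes more slowly over time.
    """
    drive = conv1d_valid(frame, kernel)
    return [(1.0 - 1.0 / tau) * u + (1.0 / tau) * d
            for u, d in zip(u_prev, drive)]

def run(frames, kernel, tau):
    """Feed a sequence of frames through one leaky convolutional map."""
    u = [0.0] * (len(frames[0]) - len(kernel) + 1)
    for frame in frames:
        u = leaky_conv_step(u, frame, kernel, tau)
    return [math.tanh(v) for v in u]  # squashed output activations
```

Driving the same map with a constant input for ten frames, a small time constant (say tau = 2) brings the state almost to its steady value, while a large one (tau = 50) has barely moved. Stacking layers with increasing tau is what lets lower layers track fast motion details while higher layers integrate slower, more abstract action structure.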
Automatic emotion recognition using computer vision is significant for many real-world applications, such as photojournalism, virtual reality, sign language recognition, and Human-Robot Interaction (HRI). Psychological research findings suggest that humans depend on the combined visual channels of the face and body to comprehend human emotional behaviour. A plethora of studies have analysed human emotions using facial expressions, EEG signals, speech, and other signals, but most of this work was based on a single modality. Our objective is to efficiently integrate the emotions recognized from humans' facial expressions and upper-body pose in images. Our bimodal emotion recognition approach combines the accuracy benefits of both modalities.
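The abstract does not say how the two modalities are integrated. One common option, shown here purely as an assumed illustration, is score-level (late) fusion: each modality's classifier outputs per-class probabilities, and a weighted average picks the final emotion. The function name and the `face_weight` parameter are hypothetical.

```python
def late_fusion(face_probs, body_probs, face_weight=0.5):
    """Score-level fusion of two classifiers' per-class probabilities.

    face_probs / body_probs: dicts mapping emotion label -> probability.
    face_weight: weight in [0, 1] given to the face modality (assumed value).
    Returns (predicted_label, fused_scores).
    """
    if face_probs.keys() != body_probs.keys():
        raise ValueError("both modalities must score the same emotion labels")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    fused = {
        label: face_weight * face_probs[label] + (1.0 - face_weight) * body_probs[label]
        for label in face_probs
    }
    # The fused prediction is the label with the highest combined score.
    return max(fused, key=fused.get), fused
```

With face scores {"happy": 0.6, "sad": 0.1, "angry": 0.3} and body scores {"happy": 0.2, "sad": 0.1, "angry": 0.7}, equal weights yield "angry" (0.5 versus 0.4), whereas `face_weight=0.8` yields "happy". The weight is where the relative reliability of each modality would be encoded, for example by tuning it on a validation set.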
Over the past few years, virtual and mixed reality systems have evolved significantly, yielding highly immersive experiences. Most of the metaphors used for interaction with virtual environments do not provide the meaningful feedback that users are accustomed to in the real world. This paper proposes a cyber-glove to improve the immersive sensation and the degree of embodiment in virtual and mixed reality interaction tasks. In particular, we propose a cyber-glove system that tracks wrist movements, hand orientation, and finger movements. It provides decoupled wrist and hand positions, which can contribute to better embodiment in interaction and manipulation tasks. Additionally, detecting the curvature of the fingers aims to make the proprioceptive perception of grasping/releasing gestures more consistent with the visual feedback. The cyber-glove system is being developed for VR applications related to real estate promotion, where users move through the rooms of a house and interact with objects and furniture. This work aims to assess whether glove-based systems can provide a higher sense of immersion, embodiment, and usability than standard VR hand controllers, which are typically button-based. Twenty-two participants tested the cyber-glove system against the HTC Vive controller in a 3D manipulation task: opening a virtual door. Objective metrics showed that, when wearing the cyber-glove, 83% of users pushed the door faster and moved their hands along shorter paths. Subjective results showed that all participants rated the cyber-glove-based interactions as equally or more natural, and 90% of users experienced an equal or significantly greater sense of embodiment.
Mixed reality (MR) technologies are widely used in distributed collaborative learning scenarios and have made learning and training more flexible and intuitive. However, using MR poses many challenges because it is difficult to create a sense of physical presence, particularly when a physical task is performed collaboratively. We therefore developed a novel MR system to overcome these limitations and enhance the user experience of distributed collaboration. The primary objective of this paper is to explore the potential of an MR-based hand-gesture system to enhance the conceptual architecture of MR, in terms of both visualization and interaction, in distributed collaboration. We propose a synchronous prototype named MRCollab, an immersive collaborative approach that allows two or more users to communicate with their peers by integrating several technologies, such as video, audio, and hand gestures.
Channel state information (CSI) has recently been shown to be useful for performing security attacks in public WiFi environments. By analyzing how CSI is affected by finger motions, CSI-based attacks can effectively reconstruct text-based passwords and locking patterns. This paper presents WiGuard, a novel system to protect sensitive on-screen gestures in public places. Our approach carefully exploits WiFi channel interference to introduce noise into the attacker's CSI measurements, reducing the attack's success rate. It also automatically detects when a CSI-based attack is taking place. We evaluate our approach by applying it to protect text-based passwords and pattern locks on mobile devices. Experimental results show that our approach reduces the success rate of CSI attacks from 92% to 42% for text-based passwords and from 82% to 22% for pattern locks.
This paper proposes a context-aware, graph-based approach for identifying anomalous user activities via user profile analysis; at test time, it obtains a group of users that are maximally similar to one another as well as to the query. The main challenges of the anomaly detection task are: (1) anomalies occur rarely, making exhaustive identification at a reasonable false-alarm rate difficult; and (2) new, context-dependent anomaly types evolve continuously, making it difficult to synthesize the activities a priori. Our proposed query-adaptive, graph-based optimization approach, which is solvable using a maximum-flow algorithm, is designed to fully exploit both the mutual similarities among the user models and their respective similarities to the query in order to shortlist user profiles for more reliable aggregated detection. Each user activity is represented using inputs from several multi-modal resources, which helps localize anomalies in time-dependent data efficiently. Experiments on public datasets of insider threats and gesture recognition show impressive results.
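The abstract states only that the query-adaptive selection is solvable with a maximum-flow algorithm. One standard way such a selection reduces to max-flow is an s-t minimum cut over the user profiles. The construction below is a hypothetical formulation for illustration, not necessarily the paper's: each profile i gets a source edge weighted by its query similarity q_i (paid if i is left out), a sink edge weighted by an assumed selection cost lam (paid if i is kept), and pairwise edges weighted by mutual similarity w_ij (paid if i and j are split). The minimum cut therefore minimizes the sum of q_i over excluded profiles, plus lam times the number of selected profiles, plus w_ij over split pairs. The function name and parameters are assumptions; max-flow is computed with Edmonds-Karp, and the selected profiles are the nodes still reachable from the source in the residual graph.

```python
from collections import deque

def min_cut_selection(query_sim, pair_sim, lam):
    """Select user profiles via an s-t minimum cut (hypothetical formulation).

    query_sim: list of similarities q_i between each profile and the query.
    pair_sim: dict {(i, j): w_ij} of mutual similarities between profiles.
    lam: per-profile cost of selecting it (acts as an inclusion threshold).
    Returns the sorted indices of profiles on the source side of the cut.
    """
    n = len(query_sim)
    s, t = n, n + 1
    cap = [[0.0] * (n + 2) for _ in range(n + 2)]  # residual capacities
    for i, q in enumerate(query_sim):
        cap[s][i] = q      # cut (paid) if profile i is NOT selected
        cap[i][t] = lam    # cut (paid) if profile i IS selected
    for (i, j), w in pair_sim.items():
        cap[i][j] += w     # cut if i and j end up on different sides
        cap[j][i] += w

    def bfs_parents():
        """BFS over edges with positive residual capacity; -1 = unreached."""
        parent = [-1] * (n + 2)
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n + 2):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = bfs_parents()
        if parent[t] == -1:
            break
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u

    # Profiles still reachable from s in the residual graph form the source side.
    parent = bfs_parents()
    return sorted(i for i in range(n) if parent[i] != -1)
```

For example, with query similarities [0.9, 0.8, 0.1, 0.2], mutual similarities {(0, 1): 0.5, (2, 3): 0.5, (1, 2): 0.05}, and lam = 0.5, the cut keeps profiles 0 and 1, which are close both to the query and to each other, and returns [0, 1]. Raising lam shrinks the selected group; lowering it admits more profiles.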