Biblio
Emotions are a powerful tool in communication and one way that humans show their emotions is through their facial expressions. One of the challenging and powerful tasks in social communications is facial expression recognition, as in non-verbal communication, facial expressions are key. In the field of Artificial Intelligence, Facial Expression Recognition (FER) is an active research area, with several recent studies using Convolutional Neural Networks (CNNs). In this paper, we demonstrate the classification of FER based on static images, using CNNs, without requiring any pre-processing or feature extraction tasks. The paper also illustrates techniques to improve future accuracy in this area by using pre-processing, which includes face detection and illumination correction. Feature extraction is used to extract the most prominent parts of the face, including the jaw, mouth, eyes, nose, and eyebrows. Furthermore, we also discuss the literature review and present our CNN architecture, and the challenges of using max-pooling and dropout, which eventually aided in better performance. We obtained a test accuracy of 61.7% on FER2013 in a seven-classes classification task compared to 75.2% in state-of-the-art classification.
In this study, it was aimed to recognize the emotional state from facial images using the deep learning method. In the study, which was approved by the ethics committee, a custom data set was created using videos taken from 20 male and 20 female participants while simulating 7 different facial expressions (happy, sad, surprised, angry, disgusted, scared, and neutral). Firstly, obtained videos were divided into image frames, and then face images were segmented using the Haar library from image frames. The size of the custom data set obtained after the image preprocessing is more than 25 thousand images. The proposed convolutional neural network (CNN) architecture which is mimics of LeNet architecture has been trained with this custom dataset. According to the proposed CNN architecture experiment results, the training loss was found as 0.0115, the training accuracy was found as 99.62%, the validation loss was 0.0109, and the validation accuracy was 99.71%.
Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint move, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both forward and backward passes of DB-LSTM, to increase depth. The evaluation results of the proposed method using PyTorch, compared to the state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF 101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.