Biblio
Facial expressions are one of the most powerful, natural and immediate means for human beings to convey their emotions and intentions. In this paper, we present a novel method for fully automatic facial expression recognition. Facial landmarks are detected to characterize facial expressions, and a graph convolutional neural network is proposed for feature extraction and expression classification. Experiments were performed on three facial expression databases. The results show that the proposed FER method achieves a recognition accuracy of up to 95.85%.
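A minimal PyTorch sketch of the idea, assuming a plain landmark graph and a generic two-layer graph convolution (the paper's exact architecture, landmark connectivity and hyperparameters are not specified here):

import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, a_hat):
        # x: (batch, num_landmarks, in_dim), a_hat: (num_landmarks, num_landmarks)
        return F.relu(torch.einsum("ij,bjd->bid", a_hat, self.linear(x)))

class LandmarkGCN(nn.Module):
    def __init__(self, num_landmarks=68, num_classes=7):
        super().__init__()
        self.gc1 = GCNLayer(2, 32)      # input: (x, y) landmark coordinates
        self.gc2 = GCNLayer(32, 64)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, landmarks, a_hat):
        h = self.gc2(self.gc1(landmarks, a_hat), a_hat)
        return self.classifier(h.mean(dim=1))   # average pooling over landmarks

# Toy usage: 68 landmarks connected in a chain, normalized adjacency with self-loops.
n = 68
adj = torch.eye(n)
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
deg = adj.sum(1)
a_hat = adj / torch.sqrt(deg.unsqueeze(0) * deg.unsqueeze(1))
logits = LandmarkGCN()(torch.randn(4, n, 2), a_hat)
print(logits.shape)  # torch.Size([4, 7])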
Monitoring for security and well-being in highly populated areas is a critical issue for city administrators, policy makers and urban planners. As an essential part of many dynamic and critical data-driven tasks, situational awareness (SAW) gives decision-makers deeper insight into urban surveillance data. Thus, surveillance measures are increasingly needed. However, traditional surveillance platforms are not scalable when more cameras are added to the network. In this work, smart surveillance is proposed as an edge service. To accomplish the object detection, identification, and tracking tasks at the edge-fog layers, two novel lightweight algorithms are proposed, one for detection and one for tracking. A prototype has been built to validate the feasibility of the idea, and the test results are very encouraging.
Unilateral spatial neglect (USN) is a higher cognitive dysfunction that can occur after a stroke. It is defined as an impairment in finding, reporting, reacting to, and directing attention toward stimuli on the side opposite the damaged hemisphere of the brain. We have proposed a system that identifies neglected regions in USN patients in three dimensions using immersive virtual reality. The objectives of this study are twofold: first, to propose a system for numerically identifying the neglected regions using an object detection task in a virtual space, and second, to compare the neglected regions during object detection when the patient's neck is immobilized (‘fixed-neck’ condition) versus when the neck can be moved freely to search (‘free-neck’ condition). We performed the test using an immersive virtual reality system, once with the patient's neck fixed and once with the patient's neck free to move. Comparing the results in two patients, we found that the neglected areas were similar in the fixed-neck condition. However, in the free-neck condition, one patient's neglect improved while the other patient’s neglect worsened. These results suggest that exploratory ability affects the symptoms of USN and is crucial for the clinical evaluation of USN patients.
In recent months, AI-synthesized face-swapping videos, referred to as deepfakes, have become an emerging problem. Fake videos are becoming more and more difficult to distinguish from real ones, which brings a series of challenges to social security. Some scholars are devoted to studying how to improve the detection accuracy of deepfake video, and several datasets for deepfake detection have been created to support this research. Companies such as Google and Facebook have also spent huge sums of money to produce datasets for deepfake video detection, as well as holding deepfake detection competitions. The continuous advancement of video tampering technology and the improvement of video quality have also brought great challenges to deepfake detection. Some scholars have achieved certain results on existing datasets, while results on some high-quality datasets are not as good as expected. In this paper, we propose a new method with clustering-based embedding regularization for deepfake detection. We use open-source algorithms to generate videos that simulate the distinctive artifacts in deepfake videos. To improve the local smoothness of the representation space, we integrate a clustering-based embedding regularization term into the classification objective, so that the resulting model learns to resist adversarial examples. We evaluate our method on three of the latest deepfake datasets. Experimental results demonstrate the effectiveness of our method.
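As a rough illustration, the regularizer could take a center-loss-like form that pulls each embedding toward a learned per-class center; this is an assumption about the general shape of the term, not the paper's exact formulation:

import torch
import torch.nn as nn

class ClusterRegularizedLoss(nn.Module):
    def __init__(self, num_classes, embed_dim, weight=0.1):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, embed_dim))
        self.ce = nn.CrossEntropyLoss()
        self.weight = weight

    def forward(self, logits, embeddings, labels):
        # Classification term plus a penalty that pulls each embedding toward
        # its class center, smoothing the local representation space.
        cluster_term = ((embeddings - self.centers[labels]) ** 2).sum(dim=1).mean()
        return self.ce(logits, labels) + self.weight * cluster_term

# Toy usage with random logits/embeddings for a real/fake (2-class) problem.
loss_fn = ClusterRegularizedLoss(num_classes=2, embed_dim=128)
logits = torch.randn(8, 2)
embeddings = torch.randn(8, 128)
labels = torch.randint(0, 2, (8,))
print(loss_fn(logits, embeddings, labels).item())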
Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint change, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture, ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both the forward and backward passes of the DB-LSTM to increase depth. The evaluation results of the proposed method, implemented in PyTorch and compared to state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.
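A compact PyTorch sketch of the described pipeline, with illustrative hyperparameters (clip length, hidden size and the use of the last time step for classification are assumptions):

import torch
import torch.nn as nn
import torchvision.models as models

class CNNDBLSTM(nn.Module):
    def __init__(self, num_classes=101, hidden=256, layers=2):
        super().__init__()
        backbone = models.resnet152(weights=None)        # load pretrained weights in practice
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop FC head -> 2048-d features
        self.lstm = nn.LSTM(2048, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, clips):                             # clips: (B, T, 3, 224, 224)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).flatten(1)  # (B*T, 2048) per-frame features
        out, _ = self.lstm(feats.view(b, t, -1))          # (B, T, 2*hidden) sequence features
        return self.fc(out[:, -1])                        # classify from the last time step

model = CNNDBLSTM()
print(model(torch.randn(2, 8, 3, 224, 224)).shape)        # torch.Size([2, 101])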
A novel deep neural network is proposed for accurate and robust crowd counting. Crowd counting is a complex task, as it strongly depends on the characteristics of the deployed cameras and, above all, the scene perspective. Crowd counting is essential in security applications where Internet of Things (IoT) cameras are deployed to help with crowd management tasks. The complexity of a scene varies greatly, and a medium- to large-scale security system based on IoT cameras must cater for changes in perspective and in how people appear from different vantage points. To address this, our deep architecture extracts multi-scale features with a pyramid contextual module that provides long-range contextual information and enlarges the receptive field. Experiments were run on three major crowd counting datasets to test the proposed method. The results demonstrate that our method surpasses the performance of state-of-the-art methods.
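One plausible realization of such a pyramid contextual module is a PSP-style block that pools the feature map at several scales and fuses the upsampled context back in; the bin sizes and channel widths below are assumptions, not the authors' design:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidContext(nn.Module):
    def __init__(self, channels, bins=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels // len(bins), kernel_size=1) for _ in bins)
        self.bins = bins
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at multiple scales, project, and upsample back to the input size.
        ctx = [F.interpolate(conv(F.adaptive_avg_pool2d(x, b)), size=(h, w),
                             mode="bilinear", align_corners=False)
               for conv, b in zip(self.branches, self.bins)]
        return self.fuse(torch.cat([x, *ctx], dim=1))

# Toy usage: feature map -> context-enriched features -> 1-channel density map.
head = nn.Conv2d(64, 1, kernel_size=1)
features = torch.randn(1, 64, 48, 64)
density = head(PyramidContext(64)(features))
print(density.sum().item())  # the predicted count is the integral of the density map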
Video surveillance plays an important role in our times. It is a great help in reducing the crime rate, and it can also help to monitor the status of facilities. The performance of video surveillance systems is limited by human factors such as fatigue, time efficiency, and human resources. It would be beneficial for all if fully automatic video surveillance systems were employed to do the job. The automation of video surveillance systems is still unsatisfactory with respect to many problems, such as detector accuracy, bandwidth consumption, and storage usage. This paper focuses on a video surveillance system using Convolutional Neural Networks (CNN), IoT and the cloud. The system contains multiple nodes; each node consists of a microprocessor (Raspberry Pi) and a camera, and the nodes communicate with each other using a client-server architecture. The nodes detect humans using a MobileNetV2-SSDLite model pretrained on the Common Objects in Context (COCO) dataset. The captured video is streamed to the main node (only one node communicates with the cloud), which forwards it to the cloud. The main node also sends an SMS notification to the security team to report the detection of humans. The security team can check the captured videos using a mobile or web application. Running a deep learning object detection model requires a large amount of computational power, while the Raspberry Pi is limited in performance; for that reason, we used the MobileNetV2-SSDLite model.
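A hedged sketch of the per-node detection loop using OpenCV's DNN module with a COCO-pretrained MobileNetV2-SSDLite; the model/config file names and the notify_security() helper are placeholders, not artifacts of the described system:

import cv2

PERSON_CLASS_ID = 1           # "person" in the COCO label map
CONF_THRESHOLD = 0.5

net = cv2.dnn.readNetFromTensorflow("ssdlite_mobilenet_v2_frozen.pb",   # placeholder paths
                                    "ssdlite_mobilenet_v2.pbtxt")

def notify_security(num_people):
    # Placeholder: in the described system this would send an SMS and
    # forward the stream to the main node / cloud.
    print(f"ALERT: {num_people} person(s) detected")

cap = cv2.VideoCapture(0)     # Raspberry Pi camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.dnn.blobFromImage(frame, size=(300, 300), swapRB=True)
    net.setInput(blob)
    detections = net.forward()               # shape (1, 1, N, 7): id, class, conf, box
    people = [d for d in detections[0, 0]
              if int(d[1]) == PERSON_CLASS_ID and d[2] > CONF_THRESHOLD]
    if people:
        notify_security(len(people))
cap.release()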
Object recognition with the help of outdoor video surveillance cameras is an important task in the context of ensuring security at enterprises, public places and even private premises. Systems that detect moving objects in the image sequence from a video surveillance system have long existed. Such a system is partially considered in this research: it detects moving objects using a background model, which has certain problems, and because of this some objects are missed or falsely detected. We propose to combine the moving object detection results with classification using a deep neural network. This makes it possible to determine whether a detected object belongs to a certain class, to sort out false detections, to discard unwanted ones (sometimes individual classes are unwanted), to divide detected people into uniformed employees and all others, etc. The authors perform network training in the developer-friendly Keras environment, which allows quick building, changing and training of network architectures. The performance of integrating Keras into a video analysis system using direct Python script execution is between 6 and 52 ms, while the precision is between 59.1% and 97.2% for different architectures. A second type of integration is made by freezing a selected network architecture, with its weights, chosen after testing; the frozen architecture can then be imported into the video analysis system using the TensorFlow C++ interface. The performance of this type of integration is between 3 and 49 ms, with precision between 63.4% and 97.8% for different architectures.
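A small sketch of the training/export side, assuming a toy Keras classifier and a SavedModel export that the C++ TensorFlow interface can load; the actual architectures, freezing procedure and file names used in the paper may differ:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(64, 64, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),   # e.g. uniformed employee vs. other
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(train_images, train_labels, epochs=10)    # training data omitted in this sketch

# Export a self-contained graph with weights; the TensorFlow C++ API can then
# load this directory and run inference inside the video analysis pipeline.
tf.saved_model.save(model, "detector_export")         # export path is a placeholder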
The incidence of abnormal road traffic events, especially abnormal traffic congestion, is becoming more and more prominent in daily traffic management in China. Detecting and identifying traffic congestion incidents in time has become the main research focus of urban traffic management, since efficient and accurate detection of congestion incidents can provide a good strategy for traffic management. At present, the detection and recognition of traffic congestion events mainly rely on integrating road traffic flow data with the passing data collected by electronic police or checkpoint devices, and then estimating and forecasting road conditions through big data analysis. Such methods often suffer from disadvantages such as poor timeliness, low precision and a small prediction range. Therefore, taking advantage of the video surveillance equipment that public security and traffic police departments have already deployed in large and medium-sized cities, this paper uses computer vision to analyze traffic flow from video monitoring: the motion state and changing trend of the vehicle flow are obtained through deep-learning-based vehicle detection and multi-target tracking, so as to perceive and recognize traffic congestion. The method achieves real-time recognition within 60 seconds, a congestion event detection rate of more than 80%, and a detection accuracy of more than 82.5%. At the same time, it breaks through the restrictions of traditional big data prediction based on traffic flow data, truck pass data and GPS floating car data, and enlarges the scene and scope of detection.
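For illustration only, congestion could be inferred from the tracked vehicle flow roughly as follows; the thresholds, the track representation and the density heuristic are assumptions rather than the paper's method:

import numpy as np

def congestion_state(tracks, lane_capacity=40,
                     speed_thresh_px=2.0, density_thresh=0.7):
    """tracks: dict track_id -> list of (frame_idx, x, y) centroids from the tracker."""
    speeds = []
    for points in tracks.values():
        if len(points) < 2:
            continue
        (f0, x0, y0), (f1, x1, y1) = points[0], points[-1]
        speeds.append(np.hypot(x1 - x0, y1 - y0) / max(f1 - f0, 1))  # px per frame
    density = len(tracks) / lane_capacity
    congested = bool(speeds) and np.mean(speeds) < speed_thresh_px \
                and density > density_thresh
    return {"mean_speed_px_per_frame": float(np.mean(speeds)) if speeds else 0.0,
            "density": density, "congested": congested}

# Toy usage: 35 nearly stationary tracks over 60 frames -> flagged as congestion.
toy = {i: [(0, 100 + i, 200), (60, 101 + i, 203)] for i in range(35)}
print(congestion_state(toy))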
Video streams acquired from thermal cameras have proven to be beneficial in a diverse number of fields, including military, healthcare, law enforcement, and security. Despite the hype, thermal imaging suffers from poor resolution, owing to expensive optical sensors and an inability to attain optical precision. In recent years, deep learning based super-resolution algorithms have been developed to enhance video frame resolution with high accuracy. This paper presents a comparative analysis of super-resolution (SR) techniques based on deep neural networks (DNN) applied to thermal video data. SRCNN, EDSR, an auto-encoder, and SRGAN are discussed and investigated. Further, the results on benchmark thermal datasets, including FLIR, the OSU Thermal Pedestrian Database and the OSU Color-Thermal Database, are evaluated and analyzed. Based on the experimental results, it is concluded that SRGAN delivers superior performance on thermal frames compared to the other techniques and has the potential to provide state-of-the-art performance in real-time operation.
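For reference, the simplest of the compared models, SRCNN, can be sketched in a few lines of PyTorch (classic 9-1-5 convolution layout; single-channel input is assumed for thermal frames):

import torch
import torch.nn as nn

class SRCNN(nn.Module):
    def __init__(self, channels=1):                      # thermal frames are single-channel
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                  nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return self.body(x)

# The low-resolution frame is bicubically upscaled to the target size beforehand.
upscaled = torch.rand(1, 1, 240, 320)
print(SRCNN()(upscaled).shape)                           # torch.Size([1, 1, 240, 320])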
This paper proposes a method for detecting anomalies in video data. A Variational Autoencoder (VAE) is used to reduce the dimensionality of video frames, generating latent-space information that is comparable to low-dimensional sensory data (e.g., positioning, steering angle) and making feasible the development of a consistent multi-modal architecture for autonomous vehicles. An Adapted Markov Jump Particle Filter defined by discrete and continuous inference levels is employed to predict the following frames and detect anomalies in new video sequences. Our method is evaluated on different video scenarios where a semi-autonomous vehicle performs a set of tasks in a closed environment.
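A minimal sketch of the frame-compression step, assuming a small convolutional VAE whose latent vector serves as the low-dimensional signal for the downstream filter (the Adapted Markov Jump Particle Filter itself is not shown):

import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    def __init__(self, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.to_mu = nn.Linear(64 * 16 * 16, latent_dim)
        self.to_logvar = nn.Linear(64 * 16 * 16, latent_dim)

    def forward(self, frames):                            # frames: (B, 3, 64, 64)
        h = self.encoder(frames)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        return z, mu, logvar

z, mu, logvar = FrameVAE()(torch.rand(4, 3, 64, 64))
print(z.shape)                                            # torch.Size([4, 16]) low-dim latent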
By analogy with nature, sight is the main integral component of robotic complexes, including unmanned vehicles. In this connection, one of the urgent tasks in the modern development of unmanned vehicles is providing security for new advanced systems, algorithms, methods, and principles of robot spatial navigation. In this paper, we present an approach to protecting machine vision systems based on deep learning technologies. At the heart of the approach lies the “Feature Squeezing” method, which works during the model operation phase and allows “adversarial” examples to be detected. Considering the urgency and importance of the target process, the characteristics of unmanned vehicle hardware platforms, and the need to perform object detection in real time, we propose an additional simple computational procedure for localizing and classifying the required objects whenever a predefined threshold of the “adversarial” object test is crossed.
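A simplified sketch of the Feature Squeezing check (after Xu et al.): the model's prediction on the raw input is compared with predictions on squeezed copies, and a large disagreement flags a likely adversarial example; the particular squeezers and the threshold below are illustrative choices:

import numpy as np
from scipy.ndimage import median_filter

def reduce_bit_depth(image, bits=4):
    levels = 2 ** bits - 1
    return np.round(image * levels) / levels              # image values assumed in [0, 1]

def is_adversarial(model, image, threshold=0.5):
    """model: callable mapping an (H, W, C) image in [0, 1] to a probability vector."""
    p_raw = model(image)
    squeezed = [reduce_bit_depth(image, bits=4),
                median_filter(image, size=(2, 2, 1))]      # local spatial smoothing
    score = max(np.abs(p_raw - model(s)).sum() for s in squeezed)  # L1 disagreement
    return score > threshold, score

# Toy usage with a stand-in "model" that just averages pixel intensities.
def toy_model(img):
    m = img.mean()
    return np.array([m, 1.0 - m])

flag, score = is_adversarial(toy_model, np.random.rand(32, 32, 3))
print(flag, round(score, 4))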
To provide a uniform development platform that seamlessly combines hardware components and software architectures from developers across the globe, and to reduce the complexity of producing robots that help people with their daily ergonomics, ROS has emerged as a game changer. It is disappointing to see the lack of penetration of this technology in verticals that involve protection, defense and security. By leveraging the power of ROS in the field of robotic automation and computer vision, this research paves the way for the identification of suspicious activity with autonomously moving bots that run on ROS. The paper proposes and validates a flow in which ROS and computer vision algorithms such as YOLO work in sync to provide smarter and more accurate methods for indoor and limited outdoor patrolling. Identification of age, gender, weapons and other elements that can disturb public harmony is an integral part of the research and development process. Simulation and testing reflect the efficiency and speed of the designed software architecture.
The Robot Operating System (ROS) is being deployed for multiple life-critical activities such as self-driving cars, drones, and industrial systems. However, its security has been persistently neglected, especially for the image flows coming from robot cameras. In this paper, we perform a structured security assessment of robot cameras using ROS. We point out a relevant number of security flaws that can be used to take over the flows coming from the robot cameras. Furthermore, we propose an intrusion detection system to detect abnormal flows. Our defense approach is based on image comparison and an unsupervised anomaly detection method. We experiment with our approach on robot cameras embedded in a self-driving car.
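A hedged sketch of such an image-comparison monitor as a ROS node: each incoming frame is compared with the previous one and flagged when the change score drifts far from a running baseline (a simple unsupervised stand-in for the paper's detector); the topic name and thresholds are assumptions:

import numpy as np
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()
prev_frame = None
scores = []

def on_image(msg):
    global prev_frame
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding="mono8").astype(np.float32)
    if prev_frame is not None:
        score = float(np.abs(frame - prev_frame).mean())    # inter-frame change
        scores.append(score)
        if len(scores) > 30:
            baseline, spread = np.mean(scores[:-1]), np.std(scores[:-1]) + 1e-6
            if abs(score - baseline) > 4 * spread:           # e.g. frozen or injected stream
                rospy.logwarn("Abnormal image flow: score=%.2f baseline=%.2f",
                              score, baseline)
    prev_frame = frame

if __name__ == "__main__":
    rospy.init_node("camera_flow_monitor")
    rospy.Subscriber("/camera/image_raw", Image, on_image)   # assumed topic name
    rospy.spin()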