Biblio

Found 15 results

Filters: Keyword is image motion analysis

2021-03-29

Zhou, J., Zhang, X., Liu, Y., Lan, X. 2020. Facial Expression Recognition Using Spatial-Temporal Semantic Graph Network. 2020 IEEE International Conference on Image Processing (ICIP). :1961–1965.

Motions of facial components convey significant information about facial expressions. Although remarkable advances have been made, the dynamics of facial topology have not been fully exploited. In this paper, a novel facial expression recognition (FER) algorithm called Spatial Temporal Semantic Graph Network (STSGN) is proposed to automatically learn spatial and temporal patterns through end-to-end feature learning from facial topology structure. The proposed algorithm not only has greater discriminative power to capture the dynamic patterns of facial expression and stronger generalization capability to handle different variations, but also offers higher interpretability. Experimental evaluation on two popular datasets, CK+ and Oulu-CASIA, shows that our algorithm achieves more competitive results than other state-of-the-art methods.

2021-01-11

Mihanpour, A., Rashti, M. J., Alavi, S. E. 2020. Human Action Recognition in Video Using DB-LSTM and ResNet. 2020 6th International Conference on Web Research (ICWR). :133–138.

Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint change, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both the forward and backward passes of the DB-LSTM to increase depth. The evaluation results of the proposed method, implemented in PyTorch and compared to state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF 101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.

Fomin, I., Burin, V., Bakhshiev, A. 2020. Research on Neural Networks Integration for Object Classification in Video Analysis Systems. 2020 International Conference on Industrial Engineering, Applications and Manufacturing (ICIEAM). :1–5.

Object recognition with outdoor video surveillance cameras is an important task for ensuring security at enterprises, public places and even private premises. Systems that detect moving objects in the image sequence from a video surveillance system have long existed. Such a system is partially considered in this research. It detects moving objects using a background model, which has certain problems. Because of this, some objects are missed or falsely detected. We propose to combine the moving object detection results with classification by a deep neural network. This makes it possible to determine whether a detected object belongs to a certain class, sort out false detections, discard unnecessary ones (sometimes individual classes are unwanted), divide detected people into uniformed employees and all others, etc. The authors train the network in Keras, a developer-friendly environment that allows quick building, changing and training of network architectures. The performance of the Keras integration into a video analysis system, using direct Python script execution techniques, is between 6 and 52 ms, while the precision is between 59.1% and 97.2% for different architectures. The integration made by freezing a selected network architecture with weights is selected after testing. After that, the frozen architecture can be imported into video analysis using the TensorFlow interface for C++. The performance of this type of integration is between 3 and 49 ms, and the precision is between 63.4% and 97.8% for different architectures.

Liu, X., Gao, W., Feng, D., Gao, X. 2020. Abnormal Traffic Congestion Recognition Based on Video Analysis. 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). :39–42.

The incidence of abnormal road traffic events, especially abnormal traffic congestion, is becoming more and more prominent in daily traffic management in China. Detecting and identifying traffic congestion incidents in time has become the main research task of urban traffic management. Efficient and accurate detection of traffic congestion incidents can provide a good strategy for traffic management. At present, the detection and recognition of traffic congestion events mainly rely on integrating road traffic flow data with the passing data collected by electronic police or checkpoint devices, and then estimating and forecasting road conditions through big data analysis. Such methods often have disadvantages such as poor timeliness, low precision and a small prediction range. Therefore, this paper makes use of the video surveillance equipment that public security and traffic police have built in large and medium-sized cities, and analyzes traffic flow from video monitoring with computer vision technology. The motion state and the changing trend of vehicle flow are obtained by deep-learning-based vehicle detection from video and multi-target tracking, so as to realize the perception and recognition of traffic congestion. The method recognizes congestion in real time within 60 seconds, with a congestion event detection rate of more than 80% and a detection accuracy of more than 82.5%. At the same time, it breaks through the restrictions of traditional big data prediction, which depends on traffic flow data, truck pass data and GPS floating car data, and enlarges the scene and scope of detection.

2020-10-05

Lee, Haanvid, Jung, Minju, Tani, Jun. 2018. Recognition of Visually Perceived Compositional Human Actions by Multiple Spatio-Temporal Scales Recurrent Neural Networks. IEEE Transactions on Cognitive and Developmental Systems. 10:1058–1069.

We investigate a deep learning model for action recognition that simultaneously extracts spatio-temporal information from raw RGB input data. The proposed multiple spatio-temporal scales recurrent neural network (MSTRNN) model is derived by combining multiple timescale recurrent dynamics with a conventional convolutional neural network model. The architecture of the proposed model imposes both spatial and temporal constraints simultaneously on its neural activities. The constraints vary, with multiple scales in different layers. As suggested by the principle of upward and downward causation, it is assumed that the network can develop a functional hierarchy using its constraints during training. To evaluate and observe the characteristics of the proposed model, we use three human action datasets consisting of different primitive actions and different compositionality levels. The performance capabilities of the MSTRNN model on these datasets are compared with those of other representative deep learning models used in the field. The results show that the MSTRNN outperforms baseline models while using fewer parameters. The characteristics of the proposed model are observed by analyzing its internal representation properties. The analysis clarifies how the spatio-temporal constraints of the MSTRNN model help it extract critical spatio-temporal information relevant to its given tasks.

2020-07-24

Lv, Weijie, Bai, Ruifeng, Sun, Xueqiang. 2019. Image Encryption Algorithm Based on Hyper-chaotic Lorenz Map and Compressed Sensing Theory. 2019 Chinese Control Conference (CCC). :3405–3410.

The motion process of a multi-dimensional chaotic system is complex and variable: the randomness of the motion state is stronger, and the motion state is more unpredictable within a certain range. This feature of multi-dimensional chaotic systems can effectively improve the security performance of digital image encryption algorithms. In this paper, the hyper-chaotic Lorenz map is used to design the encryption sequence and improve its random performance, thus optimizing the performance of the digital image encryption algorithm. The chaotic sequence is also used to randomly select row vectors of the Hadamard matrix, and the selected rows form the measurement matrix. This simplifies the computational difficulty of the algorithm and solves the problem of the discontinuity of the key space in random matrix design.
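
The row-selection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: a one-dimensional logistic map stands in for the hyper-chaotic Lorenz map, and the function names (`hadamard`, `logistic_sequence`, `measurement_matrix`), the parameter `r`, and the ranking rule used to pick rows are all assumptions made for illustration.

```python
def hadamard(n):
    """Build an n x n Hadamard matrix by the Sylvester construction (n a power of two)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # [[H, H], [H, -H]] doubles the size at each step.
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def logistic_sequence(x0, count, r=3.99, burn_in=100):
    """Chaotic sequence from the logistic map (a stand-in, not the Lorenz system)."""
    x = x0
    out = []
    for i in range(burn_in + count):
        x = r * x * (1.0 - x)
        if i >= burn_in:  # discard the transient part of the orbit
            out.append(x)
    return out


def measurement_matrix(m, n, key):
    """Select m distinct Hadamard rows in an order determined by the chaotic sequence."""
    h = hadamard(n)
    seq = logistic_sequence(key, n)
    # Ranking the chaotic values gives a key-dependent permutation of the row indices.
    order = sorted(range(n), key=lambda i: seq[i])
    return [h[i] for i in order[:m]]


phi = measurement_matrix(4, 8, key=0.3141)
```

Because the rows come from a Hadamard matrix, any two selected rows are orthogonal, and the same key always reproduces the same matrix, so only the initial value needs to be shared rather than the matrix itself.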

2020-07-03

Feng, Ri-Chen, Lin, Daw-Tung, Chen, Ken-Min, Lin, Yi-Yao, Liu, Chin-De. 2019. Improving Deep Learning by Incorporating Semi-automatic Moving Object Annotation and Filtering for Vision-based Vehicle Detection. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :2484–2489.

Deep learning has undergone tremendous advancements in computer vision studies. The training of deep learning neural networks depends on a considerable amount of ground truth data. However, labeling ground truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicle detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages, namely automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. Furthermore, the proposed method expedites dataset collection and ground truth annotation processes. In contrast to data augmentation and data generative models, the proposed method produces a large amount of real data, which may facilitate training results and avoid adverse effects engendered by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that detection performance improved from a mean average precision (mAP) of 83.99 to 88.03.

2020-04-13

Kim, Dongchil, Kim, Kyoungman, Park, Sungjoo. 2019. Automatic PTZ Camera Control Based on Deep-Q Network in Video Surveillance System. 2019 International Conference on Electronics, Information, and Communication (ICEIC). :1–3.

Recently, Pan/Tilt/Zoom (PTZ) cameras have been widely used in video surveillance systems. However, it is difficult to automatically control PTZ cameras according to moving objects in the surveillance area. This paper proposes an automatic camera control method based on a Deep-Q Network (DQN) for improving the recognition accuracy of anomalous actions in the video surveillance system. To generate PTZ camera control values, the proposed method uses the position and size information of the object received from the video analysis system. Implementation results show that the proposed method can automatically control the PTZ camera according to moving objects.

2020-03-09

Zhai, Liming, Wang, Lina, Ren, Yanzhen. 2019. Multi-domain Embedding Strategies for Video Steganography by Combining Partition Modes and Motion Vectors. 2019 IEEE International Conference on Multimedia and Expo (ICME). :1402–1407.

Digital video has various types of entities, which are utilized as embedding domains to hide messages in steganography. However, nearly all video steganography uses only one type of embedding domain, resulting in limited embedding capacity and potential security risks. In this paper, we firstly propose to embed in multi-domains for video steganography by combining partition modes (PMs) and motion vectors (MVs). The multi-domain embedding (MDE) aims to spread the modifications to different embedding domains for achieving higher undetectability. The key issue of MDE is the interactions of entities across domains. To this end, we design two MDE strategies, which hide data in PM domain and MV domain by sequential embedding and simultaneous embedding respectively. These two strategies can be applied to existing steganography within a distortion-minimization framework. Experiments show that the MDE strategies achieve a significant improvement in security performance against targeted steganalysis and fusion based steganalysis.

2020-02-10

Selvi J., Anitha Gnana, Kalavathy G., Maria. 2019. Probing Image and Video Steganography Based On Discrete Wavelet and Discrete Cosine Transform. 2019 Fifth International Conference on Science Technology Engineering and Mathematics (ICONSTEM). 1:21–24.

Nowadays, video steganography has been developed for secure communication among various users. The two important factors of a steganography method are embedding potency and embedding payload. Here, a Multiple Object Tracking (MOT) algorithm is used to detect moving objects and to produce a foreground mask. The Discrete Wavelet Transform (DWT) and Discrete Cosine Transform (DCT) are used in the message embedding and extraction stages. Existing systems use the least significant bit method, which may lose some data after certain file transformations. The suggested multiple object tracking algorithm increases embedding and extraction speed and also protects the secret message against various attackers.

2015-05-04

Lin Chen, Lu Zhou, Chunxue Liu, Quan Sun, Xiaobo Lu. 2014. Occlusive vehicle tracking via processing blocks in Markov random field. Progress in Informatics and Computing (PIC), 2014 International Conference on. :294–298.

The technology of vehicle video detection and tracking has played an important role in the ITS (Intelligent Transportation Systems) field in recent years. Occlusion among vehicles is one of the most difficult problems in vehicle tracking. To handle occlusion, this paper proposes an effective solution that applies a Markov Random Field (MRF) to the traffic images. The contour of each vehicle is first detected using background subtraction, and then a number of blocks carrying the vehicle's texture and motion information are filled inside each vehicle. We extract several kinds of information from each block for the subsequent tracking. For each occlusive block, two groups of clique functions are defined in the MRF model, representing spatial correlation and motion coherence respectively. By calculating each occlusive block's total energy function, we finally solve the attribution problem of occlusive blocks. The experimental results show that our method can handle occlusion problems effectively and track each vehicle continuously.

2015-05-01

Shuai Yi, Xiaogang Wang. 2014. Profiling stationary crowd groups. Multimedia and Expo (ICME), 2014 IEEE International Conference on. :1–6.

Detecting stationary crowd groups and analyzing their behaviors have important applications in crowd video surveillance, but have rarely been studied. The contributions of this paper are in two aspects. First, a stationary crowd detection algorithm is proposed to estimate the stationary time of foreground pixels. It employs spatial-temporal filtering and motion filtering in order to be robust to noise caused by occlusions and crowd clutters. Second, in order to characterize the emergence and dispersal processes of stationary crowds and their behaviors during the stationary periods, three attributes are proposed for quantitative analysis. These attributes are recognized with a set of proposed crowd descriptors which extract visual features from the results of stationary crowd detection. The effectiveness of the proposed algorithms is shown through experiments on a benchmark dataset.
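
As a rough illustration of what "estimating the stationary time of foreground pixels" can mean, the sketch below keeps a per-pixel counter of consecutive frames in which a pixel is both foreground and unchanged. It is a deliberately simplified stand-in, not the paper's algorithm: it omits the spatial-temporal and motion filtering the abstract describes, and the function name and both thresholds are hypothetical.

```python
def stationary_time(frames, background, fg_thresh=30, motion_thresh=10):
    """Count, per pixel, consecutive frames that are foreground and not moving.

    frames: list of 2-D grayscale images (lists of lists of numbers).
    background: 2-D grayscale background image of the same size.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    h, w = len(background), len(background[0])
    counts = [[0] * w for _ in range(h)]
    prev = None
    for frame in frames:
        for y in range(h):
            for x in range(w):
                # Foreground: differs clearly from the background model.
                is_fg = abs(frame[y][x] - background[y][x]) > fg_thresh
                # Still: barely changed since the previous frame.
                is_still = (prev is not None
                            and abs(frame[y][x] - prev[y][x]) <= motion_thresh)
                if is_fg and is_still:
                    counts[y][x] += 1
                else:
                    counts[y][x] = 0  # reset when the pixel moves or clears
        prev = frame
    return counts


background = [[0, 0, 0]]
frames = [[[0, 200, 0]], [[0, 200, 0]], [[0, 200, 200]], [[0, 200, 200]]]
print(stationary_time(frames, background))  # [[0, 3, 1]]
```

A plain counter like this resets on any single noisy frame, which is the kind of fragility that the filtering steps named in the abstract are meant to address.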

Hammoud, R.I., Sahin, C.S., Blasch, E.P., Rhodes, B.J. 2014. Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance. Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. :237–244.

Recognizing activities in wide aerial/overhead imagery remains a challenging problem due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two un-synchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing-back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs) (in both pixel and geo-coordinates). Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data.

Rasheed, N., Khan, S.A., Khalid, A. 2014. Tracking and Abnormal Behavior Detection in Video Surveillance Using Optical Flow and Neural Networks. Advanced Information Networking and Applications Workshops (WAINA), 2014 28th International Conference on. :61–66.

An abnormal behavior detection algorithm for surveillance is required to correctly identify targets as being in normal or chaotic movement. A model is developed here for this purpose. The uniqueness of this algorithm is the use of foreground detection with a Gaussian mixture model (FGMM) before passing the video frames to an optical flow model using the Lucas-Kanade approach. Information on the horizontal and vertical displacements and directions associated with each pixel of the object of interest is extracted. These features are then fed to a feed-forward neural network for classification and simulation. The study is conducted on real-time videos and some synthesized videos. The accuracy of the method has been calculated using the performance parameters for neural networks. Compared with plain optical flow, this model obtains improved results without noise. Classes are correctly identified with an overall performance equal to 3.4e-02 and an error percentage of 2.5.
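
The Lucas-Kanade step named in this abstract can be sketched for a single window as follows. This is a generic textbook formulation under stated assumptions, not the authors' implementation: it omits the Gaussian mixture foreground detection and the neural network classifier, and the function name `lucas_kanade_window` and the window size are hypothetical.

```python
def lucas_kanade_window(prev, curr, x0, y0, half=3):
    """Estimate one flow vector (u, v) for the window centred at (x0, y0).

    Solves the least-squares form of Ix*u + Iy*v + It = 0 over the window.
    prev, curr: 2-D grayscale images as lists of lists; the window plus a
    one-pixel border must lie inside the image.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0  # spatial gradient in x
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0  # spatial gradient in y
            it = curr[y][x] - prev[y][x]                  # temporal difference
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # no texture or aperture problem: flow is not determined
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


# Synthetic check: a smooth pattern shifted one pixel to the right.
size = 12
prev = [[x * x + y * y for x in range(size)] for y in range(size)]
curr = [[(x - 1) ** 2 + y * y for x in range(size)] for y in range(size)]
u, v = lucas_kanade_window(prev, curr, 5, 5)
```

On this synthetic pair the estimate comes out close to a one-pixel horizontal shift; the per-window (u, v) values are the kind of displacement features the abstract describes feeding to a classifier.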

Hong Jiang, Songqing Zhao, Zuowei Shen, Wei Deng, Wilford, P.A., Haimi-Cohen, R. 2014. Surveillance video analysis using compressive sensing with low latency. Bell Labs Technical Journal. 18:63–74.

We propose a method for analysis of surveillance video by using low rank and sparse decomposition (LRSD) with low latency combined with compressive sensing to segment the background and extract moving objects in a surveillance video. Video is acquired by compressive measurements, and the measurements are used to analyze the video by a low rank and sparse decomposition of a matrix. The low rank component represents the background, and the sparse component, which is obtained in a tight wavelet frame domain, is used to identify moving objects in the surveillance video. An important feature of the proposed low latency method is that the decomposition can be performed with a small number of video frames, which reduces latency in the reconstruction and makes it possible for real time processing of surveillance video. The low latency method is both justified theoretically and validated experimentally.