Biblio

Filters: Keyword is pose estimation
2023-07-28
Khunchai, Seree, Kruekaew, Adool, Getvongsa, Natthapong.  2022.  A Fuzzy Logic-Based System of Abnormal Behavior Detection Using PoseNet for Smart Security System. 2022 37th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC). :912–915.
This paper aims to contribute towards ambient abnormal-behavior detection for smart security systems from real-time human pose estimation using fuzzy-based systems. Human poses, taken from keypoints detected by a pose estimation model, are transformed into angles between the lines joining body joints and a reference direction along the x-axis, which deals with the positional changes that occur when an individual moves within the image. The article also addresses the ambiguity of interpreting poses with a triangular fuzzy logic-based system that determines the detected individual's behavior and compares it to poses previously learnt, trained, and recorded by the system. The experiments reveal that the accuracy of the system ranges between 84% (minimum) and 90.75% (maximum), placing its overall accuracy at about 85%. The system can guide future research on designing automatic visual human-behavior detection systems.
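A minimal sketch of the two mechanisms the abstract describes, assuming hypothetical keypoint names and an illustrative 30° tolerance: (1) converting detected keypoints into limb angles relative to the x-axis, which are invariant to where the person stands in the frame, and (2) scoring an observed pose against a learnt reference with a triangular fuzzy membership function.

```python
import math

def limb_angle(joint_a, joint_b):
    """Angle (degrees) of the limb vector joint_a -> joint_b relative to the x-axis."""
    dx, dy = joint_b[0] - joint_a[0], joint_b[1] - joint_a[1]
    return math.degrees(math.atan2(dy, dx))

def triangular_membership(x, center, width):
    """Triangular fuzzy membership: 1 at `center`, falling to 0 at `center +/- width`."""
    return max(0.0, 1.0 - abs(x - center) / width)

def pose_similarity(keypoints, reference_angles, width=30.0):
    """Average membership of observed limb angles in a learnt reference pose.

    keypoints: {"shoulder": (x, y), ...}
    reference_angles: {("shoulder", "elbow"): angle_deg, ...}  (illustrative names)
    """
    scores = [triangular_membership(limb_angle(keypoints[a], keypoints[b]), ref, width)
              for (a, b), ref in reference_angles.items()]
    return sum(scores) / len(scores)
```

A low similarity against every learnt normal pose would then be flagged as abnormal behavior.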
2023-06-23
Rajin, S M Ataul Karim, Murshed, Manzur, Paul, Manoranjan, Teng, Shyh Wei, Ma, Jiangang.  2022.  Human pose based video compression via forward-referencing using deep learning. 2022 IEEE International Conference on Visual Communications and Image Processing (VCIP). :1–5.

To exploit high temporal correlations in video frames of the same scene, the current frame is predicted from the already-encoded reference frames using block-based motion estimation and compensation techniques. While this approach can efficiently exploit the translational motion of moving objects, it struggles with other types of affine motion and with object occlusion/deocclusion. Recently, deep learning has been used to model the high-level structure of human pose in specific actions from short videos and then generate virtual frames at future times by predicting the pose with a generative adversarial network (GAN). Modelling the high-level structure of human pose can therefore exploit semantic correlation by predicting human actions and determining their trajectory. Video surveillance applications will benefit, as stored "big" surveillance data can be compressed by estimating human pose trajectories and generating future frames through semantic correlation. This paper explores a new way of video coding: modelling human pose from the already-encoded frames and using the frame generated at the current time as an additional forward-referencing frame. The proposed approach is expected to overcome the limitations of traditional backward-referencing frames by predicting the blocks containing moving objects with lower residuals. Our experimental results show that the proposed approach achieves on average up to 2.83 dB PSNR gain and 25.93% bitrate savings over standard video coding for high-motion video sequences.

ISSN: 2642-9357
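As a rough illustration of the coding idea (an assumed simplification, not the paper's codec), the encoder can compare, block by block, the residual against the usual backward reference with the residual against a frame synthesized at the current time from predicted human pose, and keep the cheaper reference; the 16-pixel block size and SAD cost here are illustrative choices.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def choose_references(current, backward_ref, pose_generated_ref, block=16):
    """Per-block map: 0 = backward reference, 1 = pose-generated forward reference."""
    h, w = current.shape[:2]
    choice = np.zeros((h // block, w // block), dtype=np.uint8)
    for i in range(0, h - block + 1, block):
        for j in range(0, w - block + 1, block):
            cur = current[i:i + block, j:j + block]
            cost_bwd = sad(cur, backward_ref[i:i + block, j:j + block])
            cost_fwd = sad(cur, pose_generated_ref[i:i + block, j:j + block])
            choice[i // block, j // block] = int(cost_fwd < cost_bwd)
    return choice
```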

2023-02-03
Rettlinger, Sebastian, Knaus, Bastian, Wieczorek, Florian, Ivakko, Nikolas, Hanisch, Simon, Nguyen, Giang T., Strufe, Thorsten, Fitzek, Frank H. P..  2022.  MPER - a Motion Profiling Experiment and Research system for human body movement. 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). :88–90.
State-of-the-art approaches in gait analysis usually rely on one isolated tracking system, generating insufficient data for complex use cases such as sports, rehabilitation, and MedTech. We address the opportunity to comprehensively understand human motion with a novel data model that combines several motion-tracking methods. The model synchronously aggregates pose estimates from captured video with EMG and EIT sensor data to gain insights into muscle activity. Our demonstration with biceps curls and a sitting/standing pose generates time-synchronous data and delivers insights into the experiment's usability, advantages, and challenges.
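A minimal sketch of the aggregation step, assuming (the abstract does not specify this) that the faster EMG/EIT streams are matched to the slower video-based pose stream by nearest timestamp, so every pose frame carries synchronous muscle-activity readings:

```python
import bisect

def nearest(timestamps, t):
    """Index of the entry in a sorted timestamp list closest to t."""
    i = bisect.bisect_left(timestamps, t)
    candidates = [c for c in (i - 1, i) if 0 <= c < len(timestamps)]
    return min(candidates, key=lambda c: abs(timestamps[c] - t))

def synchronize(pose_stream, emg_stream, eit_stream):
    """Each stream is a time-sorted list of (timestamp, sample) pairs."""
    emg_ts = [t for t, _ in emg_stream]
    eit_ts = [t for t, _ in eit_stream]
    return [{"t": t,
             "pose": pose,
             "emg": emg_stream[nearest(emg_ts, t)][1],
             "eit": eit_stream[nearest(eit_ts, t)][1]}
            for t, pose in pose_stream]
```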
2021-01-15
Yang, X., Li, Y., Lyu, S..  2019.  Exposing Deep Fakes Using Inconsistent Head Poses. ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). :8261–8265.
In this paper, we propose a new method to expose AI-generated fake face images or videos (commonly known as Deep Fakes). Our method is based on the observation that Deep Fakes are created by splicing a synthesized face region into the original image, which introduces errors that can be revealed when 3D head poses are estimated from the face images. We perform experiments to demonstrate this phenomenon and develop a classification method based on this cue. Using features derived from this cue, an SVM classifier is evaluated on a set of real face images and Deep Fakes.
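A hedged sketch of the cue: estimate a 3D head pose with OpenCV's PnP solver from one set of facial landmarks, estimate it again from a second set, and feed the discrepancy to an SVM. The landmark split, 3D face model points, and camera intrinsics are placeholders here, not the paper's exact choices.

```python
import numpy as np
import cv2
from sklearn.svm import SVC

def head_pose(landmarks_2d, model_points_3d, camera_matrix):
    """Head rotation vector from 2D landmarks via PnP."""
    ok, rvec, tvec = cv2.solvePnP(model_points_3d, landmarks_2d,
                                  camera_matrix, distCoeffs=None)
    return rvec.ravel()

def inconsistency_feature(central_lm, whole_lm, central_3d, whole_3d, K):
    """Difference between poses estimated from two landmark subsets of one face."""
    return head_pose(central_lm, central_3d, K) - head_pose(whole_lm, whole_3d, K)

def train_detector(features, labels):
    """features: (n, 3) pose-difference vectors; labels: 1 = Deep Fake, 0 = real."""
    clf = SVC(kernel="rbf")
    clf.fit(features, labels)
    return clf
```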
2020-10-05
Chakraborty, Anit, Dutta, Sayandip, Bhattacharyya, Siddhartha, Platos, Jan, Snasel, Vaclav.  2018.  Reinforcement Learning inspired Deep Learned Compositional Model for Decision Making in Tracking. 2018 Fourth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN). :158–163.

We formulate a tracker that performs continual decision making in order to track objects that may undergo challenges such as partial occlusion, camera motion, and cluttered backgrounds. In the process, the agent must decide whether to keep tracking the object when it is occluded or has temporarily moved out of the frame, based on its prediction from the previous location, or to reinitialize the tracker based on the belief that the target has been lost. Instead of heuristic methods, we depend on reward- and penalty-based training that helps the agent reach an optimal solution via this partially observable Markov decision process (POMDP). Furthermore, we employ a deeply learned compositional model to estimate human pose in order to better handle occlusion without needing human input. By learning the compositionality of human bodies via a deep neural network, the agent can make better decisions about the presence or absence of a human in a frame under occlusion. We adopt a skeleton-based part representation and do away with the large spatial state requirement, which especially helps in cases where the orientation of the target in focus is unorthodox. Finally, we demonstrate that deep reinforcement learning-based training coupled with pose estimation capabilities allows us to train and tag multiple large video datasets much faster than previous works.
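As a toy illustration of the decision the agent faces each frame (a deliberately simplified bandit-style sketch, not the paper's POMDP formulation or network), the choice between keeping the track and reinitializing can be learnt from reward and penalty signals over a discretized pose-detection confidence:

```python
import random

KEEP, REINIT = 0, 1

def discretize(confidence, bins=10):
    """Map a confidence in [0, 1] to one of `bins` states."""
    return min(int(confidence * bins), bins - 1)

def train_policy(episodes, bins=10, alpha=0.1, epsilon=0.1):
    """episodes: iterable of (confidence, correct_action) pairs from labelled tracks."""
    q = [[0.0, 0.0] for _ in range(bins)]
    for confidence, correct_action in episodes:
        s = discretize(confidence)
        # epsilon-greedy exploration over the two actions
        a = random.randrange(2) if random.random() < epsilon else int(q[s][REINIT] > q[s][KEEP])
        reward = 1.0 if a == correct_action else -1.0   # penalty on a wrong call
        q[s][a] += alpha * (reward - q[s][a])           # incremental value update
    return q
```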

2020-06-19
Ly, Son Thai, Do, Nhu-Tai, Lee, Guee-Sang, Kim, Soo-Hyung, Yang, Hyung-Jeong.  2019.  A 3D Face Modeling Approach for in-the-Wild Facial Expression Recognition on Image Datasets. 2019 IEEE International Conference on Image Processing (ICIP). :3492–3496.

This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are limited, we first construct 3D facial data from available 2D datasets using recent advances in 3D face reconstruction. The 3D facial geometry representation is then extracted by a deep learning technique. In addition, we take advantage of manipulating the 3D face, such as using 2D projected images of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network. By doing so, despite using common approaches, we achieve competent recognition accuracy on the Real-World Affective Faces (RAF) database and Static Facial Expressions in the Wild (SFEW 2.0) compared with state-of-the-art reports. To the best of our knowledge, this is the first time such a deep learning combination of 3D and 2D facial modalities has been presented in the context of in-the-wild FER.
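A minimal sketch of the fusion step, assuming a late-fusion architecture (the feature sizes, seven-class output, and layer shapes are illustrative, not the authors' network): features from the reconstructed 3D face geometry are concatenated with features from a typical 2D FER backbone before the expression classifier.

```python
import torch
import torch.nn as nn

class FusionFERHead(nn.Module):
    """Classify expressions from concatenated 2D and 3D face embeddings."""
    def __init__(self, dim_2d=512, dim_3d=256, num_classes=7):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(dim_2d + dim_3d, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, feat_2d, feat_3d):
        # Late fusion by concatenating the two modality embeddings.
        return self.classifier(torch.cat([feat_2d, feat_3d], dim=1))

# usage: logits = FusionFERHead()(torch.randn(8, 512), torch.randn(8, 256))
```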

2019-09-23
Zhang, Caixia, Bai, Gang.  2018.  Using Hybrid Features of QR Code to Locate and Track in Augmented Reality. Proceedings of the 2018 International Conference on Information Science and System. :273–279.
Augmented Reality (AR) is a technique that seamlessly integrates virtual 3D models into images of the real scene in real time. Using the QR code as the identification mark, an algorithm is proposed to extract the virtual straight lines of the QR code and to locate and track the camera based on these hybrid features, thus avoiding the failures that can occur when locating and tracking by feature points alone. The experimental results show that the method of combining straight lines with feature points is better than using only straight lines or only feature points. Further, an AR system is developed.
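A hedged sketch of the point-feature half of such a method: detect the QR code's four corners with OpenCV and recover the camera pose with PnP against the code's known physical size (the 50 mm side length is an assumption). The paper's contribution is to combine such points with the code's straight lines as hybrid features.

```python
import numpy as np
import cv2

def camera_pose_from_qr(image, camera_matrix, side_mm=50.0):
    """Camera pose (rvec, tvec) relative to a QR code of known size, or None."""
    found, corners = cv2.QRCodeDetector().detect(image)
    if not found:
        return None
    # 3D corners of the QR code in its own plane (z = 0), in detection order.
    s = side_mm
    object_points = np.array([[0, 0, 0], [s, 0, 0], [s, s, 0], [0, s, 0]],
                             dtype=np.float32)
    image_points = corners.reshape(4, 2).astype(np.float32)
    ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                                  camera_matrix, distCoeffs=None)
    return (rvec, tvec) if ok else None
```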
2019-08-12
Liu, Y., Yang, Y., Shi, A., Jigang, P., Haowei, L..  2019.  Intelligent monitoring of indoor surveillance video based on deep learning. 2019 21st International Conference on Advanced Communication Technology (ICACT). :648–653.

With the rapid development of information technology, video surveillance systems have become a key part of the security and protection systems of modern cities. Especially in prisons, surveillance cameras can be found almost everywhere. However, with the continuous expansion of the surveillance network, surveillance cameras not only bring convenience but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics, and retrieval. Smart monitoring systems equipped with intelligent video analytics can monitor and pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper combines deep learning methods, using Mask R-CNN, a state-of-the-art framework for instance segmentation, to fine-tune a network on our datasets that can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiments show that our network is simple to train, generalizes easily to other datasets, and reaches a mask average precision of nearly 98.5% on our own datasets.
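A common fine-tuning recipe in the spirit of the paper (a sketch using torchvision's Mask R-CNN, not the authors' exact setup): load a pre-trained model and replace its box and mask heads for a custom number of classes before training on the surveillance dataset.

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

def build_finetune_model(num_classes):
    """Mask R-CNN with heads replaced for `num_classes` (including background)."""
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    # Replace the box classification/regression head.
    in_feats = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feats, num_classes)
    # Replace the mask prediction head for the same classes.
    in_feats_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_feats_mask, 256, num_classes)
    return model
```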

2017-03-08
Paone, J., Bolme, D., Ferrell, R., Aykac, D., Karnowski, T..  2015.  Baseline face detection, head pose estimation, and coarse direction detection for facial data in the SHRP2 naturalistic driving study. 2015 IEEE Intelligent Vehicles Symposium (IV). :174–179.

Keeping a driver focused on the road is one of the most critical steps in ensuring the safe operation of a vehicle. The Strategic Highway Research Program 2 (SHRP2) has over 3,100 recorded videos of volunteer drivers collected over a period of 2 years. This extensive naturalistic driving study (NDS) contains over one million hours of video and associated data that could aid safety researchers in understanding where the driver's attention is focused. Manual analysis of this data is infeasible; therefore, efforts are underway to develop automated feature extraction algorithms to process and characterize the data. The real-world nature, volume, and acquisition conditions are unmatched in the transportation community, but there are also challenges because the data has relatively low resolution, high compression rates, and differing illumination conditions. A smaller dataset, the head pose validation study, is available; it used the same recording equipment as SHRP2 but is more easily accessible, with fewer privacy constraints. In this work we report initial head pose accuracy using commercial and open source face pose estimation algorithms on the head pose validation data set.

Kerl, C., Stückler, J., Cremers, D..  2015.  Dense Continuous-Time Tracking and Mapping with Rolling Shutter RGB-D Cameras. 2015 IEEE International Conference on Computer Vision (ICCV). :2264–2272.

We propose a dense continuous-time tracking and mapping method for RGB-D cameras. We parametrize the camera trajectory using continuous B-splines and optimize the trajectory through dense, direct image alignment. Our method also directly models rolling shutter in both RGB and depth images within the optimization, which improves tracking and reconstruction quality for low-cost CMOS sensors. Using a continuous trajectory representation has a number of advantages over a discrete-time representation (e.g. camera poses at the frame interval). With splines, fewer variables need to be optimized than with a discrete representation, since the trajectory can be represented with fewer control points than frames. Splines also naturally impose smoothness constraints on the derivatives of the trajectory estimate. Finally, the continuous trajectory representation makes it possible to compensate for rolling-shutter effects, since a pose estimate is available at any exposure time of an image. Our approach demonstrates superior quality in tracking and reconstruction compared to approaches with discrete-time or global-shutter assumptions.
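A small sketch of why the continuous-time representation helps with rolling shutter: with a uniform cubic B-spline over control poses, a pose can be evaluated at the exposure time of every image row rather than only at frame timestamps. The sketch below interpolates translations only; rotations would need the cumulative SO(3)/SE(3) B-spline formulation the paper uses.

```python
import numpy as np

# Uniform cubic B-spline basis: p(u) = [1, u, u^2, u^3] @ M @ P, with u in [0, 1).
M = np.array([[1, 4, 1, 0],
              [-3, 0, 3, 0],
              [3, -6, 3, 0],
              [-1, 3, -3, 1]], dtype=np.float64) / 6.0

def spline_position(control_points, t):
    """Evaluate the trajectory at continuous time t in [0, len(control_points) - 3]."""
    i = min(int(t), len(control_points) - 4)   # index of the active spline segment
    u = t - i                                  # local parameter within the segment
    P = np.asarray(control_points[i:i + 4])    # the 4 control translations (4 x 3)
    return np.array([1.0, u, u * u, u ** 3]) @ M @ P

# Pose for image row r of a rolling-shutter frame whose first row starts at t0:
# spline_position(ctrl, t0 + r * row_readout_time)
```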