Visible to the public Biblio

Filters: Keyword is deep video  [Clear All Filters]
2021-01-11
Liu, X., Gao, W., Feng, D., Gao, X..  2020.  Abnormal Traffic Congestion Recognition Based on Video Analysis. 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR). :39—42.

The incidence of abnormal road traffic events, especially abnormal traffic congestion, is becoming more and more prominent in daily traffic management in China. It has become the main research work of urban traffic management to detect and identify traffic congestion incidents in time. Efficient and accurate detection of traffic congestion incidents can provide a good strategy for traffic management. At present, the detection and recognition of traffic congestion events mainly rely on the integration of road traffic flow data and the passing data collected by electronic police or devices of checkpoint, and then estimating and forecasting road conditions through the method of big data analysis; Such methods often have some disadvantages such as low time-effect, low precision and small prediction range. Therefore, with the help of the current large and medium cities in the public security, traffic police have built video surveillance equipment, through computer vision technology to analyze the traffic flow from video monitoring, in this paper, the motion state and the changing trend of vehicle flow are obtained by using the technology of vehicle detection from video and multi-target tracking based on deep learning, so as to realize the perception and recognition of traffic congestion. The method achieves the recognition accuracy of less than 60 seconds in real-time, more than 80% in detection rate of congestion event and more than 82.5% in accuracy of detection. At the same time, it breaks through the restriction of traditional big data prediction, such as traffic flow data, truck pass data and GPS floating car data, and enlarges the scene and scope of detection.

Zhang, X., Chandramouli, K., Gabrijelcic, D., Zahariadis, T., Giunta, G..  2020.  Physical Security Detectors for Critical Infrastructures Against New-Age Threat of Drones and Human Intrusion. 2020 IEEE International Conference on Multimedia Expo Workshops (ICMEW). :1—4.

Modern critical infrastructures are increasingly turning into distributed, complex Cyber-Physical systems that need proactive protection and fast restoration to mitigate physical or cyber incidents or attacks. Addressing the need for early stage threat detection against physical intrusion, the paper presents two physical security sensors developed within the DEFENDER project for detecting the intrusion of drones and humans using video analytics. The continuous stream of media data obtained from the region of vulnerability and proximity is processed using Region based Fully Connected Neural Network deep-learning model. The novelty of the pro-posed system relies in the processing of multi-threaded media input streams for achieving real-time threat identification. The video analytics solution has been validated using NVIDIA GeForce GTX 1080 for drone detection and NVIDIA GeForce RTX 2070 Max-Q Design for detecting human intruders. The experimental test bed for the validation of the proposed system has been constructed to include environments and situations that are commonly faced by critical infrastructure operators such as the area of protection, tradeoff between angle of coverage against distance of coverage.

Amrutha, C. V., Jyotsna, C., Amudha, J..  2020.  Deep Learning Approach for Suspicious Activity Detection from Surveillance Video. 2020 2nd International Conference on Innovative Mechanisms for Industry Applications (ICIMIA). :335—339.

Video Surveillance plays a pivotal role in today's world. The technologies have been advanced too much when artificial intelligence, machine learning and deep learning pitched into the system. Using above combinations, different systems are in place which helps to differentiate various suspicious behaviors from the live tracking of footages. The most unpredictable one is human behaviour and it is very difficult to find whether it is suspicious or normal. Deep learning approach is used to detect suspicious or normal activity in an academic environment, and which sends an alert message to the corresponding authority, in case of predicting a suspicious activity. Monitoring is often performed through consecutive frames which are extracted from the video. The entire framework is divided into two parts. In the first part, the features are computed from video frames and in second part, based on the obtained features classifier predict the class as suspicious or normal.

Kanna, J. S. Vignesh, Raj, S. M. Ebenezer, Meena, M., Meghana, S., Roomi, S. Mansoor.  2020.  Deep Learning Based Video Analytics For Person Tracking. 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE). :1—6.

As the assets of people are growing, security and surveillance have become a matter of great concern today. When a criminal activity takes place, the role of the witness plays a major role in nabbing the criminal. The witness usually states the gender of the criminal, the pattern of the criminal's dress, facial features of the criminal, etc. Based on the identification marks provided by the witness, the criminal is searched for in the surveillance cameras. Surveillance cameras are ubiquitous and finding criminals from a huge volume of surveillance video frames is a tedious process. In order to automate the search process, proposed a novel smart methodology using deep learning. This method takes gender, shirt pattern, and spectacle status as input to find out the object as person from the video log. The performance of this method achieves an accuracy of 87% in identifying the person in the video frame.

Gautam, A., Singh, S..  2020.  A Comparative Analysis of Deep Learning based Super-Resolution Techniques for Thermal Videos. 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT). :919—925.

Video streams acquired from thermal cameras are proven to be beneficial in diverse number of fields including military, healthcare, law enforcement, and security. Despite the hype, thermal imaging is increasingly affected by poor resolution, where it has expensive optical sensors and inability to attain optical precision. In recent years, deep learning based super-resolution algorithms are developed to enhance the video frame resolution at high accuracy. This paper presents a comparative analysis of super resolution (SR) techniques based on deep neural networks (DNN) that are applied on thermal video dataset. SRCNN, EDSR, Auto-encoder, and SRGAN are also discussed and investigated. Further the results on benchmark thermal datasets including FLIR, OSU thermal pedestrian database and OSU color thermal database are evaluated and analyzed. Based on the experimental results, it is concluded that, SRGAN has delivered a superior performance on thermal frames when compared to other techniques and improvements, which has the ability to provide state-of-the art performance in real time operations.

2020-07-03
Kakadiya, Rutvik, Lemos, Reuel, Mangalan, Sebin, Pillai, Meghna, Nikam, Sneha.  2019.  AI Based Automatic Robbery/Theft Detection using Smart Surveillance in Banks. 2019 3rd International conference on Electronics, Communication and Aerospace Technology (ICECA). :201—204.

Deep learning is the segment of artificial intelligence which is involved with imitating the learning approach that human beings utilize to get some different types of knowledge. Analyzing videos, a part of deep learning is one of the most basic problems of computer vision and multi-media content analysis for at least 20 years. The job is very challenging as the video contains a lot of information with large differences and difficulties. Human supervision is still required in all surveillance systems. New advancement in computer vision which are observed as an important trend in video surveillance leads to dramatic efficiency gains. We propose a CCTV based theft detection along with tracking of thieves. We use image processing to detect theft and motion of thieves in CCTV footage, without the use of sensors. This system concentrates on object detection. The security personnel can be notified about the suspicious individual committing burglary using Real-time analysis of the movement of any human from CCTV footage and thus gives a chance to avert the same.

Feng, Ri-Chen, Lin, Daw-Tung, Chen, Ken-Min, Lin, Yi-Yao, Liu, Chin-De.  2019.  Improving Deep Learning by Incorporating Semi-automatic Moving Object Annotation and Filtering for Vision-based Vehicle Detection*. 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). :2484—2489.

Deep learning has undergone tremendous advancements in computer vision studies. The training of deep learning neural networks depends on a considerable amount of ground truth datasets. However, labeling ground truth data is a labor-intensive task, particularly for large-volume video analytics applications such as video surveillance and vehicles detection for autonomous driving. This paper presents a rapid and accurate method for associative searching in big image data obtained from security monitoring systems. We developed a semi-automatic moving object annotation method for improving deep learning models. The proposed method comprises three stages, namely automatic foreground object extraction, object annotation in subsequent video frames, and dataset construction using human-in-the-loop quick selection. Furthermore, the proposed method expedites dataset collection and ground truth annotation processes. In contrast to data augmentation and data generative models, the proposed method produces a large amount of real data, which may facilitate training results and avoid adverse effects engendered by artifactual data. We applied the constructed annotation dataset to train a deep learning you-only-look-once (YOLO) model to perform vehicle detection on street intersection surveillance videos. Experimental results demonstrated that the accurate detection performance was improved from a mean average precision (mAP) of 83.99 to 88.03.

Pan, Jonathan.  2019.  Physical Integrity Attack Detection of Surveillance Camera with Deep Learning based Video Frame Interpolation. 2019 IEEE International Conference on Internet of Things and Intelligence System (IoTaIS). :79—85.

Surveillance cameras, which is a form of Cyber Physical System, are deployed extensively to provide visual surveillance monitoring of activities of interest or anomalies. However, these cameras are at risks of physical security attacks against their physical attributes or configuration like tampering of their recording coverage, camera positions or recording configurations like focus and zoom factors. Such adversarial alteration of physical configuration could also be invoked through cyber security attacks against the camera's software vulnerabilities to administratively change the camera's physical configuration settings. When such Cyber Physical attacks occur, they affect the integrity of the targeted cameras that would in turn render these cameras ineffective in fulfilling the intended security functions. There is a significant measure of research work in detection mechanisms of cyber-attacks against these Cyber Physical devices, however it is understudied area with such mechanisms against integrity attacks on physical configuration. This research proposes the use of the novel use of deep learning algorithms to detect such physical attacks originating from cyber or physical spaces. Additionally, we proposed the novel use of deep learning-based video frame interpolation for such detection that has comparatively better performance to other anomaly detectors in spatiotemporal environments.

Dinama, Dima Maharika, A’yun, Qurrota, Syahroni, Achmad Dahlan, Adji Sulistijono, Indra, Risnumawan, Anhar.  2019.  Human Detection and Tracking on Surveillance Video Footage Using Convolutional Neural Networks. 2019 International Electronics Symposium (IES). :534—538.

Safety is one of basic human needs so we need a security system that able to prevent crime happens. Commonly, we use surveillance video to watch environment and human behaviour in a location. However, the surveillance video can only used to record images or videos with no additional information. Therefore we need more advanced camera to get another additional information such as human position and movement. This research were able to extract those information from surveillance video footage by using human detection and tracking algorithm. The human detection framework is based on Deep Learning Convolutional Neural Networks which is a very popular branch of artificial intelligence. For tracking algorithms, channel and spatial correlation filter is used to track detected human. This system will generate and export tracked movement on footage as an additional information. This tracked movement can be analysed furthermore for another research on surveillance video problems.

Suo, Yucong, Zhang, Chen, Xi, Xiaoyun, Wang, Xinyi, Zou, Zhiqiang.  2019.  Video Data Hierarchical Retrieval via Deep Hash Method. 2019 IEEE 11th International Conference on Communication Software and Networks (ICCSN). :709—714.

Video retrieval technology faces a series of challenges with the tremendous growth in the number of videos. In order to improve the retrieval performance in efficiency and accuracy, a novel deep hash method for video data hierarchical retrieval is proposed in this paper. The approach first uses cluster-based method to extract key frames, which reduces the workload of subsequent work. On the basis of this, high-level semantical features are extracted from VGG16, a widely used deep convolutional neural network (deep CNN) model. Then we utilize a hierarchical retrieval strategy to improve the retrieval performance, roughly can be categorized as coarse search and fine search. In coarse search, we modify simHash to learn hash codes for faster speed, and in fine search, we use the Euclidean distance to achieve higher accuracy. Finally, we compare our approach with other two methods through practical experiments on two videos, and the results demonstrate that our approach has better retrieval effect.

Abbasi, Milad Haji, Majidi, Babak, Eshghi, Moahmmad, Abbasi, Ebrahim Haji.  2019.  Deep Visual Privacy Preserving for Internet of Robotic Things. 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI). :292—296.

In the past few years, visual information collection and transmission is increased significantly for various applications. Smart vehicles, service robotic platforms and surveillance cameras for the smart city applications are collecting a large amount of visual data. The preservation of the privacy of people presented in this data is an important factor in storage, processing, sharing and transmission of visual data across the Internet of Robotic Things (IoRT). In this paper, a novel anonymisation method for information security and privacy preservation in visual data in sharing layer of the Web of Robotic Things (WoRT) is proposed. The proposed framework uses deep neural network based semantic segmentation to preserve the privacy in video data base of the access level of the applications and users. The data is anonymised to the applications with lower level access but the applications with higher legal access level can analyze and annotated the complete data. The experimental results show that the proposed method while giving the required access to the authorities for legal applications of smart city surveillance, is capable of preserving the privacy of the people presented in the data.

Bashir, Muzammil, Rundensteiner, Elke A., Ahsan, Ramoza.  2019.  A deep learning approach to trespassing detection using video surveillance data. 2019 IEEE International Conference on Big Data (Big Data). :3535—3544.
Railroad trespassing is a dangerous activity with significant security and safety risks. However, regular patrolling of potential trespassing sites is infeasible due to exceedingly high resource demands and personnel costs. This raises the need to design automated trespass detection and early warning prediction techniques leveraging state-of-the-art machine learning. To meet this need, we propose a novel framework for Automated Railroad Trespassing detection System using video surveillance data called ARTS. As the core of our solution, we adopt a CNN-based deep learning architecture capable of video processing. However, these deep learning-based methods, while effective, are known to be computationally expensive and time consuming, especially when applied to a large volume of surveillance data. Leveraging the sparsity of railroad trespassing activity, ARTS corresponds to a dual-stage deep learning architecture composed of an inexpensive pre-filtering stage for activity detection, followed by a high fidelity trespass classification stage employing deep neural network. The resulting dual-stage ARTS architecture represents a flexible solution capable of trading-off accuracy with computational time. We demonstrate the efficacy of our approach on public domain surveillance data achieving 0.87 f1 score while keeping up with the enormous video volume, achieving a practical time and accuracy trade-off.
Adari, Suman Kalyan, Garcia, Washington, Butler, Kevin.  2019.  Adversarial Video Captioning. 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN-W). :24—27.
In recent years, developments in the field of computer vision have allowed deep learning-based techniques to surpass human-level performance. However, these advances have also culminated in the advent of adversarial machine learning techniques, capable of launching targeted image captioning attacks that easily fool deep learning models. Although attacks in the image domain are well studied, little work has been done in the video domain. In this paper, we show it is possible to extend prior attacks in the image domain to the video captioning task, without heavily affecting the video's playback quality. We demonstrate our attack against a state-of-the-art video captioning model, by extending a prior image captioning attack known as Show and Fool. To the best of our knowledge, this is the first successful method for targeted attacks against a video captioning model, which is able to inject 'subliminal' perturbations into the video stream, and force the model to output a chosen caption with up to 0.981 cosine similarity, achieving near-perfect similarity to chosen target captions.
2020-04-13
Nalamati, Mrunalini, Kapoor, Ankit, Saqib, Muhammed, Sharma, Nabin, Blumenstein, Michael.  2019.  Drone Detection in Long-Range Surveillance Videos. 2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). :1–6.

The usage of small drones/UAVs has significantly increased recently. Consequently, there is a rising potential of small drones being misused for illegal activities such as terrorism, smuggling of drugs, etc. posing high-security risks. Hence, tracking and surveillance of drones are essential to prevent security breaches. The similarity in the appearance of small drone and birds in complex background makes it challenging to detect drones in surveillance videos. This paper addresses the challenge of detecting small drones in surveillance videos using popular and advanced deep learning-based object detection methods. Different CNN-based architectures such as ResNet-101 and Inception with Faster-RCNN, as well as Single Shot Detector (SSD) model was used for experiments. Due to sparse data available for experiments, pre-trained models were used while training the CNNs using transfer learning. Best results were obtained from experiments using Faster-RCNN with the base architecture of ResNet-101. Experimental analysis on different CNN architectures is presented in the paper, along with the visual analysis of the test dataset.

2019-08-12
Wang, Bingning, Liu, Kang, Zhao, Jun.  2018.  Deep Semantic Hashing with Multi-Adversarial Training. Proceedings of the 27th ACM International Conference on Information and Knowledge Management. :1453–1462.
With the amount of data has been rapidly growing over recent decades, binary hashing has become an attractive approach for fast search over large databases, in which the high-dimensional data such as image, video or text is mapped into a low-dimensional binary code. Searching in this hamming space is extremely efficient which is independent of the data size. A lot of methods have been proposed to learn this binary mapping. However, to make the binary codes conserves the input information, previous works mostly resort to mean squared error, which is prone to lose a lot of input information [11]. On the other hand, most of the previous works adopt the norm constraint or approximation on the hidden representation to make it as close as possible to binary, but the norm constraint is too strict that harms the expressiveness and flexibility of the code. In this paper, to generate desirable binary codes, we introduce two adversarial training procedures to the hashing process. We replace the L2 reconstruction error with an adversarial training process to make the codes reserve its input information, and we apply another adversarial learning discriminator on the hidden codes to make it proximate to binary. With the adversarial training process, the generated codes are getting close to binary while also conserves the input information. We conduct comprehensive experiments on both supervised and unsupervised hashing applications and achieves a new state of the arts result on many image hashing benchmarks.
Benzer, R., Yildiz, M. C..  2018.  YOLO Approach in Digital Object Definition in Military Systems. 2018 International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT). :35–37.

Today, as surveillance systems are widely used for indoor and outdoor monitoring applications, there is a growing interest in real-time generation detection and there are many different applications for real-time generation detection and analysis. Two-dimensional videos; It is used in multimedia content-based indexing, information acquisition, visual surveillance and distributed cross-camera surveillance systems, human tracking, traffic monitoring and similar applications. It is of great importance for the development of systems for national security by following a moving target within the scope of military applications. In this research, a more efficient solution is proposed in addition to the existing methods. Therefore, we present YOLO, a new approach to object detection for military applications.

Eetha, S., Agrawal, S., Neelam, S..  2018.  Zynq FPGA Based System Design for Video Surveillance with Sobel Edge Detection. 2018 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS). :76–79.

Advancements in semiconductor domain gave way to realize numerous applications in Video Surveillance using Computer vision and Deep learning, Video Surveillances in Industrial automation, Security, ADAS, Live traffic analysis etc. through image understanding improves efficiency. Image understanding requires input data with high precision which is dependent on Image resolution and location of camera. The data of interest can be thermal image or live feed coming for various sensors. Composite(CVBS) is a popular video interface capable of streaming upto HD(1920x1080) quality. Unlike high speed serial interfaces like HDMI/MIPI CSI, Analog composite video interface is a single wire standard supporting longer distances. Image understanding requires edge detection and classification for further processing. Sobel filter is one the most used edge detection filter which can be embedded into live stream. This paper proposes Zynq FPGA based system design for video surveillance with Sobel edge detection, where the input Composite video decoded (Analog CVBS input to YCbCr digital output), processed in HW and streamed to HDMI display simultaneously storing in SD memory for later processing. The HW design is scalable for resolutions from VGA to Full HD for 60fps and 4K for 24fps. The system is built on Xilinx ZC702 platform and TVP5146 to showcase the functional path.

Liu, Y., Yang, Y., Shi, A., Jigang, P., Haowei, L..  2019.  Intelligent monitoring of indoor surveillance video based on deep learning. 2019 21st International Conference on Advanced Communication Technology (ICACT). :648–653.

With the rapid development of information technology, video surveillance system has become a key part in the security and protection system of modern cities. Especially in prisons, surveillance cameras could be found almost everywhere. However, with the continuous expansion of the surveillance network, surveillance cameras not only bring convenience, but also produce a massive amount of monitoring data, which poses huge challenges to storage, analytics and retrieval. The smart monitoring system equipped with intelligent video analytics technology can monitor as well as pre-alarm abnormal events or behaviours, which is a hot research direction in the field of surveillance. This paper combines deep learning methods, using the state-of-the-art framework for instance segmentation, called Mask R-CNN, to train the fine-tuning network on our datasets, which can efficiently detect objects in a video image while simultaneously generating a high-quality segmentation mask for each instance. The experiment show that our network is simple to train and easy to generalize to other datasets, and the mask average precision is nearly up to 98.5% on our own datasets.

Fok, Wilton W. T., Chan, Louis C. W., Chen, Carol.  2018.  Artificial Intelligence for Sport Actions and Performance Analysis Using Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). Proceedings of the 2018 4th International Conference on Robotics and Artificial Intelligence. :40–44.
The development of Human Action Recognition (HAR) system is getting popular. This project developed a HAR system for the application in the surveillance system to minimize the man-power for providing security to the citizens such as public safety and crime prevention. In this research, deep learning network using Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) are used to analyze dynamic video motion of sport actions and classify different types of actions and their performance. It could classify different types of human motion with a small number of video frame for efficiency and memory saving. The current accuracy achieved is up to 92.9% but with high potential of further improvement.
Wu, Yifan, Drucker, Steven, Philipose, Matthai, Ravindranath, Lenin.  2018.  Querying Videos Using DNN Generated Labels. Proceedings of the Workshop on Human-In-the-Loop Data Analytics. :6:1–6:6.
Massive amounts of videos are generated for entertainment, security, and science, powered by a growing supply of user-produced video hosting services. Unfortunately, searching for videos is difficult due to the lack of content annotations. Recent breakthroughs in image labeling with deep neural networks (DNNs) create a unique opportunity to address this problem. While many automated end-to-end solutions have been developed, such as natural language queries, we take on a different perspective: to leverage both the development of algorithms and human capabilities. To this end, we design a query language in tandem with a user interface to help users quickly identify segments of interest from the video based on labels and corresponding bounding boxes. We combine techniques from the database and information visualization communities to help the user make sense of the object labels in spite of errors and inconsistencies.
Verdoliva, Luisa.  2018.  Deep Learning in Multimedia Forensics. Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security. :3–3.
With the widespread diffusion of powerful media editing tools, falsifying images and videos has become easier and easier in the last few years. Fake multimedia, often used to support fake news, represents a growing menace in many fields of life, notably in politics, journalism, and the judiciary. In response to this threat, the signal processing community has produced a major research effort. A large number of methods have been proposed for source identification, forgery detection and localization, relying on the typical signal processing tools. The advent of deep learning, however, is changing the rules of the game. On one hand, new sophisticated methods based on deep learning have been proposed to accomplish manipulations that were previously unthinkable. On the other hand, deep learning provides also the analyst with new powerful forensic tools. Given a suitably large training set, deep learning architectures ensure usually a significant performance gain with respect to conventional methods, and a much higher robustness to post-processing and evasions. In this talk after reviewing the main approaches proposed in the literature to ensure media authenticity, the most promising solutions relying on Convolutional Neural Networks will be explored with special attention to realistic scenarios, such as when manipulated images and videos are spread out over social networks. In addition, an analysis of the efficacy of adversarial attacks on such methods will be presented.
Khryashchev, Vladimir, Ivanovsky, Leonid, Priorov, Andrey.  2018.  Deep Learning for Real-Time Robust Facial Expression Analysis. Proceedings of the International Conference on Machine Vision and Applications. :66–70.
The aim of this investigation is to classify real-life facial images into one of six types of emotions. For solving this problem, we propose to use deep machine learning algorithms and convolutional neural network (CNN). CNN is a modern type of neural network, which allows for rapid detection of various objects, as well as to make an effective object classification. For acceleration of CNN learning stage, we use supercomputer NVIDIA DGX-1. This process was implemented in parallel on a large number of independent streams on GPU. Numerical experiments for algorithms were performed on the images of Multi-Pie image database with various lighting of scene and angle rotation of head. For developed models, several metrics of quality were calculated. The designing algorithm was used in real-time video processing in human-computer interaction systems. Moreover, expression recognition can apply in such fields as retail analysis, security, video games, animations, psychiatry, automobile safety, educational software, etc.
Islam, Ashraful, Zhang, Yuexi, Yin, Dong, Camps, Octavia, Radke, Richard J..  2018.  Correlating Belongings with Passengers in a Simulated Airport Security Checkpoint. Proceedings of the 12th International Conference on Distributed Smart Cameras. :14:1–14:7.
Automatic algorithms for tracking and associating passengers and their divested objects at an airport security screening checkpoint would have great potential for improving checkpoint efficiency, including flow analysis, theft detection, line-of-sight maintenance, and risk-based screening. In this paper, we present algorithms for these tracking and association problems and demonstrate their effectiveness in a full-scale physical simulation of an airport security screening checkpoint. Our algorithms leverage both hand-crafted and deep-learning-based approaches for passenger and bin tracking, and are able to accurately track and associate objects through a ceiling-mounted multicamera array. We validate our algorithm on ground-truthed datasets collected at the simulated checkpoint that reflect natural passenger behavior, achieving high rates of passenger/object/transfer event detection while maintaining low false alarm and mismatch rates.
Peixoto, Bruno Malveira, Avila, Sandra, Dias, Zanoni, Rocha, Anderson.  2018.  Breaking Down Violence: A Deep-learning Strategy to Model and Classify Violence in Videos. Proceedings of the 13th International Conference on Availability, Reliability and Security. :50:1–50:7.
Detecting violence in videos through automatic means is significant for law enforcement and analysis of surveillance cameras with the intent of maintaining public safety. Moreover, it may be a great tool for protecting children from accessing inappropriate content and help parents make a better informed decision about what their kids should watch. However, this is a challenging problem since the very definition of violence is broad and highly subjective. Hence, detecting such nuances from videos with no human supervision is not only technical, but also a conceptual problem. With this in mind, we explore how to better describe the idea of violence for a convolutional neural network by breaking it into more objective and concrete parts. Initially, our method uses independent networks to learn features for more specific concepts related to violence, such as fights, explosions, blood, etc. Then we use these features to classify each concept and later fuse them in a meta-classification to describe violence. We also explore how to represent time-based events in still-images as network inputs; since many violent acts are described in terms of movement. We show that using more specific concepts is an intuitive and effective solution, besides being complementary to form a more robust definition of violence. When compared to other methods for violence detection, this approach holds better classification quality while using only automatic features.
2018-11-19
Sobue, Hideaki, Fukushima, Yuki, Kashiyama, Takahiro, Sekimoto, Yoshihide.  2017.  Flying Object Detection and Classification by Monitoring Using Video Images. Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. :57:1–57:4.

In recent years, there has been remarkable development in unmanned aerial vehicle UAVs); certain companies are trying to use the UAV to deliver goods also. Therefore, it is predicted that many such objects will fly over the city, in the near future. This study proposes a system for monitoring objects flying over a city. We use multiple 4K video cameras to capture videos of the flying objects. In this research, we combine background subtraction and a state-of-the-art tracking method, the KCF, for detection and tracking. We use deep learning for classification and the SfM for calculating the 3-dimensional trajectory. A UAV is flown over the inner-city area of Tokyo and videos are captured. The accuracy of each processing is verified, using the videos of objects flying over the city. In each processing, we obtain a certain measure of accuracy; thus, there is a good prospect of creating a system to monitor objects flying, over a city.