Biblio
Advanced persistent threats (APT) have increased in recent times as a result of the rise in interest by nation-states and sophisticated corporations to obtain high profile information. Typically, APT attacks are more challenging to detect since they leverage zero-day attacks and common benign tools. Furthermore, these attack campaigns are often prolonged to evade detection. We leverage an approach that uses a provenance graph to obtain execution traces of host nodes in order to detect anomalous behavior. By using the provenance graph, we extract features that are then used to train an online adaptive metric learning. Online metric learning is a deep learning method that learns a function to minimize the separation between similar classes and maximizes the separation between dis-similar instances. We compare our approach with baseline models and we show our method outperforms the baseline models by increasing detection accuracy on average by 11.3 % and increases True positive rate (TPR) on average by 18.3 %.
Advanced Persistent Threat (APT) is a stealthy, continuous and sophisticated method of network attacks, which can cause serious privacy leakage and millions of dollars losses. In this paper, we introduce a new game-theoretic framework of the interaction between a defender who uses limited Security Resources(SRs) to harden network and an attacker who adopts a multi-stage plan to attack the network. The game model is derived from Stackelberg games called a Multi-stage Maze Network Game (M2NG) in which the characteristics of APT are fully considered. The possible plans of the attacker are compactly represented using attack graphs(AGs), but the compact representation of the attacker's strategies presents a computational challenge and reaching the Nash Equilibrium(NE) is NP-hard. We present a method that first translates AGs into Markov Decision Process(MDP) and then achieves the optimal SRs allocation using the policy hill-climbing(PHC) algorithm. Finally, we present an empirical evaluation of the model and analyze the scalability and sensitivity of the algorithm. Simulation results exhibit that our proposed reinforcement learning-based SRs allocation is feasible and efficient.
Intrusion detection is one of the most prominent and challenging problem faced by cybersecurity organizations. Intrusion Detection System (IDS) plays a vital role in identifying network security threats. It protects the network for vulnerable source code, viruses, worms and unauthorized intruders for many intranet/internet applications. Despite many open source APIs and tools for intrusion detection, there are still many network security problems exist. These problems are handled through the proper pre-processing, normalization, feature selection and ranking on benchmark dataset attributes prior to the enforcement of self-learning-based classification algorithms. In this paper, we have performed a comprehensive comparative analysis of the benchmark datasets NSL-KDD and CIDDS-001. For getting optimal results, we have used the hybrid feature selection and ranking methods before applying self-learning (Machine / Deep Learning) classification algorithmic approaches such as SVM, Naïve Bayes, k-NN, Neural Networks, DNN and DAE. We have analyzed the performance of IDS through some prominent performance indicator metrics such as Accuracy, Precision, Recall and F1-Score. The experimental results show that k-NN, SVM, NN and DNN classifiers perform approx. 100% accuracy regarding performance evaluation metrics on the NSL-KDD dataset whereas k-NN and Naïve Bayes classifiers perform approx. 99% accuracy on the CIDDS-001 dataset.
An acoustic fingerprint is a condensed and powerful digital signature of an audio signal which is used for audio sample identification. A fingerprint is the pattern of a voice or audio sample. A large number of algorithms have been developed for generating such acoustic fingerprints. These algorithms facilitate systems that perform song searching, song identification, and song duplication detection. In this study, a comprehensive and powerful survey of already developed algorithms is conducted. Four major music fingerprinting algorithms are evaluated for identifying and analyzing the potential hurdles that can affect their results. Since the background and environmental noise reduces the efficiency of music fingerprinting algorithms, behavioral analysis of fingerprinting algorithms is performed using audio samples of different languages and under different environmental conditions. The results of music fingerprint classification are more successful when deep learning techniques for classification are used. The testing of the acoustic feature modeling and music fingerprinting algorithms is performed using the standard dataset of iKala, MusicBrainz and MIR-1K.
With the arrival of several face-swapping applications such as FaceApp, SnapChat, MixBooth, FaceBlender and many more, the authenticity of digital media content is hanging on a very loose thread. On social media platforms, videos are widely circulated often at a high compression factor. In this work, we analyze several deep learning approaches in the context of deepfakes classification in high compression scenarios and demonstrate that a proposed approach based on metric learning can be very effective in performing such a classification. Using less number of frames per video to assess its realism, the metric learning approach using a triplet network architecture proves to be fruitful. It learns to enhance the feature space distance between the cluster of real and fake videos embedding vectors. We validated our approaches on two datasets to analyze the behavior in different environments. We achieved a state-of-the-art AUC score of 99.2% on the Celeb-DF dataset and accuracy of 90.71% on a highly compressed Neural Texture dataset. Our approach is especially helpful on social media platforms where data compression is inevitable.
In recent months, AI-synthesized face swapping videos referred to as deepfake have become an emerging problem. False video is becoming more and more difficult to distinguish, which brings a series of challenges to social security. Some scholars are devoted to studying how to improve the detection accuracy of deepfake video. At the same time, in order to conduct better research, some datasets for deepfake detection are made. Companies such as Google and Facebook have also spent huge sums of money to produce datasets for deepfake video detection, as well as holding deepfake detection competitions. The continuous advancement of video tampering technology and the improvement of video quality have also brought great challenges to deepfake detection. Some scholars have achieved certain results on existing datasets, while the results on some high-quality datasets are not as good as expected. In this paper, we propose new method with clustering-based embedding regularization for deepfake detection. We use open source algorithms to generate videos which can simulate distinctive artifacts in the deepfake videos. To improve the local smoothness of the representation space, we integrate a clustering-based embedding regularization term into the classification objective, so that the obtained model learns to resist adversarial examples. We evaluate our method on three latest deepfake datasets. Experimental results demonstrate the effectiveness of our method.
Human action recognition in video is one of the most widely applied topics in the field of image and video processing, with many applications in surveillance (security, sports, etc.), activity detection, video-content-based monitoring, man-machine interaction, and health/disability care. Action recognition is a complex process that faces several challenges such as occlusion, camera movement, viewpoint move, background clutter, and brightness variation. In this study, we propose a novel human action recognition method using convolutional neural networks (CNN) and deep bidirectional LSTM (DB-LSTM) networks, using only raw video frames. First, deep features are extracted from video frames using a pre-trained CNN architecture called ResNet152. The sequential information of the frames is then learned using the DB-LSTM network, where multiple layers are stacked together in both forward and backward passes of DB-LSTM, to increase depth. The evaluation results of the proposed method using PyTorch, compared to the state-of-the-art methods, show a considerable increase in the efficiency of action recognition on the UCF 101 dataset, reaching 95% recognition accuracy. The choice of the CNN architecture, proper tuning of input parameters, and techniques such as data augmentation contribute to the accuracy boost in this study.
Video surveillance plays an important role in our times. It is a great help in reducing the crime rate, and it can also help to monitor the status of facilities. The performance of the video surveillance system is limited by human factors such as fatigue, time efficiency, and human resources. It would be beneficial for all if fully automatic video surveillance systems are employed to do the job. The automation of the video surveillance system is still not satisfying regarding many problems such as the accuracy of the detector, bandwidth consumption, storage usage, etc. This scientific paper mainly focuses on a video surveillance system using Convolutional Neural Networks (CNN), IoT and cloud. The system contains multi nods, each node consists of a microprocessor(Raspberry Pi) and a camera, the nodes communicate with each other using client and server architecture. The nodes can detect humans using a pretraining MobileNetv2-SSDLite model and Common Objects in Context(COCO) dataset, the captured video will stream to the main node(only one node will communicate with cloud) in order to stream the video to the cloud. Also, the main node will send an SMS notification to the security team to inform the detection of humans. The security team can check the videos captured using a mobile application or web application. Operating the Object detection model of Deep learning will be required a large amount of the computational power, for instance, the Raspberry Pi with a limited in performance for that reason we used the MobileNetv2-SSDLite model.
The incidence of abnormal road traffic events, especially abnormal traffic congestion, is becoming more and more prominent in daily traffic management in China. It has become the main research work of urban traffic management to detect and identify traffic congestion incidents in time. Efficient and accurate detection of traffic congestion incidents can provide a good strategy for traffic management. At present, the detection and recognition of traffic congestion events mainly rely on the integration of road traffic flow data and the passing data collected by electronic police or devices of checkpoint, and then estimating and forecasting road conditions through the method of big data analysis; Such methods often have some disadvantages such as low time-effect, low precision and small prediction range. Therefore, with the help of the current large and medium cities in the public security, traffic police have built video surveillance equipment, through computer vision technology to analyze the traffic flow from video monitoring, in this paper, the motion state and the changing trend of vehicle flow are obtained by using the technology of vehicle detection from video and multi-target tracking based on deep learning, so as to realize the perception and recognition of traffic congestion. The method achieves the recognition accuracy of less than 60 seconds in real-time, more than 80% in detection rate of congestion event and more than 82.5% in accuracy of detection. At the same time, it breaks through the restriction of traditional big data prediction, such as traffic flow data, truck pass data and GPS floating car data, and enlarges the scene and scope of detection.
Modern critical infrastructures are increasingly turning into distributed, complex Cyber-Physical systems that need proactive protection and fast restoration to mitigate physical or cyber incidents or attacks. Addressing the need for early stage threat detection against physical intrusion, the paper presents two physical security sensors developed within the DEFENDER project for detecting the intrusion of drones and humans using video analytics. The continuous stream of media data obtained from the region of vulnerability and proximity is processed using Region based Fully Connected Neural Network deep-learning model. The novelty of the pro-posed system relies in the processing of multi-threaded media input streams for achieving real-time threat identification. The video analytics solution has been validated using NVIDIA GeForce GTX 1080 for drone detection and NVIDIA GeForce RTX 2070 Max-Q Design for detecting human intruders. The experimental test bed for the validation of the proposed system has been constructed to include environments and situations that are commonly faced by critical infrastructure operators such as the area of protection, tradeoff between angle of coverage against distance of coverage.
Video Surveillance plays a pivotal role in today's world. The technologies have been advanced too much when artificial intelligence, machine learning and deep learning pitched into the system. Using above combinations, different systems are in place which helps to differentiate various suspicious behaviors from the live tracking of footages. The most unpredictable one is human behaviour and it is very difficult to find whether it is suspicious or normal. Deep learning approach is used to detect suspicious or normal activity in an academic environment, and which sends an alert message to the corresponding authority, in case of predicting a suspicious activity. Monitoring is often performed through consecutive frames which are extracted from the video. The entire framework is divided into two parts. In the first part, the features are computed from video frames and in second part, based on the obtained features classifier predict the class as suspicious or normal.
As the assets of people are growing, security and surveillance have become a matter of great concern today. When a criminal activity takes place, the role of the witness plays a major role in nabbing the criminal. The witness usually states the gender of the criminal, the pattern of the criminal's dress, facial features of the criminal, etc. Based on the identification marks provided by the witness, the criminal is searched for in the surveillance cameras. Surveillance cameras are ubiquitous and finding criminals from a huge volume of surveillance video frames is a tedious process. In order to automate the search process, proposed a novel smart methodology using deep learning. This method takes gender, shirt pattern, and spectacle status as input to find out the object as person from the video log. The performance of this method achieves an accuracy of 87% in identifying the person in the video frame.
Video streams acquired from thermal cameras are proven to be beneficial in diverse number of fields including military, healthcare, law enforcement, and security. Despite the hype, thermal imaging is increasingly affected by poor resolution, where it has expensive optical sensors and inability to attain optical precision. In recent years, deep learning based super-resolution algorithms are developed to enhance the video frame resolution at high accuracy. This paper presents a comparative analysis of super resolution (SR) techniques based on deep neural networks (DNN) that are applied on thermal video dataset. SRCNN, EDSR, Auto-encoder, and SRGAN are also discussed and investigated. Further the results on benchmark thermal datasets including FLIR, OSU thermal pedestrian database and OSU color thermal database are evaluated and analyzed. Based on the experimental results, it is concluded that, SRGAN has delivered a superior performance on thermal frames when compared to other techniques and improvements, which has the ability to provide state-of-the art performance in real time operations.