Biblio
In recent years, artificial intelligence has been widely applied to network security and has significantly improved the effectiveness of security analysis and detection. However, power industrial control systems suffer from a shortage of attack data, so directly deploying an AI-based network intrusion detection system results in insufficient training data, low precision, and a high false alarm rate. To address this problem, we propose an anomalous traffic detection method based on cross-domain knowledge transfer. Using the TrAdaBoost algorithm, we achieve a lower error rate than using an LSTM alone.
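As a hedged illustration of how TrAdaBoost-style instance transfer could look in code (the paper gives no implementation; the base learner, feature dimensions, and toy data below are assumptions), a minimal Python sketch:

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier  # assumed base learner

    def tradaboost(Xs, ys, Xt, yt, n_rounds=20):
        """Minimal TrAdaBoost sketch: Xs/ys = abundant source-domain traffic,
        Xt/yt = scarce target-domain (ICS) traffic. Returns the per-round
        learners and their weights."""
        ns, nt = len(Xs), len(Xt)
        X = np.vstack([Xs, Xt]); y = np.concatenate([ys, yt])
        w = np.ones(ns + nt) / (ns + nt)             # instance weights
        beta_src = 1.0 / (1.0 + np.sqrt(2 * np.log(ns) / n_rounds))
        learners, betas = [], []
        for _ in range(n_rounds):
            p = w / w.sum()
            clf = DecisionTreeClassifier(max_depth=3).fit(X, y, sample_weight=p)
            err_mask = (clf.predict(X) != y).astype(float)
            eps = np.sum(p[ns:] * err_mask[ns:])     # error measured on target data only
            eps = min(max(eps, 1e-10), 0.499)
            beta_t = eps / (1.0 - eps)
            # down-weight misclassified source samples, up-weight hard target samples
            w[:ns] *= beta_src ** err_mask[:ns]
            w[ns:] *= beta_t ** (-err_mask[ns:])
            learners.append(clf); betas.append(beta_t)
        return learners, betas

    # toy usage with simulated source traffic and scarce target (ICS) traffic features
    Xs, ys = np.random.rand(500, 10), np.random.randint(0, 2, 500)
    Xt, yt = np.random.rand(40, 10), np.random.randint(0, 2, 40)
    learners, betas = tradaboost(Xs, ys, Xt, yt)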
A network covert timing channel (NCTC) transmits hidden information through the inter-packet delays (IPDs) of legitimate network traffic. Their ability to evade traditional security policies makes NCTCs a grave security concern. However, a robust method capable of detecting a wide variety of NCTCs is still missing. In this paper, an NCTC detection method based on chaos theory and threshold secret sharing is proposed. Our method uses chaos theory to reconstruct a high-dimensional phase space from the one-dimensional IPD time series and to extract unique, stable channel traits. A channel identifier is then constructed using the secret reconstruction strategy of threshold secret sharing, mapping channel features to channel identifiers. Experimental results show that the approach detects a variety of NCTCs with a guaranteed true positive rate and greatly improves versatility and robustness.
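A hedged sketch of the phase-space reconstruction step (time-delay embedding of an IPD series; the embedding dimension and delay below are illustrative assumptions, not the paper's parameters):

    import numpy as np

    def delay_embed(ipd, dim=3, tau=2):
        """Reconstruct a dim-dimensional phase space from a 1-D inter-packet
        delay series via Takens-style time-delay embedding."""
        n = len(ipd) - (dim - 1) * tau
        return np.column_stack([ipd[i * tau : i * tau + n] for i in range(dim)])

    # toy usage: each row is one point in the reconstructed phase space
    points = delay_embed(np.random.exponential(0.01, size=500), dim=3, tau=2)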
Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed to provide efficient methods for locating target data. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation into metadata, metadata queries, and the corresponding data structures. In this study, we perform a systematic study of metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of its metadata. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings, which provide guidelines and offer insights for developing metadata indexing methodologies for scientific applications.
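To illustrate the trade-off the study reports (a trie answers prefix lookups that a plain hash table cannot), a minimal sketch; the metadata keys and record contents below are purely hypothetical:

    class TrieNode:
        def __init__(self):
            self.children, self.values = {}, []

    class Trie:
        """Minimal trie indexing metadata values; supports exact and prefix queries."""
        def __init__(self):
            self.root = TrieNode()

        def insert(self, key, record):
            node = self.root
            for ch in key:
                node = node.children.setdefault(ch, TrieNode())
            node.values.append(record)

        def prefix_query(self, prefix):
            node = self.root
            for ch in prefix:
                if ch not in node.children:
                    return []
                node = node.children[ch]
            out, stack = [], [node]
            while stack:                      # collect all records under the prefix
                n = stack.pop()
                out.extend(n.values)
                stack.extend(n.children.values())
            return out

    # a hash table (dict) answers exact-match queries in O(1) but cannot do this:
    idx = Trie()
    idx.insert("obs_2018_10_01.fits", {"target": "M31"})   # hypothetical file name
    print(idx.prefix_query("obs_2018"))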
Nowadays, Microblog has become an important online social networking platform, and a large number of users share information through it. Many malicious users release various kinds of false news driven by various interests, which seriously affects the availability of the Microblog platform. Therefore, evaluating the credibility of Microblog users has become an important research issue. This paper proposes a Microblog user credibility evaluation algorithm based on trust propagation. To address the high cost and low precision caused by malicious users attacking the algorithm with fabricated social relationships and by manual selection of seed sets, this paper proposes two optimization strategies: a pruning algorithm based on social activity and similarity, and a seed node selection algorithm based on clustering. The pruning algorithm trims off the attack edges established between malicious users and normal users. The seed node selection algorithm efficiently selects a highly reliable seed node set; the user social relationship graph is then used for two-way trust propagation scoring, so that low-trust users receive lower trust scores and malicious users are thereby identified. Experiments verify the effectiveness of the trust-propagation-based algorithm in evaluating Microblog user credibility.
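A hedged sketch of seed-based trust propagation over a social graph (a TrustRank-style iteration; the damping factor and the single-direction update below are assumptions, not the paper's exact two-way scoring rule):

    import numpy as np

    def propagate_trust(adj, seeds, alpha=0.85, iters=50):
        """adj: row-normalized adjacency matrix of the social graph,
        seeds: indices of trusted seed users. Returns a trust score per user."""
        n = adj.shape[0]
        d = np.zeros(n); d[seeds] = 1.0 / len(seeds)   # trust injected at seed nodes
        t = d.copy()
        for _ in range(iters):
            t = alpha * adj.T @ t + (1 - alpha) * d    # trust flows along social edges
        return t

    # toy usage: 4 users, user 0 is a manually verified seed
    A = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]], float)
    A = A / np.maximum(A.sum(axis=1, keepdims=True), 1)
    print(propagate_trust(A, seeds=[0]))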
A botnet is one of the major threats on the Internet, used for committing cybercrimes such as DDoS attacks, stealing sensitive information, and spreading spam. Detecting modern botnets, which continuously evolve to evade detection, is a challenging issue. In this paper, we propose a machine learning based botnet detection system that is shown to be effective in identifying P2P botnets. Our approach extracts a convolutional version of effective flow-based features and trains a classification model using a feed-forward artificial neural network. The experimental results show that detection accuracy using the convolutional features is better than that using the traditional features: it achieves 94.7% detection accuracy and a 2.2% false positive rate on known P2P botnet datasets. Furthermore, our system provides an additional confidence-testing stage for enhancing botnet detection performance, which further classifies network traffic for which the neural network's confidence is insufficient. The experiment shows that this stage increases detection accuracy to 98.6% and decreases the false positive rate to 0.5%.
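A hedged sketch of the classification stage described above (a feed-forward network over flow-based feature vectors; the layer sizes, feature dimension, toy data, and confidence threshold are illustrative assumptions):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # X: flow-based feature vectors, y: 1 = P2P botnet flow, 0 = benign (toy data)
    X, y = np.random.rand(1000, 30), np.random.randint(0, 2, 1000)
    clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(X, y)

    proba = clf.predict_proba(X)
    confident = proba.max(axis=1) >= 0.9       # assumed confidence threshold
    # flows below the threshold would be passed to the additional confidence-testing stage
    print("low-confidence flows:", (~confident).sum())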
Personal privacy is an important issue when publishing social network data, since an attacker may have background information that allows re-identification of private data. Therefore, many researchers have developed anonymization techniques such as k-anonymity, k-isomorphism, and l-diversity. In this paper, we focus on graph k-degree anonymity achieved by editing edges. Our method is divided into two steps. First, we propose an efficient algorithm to find a new degree sequence with theoretically minimal edit cost. Second, we insert and delete edges based on the new degree sequence to achieve k-degree anonymity.
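A hedged sketch of the first step (producing a k-anonymous degree sequence; the simple greedy grouping below is a common baseline, not necessarily the paper's minimal-cost algorithm):

    def k_anonymize_degrees(degrees, k):
        """Greedy baseline: sort degrees descending, group them in blocks of size >= k,
        and raise every degree in a block to the block maximum. The total increase
        approximates the edge-edit cost of the second step."""
        order = sorted(range(len(degrees)), key=lambda i: -degrees[i])
        anon = degrees[:]
        i = 0
        while i < len(order):
            j = min(i + k, len(order))
            if len(order) - j < k:            # avoid a trailing group smaller than k
                j = len(order)
            gmax = anon[order[i]]
            for idx in order[i:j]:
                anon[idx] = gmax
            i = j
        return anon

    # every degree value now occurs at least k times: [5, 5, 4, 4, 2, 2, 2]
    print(k_anonymize_degrees([5, 4, 4, 3, 2, 2, 1], k=2))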
We report an experimental study of device-independent quantum random number generation based on a detection-loophole-free Bell test with entangled photons. After accounting for statistical fluctuations and applying an 80 Gb × 45.6 Mb Toeplitz-matrix hashing, we achieve a final random bit rate of 114 bits/s with a failure probability less than 10^-5.
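A hedged toy sketch of Toeplitz-matrix hashing as a randomness extractor (the dimensions here are tiny and illustrative; the 80 Gb × 45.6 Mb matrix in the experiment would be applied with far more efficient methods than an explicit matrix product):

    import numpy as np
    from scipy.linalg import toeplitz

    def toeplitz_extract(raw_bits, out_len, seed_bits):
        """Multiply raw bits by a seeded Toeplitz matrix over GF(2) to distill
        nearly uniform output bits from a weakly random input string."""
        n = len(raw_bits)
        col, row = seed_bits[:out_len], seed_bits[out_len - 1 : out_len - 1 + n]
        T = toeplitz(col, row)                     # out_len x n binary Toeplitz matrix
        return (T @ np.asarray(raw_bits)) % 2

    rng = np.random.default_rng(0)
    raw = rng.integers(0, 2, 256)                  # toy raw bits from the Bell test
    seed = rng.integers(0, 2, 256 + 64)            # public random seed
    print(toeplitz_extract(raw, 64, seed))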
To prevent leakage of users' privacy, more and more mobile devices employ biometric authentication approaches, such as fingerprint, face recognition, and voiceprint authentication, to enhance privacy protection. However, these approaches are vulnerable to replay attacks. Although state-of-the-art solutions utilize liveness verification to combat such attacks, existing approaches are sensitive to ambient environments, such as ambient light and surrounding audible noise. To this end, we explore liveness verification for user authentication leveraging users' lip movements, which are robust to noisy environments. In this paper, we propose a lip-reading-based user authentication system, LipPass, which extracts unique behavioral characteristics of users' speaking lips leveraging the built-in audio devices on smartphones for user authentication. We first investigate Doppler profiles of acoustic signals caused by users' speaking lips and find that different individuals exhibit unique lip movement patterns. To characterize the lip movements, we propose a deep learning-based method to extract efficient features from Doppler profiles, and employ Support Vector Machine and Support Vector Domain Description to construct binary classifiers and spoofer detectors for user identification and spoofer detection, respectively. Afterwards, we develop a binary-tree-based authentication approach to accurately identify each individual leveraging these binary classifiers and spoofer detectors with respect to registered users. Through extensive experiments involving 48 volunteers in four real environments, LipPass achieves 90.21% accuracy in user identification and 93.1% accuracy in spoofer detection.
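A hedged sketch of the spoofer-detection idea (a one-class SVM standing in for Support Vector Domain Description over Doppler-profile features; the feature dimension and kernel parameters are assumptions):

    import numpy as np
    from sklearn.svm import OneClassSVM

    # toy Doppler-profile feature vectors of one registered user's lip movements
    user_features = np.random.rand(200, 16)
    detector = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(user_features)

    sample = np.random.rand(1, 16)                 # incoming login attempt
    is_spoofer = detector.predict(sample)[0] == -1 # -1 = outside the learned domain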
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The heterogeneity of workloads and server characteristics exhibited in Cloud-scale or Internet-scale systems adds further complexity and new challenges to the problem. Compared with existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to further improve the efficiency of resource management in large-scale systems. In this paper we describe and discuss how ML can be used to understand both workloads and environments automatically, and to help cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing applications' QoS, and mitigating stragglers in the latency tail. We introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study on performance-centric node classification and straggler mitigation. We believe that an ML-based method will help to achieve architectural optimization and efficiency improvement.
Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has remained an academic exercise mainly due to i) the lack of a transparent and streamlined deployment model that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. In this work we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.
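A hedged toy sketch of the load-time transformation idea behind CCR (shuffling basic blocks using embedded layout metadata and recording the permutation so the transformation can be reversed; the metadata format and block contents below are purely illustrative, not CCR's actual format):

    import random

    def randomize_layout(blocks, seed):
        """blocks: list of (block_id, code_bytes) in original order, taken from the
        transformation-assisting metadata. Returns the shuffled layout plus the new
        offsets used to patch references and the permutation needed for reversal."""
        order = list(range(len(blocks)))
        random.Random(seed).shuffle(order)
        layout, offsets, addr = [], {}, 0
        for idx in order:
            block_id, code = blocks[idx]
            offsets[block_id] = addr            # new offset of this block
            layout.append(code)
            addr += len(code)
        return b"".join(layout), offsets, order  # 'order' allows undoing the transform

    binary, new_offsets, perm = randomize_layout(
        [("entry", b"\x55\x48\x89\xe5"), ("loop", b"\x90\x90"), ("exit", b"\xc3")],
        seed=42)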
This Innovative Practice Work in Progress paper is about cybersecurity education, which is essential for training innovative talent in the Internet era. Beyond knowledge and skills, it is equally important to enhance students' awareness of cybersecurity in daily life. Considering that contactless smart cards are common and widely used in various areas, one basic and two advanced contactless smart card experiments were designed and assigned to junior students working in three-person groups in an introductory cybersecurity summer course. The experimental principles, facilities, contents, and arrangement are introduced in turn. Classroom tests were administered before and after the experiments, and a box-and-whisker plot is used to describe the score distributions in both tests. The experimental outcomes and student feedback indicate that the learning objectives were achieved through the problem-based, active, group learning experience during the experiments.
Monitoring for security and well-being in highly populated areas is a critical issue for city administrators, policy makers, and urban planners. As an essential part of many dynamic and critical data-driven tasks, situational awareness (SAW) gives decision-makers deeper insight into the meaning of urban surveillance data; thus, surveillance measures are increasingly needed. However, traditional surveillance platforms do not scale as more cameras are added to the network. In this work, smart surveillance is proposed as an edge service. To accomplish the object detection, identification, and tracking tasks at the edge-fog layers, two novel lightweight algorithms are proposed, one for detection and one for tracking. A prototype has been built to validate the feasibility of the idea, and the test results are very encouraging.
The analysis of multiple Android malware families indicates that malware instances within the same family consistently have similar call graph structures. Based on the isomorphism of sensitive API call graphs, we propose a method that constructs malware family features by combining static analysis with a graph similarity metric. The experiment is performed on a dataset containing 1326 malware samples from 16 different families. The results show that the method can differentiate distinct malware family features and classify suspect malware samples into their corresponding families with a high overall accuracy of 96.77%, while withstanding a certain degree of obfuscation.
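A hedged sketch of one simple graph similarity metric that could serve this purpose (Jaccard similarity over sensitive-API call edges; the paper's actual metric is not reproduced here, and the API names are illustrative):

    def call_graph_similarity(g1, g2):
        """g1, g2: sets of directed edges (caller_api, callee_api) extracted by
        static analysis. Returns Jaccard similarity of the two sensitive-API graphs."""
        union = g1 | g2
        return len(g1 & g2) / len(union) if union else 1.0

    sample = {("onReceive", "sendTextMessage"), ("onReceive", "getDeviceId")}
    family = {("onReceive", "sendTextMessage"), ("onCreate", "getDeviceId")}
    print(call_graph_similarity(sample, family))   # 1/3 for these toy graphs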
Provenance describes detailed information about the history of a piece of data, capturing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of the data. Provenance is key to supporting many data management functionalities that are increasingly important, such as identifying the data sources, parameters, or assumptions behind a given result; auditing data usage; and understanding how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirement of providing the provenance service in situ: the need to remain lightweight and always on often conflicts with the need to be transparent and to offer an accurate catalog of details about the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrumentation mechanism to achieve transparency, and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
To solve the problem of millimeter-wave (mm-wave) antenna impedance mismatch in 5G communication systems, an optimization algorithm, Particle Swarm Ant Colony Optimization (PSACO), is proposed to optimize the antenna patch parameters. It is shown that the proposed method can effectively achieve impedance matching at the 28 GHz center frequency, and the return loss characteristic is clearly improved. At the same time, a nonlinear regression model is used to capture the nonlinear relationship between the resonant frequency and the patch parameters. An Elman Neural Network (Elman NN) model is used to verify the reliability of the PSACO and nonlinear regression models. Patch parameters optimized by PSACO were substituted into the nonlinear relationship, yielding an error within 2%. The method proposed in this paper improves the efficiency of antenna design.
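A hedged sketch of the particle-swarm half of such an optimization (the return-loss objective below is a toy stand-in; a real run would evaluate S11 from an electromagnetic simulation of the patch, and the bounds and coefficients are assumptions):

    import numpy as np

    def s11_at_28ghz(params):
        """Placeholder objective: |S11| at 28 GHz as a function of patch length/width.
        In practice this would call an EM solver; here it is a toy bowl-shaped surrogate."""
        return np.sum((params - np.array([3.0, 3.9])) ** 2)

    def pso(obj, bounds, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
        lo, hi = bounds
        x = np.random.uniform(lo, hi, (n_particles, len(lo)))
        v = np.zeros_like(x)
        pbest, pbest_val = x.copy(), np.array([obj(p) for p in x])
        gbest = pbest[pbest_val.argmin()].copy()
        for _ in range(iters):
            r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = np.clip(x + v, lo, hi)
            vals = np.array([obj(p) for p in x])
            improved = vals < pbest_val
            pbest[improved], pbest_val[improved] = x[improved], vals[improved]
            gbest = pbest[pbest_val.argmin()].copy()
        return gbest

    best = pso(s11_at_28ghz, (np.array([2.0, 3.0]), np.array([4.0, 5.0])))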
Wearable and mobile medical devices provide efficient, comfortable, and economical health monitoring, with a wide range of applications from daily to clinical scenarios. Health data security has therefore become a critically important issue. The electrocardiogram (ECG) has proven to be a promising biometric for human recognition over the past decade. Unlike conventional authentication methods using passwords, fingerprints, or faces, an ECG signal cannot easily be intercepted or duplicated, and it enables continuous identification. However, the algorithms developed in many studies are not suitable for practical application, as they usually require long ECG recordings for authentication. In this work, we introduce a two-phase authentication method using artificial neural network (NN) models. The algorithm enables fast authentication within only 3 seconds while achieving reasonable recognition performance. We test the proposed method in a controlled laboratory experiment with 50 subjects. Finger ECG signals are collected using a mobile device at different times and in different physical states. In the first phase, a ``General'' NN model constructed from the cohort's data is used for preliminary screening; in the second phase, ``Personal'' NN models constructed from each individual's data are applied for fine-grained identification. The algorithm is tested on the whole dataset and on subsets of different sizes (5, 10, 20, 30, and 40 subjects). Results show that the proposed method is feasible and reliable for individual authentication, obtaining an average False Acceptance Rate (FAR) and False Rejection Rate (FRR) below 10% on the whole dataset.
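A hedged sketch of the two-phase structure described above (a general model for coarse screening followed by a per-user model for fine-grained verification; the network sizes, feature lengths, toy data, and acceptance threshold are assumptions):

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    # X: fixed-length feature vectors from ~3 s finger-ECG segments (toy data)
    X, subject_id = np.random.rand(500, 60), np.random.randint(0, 50, 500)

    general = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300).fit(X, subject_id)
    personal = {u: MLPClassifier(hidden_layer_sizes=(16,), max_iter=300)
                   .fit(X, (subject_id == u).astype(int))
                for u in range(50)}

    def authenticate(segment, claimed_user, threshold=0.5):
        cand = general.predict([segment])[0]                  # phase 1: screening
        if cand != claimed_user:
            return False
        # phase 2: fine-grained check by the claimed user's personal model
        return personal[claimed_user].predict_proba([segment])[0, 1] >= threshold

    print(authenticate(X[0], claimed_user=subject_id[0]))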
In this paper, inspired by Gatys's recent work, we propose a novel approach that transforms photos into comics using deep convolutional neural networks (CNNs). While Gatys's method, which uses a pre-trained VGG network, generally works well for transferring artistic styles such as painting from a style image to a content image, it often fails to produce satisfactory results for more minimalist styles such as comics. To address this, we introduce a dedicated comic style CNN, which is trained to classify comic images and photos. This new network is effective in capturing various comic styles and thus helps to produce better comic stylization results. Moreover, even with a grayscale style image, Gatys's method can still produce colored output, which is not desirable for comics. We therefore develop a modified optimization framework that guarantees a grayscale image is synthesized, and to avoid converging to poor local minima we initialize the output image with the grayscale version of the content image. Various examples show that our method synthesizes better comic images than the state-of-the-art method.
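A hedged toy sketch of the grayscale-guaranteed optimization idea (optimize a single-channel image, initialized with the grayscale content, and replicate it to three channels before feeding the networks; the tensor names, image size, and the dummy loss below are stand-in assumptions, not the paper's actual VGG/comic-CNN losses):

    import torch

    # toy stand-ins: a random "content" photo and a dummy loss; the real method would
    # combine VGG-based content/style terms with the comic-style CNN described above
    content_rgb = torch.rand(1, 3, 64, 64)
    def loss_fn(img):
        return ((img - content_rgb) ** 2).mean()

    # optimize a single-channel image so the result is grayscale by construction,
    # starting from the grayscale version of the content image
    gray = (0.299 * content_rgb[:, 0:1] + 0.587 * content_rgb[:, 1:2]
            + 0.114 * content_rgb[:, 2:3]).clone().requires_grad_(True)
    opt = torch.optim.LBFGS([gray])

    def closure():
        opt.zero_grad()
        loss = loss_fn(gray.repeat(1, 3, 1, 1))   # replicate the channel for RGB networks
        loss.backward()
        return loss

    for _ in range(5):
        opt.step(closure)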