Biblio
In this paper we propose a solution to support iOS developers in creating better applications: using static analysis to investigate source code and detect secure coding issues, while simultaneously pointing out good practices and secure APIs that developers should use.
Reverse engineering is a manually intensive but necessary technique for understanding the inner workings of new malware, finding vulnerabilities in existing systems, and detecting patent infringements in released software. An assembly clone search engine facilitates the work of reverse engineers by identifying those duplicated or known parts. However, it is challenging to design a robust clone search engine, since there exist various compiler optimization options and code obfuscation techniques that make logically similar assembly functions appear to be very different. A practical clone search engine relies on a robust vector representation of assembly code. However, the existing clone search approaches, which rely on a manual feature engineering process to form a feature vector for an assembly function, fail to consider the relationships between features and identify those unique patterns that can statistically distinguish assembly functions. To address this problem, we propose to jointly learn the lexical semantic relationships and the vector representation of assembly functions based on assembly code. We have developed an assembly code representation learning model, Asm2Vec. It only needs assembly code as input and does not require any prior knowledge such as the correct mapping between assembly functions. It can find and incorporate rich semantic relationships among tokens appearing in assembly code. We conduct extensive experiments and benchmark the learning model with state-of-the-art static and dynamic clone search approaches. We show that the learned representation is more robust and significantly outperforms existing methods against changes introduced by obfuscation and optimizations.
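Asm2Vec builds on representation learning over assembly tokens; as an illustration of the idea (not the authors' implementation), the following sketch trains a PV-DM-style model with gensim's Doc2Vec on made-up tokenized assembly functions and ranks them by vector similarity.

# Illustrative sketch only: a PV-DM-style model (via gensim's Doc2Vec) over
# tokenized assembly functions, compared by cosine similarity.
# The instruction tokens and function names below are made-up examples.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

functions = {
    "memcpy_variant_a": ["push", "rbp", "mov", "rbp", "rsp", "rep", "movsb", "pop", "rbp", "ret"],
    "memcpy_variant_b": ["mov", "rcx", "rdx", "rep", "movsb", "ret"],
    "strlen_like":      ["xor", "rax", "rax", "cmp", "byte", "jne", "inc", "rax", "ret"],
}

docs = [TaggedDocument(words=toks, tags=[name]) for name, toks in functions.items()]
model = Doc2Vec(docs, vector_size=64, window=4, min_count=1, dm=1, epochs=200)

# Rank all functions by similarity to the query's learned vector
# (the top hit is the query function itself, followed by its nearest neighbor).
query_vec = model.dv["memcpy_variant_a"]
print(model.dv.most_similar([query_vec], topn=2))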
Static vulnerability detection has shown its effectiveness in detecting well-defined low-level memory errors. However, high-level control-flow-related (CFR) vulnerabilities, such as insufficient control flow management (CWE-691), business logic errors (CWE-840), and program behavioral problems (CWE-438), are often caused by a wide variety of bad programming practices and pose a great challenge for existing general static analysis solutions. This paper presents a new deep-learning-based graph embedding approach for accurate detection of CFR vulnerabilities. Our approach makes a new attempt by applying a recent graph convolutional network to embed code fragments in a compact, low-dimensional representation that preserves high-level control-flow information of a vulnerable program. We have conducted our experiments using 8,368 real-world vulnerable programs, comparing our approach with several traditional static vulnerability detectors and state-of-the-art machine-learning-based approaches. The experimental results show the effectiveness of our approach in terms of both accuracy and recall. Our research sheds light on the promising direction of combining program analysis with deep learning techniques to address general static analysis challenges.
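To illustrate the kind of graph embedding involved (a generic sketch, not the paper's architecture), the snippet below runs one graph-convolution propagation step over a toy control-flow graph and pools node features into a graph-level embedding; all dimensions are arbitrary.

# Minimal sketch (not the paper's model): one graph-convolution step,
# H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), over a toy control-flow graph.
import numpy as np

A = np.array([[0, 1, 0],      # adjacency of a 3-node control-flow graph
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.rand(3, 8)       # initial node (basic-block) features
W = np.random.rand(8, 4)       # layer weights (learnable; random here)

A_hat = A + np.eye(3)                                  # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1))) # symmetric normalization
H_next = np.maximum(0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)  # ReLU

# A graph-level embedding can then be obtained by pooling, e.g. a mean over nodes.
graph_embedding = H_next.mean(axis=0)
print(graph_embedding.shape)   # (4,)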
Classifying hyperspectral images (HSIs) with few training samples is a challenging problem. Generative adversarial networks (GANs) are a promising technique for addressing it. A GAN constructs an adversarial game between a discriminator and a generator: the generator produces samples that the discriminator cannot distinguish from real data, and the discriminator determines whether or not a sample is real. In this paper, by introducing multilayer feature fusion in the GAN and a dynamic neighborhood voting mechanism, we propose a novel algorithm for HSI classification based on a 1-D GAN. By extracting and fusing features from multiple layers of the discriminator, and using a small number of labeled samples, we fine-tune a 1-D CNN spectral classifier for HSIs. To further improve classification accuracy, we propose a dynamic neighborhood voting mechanism that classifies HSIs using spatial features. The obtained results show that the proposed models provide competitive results compared to state-of-the-art methods.
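As a rough illustration of multilayer feature fusion in a 1-D GAN discriminator (an assumed toy architecture, not the paper's network), the PyTorch sketch below exposes two intermediate feature layers and concatenates them so a downstream spectral classifier could use the fused features.

# Illustrative sketch only: a tiny 1-D GAN whose discriminator exposes
# intermediate activations for feature fusion. Dimensions and layers are made up.
import torch
import torch.nn as nn

SPECTRAL_BANDS = 200

class Generator(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, 128), nn.ReLU(),
            nn.Linear(128, SPECTRAL_BANDS), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(SPECTRAL_BANDS, 128), nn.LeakyReLU(0.2))
        self.layer2 = nn.Sequential(nn.Linear(128, 64), nn.LeakyReLU(0.2))
        self.out = nn.Linear(64, 1)
    def forward(self, x):
        f1 = self.layer1(x)
        f2 = self.layer2(f1)
        fused = torch.cat([f1, f2], dim=1)   # multilayer feature fusion
        return torch.sigmoid(self.out(f2)), fused

g, d = Generator(), Discriminator()
z = torch.randn(4, 32)
real = torch.rand(4, SPECTRAL_BANDS)
score_fake, _ = d(g(z))
score_real, fused_features = d(real)
print(score_fake.shape, fused_features.shape)   # torch.Size([4, 1]) torch.Size([4, 192])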
SSL certificates are a core component of the public key infrastructure that underpins encrypted communication on the Internet. In this paper, we report the results of a longitudinal study of the characteristics of SSL certificate chains presented to clients during secure web (HTTPS) connection setup. Our data set consists of 23B SSL certificate chains collected from a global panel of over 2M residential client machines over a period of 6 months. The data informing our analyses provide perspective on the entire chain of trust, including root certificates, across a wide distribution of client machines. We identify over 35M unique certificate chains with diverse relationships at all levels of the PKI hierarchy. We report on the characteristics of valid certificates, which make up 99.7% of the total corpus. We also examine invalid certificate chains, finding that 93% of them contain an untrusted root certificate and that they have a shorter average chain length than their valid counterparts. Finally, we examine two unintended but prevalent behaviors in our data: the deprecation of root certificates and secure traffic interception. Our results support aspects of prior, scan-based studies on certificate characteristics but contradict other findings, highlighting the importance of the residential client-side perspective.
With the unprecedented prevalence of mobile network applications, cryptographic protocols such as Secure Socket Layer/Transport Layer Security (SSL/TLS) are widely used in mobile network applications for communication security. Proven methods for encrypted video stream classification or encrypted protocol detection are unsuitable for SSL/TLS traffic. Consequently, networking and security services based on application-level traffic classification face severe challenges in effectiveness. Existing encrypted traffic classification methods exhibit unsatisfactory accuracy for applications with similar state characteristics. In this paper, we propose a multiple-attribute-based encrypted traffic classification system named Multi-Attribute Associated Fingerprints (MAAF). We develop MAAF based on two key insights: the DNS traces generated during application runtime contain classification guidance information, and the handshake certificates in the encrypted flows can provide classification clues. Beyond these key insights, MAAF employs the context of the encrypted traffic to overcome the attribute-lacking problem during classification. Our experimental results demonstrate that MAAF achieves 98.69% accuracy on a real-world traceset consisting of 16 applications, supports early prediction, and is robust to the scale of the training traceset. Moreover, MAAF is superior to state-of-the-art methods in terms of both accuracy and robustness.
Spam has been a genuine and irritating problem for a long time. Although many solutions have been proposed, there remains considerable room for improvement in filtering spam messages more efficiently. A major issue in spam filtering, as in text classification in natural language processing generally, is the huge size of the vector space caused by the large number of feature terms, which typically leads to extensive computation and slow classification. Extracting semantic meanings from the text content and using them as feature terms to construct the vector space, instead of using words as feature terms in the conventional way, can effectively reduce the dimensionality of the vectors and improve classification at the same time. Although there are many different techniques for blocking spam messages, most developers aim only to block spam from being delivered to their clients. In this paper, we present an effective approach to prevent spam messages from being transferred. We propose a collaborative filtering approach combined with semantics-based text classification, in which the feature terms are selected from the semantic meanings of the text content.
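For context, a minimal baseline text-classification step is sketched below using plain TF-IDF word features with scikit-learn; the paper's approach would replace these raw word features with semantic feature terms and add collaborative filtering, which this toy example does not implement.

# Minimal sketch, not the paper's system: a baseline spam/ham classifier over
# toy messages with TF-IDF bag-of-words features standing in for semantic terms.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now, click here",              # spam (toy example)
    "cheap meds available without prescription",     # spam
    "meeting moved to 3pm, see agenda attached",     # ham
    "can you review the draft report tomorrow",      # ham
]
labels = ["spam", "spam", "ham", "ham"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(messages, labels)
print(clf.predict(["free prize waiting, click the link"]))  # likely ['spam']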
Data-driven verification methods utilize execution data together with models for establishing safety requirements. These are often the only tools available for analyzing complex, nonlinear cyber-physical systems, for which purely model-based analysis is currently infeasible. In this chapter, we outline the key concepts and algorithmic approaches for data-driven verification and discuss the guarantees they provide. We introduce some of the software tools that embody these ideas and present several practical case studies demonstrating their application in safety analysis of autonomous vehicles, advanced driver assist systems (ADAS), satellite control, and engine control systems.
With the spread of wireless applications, huge amounts of data are generated every day. Thanks to its flexibility, machine learning is becoming a fundamental building block in this field, and many applications are developed using the techniques it offers. However, machine learning suffers from several weaknesses, and those who use it are often unaware of the possible threats. Adversaries frequently try to exploit these vulnerabilities to obtain some benefit; because of this, adversarial machine learning is becoming widely studied in the scientific community. In this paper, we present state-of-the-art adversarial techniques and possible countermeasures, with the aim of raising awareness of these sensitive issues related to machine learning.
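One concrete example of such an adversarial technique is the fast gradient sign method (FGSM); the sketch below applies it to a placeholder PyTorch model and is only illustrative of the attack class, not of any specific experiment in the paper.

# Sketch of a well-known adversarial attack (FGSM); model and input are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 2))
loss_fn = nn.CrossEntropyLoss()

x = torch.rand(1, 20, requires_grad=True)   # a clean input sample
y = torch.tensor([1])                       # its true label
epsilon = 0.05                              # perturbation budget

loss = loss_fn(model(x), y)
loss.backward()

# FGSM: perturb the input in the direction of the sign of its gradient.
x_adv = (x + epsilon * x.grad.sign()).detach()
print(model(x).argmax(dim=1), model(x_adv).argmax(dim=1))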
The Internet has gradually penetrated the national economy, politics, culture, military, education and other fields. Due to its openness, interconnectivity and other characteristics, the Internet is vulnerable to all kinds of malicious attacks. This research uses a honeynet to collect attacker information and proposes a network penetration recognition technology based on interactive behavior analysis. Sebek is used to capture the attacker's keystroke records, and time-series modeling of the keystroke sequences of the interaction behavior is performed with a recurrent neural network. The attack recognition method is built on Long Short-Term Memory (LSTM) networks, which address the vanishing gradient, exploding gradient, and limited long-term memory problems of ordinary recurrent neural networks. Finally, experiments verify that the LSTM network achieves a high accuracy rate in recognizing penetration attacks.
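A minimal sketch of the underlying idea (assuming toy keystroke timing features rather than the paper's Sebek-derived data) is an LSTM classifier over keystroke time series, as below.

# Illustrative sketch only: an LSTM that classifies keystroke time series as
# benign or attack interaction; sequence length and feature layout are made up.
import torch
import torch.nn as nn

class KeystrokeLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_classes)
    def forward(self, x):                # x: (batch, time, features)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])          # classify from the last hidden state

model = KeystrokeLSTM()
# Toy batch: 8 sessions, 50 keystrokes each, 3 timing features per keystroke
# (e.g. hold time, inter-key latency, a numeric key code).
batch = torch.rand(8, 50, 3)
logits = model(batch)
print(logits.shape)                      # torch.Size([8, 2])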
Current authentication systems based on passwords and PIN codes are not enough to guard against attacks by malicious users. For this reason, in recent years, several studies have been proposed with the aim of identifying users based on their typing dynamics. In this paper, we propose a deep neural network architecture aimed at discriminating between different users using a set of keystroke features. The idea behind the proposed method is to identify users silently and continuously during their typing on a monitored system. To perform such user identification effectively, we propose a feature model able to capture the typing style specific to each given user. The proposed approach is evaluated on a large dataset derived by integrating two real-world datasets from existing studies. The merged dataset contains a total of 1530 different users, each contributing a set of typing samples. Several deep neural networks, with an increasing number of hidden layers and two different sets of features, are tested with the aim of finding the best configuration. The final best classifier achieves a precision of 0.997, a recall of 0.99, and an accuracy of 99% using an MLP deep neural network with 9 hidden layers. Finally, the performance obtained with the deep learning approach is also compared with that of a traditional decision-tree machine learning algorithm, attesting to the effectiveness of deep-learning-based classifiers in the domain of keystroke analysis.
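A minimal sketch of such a classifier (with synthetic features and assumed layer widths, not the paper's configuration) is a scikit-learn MLP with 9 hidden layers, as below.

# Minimal sketch, not the paper's exact setup: an MLP with 9 hidden layers over
# synthetic keystroke feature vectors; feature counts and widths are assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 30))               # 30 keystroke-derived features per sample
y = rng.integers(0, 10, size=1000)       # 10 toy user identities

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128,) * 9, max_iter=300, random_state=0)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))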
Keystroke dynamics is the study of typing patterns and rhythm for personal identification and the inference of personal traits. Keystrokes may be analysed as fixed text, such as passwords, or as continuously typed text, such as documents. This paper reviews different classification metrics for continuous text, such as the A and R metrics and the Canberra, Manhattan and Euclidean distances, and introduces a variant of the Minkowski distance. To test the metrics, we adopted a substantial dataset containing 239 thousand records acquired under real, harsh, and unidealised conditions. We propose a new parameter for the Minkowski metric, and we reinforce another for the A metric, as initially stated by its authors.
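The distance metrics themselves are easy to state in code; the sketch below computes Manhattan, Euclidean, a parameterized Minkowski, and Canberra distances on toy keystroke timing vectors (the specific variant proposed in the paper is not reproduced here).

# Sketch of the distance metrics compared in such studies; the timing vectors
# are toy values, and p parameterizes the Minkowski metric.
import numpy as np

def minkowski(a, b, p):
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

def canberra(a, b):
    denom = np.abs(a) + np.abs(b)
    return np.sum(np.where(denom > 0, np.abs(a - b) / denom, 0.0))

profile = np.array([110.0, 95.0, 130.0])   # stored timing profile (ms)
sample  = np.array([118.0, 90.0, 140.0])   # newly observed timings (ms)

print("Manhattan:", minkowski(profile, sample, p=1))
print("Euclidean:", minkowski(profile, sample, p=2))
print("Minkowski p=1.5:", minkowski(profile, sample, p=1.5))
print("Canberra:", canberra(profile, sample))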
The potential risk of the agricultural product supply chain is large because of its complex, domain-specific attributes. Safety incidents involving edible agricultural products have emerged frequently in recent years, exposing the fragility of the agricultural product supply chain. In this paper, the possible risk factors in the agricultural product supply chain are analyzed in detail, a risk evaluation index system and evaluation model are established, and an empirical analysis is carried out using a back-propagation (BP) neural network. The results show that the risk ranking produced by the simulated evaluation is consistent with the target-value ranking, that the risk assessment model generalizes and extends well, and that the model provides a useful reference for preventing agricultural product supply chain risk.
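As an illustration of the evaluation step (with entirely synthetic risk indices and weights, not the paper's index system), a small back-propagation network can map index scores to an overall risk level, as sketched below.

# Illustrative sketch only: a small back-propagation network mapping risk
# evaluation index scores to an overall risk level; data are synthetic stand-ins.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Each row: normalized scores for hypothetical risk indices
# (e.g. supply, logistics, quality, market, environment risk).
X = rng.random((200, 5))
y = X @ np.array([0.30, 0.25, 0.20, 0.15, 0.10]) + 0.05 * rng.standard_normal(200)

model = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=1)
model.fit(X, y)
print("predicted risk score:", model.predict([[0.8, 0.6, 0.7, 0.4, 0.5]]))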
Motivated by the September 11 attacks, we address the problem of policy analysis for supply-chain security. The potential economic and operational impacts of inspection, together with the inherent difficulty of assigning a reasonable cost to an inspection failure, call for a policy analysis methodology in which stakeholders can understand the trade-offs between the diverse and potentially conflicting objectives. To obtain this information, we used a simulation-based methodology to characterize the set of Pareto optimal solutions with respect to the multiple objectives represented in the decision problem. Our methodology relies on simulation and the response surface method (RSM) to model the relationships between inspection policies and relevant stakeholder objectives in order to construct a set of Pareto optimal solutions. The approach is illustrated with an application to a real-world supply chain.
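The Pareto filtering step can be illustrated independently of the simulation and RSM models; the sketch below extracts non-dominated inspection policies from invented (cost, missed-detection-rate) pairs, with both objectives minimized.

# Sketch of extracting Pareto-optimal policies from simulated objective values;
# the policy names and numbers are invented for illustration.
def pareto_optimal(points):
    """Return the subset of points not dominated by any other point (minimization)."""
    front = []
    for name, objs in points:
        dominated = any(
            all(o2 <= o1 for o1, o2 in zip(objs, other_objs))
            and any(o2 < o1 for o1, o2 in zip(objs, other_objs))
            for other_name, other_objs in points if other_name != name
        )
        if not dominated:
            front.append((name, objs))
    return front

simulated = [
    ("inspect 5%",  (1.0, 0.40)),            # (inspection cost, missed-detection rate)
    ("inspect 20%", (2.5, 0.15)),
    ("inspect 50%", (6.0, 0.05)),
    ("inspect 20% slow lane", (3.0, 0.20)),  # dominated by "inspect 20%"
]
print(pareto_optimal(simulated))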
Biometric techniques can help make vehicles safer to drive, authenticate users, and provide personalized in-car experiences. However, it is unclear to what extent users are willing to trade their personal biometric data for such benefits. In this early work, we conducted an open card sorting study (N=11) to better understand how well users believe their physical, behavioral and physiological features can personally identify them. Findings showed that on average participants clustered features into six groups, which helped us revise ambiguous cards and better understand users' clustering. These findings provide the basis for a follow-up online closed card sorting study to more fully understand the perceived identification accuracy of (in-vehicle) biometric sensing. By uncovering this at a larger scale, we can then further study the privacy and user experience trade-off in (automated) vehicles.
In this paper, we focus on versatile and scalable key management for Advanced Metering Infrastructure (AMI) in the Smart Grid (SG). We show that a recently proposed key-graph-based scheme for AMI systems (VerSAMI) suffers from efficiency flaws in its broadcast key management protocol. We then propose a new key management scheme (iVerSAMI) by modifying VerSAMI's key graph structure and proposing a new broadcast key update process. We analyze the security and performance of the proposed broadcast key management scheme in detail to show that iVerSAMI is secure and efficient in terms of storage and communication overheads.
We present a machine-checked proof of security for the domain management protocol of Amazon Web Services' KMS (Key Management Service), a critical security service used throughout AWS and by AWS customers. Domain management is at the core of AWS KMS; it governs the top-level keys that anchor the security of encryption services at AWS. We show that the protocol securely implements an ideal distributed encryption mechanism under standard cryptographic assumptions. The proof is machine-checked in the EasyCrypt proof assistant and is the largest EasyCrypt development to date.