Biblio
Understanding how to group a set of binary files into the piece of software they belong to is highly desirable for software profiling, malware detection, or enterprise audits, among many other applications. Unfortunately, it is also extremely challenging: there is absolutely no uniformity in the ways different applications rely on different files, in how binaries are signed, or in the versioning schemes used across different pieces of software. In this paper, we show that, by combining information gleaned from a large number of endpoints (millions of computers), we can accomplish large-scale application identification automatically and reliably. Our approach relies on collecting metadata on billions of files every day, summarizing it into much smaller "sketches", and performing approximate k-nearest neighbor clustering on non-metric space representations derived from these sketches. We design and implement our proposed system using Apache Spark, show that it can process billions of files in a matter of hours, and thus could be used for daily processing. We further show our system manages to successfully identify which files belong to which application with very high precision, and adequate recall.
The paper presents a fully automatic end-to-end trainable system to colorize grayscale images. Colorization is a highly under-constrained problem. In order to produce realistic outputs, the proposed approach takes advantage of the recent advances in deep learning and generative networks. To achieve plausible colorization, the paper investigates conditional Wasserstein Generative Adversarial Networks (WGAN) [3] as a solution to this problem. Additionally, a loss function consisting of two classification loss components apart from the adversarial loss learned by the WGAN is proposed. The first classification loss provides a measure of how much the predicted colored images differ from ground truth. The second classification loss component makes use of ground truth semantic classification labels in order to learn meaningful intermediate features. Finally, WGAN training procedure pushes the predictions to the manifold of natural images. The system is validated using a user study and a semantic interpretability test and achieves results comparable to [1] on Imagenet dataset [10].
We present AVAMAT: AntiVirus and Malware Analysis Tool - a tool for analysing the malware detection capabilities of AntiVirus (AV) products running on different operating system (OS) platforms. Even though similar tools are available, such as VirusTotal and MetaDefender, they have several limitations, which motivated the creation of our own tool. With AVAMAT we are able to analyse not only whether an AV detects a malware, but also at what stage of inspection does it detect it and on what OS. AVAMAT enables experimental campaigns to answer various research questions, ranging from the detection capabilities of AVs on OSs, to optimal ways in which AVs could be combined to improve malware detection capabilities.
Algorithms for oblivious random access machine (ORAM) simulation allow a client, Alice, to obfuscate a pattern of data accesses with a server, Bob, who is maintaining Alice's outsourced data while trying to learn information about her data. We present a novel ORAM scheme that improves the asymptotic I/O overhead of previous schemes for a wide range of size parameters for clientside private memory and message blocks, from logarithmic to polynomial. Our method achieves statistical security for hiding Alice's access pattern and, with high probability, achieves an I/O overhead that ranges from O(1) to O(log2 n/(log logn)2), depending on these size parameters, where n is the size of Alice's outsourced memory. Our scheme, which we call BIOS ORAM, combines multiple uses of B-trees with a reduction of ORAM simulation to isogrammic access sequences.
Servers in a network are typically assigned a static identity. Static assignment of identities is a cornerstone for adversaries in finding targets. Moving Target Defense (MTD) mutates the environment to increase unpredictability for an attacker. On another side, Software Defined Networks (SDN) facilitate a global view of a network through a central control point. The potential of SDN can not only make network management flexible and convenient, but it can also assist MTD to enhance attack surface obfuscation. In this paper, we propose an effective framework for the prevention, detection, and mitigation of flooding-based Denial of Service (DoS) attacks. Our framework includes a light-weight SDN assisted MTD strategy for network reconnaissance protection and an efficient approach for tackling DoS attacks using Software Defined-Internet Exchange Point (SD-IXP). To assess the effectiveness of the MTD strategy and DoS mitigation scheme, we set two different experiments. Our results confirm the effectiveness of our framework. With the MTD strategy in place, at maximum, barely 16% reconnaissance attempts were successful while the DoS attacks were accurately detected with false alarm rate as low as 7.1%.
The Bonseyes EU H2020 collaborative project aims to develop a platform consisting of a Data Marketplace, a Deep Learning Toolbox, and Developer Reference Platforms for organizations wanting to adopt Artificial Intelligence. The project will be focused on using artificial intelligence in low power Internet of Things (IoT) devices ("edge computing"), embedded computing systems, and data center servers ("cloud computing"). It will bring about orders of magnitude improvements in efficiency, performance, reliability, security, and productivity in the design and programming of systems of artificial intelligence that incorporate Smart Cyber-Physical Systems (CPS). In addition, it will solve a causality problem for organizations who lack access to Data and Models. Its open software architecture will facilitate adoption of the whole concept on a wider scale. To evaluate the effectiveness, technical feasibility, and to quantify the real-world improvements in efficiency, security, performance, effort and cost of adding AI to products and services using the Bonseyes platform, four complementary demonstrators will be built. Bonseyes platform capabilities are aimed at being aligned with the European FI-PPP activities and take advantage of its flagship project FIWARE. This paper provides a description of the project motivation, goals and preliminary work.
HB+ is a lightweight authentication scheme, which is secure against passive attacks if the Learning Parity with Noise Problem (LPN) is hard. However, HB+ is vulnerable to a key-recovery, man-in-the-middle (MiM) attack dubbed GRS. The HB+DB protocol added a distance-bounding dimension to HB+, and was experimentally proven to resist the GRS attack. We exhibit several security flaws in HB+DB. First, we refine the GRS strategy to induce a different key-recovery MiM attack, not deterred by HB+DB's distancebounding. Second, we prove HB+DB impractical as a secure distance-bounding (DB) protocol, as its DB security-levels scale poorly compared to other DB protocols. Third, we refute that HB+DB's security against passive attackers relies on the hardness of LPN; moreover, (erroneously) requiring such hardness lowers HB+DB's efficiency and security. We also propose anew distance-bounding protocol called BLOG. It retains parts of HB+DB, yet BLOG is provably secure and enjoys better (asymptotical) security.
Cross-VM attacks have emerged as a major threat on commercial clouds. These attacks commonly exploit hardware level leakages on shared physical servers. A co-located machine can readily feel the presence of a co-located instance with a heavy computational load through performance degradation due to contention on shared resources. Shared cache architectures such as the last level cache (LLC) have become a popular leakage source to mount cross-VM attack. By exploiting LLC leakages, researchers have already shown that it is possible to recover fine grain information such as cryptographic keys from popular software libraries. This makes it essential to verify implementations that handle sensitive data across the many versions and numerous target platforms, a task too complicated, error prone and costly to be handled by human beings. Here we propose a machine learning based technique to classify applications according to their cache access profiles. We show that with minimal and simple manual processing steps feature vectors can be used to train models using support vector machines to classify the applications with a high degree of success. The profiling and training steps are completely automated and do not require any inspection or study of the code to be classified. In native execution, we achieve a successful classification rate as high as 98% (L1 cache) and 78$\backslash$% (LLC) over 40 benchmark applications in the Phoronix suite with mild training. In the cross-VM setting on the noisy Amazon EC2 the success rate drops to 60$\backslash$% for a suite of 25 applications. With this initial study we demonstrate that it is possible to train meaningful models to successfully predict applications running in co-located instances.
An operating system kernel written in the Rust language would have extremely fine-grained isolation boundaries, have no memory leaks, and be safe from a wide range of security threats and memory bugs. Previous efforts towards this end concluded that writing a kernel requires changing Rust. This paper reaches a different conclusion, that no changes to Rust are needed and a kernel can be implemented with a very small amount of unsafe code. It describes how three sample kernel mechanisms–-DMA, USB, and buffer caches–-can be built using these abstractions.
Trusted routing is a hot spot in network security. Lots of efforts have been made on trusted routing validation for Interior Gateway Protocols (IGP), e.g., using Public Key Infrastructure (PKI) to enhance the security of protocols, or routing monitoring systems. However, the former is limited by further deployment in the practical Internet, the latter depends on a complete, accurate, and fresh knowledge base-this is still a big challenge (Internet Service Providers (ISPs) are not willing to leak their routing policies). In this paper, inspired by the idea of centrally controlling in Software Defined Network (SDN), we propose a CENtrally Trusted Routing vAlidation framework, named CENTRA, which can automated collect routing information, centrally detect anomaly and deliver secure routing policy. We implement the proposed framework using NETCONF as the communication protocol and YANG as the data model. The experimental results reveal that CENTRA can detect and block anomalous routing in real time. Comparing to existing secure routing mechanism, CENTRA improves the detection efficiency and real-time significantly.
The MgO-based magnetic tunnel junction (MTJ) is the basis of modern hard disk drives' magnetic read sensors. Within its operating bandwidth, the sensor's performance is significantly affected by nonlinear and oscillating behavior arising from the MTJ's magnetization dynamics at microwave frequencies. Static I-V curve measurements are commonly used to characterize sensor's nonlinear effects. Unfortunately, these do not sufficiently capture the MTJ's magnetization dynamics. In this paper, we demonstrate the use of the two-tone measurement technique for full treatment of the sensor's nonlinear effects in conjunction with dynamic ones. This approach is new in the field of magnetism and magnetic materials, and it has its challenges due to the nature of the device. Nevertheless, the experimental results demonstrate how the two-tone measurement technique can be used to characterize magnetic sensor nonlinear properties.
Data loss is perceived as one of the major threats for cloud storage. Consequently, the security community developed several challenge-response protocols that allow a user to remotely verify whether an outsourced file is still intact. However, two important practical problems have not yet been considered. First, clients commonly outsource multiple files of different sizes, raising the question how to formalize such a scheme and in particular ensuring that all files can be simultaneously audited. Second, in case auditing of the files fails, existing schemes do not provide a client with any method to prove if the original files are still recoverable. We address both problems and describe appropriate solutions. The first problem is tackled by providing a new type of "Proofs of Retrievability" scheme, enabling a client to check all files simultaneously in a compact way. The second problem is solved by defining a novel procedure called "Proofs of Recoverability", enabling a client to obtain an assurance whether a file is recoverable or irreparably damaged. Finally, we present a combination of both schemes allowing the client to check the recoverability of all her original files, thus ensuring cloud storage file recoverability.
Organizations face the issue of how to best allocate their security resources. Thus, they need an accurate method for assessing how many new vulnerabilities will be reported for the operating systems (OSs) they use in a given time period. Our approach consists of clustering vulnerabilities by leveraging the text information within vulnerability records, and then simulating the mean value function of vulnerabilities by relaxing the monotonic intensity function assumption, which is prevalent among the studies that use software reliability models (SRMs) and nonhomogeneous Poisson process (NHPP) in modeling. We applied our approach to the vulnerabilities of four OSs: Windows, Mac, IOS, and Linux. For the OSs analyzed in terms of curve fitting and prediction capability, our results, compared to a power-law model without clustering issued from a family of SRMs, are more accurate in all cases we analyzed.
In video surveillance, face recognition (FR) systems seek to detect individuals of interest appearing over a distributed network of cameras. Still-to-video FR systems match faces captured in videos under challenging conditions against facial models, often designed using one reference still per individual. Although CNNs can achieve among the highest levels of accuracy in many real-world FR applications, state-of-the-art CNNs that are suitable for still-to-video FR, like trunk-branch ensemble (TBE) CNNs, represent complex solutions for real-time applications. In this paper, an efficient CNN architecture is proposed for accurate still-to-video FR from a single reference still. The CCM-CNN is based on new cross-correlation matching (CCM) and triplet-loss optimization methods that provide discriminant face representations. The matching pipeline exploits a matrix Hadamard product followed by a fully connected layer inspired by adaptive weighted cross-correlation. A triplet-based training approach is proposed to optimize the CCM-CNN parameters such that the inter-class variations are increased, while enhancing robustness to intra-class variations. To further improve robustness, the network is fine-tuned using synthetically-generated faces based on still and videos of non-target individuals. Experiments on videos from the COX Face and Chokepoint datasets indicate that the CCM-CNN can achieve a high level of accuracy that is comparable to TBE-CNN and HaarNet, but with a significantly lower time and memory complexity. It may therefore represent the better trade-off between accuracy and complexity for real-time video surveillance applications.
We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using stateof- the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.
Detecting faces and heads appearing in video feeds are challenging tasks in real-world video surveillance applications due to variations in appearance, occlusions and complex backgrounds. Recently, several CNN architectures have been proposed to increase the accuracy of detectors, although their computational complexity can be an issue, especially for realtime applications, where faces and heads must be detected live using high-resolution cameras. This paper compares the accuracy and complexity of state-of-the-art CNN architectures that are suitable for face and head detection. Single pass and region-based architectures are reviewed and compared empirically to baseline techniques according to accuracy and to time and memory complexity on images from several challenging datasets. The viability of these architectures is analyzed with real-time video surveillance applications in mind. Results suggest that, although CNN architectures can achieve a very high level of accuracy compared to traditional detectors, their computational cost can represent a limitation for many practical real-time applications.
This paper suggests a conceptual mechanism for increasing the security level of the global information community, national information technology infrastructures (e-governments) and private cloud structures, which uses the logical characteristic of IPv6-protocol. The mechanism is based on the properties of the IPv6-header and, in particular, rules of coding IPv6-addresses.
Traditionally, the focus of security and ensuring confidentiality, integrity, and availability of data in spacecraft systems has been on the ground segment and the uplink/downlink components. Although these are the most obvious attack vectors, potential security risks against the satellite's platform is also a serious concern. This paper discusses a notional satellite architecture and explores security vulnerabilities using a systems-level approach. Viewing attacks through this paradigm highlights several potential attack vectors that conventional satellite security approaches fail to consider. If left undetected, these could yield physical effects limiting the satellite's mission or performance. The approach presented aids in risk analysis and gives insight into architectural design considerations which improve the system's overall resiliency.