Biblio
Support vector machines (SVMs) have been widely used for classification in machine learning and data mining. However, SVM faces a huge challenge in large scale classification tasks. Recent progresses have enabled additive kernel version of SVM efficiently solves such large scale problems nearly as fast as a linear classifier. This paper proposes a new accelerated mini-batch stochastic gradient descent algorithm for SVM classification with additive kernel (AK-ASGD). On the one hand, the gradient is approximated by the sum of a scalar polynomial function for each feature dimension; on the other hand, Nesterov's acceleration strategy is used. The experimental results on benchmark large scale classification data sets show that our proposed algorithm can achieve higher testing accuracies and has faster convergence rate.
In machine learning, feature engineering has been a pivotal stage in building a high-quality predictor. Particularly, this work explores the multiple Kernel Discriminant Component Analysis (mKDCA) feature-map and its variants. However, seeking the right subset of kernels for mKDCA feature-map can be challenging. Therefore, we consider the problem of kernel selection, and propose an algorithm based on Differential Mutual Information (DMI) and incremental forward search. DMI serves as an effective metric for selecting kernels, as is theoretically supported by mutual information and Fisher's discriminant analysis. On the other hand, incremental forward search plays a role in removing redundancy among kernels. Finally, we illustrate the potential of the method via an application in privacy-aware classification, and show on three mobile-sensing datasets that selecting an effective set of kernels for mKDCA feature-maps can enhance the utility classification performance, while successfully preserve the data privacy. Specifically, the results show that the proposed DMI forward search method can perform better than the state-of-the-art, and, with much smaller computational cost, can perform as well as the optimal, yet computationally expensive, exhaustive search.
Virtualization based memory isolation has been widely used as a security primitive in many security systems. This paper firstly provides an in-depth analysis of its effectiveness in the multicore setting, a first in the literature. Our study reveals that memory isolation by itself is inadequate for security. Due to the fundamental design choices in hardware, it faces several challenging issues including page table maintenance, address mapping validation and thread identification. As demonstrated by our attacks implemented on XMHF and BitVisor, these issues undermine the security of memory isolation. Next, we propose a new isolation approach that is immune to the aforementioned problems. In our design, the hypervisor constructs a fully isolated micro computing environment (FIMCE) that exposes a minimal attack surface to an untrusted OS on a multicore platform. By virtue of its architectural niche, FIMCE offers stronger assurance and greater versatility than memory isolation. We have built a prototype of FIMCE and measured its performance. To show the benefits of using FIMCE as a building block, we have also implemented several practical applications which cannot be securely realized by using memory isolation alone.
Applications for data analysis of biomedical data are complex programs and often consist of multiple components. Re-usage of existing solutions from external code repositories or program libraries is common in algorithm development. To ease reproducibility as well as transfer of algorithms and required components into distributed infrastructures Linux containers are increasingly used in those environments, that are at least partly connected to the internet. However concerns about the untrusted application remain and are of high interest when medical data is processed. Additionally, the portability of the containers needs to be ensured by using only security technologies, that do not require additional kernel modules. In this paper we describe measures and a solution to secure the execution of an example biomedical application for normalization of multidimensional biosignal recordings. This application, the required runtime environment and the security mechanisms are installed in a Docker-based container. A fine-grained restricted environment (sandbox) for the execution of the application and the prevention of unwanted behaviour is created inside the container. The sandbox is based on the filtering of system calls, as they are required to interact with the operating system to access potentially restricted resources e.g. the filesystem or network. Due to the low-level character of system calls, the creation of an adequate rule set for the sandbox is challenging. Therefore the presented solution includes a monitoring component to collect required data for defining the rules for the application sandbox. Performance evaluation of the application execution shows no significant impact of the resulting sandbox, while detailed monitoring may increase runtime up to over 420%.
Summary form only given. Strong light-matter coupling has been recently successfully explored in the GHz and THz [1] range with on-chip platforms. New and intriguing quantum optical phenomena have been predicted in the ultrastrong coupling regime [2], when the coupling strength Ω becomes comparable to the unperturbed frequency of the system ω. We recently proposed a new experimental platform where we couple the inter-Landau level transition of an high-mobility 2DEG to the highly subwavelength photonic mode of an LC meta-atom [3] showing very large Ω/ωc = 0.87. Our system benefits from the collective enhancement of the light-matter coupling which comes from the scaling of the coupling Ω ∝ √n, were n is the number of optically active electrons. In our previous experiments [3] and in literature [4] this number varies from 104-103 electrons per meta-atom. We now engineer a new cavity, resonant at 290 GHz, with an extremely reduced effective mode surface Seff = 4 × 10-14 m2 (FE simulations, CST), yielding large field enhancements above 1500 and allowing to enter the few (\textbackslashtextless;100) electron regime. It consist of a complementary metasurface with two very sharp metallic tips separated by a 60 nm gap (Fig.1(a, b)) on top of a single triangular quantum well. THz-TDS transmission experiments as a function of the applied magnetic field reveal strong anticrossing of the cavity mode with linear cyclotron dispersion. Measurements for arrays of only 12 cavities are reported in Fig.1(c). On the top horizontal axis we report the number of electrons occupying the topmost Landau level as a function of the magnetic field. At the anticrossing field of B=0.73 T we measure approximately 60 electrons ultra strongly coupled (Ω/ω- \textbackslashtextbar\textbackslashtextbar
Network traffic identification has been a hot topic in network security area. The identification of abnormal traffic can detect attack traffic and helps network manager enforce corresponding security policies to prevent attacks. Support Vector Machines (SVMs) are one of the most promising supervised machine learning (ML) algorithms that can be applied to the identification of traffic in IP networks as well as detection of abnormal traffic. SVM shows better performance because it can avoid local optimization problems existed in many supervised learning algorithms. However, as a binary classification approach, SVM needs more research in multiclass classification. In this paper, we proposed an abnormal traffic identification system(ATIS) that can classify and identify multiple attack traffic applications. Each component of ATIS is introduced in detail and experiments are carried out based on ATIS. Through the test of KDD CUP dataset, SVM shows good performance. Furthermore, the comparison of experiments reveals that scaling and parameters has a vital impact on SVM training results.
Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
Today the technology advancement in communication technology permits a malware author to introduce code obfuscation technique, for example, Application Programming Interface (API) hook, to make detecting the footprints of their code more difficult. A signature-based model such as Antivirus software is not effective against such attacks. In this paper, an API graph-based model is proposed with the objective of detecting hook attacks during malicious code execution. The proposed model incorporates techniques such as graph-generation, graph partition and graph comparison to distinguish a legitimate system call from malicious system call. The simulation results confirm that the proposed model outperforms than existing approaches.
Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.
In this paper, we propose a new regularization scheme for the well-known Support Vector Machine (SVM) classifier that operates on the training sample level. The proposed approach is motivated by the fact that Maximum Margin-based classification defines decision functions as a linear combination of the selected training data and, thus, the variations on training sample selection directly affect generalization performance. We show that the exploitation of the proposed regularization scheme is well motivated and intuitive. Experimental results show that the proposed regularization scheme outperforms standard SVM in human action recognition tasks as well as classical recognition problems.
Most security software tools try to detect malicious components by cryptographic hashes, signatures or based on their behavior. The former, is a widely adopted approach based on Integrity Measurement Architecture (IMA) enabling appraisal and attestation of system components. The latter, however, may induce a very long time until misbehavior of a component leads to a successful detection. Another approach is a Dynamic Runtime Attestation (DRA) based on the comparison of binary code loaded in the memory and well-known references. Since DRA is a complex approach, involving multiple related components and often complex attestation strategies, a flexible and extensible architecture is needed. In a cooperation project an architecture was designed and a Proof of Concept (PoC) successfully developed and evaluated. To achieve needed flexibility and extensibility, the implementation facilitates central components providing attestation strategies (guidelines). These guidelines define and implement the necessary steps for all relevant attestation operations, i.e. measurement, reference generation and verification.
The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks calculating inference under severe fault-injection rates and apply fault mitigation strategies to improve on the CNNs resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, fault-injection rates up to 10-1 (with most significant bit protection) can withstand only a 1% error probability degradation. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, up to 73 80% improved power efficiency can be obtained.
Race conditions are difficult to detect because they usually occur only under specific execution interleavings. Numerous program analysis and testing techniques have been proposed to detect race conditions between threads on single applications. However, most of these techniques neglect races that occur at the process level due to complex system event interactions. This article presents a framework, SIMEXPLORER, that allows engineers to effectively test for process-level race conditions. SIMEXPLORER first uses dynamic analysis techniques to observe system execution, identify program locations of interest, and report faults related to oracles. Next, it uses virtualization to achieve the fine-grained controllability needed to exercise event interleavings that are likely to expose races. We evaluated the effectiveness of SIMEXPLORER on 24 real-world applications containing both known and unknown process-level race conditions. Our results show that SIMEXPLORER is effective at detecting these race conditions, while incurring an overhead that is acceptable given its effectiveness improvements.
The growing computerization of critical infrastructure as well as the pervasiveness of computing in everyday life has led to increased interest in secure application development. We observe a flurry of new security technologies like ARM TrustZone and Intel SGX, but a lack of a corresponding architectural vision. We are convinced that point solutions are not sufficient to address the overall challenge of secure system design. In this paper, we outline our take on a trusted component ecosystem of small individual building blocks with strong isolation. In our view, applications should no longer be designed as massive stacks of vertically layered frameworks, but instead as horizontal aggregates of mutually isolated components that collaborate across machine boundaries to provide a service. Lateral thinking is needed to make secure systems going forward.
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the framework of fisher kernel to transform them into the fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.
The chips in working state have electromagnetic energy leakage problem. We offer a method to analyze the problem of electromagnetic leakage when the chip is running. We execute a sequence of addition and subtraction arithmetic instructions on FPGA chip, then we use the near-field probe to capture the chip leakage of electromagnetic signals. The electromagnetic signal is collected for analysis and processing, the parts of addition and subtraction are classified and identified by SVM. In this paper, for the problem of electromagnetic leakage, six sets of data were collected for analysis and processing. Good results were obtained by using this method.
The new criterion for selecting the frequencies of the test polyharmonic signals is developed. It allows uniquely filtering the values of multidimensional transfer functions - Fourier-images of Volterra kernel from the partial component of the response of a nonlinear system. It is shown that this criterion significantly weakens the known limitations on the choice of frequencies and, as a result, reduces the number of interpolations during the restoration of the transfer function, and, the more significant, the higher the order of estimated transfer function.
Explicit non-linear transformations of existing steganalysis features are shown to boost their ability to detect steganography in combination with existing simple classifiers, such as the FLD-ensemble. The non-linear transformations are learned from a small number of cover features using Nyström approximation on pilot vectors obtained with kernelized PCA. The best performance is achieved with the exponential form of the Hellinger kernel, which improves the detection accuracy by up to 2-3% for spatial-domain contentadaptive steganography. Since the non-linear map depends only on the cover source and its learning has a low computational complexity, the proposed approach is a practical and low cost method for boosting the accuracy of existing detectors built as binary classifiers. The map can also be used to significantly reduce the feature dimensionality (by up to factor of ten) without performance loss with respect to the non-transformed features.
Building secure systems used to mean ensuring a secure perimeter, but that is no longer the case. Today's systems are ill-equipped to deal with attackers that are able to pierce perimeter defenses. Data provenance is a critical technology in building resilient systems that will allow systems to recover from attackers that manage to overcome the "hard-shell" defenses. In this paper, we provide background information on data provenance, details on provenance collection, analysis, and storage techniques and challenges. Data provenance is situated to address the challenging problem of allowing a system to "fight-through" an attack, and we help to identify necessary work to ensure that future systems are resilient.
In presence of known and unknown vulnerabilities in code and flow control of programs, virtual machine alike isolation and sandboxing to confine maliciousness of process, by monitoring and controlling the behaviour of untrusted application, is an effective strategy. A confined malicious application cannot effect system resources and other applications running on same operating system. But present techniques used for sandboxing have some drawbacks ranging from scope to methodology. Some of proposed techniques restrict specific aspect of execution e.g. system calls and file system access. In the same way techniques that truly isolate the application by providing separate execution environment either require modification in kernel or full blown operating system. Moreover these do not provide isolation from top to bottom but only virtualize operating system services. In this paper, we propose a design to confine native Linux process in virtual machine equivalent isolation by using hardware virtualization extensions with nominal initialization and acceptable execution overheads. We implemented our prototype called Process Virtual Machine that transition a native process into virtual machine, provides minimal possible execution environment, intercept and virtualize system calls to execute it on host kernel. Experimental results show effectiveness of our proposed technique.
Cloud computing is revolutionizing many IT ecosystems through offering scalable computing resources that are easy to configure, use and inter-connect. However, this model has always been viewed with some suspicion as it raises a wide range of security and privacy issues that need to be negotiated. This research focuses on the construction of a trust layer in cloud computing to build a trust relationship between cloud service providers and cloud users. In particular, we address the rise of container-based virtualisation has a weak isolation compared to traditional VMs because of the shared use of the OS kernel and system components. Therefore, we will build a trust layer to solve the issues of weaker isolation whilst maintaining the performance and scalability of the approach. This paper has two objectives. Firstly, we propose a security system to protect containers from other guests through the addition of a Role-based Access Control (RBAC) model and the provision of strict data protection and security. Secondly, we provide a stress test using isolation benchmarking tools to evaluate the isolation in containers in term of performance.
Multi- and many-core systems are increasingly prevalent in embedded systems. Additionally, isolation requirements between different partitions and criticalities are gaining in importance. This difficult combination is not well addressed by current software systems. Parallel systems require consistency guarantees on shared data-structures often provided by locks that use predictable resource sharing protocols. However, as the number of cores increase, even a single shared cache-line (e.g. for the lock) can cause significant interference. In this paper, we present a clean-slate design of the SPeCK kernel, the next generation of our COMPOSITE OS, that attempts to provide a strong version of scalable predictability - where predictability bounds made on a single core, remain constant with an increase in cores. Results show that, despite using a non-preemptive kernel, it has strong scalable predictability, low average-case overheads, and demonstrates better response-times than a state-of-the-art preemptive system.
Due to a rapid revaluation in a virtualization environment, Virtual Machines (VMs) are target point for an attacker to gain privileged access of the virtual infrastructure. The Advanced Persistent Threats (APTs) such as malware, rootkit, spyware, etc. are more potent to bypass the existing defense mechanisms designed for VM. To address this issue, Virtual Machine Introspection (VMI) emerged as a promising approach that monitors run state of the VM externally from hypervisor. However, limitation of VMI lies with semantic gap. An open source tool called LibVMI address the semantic gap. Memory Forensic Analysis (MFA) tool such as Volatility can also be used to address the semantic gap. But, it needs to capture a memory dump (RAM) as input. Memory dump acquires time and its analysis time is highly crucial if Intrusion Detection System IDS (IDS) depends on the data supplied by FAM or VMI tool. In this work, live virtual machine RAM dump acquire time of LibVMI is measured. In addition, captured memory dump analysis time consumed by Volatility is measured and compared with other memory analyzer such as Rekall. It is observed through experimental results that, Rekall takes more execution time as compared to Volatility for most of the plugins. Further, Volatility and Rekall are compared with LibVMI. It is noticed that examining the volatile data through LibVMI is faster as it eliminates memory dump acquire time.