Biblio
The Business Intelligence (BI) paradigm is challenged by emerging use cases such as news and social media analytics in which the source data are unstructured, the analysis metrics are unspecified, and the appropriate visual representations are unsupported by mainstream tools. This case study documents the work undertaken in Microsoft Research to enable these use cases in the Microsoft Power BI product. Our approach comprises: (a) back-end pipelines that use AI to infer navigable data structures from streams of unstructured text, media and metadata; and (b) front-end representations of these structures grounded in the Visual Analytics literature. Through our creation of multiple end-to-end data applications, we learned that representing the varying quality of inferred data structures was crucial for making the use and limitations of AI transparent to users. We conclude with reflections on BI in the age of AI, big data, and democratized access to data analytics.
Approximate computing is an emerging paradigm where design accuracy can be traded off for benefits in design metrics such as design area, power consumption or circuit complexity. In this work, we present a novel paradigm to synthesize approximate circuits using Boolean matrix factorization (BMF). In our methodology the truth table of a sub-circuit of the design is approximated using BMF to a controllable approximation degree, and the results of the factorization are used to synthesize a less complex subcircuit. To scale our technique to large circuits, we devise a circuit decomposition method and a subcircuit design-space exploration technique to identify the best order for subcircuit approximations. Our method leads to a smooth trade-off between accuracy and full circuit complexity as measured by design area and power consumption. Using an industrial strength design flow, we extensively evaluate our methodology on a number of testcases, where we demonstrate that the proposed methodology can achieve up to 63% in power savings, while introducing an average relative error of 5%. We also compare our work to previous works in Boolean circuit synthesis and demonstrate significant improvements in design metrics for same accuracy targets.
Recently, conversation agents have attracted the attention of many companies such as IBM, Facebook, Google, and Amazon which have focused on developing tools or API (Application Programming Interfaces) for developers to create their own chat-bots. In this paper, we focus on new approaches to evaluate such systems presenting some recommendations resulted from evaluating a real chatbot use case. Testing conversational agents or chatbots is not a trivial task due to the multitude aspects/tasks (e.g., natural language understanding, dialog management and, response generation) which must be considered separately and as a mixture. Also, the creation of a general testing tool is a challenge since evaluation is very sensitive to the application context. Finally, exhaustive testing can be a tedious task for the project team what creates a need for a tool to perform it automatically. This paper opens a discussion about how conversational systems testing tools are essential to ensure well-functioning of such systems as well as to help interface designers guiding them to develop consistent conversational interfaces.
Demand for end-to-end secure messaging has been growing rapidly and companies have responded by releasing applications that implement end-to-end secure messaging protocols. Signal and protocols based on Signal dominate the secure messaging applications. In this work we analyze conversational security properties provided by the Signal Android application against a variety of real world adversaries. We identify vulnerabilities that allow the Signal server to learn the contents of attachments, undetectably re-order and drop messages, and add and drop participants from group conversations. We then perform proof-of-concept attacks against the application to demonstrate the practicality of these vulnerabilities, and suggest mitigations that can detect our attacks. The main conclusion of our work is that we need to consider more than confidentiality and integrity of messages when designing future protocols. We also stress that protocols must protect against compromised servers and at a minimum implement a trust but verify model.
In order to improve the limitation of single-mode biometric identification technology, a bimodal biometric verification system based on deep learning is proposed in this paper. A modified CNN architecture is used to generate better facial feature for bimodal fusion. The obtained facial feature and acoustic feature extracted by the acoustic feature extraction model are fused together to form the fusion feature on feature layer level. The fusion feature obtained by this method are used to train a neural network of identifying the target person who have these corresponding features. Experimental results demonstrate the superiority and high performance of our bimodal biometric in comparison with single-mode biometrics for identity authentication, which are tested on a bimodal database consists of data coherent from TED-LIUM and CASIA-WebFace. Compared with using facial feature or acoustic feature alone, the classification accuracy of fusion feature obtained by our method is increased obviously.
A blockchain powered Health information ecosystem can solve a frequently discussed problem of the lifelong recorded patient health data, which seriously could hurdle the privacy of the patients and the growing data hunger of the research and policy maker institutions. On one side the general availability of the data is vital in emergency situations and supports heavily the different research, population health management and development activities, on the other side using the same data can lead to serious social and ethical problems caused by malicious actors. Currently, the regulation of the privacy data varies all over the world, however underlying principles are always defensive and protective towards patient privacy against general availability. The protective principles cause a defensive, data hiding attitude of the health system developers to avoid breaching the overall law regulations. It makes the policy makers and different - primarily drug - developers to find ways to treat data such a way that lead to ethical and political debates. In our paper we introduce how the blockchain technology can help solving the problem of secure data storing and ensuring data availability at the same time. We use the basic principles of the American HIPAA regulation, which defines the public availability criteria of health data, however the different local regulations may differ significantly. Blockchain's decentralized, intermediary-free, cryptographically secured attributes offer a new way of storing patient data securely and at the same time publicly available in a regulated way, where a well-designed distributed peer-to-peer network incentivize the smooth operation of a full-featured EHR system.
Cyber security management of systems in the cyberspace has been a challenging problem for both practitioners and the research community. Their proprietary nature along with the complexity renders traditional approaches rather insufficient and creating the need for the adoption of a holistic point of view. This paper draws upon the principles theory game in order to present a novel systemic approach towards cyber security management, taking into account the complex inter-dependencies and providing cost-efficient defense solutions.
HB+ is a lightweight authentication scheme, which is secure against passive attacks if the Learning Parity with Noise Problem (LPN) is hard. However, HB+ is vulnerable to a key-recovery, man-in-the-middle (MiM) attack dubbed GRS. The HB+DB protocol added a distance-bounding dimension to HB+, and was experimentally proven to resist the GRS attack. We exhibit several security flaws in HB+DB. First, we refine the GRS strategy to induce a different key-recovery MiM attack, not deterred by HB+DB's distancebounding. Second, we prove HB+DB impractical as a secure distance-bounding (DB) protocol, as its DB security-levels scale poorly compared to other DB protocols. Third, we refute that HB+DB's security against passive attackers relies on the hardness of LPN; moreover, (erroneously) requiring such hardness lowers HB+DB's efficiency and security. We also propose anew distance-bounding protocol called BLOG. It retains parts of HB+DB, yet BLOG is provably secure and enjoys better (asymptotical) security.
The goal of network intrusion detection is to inspect network traffic in order to identify threats and known attack patterns. One of its key features is Deep Packet Inspection (DPI), that extracts the content of network packets and compares it against a set of detection signatures. While DPI is commonly used to protect networks and information systems, it requires direct access to the traffic content, which makes it blinded against encrypted network protocols such as HTTPS. So far, a difficult choice was to be made between the privacy of network users and security through the inspection of their traffic content to detect attacks or malicious activities. This paper presents a novel approach that bridges the gap between network security and privacy. It makes possible to perform DPI directly on encrypted traffic, without knowing neither the traffic content, nor the patterns of detection signatures. The relevance of our work is that it preserves the delicate balance in the security market ecosystem. Indeed, security editors will be able to protect their distinctive detection signatures and supply service providers only with encrypted attack patterns. In addition, service providers will be able to integrate the encrypted signatures in their architectures and perform DPI without compromising the privacy of network communications. Finally, users will be able to preserve their privacy through traffic encryption, while also benefiting from network security services. The extensive experiments conducted in this paper prove that, compared to existing encryption schemes, our solution reduces by 3 orders of magnitude the connection setup time for new users, and by 6 orders of magnitude the consumed memory space on the DPI appliance.
The rise of big data age in the Internet has led to the explosive growth of data size. However, trust issue has become the biggest problem of big data, leading to the difficulty in data safe circulation and industry development. The blockchain technology provides a new solution to this problem by combining non-tampering, traceable features with smart contracts that automatically execute default instructions. In this paper, we present a credible big data sharing model based on blockchain technology and smart contract to ensure the safe circulation of data resources.
We present an approach to tracking the behaviour of an attacker on a decoy system, where the decoy communicates with the real system only through low energy bluetooth. The result is a low-cost solution that does not interrupt the live system, while limiting potential damage. The attacker has no way to detect that they are being monitored, while their actions are being logged for further investigation. The system has been physically implemented using Raspberry PI and Arduino boards to replicate practical performance.
Security practitioners use the attack surface of software systems to prioritize areas of systems to test and analyze. To date, approaches for predicting which code artifacts are vulnerable have utilized a binary classification of code as vulnerable or not vulnerable. To better understand the strengths and weaknesses of vulnerability prediction approaches, vulnerability datasets with classification and severity data are needed. The goal of this paper is to help researchers and practitioners make security effort prioritization decisions by evaluating which classifications and severities of vulnerabilities are on an attack surface approximated using crash dump stack traces. In this work, we use crash dump stack traces to approximate the attack surface of Mozilla Firefox. We then generate a dataset of 271 vulnerable files in Firefox, classified using the Common Weakness Enumeration (CWE) system. We use these files as an oracle for the evaluation of the attack surface generated using crash data. In the Firefox vulnerability dataset, 14 different classifications of vulnerabilities appeared at least once. In our study, 85.3%
of vulnerable files were on the attack surface generated using crash data. We found no difference between the severity of vulnerabilities found on the attack surface generated using crash data and vulnerabilities not occurring on the attack surface. Additionally, we discuss lessons learned during the development of this vulnerability dataset.
Malware classification is a critical part in the cyber-security. Traditional methodologies for the malware classification typically use static analysis and dynamic analysis to identify malware. In this paper, a malware classification methodology based on its binary image and extracting local binary pattern (LBP) features is proposed. First, malware images are reorganized into 3 by 3 grids which is mainly used to extract LBP feature. Second, the LBP is implemented on the malware images to extract features in that it is useful in pattern or texture classification. Finally, Tensorflow, a library for machine learning, is applied to classify malware images with the LBP feature. Performance comparison results among different classifiers with different image descriptors such as GIST, a spatial envelop, and the LBP demonstrate that our proposed approach outperforms others.
Recent years have witnessed the trend of increasingly relying on distributed infrastructures. This increased the number of reported incidents of security breaches compromising users' privacy, where third parties massively collect, process and manage users' personal data. Towards these security and privacy challenges, we combine hierarchical identity based cryptographic mechanisms with emerging blockchain infrastructures and propose a blockchain-based data usage auditing architecture ensuring availability and accountability in a privacy-preserving fashion. Our approach relies on the use of auditable contracts deployed in blockchain infrastructures. Thus, it offers transparent and controlled data access, sharing and processing, so that unauthorized users or untrusted servers cannot process data without client's authorization. Moreover, based on cryptographic mechanisms, our solution preserves privacy of data owners and ensures secrecy for shared data with multiple service providers. It also provides auditing authorities with tamper-proof evidences for data usage compliance.
While significant progress has been made separately on analytics systems for scalable stochastic gradient descent (SGD) and private SGD, none of the major scalable analytics frameworks have incorporated differentially private SGD. There are two inter-related issues for this disconnect between research and practice: (1) low model accuracy due to added noise to guarantee privacy, and (2) high development and runtime overhead of the private algorithms. This paper takes a first step to remedy this disconnect and proposes a private SGD algorithm to address both issues in an integrated manner. In contrast to the white-box approach adopted by previous work, we revisit and use the classical technique of output perturbation to devise a novel “bolt-on” approach to private SGD. While our approach trivially addresses (2), it makes (1) even more challenging. We address this challenge by providing a novel analysis of the L2-sensitivity of SGD, which allows, under the same privacy guarantees, better convergence of SGD when only a constant number of passes can be made over the data. We integrate our algorithm, as well as other state-of-the-art differentially private SGD, into Bismarck, a popular scalable SGD-based analytics system on top of an RDBMS. Extensive experiments show that our algorithm can be easily integrated, incurs virtually no overhead, scales well, and most importantly, yields substantially better (up to 4X) test accuracy than the state-of-the-art algorithms on many real datasets.
We present, VIP, an approach to boosting the precision of Virtual call Integrity Protection for large-scale real-world C++ programs (e.g., Chrome) by using pointer analysis for the first time. VIP introduces two new techniques: (1) a sound and scalable partial pointer analysis for discovering statically the sets of legitimate targets at virtual callsites from separately compiled C++ modules and (2) a lightweight instrumentation technique for performing (virtual call) integrity checks at runtime. VIP raises the bar against vtable hijacking attacks by providing stronger security guarantees than the CHA-based approach with comparable performance overhead. VIP is implemented in LLVM-3.8.0 and evaluated using SPEC programs and Chrome. Statically, VIP protects virtual calls more effectively than CHA by significantly reducing the sets of legitimate targets permitted at 20.3% of the virtual callsites per program, on average. Dynamically, VIP incurs an average (maximum) instrumentation overhead of 0.7% (3.3%), making it practically deployable as part of a compiler tool chain.