Biblio
We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captures the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool, designed to mark complex hierarchical relationships and local label information quickly and efficiently. We include several benchmarks for two image-based tasks, kinship verification and family recognition, incorporating several visual features and metric learning methods as baselines. We also demonstrate that a pre-trained Convolutional Neural Network (CNN) used as an off-the-shelf feature extractor outperforms the other feature types. Results were then further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss function was learned on top of the network's pre-trained weights; (2) for family recognition, a family-specific softmax classifier was added to the network.
In this paper we describe a system that allows the real-time creation of firewall rules in response to geographic and political changes in the control plane. This allows an organization to mitigate data exfiltration threats by analyzing Border Gateway Protocol (BGP) updates and blocking packets from being routed through problematic jurisdictions. By inspecting the autonomous system paths and referencing external data sources about the autonomous systems, a BGP participant can infer the countries that traffic to a particular destination address will traverse. Based on this information, an organization can then define constraints on its egress traffic to prevent sensitive data from being sent via an untrusted region. In light of the many route leaks and BGP hijacks that occur today, this offers a new option to organizations willing to accept reduced availability over the risk to confidentiality. Similar to firewalls that allow organizations to block traffic originating from specific countries, our approach allows blocking outbound traffic from transiting specific jurisdictions. To illustrate the efficacy of this approach, we provide an analysis of paths to various financial services IP addresses over the course of a month from a single BGP vantage point, quantifying the frequency of path alterations that result in the traversal of new countries. We conclude with an argument for the utility of country-based egress policies that do not require the cooperation of upstream providers.
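A minimal sketch of the control-plane-to-firewall reaction described above, assuming a hypothetical ASN-to-country table and a simplified rule store (the names and rule format are illustrative, not the paper's implementation):

    # Sketch of country-based egress filtering from BGP AS paths (illustrative only).
    # Assumes a pre-built ASN -> country mapping, e.g. derived from registry data;
    # the mapping and rule format here are hypothetical.

    BLOCKED_COUNTRIES = {"XX", "YY"}          # jurisdictions we refuse to transit

    ASN_TO_COUNTRY = {                        # hypothetical external data source
        64496: "US",
        64497: "XX",
        64498: "DE",
    }

    def countries_on_path(as_path):
        """Map each ASN on a BGP AS path to its registered country."""
        return {ASN_TO_COUNTRY.get(asn, "??") for asn in as_path}

    def handle_bgp_update(prefix, as_path, firewall_rules):
        """On a BGP update, block egress to the prefix if its path crosses a blocked country."""
        crossed = countries_on_path(as_path)
        if crossed & BLOCKED_COUNTRIES:
            firewall_rules[prefix] = "DROP"   # e.g., translate to an iptables/nftables rule
        else:
            firewall_rules.pop(prefix, None)  # path is acceptable again; lift the block

    rules = {}
    handle_bgp_update("203.0.113.0/24", [64496, 64497, 64498], rules)
    print(rules)  # {'203.0.113.0/24': 'DROP'}; AS 64497 is registered in blocked country "XX"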
Function Secret Sharing (FSS), introduced by Boyle et al. (Eurocrypt 2015), provides a way for additively secret-sharing a function from a given function family F. More concretely, an m-party FSS scheme splits a function f : {0,1}^n -> G, for some abelian group G, into functions f_1, ..., f_m, described by keys k_1, ..., k_m, such that f = f_1 + ... + f_m and every strict subset of the keys hides f. A Distributed Point Function (DPF) is a special case where F is the family of point functions, namely functions f_{a,b} that evaluate to b on the input a and to 0 on all other inputs. FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication. These include different flavors of private information retrieval (PIR), as well as a recent application of DPF for large-scale anonymous messaging. We improve and extend previous results in several ways: * Simplified FSS constructions. We introduce a tensoring operation for FSS which is used to obtain a conceptually simpler derivation of previous constructions and present our new constructions. * Improved 2-party DPF. We reduce the key size of the PRG-based DPF scheme of Boyle et al. roughly by a factor of 4 and optimize its computational cost. The optimized DPF significantly improves the concrete costs of 2-server PIR and related primitives. * FSS for new function families. We present an efficient PRG-based 2-party FSS scheme for the family of decision trees, leaking only the topology of the tree and the internal node labels. We apply this towards FSS for multi-dimensional intervals. We also present a general technique for extending FSS schemes by increasing the number of parties. * Verifiable FSS. We present efficient protocols for verifying that keys (k*_1, ..., k*_m), obtained from a potentially malicious user, are consistent with some f in F. Such a verification may be critical for applications that involve private writing or voting by many users.
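To make the sharing syntax concrete, here is a naive 2-party DPF: the first key is a uniformly random full-domain table and the second is its pointwise negation, fixed up at the point a. Each key alone is uniform and hides f, satisfying the definition, but keys are exponential in n; the point of the PRG-based constructions discussed above is to achieve the same sharing with short keys.

    # Naive 2-party DPF for the point function f_{a,b} over Z_{2**32} (illustrative).
    import secrets

    MOD = 2**32

    def naive_dpf_gen(n, a, b):
        """Split f_{a,b} : {0,1}^n -> Z_MOD into two full-domain tables k1, k2."""
        domain = 2**n
        k1 = [secrets.randbelow(MOD) for _ in range(domain)]   # uniformly random table
        k2 = [(-k1[x]) % MOD for x in range(domain)]           # k1 + k2 = 0 everywhere...
        k2[a] = (b - k1[a]) % MOD                              # ...except at the point a
        return k1, k2

    def naive_dpf_eval(key, x):
        return key[x]

    k1, k2 = naive_dpf_gen(n=4, a=9, b=7)
    for x in range(16):
        total = (naive_dpf_eval(k1, x) + naive_dpf_eval(k2, x)) % MOD
        assert total == (7 if x == 9 else 0)   # f1 + f2 = f_{a,b}; each key alone is uniform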
The Google Identity Platform is a system that allows a user to sign in to applications and other services by using a Google account. Google Sign-In is one such method for providing one’s identity to the Google Identity Platform. Google Sign-In is available for Android applications and iOS applications, as well as for websites and other devices. Users of Google Sign-In find that it integrates well with the Android platform, but iOS users (iPhone, iPad, etc.) do not have the same experience. The user experience when logging in to a Google account on an iOS application can not only be more tedious than the Android experience, but it also conditions users to engage in behaviors that put the information in their Google accounts at risk.
In this paper, we propose a hierarchical monitoring intrusion detection system (HAMIDS) for industrial control systems (ICS). The HAMIDS framework detects anomalies in both Level 0 and Level 1 of an industrial control plant. In addition, the framework aggregates the cyber-physical process data at one point for further analysis as part of the intrusion detection process. The novelty of this framework is its ability to detect anomalies that have a distributed impact on the cyber-physical process. The performance of the proposed framework was evaluated as part of the SWaT Security Showdown (S3), in which six international teams were invited to test the framework in a real industrial control system. The proposed framework outperformed the other academic IDSs in terms of detection of ICS threats during the S3 event, which was held July 25-29, 2016 at the Singapore University of Technology and Design.
Several mature cryptographic frameworks are available, and they have been utilized for building complex applications. However, developers often use these frameworks incorrectly and introduce security vulnerabilities. This is because current cryptographic frameworks erode abstraction boundaries, as they do not encapsulate all the framework-specific knowledge and expect developers to understand security attacks and defenses. Starting from the documented misuse cases of cryptographic APIs, we infer five developer needs and we show that a good API design would address these needs only partially. Building on this observation, we propose APIs that are semantically meaningful for developers, we show how these interfaces can be implemented consistently on top of existing frameworks using novel and known design patterns, and we propose build management hooks for isolating security workarounds needed during the development and test phases. Through two case studies, we show that our APIs can be utilized to implement non-trivial client-server protocols and that they provide a better separation of concerns than existing frameworks. We also discuss the challenges and potential approaches for evaluating our solution. Our semantic interfaces represent a first step toward preventing misuses of cryptographic APIs.
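As a rough illustration of what a semantically meaningful interface could look like (the protect/unprotect names are hypothetical, layered here over Python's cryptography package rather than the frameworks studied in the paper):

    # Sketch of a semantically meaningful wrapper over an existing framework.
    # `protect`/`unprotect` are hypothetical names, not the paper's API; the point is
    # that the caller states intent ("keep this confidential and authentic") and the
    # wrapper pins all algorithm, mode, and encoding choices internally.
    from cryptography.fernet import Fernet, InvalidToken

    def new_secret() -> bytes:
        """Create a fresh symmetric secret; callers never choose algorithms or modes."""
        return Fernet.generate_key()

    def protect(secret: bytes, data: bytes) -> bytes:
        """Encrypt-and-authenticate `data`. Nonce handling is internal to the framework."""
        return Fernet(secret).encrypt(data)

    def unprotect(secret: bytes, token: bytes) -> bytes:
        """Recover `data`, refusing tampered tokens rather than returning garbage."""
        try:
            return Fernet(secret).decrypt(token)
        except InvalidToken as e:
            raise ValueError("token failed authentication; refusing to decrypt") from e

    secret = new_secret()
    token = protect(secret, b"card=4111...")
    assert unprotect(secret, token) == b"card=4111..."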
Amplification DDoS attacks have gained popularity and become a serious threat to Internet participants. However, little is known about where these attacks originate, and revealing the attack sources is a non-trivial problem due to the spoofed nature of the traffic. In this paper, we present novel techniques to uncover the infrastructures behind amplification DDoS attacks. We follow a two-step approach to tackle this challenge: First, we develop a methodology to impose a fingerprint on scanners that perform the reconnaissance for amplification attacks, which allows us to link subsequent attacks back to the scanner. Our methodology attributes over 58% of attacks to a scanner with a confidence of over 99.9%. Second, we use Time-to-Live-based trilateration techniques to map scanners to the actual infrastructures launching the attacks. Using this technique, we identify 34 networks as being the source of amplification attacks with 98% certainty.
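The trilateration step builds on a classic observation: an observed IP TTL, together with the handful of common initial TTL values, reveals the hop distance to the sender. A minimal sketch, assuming the sender used one of those defaults (combining inferred distances from several vantage points is what narrows down the true source):

    # Sketch of the hop-count inference underlying TTL-based trilateration.
    COMMON_INITIAL_TTLS = (32, 64, 128, 255)

    def hop_distance(observed_ttl: int) -> int:
        """Infer hops traveled: initial TTL is the smallest common value >= observed."""
        initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
        return initial - observed_ttl

    # A packet arriving with TTL 51 most likely started at 64 and crossed 13 hops.
    assert hop_distance(51) == 13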
Identity concealment and zero round-trip time (0-RTT) connections are two current research focuses in the design and analysis of secure transport protocols, like TLS 1.3 and Google's QUIC, in the client-server setting. In this work, we introduce a new primitive for identity-concealed authenticated encryption in the public-key setting, referred to as higncryption, which can be viewed as a novel monolithic integration of public-key encryption, digital signature, and identity concealment. We then present the security definitional framework for higncryption and a conceptually simple (yet carefully designed) protocol construction. As a new primitive, higncryption can have many applications. In this work, we focus on its applications to 0-RTT authentication, showing higncryption is well suited to and compatible with QUIC and OPTLS, and on its applications to identity-concealed authenticated key exchange (CAKE) and unilateral CAKE (UCAKE). Of independent interest is a new concise security definitional framework for CAKE and UCAKE proposed in this work, which unifies the traditional BR and (post-ID) frameworks, enjoys composability, and ensures very strong security guarantees. Along the way, we make a systematic comparative study with related protocols and mechanisms, including Zheng's signcryption, one-pass HMQV, QUIC, TLS 1.3, and OPTLS, most of which are widely standardized or in use.
Motivated by the impossibility of achieving fairness in secure computation [Cleve, STOC 1986], recent works study a model of fairness in which an adversarial party that aborts on receiving output is forced to pay a mutually predefined monetary penalty to every other party that did not receive the output. These works show how to design protocols for secure computation with penalties that tolerate an arbitrary number of corruptions. In this work, we improve the efficiency of protocols for secure computation with penalties in a hybrid model where parties have access to the "claim-or-refund" transaction functionality. Our first improvement is for the ladder protocol of Bentov and Kumaresan (Crypto 2014) where we improve the dependence of the script complexity of the protocol (which corresponds to miner verification load and also space on the blockchain) on the number of parties from quadratic to linear (and in particular, is completely independent of the underlying function). Our second improvement is for the see-saw protocol of Kumaresan et al. (CCS 2015) where we reduce the total number of claim-or-refund transactions and also the script complexity from quadratic to linear in the number of parties.
Recently, proactive systems such as Google Now and Microsoft Cortana have become increasingly popular, reshaping the way users access information on mobile devices. In these systems, relevant content is presented to users based on their context, without a query, in the form of information cards that do not require a click to satisfy the user. As a result, prior approaches based on clicks cannot provide reliable measurements of user satisfaction with such systems. It is also unclear how much of the previous findings regarding good abandonment in reactive Web search can be applied to these proactive systems, due to the intrinsic differences in user intent and the greater variety of content types and their presentations. In this paper, we present the first large-scale analysis of viewing behavior based on the viewport (the visible fraction of a Web page) of mobile devices, towards measuring user satisfaction with the information cards of mobile proactive systems. In particular, we identified and analyzed a variety of factors that may influence viewing behavior, including biases from ranking positions, the types and attributes of the information cards, and touch interactions with the mobile devices. We show that by modeling these various factors we can better measure user satisfaction with mobile proactive systems, enabling stronger statistical power in large-scale online A/B testing.
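As a minimal sketch of how viewport traces translate into per-card viewing measurements (the event format and card geometry here are assumptions, not the paper's logging schema):

    # Sketch of per-card dwell-time computation from a viewport trace (illustrative).
    def card_dwell_times(events, cards):
        """events: [(t_seconds, viewport_top, viewport_bottom)], sorted by time.
        cards: {card_id: (top, bottom)} in page pixel coordinates.
        Returns seconds each card spent at least partially inside the viewport."""
        dwell = {cid: 0.0 for cid in cards}
        for (t0, v_top, v_bot), (t1, _, _) in zip(events, events[1:]):
            for cid, (c_top, c_bot) in cards.items():
                if c_top < v_bot and c_bot > v_top:      # card overlaps viewport
                    dwell[cid] += t1 - t0
        return dwell

    cards = {"weather": (0, 300), "stocks": (300, 600), "news": (600, 900)}
    trace = [(0.0, 0, 500), (2.0, 350, 850), (5.0, 350, 850)]
    print(card_dwell_times(trace, cards))
    # weather visible 0-2s (2.0), stocks 0-5s (5.0), news 2-5s (3.0)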
TLS and SSH are two of the most commonly used protocols for securing Internet traffic. Many of the implementations of these protocols rely on the cryptographic primitives provided in the OpenSSL library. In this work we disclose a vulnerability in OpenSSL, affecting all versions and forks (e.g. LibreSSL and BoringSSL) since roughly October 2005, which renders the implementation of the DSA signature scheme vulnerable to cache-based side-channel attacks. Exploiting the software defect, we demonstrate the first published cache-based key-recovery attack on these protocols: 260 SSH-2 handshakes to extract a 1024/160-bit DSA host key from an OpenSSH server, and 580 TLS 1.2 handshakes to extract a 2048/256-bit DSA key from an stunnel server.
You get what you measure, and you can't manage what you don't measure. Metrics are a powerful tool used in organizations to set goals, decide which new products and features should be released to customers, which new tests and experiments should be conducted, and how resources should be allocated. To a large extent, metrics drive the direction of an organization, and getting metrics 'right' is one of the most important and difficult problems an organization needs to solve. However, creating good metrics that capture long-term company goals is difficult: they try to capture abstract concepts such as success, delight, loyalty, engagement, lifetime value, etc. How can one determine that a metric is a good one? Or that one metric is better than another? In other words, how do we measure the quality of metrics? Can the evaluation process be automated so that anyone with an idea for a new metric can quickly evaluate it? In this paper we describe the metric evaluation system deployed at Bing, where we have been working on designing and improving metrics for over five years. We believe that by applying a data-driven approach to metric evaluation we have been able to substantially improve our metrics and, as a result, ship better features and improve the search experience for Bing's users.
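One way to make such data-driven metric evaluation concrete, sketched here as an illustration of the general idea rather than Bing's actual system, is to score a candidate metric by its directional agreement on a corpus of past experiments with known outcomes:

    # Sketch of metric evaluation on a labeled experiment corpus (illustrative).
    # Each historical A/B test is labeled with its known true direction
    # (+1 treatment better, -1 worse); a candidate metric is scored by agreement.
    def metric_quality(metric, corpus):
        """metric: fn(control_data, treatment_data) -> float delta (positive = better).
        corpus: [(control_data, treatment_data, true_direction)], direction in {+1,-1}.
        Returns the fraction of labeled experiments where the metric agrees in sign."""
        agree = sum(
            1 for control, treatment, direction in corpus
            if metric(control, treatment) * direction > 0
        )
        return agree / len(corpus)

    # Example: a toy "mean satisfaction score" metric on per-user scores.
    mean_delta = lambda c, t: sum(t) / len(t) - sum(c) / len(c)
    corpus = [
        ([3, 4, 3], [4, 5, 4], +1),   # known improvement
        ([4, 4, 5], [3, 3, 4], -1),   # known regression
    ]
    print(metric_quality(mean_delta, corpus))  # 1.0: agreed on both experiments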
Hardware Trojan detection has emerged as a critical challenge in ensuring the security and trustworthiness of integrated circuits. The vast majority of research efforts in this area have utilized side-channel analysis for Trojan detection. Functional test generation for logic testing is a promising alternative, but it may not be helpful if a Trojan cannot be fully activated or its effect cannot be propagated to the observable outputs. Side-channel analysis, on the other hand, can achieve significantly higher detection coverage for Trojans of all types and sizes, since it does not require activation/propagation of an unknown Trojan. However, it often has limited effectiveness due to poor detection sensitivity under large process variations and the small footprint a Trojan leaves in the side-channel signature. In this paper, we address this critical problem through a novel side-channel-aware test generation approach, based on a concept of Multiple Excitation of Rare Switching (MERS), that can significantly increase Trojan detection sensitivity. The paper makes several important contributions: i) it presents in detail the statistical test generation method, which can generate high-quality test sets for creating high relative activity in arbitrary Trojan instances; ii) it analyzes the effectiveness of the generated test sets in terms of Trojan coverage; and iii) it describes two judicious reordering methods that can further tune the test sets and greatly improve side-channel sensitivity. Simulation results demonstrate that the tests generated by MERS can significantly increase Trojan detection sensitivity, thereby making Trojan detection using side-channel analysis effective.
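The reordering idea can be illustrated with a toy model in which rare nodes are abstracted as designated input-bit positions and vectors are greedily chained to maximize switching on them (a simplification for illustration, not the MERS algorithm itself):

    # Simplified sketch of test-vector reordering for switching activity.
    def rare_switching(v1, v2, rare_bits):
        """Count rare positions whose value differs between consecutive vectors."""
        return sum(1 for i in rare_bits if v1[i] != v2[i])

    def reorder_tests(vectors, rare_bits):
        """Greedy chain: repeatedly append the vector maximizing rare switching."""
        remaining = list(vectors)
        ordered = [remaining.pop(0)]
        while remaining:
            best = max(remaining, key=lambda v: rare_switching(ordered[-1], v, rare_bits))
            remaining.remove(best)
            ordered.append(best)
        return ordered

    tests = [(0,0,1,0), (1,1,0,1), (0,1,1,1), (1,0,0,0)]
    print(reorder_tests(tests, rare_bits=(1, 3)))   # maximizes toggles on bits 1 and 3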
We give attacks on Feistel-based format-preserving encryption (FPE) schemes that succeed in message recovery (not merely distinguishing scheme outputs from random) when the message space is small. For 4-bit messages, the attacks fully recover the target message using 2^21 examples for the FF3 NIST standard and 2^25 examples for the FF1 NIST standard. The examples include only three messages per tweak, which is what makes the attacks non-trivial even though the total number of examples exceeds the size of the domain. The attacks are rigorously analyzed in a new definitional framework of message-recovery security. The attacks are easily put out of reach by increasing the number of Feistel rounds in the standards.
Today's mobile app economy has greatly expanded the types of "things" people can share, spanning from new types of digital content like physiological data (e.g., workouts) to physical things like apartments and work tools (the "sharing economy"). To understand whether mobile platforms provide adequate support for such novel sharing services, we surveyed 200 participants about their experiences with six types of emergent sharing services. For each domain we elicited device usage practices and identified corresponding device selection criteria. Our analysis suggests that, despite contemporary mobile-first design efforts, desktop interfaces of emergent content sharing services are often considered more efficient and easier to use, both for sharing and access control tasks (i.e., privacy). Based on our findings, we outline device-related design and research opportunities in this space.
While existing proactive defense paradigms such as address mutation are effective in slowing down reconnaissance by naive attackers, they are ineffective against skilled human attackers. In this paper, we analytically show that the goal of defeating reconnaissance by skilled human attackers is only achievable by an integration of five defensive dimensions: (1) mutating host addresses, (2) mutating host fingerprints, (3) anonymizing host fingerprints, (4) deploying high-fidelity honeypots with context-aware fingerprints, and (5) deploying context-aware content on those honeypots. Using a novel class of honeypots, referred to as proxy honeypots (high-interaction honeypots with customizable fingerprints), we propose a proactive defense model, called HIDE, that constantly mutates the addresses and fingerprints of network hosts and proxy honeypots in a manner that maximally anonymizes the identity of network hosts. The objective is to make a host untraceable over time by not letting even skilled attackers reuse attributes of a host discovered in previous scanning, including its addresses and fingerprint, to identify that host again. The mutations are generated through formal definition and modeling of the problem. Using a red-teaming evaluation with a group of white-hat hackers, we evaluated our five-dimensional defense model and compared its effectiveness with alternative and competing scenarios. These experiments, as well as our analytical evaluation, show that by anonymizing all identifying attributes of a host/honeypot over time, HIDE is able to significantly complicate reconnaissance, even for highly skilled human attackers.
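A minimal sketch of the mutation loop's contract, with uniform random sampling standing in for HIDE's formally derived mutations: a host never re-exposes an address/fingerprint pair it has used before, so earlier scan results cannot be reused to re-identify it.

    # Sketch of periodic address/fingerprint mutation (illustrative only).
    import random
    from collections import defaultdict

    def mutate(hosts, address_pool, fingerprint_pool, used):
        """Assign every host a fresh (address, fingerprint) pair it has never
        exposed before, so prior reconnaissance results go stale."""
        assignment = {}
        for host in hosts:
            while True:   # assumes the pools are large enough to avoid exhaustion
                candidate = (random.choice(address_pool), random.choice(fingerprint_pool))
                if candidate not in used[host]:
                    used[host].add(candidate)
                    assignment[host] = candidate
                    break
        return assignment

    used = defaultdict(set)
    hosts = ["web01", "db01"]
    addrs = [f"10.0.0.{i}" for i in range(2, 250)]
    fps = ["Linux-5.x", "Windows-10", "FreeBSD-13"]
    print(mutate(hosts, addrs, fps, used))   # call again on a timer to re-mutate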
Physical Unclonable Functions (PUFs) measure manufacturing variations inside integrated circuits to derive internal secrets at run-time, avoiding the permanent storage of secrets in non-volatile memory. PUF responses are noisy, so they require error correction to generate reliable cryptographic keys. To date, a single key is reproduced in the field when needed and always used, regardless of its reliability. In this work, we compute online reliability information for a reproduced key and perform multiple PUF readout and error correction steps in case of an unreliable result. This permits choosing the most reliable key among multiple derived key candidates with different corrected error patterns. With this approach, we achieve the same average key error probability from fewer PUF response bits. Our proof-of-concept design for a popular reference scenario uses Differential Sequence Coding (DSC) and a Viterbi decoder with reliability output information. It requires 39% fewer PUF response bits and 16% fewer helper data bits than the regular approach without the option for multiple readouts.
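A toy sketch of the multiple-readout idea, with majority voting and a vote-margin score standing in for the DSC encoding and the Viterbi decoder's soft reliability output:

    # Sketch of key reproduction with online reliability feedback (illustrative).
    import random

    def noisy_readout(reference, error_rate=0.1):
        """Simulate one PUF readout: each response bit flips with `error_rate`."""
        return [b ^ (random.random() < error_rate) for b in reference]

    def vote_with_reliability(readouts):
        """Majority-vote each bit across readouts; reliability is the mean vote margin."""
        n = len(readouts)
        bits, margins = [], []
        for column in zip(*readouts):
            ones = sum(column)
            bits.append(1 if ones * 2 > n else 0)
            margins.append(abs(ones * 2 - n) / n)   # 1.0 = unanimous, ~0 = coin flip
        return bits, sum(margins) / len(margins)

    def reproduce_response(reference, threshold=0.9, max_rounds=9):
        """Add readouts (keeping an odd count) until the voted response is reliable."""
        readouts = [noisy_readout(reference) for _ in range(3)]
        while True:
            response, reliability = vote_with_reliability(readouts)
            if reliability >= threshold or len(readouts) >= max_rounds:
                return response, reliability
            readouts.extend(noisy_readout(reference) for _ in range(2))

    reference = [random.getrandbits(1) for _ in range(128)]   # enrolled response
    response, reliability = reproduce_response(reference)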
In the last few years, there has been significant interest in developing methods to search over encrypted data. In the case of range queries, a simple solution is to encrypt the contents of the database using an order-preserving encryption (OPE) scheme (i.e., an encryption scheme that supports comparisons over encrypted values). However, Naveed et al. (CCS 2015) recently showed that OPE-encrypted databases are extremely vulnerable to "inference attacks." In this work, we consider a related primitive called order-revealing encryption (ORE), which is a generalization of OPE that allows for stronger security. We begin by constructing a new ORE scheme for small message spaces which achieves the "best-possible" notion of security for ORE. Next, we introduce a "domain extension" technique and apply it to our small-message-space ORE. While our domain-extension technique does incur a loss in security, the resulting ORE scheme we obtain is more secure than all existing (stateless and non-interactive) OPE and ORE schemes which are practical. All of our constructions rely only on symmetric primitives. As part of our analysis, we also give a tight lower bound for OPE and show that no efficient OPE scheme can satisfy best-possible security if the message space contains just three messages. Thus, achieving strong notions of security for even small message spaces requires moving beyond OPE. Finally, we examine the properties of our new ORE scheme and show how to use it to construct an efficient range query protocol that is robust against the inference attacks of Naveed et al. We also give a full implementation of our new ORE scheme, and show that not only is our scheme more secure than existing OPE schemes, it is also faster: encrypting a 32-bit integer requires just 55 microseconds, which is more than 65 times faster than existing OPE schemes.
Privacy-preserving range queries allow encrypting data while still enabling queries on ciphertexts if their corresponding plaintexts fall within a requested range. This gives a data owner the possibility to outsource data collections to a cloud service provider without sacrificing privacy or losing the ability to filter the data. However, existing methods for range queries either leak additional information (like the ordering of the complete data set) or slow down the search process tremendously by requiring the provider to touch every ciphertext in the data collection. We present a novel scheme that only leaks the access pattern while supporting amortized poly-logarithmic search time. Our construction is based on the novel idea of enabling the cloud service provider to compare requested range queries. By doing so, the cloud service provider can use the access pattern to speed up search time for future range queries. On the one hand, values that have fallen within a queried range are stored in an interactively built index for future requests. On the other hand, values that have not been queried do not leak any information to the cloud service provider and stay perfectly secure. In order to show its practicability, we have implemented our scheme and give a detailed runtime evaluation.
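A minimal sketch of the interactively built index, with the encryption and the client-aided comparisons stubbed out: never-queried ciphertexts sit in an unordered pool, and items revealed by a range query migrate into a sorted index that future overlapping queries can binary-search.

    # Sketch of an access-pattern-built index for encrypted range queries (illustrative).
    import bisect

    class RangeStore:
        def __init__(self):
            self.pool = []    # never-queried ciphertexts: leak nothing, scanned linearly
            self.keys = []    # revealed order keys, kept sorted
            self.cts = []     # ciphertexts aligned with self.keys

        def insert(self, ciphertext):
            self.pool.append(ciphertext)

        def range_query(self, lo, hi, client_reveal):
            """client_reveal(ct) -> order key if ct's plaintext lies in [lo, hi], else None.
            Matching pool items migrate into the sorted index; future queries that
            overlap this range binary-search instead of scanning everything."""
            for ct in list(self.pool):                 # one-time linear scan
                key = client_reveal(ct)
                if key is not None:
                    self.pool.remove(ct)
                    pos = bisect.bisect_left(self.keys, key)
                    self.keys.insert(pos, key)
                    self.cts.insert(pos, ct)           # indexed from now on
            left = bisect.bisect_left(self.keys, lo)
            right = bisect.bisect_right(self.keys, hi)
            return self.cts[left:right]

    store = RangeStore()
    secret = {b"ct5": 5, b"ct17": 17, b"ct42": 42}     # client-side knowledge only
    for ct in secret:
        store.insert(ct)
    hits = store.range_query(10, 50, lambda ct: secret[ct] if 10 <= secret[ct] <= 50 else None)
    print(hits)   # [b'ct17', b'ct42']; later overlapping queries hit the index directly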
Recently there has been much interest in performing search queries over encrypted data to enable functionality while protecting sensitive data. One particularly efficient mechanism for executing such queries is order-preserving encryption/encoding (OPE), which results in ciphertexts that preserve the relative order of the underlying plaintexts, thus allowing range and comparison queries to be performed directly on ciphertexts. Recently, Popa et al. (SP 2013) gave the first construction of an ideally-secure OPE scheme, and Kerschbaum (CCS 2015) showed how to achieve the even stronger notion of frequency-hiding OPE. However, as Naveed et al. (CCS 2015) have recently demonstrated, these constructions remain vulnerable to several attacks. Additionally, all previous ideal OPE schemes (with or without frequency-hiding) either require a large round complexity of O(log n) rounds for each insertion or a large persistent client storage of size O(n), where n is the number of items in the database. It is thus desirable to achieve a range query scheme addressing both issues gracefully. In this paper, we propose an alternative approach to range queries over encrypted data that is optimized to support insert-heavy workloads as are common in "big data" applications, while still maintaining search functionality and achieving stronger security. Specifically, we propose a new primitive called partial order preserving encoding (POPE) that achieves ideal OPE security with frequency hiding and also leaves a sizable fraction of the data pairwise incomparable. It uses only O(1) persistent and O(n^ε) non-persistent client storage for O(n^(1-ε)) search queries. This improved security and performance makes our scheme better suited for today's insert-heavy databases.
Anonymous authentication allows one to authenticate herself without revealing her identity, and has become an important technique for constructing privacy-preserving Internet connections. Anonymous password authentication is highly desirable, as it enables a client to authenticate herself with a human-memorable password while preserving her privacy. In this paper, we introduce a novel approach for designing anonymous password-authenticated key exchange (APAKE) protocols using algebraic message authentication codes (MACs), where an algebraic MAC wrapped by a password is used by a client for anonymous authentication, and a server issues algebraic MACs to clients and acts as the verifier of login protocols. Our APAKE construction is secure provided that the algebraic MAC is strongly existentially unforgeable under random message and chosen verification queries attack (suf-rmva), is weakly pseudorandom and tag-randomization simulatable, and has simulation-sound extractable non-interactive zero-knowledge (SE-NIZK) proofs. To design practical APAKE protocols, we instantiate an algebraic MAC based on the q-SDH assumption which satisfies all the required properties, and construct credential presentation algorithms for the MAC which have optimal efficiency for a randomize-then-prove paradigm. Based on the algebraic MAC, we instantiate a highly practical APAKE protocol, denoted APAKE, which is much more efficient than the mechanisms specified by ISO/IEC 20009-4. An efficient revocation mechanism for APAKE is also proposed.
We integrate APAKE into TLS to present an anonymous client authentication mode where clients holding passwords can authenticate themselves to a server anonymously. Our implementation with 128-bit security shows that the average connection time of APAKE-based ciphersuite is 2.8 ms. With APAKE integrated into the OpenSSL library and using an Apache web server on a 2-core desktop computer, we could serve 953 ECDHE-ECDSA-AES128-GCM-SHA256 HTTPS connections per second for a 10 KB payload. Compared to ECDSA-signed elliptic curve Diffie-Hellman ciphersuite with mutual authentication, this means a 0.27 KB increased handshake size and a 13% reduction in throughput.
This paper introduces typy, a statically typed programming language embedded by reflection into Python. typy features a fragmentary semantics, i.e., it delegates semantic control over each term, drawn from Python's fixed concrete and abstract syntax, to some contextually relevant user-defined semantic fragment. The delegated fragment programmatically 1) typechecks the term (following a bidirectional protocol); and 2) assigns dynamic meaning to the term by computing a translation to Python. We argue that this design is expressive, with examples of fragments that express the static and dynamic semantics of 1) functional records; 2) labeled sums (with nested pattern matching a la ML); 3) a variation on JavaScript's prototypal object system; and 4) typed foreign interfaces to Python and OpenCL. These semantic structures are, or would need to be, defined primitively in conventionally structured languages. We further argue that this design is compositionally well-behaved. It avoids the expression problem and the problems of grammar composition because the syntax is fixed. Moreover, programs are semantically stable under fragment composition (i.e., defining a new fragment will not change the meaning of existing program components).
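A toy sketch of the fragment protocol's shape in plain Python (the names are illustrative, not typy's actual API): the fragment is delegated an AST term, typechecks it bidirectionally via syn/ana, and assigns meaning by translation.

    # Sketch of a bidirectional fragment over Python's fixed syntax (illustrative).
    import ast

    class IntFragment:
        """A toy fragment giving semantics to integer literals and `+`."""

        def syn(self, term):
            """Synthesize a type for the term, or fail."""
            if isinstance(term, ast.Constant) and isinstance(term.value, int):
                return "int"
            if isinstance(term, ast.BinOp) and isinstance(term.op, ast.Add):
                self.ana(term.left, "int")            # check subterms against 'int'
                self.ana(term.right, "int")
                return "int"
            raise TypeError(f"fragment cannot type {ast.dump(term)}")

        def ana(self, term, expected):
            """Analyze (check) the term against an expected type."""
            if self.syn(term) != expected:
                raise TypeError(f"expected {expected}")

        def translate(self, term):
            """Assign dynamic meaning: for this toy fragment, the identity translation."""
            return term

    frag = IntFragment()
    term = ast.parse("1 + 2", mode="eval").body
    print(frag.syn(term))                                          # 'int'
    print(eval(compile(ast.Expression(body=frag.translate(term)), "<frag>", "eval")))  # 3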