Biblio
Federated identity providers, e.g., Facebook and PayPal, offer a convenient means for authenticating users to third-party applications. Unfortunately such cross-site authentications carry privacy and tracking risks. For example, federated identity providers can learn what applications users are accessing; meanwhile, the applications can know the users' identities in reality. This paper presents Crypto-Book, an anonymizing layer enabling federated identity authentications while preventing these risks. Crypto-Book uses a set of independently managed servers that employ a (t,n)-threshold cryptosystem to collectively assign credentials to each federated identity (in the form of either a public/private keypair or blinded signed messages). With the credentials in hand, clients can then leverage anonymous authentication techniques such as linkable ring signatures or partially blind signatures to log into third-party applications in an anonymous yet accountable way. We have implemented a prototype of Crypto-Book and demonstrated its use with three applications: a Wiki system, an anonymous group communication system, and a whistleblower submission system. Crypto-Book is practical and has low overhead: in a deployment within our research group, Crypto-Book group authentication took 1.607s end-to-end, an overhead of 1.2s compared to traditional non-privacy-preserving federated authentication.
Modern distributed key-value stores are offering superior performance, incremental scalability, and fine availability for data-intensive computing and cloud-based applications. Among those distributed data stores, the designs that ensure the confidentiality of sensitive data, however, have not been fully explored yet. In this paper, we focus on designing and implementing an encrypted, distributed, and searchable key-value store. It achieves strong protection on data privacy while preserving all the above prominent features of plaintext systems. We first design a secure data partition algorithm that distributes encrypted data evenly across a cluster of nodes. Based on this algorithm, we propose a secure transformation layer that supports multiple data models in a privacy-preserving way, and implement two basic APIs for the proposed encrypted key-value store. To enable secure search queries for secondary attributes of data, we leverage searchable symmetric encryption to design the encrypted secondary indexes which consider security, efficiency, and data locality simultaneously, and further enable secure query processing in parallel. For completeness, we present formal security analysis to demonstrate the strong security strength of the proposed designs. We implement the system prototype and deploy it to a cluster at Microsoft Azure. Comprehensive performance evaluation is conducted in terms of Put/Get throughput, Put/Get latency under different workloads, system scaling cost, and secure query performance. The comparison with Redis shows that our prototype can function in a practical manner.
In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. Interestingly, we apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a 1-ε approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate impact of this result, one can use uniform sampling to solve these problems in streaming, turnstile or Map-Reduce settings. Indeed, our results by characterizing heavy subgraph problems address Open Problem 13 at the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling, in the context of graph streams. Recently Bhattacharya et al. in STOC 2015 provide the first one pass algorithm for the densest subgraph problem in the streaming model with additions and deletions to its edges, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using \textasciitildeO(n) space, where factors of ε and log(n) are suppressed in the \textasciitildeO notation. In this paper we improve the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using \textasciitildeO(n) space.
Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks – at the level of small network subgraphs – remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns. Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
In a secret sharing scheme a dealer shares a secret s among n parties such that an adversary corrupting up to t parties does not learn s, while any t+1 parties can efficiently recover s. Over a long period of time all parties may be corrupted thus violating the threshold, which is accounted for in Proactive Secret Sharing (PSS). PSS schemes periodically rerandomize (refresh) the shares of the secret and invalidate old ones. PSS retains confidentiality even when all parties are corrupted over the lifetime of the secret, but no more than t during a certain window of time, called the refresh period. Existing PSS schemes only guarantee secrecy in the presence of an honest majority with less than n2 total corruptions during a refresh period; an adversary corrupting a single additional party, even if only passively, obtains the secret. This work is the first feasibility result demonstrating PSS tolerating a dishonest majority, it introduces the first PSS scheme secure against t passive adversaries without recovery of lost shares, it can also recover from honest faulty parties losing their shares, and when tolerating e faults the scheme tolerates t passive corruptions. A non-robust version of the scheme can tolerate t active adversaries, and mixed adversaries that control a combination of passively and actively corrupted parties that are a majority, but where less than n/2-e of such corruptions are active. We achieve these high thresholds with O(n4) communication when sharing a single secret, and O(n3) communication when sharing multiple secrets in batches.
Kernel hardening has been an important topic since many applications and security mechanisms often consider the kernel as part of their Trusted Computing Base (TCB). Among various hardening techniques, Kernel Address Space Layout Randomization (KASLR) is the most effective and widely adopted defense mechanism that can practically mitigate various memory corruption vulnerabilities, such as buffer overflow and use-after-free. In principle, KASLR is secure as long as no memory leak vulnerability exists and high entropy is ensured. In this paper, we introduce a highly stable timing attack against KASLR, called DrK, that can precisely de-randomize the memory layout of the kernel without violating any such assumptions. DrK exploits a hardware feature called Intel Transactional Synchronization Extension (TSX) that is readily available in most modern commodity CPUs. One surprising behavior of TSX, which is essentially the root cause of this security loophole, is that it aborts a transaction without notifying the underlying kernel even when the transaction fails due to a critical error, such as a page fault or an access violation, which traditionally requires kernel intervention. DrK turned this property into a precise timing channel that can determine the mapping status (i.e., mapped versus unmapped) and execution status (i.e., executable versus non-executable) of the privileged kernel address space. In addition to its surprising accuracy and precision, DrK is universally applicable to all OSes, even in virtualized environments, and generates no visible footprint, making it difficult to detect in practice. We demonstrated that DrK can break the KASLR of all major OSes (i.e., Windows, Linux, and OS X) with near-perfect accuracy in under a second. Finally, we propose potential countermeasures that can effectively prevent or mitigate the DrK attack. We urge our community to be aware of the potential threat of having Intel TSX, which is present in most recent Intel CPUs – 100% in workstation and 60% in high-end Intel CPUs since Skylake – and is even available on Amazon EC2 (X1).
A biased random-key genetic algorithm (BRKGA) is a general search procedure for finding optimal or near-optimal solutions to hard combinatorial optimization problems. It is derived from the random-key genetic algorithm of Bean (1994), differing in the way solutions are combined to produce offspring. BRKGAs have three key features that specialize genetic algorithms: A fixed chromosome encoding using a vector of N random keys or alleles over the real interval [0, 1), where the value of N depends on the instance of the optimization problem; A well-defined evolutionary process adopting parameterized uniform crossover to generate offspring and thus evolve the population; The introduction of new chromosomes called mutants in place of the mutation operator usually found in evolutionary algorithms. Such features simplify and standardize the procedure with a set of self-contained tasks from which only one is problem-dependent: that of decoding a chromosome, i.e. using, the keys to construct a solution to the underlying optimization problem, from which the objective function value or fitness can be computed. BRKGAs have the additional characteristic that, under a weak assumption, crossover always produces feasible offspring and, therefore, a repair or healing procedure to recover feasibility is not required in a BRKGA. In this tutorial we review the basic components of a BRKGA and introduce an Application Programming Interface (API) for quick implementations of BRKGA heuristics. We then apply the framework to a number of hard combinatorial optimization problems, including 2-D and 3-D packing problems, network design problems, routing problems, scheduling problems, and data mining. We conclude with a brief review of other domains where BRKGA heuristics have been applied.
In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts showed constant growth for years prior, but since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which the sole enumeration of active IPv4 addresses is of little use to characterize recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. Our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to networkm restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others.
BootJacker is a proof-of-concept attack tool which demonstrates that authentication mechanisms employed by an operating system can be bypassed by obtaining physical access and simply forcing a restart. The key insight that enables this attack is that the contents of memory on some machines are fully preserved across a warm boot. Upon a reboot, BootJacker uses this residual memory state to revive the original host operating system environment and run malicious payloads. Using BootJacker, an attacker can break into a locked user session and gain access to open encrypted disks, web browser sessions or other secure network connections. BootJacker's non-persistent design makes it possible for an attacker to leave no traces on the victim machine.
In the Internet of Things (IoT), Internet-connected things provide an influx of data and resources that offer unlimited possibility for applications and services. Smart City IoT systems refer to the things that are distributed over wide physical areas covering a whole city. While the new breed of data and resources looks promising, building applications in such large scale IoT systems is a difficult task due to the distributed and dynamic natures of entities involved, such as sensing, actuating devices, people and computing resources. In this paper, we explore the process of developing Smart City IoT applications from a coordination-based perspective. We show that a distributed coordination model that oversees such a large group of distributed components is necessary in building Smart City IoT applications. In particular, we propose Adaptive Distributed Dataflow, a novel Dataflow-based programming model that focuses on coordinating city-scale distributed systems that are highly heterogeneous and dynamic.
After a software system is compromised, it can be difficult to understand what vulnerabilities attackers exploited. Any information residing on that machine cannot be trusted as attackers may have tampered with it to cover their tracks. Moreover, even after an exploit is known, it can be difficult to determine whether it has been used to compromise a given machine. Aviation has long-used black boxes to better understand the causes of accidents, enabling improvements that reduce the likelihood of future accidents. Many attacks introduce abnormal control flows to compromise systems. In this paper, we present BlackBox, a monitoring system for COTS software. Our techniques enable BlackBox to efficiently monitor unexpected and potentially harmful control flow in COTS binaries. BlackBox constructs dynamic profiles of an application's typical control flows to filter the vast majority of expected control flow behavior, leaving us with a manageable amount of data that can be logged across the network to remote devices. Modern applications make extensive use of dynamically generated code, some of which varies greatly between executions. We introduce support for code generators that can detect security-sensitive behaviors while allowing BlackBox to avoid logging the majority of ordinary behaviors. We have implemented BlackBox in DynamoRIO. We evaluate the runtime overhead of BlackBox, and show that it can effectively monitor recent versions of Microsoft Office and Google Chrome. We show that in ROP, COOP, and state- of-the-art JIT injection attacks, BlackBox logs the pivotal actions by which the attacker takes control, and can also blacklist those actions to prevent repeated exploits.
Centrality measures have perpetually been helpful to find the foremost central or most powerful node within the network. There are numerous strategies to compute centrality of a node however in social networks betweenness centrality is the most widely used approach to bifurcate communities within the network, to find out the susceptibility within the complex networks and to generate the scale free networks whose degree distribution follows the power law. In this paper, we've computed betweenness centrality by identifying communities lying within the network. Our algorithm efficiently updates the centrality of the nodes whenever any edge or vertex addition or deletion takes place within the dynamic network by modifying solely a subset of vertices. For the vertex addition, Incremental Algorithm has been used in which Streaming graphs has also been considered. Brandes approach is the most widely used approach for finding out the betweenness centrality however it's still expensive for growing networks since it takes O(mn+n2logn) amount of time and O(n+m) space however our approach efficiently updates the centrality of the nodes by taking O(textbarStextbarn+textbarStextbarnlogn) amount of time where textbarStextbar is the subset of the vertices,m is the number of edges, n is the number of vertices and textbarStextbar≤n holds true.
In this research, we aim to suggest a method for designing trustworthy PRVAs (product recommendation virtual agents). We define an agent's trustworthiness as being operated by user emotion and knowledgeableness perceived by humans. Also, we suggest a user inner state transition model for increasing trust. To increase trust, we aim to cause user emotion to transition to positive by using emotional contagion and to cause user knowledgeableness perceived to become higher by increasing an agent's knowledge. We carried out two experiments to inspect this model. In experiment 1, the PRVAs recommended package tours and became highly knowledgeable in the latter half of ten recommendations. In experiment 2, the PRVAs recommended the same package tours and expressed a positive emotion in the latter half. As a result, participants' inner states transitioned as we expected, and it was proved that this model was valuable for PRVA recommendation.
A major challenge of the existing attack detection approaches is the identification of relevant information to a particular situation, and the use of such information to perform multi-evidence intrusion detection. Addressing such a limitation requires integrating several aspects of context to better predict, avoid and respond to impending attacks. The quality and adequacy of contextual information is important to decrease uncertainty and correctly identify potential cyber-attacks. In this paper, a systematic methodology has been used to identify contextual dimensions that improve the effectiveness of detecting cyber-attacks. This methodology combines graph, probability, and information theories to create several context-based attack prediction models that analyze data at a high- and low-level. An extensive validation of our approach has been performed using a prototype system and several benchmark intrusion detection datasets yielding very promising results.
General sparse matrix-matrix multiplication (SpGEMM) is a core component of many algorithms. A number of recent works have used high throughput graphics processing units (GPUs) to accelerate SpGEMM. However, exploiting the power of GPUs for SpGEMM requires addressing a number of challenges, including highly imbalanced workloads and large numbers of inefficient random global memory accesses. This paper presents a SpGEMM algorithm which uses several novel techniques to overcome these problems. We first propose two low cost methods to achieve perfect load balancing during the most expensive step in SpGEMM. Next, we show how to eliminate nearly all random global memory accesses using shared memory based hash tables. To optimize the performance of the hash tables, we propose a lightweight method to estimate the number of nonzeros in the output matrix. We compared our algorithm to the CUSP, CUSPARSE and the state-of-the-art BHSPARSE GPU SpGEMM algorithms, and show that it performs 5.6x, 2.4x and 1.5x better on average, and up to 11.8x, 9.5x and 2.5x better in the best case, respectively. Furthermore, we show that our algorithm performs especially well on highly imbalanced and unstructured matrices.
Anti-reverse engineering is one of the core technologies of software intellectual property protection, prevailing techniques of which are static and dynamic obfuscation. Static obfuscation can only prevent static analysis with code mutation done before execution by compressing, encrypting and obfuscating. Dynamic obfuscation can prevent both static and dynamic analysis, which changes code while being executed. Popular dynamic obfuscation techniques include self-modifying code and virtual machine protection. Despite the higher safety, dynamic obfuscation has its problems: 1) code appear in plain text remains a long time; 2) control flow is exposable; 3) time and space overheads are too big. This paper presents a binary protection scheme using dynamic fine-grained code hiding and obfuscation named dynFCHO. In this scheme, basic blocks to be protected are hidden in original code and will be restored while being executed. Code obfuscation is also implemented additionally to enhance safety. Experiments prove that dynFCHO can effectively resist static and dynamic analysis without destructing original software functions. It can be used on most binary programs compiled by standard compilers. This scheme can be widely used with the advantages of strong protection, light-weight implementation, and good extendibility.
The Security Behavior Intentions Scale (SeBIS) measures the computer security attitudes of end-users. Because intentions are a prerequisite for planned behavior, the scale could therefore be useful for predicting users' computer security behaviors. We performed three experiments to identify correlations between each of SeBIS's four sub-scales and relevant computer security behaviors. We found that testing high on the awareness sub-scale correlated with correctly identifying a phishing website; testing high on the passwords sub-scale correlated with creating passwords that could not be quickly cracked; testing high on the updating sub-scale correlated with applying software updates; and testing high on the securement sub-scale correlated with smartphone lock screen usage (e.g., PINs). Our results indicate that SeBIS predicts certain computer security behaviors and that it is a reliable and valid tool that should be used in future research.
We present a novel Cyber Security analytics framework. We demonstrate a comprehensive cyber security monitoring system to construct cyber security correlated events with feature selection to anticipate behaviour based on various sensors.
Internet Engineering Task Force (IETF) is working on 6LoW-PAN standard which allows smart devices to be connected to Internet using large address space of IPV6. 6LoWPAN acts as a bridge between resource constrained devices and the Internet. The entire IoT space is vulnerable to local threats as well as the threats from the Internet. Due to the random deployment of the network nodes and the absence of tamper resistant shield, the resource constrained IoT elements face potential insider attacks even in presence of front line defense mechanism that involved cryptographic protocols. To detect such insidious nodes, an Intrusion Detection System (IDS) is required as a second line of defense. In this paper, we attempt to analyze such potential insider attacks, while reviewing the IDS based countermeasures. We attempt to propose a baseline for designing IDS for 6LoWPAN based IoT system.
Embedded Systems (ES) are an integral part of Cyber-Physical Systems (CPS), the Internet of Things (IoT), and consumer devices like smartphones. ES often have limited resources, and - if used in CPS and IoT - have to satisfy real time requirements. Therefore, ES rarely employ the security measures established for computer systems and networks. Due to the growth of both CPS and IoT it is important to identify ongoing attacks on ES without interfering with realtime constraints. Furthermore, security solutions that can be retrofit to legacy systems are desirable, especially when ES are used in Industrial Control Systems (ICS) that often maintain the same hardware for decades. To tackle this problem, several researchers have proposed using side-channels (i.e., physical emanations accompanying cyber processes) to detect such attacks. While prior work focuses on the anomaly detection approach, this might not always be sufficient, especially in complex ES whose behavior depends on the input data. In this paper, we determine whether one of the most common attacks - a buffer overflow attack - generates distinct side-channel signatures if executed on a vulnerable ES. We only consider the power consumption side-channel. We collect and analyze power traces from normal program operation and four cases of buffer overflow attack categories: (i) crash program execution, (ii) injection of executable code, (iii) return to existing function, and (iv) Return Oriented Programming (ROP) with gadgets. Our analysis shows that for some of these cases a power signature-based detection of a buffer overflow attack is possible.
Abstractions make building complex systems possible. Many facilities provided by a modern programming language are directly designed to build a certain style of abstraction. Abstractions also aim to enhance code reusability, thus enhancing programmer productivity and effectiveness. Real-world software systems can grow to have a complicated hierarchy of abstractions. Often, the hierarchy grows unnecessarily deep, because the programmers have envisioned the most generic use cases for a piece of code to make it reusable. Sometimes, the abstractions used in the program are not the appropriate ones, and it would be simpler for the higher level client to circumvent such abstractions. Another problem is the impedance mismatch between different pieces of code or libraries coming from different projects that are not designed to work together. Interoperability between such libraries are often hindered by abstractions, by design, in the name of hiding implementation details and encapsulation. These problems necessitate forms of abstraction that are easy to manipulate if needed. In this paper, we describe a powerful mechanism to create white-box abstractions, that encourage flatter hierarchies of abstraction and ease of manipulation and customization when necessary: program refinement. In so doing, we rely on the basic principle that writing directly in the host programming language is as least restrictive as one can get in terms of expressiveness, and allow the programmer to reuse and customize existing code snippets to address their specific needs.
We develop a systematic approach for analyzing client-server applications that aim to hide sensitive user data from untrusted servers. We then apply it to Mylar, a framework that uses multi-key searchable encryption (MKSE) to build Web applications on top of encrypted data. We demonstrate that (1) the Popa-Zeldovich model for MKSE does not imply security against either passive or active attacks; (2) Mylar-based Web applications reveal users' data and queries to passive and active adversarial servers; and (3) Mylar is generically insecure against active attacks due to system design flaws. Our results show that the problem of securing client-server applications against actively malicious servers is challenging and still unsolved. We conclude with general lessons for the designers of systems that rely on property-preserving or searchable encryption to protect data from untrusted servers.
Synchronous replication is critical for today's enterprise IT organization. It is mandatory by regulation in several countries for some types of organizations, including banks and insurance companies. The technology has been available for a long period of time, but due to speed of light and maximal latency limitations, it is usually limited to a distance of 50-100 miles. Flight data recorders, also known as black boxes, have long been used to record the last actions which happened in airplanes at times of disasters. We present an integration between an Enterprise Data Recorder and an asynchronous replication mechanism, which allows breaking the functional limits that light speed imposes on synchronous replication.
When reasoning about software security, researchers and practitioners use the phrase ``attack surface'' as a metaphor for risk. Enumerate and minimize the ways attackers can break in then risk is reduced and the system is better protected, the metaphor says. But software systems are much more complicated than their surfaces. We propose function- and file-level attack surface metrics–-proximity and risky walk–-that enable fine-grained risk assessment. Our risky walk metric is highly configurable: we use PageRank on a probability-weighted call graph to simulate attacker behavior of finding or exploiting a vulnerability. We provide evidence-based guidance for deploying these metrics, including an extensive parameter tuning study. We conducted an empirical study on two large open source projects, FFmpeg and Wireshark, to investigate the potential correlation between our metrics and historical post-release vulnerabilities. We found our metrics to be statistically significantly associated with vulnerable functions/files with a small-to-large Cohen's d effect size. Our prediction model achieved an increase of 36% (in FFmpeg) and 27% (in Wireshark) in the average value of F-measure over a base model built with SLOC and coupling metrics. Our prediction model outperformed comparable models from prior literature with notable improvements: 58% reduction in false negative rate, 81% reduction in false positive rate, and 548% increase in F-measure. These metrics advance vulnerability prevention by [(a)] being flexible in terms of granularity, performing better than vulnerability prediction literature, and being tunable so that practitioners can tailor the metrics to their products and better assess security risk.
Five core primitives belonging to most distributed systems are presented. These primitives apply well to systems with large amounts of data, scalability concerns, heterogeneity concerns, temporal concerns, and elements of unknown pedigree with possible nefarious intent. These primitives form the basic building blocks for a Network of 'Things' (NoT), including the Internet of Things (IoT). This talk discusses the underlying and foundational science of IoT. To our knowledge, the ideas and the manner in which the science underlying IoT is presented here is unique.