Biblio
Providing a global understanding of privacy is crucial, because everything is connected. Companies now offer their customers services that also give the companies more access to their data and daily activity; electricity providers, for example, market smart meters as a new service that helps reduce electricity usage by monitoring readings in real time. Although users may benefit from this extra service, it compromises their privacy by granting constant access to the readings. Since smart meters report real electricity readings, whoever has access to them can determine which devices are consuming energy at a specific moment and how much it costs. This kind of information can be exploited by numerous types of people; unauthorized use of it is an invasion of privacy and may lead to far more severe consequences. This paper proposes an algorithm for the comparison and analysis of smart meter readings that considers the time and temperature at each second to identify per-household usage patterns, recognizing appliance activity at each second with time complexity O(log(m)).
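To make the per-reading lookup concrete, the following minimal Python sketch matches an observed change in power draw to the closest of m known appliance signatures by binary search, which is where an O(log(m)) cost per second could come from; the signature table, wattages, and tolerance are illustrative assumptions, not the paper's data or algorithm.

```python
# Minimal sketch (not the paper's algorithm): match a step change in a smart-meter
# reading to one of m known appliance signatures by binary search over wattage,
# giving an O(log m) per-reading lookup. Appliance names and wattages are assumed.
import bisect

# (watts, appliance) pairs sorted by wattage
SIGNATURES = sorted([(60, "lamp"), (150, "refrigerator compressor"),
                     (1100, "microwave"), (2000, "kettle"), (3500, "oven")])
WATTS = [w for w, _ in SIGNATURES]

def identify(delta_watts, tolerance=50):
    """Return the appliance whose signature is closest to the observed power
    step, or None if nothing is within the tolerance."""
    i = bisect.bisect_left(WATTS, delta_watts)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(WATTS)]
    best = min(candidates, key=lambda j: abs(WATTS[j] - delta_watts))
    if abs(WATTS[best] - delta_watts) <= tolerance:
        return SIGNATURES[best][1]
    return None

print(identify(2040))   # -> 'kettle'
```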
Password vaults are used to store login credentials, usually encrypted by a master password, relieving the user from memorizing a large number of complex passwords. To manage accounts on multiple devices, vaults are often stored at an online service, which substantially increases the risk of leaking the (encrypted) vault. To protect the master password against guessing attacks, previous work has introduced cracking-resistant password vaults based on Honey Encryption. If decryption is attempted with a wrong master password, they output plausible-looking decoy vaults, thus seemingly disabling offline guessing attacks. In this work, we propose attacks against cracking-resistant password vaults that are able to distinguish between real and decoy vaults with high accuracy and thus circumvent the offered protection. These attacks are based on differences in the generated distribution of passwords, which are measured using Kullback-Leibler divergence. Our attack is able to rank the correct vault into the 1.3% most likely vaults (on median), compared to 37.8% of the best-reported attack in previous work. (Note that smaller ranks are better, and 50% is achievable by random guessing.) We demonstrate that this attack is, to a certain extent, a fundamental problem with all static Natural Language Encoders (NLE), where the distribution of decoy vaults is fixed. We propose the notion of adaptive NLEs and demonstrate that they substantially limit the effectiveness of such attacks. We give one example of an adaptive NLE based on Markov models and show that the attack is only able to rank the decoy vaults with a median rank of 35.1%.
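The ranking idea can be illustrated with a small Python sketch that scores candidate vaults by the Kullback-Leibler divergence between a reference password distribution and the distribution induced by each decrypted vault; the toy distributions and vault contents below are assumptions for illustration, not the paper's attack models.

```python
# Illustrative sketch of the ranking idea (not the paper's exact attack):
# score every candidate vault by the KL divergence between its induced password
# distribution and a reference distribution of real-world passwords, then rank
# vaults by that score (smaller divergence = more "real"-looking).
import math
from collections import Counter

def kl_divergence(p, q, eps=1e-9):
    """D_KL(p || q) over a shared support, with smoothing for zero entries."""
    support = set(p) | set(q)
    return sum(p.get(x, eps) * math.log(p.get(x, eps) / q.get(x, eps))
               for x in support)

def vault_distribution(vault):
    counts = Counter(vault)
    total = sum(counts.values())
    return {pw: c / total for pw, c in counts.items()}

# Toy reference model of how real users pick passwords (assumed).
reference = {"123456": 0.4, "password": 0.3, "qwerty": 0.2, "x7!Kp": 0.1}

candidates = {
    "decoy A": ["x7!Kp", "qwerty", "x7!Kp"],        # unusually uniform choices
    "real":    ["123456", "password", "123456"],    # matches human habits
}
ranking = sorted(candidates,
                 key=lambda name: kl_divergence(vault_distribution(candidates[name]),
                                                reference))
print(ranking)   # the vault closest to the reference distribution ranks first
```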
In settings where data instances are generated sequentially or in streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications such as training anomaly detectors, the data streams may consist of private information or transactions and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies can trade off privacy, label complexity, and generalization performance.
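A hedged sketch of the two strategies, using a linear classifier with hinge loss: a randomized screening rule keeps mostly near-boundary points, and mini-batch updates are perturbed with clipped gradients and Laplace noise. The noise calibration, parameters, and synthetic stream are illustrative assumptions rather than the paper's exact mechanisms.

```python
# Hedged sketch of the two strategies described above (parameters and noise
# calibration are illustrative): (1) a randomized screening rule that mostly
# keeps points near the decision boundary, and (2) noisy mini-batch updates.
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(5)                      # linear model weights
lr, batch, eps = 0.1, 16, 1.0        # learning rate, mini-batch size, privacy budget
buffer = []

def screen(x, margin=0.5, p_far=0.05):
    """Randomized active-learning heuristic: always keep near-boundary points,
    keep far-away (uninformative) ones only with small probability."""
    score = abs(np.dot(w, x))
    return score < margin or rng.random() < p_far

def noisy_update(points, labels):
    """One mini-batch update with per-example gradient clipping and Laplace noise
    (a common DP recipe; the paper's exact mechanism may differ)."""
    global w
    grads = []
    for x, y in zip(points, labels):
        g = -y * x if y * np.dot(w, x) < 1 else np.zeros_like(x)   # hinge-loss gradient
        grads.append(g / max(1.0, np.linalg.norm(g)))              # clip to norm 1
    mean_grad = np.mean(grads, axis=0)
    noise = rng.laplace(scale=2.0 / (len(points) * eps), size=w.shape)
    w = w - lr * (mean_grad + noise)

# Streaming loop over a synthetic data stream.
for _ in range(200):
    x = rng.normal(size=5)
    y = 1 if x[0] + 0.1 * rng.normal() > 0 else -1
    if screen(x):
        buffer.append((x, y))
    if len(buffer) == batch:
        xs, ys = zip(*buffer)
        noisy_update(np.array(xs), np.array(ys))
        buffer.clear()
```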
TCP congestion control has been known for its crucial role in stabilizing the Internet and preventing congestion collapses. However, with the rapid advancement in networking technologies, resulting in the emergence of challenging network environments such as data center networks (DCNs), the traditional TCP algorithm leads to several impairments. The shortcomings of TCP when deployed in DCNs have motivated the development of multiple new variants, including DCTCP, ICTCP, IA-TCP, and D2TCP, but all of these algorithms exhibit their advantages at the cost of a number of drawbacks in the Global Internet. Motivated by the belief that new innovations need to be established on top of a solid foundation with a thorough understanding of the existing, well-established algorithms, we have been working towards a comprehensive analysis of various conventional TCP algorithms in DCNs and other modern networks. This paper presents our first milestone towards the completion of our comparative study in which we present the results obtained by simulating multiple TCP variants: NewReno, Vegas, HighSpeed, Scalable, Westwood+, BIC, CUBIC, and YeAH using a fat tree architecture. Each protocol is evaluated in terms of queue length, number of dropped packets, average packet delay, and aggregate bandwidth as a percentage of the channel bandwidth.
This paper proposes a method for segmentation of nuclei of single/isolated and overlapping/touching immature white blood cells from microscopic images of B-Lineage acute lymphoblastic leukemia (ALL) prepared from peripheral blood and bone marrow aspirate. We propose a deep belief network approach for the segmentation of these nuclei. Simulation results and comparison with some of the existing methods demonstrate the efficacy of the proposed method.
In this paper, we improve recent results on the decentralized switched control problem to include the moving horizon case and apply it to a testbed system. Using known derivations for a centralized controller with look-ahead, we were able to extend the decentralized problem with finite memory to include receding horizon modal information. We then compare, in simulation, the performance of a switched controller with finite memory and look-ahead horizon to that of a linear time-invariant (LTI) controller. The decentralized controller is further tested on a real-world system consisting of multiple model-sized hovercraft.
In distributed control systems with shared resources, participating agents can improve the overall performance of the system by sharing data about their personal preferences. In this paper, we formulate and study a natural tradeoff arising in these problems between the privacy of the agents' data and the performance of the control system. We formalize privacy in terms of differential privacy of agents' preference vectors. The overall control system consists of N agents with linear discrete-time coupled dynamics, each controlled to track its preference vector. Performance of the system is measured by the mean squared tracking error. We present a mechanism that achieves differential privacy by adding Laplace noise to the shared information in a way that depends on the sensitivity of the control system to the private data. We show that for stable systems the performance cost of using this type of privacy-preserving mechanism grows as O(T^3/(N ε^2)), where T is the time horizon and ε is the privacy parameter. For unstable systems, the cost grows exponentially with time. From an estimation point of view, we establish a lower bound on the entropy of any unbiased estimator of the private data from any noise-adding mechanism that gives ε-differential privacy. We show that the mechanism achieving this lower bound is a randomized mechanism that also uses Laplace noise.
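The flavor of the mechanism can be sketched in a few lines: each agent releases its shared signal with Laplace noise scaled to the signal's sensitivity divided by ε. The sensitivity value and vector dimensions below are assumptions for illustration only.

```python
# Minimal sketch of the kind of mechanism described: an agent perturbs the signal
# it shares with Laplace noise whose scale is the sensitivity of that signal to
# the agent's private preference vector, divided by epsilon. Values are assumed.
import numpy as np

rng = np.random.default_rng(1)

def privatize(shared_signal, sensitivity, epsilon):
    """epsilon-differentially private release of a vector-valued signal via the
    Laplace mechanism (noise scale = sensitivity / epsilon per coordinate)."""
    scale = sensitivity / epsilon
    return shared_signal + rng.laplace(scale=scale, size=shared_signal.shape)

preference = np.array([1.0, -0.5, 2.0])     # an agent's private preference vector
sensitivity = 1.0                            # assumed L1 sensitivity of the shared signal
for epsilon in (0.1, 1.0, 10.0):
    released = privatize(preference, sensitivity, epsilon)
    print(epsilon, np.round(released, 2))    # weaker privacy (larger epsilon) -> less noise
```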
Scientific advancement is fueled by solid fundamental research, followed by replication, meta-analysis, and theory building. To support such advancement, researchers and government agencies have been working towards a "science of security". As in other sciences, security science requires high-quality fundamental research addressing important problems and reporting approaches that capture the information necessary for replication, meta-analysis, and theory building. The goal of this paper is to aid security researchers in establishing a baseline of the state of scientific reporting in security through an analysis of indicators of scientific research as reported in top security conferences, specifically the 2015 ACM CCS and 2016 IEEE S&P proceedings. To conduct this analysis, we employed a series of rubrics to analyze the completeness of information reported in papers relative to the type of evaluation used (e.g. empirical study, proof, discussion). Our findings indicate that some important information is often missing from papers, including explicit documentation of research objectives and threats to validity, and that relatively few replications are reported in the literature. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
This work is motivated by the rapid increase in the number of attacks on computer networks and software. In this paper we study identity snowball attacks and formally prove the correctness of suggested solutions to this type of attack (solutions based on graph reachability reduction) using a proof assistant. We propose a model of an attack graph that captures the technical information needed to compute reachability in the graph. The model has been implemented in the proof assistant PVS 6.0 (Prototype Verification System), which makes it possible to prove reachability-reduction algorithms such as Sparsest_cut.
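The reachability-reduction idea being verified can be illustrated (in Python rather than PVS) by computing the attacker-reachable set in a toy access graph before and after cutting a single edge, which is the kind of edit a cut-based algorithm such as Sparsest_cut would suggest.

```python
# Small illustration of the reachability-reduction idea (not PVS code): compute
# which machines an attacker can reach from a compromised node in a toy access
# graph, and observe how removing one edge shrinks that reachable set.
from collections import deque

def reachable(graph, start):
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

graph = {"workstation": ["admin_acct"], "admin_acct": ["server1", "server2"],
         "server1": ["domain_controller"]}
print(reachable(graph, "workstation"))   # attacker reaches the domain controller

graph["admin_acct"].remove("server1")    # cut one edge
print(reachable(graph, "workstation"))   # the domain controller is no longer reachable
```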
Dynamic Searchable Symmetric Encryption (DSSE) allows a client to perform keyword searches over encrypted files via an encrypted data structure. Despite its merits, DSSE leaks search and update patterns when the client accesses the encrypted data structure. These leakages may create severe privacy problems, as already shown, for example, by recent statistical attacks on DSSE. While Oblivious Random Access Memory (ORAM) can hide such access patterns, it incurs significant communication overhead and is therefore not yet fully practical for cloud computing systems. Hence, there is a critical need for private access schemes over the encrypted data structure that can seal the leakages of DSSE while keeping search/update operations practical. In this paper, we propose a new oblivious access scheme over the encrypted data structure for searchable encryption purposes, which we call Distributed Oblivious Data structure DSSE (DOD-DSSE). The main idea is to create a distributed encrypted incidence matrix on two non-colluding servers such that no queries on these servers can be linked to each other. This strategy prevents not only recent statistical attacks on the encrypted data structure but also other potential threats exploiting query linkability. Our security analysis proves that DOD-DSSE ensures the unlinkability of queries and therefore offers much higher security than traditional DSSE. At the same time, our performance evaluation demonstrates that DOD-DSSE is two orders of magnitude faster than ORAM-based techniques (e.g., Path ORAM), since it incurs only a small constant communication overhead. We deployed DOD-DSSE on geographically distributed Amazon EC2 servers and showed that a search/update operation on a very large dataset takes only around one second with DOD-DSSE, while it takes 3 to 13 minutes with Path ORAM-based methods.
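A toy Python sketch of the distribution idea only: rows of an (encrypted) incidence matrix are scattered across two non-colluding servers and relocated after each access, so repeated queries for the same keyword are not trivially linkable by either server. The XOR "encryption" and relocation rule are simplifications for illustration and are not the DOD-DSSE protocol.

```python
# Toy sketch of the two-server distribution idea (NOT the DOD-DSSE protocol):
# rows of a keyword-by-file incidence matrix are randomly partitioned across two
# non-colluding servers; after each access the queried row is re-randomized and
# possibly moved, so neither server alone can trivially link repeated queries.
import os, random

NUM_KEYWORDS, NUM_FILES = 8, 16

def encrypt_row(row_bits):
    """Fake encryption for illustration: a one-time XOR pad per row."""
    pad = os.urandom(NUM_FILES)
    ct = bytes(b ^ p for b, p in zip(row_bits, pad))
    return ct, pad

# Build a random incidence matrix and scatter its rows over servers 0 and 1.
matrix = [bytes(random.getrandbits(1) for _ in range(NUM_FILES))
          for _ in range(NUM_KEYWORDS)]
servers = [{}, {}]
location, pads = {}, {}
for kw, row in enumerate(matrix):
    s = random.randrange(2)
    ct, pad = encrypt_row(row)
    servers[s][kw] = ct
    location[kw], pads[kw] = s, pad

def search(kw):
    """Fetch, decrypt, then re-encrypt and relocate the row so the next query
    for the same keyword may go to the other server."""
    s = location[kw]
    ct = servers[s].pop(kw)
    row = bytes(b ^ p for b, p in zip(ct, pads[kw]))
    new_s = random.randrange(2)
    servers[new_s][kw], pads[kw] = encrypt_row(row)
    location[kw] = new_s
    return [i for i, bit in enumerate(row) if bit]

print(search(3)); print(search(3))   # the two accesses need not hit the same server
```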
We propose a simple and efficient searchable symmetric encryption scheme based on a Bitmap index that evaluates Boolean queries. Our scheme provides a practical solution in settings where communications and computations are very constrained as it offers a suitable trade-off between privacy and performance.
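The bitmap mechanics such a scheme builds on can be shown in plaintext: one bitmap per keyword, with Boolean queries reduced to bitwise operations. The encryption layer, which is the paper's actual contribution, is omitted in this sketch.

```python
# Minimal sketch of evaluating Boolean queries over a Bitmap index (the
# searchable-encryption layer is omitted; this only shows the plaintext
# bitmap mechanics such a scheme builds on).
documents = {
    0: {"invoice", "urgent"},
    1: {"invoice"},
    2: {"urgent", "contract"},
    3: {"contract", "invoice"},
}
keywords = sorted({w for ws in documents.values() for w in ws})

# One bitmap per keyword: bit i is set iff document i contains the keyword.
bitmap = {w: sum(1 << d for d, ws in documents.items() if w in ws) for w in keywords}

def matches(bits):
    return [d for d in documents if bits >> d & 1]

# Boolean queries reduce to bitwise operations on the bitmaps.
print(matches(bitmap["invoice"] & bitmap["urgent"]))                              # AND -> [0]
print(matches(bitmap["invoice"] | bitmap["contract"]))                            # OR  -> [0, 1, 2, 3]
print(matches(bitmap["invoice"] & ~bitmap["urgent"] & (2**len(documents) - 1)))   # AND NOT -> [1, 3]
```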
In this paper, we present a new kind of near-optimal double-layered syndrome-trellis codes (STCs) for spatial-domain steganography. Compared with previous double-layered STCs, these codes can hide a longer message, or improve security for a message of the same length. In our scheme, guided by a theoretical derivation, the secret payload is divided more precisely into two parts, which are embedded in the first and second layers of the cover with binary STCs. When embedding the message, we favor realizing the double-layered embedding with ±1 modifications, but to further reduce the number of modifications and improve time efficiency, we allow a few pixels to be modified by ±2. Experimental results demonstrate that when this double-layered STC is applied to adaptive steganographic algorithms, the embedding modifications become more concentrated and fewer in number, and the security of the steganography is consequently improved.
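As background only, the following sketch shows plain ±1 (LSB-matching) embedding, to make "embedding by ±1 modifications" concrete; it is not the double-layered STC construction, which additionally splits the payload across two binary layers and minimizes an adaptive distortion.

```python
# Background-only sketch of ±1 (LSB-matching) embedding, to make "embedding by
# ±1 modifications" concrete; this is NOT the double-layered STC construction.
import random

def embed_pm1(pixels, bits):
    """Embed one bit per pixel: leave the pixel alone if its LSB already matches,
    otherwise move it by +1 or -1 (chosen at random, staying within [0, 255])."""
    out = []
    for p, b in zip(pixels, bits):
        if p % 2 == b:
            out.append(p)
        else:
            step = random.choice((-1, 1))
            if p == 0:
                step = 1
            if p == 255:
                step = -1
            out.append(p + step)
    return out

cover = [12, 13, 200, 0, 255]
message = [1, 1, 0, 1, 1]
stego = embed_pm1(cover, message)
print(stego, [p % 2 for p in stego])   # extracted LSBs equal the message
```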
The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose great threat to our individual privacy. In this paper, we demonstrate that the private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes on Twitter users: religion, political leaning, and marital status. The ground truth labels of the private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification yields a high accuracy of identification of these personal attributes, which poses a great privacy risk on user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of data when modifying the user-generated content, while degrading the prediction performance of the adversary. The KL divergence is minimized between the prior knowledge about the private attribute and the posterior probability after seeing the user-generated data. Based on this proposed framework, we investigate several specific data sanitization operations for privacy preservation: add, delete, or replace words in the tweets. We derive the exact transformation of the data under each operation. The experiments demonstrate the effectiveness of the proposed framework.
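The inference side of this threat model can be sketched with a simple bag-of-words classifier; the tiny corpus, labels, and the word-deletion example below are fabricated for illustration and are much simpler than the paper's data and sanitization framework.

```python
# Hedged sketch of the inference side of the threat model: a bag-of-words text
# classifier predicting a private attribute from user posts. The corpus and
# labels are fabricated toy data, not the paper's Twitter dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = [
    "heading to the church potluck tonight",
    "great sermon this sunday",
    "volunteering at the mosque food drive",
    "watched the game with my spouse",
    "date night with my wife",
    "single and loving the quiet weekend",
]
religion_label = [1, 1, 1, 0, 0, 0]   # toy "mentions-religion" attribute

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(tweets, religion_label)
print(model.predict(["sunday sermon was moving"]))    # likely predicted 1
print(model.predict(["movie night with my wife"]))    # likely predicted 0

# A crude flavor of the "delete word" sanitization: dropping attribute-revealing
# tokens degrades the adversary's prediction while keeping most of the text.
sanitized = "sunday was moving"
print(model.predict_proba([sanitized]))
```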
A new paradigm in wireless network access is presented and analyzed. In this concept, certain classes of wireless terminals can be turned temporarily into an access point (AP) anytime while connected to the Internet. This creates a dynamic network architecture (DNA) since the number and location of these APs vary in time. In this paper, we present a framework to optimize different aspects of this architecture. First, the dynamic AP association problem is addressed with the aim to optimize the network by choosing the most convenient APs to provide the quality-of-service (QoS) levels demanded by the users with the minimum cost. Then, an economic model is developed to compensate the users for serving as APs and thus augmenting the network resources. The users' security investment is also taken into account in the AP selection. A preclustering process of the DNA is proposed to keep the optimization process feasible in a highly dense network. To dynamically reconfigure the optimum topology and adjust it to the traffic variations, a new specific encoding of genetic algorithm (GA) is presented. Numerical results show that GA can provide the optimum topology up to two orders of magnitude faster than exhaustive search for network clusters, and the improvement significantly increases with the cluster size.
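A minimal genetic-algorithm sketch for this flavor of AP-selection problem is given below: one bit per candidate AP, with a fitness that rewards user coverage and penalizes cost. The encoding, fitness function, and parameters are illustrative assumptions, not the specific GA encoding proposed in the paper.

```python
# Minimal genetic-algorithm sketch for an AP-selection style problem: a bit per
# candidate AP, fitness rewarding user coverage and penalizing cost. The encoding,
# fitness and parameters are illustrative only.
import random

random.seed(0)
NUM_APS, NUM_USERS = 10, 30
cost = [random.uniform(1, 5) for _ in range(NUM_APS)]
# covers[a] = set of users that AP a can serve with acceptable QoS (random toy data)
covers = [set(random.sample(range(NUM_USERS), k=8)) for _ in range(NUM_APS)]

def fitness(bits):
    covered = set().union(*(covers[a] for a in range(NUM_APS) if bits[a])) if any(bits) else set()
    total_cost = sum(c for a, c in enumerate(cost) if bits[a])
    return 10 * len(covered) - total_cost

def evolve(pop_size=40, generations=60, mutation=0.05):
    pop = [[random.randint(0, 1) for _ in range(NUM_APS)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:pop_size // 2]                       # elitist selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, NUM_APS)
            child = a[:cut] + b[cut:]                       # one-point crossover
            child = [bit ^ (random.random() < mutation) for bit in child]   # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

best = evolve()
print(best, fitness(best))
```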
The C programming language performs no automatic bounds checking in order to speed up execution, but bounds checking is essential: if an index goes out of bounds, serious errors such as endless loops or buffer overflows can occur during execution. Whenever a program uses arrays, every index must stay within the bounds of the array, yet programmers often omit bounds checks or write incorrect ones. In this paper, we perform static analysis based on taint analysis and data-flow analysis to detect which arrays in a program lack correct bounds checking, and we implement an automatic static tool, Carraybound. Experimental results show that Carraybound works effectively and efficiently.
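The kind of check involved can be illustrated over a toy straight-line intermediate representation (in Python, not over real C code as Carraybound does): track which index variables are tainted by input and flag array accesses whose index was never bounds-checked.

```python
# Toy illustration of the kind of check involved (not Carraybound itself, which
# analyzes real C): walk a tiny straight-line IR, track which index variables are
# tainted by external input, and flag array accesses whose index was never
# bounds-checked against the array.
program = [
    ("input",  "i"),                 # i comes from untrusted input (tainted)
    ("assign", "j", 3),              # j is a constant
    ("check",  "i", "buf"),          # if (i < len(buf)) ... guards i for buf
    ("index",  "buf", "i"),          # buf[i]  -- guarded, fine
    ("input",  "k"),
    ("index",  "buf", "k"),          # buf[k]  -- tainted and unchecked: report
    ("index",  "buf", "j"),          # buf[j]  -- constant index, ignored here
]

tainted, checked = set(), set()
for insn in program:
    op = insn[0]
    if op == "input":
        tainted.add(insn[1])
    elif op == "assign":
        tainted.discard(insn[1])
    elif op == "check":
        checked.add((insn[1], insn[2]))
    elif op == "index":
        array, idx = insn[1], insn[2]
        if idx in tainted and (idx, array) not in checked:
            print(f"warning: unchecked tainted index {idx!r} into {array!r}")
```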
Preventive and reactive security measures can only partially mitigate the damage caused by modern ransomware attacks. Indeed, the remarkable amount of illicit profit and the cyber-criminals' increasing interest in ransomware schemes suggest that a fair number of users are actually paying the ransoms. Unfortunately, pure-detection approaches (e.g., based on analysis sandboxes or pipelines) are not sufficient nowadays, because often we do not have the luxury of being able to isolate a sample to analyze, and when this happens it is already too late for several users! We believe that a forward-looking solution is to equip modern operating systems with practical self-healing capabilities against this serious threat. Towards such a vision, we propose ShieldFS, an add-on driver that makes the Windows native filesystem immune to ransomware attacks. For each running process, ShieldFS dynamically toggles a protection layer that acts as a copy-on-write mechanism, according to the outcome of its detection component. Internally, ShieldFS monitors the low-level filesystem activity to update a set of adaptive models that profile the system activity over time. Whenever one or more processes violate these models, their operations are deemed malicious and the side effects on the filesystem are transparently rolled back. We designed ShieldFS after an analysis of billions of low-level, I/O filesystem requests generated by thousands of benign applications, which we collected from clean machines in use by real users for about one month. This is the first measurement on the filesystem activity of a large set of benign applications in real working conditions. We evaluated ShieldFS in real-world working conditions on real, personal machines, against samples from state of the art ransomware families. ShieldFS was able to detect the malicious activity at runtime and transparently recover all the original files. Although the models can be tuned to fit various filesystem usage profiles, our results show that our initial tuning yields high accuracy even on unseen samples and variants.
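A toy sketch of the detection flavor described: keep a per-process baseline of low-level filesystem activity and flag processes whose recent behavior deviates from it, here using write rate and the fraction of high-entropy writes as two illustrative features. The features, thresholds, and trace format are assumptions, not ShieldFS's models.

```python
# Toy sketch of a baseline-deviation detector over filesystem activity (features,
# thresholds and trace format are illustrative assumptions, not ShieldFS's models).
import math
import os
from collections import Counter

def entropy(data: bytes) -> float:
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def features(ops):
    """ops: list of ('write'|'read'|'rename', payload bytes) I/O events."""
    writes = [p for kind, p in ops if kind == "write"]
    write_rate = len(writes) / max(1, len(ops))
    high_entropy = sum(1 for p in writes if p and entropy(p) > 7.0) / max(1, len(writes))
    return write_rate, high_entropy

def is_suspicious(ops, baseline=(0.3, 0.05), slack=(0.4, 0.5)):
    wr, he = features(ops)
    return wr > baseline[0] + slack[0] or he > baseline[1] + slack[1]

benign_ops = [("read", b"hello world")] * 8 + [("write", b"plain text log line")] * 2
ransom_ops = [("write", os.urandom(4096)) for _ in range(10)]   # encrypted-looking writes
print(is_suspicious(benign_ops), is_suspicious(ransom_ops))     # False True
```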
Malicious domains are key components to a variety of cyber attacks. Several recent techniques are proposed to identify malicious domains through analysis of DNS data. The general approach is to build classifiers based on DNS-related local domain features. One potential problem is that many local features, e.g., domain name patterns and temporal patterns, tend to be not robust. Attackers could easily alter these features to evade detection without affecting much their attack capabilities. In this paper, we take a complementary approach. Instead of focusing on local features, we propose to discover and analyze global associations among domains. The key challenges are (1) to build meaningful associations among domains; and (2) to use these associations to reason about the potential maliciousness of domains. For the first challenge, we take advantage of the modus operandi of attackers. To avoid detection, malicious domains exhibit dynamic behavior by, for example, frequently changing the malicious domain-IP resolutions and creating new domains. This makes it very likely for attackers to reuse resources. It is indeed commonly observed that over a period of time multiple malicious domains are hosted on the same IPs and multiple IPs host the same malicious domains, which creates intrinsic association among them. For the second challenge, we develop a graph-based inference technique over associated domains. Our approach is based on the intuition that a domain having strong associations with known malicious domains is likely to be malicious. Carefully established associations enable the discovery of a large set of new malicious domains using a very small set of previously known malicious ones. Our experiments over a public passive DNS database show that the proposed technique can achieve high true positive rates (over 95%) while maintaining low false positive rates (less than 0.5%). Further, even with a small set of known malicious domains (a couple of hundreds), our technique can discover a large set of potential malicious domains (in the scale of up to tens of thousands).
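A hedged sketch of the association-and-inference idea: connect domains that resolved to a shared IP, then propagate scores from a small seed set of known-malicious domains. The DNS records, damping weight, and iteration count are toy assumptions rather than the paper's inference model.

```python
# Hedged sketch of the association-and-inference idea: build a domain graph where
# two domains are connected if they resolved to a shared IP, then iteratively
# propagate scores from a small set of known-malicious seed domains.
from collections import defaultdict

dns_records = [                      # (domain, ip) pairs, e.g. from passive DNS
    ("evil1.example", "10.0.0.1"), ("evil2.example", "10.0.0.1"),
    ("evil2.example", "10.0.0.2"), ("shady.example", "10.0.0.2"),
    ("benign.example", "10.9.9.9"),
]
ip_to_domains = defaultdict(set)
for dom, ip in dns_records:
    ip_to_domains[ip].add(dom)

neighbors = defaultdict(set)
for doms in ip_to_domains.values():
    for d in doms:
        neighbors[d] |= doms - {d}

seeds = {"evil1.example"}            # known malicious domains
score = {d: (1.0 if d in seeds else 0.0) for d in neighbors}
for _ in range(5):                   # simple label propagation with damping 0.7
    score = {d: max(score[d],
                    0.7 * max((score[n] for n in neighbors[d]), default=0.0))
             for d in neighbors}

for dom, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(f"{dom:16s} {s:.2f}")      # shady.example inherits suspicion via shared IPs
```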
Function Secret Sharing (FSS), introduced by Boyle et al. (Eurocrypt 2015), provides a way for additively secret-sharing a function from a given function family F. More concretely, an m-party FSS scheme splits a function f : {0,1}^n -> G, for some abelian group G, into functions f1,...,fm, described by keys k1,...,km, such that f = f1 + ... + fm and every strict subset of the keys hides f. A Distributed Point Function (DPF) is a special case where F is the family of point functions, namely functions f_{a,b} that evaluate to b on the input a and to 0 on all other inputs. FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication. These include different flavors of private information retrieval (PIR), as well as a recent application of DPF for large-scale anonymous messaging. We improve and extend previous results in several ways: * Simplified FSS constructions. We introduce a tensoring operation for FSS which is used to obtain a conceptually simpler derivation of previous constructions and present our new constructions. * Improved 2-party DPF. We reduce the key size of the PRG-based DPF scheme of Boyle et al. roughly by a factor of 4 and optimize its computational cost. The optimized DPF significantly improves the concrete costs of 2-server PIR and related primitives. * FSS for new function families. We present an efficient PRG-based 2-party FSS scheme for the family of decision trees, leaking only the topology of the tree and the internal node labels. We apply this towards FSS for multi-dimensional intervals. We also present a general technique for extending FSS schemes by increasing the number of parties. * Verifiable FSS. We present efficient protocols for verifying that keys (k*_1,...,k*_m), obtained from a potentially malicious user, are consistent with some f in F. Such a verification may be critical for applications that involve private writing or voting by many users.
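To make the DPF definition concrete, the naive two-party construction below shares the entire truth table of f_{a,b} additively (so key size is 2^n); it satisfies correctness and hiding but none of the compactness that the PRG-based schemes in the paper achieve. The group, chosen here as the integers modulo 2^32, is an illustrative choice.

```python
# Naive 2-party distributed point function, shown only to make the definitions
# concrete: the two keys are additive shares of the entire truth table of
# f_{a,b} (so key size is 2^n, unlike the compact PRG-based schemes the paper
# improves). The group here is the integers modulo 2**32.
import secrets

MOD = 2 ** 32

def gen(n, a, b):
    """Split the point function f_{a,b} over {0,1}^n into two keys."""
    table = [b if x == a else 0 for x in range(2 ** n)]
    k1 = [secrets.randbelow(MOD) for _ in range(2 ** n)]
    k2 = [(t - r) % MOD for t, r in zip(table, k1)]
    return k1, k2

def evaluate(key, x):
    return key[x]

n, a, b = 4, 6, 123
k1, k2 = gen(n, a, b)
for x in (5, 6, 7):
    # f1(x) + f2(x) = f(x); each key alone is uniformly random and hides a and b.
    print(x, (evaluate(k1, x) + evaluate(k2, x)) % MOD)
# -> 5 0 / 6 123 / 7 0
```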
We tackle the problem of automated exploit generation for web applications. In this regard, we present an approach that significantly improves the state of the art in web injection vulnerability identification and exploit generation. Our approach for exploit generation tackles various challenges associated with typical web application characteristics: their multi-module nature, interposed user input, and multi-tier architectures using a database backend. Our approach develops precise models of application workflows, database schemas, and native functions to achieve high-quality exploit generation. We implemented our approach in a tool called Chainsaw. Chainsaw was used to analyze 9 open source applications and generated over 199 first- and second-order injection exploits combined, significantly outperforming several related approaches.
One of the most challenging issues facing academic conferences and educational institutions today is plagiarism detection. Typically, these entities wish to ensure that the work products submitted to them have not been plagiarized from another source (e.g., authors submitting identical papers to multiple journals). Assembling large centralized databases of documents dramatically improves the effectiveness of plagiarism detection techniques, but introduces a number of privacy and legal issues: all document contents must be completely revealed to the database operator, making it an attractive target for abuse or attack. Moreover, this content aggregation involves the disclosure of potentially sensitive private content, and in some cases this disclosure may be prohibited by law. In this work, we introduce Elxa, the first scalable centralized plagiarism detection system that protects the privacy of the submissions. Elxa incorporates techniques from the current state of the art in plagiarism detection, as evaluated by the information retrieval community. Our system is designed to be operated on existing cloud computing infrastructure, and to provide incentives for the untrusted database operator to maintain the availability of the network. Elxa can be used to detect plagiarism in student work, duplicate paper submissions (and their associated peer reviews), similarities between confidential reports (e.g., malware summaries), or any approximate text reuse within a network of private documents. We implement a prototype using the Hadoop MapReduce framework, and demonstrate that it is feasible to achieve competitive detection effectiveness in the private setting.
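One standard plagiarism-detection building block, k-gram fingerprints compared with Jaccard similarity, is sketched below in the clear; Elxa's contribution is performing this kind of matching privately and at scale, which the snippet does not attempt.

```python
# Hedged sketch of a standard plagiarism-detection building block (k-gram
# fingerprints compared with Jaccard similarity), in the clear; it does not
# attempt Elxa's private, scalable matching.
import hashlib

def fingerprints(text, k=5):
    words = text.lower().split()
    grams = {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}
    return {hashlib.sha256(g.encode()).hexdigest()[:16] for g in grams}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = "we introduce a scalable centralized plagiarism detection system that protects privacy"
doc2 = "this paper introduces a scalable centralized plagiarism detection system protecting privacy"
doc3 = "congestion control stabilizes the internet and prevents collapse"
print(jaccard(fingerprints(doc1), fingerprints(doc2)))   # noticeably above zero
print(jaccard(fingerprints(doc1), fingerprints(doc3)))   # 0.0
```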
The popularity of digital currencies, especially cryptocurrencies, has been continuously growing since the appearance of Bitcoin. Bitcoin is a peer-to-peer (P2P) cryptocurrency protocol enabling transactions between individuals without the need of a trusted authority. Its network is formed from resources contributed by individuals known as miners. Users of Bitcoin currency create transactions that are stored in a specialised data structure called a block chain. Bitcoin's security lies in a proof-of-work scheme, which requires high computational resources at the miners. These miners have to be synchronised with any update in the network, which produces high data traffic rates. Despite advances in mobile technology, no cryptocurrencies have been proposed for mobile devices. This is largely due to the lower processing capabilities of mobile devices when compared with conventional computers and the poorer Internet connectivity compared with wired networking. In this work, we propose LocalCoin, an alternative cryptocurrency that requires minimal computational resources, produces low data traffic and works with off-the-shelf mobile devices. LocalCoin replaces the computational hardness that is at the root of Bitcoin's security with the social hardness of ensuring that all witnesses to a transaction are colluders. It is based on opportunistic networking rather than relying on infrastructure and incorporates characteristics of mobile networks such as users' locations and their coverage radius in order to employ an alternative proof-of-work scheme. LocalCoin features (i) a lightweight proof-of-work scheme and (ii) a distributed block chain.
Path prediction on the Internet has been a topic of research in the networking community for close to a decade. Applications of path prediction solutions have ranged from optimizing selection of peers in peer-to-peer networks to improving and debugging CDN predictions. Recently, revelations of traffic correlation and surveillance on the Internet have raised the topic of path prediction in the context of network security. Specifically, predicting network paths can allow us to identify and avoid given organizations on network paths (e.g., to avoid traffic correlation attacks in Tor) or to infer the impact of hijacks and interceptions when direct measurements are not available. In this poster we propose the design and implementation of PathCache which aims to reuse measurement data to estimate AS level paths on the Internet. Unlike similar systems, PathCache does not assume that routing on the Internet is destination based. Instead, we develop an algorithm to compute confidence in paths between ASes. These multiple paths ranked by their confidence values are returned to the user.
Hardware-based enclave protection mechanisms, such as Intel's SGX, ARM's TrustZone, and Apple's Secure Enclave, can protect code and data from powerful low-level attackers. In this work, we use enclaves to enforce strong application-specific information security policies. We present IMPE, a novel calculus that captures the essence of SGX-like enclave mechanisms, and show that a security-type system for IMPE can enforce expressive confidentiality policies (including erasure policies and delimited release policies) against powerful low-level attackers, including attackers that can arbitrarily corrupt non-enclave code, and, under some circumstances, corrupt enclave code. We present a translation from an expressive security-typed calculus (that is not aware of enclaves) to IMPE. The translation automatically places code and data into enclaves to enforce the security policies of the source program.
Mobile code distribution relies on digital signatures to guarantee code authenticity. Unfortunately, standard signature schemes are not well suited for use in conjunction with program transformation techniques, such as aspect-oriented programming. With these techniques, code development is performed in sequence by multiple teams of programmers. This is fundamentally different from traditional single-developer/single-user models, where users can verify end-to-end (i.e., developer-to-user) authenticity of the code using digital signatures. To address this limitation, we introduce FLEX, a flexible code authentication framework for mobile applications. FLEX allows semi-trusted intermediaries to modify mobile code without invalidating the developer's signature, as long as the modification complies with a "contract" issued by the developer. We introduce formal definitions for secure code modification, and show that our instantiation of FLEX is secure under these definitions. Although FLEX can be instantiated using any language, we design AMJ, a novel programming language that supports code annotations, and implement a FLEX prototype based on our new language.
The low-level C++ programming language is ubiquitously used for its modularity and performance. Typecasting is a fundamental concept in C++ (and object-oriented programming in general) to convert a pointer from one object type into another. However, downcasting (converting a base class pointer to a derived class pointer) has critical security implications due to potentially different object memory layouts. Due to missing type safety in C++, a downcasted pointer can violate a programmer's intended pointer semantics, allowing an attacker to corrupt the underlying memory in a type-unsafe fashion. This vulnerability class is receiving increasing attention and is known as type confusion (or bad-casting). Several existing approaches detect different forms of type confusion, but these solutions are severely limited due to both high run-time performance overhead and low detection coverage. This paper presents TypeSan, a practical type-confusion detector which provides both low run-time overhead and high detection coverage. Despite improving the coverage of state-of-the-art techniques, TypeSan significantly reduces the type-confusion detection overhead compared to other solutions. TypeSan relies on an efficient per-object metadata storage service based on a compact memory shadowing scheme. Our scheme treats all the memory objects (i.e., globals, stack, heap) uniformly to eliminate extra checks on the fast path and relies on a variable compression ratio to minimize run-time performance and memory overhead. Our experimental results confirm that TypeSan is practical, even when explicitly checking almost all the relevant typecasts in a given C++ program. Compared to the state of the art, TypeSan yields orders of magnitude higher coverage at 4–10 times lower performance overhead on SPEC and 2 times on Firefox. As a result, our solution offers superior protection and is suitable for deployment in production software. Moreover, our highly efficient metadata storage back-end is potentially useful for other defenses that require memory object tracking.
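A toy model of the per-object metadata lookup idea: addresses map into a shadow table by dividing by a fixed compression ratio, so a pointer anywhere inside an object finds its allocation-time type in constant time. The addresses, ratio, and the cast-compatibility check are simplified assumptions, not TypeSan's actual design.

```python
# Toy model of the per-object metadata lookup idea: addresses map into a shadow
# table by dividing by a fixed compression ratio, so a pointer anywhere inside an
# object finds that object's type metadata in O(1). Addresses, ratio and the type
# check itself are simplified assumptions, not TypeSan's design.
RATIO = 16                      # bytes of program memory per shadow slot (assumed)
shadow = {}                     # slot index -> (base address, allocation-time type)

def on_alloc(addr, size, type_name):
    """Record the allocated object's type for every shadow slot it covers."""
    for slot in range(addr // RATIO, (addr + size + RATIO - 1) // RATIO):
        shadow[slot] = (addr, type_name)

def check_downcast(ptr, target_type, allowed):
    """Flag a downcast if the object's allocated type cannot be cast to target_type."""
    base, alloc_type = shadow[ptr // RATIO]
    if target_type not in allowed.get(alloc_type, set()):
        print(f"type confusion: object at {hex(base)} is {alloc_type}, "
              f"cast to {target_type}")

allowed = {"Derived": {"Base", "Derived"}, "Base": {"Base"}}
on_alloc(0x1000, 48, "Base")
check_downcast(0x1010, "Derived", allowed)   # Base object downcast to Derived -> flagged
```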