Biblio
We investigate minimization of tree pattern queries that use the child relation, descendant relation, node labels, and wildcards. We prove that minimization for such tree patterns is Sigma2P-complete and thus solve a problem first attacked by Flesca, Furfaro, and Masciari in 2003. We first provide an example that shows that tree patterns cannot be minimized by deleting nodes. This example shows that the M-NR conjecture, which states that minimality of tree patterns is equivalent to their nonredundancy, is false. We then show how the example can be turned into a gadget that allows us to prove Sigma2P-completeness.
In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. We apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a (1-ε)-approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate impact of this result, one can use uniform sampling to solve these problems in streaming, turnstile, or MapReduce settings. Indeed, by characterizing heavy subgraph problems, our results address Open Problem 13 at the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling, in the context of graph streams. Recently, Bhattacharya et al. (STOC 2015) gave the first one-pass algorithm for the densest subgraph problem in the streaming model with edge additions and deletions, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using Õ(n) space, where factors of ε and log(n) are suppressed in the Õ notation. In this paper we improve on the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using Õ(n) space.
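A minimal sketch of the sampling idea described above: keep each edge independently with probability p and run a densest-subgraph routine on the sample. Charikar's greedy peeling is used here purely as a stand-in solver (it gives a 2-approximation on whatever graph it is given); the paper's (1-ε) guarantee concerns how well uniform sampling preserves the optimum, not this particular solver.

```python
import random
from collections import defaultdict

def sample_edges(edges, p, seed=0):
    """Keep each edge independently with probability p (uniform sampling)."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < p]

def greedy_densest_subgraph(edges):
    """Charikar's greedy peeling: repeatedly remove a minimum-degree node and
    return the node set of the densest intermediate graph (a 2-approximation)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = set(adj)
    m = sum(len(s) for s in adj.values()) // 2
    best_density, best_nodes = 0.0, set(nodes)
    while nodes:
        density = m / len(nodes)
        if density > best_density:
            best_density, best_nodes = density, set(nodes)
        u = min(nodes, key=lambda x: len(adj[x]))  # minimum-degree node
        m -= len(adj[u])
        for v in adj[u]:
            adj[v].discard(u)
        del adj[u]
        nodes.discard(u)
    return best_nodes, best_density

# Toy usage: the densest subgraph of the sampled graph approximates that of the full graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (4, 5), (3, 5), (4, 6)]
sampled = sample_edges(edges, p=0.8)
print(greedy_densest_subgraph(sampled))
```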
Unlike most social media, where automatic archiving of data is the default, Snapchat defaults to ephemerality: deleting content shortly after it is viewed by a receiver. Interviews with 25 Snapchat users show that ephemerality plays a key role in shaping their practices. Along with friend-adding features that facilitate a network of mostly close relations, default deletion affords everyday, mundane talk and reduces self-consciousness while encouraging playful interaction. Further, although receivers can save content through screenshots, senders are notified; this selective saving with notification supports complex information norms that preserve the feel of ephemeral communication while supporting the capture of meaningful content. This dance of giving and taking, sharing and showing, and agency for both senders and receivers provides the basis for a rich design space of mechanisms, levels, and domains for ephemerality.
Conventional overwriting-based and encryption-based secure deletion schemes can only sanitize the data itself. However, the past existence of the deleted data may leave artifacts in the layout at all layers of a computing system. These structural artifacts may be utilized by an adversary to infer sensitive information about the deleted data or even to fully recover it, and conventional secure deletion solutions unfortunately cannot sanitize them. In this work, we introduce truly secure deletion, a novel security notion that is much stronger than conventional secure deletion. Truly secure deletion requires sanitizing both the obsolete data and the corresponding structural artifacts, so that the resulting storage layout after a delete operation is indistinguishable from one in which the deleted data never appeared. We propose TedFlash, a Truly secure deletion scheme for Flash-based block devices. TedFlash can successfully sanitize both the data and the structural artifacts while satisfying the design constraints imposed by flash memory. Security analysis and experimental evaluation show that TedFlash achieves the truly secure deletion guarantee with a small additional overhead compared to conventional secure deletion solutions.
In cloud storage, users lose direct control over their data, so reliably deleting data in the cloud becomes a crucial problem for a secure cloud storage system. The existing approach to this problem is to encrypt the data before outsourcing and to destroy the encryption key upon deletion. However, this solution may impose heavy computational overhead on the user side, and the encrypted data remains intact in the cloud after the deletion operation. To address this challenge, we propose a novel method to reliably delete data in cloud storage by overwriting. Unlike existing works, our scheme is efficient on the user side and is able to wipe the deleted data from the drives of the cloud servers.
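For illustration, here is a minimal user-side sketch of overwrite-then-delete on a local file; the function name and parameters are hypothetical. As the abstract implies, an overwrite at this level does not by itself guarantee removal on SSDs, copy-on-write file systems, or cloud back-ends, which is why the paper moves the overwriting to the cloud servers' drives.

```python
import os
import secrets

def overwrite_and_delete(path, passes=1):
    """Best-effort overwrite of a file's contents before unlinking it.

    Illustrative only: on SSDs, copy-on-write file systems, and cloud object
    stores, overwriting at this layer does not guarantee the old blocks are
    gone, which is exactly the gap server-side schemes target.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 1 << 20)      # overwrite in 1 MiB chunks
                f.write(secrets.token_bytes(chunk))  # random data, not zeros
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())
    os.remove(path)
```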
Elliptic Curve Cryptosystems (ECC) are widely regarded as the cryptosystems of the next generation because of their smaller key sizes and strong security. They are well suited for applications running on resource-restricted devices such as smart cards. At present, no efficient or sub-exponential algorithm is known to break ECC theoretically. However, hardware implementations of ECC can leak secret-key information through power analysis attacks, particularly differential power analysis (DPA). These attacks break the system with far less effort than attacks based on algebraic weaknesses of the algorithms. Many countermeasures to power analysis exist, but each of the available solutions has its own advantages and disadvantages, compromising either security or performance. In this paper, we present a secure and efficient algorithm for elliptic curve scalar multiplication (ECSM) that uses initial point randomization and delays the point addition operation. The implementation results and performance analysis show that the proposed algorithm is efficient and secure against power analysis attacks.
Cellular towers capture logs of mobile subscribers whenever their devices connect to the network. When the logs show data traffic at a cell tower generated by a device, they reveal that the device is close to that tower. The logs can then be used to trace the locations of mobile subscribers for different applications, such as studying customer behaviour, improving location-based services, or helping urban planning. However, the logs often suffer from an oscillation phenomenon. Oscillations may happen when a device, even when not moving, does not simply connect to the nearest cell tower but instead switches unpredictably between multiple cell towers because of random noise, load balancing, or dynamic changes in signal strength. Detecting and removing oscillations is a challenge when analyzing location data collected from the cellular network. In this paper, we propose an algorithm called SOL (Stable, Oscillation, Leap periods) aimed at discovering and reducing oscillations in the collected logs. We apply our algorithm to real datasets containing about 18.9 TB of traffic logs generated by more than 3 million mobile subscribers, covering about 21,000 cell towers, and collected during 27 days from both GSM and UMTS networks in northern China. Experimental results demonstrate the ability and effectiveness of SOL to reduce oscillations in cellular network logs.
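As a rough illustration of the kind of pattern such an algorithm must detect (not the SOL algorithm itself, whose details are not given in the abstract), the sketch below flags ping-pong periods in which a device alternates between two towers several times within a short window.

```python
from datetime import timedelta

def find_pingpong_oscillations(events, window=timedelta(minutes=5), min_switches=4):
    """Flag oscillation periods in a single device's connection log.

    events: list of (timestamp, tower_id) tuples sorted by timestamp.
    A period is flagged when, within `window`, the device switches towers at
    least `min_switches` times while touching only two distinct towers
    (the classic A-B-A-B ping-pong pattern). This is a simplified heuristic,
    not the SOL algorithm from the paper.
    """
    flagged = []
    i, n = 0, len(events)
    while i < n:
        j, switches = i, 0
        towers = {events[i][1]}
        while j + 1 < n and events[j + 1][0] - events[i][0] <= window:
            j += 1
            towers.add(events[j][1])
            if events[j][1] != events[j - 1][1]:
                switches += 1
        if switches >= min_switches and len(towers) == 2:
            flagged.append((events[i][0], events[j][0], frozenset(towers)))
            i = j + 1          # skip past the flagged period
        else:
            i += 1
    return flagged
```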
Among the signature schemes most widely deployed in practice are the DSA (Digital Signature Algorithm) and its elliptic curve variant ECDSA. They are represented in many international standards, including IEEE P1363, ANSI X9.62, and FIPS 186-4. Their popularity stands in stark contrast to the absence of rigorous security analyses: previous works either study modified versions of (EC)DSA or provide a security analysis of unmodified ECDSA in the generic group model. Unfortunately, works following the latter approach assume abstractions of non-algebraic functions over generic groups for which it remains unclear how they translate to the security of ECDSA in practice. For instance, it has been pointed out that prior results in the generic group model actually establish strong unforgeability of ECDSA, a property that the scheme de facto does not possess. Since, furthermore, no formal results are known for DSA, understanding the security of both schemes remains an open problem. In this work we propose GenericDSA, a signature framework that subsumes both DSA and ECDSA in unmodified form. It carefully models the "modulo q" conversion function of (EC)DSA as a composition of three independent functions. The two outer functions mimic algebraic properties in the function's domain and range; the inner one is modeled as a bijective random oracle. We rigorously prove results on the security of GenericDSA that indicate that forging signatures in (EC)DSA is as hard as solving discrete logarithms. Importantly, our proofs do not assume generic group behavior.
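For readers unfamiliar with the conversion function being modeled, here is textbook DSA over toy parameters, with the "mod q" conversion isolated as its own function `conv`; GenericDSA replaces exactly this step with a composition of three functions, the middle one a bijective random oracle. The parameters below are illustrative only and far too small to be secure.

```python
import hashlib
import secrets

# Toy DSA parameters (NOT secure): p prime, q prime with q | p-1, g of order q mod p.
P, Q, G = 23, 11, 4

def conv(R):
    """The (EC)DSA conversion function f: group element -> Z_q.
    For classic DSA this is simply 'mod q'; GenericDSA models this step as a
    composition of three functions, the middle one a bijective random oracle."""
    return R % Q

def H(msg):
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1      # private key
    return x, pow(G, x, P)                # (x, y = g^x mod p)

def sign(x, msg):
    while True:
        k = secrets.randbelow(Q - 1) + 1
        r = conv(pow(G, k, P))            # r = f(g^k mod p)
        s = pow(k, -1, Q) * (H(msg) + x * r) % Q
        if r != 0 and s != 0:
            return r, s

def verify(y, msg, sig):
    r, s = sig
    if not (0 < r < Q and 0 < s < Q):
        return False
    w = pow(s, -1, Q)
    u1, u2 = H(msg) * w % Q, r * w % Q
    return conv(pow(G, u1, P) * pow(y, u2, P) % P) == r

x, y = keygen()
sig = sign(x, b"hello")
assert verify(y, b"hello", sig)
```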
Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this paper studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability), the adversary can remove any subset of nodes of total weight at most a given bound W, and our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ2, the second level of the polynomial hierarchy, (ii) a basic variant, that we call RobustFAP, is co-NP-hard, and (iii) an all-or-nothing version of RobustFAP is Σ2-complete. We then give a PTAS for RobustFAP, a key ingredient of which is a solution that we design for a fractional version of RobustFAP. We then study fractional RobustFAP over hierarchies, denoted HierRobustFAP, and introduce a notion of hierarchical max-min fairness and a novel Generalized Spreading algorithm which is simultaneously optimal for all W. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights and hierarchical structures. Using randomized rounding, we extend this to give an algorithm for integral HierRobustFAP. For the probabilistic version, we first give an algorithm that achieves an additive ε approximation in the failure probability for the single-level version, called ProbFAP, while giving up a (1 + ε) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFAP, achieving an ε additive approximation in failure probability while giving up an (L + ε) multiplicative factor in the number of failures, where L is the number of levels in the hierarchy.
Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks – at the level of small network subgraphs – remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.

Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
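A minimal sketch of the second-order biased walk that node2vec describes, assuming an unweighted, undirected graph stored as an adjacency dictionary; the published implementation additionally uses alias sampling for efficiency and feeds the generated walks into a skip-gram model to produce the embeddings.

```python
import random
from collections import defaultdict

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased random walk in the style of node2vec.

    adj: dict mapping node -> set of neighbors (undirected, unweighted).
    p controls the likelihood of returning to the previous node;
    q interpolates between BFS-like (q > 1) and DFS-like (q < 1) exploration.
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = list(adj[cur])
        if not neighbors:
            break
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for x in neighbors:
            if x == prev:                 # return to where we came from
                weights.append(1.0 / p)
            elif x in adj[prev]:          # stays "close" to the previous node
                weights.append(1.0)
            else:                         # moves "away" from the previous node
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Walks like these are then fed to a skip-gram model to learn node embeddings.
adj = defaultdict(set)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]:
    adj[u].add(v)
    adj[v].add(u)
print(node2vec_walk(adj, start=0, length=10, p=0.5, q=2.0))
```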
We explicitly construct an extractor for two independent sources on n bits, each with polylogarithmic min-entropy. Our extractor outputs one bit and has polynomially small error. The best previous extractor, by Bourgain, required each source to have min-entropy .499n. A key ingredient in our construction is an explicit construction of a monotone, almost-balanced Boolean function that is resilient to coalitions. In fact, our construction is stronger in that it gives an explicit extractor for a generalization of non-oblivious bit-fixing sources on n bits, where some unknown n-q bits are chosen almost polylog(n)-wise independently, and the remaining q bits are chosen by an adversary as an arbitrary function of the n-q bits. The best previous construction, by Viola, achieved q quadratically smaller than our result. Our explicit two-source extractor directly implies improved constructions of a K-Ramsey graph over N vertices, improving bounds obtained by Barak et al. and matching independent work by Cohen.
Cryptographic hash functions are used to protect the integrity of information. Hash functions are often designed by using existing block ciphers as compression functions, owing to the challenges and difficulties encountered in constructing new hash functions from scratch. However, the key generation required for the encryption process results in a large computational cost, which affects the efficiency of the hash function. This paper proposes a new, secure and efficient compression function based on a pseudorandom function that compresses a 2n-bit input into an n-bit output (2n-to-n bits). In addition, a new keyed hash function with three variants is proposed (PinTar 128 bits, 256 bits and 512 bits) which uses the proposed compression function as its underlying building block. Statistical analysis shows that the compression function is an efficient one-way random function. Similarly, statistical analysis of the keyed hash function shows that it has a strong avalanche property and is resistant to exhaustive key search attacks. The proposed keyed hash function can be used as a candidate for developing security systems.
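To make the structure concrete, here is a sketch of a 2n-to-n-bit compression function built from a pseudorandom function and chained in Merkle-Damgard fashion into a keyed hash. HMAC-SHA-256 stands in for the PRF, and the padding and keying choices are illustrative assumptions, not the PinTar construction.

```python
import hmac
import hashlib

N_BYTES = 32  # n = 256 bits

def compress(chaining, block):
    """A 2n-to-n-bit compression function built from a PRF.
    HMAC-SHA-256 stands in for the pseudorandom function here; this shows the
    structure described in the abstract, not the paper's construction."""
    assert len(chaining) == N_BYTES and len(block) == N_BYTES
    return hmac.new(chaining, block, hashlib.sha256).digest()

def keyed_hash(key, message, iv=b"\x00" * N_BYTES):
    """Merkle-Damgard-style chaining of the compression function, keyed by
    absorbing a key block first (one common way to build a keyed hash)."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % N_BYTES)       # pad to a multiple of n bits
    state = compress(iv, key.ljust(N_BYTES, b"\x00")[:N_BYTES])
    for i in range(0, len(padded), N_BYTES):
        state = compress(state, padded[i:i + N_BYTES])
    return state

tag = keyed_hash(b"secret key", b"message to authenticate")
print(tag.hex())
```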
We propose a pseudorandom bit generator (PRBG) that constructs an ergodic orbit of the logistic map using intermittent perturbation and samples the orbit using the Bernoulli shift map. The pseudorandomness of the key streams is tested with the standard NIST statistical test suite, and the key streams generated by the proposed PRBG pass all the tests of the suite. As an application of the PRBG, we propose chaotic stream ciphers and conduct goodness-of-fit tests on them. The stream ciphers generated using the PRBG are shown to produce ciphertexts in which the distribution of the characters shows close agreement with the uniform distribution. We also show that the ciphers exhibit a good avalanche effect when the initial seed is varied infinitesimally, and that the avalanche effect can be clearly observed at different points within the ciphertext as it is being generated. Finally, we show that the proposed PRBG is on par with the conventional PRBG RC4. This supports our hypothesis that the stream of bits obtained from orbits of the logistic map using perturbation and the Bernoulli shift map exhibits randomness.
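A simplified sketch of the generator described above: iterate the logistic map, intermittently perturb the state, and extract one bit per step by thresholding at 1/2 (the symbolic dynamics associated with the Bernoulli shift map). The perturbation schedule and sampling details here are illustrative assumptions; the paper's exact construction may differ.

```python
import secrets

def logistic_prbg(seed, nbits, r=4.0, perturb_every=64):
    """Generate pseudorandom bits from a logistic-map orbit (illustrative only)."""
    x = seed
    bits = []
    for i in range(1, nbits + 1):
        x = r * x * (1.0 - x)                 # logistic map iteration
        if not 0.0 < x < 1.0:
            # Guard against floating-point collapse onto 0 or 1.
            x = secrets.randbelow(10**9) / 1e9 or 0.5
        if i % perturb_every == 0:
            # Intermittent perturbation: nudge the state by a tiny random offset,
            # keeping it inside (0, 1), so the finite-precision orbit stays ergodic.
            x = min(max(x + (secrets.randbelow(1000) - 500) * 1e-12, 1e-9), 1.0 - 1e-9)
        bits.append(1 if x >= 0.5 else 0)     # threshold sampling of the orbit
    return bits

stream = logistic_prbg(seed=0.123456789, nbits=256)
print("".join(map(str, stream[:64])))
```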
Growing numbers of ubiquitous electronic devices and services motivate the need for effortless user authentication and identification. While biometrics are a natural means of achieving these goals, their use poses privacy risks, due mainly to the difficulty of preventing theft and abuse of biometric data. One way to minimize information leakage is to derive biometric keys from users' raw biometric measurements. Such keys can be used in subsequent security protocols and ensure that no sensitive biometric data needs to be transmitted or permanently stored. This paper is the first attempt to explore the use of human body impedance as a biometric trait for deriving secret keys. Building upon Randomized Biometric Templates as a key generation scheme, we devise a mechanism that supports consistent regeneration of unique keys from users' impedance measurements. The underlying set of biometric features is found using a feature learning technique based on Siamese networks. Compared to prior feature extraction methods, the proposed technique offers significantly improved recognition rates in the context of key generation. Besides computing experimental error rates, we tailor a known key guessing approach specifically to the key generation scheme used and assess the security provided by the resulting keys. We give a very conservative estimate of the number of guesses an adversary must make to find a correct key. Results show that the proposed key generation approach produces keys comparable to those obtained by similar methods based on other biometrics.
A Wireless Sensor Network (WSN) consists of numerous small devices called sensors, which have limited resources such as energy, memory, and computation. Sensors are deployed in harsh environments and are vulnerable to various security issues; because of the resource restrictions of a sensor, key management and providing robust security in this type of network are a challenge. Cryptographic keys may be used in two ways: symmetric or asymmetric. Asymmetric cryptography requires more communication, memory, and computation than symmetric cryptography, so it is not appropriate for WSNs. In this paper, a key management scheme based on symmetric keys is proposed in which each node uses a pseudo-random number generator (PRNG) to generate a key shared with the base station based on a pre-distributed initial key, and uses CBC-RC5 to achieve confidentiality, integrity, and authentication.
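The sketch below illustrates the key idea of deriving per-session keys from a pre-distributed initial key shared with the base station. HMAC-SHA-256 is used as the pseudorandom function because RC5 is not available in the standard library, and the function and argument names are hypothetical.

```python
import hmac
import hashlib

def derive_session_key(initial_key: bytes, node_id: bytes, counter: int) -> bytes:
    """Derive a per-session key from the pre-distributed initial key.

    Both the sensor node and the base station hold `initial_key`, so each side
    can recompute the same session key from (node_id, counter) without any key
    exchange. HMAC-SHA-256 stands in here for the paper's PRNG/CBC-RC5 building
    blocks, which are not part of the standard library.
    """
    msg = node_id + counter.to_bytes(8, "big")
    return hmac.new(initial_key, msg, hashlib.sha256).digest()

# Node and base station independently derive the same key for round 7.
k_node = derive_session_key(b"pre-shared-initial-key", b"node-42", 7)
k_base = derive_session_key(b"pre-shared-initial-key", b"node-42", 7)
assert k_node == k_base
```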
With the help of rapidly developing technology, DNA sequencing is becoming less expensive. As a consequence, research in genomics has gained speed in paving the way to personalized (genomic) medicine, and geneticists need large collections of human genomes to further increase this speed. Furthermore, individuals are using their genomes to learn about their (genetic) predispositions to diseases, their ancestries, and even their (genetic) compatibilities with potential partners. This trend has also led to the launch of health-related websites and online social networks (OSNs) in which individuals share their genomic data (e.g., OpenSNP or 23andMe). On the other hand, genomic data carries much sensitive information about its owner. By analyzing the DNA of an individual, it is now possible to learn about his disease predispositions (e.g., for Alzheimer's or Parkinson's), ancestries, and physical attributes. The threat to genomic privacy is magnified by the fact that a person's genome is correlated with his family members' genomes, thus leading to interdependent privacy risks. This short tutorial will help computer scientists better understand the privacy and security challenges in today's genomic era. We will first highlight the significance of genomic data and the threats to genomic privacy. Then, we will present high-level descriptions of the proposed solutions to protect the privacy of genomic data and discuss future research directions. No prerequisite knowledge of biology or genomics is required for attendees of this tutorial. We only require attendees to have a slight background in cryptography and statistics.
Personalized medicine brings the promise of better diagnoses, better treatments, a higher quality of life and increased longevity. To achieve these noble goals, it exploits a number of revolutionary technologies, including genome sequencing and DNA editing, as well as wearable devices and implantable or even edible biosensors. In parallel, the popularity of "quantified self" gadgets shows the willingness of citizens to be more proactive with respect to their own health. Yet, this evolution opens the door to all kinds of abuses, notably in terms of discrimination, blackmailing, stalking, and subversion of devices. After giving a general description of this situation, in this talk we will expound on some of the main concerns, including the temptation to permanently and remotely monitor the physical (and metabolic) activity of individuals. We will describe the potential and the limitations of techniques such as cryptography (including secure multi-party computation), trusted hardware and differential privacy. We will also discuss the notion of consent in the face of the intrinsic correlations of human data. We will argue in favor of a more systematic, principled and cross-disciplinary research effort in this field and will discuss the motives of the various stakeholders.
Microfluidics is an interdisciplinary science focusing on the development of devices and systems that process low volumes of fluid for applications such as high-throughput DNA sequencing, immunoassays, and entire lab-on-chip platforms. Microfluidic diagnostic technology enables these advances by facilitating the miniaturization and integration of complex biochemical processing through a microfluidic biochip [1]. This approach tightly couples the biochemical operations, sensing system, control algorithm, and droplet-based biochip. During the process the status of a droplet is monitored in real time to detect operational errors. If an error has occurred, the control algorithm dynamically reconfigures to allow recovery and rescheduling of on-chip operations. During this recovery procedure the droplet that is the source of the error is discarded to prevent the propagation of the error, and the operation is repeated. Threats to the operation of the microfluidic biochip include (1) integrity: an attack can modify control electrodes to corrupt the diagnosis, and (2) privacy: what can a user/operator deduce about the diagnosis? It is challenging to describe both these aspects using existing models. As Figure 1 depicts, there are multiple security domains: unidirectional information flows shown in black indicate undesirable flows, bidirectional black arrows indicate desirable, but possibly corrupted, information flows, and unidirectional red arrows indicate undesirable information flows. As with Stuxnet, a bidirectional, deducible information flow is needed between the monitoring security domain and the internal security domain (biochip) [2]. Simultaneously, the attacker and the operators should receive a nondeducible information flow. Likewise, the red attack arrows should be deducible to the internal domain. Our current security research direction uses the novel approach of Multiple Security Domain Nondeducibility [2] to explore the vulnerabilities of this error recovery process through information flow leakages, leading to protection of the system through desirable information flows.
The concept of being a Smart Community has been around at least since 1999, when the Intelligent Community Forum (ICF) chose Singapore as its first Intelligent Community of the Year. The ICF's criteria have been refined over the years, but they still seek out places that "understand the enormous challenges of the Broadband Economy, and have taken conscious steps to create an economy capable of prospering in it." But what does that really mean in a world where cardboard VR viewers are given away as conference swag, high school students are creating augmented reality tours of their schools, and citizens can report a pothole, or a messy neighbor, on an app like SeeClickFix? There is also an important confluence between the field of "Smart Communities" and the concept of "Safe and Secure Communities". A community cannot really be considered "smart" if its inhabitants do not feel "safe" going about their daily routines. On the flip side, many of the tools that will enable and enhance public safety (sensor-based networks, big data analytics, public participation in decision making) have significant security and privacy implications. An innovative new program is being created at the University of Calgary to help in "Designing Smart and Secure Communities" while preserving privacy and avoiding a "Big Brother" world, which would be antithetical to the goals of being a Smart Community. Thoughtful experts in this field believe that we can have many of the benefits of being a smart community, including enhanced safety, without having to give up too much of our personal privacy. They also acknowledge that this is a tricky balance to strike, and that it will be hard work. There are excellent examples from around the world of cities that have found innovative ways to involve their citizens in a meaningful way in decisions that affect their lives. The program will also consider some "platform technologies", such as non-financial applications of the blockchain, which can be used to build trust and confidence in civic applications. Things can also go horribly wrong when citizen engagement projects are poorly designed and implemented. As a creepy cautionary tale, and a warning about what might be coming down the road, we need only look at the controversial use of "DNA Shaming" in Hong Kong to catch spitters and litterbugs.
A new paradigm in wireless network access is presented and analyzed. In this concept, certain classes of wireless terminals can be turned temporarily into an access point (AP) anytime while connected to the Internet. This creates a dynamic network architecture (DNA) since the number and location of these APs vary in time. In this paper, we present a framework to optimize different aspects of this architecture. First, the dynamic AP association problem is addressed with the aim to optimize the network by choosing the most convenient APs to provide the quality-of-service (QoS) levels demanded by the users with the minimum cost. Then, an economic model is developed to compensate the users for serving as APs and, thus, augmenting the network resources. The users' security investment is also taken into account in the AP selection. A preclustering process of the DNA is proposed to keep the optimization process feasible in a highly dense network. To dynamically reconfigure the optimum topology and adjust it to the traffic variations, a new specific encoding of a genetic algorithm (GA) is presented. Numerical results show that the GA can provide the optimum topology up to two orders of magnitude faster than exhaustive search for network clusters, and the improvement significantly increases with the cluster size.
In public clouds, an adversary can co-locate his or her virtual machines (VMs) with others on the same physical servers to mount an attack against their integrity, confidentiality, or availability. One important factor in decreasing the likelihood of this co-location attack is the VM placement strategy. However, a co-location-resistant strategy may compromise the resource optimization of the cloud providers. The tradeoff between security and resource optimization introduces one of the most crucial challenges in cloud security. In this work we propose a placement strategy that decreases the co-location rate by compromising VM startup time instead of resource optimization. We give a mathematical analysis to quantify the co-location resistance. The proposed strategy is evaluated against attacks abusing placement locality, where the attack and target VMs are launched simultaneously or within a short time window. Compared to the EC2 placement strategy, the best co-location-resistant strategy among the existing public cloud providers' strategies, our strategy enormously decreases co-location attacks at the cost of a slight VM startup delay (relative to the actual VM startup delay in the public cloud providers).
In a secret sharing scheme a dealer shares a secret s among n parties such that an adversary corrupting up to t parties does not learn s, while any t+1 parties can efficiently recover s. Over a long period of time all parties may be corrupted, thus violating the threshold, which is accounted for in Proactive Secret Sharing (PSS). PSS schemes periodically rerandomize (refresh) the shares of the secret and invalidate old ones. PSS retains confidentiality even when all parties are corrupted over the lifetime of the secret, but no more than t during a certain window of time, called the refresh period. Existing PSS schemes only guarantee secrecy in the presence of an honest majority, with less than n/2 total corruptions during a refresh period; an adversary corrupting a single additional party, even if only passively, obtains the secret. This work is the first feasibility result demonstrating PSS tolerating a dishonest majority. It introduces the first PSS scheme secure against t passive adversaries without recovery of lost shares; it can also recover from honest faulty parties losing their shares, and when tolerating e faults the scheme tolerates t passive corruptions. A non-robust version of the scheme can tolerate t active adversaries, and mixed adversaries that control a combination of passively and actively corrupted parties that are a majority, but where less than n/2-e of such corruptions are active. We achieve these high thresholds with O(n^4) communication when sharing a single secret, and O(n^3) communication when sharing multiple secrets in batches.
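For background, here is a single-dealer sketch of the basic PSS mechanics: Shamir sharing plus a refresh that adds a fresh sharing of zero, so the shares change while the secret does not. This only illustrates the refresh idea behind PSS in general; the paper's dishonest-majority scheme and its recovery machinery are substantially different.

```python
import secrets

P = 2**127 - 1  # prime field modulus; toy parameter choice

def share(secret, n, t):
    """Shamir sharing: evaluate a random degree-t polynomial with constant term `secret`."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t)]
    return {i: sum(c * pow(i, j, P) for j, c in enumerate(coeffs)) % P
            for i in range(1, n + 1)}

def refresh(shares, t):
    """Proactive refresh: add a fresh sharing of zero, so shares change but the
    secret does not. (Real PSS protocols do this in a distributed, verifiable
    way; this is a single-dealer illustration.)"""
    zero_shares = share(0, len(shares), t)
    return {i: (s + zero_shares[i]) % P for i, s in shares.items()}

def reconstruct(points):
    """Lagrange interpolation at x = 0 over the prime field."""
    total = 0
    for i, yi in points.items():
        num, den = 1, 1
        for j in points:
            if j != i:
                num = num * (-j) % P
                den = den * (i - j) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

old = share(1234567890, n=5, t=2)
new = refresh(old, t=2)
assert reconstruct(dict(list(new.items())[:3])) == 1234567890  # any t+1 shares suffice
assert all(old[i] != new[i] for i in old)                      # but individual shares changed
```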
Memory disclosure vulnerabilities enable an adversary to successfully mount arbitrary code execution attacks against applications via so-called just-in-time code reuse attacks, even when those applications are fortified with fine-grained address space layout randomization. This attack paradigm requires the adversary to first read the contents of randomized application code, then construct a code reuse payload using that knowledge. In this paper, we show that the recently proposed Execute-no-Read (XnR) technique fails to prevent just-in-time code reuse attacks. Next, we introduce the design and implementation of a novel memory permission primitive, dubbed No-Execute-After-Read (NEAR), that foregoes the problems of XnR and provides strong security guarantees against just-in-time attacks in commodity binaries. Specifically, NEAR allows all code to be disclosed, but prevents any disclosed code from subsequently being executed, thus thwarting just-in-time code reuse. At the same time, commodity binaries with mixed code and data regions still operate correctly, as legitimate data is still readable. To demonstrate the practicality and portability of our approach we implemented prototypes for both Linux and Android on the ARMv8 architecture, as well as a prototype that protects unmodified Microsoft Windows executables and dynamically linked libraries. In addition, our evaluation on the SPEC2006 benchmark demonstrates that our prototype has negligible runtime overhead, making it suitable for practical deployment.
Identifying threats contained within encrypted network traffic poses a unique set of challenges. It is important to monitor this traffic for threats and malware, but do so in a way that maintains the integrity of the encryption. Because pattern matching cannot operate on encrypted data, previous approaches have leveraged observable metadata gathered from the flow, e.g., the flow's packet lengths and inter-arrival times. In this work, we extend the current state-of-the-art by considering a data omnia approach. To this end, we develop supervised machine learning models that take advantage of a unique and diverse set of network flow data features. These data features include TLS handshake metadata, DNS contextual flows linked to the encrypted flow, and the HTTP headers of HTTP contextual flows from the same source IP address within a 5 minute window. We begin by exhibiting the differences between malicious and benign traffic's use of TLS, DNS, and HTTP on millions of unique flows. This study is used to design the feature sets that have the most discriminatory power. We then show that incorporating this contextual information into a supervised learning system significantly increases performance at a 0.00% false discovery rate for the problem of classifying encrypted, malicious flows. We further validate our false positive rate on an independent, real-world dataset.
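A minimal sketch of the kind of supervised pipeline the abstract describes, using scikit-learn with placeholder data and hypothetical feature names; the paper's actual feature set (TLS handshake metadata, linked DNS contextual flows, HTTP headers of contextual flows) is far richer than this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Each row stands in for one flow, with hypothetical features such as:
# [n_packets, mean_pkt_len, mean_interarrival_ms,
#  n_tls_ciphersuites_offered, dns_domain_age_days, n_http_headers]
X = np.random.rand(1000, 6)              # placeholder for real flow features
y = np.random.randint(0, 2, size=1000)   # 1 = malicious, 0 = benign (labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Operating at a near-zero false discovery rate means choosing a high decision
# threshold on the predicted probability rather than the default 0.5.
scores = clf.predict_proba(X_te)[:, 1]
flagged = scores > 0.99
print("flows flagged as malicious:", int(flagged.sum()))
```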
Malicious I/O devices might compromise the OS using DMAs. The OS therefore utilizes the IOMMU to map and unmap every target buffer right before and after its DMA is processed, thereby restricting DMAs to their designated locations. This usage model, however, is not truly secure for two reasons: (1) it provides protection at page granularity only, whereas DMA buffers can reside on the same page as other data; and (2) it delays DMA buffer unmaps due to performance considerations, creating a vulnerability window in which devices can access in-use memory. We propose that OSes utilize the IOMMU differently, in a manner that eliminates these two flaws. Our new usage model restricts device access to a set of shadow DMA buffers that are never unmapped, and it copies DMAed data to/from these buffers, thus providing sub-page protection while eliminating the aforementioned vulnerability window. Our key insight is that the cost of interacting with, and synchronizing access to, the slow IOMMU hardware (required for zero-copy protection against devices) makes copying preferable to zero-copying. We implement our model in Linux and evaluate it with standard networking benchmarks utilizing a 40 Gb/s NIC. We demonstrate that despite being more secure than the safest preexisting usage model, our approach provides up to 5x higher throughput. Additionally, whereas it is inherently less scalable than an IOMMU-less (unprotected) system, our approach incurs only 0%–25% performance degradation in comparison.