Biblio
The prodigious amount of user-generated content continues to grow at an enormous rate. While it greatly facilitates the flow of information and ideas among people and communities, it may pose a great threat to individual privacy. In this paper, we demonstrate that private traits of individuals can be inferred from user-generated content by using text classification techniques. Specifically, we study three private attributes of Twitter users: religion, political leaning, and marital status. Ground-truth labels for these private traits can be readily collected from the Twitter bio field. Based on the tweets posted by the users and their corresponding bios, we show that text classification identifies these personal attributes with high accuracy, which poses a great privacy risk for user-generated content. We further propose a constrained utility maximization framework for preserving user privacy. The goal is to maximize the utility of the data when modifying the user-generated content, while degrading the prediction performance of the adversary. Specifically, the framework minimizes the KL divergence between the adversary's prior knowledge about the private attribute and the posterior probability after seeing the user-generated data. Based on this framework, we investigate several concrete data sanitization operations for privacy preservation: adding, deleting, or replacing words in the tweets. We derive the exact transformation of the data under each operation. Experiments demonstrate the effectiveness of the proposed framework.
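To make the privacy criterion concrete, here is a minimal Python sketch of the KL-divergence quantity the framework reasons about, computed between an adversary's prior over a private attribute and a classifier's posterior. The attribute values and probabilities are made up for illustration, and the direction of the divergence follows the abstract's description only loosely.

```python
import math

def kl_divergence(posterior, prior):
    """D_KL(posterior || prior) over a discrete private attribute."""
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0)

# Hypothetical example: an attacker's prior over marital status versus the
# classifier's posterior after observing a user's (sanitized) tweets.
prior     = [0.55, 0.45]   # population-level belief: [married, single]
posterior = [0.80, 0.20]   # classifier output on the original tweets
sanitized = [0.60, 0.40]   # classifier output after word add/delete/replace

print(kl_divergence(posterior, prior))   # large gap -> strong privacy leak
print(kl_divergence(sanitized, prior))   # smaller gap -> attribute better hidden
```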
We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information from its bit representation that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build along the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log^2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees: searches take O(log_B N) I/Os; inserts and deletes take O((log^2 N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Theta(log N) I/Os. This is no better than the bound one gets from running an in-memory skip list in external memory.
We investigate minimization of tree pattern queries that use the child relation, descendant relation, node labels, and wildcards. We prove that minimization for such tree patterns is Sigma2P-complete and thus solve a problem first attacked by Flesca, Furfaro, and Masciari in 2003. We first provide an example that shows that tree patterns cannot be minimized by deleting nodes. This example shows that the M-NR conjecture, which states that minimality of tree patterns is equivalent to their nonredundancy, is false. We then show how the example can be turned into a gadget that allows us to prove Sigma2P-completeness.
In this paper we provide a framework to analyze the effect of uniform sampling on graph optimization problems. Interestingly, we apply this framework to a general class of graph optimization problems that we call heavy subgraph problems, and show that uniform sampling preserves a (1-ε)-approximate solution to these problems. This class contains many interesting problems such as densest subgraph, directed densest subgraph, densest bipartite subgraph, d-max cut, and d-sum-max clustering. As an immediate impact of this result, one can use uniform sampling to solve these problems in streaming, turnstile, or MapReduce settings. Indeed, by characterizing heavy subgraph problems, our results address Open Problem 13 from the IITK Workshop on Algorithms for Data Streams in 2006 regarding the effects of subsampling in the context of graph streams. Recently, Bhattacharya et al. (STOC 2015) provided the first one-pass algorithm for the densest subgraph problem in the streaming model with additions and deletions of edges, i.e., for dynamic graph streams. They present a (0.5-ε)-approximation algorithm using Õ(n) space, where factors of ε and log(n) are suppressed in the Õ notation. In this paper we improve on the (0.5-ε)-approximation algorithm of Bhattacharya et al. by providing a (1-ε)-approximation algorithm using Õ(n) space.
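As an illustration of the sampling step only, not of the paper's (1-ε)-approximation algorithm, the Python sketch below keeps each edge uniformly at random and runs the classical greedy peeling heuristic for densest subgraph (a standard 1/2-approximation) on the sample, rescaling the resulting density by the sampling rate. Function names and the toy rescaling argument are ours.

```python
import random
from collections import defaultdict

def peel_densest(edges):
    """Greedy peeling (a classical 1/2-approximation for densest subgraph):
    repeatedly delete a minimum-degree vertex and remember the highest
    density |E|/|V| seen along the way. Assumes a simple undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    alive, m = set(adj), len(edges)
    best = m / max(len(alive), 1)
    while alive:
        v = min(alive, key=lambda x: len(adj[x]))
        m -= len(adj[v])                 # edges incident to v disappear
        for u in adj[v]:
            adj[u].discard(v)
        alive.discard(v)
        if alive:
            best = max(best, m / len(alive))
    return best

def sampled_density(edges, p, seed=0):
    """Keep each edge uniformly with probability p, peel the sample, and
    rescale by 1/p to estimate the density of the original graph."""
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]
    return peel_densest(sample) / p
```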
Unlike most social media, where automatic archiving of data is the default, Snapchat defaults to ephemerality: deleting content shortly after it is viewed by a receiver. Interviews with 25 Snapchat users show that ephemerality plays a key role in shaping their practices. Along with friend-adding features that facilitate a network of mostly close relations, default deletion affords everyday, mundane talk and reduces self-consciousness while encouraging playful interaction. Further, although receivers can save content through screenshots, senders are notified; this selective saving with notification supports complex information norms that preserve the feel of ephemeral communication while supporting the capture of meaningful content. This dance of giving and taking, sharing and showing, and agency for both senders and receivers provides the basis for a rich design space of mechanisms, levels, and domains for ephemerality.
Conventional overwriting-based and encryption-based secure deletion schemes can only sanitize the data itself. However, the past existence of the deleted data may leave artifacts in the layout at all layers of a computing system. These structural artifacts may be utilized by the adversary to infer sensitive information about the deleted data or even to fully recover them. Conventional secure deletion solutions unfortunately cannot sanitize these artifacts. In this work, we introduce truly secure deletion, a novel security notion that is much stronger than conventional secure deletion. Truly secure deletion requires sanitizing both the obsolete data and the corresponding structural artifacts, so that the storage layout after a delete operation is indistinguishable from one in which the deleted data never existed. We propose TedFlash, a Truly secure deletion scheme for Flash-based block devices. TedFlash can successfully sanitize both the data and the structural artifacts, while satisfying the design constraints imposed by flash memory. Security analysis and experimental evaluation show that TedFlash achieves the truly secure deletion guarantee with a small additional overhead compared to conventional secure deletion solutions.
In cloud storage, users lose direct control over their data, so how to surely delete data in the cloud becomes a crucial problem for a secure cloud storage system. The existing approach to this problem is to encrypt the data before outsourcing and to destroy the encryption key upon deletion. However, this solution may cause heavy computational overhead on the user side, and the encrypted data remains intact in the cloud after the deletion operation. To solve this challenging problem, we propose a novel method to surely delete data in cloud storage by overwriting. Different from existing works, our scheme is efficient on the user side and is able to wipe out the deleted data from the drives of the cloud servers.
Elliptic Curve Cryptosystems (ECC) are regarded as cryptosystems of the next generation because of their smaller key sizes and uncompromised security. They are well suited for applications running on resource-restricted devices such as smart cards. At present, there is no efficient or even sub-exponential algorithm known to break ECC theoretically. However, a hardware implementation of ECC can leak secret key information through power analysis attacks, particularly the differential power analysis (DPA) attack. These attacks break the system with far less effort than attacks based on algebraic weaknesses of the algorithms. Many countermeasures against power analysis attacks exist, but each compromises either security or performance. In this paper, we present a secure and efficient algorithm for elliptic curve scalar multiplication (ECSM) that uses initial-point randomization and delays the point addition operation. The implementation results and performance analysis show that the proposed algorithm is efficient and secure against power analysis attacks.
Cellular towers capture logs of mobile subscribers whenever their devices connect to the network. When the logs show data traffic at a cell tower generated by a device, they reveal that this device is close to the tower. The logs can then be used to trace the locations of mobile subscribers for different applications, such as studying customer behaviour, improving location-based services, or helping urban planning. However, the logs often suffer from an oscillation phenomenon. Oscillations may happen when a device, even when not moving, does not simply connect to the nearest cell tower but instead switches unpredictably between multiple cell towers because of random noise, load balancing, or simply dynamic changes in signal strength. Detecting and removing oscillations is a challenge when analyzing location data collected from the cellular network. In this paper, we propose an algorithm called SOL (Stable, Oscillation, Leap periods) aimed at discovering and reducing oscillations in the collected logs. We apply our algorithm to real datasets containing about 18.9 TB of traffic logs generated by more than 3 million mobile subscribers, covering about 21000 cell towers, and collected during 27 days from both GSM and UMTS networks in northern China. Experimental results demonstrate the ability of SOL to effectively reduce oscillations in cellular network logs.
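The SOL algorithm itself is not spelled out in the abstract; as a hedged illustration of the simplest oscillation pattern such an algorithm must handle, the Python sketch below flags ping-pong switches (A to B and back to A within a short time window) in a per-device connection log. The time threshold and log format are assumptions, not details from the paper.

```python
def find_oscillations(events, max_gap=300):
    """Flag simple ping-pong oscillations in a cell-tower connection log:
    the device bounces A -> B -> A within `max_gap` seconds. This is only the
    most basic pattern; SOL, as described, segments the log into Stable,
    Oscillation, and Leap periods."""
    flagged = []
    for (t1, a), (t2, b), (t3, c) in zip(events, events[1:], events[2:]):
        if a == c and a != b and t3 - t1 <= max_gap:
            flagged.append((t1, t3, (a, b)))
    return flagged

# Toy log of (unix_timestamp, cell_tower_id) records for one device.
log = [(0, "T1"), (60, "T2"), (100, "T1"), (4000, "T3")]
print(find_oscillations(log))   # -> [(0, 100, ('T1', 'T2'))]
```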
Among the signature schemes most widely deployed in practice are the DSA (Digital Signature Algorithm) and its elliptic curve variant ECDSA. They are represented in many international standards, including IEEE P1363, ANSI X9.62, and FIPS 186-4. Their popularity stands in stark contrast to the absence of rigorous security analyses: previous works either study modified versions of (EC)DSA or provide a security analysis of unmodified ECDSA in the generic group model. Unfortunately, works following the latter approach assume abstractions of non-algebraic functions over generic groups for which it remains unclear how they translate to the security of ECDSA in practice. For instance, it has been pointed out that prior results in the generic group model actually establish strong unforgeability of ECDSA, a property that the scheme de facto does not possess. As, furthermore, no formal results are known for DSA, understanding the security of both schemes remains an open problem. In this work we propose GenericDSA, a signature framework that subsumes both DSA and ECDSA in unmodified form. It carefully models the "modulo q" conversion function of (EC)DSA as a composition of three independent functions. The two outer functions mimic algebraic properties in the function's domain and range, while the inner one is modeled as a bijective random oracle. We rigorously prove results on the security of GenericDSA indicating that forging signatures in (EC)DSA is as hard as solving discrete logarithms. Importantly, our proofs do not assume generic group behavior.
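For reference, the signing and verification equations that GenericDSA abstracts are the standard (EC)DSA ones, written below with an explicit conversion function f; the decomposition of f into three parts follows the abstract's description, but the symbol names (φ, Π, ψ) are ours, not the paper's.

```latex
% Standard (EC)DSA over a cyclic group <g> of prime order q, with secret key x,
% public key y = g^x, message hash h(m), per-signature nonce k, and conversion
% function f (the "modulo q" map that GenericDSA analyses).
\begin{align*}
  r &= f\!\left(g^{k}\right), \qquad
     s = k^{-1}\bigl(h(m) + x\,r\bigr) \bmod q,\\
  \text{verification accepts iff}\quad
  r &= f\!\left(g^{\,h(m)\,s^{-1}}\, y^{\,r\,s^{-1}}\right).
\end{align*}
% GenericDSA models f as a composition f = \psi \circ \Pi \circ \varphi, where the
% outer maps \varphi, \psi mirror algebraic structure in the domain and range and
% the inner map \Pi is treated as a bijective random oracle (symbols are ours).
```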
Motivated by the growing complexity and heterogeneity of modern data centers, and the prevalence of commodity component failures, this paper studies the failure-aware placement problem of placing tasks of a parallel job on machines in the data center with the goal of increasing availability. We consider two models of failures: adversarial and probabilistic. In the adversarial model, each node has a weight (higher weight implying higher reliability) and the adversary can remove any subset of nodes of total weight at most a given bound W and our goal is to find a placement that incurs the least disruption against such an adversary. In the probabilistic model, each node has a probability of failure and we need to find a placement that maximizes the probability that at least K out of N tasks survive at any time. For adversarial failures, we first show that (i) the problems are in Σ2, the second level of the polynomial hierarchy, (ii) a basic variant, that we call RobustFAP, is co-NP-hard, and (iii) an all-or-nothing version of RobustFAP is Σ2-complete. We then give a PTAS for RobustFAP, a key ingredient of which is a solution that we design for a fractional version of RobustFAP. We then study fractional RobustFAP over hierarchies, denoted HierRobustFAP, and introduce a notion of hierarchical max-min fairness and a novel Generalized Spreading algorithm which is simultaneously optimal for all W. These generalize the classical notion of max-min fairness to work with nodes of differing capacities, differing reliability weights and hierarchical structures. Using randomized rounding, we extend this to give an algorithm for integral HierRobustFAP. For the probabilistic version, we first give an algorithm that achieves an additive ε approximation in the failure probability for the single level version, called ProbFAP, while giving up a (1 + ε) multiplicative factor in the number of failures. We then extend the result to the hierarchical version, HierProbFAP, achieving an ε additive approximation in failure probability while giving up an (L + ε) multiplicative factor in the number of failures, where L is the number of levels in the hierarchy.
Networks are a fundamental tool for understanding and modeling complex systems in physics, biology, neuroscience, engineering, and social science. Many networks are known to exhibit rich, lower-order connectivity patterns that can be captured at the level of individual nodes and edges. However, higher-order organization of complex networks, at the level of small network subgraphs, remains largely unknown. Here, we develop a generalized framework for clustering networks on the basis of higher-order connectivity patterns. This framework provides mathematical guarantees on the optimality of obtained clusters and scales to networks with billions of edges. The framework reveals higher-order organization in a number of networks, including information propagation units in neuronal networks and hub structure in transportation networks. Results show that networks exhibit rich higher-order organizational structures that are exposed by clustering based on higher-order connectivity patterns.
Prediction tasks over nodes and edges in networks require careful effort in engineering the features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way of efficiently learning state-of-the-art task-independent representations in complex networks.
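To make the biased walk concrete, here is a short Python sketch of the second-order random walk described for node2vec, with return parameter p and in-out parameter q. It omits the subsequent skip-gram training that turns walks into embeddings, and the toy graph and parameter values are ours.

```python
import random
from collections import defaultdict

def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=random):
    """One biased second-order random walk. Unnormalized transition weights
    from the current node, given the previously visited node `prev`:
      1/p -> step back to `prev`          (return parameter)
      1   -> step to a neighbor of `prev` (stay close, BFS-like)
      1/q -> step further away            (explore, DFS-like)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = list(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:                       # no previous node yet
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = [(1.0 / p) if nxt == prev else
                   1.0 if nxt in adj[prev] else
                   (1.0 / q) for nxt in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy usage on a small undirected graph (adjacency as sets).
adj = defaultdict(set)
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]:
    adj[u].add(v)
    adj[v].add(u)
print(node2vec_walk(adj, start=0, length=10, p=0.25, q=4.0, rng=random.Random(7)))
```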
We explicitly construct an extractor for two independent sources on n bits, each with polylogarithmic min-entropy. Our extractor outputs one bit and has polynomially small error. The best previous extractor, by Bourgain, required each source to have min-entropy .499n. A key ingredient in our construction is an explicit construction of monotone, almost-balanced Boolean functions that are resilient to coalitions. In fact, our construction is stronger in that it gives an explicit extractor for a generalization of non-oblivious bit-fixing sources on n bits, where some unknown n-q bits are chosen almost polylog(n)-wise independently, and the remaining q bits are chosen by an adversary as an arbitrary function of the n-q bits. The best previous construction, by Viola, achieved a q quadratically smaller than ours. Our explicit two-source extractor directly implies improved constructions of a K-Ramsey graph over N vertices, improving bounds obtained by Barak et al. and matching independent work by Cohen.
Cryptographic hash functions are used to protect the integrity of information. Hash functions are often designed by using existing block ciphers as compression functions, owing to the challenges and difficulties encountered in constructing new hash functions from scratch. However, the key generation required for the encryption process results in a huge computational cost, which affects the efficiency of the hash function. This paper proposes a new, secure, and efficient compression function based on a pseudorandom function that takes two 2n-bit inputs and produces one n-bit output (2n-to-n bits). In addition, a new keyed hash function with three variants (PinTar 128, 256, and 512 bits) is proposed, which uses the proposed compression function as its underlying building block. Statistical analysis shows that the compression function is an efficient one-way random function. Similarly, statistical analysis of the keyed hash function shows that it has a strong avalanche property and is resistant to exhaustive key search attacks. The proposed keyed hash function can be used as a candidate for developing security systems.
We propose a pseudo-random bit generator (PRBG) that constructs an ergodic orbit of the logistic map using intermittent perturbation and samples the orbit using the Bernoulli shift map. The pseudo-randomness of the key streams is tested with the standard NIST statistical test suite, and the key streams generated by the proposed PRBG pass all the tests of the suite. As an application of the PRBG, we propose chaotic stream ciphers and conduct goodness-of-fit tests for them. Stream ciphers built from the PRBG are shown to produce ciphertexts in which the distribution of characters shows significant correlation with the uniform distribution. It is also shown that the ciphers exhibit a good avalanche effect when the initial seed is varied infinitesimally, and that the avalanche effect can be clearly observed at different points within the ciphertext as it is generated. We also show that the proposed PRBG is on par with the conventional PRBG RC4. This supports our hypothesis that the bit stream obtained from orbits of the logistic map, using perturbation and the Bernoulli shift map, exhibits randomness.
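A minimal Python sketch of this kind of generator follows: iterate the logistic map, perturb the state intermittently, and extract one bit per iterate via the Bernoulli-shift threshold. The constants (r = 4, the perturbation size, and its period) are illustrative assumptions rather than the paper's parameters, and no NIST-level randomness is claimed for this toy.

```python
def logistic_prbg(seed, nbits, r=4.0, perturb_every=53, eps=1e-7):
    """Illustrative PRBG: iterate the logistic map x <- r*x*(1-x), apply a
    small intermittent perturbation, and emit one bit per iterate using the
    Bernoulli-shift threshold (bit = 1 iff x >= 0.5)."""
    x = seed
    bits = []
    for i in range(1, nbits + 1):
        x = r * x * (1.0 - x)
        if i % perturb_every == 0:       # intermittent perturbation to avoid
            x = (x + eps) % 1.0          # collapsing onto short periodic orbits
        bits.append(1 if x >= 0.5 else 0)
    return bits

stream = logistic_prbg(seed=0.123456789, nbits=64)
print("".join(map(str, stream)))
```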
Growing numbers of ubiquitous electronic devices and services motivate the need for effortless user authentication and identification. While biometrics are a natural means of achieving these goals, their use poses privacy risks, due mainly to the difficulty of preventing theft and abuse of biometric data. One way to minimize information leakage is to derive biometric keys from users' raw biometric measurements. Such keys can be used in subsequent security protocols and ensure that no sensitive biometric data needs to be transmitted or permanently stored. This paper is the first attempt to explore the use of human body impedance as a biometric trait for deriving secret keys. Building upon Randomized Biometric Templates as a key generation scheme, we devise a mechanism that supports consistent regeneration of unique keys from users' impedance measurements. The underlying set of biometric features is found using a feature learning technique based on Siamese networks. Compared to prior feature extraction methods, the proposed technique offers significantly improved recognition rates in the context of key generation. Besides computing experimental error rates, we tailor a known key guessing approach specifically to the used key generation scheme and assess security provided by the resulting keys. We give a very conservative estimate of the number of guesses an adversary must make to find a correct key. Results show that the proposed key generation approach produces keys comparable to those obtained by similar methods based on other biometrics.
A Wireless Sensor Network (WSN) consists of numerous small devices called sensors, which are limited in resources such as energy, memory, and computation. Sensors are often deployed in harsh environments and are vulnerable to various security issues; because of these resource restrictions, key management and providing robust security in this type of network are a challenge. Cryptographic keys may be used in two ways: symmetric or asymmetric. Asymmetric cryptography requires more communication, memory, and computation than symmetric cryptography, so it is not appropriate for WSNs. In this paper, a key management scheme based on symmetric keys is proposed in which each node uses a pseudo-random number generator (PRNG) to generate a key shared with the base station, based on a pre-distributed initial key, and uses RC5 in CBC mode to achieve confidentiality, integrity, and authentication.
Malicious domains are key components to a variety of cyber attacks. Several recent techniques are proposed to identify malicious domains through analysis of DNS data. The general approach is to build classifiers based on DNS-related local domain features. One potential problem is that many local features, e.g., domain name patterns and temporal patterns, tend to be not robust. Attackers could easily alter these features to evade detection without affecting much their attack capabilities. In this paper, we take a complementary approach. Instead of focusing on local features, we propose to discover and analyze global associations among domains. The key challenges are (1) to build meaningful associations among domains; and (2) to use these associations to reason about the potential maliciousness of domains. For the first challenge, we take advantage of the modus operandi of attackers. To avoid detection, malicious domains exhibit dynamic behavior by, for example, frequently changing the malicious domain-IP resolutions and creating new domains. This makes it very likely for attackers to reuse resources. It is indeed commonly observed that over a period of time multiple malicious domains are hosted on the same IPs and multiple IPs host the same malicious domains, which creates intrinsic association among them. For the second challenge, we develop a graph-based inference technique over associated domains. Our approach is based on the intuition that a domain having strong associations with known malicious domains is likely to be malicious. Carefully established associations enable the discovery of a large set of new malicious domains using a very small set of previously known malicious ones. Our experiments over a public passive DNS database show that the proposed technique can achieve high true positive rates (over 95%) while maintaining low false positive rates (less than 0.5%). Further, even with a small set of known malicious domains (a couple of hundreds), our technique can discover a large set of potential malicious domains (in the scale of up to tens of thousands).
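As a hedged sketch of the intuition only, not of the paper's inference technique, the Python snippet below associates domains that shared resolved IPs and then propagates maliciousness scores from a small seed set by weighted neighbour averaging; the edge weighting, damping factor, and toy data are assumptions introduced for illustration.

```python
from collections import defaultdict

def build_domain_graph(resolutions):
    """Associate two domains whenever they resolved to the same IP; the edge
    weight counts shared IPs (a crude stand-in for carefully built associations)."""
    by_ip = defaultdict(set)
    for domain, ip in resolutions:
        by_ip[ip].add(domain)
    weight = defaultdict(int)
    for domains in by_ip.values():
        ds = sorted(domains)
        for i in range(len(ds)):
            for j in range(i + 1, len(ds)):
                weight[(ds[i], ds[j])] += 1
    return weight

def propagate(weight, seeds, iters=20, alpha=0.85):
    """Simple score propagation: a domain inherits a weighted average of its
    neighbours' maliciousness scores; known-bad seeds stay pinned at 1.0."""
    nbrs = defaultdict(dict)
    for (a, b), w in weight.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    score = {d: (1.0 if d in seeds else 0.0) for d in nbrs}
    for _ in range(iters):
        new = {}
        for d, nb in nbrs.items():
            if d in seeds:
                new[d] = 1.0
                continue
            total = sum(nb.values())
            new[d] = alpha * sum(w * score[n] for n, w in nb.items()) / total
        score = new
    return score

# Toy usage: a domain sharing an IP with a known-bad seed becomes suspicious.
res = [("bad.example", "1.2.3.4"), ("shady.example", "1.2.3.4"),
       ("benign.example", "5.6.7.8")]
print(propagate(build_domain_graph(res), seeds={"bad.example"}))
```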
The Common Vulnerability Scoring System (CVSS) is the de facto standard for vulnerability severity measurement today and is crucial in the analytics driving software fortification. Required by the U.S. National Vulnerability Database, over 75,000 vulnerabilities have been scored using CVSS. We compare how the CVSS correlates with another, closely-related measure of security impact: bounties. Recent economic studies of vulnerability disclosure processes show a clear relationship between black market value and bounty payments. We analyzed the CVSS scores and bounty awarded for 703 vulnerabilities across 24 products. We found a weak (Spearman's ρ = 0.34) correlation between CVSS scores and bounties, with CVSS being more likely to underestimate bounty. We believe such a negative result is a cause for concern. We investigated why these measurements were so discordant by (a) analyzing the individual questions of CVSS with respect to bounties and (b) conducting a qualitative study to find the similarities and differences between CVSS and the publicly-available criteria for awarding bounties. Among our findings were that the bounty criteria were more explicit about code execution and privilege escalation whereas CVSS makes no explicit mention of those. We also found that bounty valuations are evaluated solely by project maintainers, whereas CVSS has little provenance in practice.
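The reported correlation is a standard Spearman rank correlation; a minimal sketch of that computation, using made-up (CVSS score, bounty) pairs rather than the study's 703-vulnerability dataset, might look like this:

```python
from scipy.stats import spearmanr

# Hypothetical (CVSS score, bounty in USD) pairs for illustration only.
cvss   = [9.8, 7.5, 5.3, 6.1, 4.3, 9.1, 7.2, 5.0]
bounty = [5000, 1500, 3000, 500, 1000, 2000, 7500, 400]

rho, pvalue = spearmanr(cvss, bounty)
print(f"Spearman's rho = {rho:.2f} (p = {pvalue:.3f})")
```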
Recent technology shifts such as cloud computing, the Internet of Things, and big data lead to a significant transfer of sensitive data out of trusted edge networks. To counter resulting privacy concerns, we must ensure that this sensitive data is not inadvertently forwarded to third parties, used for unintended purposes, or handled and stored in violation of legal requirements. Related work proposes to solve this challenge by annotating data with privacy policies before data leaves the control sphere of its owner. However, we find that existing privacy policy languages are either not flexible enough or require excessive processing, storage, or bandwidth resources, which prevents their widespread deployment. To fill this gap, we propose CPPL, a Compact Privacy Policy Language which compresses privacy policies by taking advantage of flexibly specifiable domain knowledge. Our evaluation shows that CPPL reduces policy sizes by two orders of magnitude compared to related work and can check several thousand policies per second. This allows for individual per-data-item policies in the context of cloud computing, the Internet of Things, and big data.
In the last few years, there has been significant interest in developing methods to search over encrypted data. In the case of range queries, a simple solution is to encrypt the contents of the database using an order-preserving encryption (OPE) scheme (i.e., an encryption scheme that supports comparisons over encrypted values). However, Naveed et al. (CCS 2015) recently showed that OPE-encrypted databases are extremely vulnerable to "inference attacks." In this work, we consider a related primitive called order-revealing encryption (ORE), which is a generalization of OPE that allows for stronger security. We begin by constructing a new ORE scheme for small message spaces which achieves the "best-possible" notion of security for ORE. Next, we introduce a "domain extension" technique and apply it to our small-message-space ORE. While our domain-extension technique does incur a loss in security, the resulting ORE scheme we obtain is more secure than all existing (stateless and non-interactive) OPE and ORE schemes which are practical. All of our constructions rely only on symmetric primitives. As part of our analysis, we also give a tight lower bound for OPE and show that no efficient OPE scheme can satisfy best-possible security if the message space contains just three messages. Thus, achieving strong notions of security for even small message spaces requires moving beyond OPE. Finally, we examine the properties of our new ORE scheme and show how to use it to construct an efficient range query protocol that is robust against the inference attacks of Naveed et al. We also give a full implementation of our new ORE scheme, and show that not only is our scheme more secure than existing OPE schemes, it is also faster: encrypting a 32-bit integer requires just 55 microseconds, which is more than 65 times faster than existing OPE schemes.
Function Secret Sharing (FSS), introduced by Boyle et al. (Eurocrypt 2015), provides a way for additively secret-sharing a function from a given function family F. More concretely, an m-party FSS scheme splits a function f : {0,1}^n -> G, for some abelian group G, into functions f1,...,fm, described by keys k1,...,km, such that f = f1 + ... + fm and every strict subset of the keys hides f. A Distributed Point Function (DPF) is a special case where F is the family of point functions, namely functions f_{a,b} that evaluate to b on the input a and to 0 on all other inputs. FSS schemes are useful for applications that involve privately reading from or writing to distributed databases while minimizing the amount of communication. These include different flavors of private information retrieval (PIR), as well as a recent application of DPF for large-scale anonymous messaging. We improve and extend previous results in several ways:
* Simplified FSS constructions. We introduce a tensoring operation for FSS which is used to obtain a conceptually simpler derivation of previous constructions and present our new constructions.
* Improved 2-party DPF. We reduce the key size of the PRG-based DPF scheme of Boyle et al. roughly by a factor of 4 and optimize its computational cost. The optimized DPF significantly improves the concrete costs of 2-server PIR and related primitives.
* FSS for new function families. We present an efficient PRG-based 2-party FSS scheme for the family of decision trees, leaking only the topology of the tree and the internal node labels. We apply this towards FSS for multi-dimensional intervals. We also present a general technique for extending FSS schemes by increasing the number of parties.
* Verifiable FSS. We present efficient protocols for verifying that keys (k*_1,...,k*_m), obtained from a potentially malicious user, are consistent with some f in F. Such a verification may be critical for applications that involve private writing or voting by many users.
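To illustrate the DPF definition only, not the PRG-based constructions discussed above, here is a naive, exponential-key-size 2-party DPF in Python: the point function's truth table is additively shared, so the two evaluations sum to b at x = a and to 0 elsewhere, while each key alone is uniformly random. The group Z_{2^32} and all names are illustrative choices.

```python
import secrets

MOD = 2 ** 32   # the abelian group G = Z_{2^32} (an illustrative choice)

def naive_dpf_gen(a, b, n):
    """Naive 2-party DPF for the point function f_{a,b} over {0,1}^n: additively
    share its full truth table. Key size is 2^n group elements, whereas the
    PRG-based schemes improved in this work need keys of size roughly n; this
    sketch only demonstrates the f = f1 + f2 correctness and hiding requirements."""
    k0 = [secrets.randbelow(MOD) for _ in range(2 ** n)]
    k1 = [(-k0[x]) % MOD for x in range(2 ** n)]   # shares cancel everywhere...
    k1[a] = (b - k0[a]) % MOD                      # ...except at x = a, where they sum to b
    return k0, k1

def naive_dpf_eval(key, x):
    return key[x]

# f_{a=5, b=42} on 3-bit inputs: the shares sum to 42 at x=5 and to 0 elsewhere.
k0, k1 = naive_dpf_gen(a=5, b=42, n=3)
print([(naive_dpf_eval(k0, x) + naive_dpf_eval(k1, x)) % MOD for x in range(8)])
```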
The domain name system (DNS) offers an ideal distributed database for big data mining related to different cyber security questions. Besides infrastructural problems, scalability issues, and security challenges related to the protocol itself, information from DNS is often required also for more nuanced cyber security questions. Against this backdrop, this paper discusses the fundamental characteristics of DNS in relation to cyber security and different research prototypes designed for passive but continuous DNS-based monitoring of domains and addresses. With this discussion, the paper also illustrates a few general software design aspects.
We tackle the problem of automated exploit generation for web applications. In this regard, we present an approach that significantly improves the state of the art in web injection vulnerability identification and exploit generation. Our approach for exploit generation tackles various challenges associated with typical web application characteristics: their multi-module nature, interposed user input, and multi-tier architectures using a database backend. Our approach develops precise models of application workflows, database schemas, and native functions to achieve high-quality exploit generation. We implemented our approach in a tool called Chainsaw. Chainsaw was used to analyze 9 open source applications and generated over 199 first- and second-order injection exploits combined, significantly outperforming several related approaches.
There are currently no requirements (technical or otherwise) that routing paths must be contained within national boundaries. Indeed, some paths experience international detours, i.e., they originate in one country, cross international boundaries, and return to the same country. In most cases these are sensible traffic engineering or peering decisions at ISPs that serve multiple countries; in some cases such detours may be suspicious. Characterizing international detours is useful to a number of players: (a) network engineers trying to diagnose persistent problems, (b) policy makers aiming to adhere to certain national communication policies, (c) entrepreneurs looking for opportunities to deploy new networks, and (d) privacy-conscious states trying to minimize the amount of internal communication traversing different jurisdictions. In this paper we characterize international detours in the Internet during the month of January 2016. To detect detours we sample BGP RIBs every 8 hours from 461 RouteViews and RIPE RIS peers spanning 30 countries. We geolocate ASes by geolocating each BGP prefix announced by each AS, mapping AS presence at IXPs, and geolocating infrastructure IPs. Finally, we analyze each global BGP RIB entry looking for detours. Our analysis shows that more than 5K unique BGP prefixes experienced a detour, and 132 prefixes experienced more than 50% of the detours. We observe about 544K detours in total. Detours either last for a few days or persist for the entire month; out of all detours, more than 90% were transient, lasting 72 hours or less. We also show that different countries experience detours with different characteristics.
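A hedged sketch of the detour test itself: given one geolocated AS path reduced to a country code per hop (a simplification, since a single AS can span several countries), flag a detour when the path leaves the origin country and later returns to it. The country codes below are hypothetical.

```python
def has_detour(country_path):
    """Flag an international detour: the geolocated AS path starts in some
    country, leaves it, and later returns to it. `country_path` holds one
    country code per hop of the path."""
    home = country_path[0]
    left_home = False
    for country in country_path[1:]:
        if country != home:
            left_home = True
        elif left_home:          # back in the home country after leaving it
            return True
    return False

# Toy geolocated AS paths (hypothetical country codes per hop).
print(has_detour(["CA", "US", "US", "CA"]))   # True: Canada -> US -> Canada
print(has_detour(["CA", "CA", "US", "US"]))   # False: never returns home
```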