Biblio
Map-based services are becoming increasingly important in many applications. These services often need to show geospatial objects (e.g., cities and parks) in Web browsers, and being able to retrieve such objects efficiently is critical to achieving a low response time for user queries. In this demonstration we present a browser-based caching technique to store and load geospatial objects on a map in a Web page. The technique employs a hierarchical structure to store and index polygons, and performs intelligent prefetching and cache replacement by utilizing information about the user's recent browser activity. We demonstrate the usage of the technique in an application called TwitterMap for visualizing more than 1 billion tweets in real time, and we show its effectiveness under different replacement policies. The technique is implemented as a general-purpose JavaScript library, making it suitable for other applications as well.
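To make the caching idea concrete, here is a minimal sketch (in Python rather than the library's JavaScript, with hypothetical class and method names) of a viewport cache keyed by quadtree tiles, with LRU replacement and neighbor prefetching; the actual library's hierarchical polygon index is more elaborate:

```python
from collections import OrderedDict

class TileCache:
    """Sketch of a hierarchical map-object cache with LRU replacement.

    Polygons are indexed by quadtree tile keys (zoom, x, y), so a viewport
    query touches only the tiles it overlaps. All names are hypothetical.
    """

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.tiles = OrderedDict()  # (zoom, x, y) -> polygons, in LRU order

    def get(self, zoom, x, y):
        key = (zoom, x, y)
        if key in self.tiles:
            self.tiles.move_to_end(key)  # mark as recently used
            return self.tiles[key]
        return None  # cache miss: caller fetches from the server

    def put(self, zoom, x, y, polygons):
        self.tiles[(zoom, x, y)] = polygons
        self.tiles.move_to_end((zoom, x, y))
        while len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)  # evict least recently used tile

    def prefetch_neighbors(self, zoom, x, y, fetch):
        """Speculatively load tiles adjacent to the current viewport."""
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if self.get(zoom, x + dx, y + dy) is None:
                    self.put(zoom, x + dx, y + dy, fetch(zoom, x + dx, y + dy))
```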
Most popular Web applications rely on persistent databases based on languages like SQL for declarative specification of data models and the operations that read and modify them. As applications scale up in user base, they often face challenges responding quickly enough to the high volume of requests. A common aid is caching of database results in the application's memory space, taking advantage of program-specific knowledge of which caching schemes are sound and useful, embodied in handwritten modifications that make the program less maintainable. These modifications also require nontrivial reasoning about the read-write dependencies across operations. In this paper, we present a compiler optimization that automatically adds sound SQL caching to Web applications coded in the Ur/Web domain-specific functional language, with no modifications required to source code. We use a custom cache implementation that supports concurrent operations without compromising the transactional semantics of the database abstraction. Through experiments with microbenchmarks and production Ur/Web applications, we show that our optimization in many cases enables an easy doubling or more of an application's throughput, requiring nothing more than passing an extra command-line flag to the compiler.
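As a rough illustration of what sound caching of database results involves, the sketch below (hypothetical, in Python; the real optimization operates on Ur/Web's typed SQL and at finer granularity) caches query results and invalidates them on writes to any table the query reads, preserving read-write dependencies:

```python
class QueryCache:
    """Toy table-granularity SQL result cache with sound invalidation.

    A cached read is dropped whenever a write touches any table it depends
    on, so cached results never diverge from the database.
    """

    def __init__(self, db):
        self.db = db
        self.cache = {}       # sql text -> cached rows
        self.tables_of = {}   # sql text -> set of tables the query reads

    def query(self, sql, tables):
        if sql not in self.cache:
            self.cache[sql] = self.db.execute(sql)
            self.tables_of[sql] = set(tables)
        return self.cache[sql]

    def update(self, sql, tables):
        self.db.execute(sql)
        # soundness: invalidate every cached read that may depend on a
        # table written by this update
        stale = [q for q, ts in self.tables_of.items() if ts & set(tables)]
        for q in stale:
            del self.cache[q], self.tables_of[q]
```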
End-users in emerging markets experience poor web performance due to a combination of three factors: high server response time, limited edge bandwidth, and the complexity of web pages. The absence of cloud infrastructure in developing regions and the limited bandwidth experienced by edge nodes constrain the effectiveness of conventional caching solutions in these contexts. This paper describes the design, implementation, and deployment of xCache, a cloud-managed Internet caching architecture that proactively profiles popular web pages and maintains the liveness of popular content at software-defined edge caches to enhance the cache hit rate with minimal bandwidth overhead. xCache uses a Cloud Controller that continuously analyzes active cloud-managed web pages and derives an object-group representation of each page based on its objects. Using this object-group representation, xCache computes a bandwidth-aware utility measure to derive the most valuable configuration for each edge cache. Our preliminary real-world deployment across university campuses in three developing regions demonstrates its potential compared to conventional caching, improving cache hit rates by about 15%. Our evaluations also show that xCache can be applied in conjunction with other web optimization solutions, such as Shandian, and can improve page load times by more than 50%.
In this paper we discuss a simple and inexpensive method to introduce students to Newton's law of cooling using only their smartphones, following the Bring-Your-Own-Device philosophy. A popular experiment in basic thermodynamics, both at high-school and at university level, is the determination of the specific heat of solids and liquids using a water calorimeter, resorting in many cases to a mercury thermometer. With our approach the analog instrument is quickly turned into a digital device by analyzing the movement of the mercury with a video tracker. Thus, using very simple labware and the students' smartphones or tablets, it is possible to observe the decay behavior of the temperature of a liquid left to cool at room temperature. The dependence of the time constant on the mass and surface area of the liquid can be easily probed, and the results of the different groups in the classroom can be brought together to observe the linear dependence.
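For reference, the decay behavior the students observe follows the standard lumped-capacitance form of Newton's law of cooling (textbook notation, not taken from the paper), with m the liquid's mass, c its specific heat, h the convective heat-transfer coefficient, and A the exposed surface area:

```latex
% Newton's law of cooling: the temperature relaxes exponentially toward
% the ambient temperature T_env with time constant \tau.
\frac{dT}{dt} = -\frac{1}{\tau}\bigl(T - T_\mathrm{env}\bigr),
\qquad
T(t) = T_\mathrm{env} + \bigl(T_0 - T_\mathrm{env}\bigr)\, e^{-t/\tau},
\qquad
\tau = \frac{m c}{h A}
```

so the time constant grows linearly with the mass and inversely with the cooling surface, which is the dependence the classroom groups can probe together.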
Recent years have witnessed a rapid growth in the domain of the Internet of Things (IoT). This network of billions of devices generates and exchanges huge amounts of data. Limited cache capacity and memory bandwidth make transferring and processing such data on traditional CPUs and GPUs highly inefficient, in terms of both energy consumption and delay. However, many IoT applications are statistical at heart and can tolerate some inaccuracy in their computation. This enables designers to reduce the complexity of processing by approximating the results to a desired accuracy. In this paper, we propose an ultra-efficient approximate processing in-memory architecture, called APIM, which exploits the analog characteristics of non-volatile memories to support addition and multiplication inside the crossbar memory while storing the data. The proposed design eliminates the overhead involved in transferring data to the processor by virtually bringing the processor inside memory. APIM dynamically configures the precision of computation for each application in order to tune the level of accuracy at runtime. Our experimental evaluation running six general OpenCL applications shows that the proposed design achieves up to 20x performance improvement and a 480x improvement in energy-delay product while ensuring acceptable quality of service. In exact mode, it achieves 28x energy savings and a 4.8x speedup compared to state-of-the-art GPU cores.
We transfer a key idea from the field of sentiment analysis to a new domain: community question answering (cQA). The cQA task we are interested in is the following: given a question and a thread of comments, we want to re-rank the comments so that the ones that are good answers to the question are ranked higher than the bad ones. We notice that good vs. bad comments use specific vocabulary and that one can often predict the goodness/badness of a comment even ignoring the question, based on the comment contents alone. This leads us to the idea of building a good/bad polarity lexicon as an analogy to the positive/negative sentiment polarity lexicons commonly used in sentiment analysis. In particular, we use pointwise mutual information to build large-scale goodness polarity lexicons in a semi-supervised manner, starting with a small number of initial seeds. The evaluation results show an improvement of 0.7 MAP points absolute over a very strong baseline, and state-of-the-art performance on SemEval-2016 Task 3.
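A minimal sketch of how such a goodness polarity lexicon can be built with pointwise mutual information, assuming a small seed set of comments labeled good/bad (the function name and smoothing choice are illustrative, not the paper's exact procedure):

```python
import math
from collections import Counter

def goodness_lexicon(comments, min_count=5):
    """Score words by PMI(w, good) - PMI(w, bad) over seed-labeled comments.

    `comments` is a list of (tokens, label) pairs with label in
    {"good", "bad"}; positive scores mark vocabulary typical of good answers.
    """
    word_counts, word_good = Counter(), Counter()
    n_good = sum(1 for _, label in comments if label == "good")
    n_total = len(comments)
    for tokens, label in comments:
        for w in set(tokens):
            word_counts[w] += 1
            if label == "good":
                word_good[w] += 1
    p_good = n_good / n_total
    lexicon, eps = {}, 1e-6
    for w, n_w in word_counts.items():
        if n_w < min_count:
            continue
        p_good_given_w = word_good[w] / n_w
        # PMI(w, good) = log p(good|w)/p(good); take the good-bad difference,
        # with a small epsilon to avoid log(0)
        score = math.log((p_good_given_w + eps) / p_good) \
              - math.log((1.0 - p_good_given_w + eps) / (1.0 - p_good))
        lexicon[w] = score
    return lexicon
```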
Recent methods for learning vector space representations of words (word embeddings), such as GloVe and Word2Vec, have succeeded in capturing fine-grained semantic and syntactic regularities. We analyze the effectiveness of these methods for e-commerce recommender systems by treating the sequence of items generated by a user's browsing journey in an e-commerce website as a sentence of words. We examine the prediction of fine-grained item similarity (such as the item most similar to an iPhone 6 64GB smartphone) and item analogy (such as iPhone 5 is to iPhone 6 as Samsung S5 is to Samsung S6) using real-life users' browsing history from an online European department store. Our results reveal that such methods outperform related models such as singular value decomposition (SVD) on item similarity and analogy tasks across different product categories. Furthermore, these methods produce a highly condensed item vector space representation (item embedding) with behaviorally meaningful sub-structure. These vectors can be used as features in a variety of recommender system applications. In particular, we used them as features in neural-network-based models for anonymous user recommendation based on a session's first few clicks. We find that a recurrent neural network that preserves the order of the user's clicks outperforms a standard neural network, item-to-item similarity, and SVD (recall@10 of 42% based on the first three clicks) for this task.
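A sketch of the basic setup, treating browsing sessions as sentences and training a skip-gram model with gensim's Word2Vec (the item IDs and hyperparameters below are placeholders, not the paper's):

```python
from gensim.models import Word2Vec

# Each browsing session is treated as a "sentence" of item IDs.
sessions = [
    ["iphone6_64gb", "iphone6_128gb", "case_iphone6"],
    ["samsung_s5", "samsung_s6", "sd_card_64gb"],
    # ... millions of real sessions in practice
]

# Skip-gram with negative sampling, as commonly used for item2vec-style
# models; min_count=1 only so this tiny example runs.
model = Word2Vec(sessions, vector_size=100, window=5, min_count=1, sg=1)

# Item similarity: items co-browsed in similar contexts end up close together.
print(model.wv.most_similar("iphone6_64gb", topn=5))

# Item analogy: iphone5 : iphone6 :: samsung_s5 : ?
# (requires the corresponding IDs to be in the vocabulary)
# print(model.wv.most_similar(positive=["iphone6", "samsung_s5"],
#                             negative=["iphone5"], topn=1))
```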
FPGAs have been used as accelerators in a wide variety of domains such as learning, search, genomics, signal processing, compression, and analytics. In recent years, the availability of tools and flows such as high-level synthesis has made it even easier to accelerate a variety of high-performance computing applications on FPGAs. In this paper we propose a systematic methodology for optimizing the performance of an accelerated block, using the notion of compute intensity to guide optimizations in high-level synthesis. We demonstrate the effectiveness of our methodology on an FPGA implementation of a non-uniform discrete Fourier transform (NUDFT), used to convert a wireless channel model from the time domain to the frequency domain. The acceleration of this particular computation can be used to improve the performance and capacity of wireless channel simulation, which has wide applications in the system-level design and performance evaluation of wireless networks. Our results show that our FPGA implementation outperforms the same code offloaded onto GPUs and CPUs by 1.6x and 10x respectively, in performance as measured by the throughput of the accelerated block. The gains in performance per watt versus GPUs and CPUs are 15.6x and 41.5x respectively.
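The abstract does not spell out how compute intensity is measured; a common roofline-style definition, which the paper's methodology may refine, is:

```latex
% Arithmetic (compute) intensity and the roofline bound it implies.
I = \frac{W}{Q}
  = \frac{\text{arithmetic operations performed by the block}}
         {\text{bytes moved between the block and external memory}},
\qquad
P_\text{attainable} \le \min\bigl(P_\text{peak},\; I \cdot B_\text{mem}\bigr)
```

Blocks with low intensity are memory-bound and benefit from optimizations that reduce data movement; high-intensity blocks are compute-bound and benefit from added parallelism.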
This paper presents an integrated Analog Delay Line (ADL) for analog RF signal processing. The design is inspired by the Bucket Brigade Device (BBD) structure: it transfers charges from a sampled input signal stage after stage, and thus belongs to the class of Charge-Coupled Devices (CCDs). The ADL is fully differential with Common Mode (CM) control. The design uses the 28nm Fully Depleted Silicon on Insulator (FDSOI) technology from ST Microelectronics, and the reported results are obtained from simulations in Cadence Spectre.
Capturing knowledge via learned latent vector representations of words, images, and knowledge graph (KG) entities has shown state-of-the-art performance in computer vision, computational linguistics, and KG tasks. Recent results demonstrate that learning such representations across modalities can be beneficial, since each modality captures complementary information. However, those approaches are limited to concepts with cross-modal alignments in the training data, which are available for only a few concepts; in particular, far fewer embeddings exist for visual objects than for words or KG entities. We investigate whether a word embedding (e.g., for "apple") can still capture information from other modalities even if there is no matching concept within the other modalities (i.e., no images or KG entities of apples, but of oranges, as pictured in the title analogy). The empirical results of our knowledge transfer approach demonstrate that word embeddings do benefit from extrapolating information across modalities, even for concepts that are not represented in the other modalities. Interestingly, this applies most to concrete concepts (e.g., dragonfly), while abstract concepts (e.g., animal) benefit most if aligned concepts are available in the other modalities.
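One simple way to realize this kind of transfer, shown here as a hedged sketch rather than the paper's method, is to fit a ridge-regression map from the word-embedding space to the image-embedding space on the few aligned concepts, then apply it to words that lack images:

```python
import numpy as np

# Hypothetical cross-modal extrapolation: learn a linear map from word
# vectors to image vectors on the aligned concepts, then apply it to a
# word with no aligned image (e.g. "apple" when only oranges are pictured).
rng = np.random.default_rng(0)
d_word, d_img, n_aligned = 300, 512, 1000
W_aligned = rng.standard_normal((n_aligned, d_word))  # word vectors w/ images
V_aligned = rng.standard_normal((n_aligned, d_img))   # matching image vectors

lam = 1.0  # ridge regularization strength
M = np.linalg.solve(W_aligned.T @ W_aligned + lam * np.eye(d_word),
                    W_aligned.T @ V_aligned)

w_apple = rng.standard_normal(d_word)  # a word with no aligned image
v_apple_hat = w_apple @ M              # its extrapolated visual embedding
print(v_apple_hat.shape)
```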
Establishing secret and reliable wireless communication is a challenging task of paramount importance. In this paper, we investigate the physical-layer security of the transmission link of a legitimate user, who assists an Intrusion Detection System (IDS) in detecting eavesdropping and jamming attacks, in the presence of an adversary capable of conducting either attack. The user faces the choice of whether to transmit, thus becoming vulnerable to an eavesdropping or jamming attack, or to keep silent and consequently have the transmission delayed. The adversary likewise faces the choice of whether to conduct an eavesdropping or a jamming attack without being detected. We model the interactions between the user and the adversary as a two-state stochastic game. Explicit solutions characterize the properties of the game and highlight some interesting strategies adopted by the user and the adversary. Results show that our proposed system outperforms current systems in terms of communication secrecy.
High-accuracy localization is a prerequisite for many wireless applications. To obtain accurate location information, it is often necessary to share users' positional knowledge, and this brings the risk of leaking location information to adversaries during the localization process. This paper develops a theory and algorithms for protecting location secrecy. In particular, we first introduce a location secrecy metric (LSM) for a general measurement model of an eavesdropper. Compared to previous work, the measurement model accounts for parameters such as channel conditions and time offsets in addition to the positions of users. We determine the expression of the LSM for typical scenarios and show how the LSM depends on the capability of an eavesdropper and the quality of the eavesdropper's measurements. Based on the insights gained from the analysis, we consider a case study in a wireless localization network and develop an algorithm that diminishes the eavesdropper's capabilities by exploiting the reciprocity of channels. Numerical results show that the proposed algorithm can effectively increase the LSM and protect location secrecy.
In this paper, machine learning attacks are performed on a novel hybrid delay-based Arbiter Ring Oscillator PUF (AROPUF). The AROPUF exhibits improved results when compared to the traditional Arbiter Physical Unclonable Function (APUF). The challenge-response pairs (CRPs) from both PUFs are fed to a multilayer perceptron (MLP) model with one hidden layer. The results show that the CRPs generated from the proposed AROPUF yield higher training and prediction errors than those from the APUF, thus making it more difficult for an adversary to predict the responses.
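A minimal sketch of such a modeling attack using scikit-learn's MLPClassifier with one hidden layer; the data below is random and merely stands in for measured CRPs (a real arbiter-PUF attack would typically also transform challenges into parity features first):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical CRP data: each row is a 64-bit challenge, y the 1-bit response.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(10000, 64))
y = rng.integers(0, 2, size=10000)  # replace with measured PUF responses

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# One hidden layer, matching the model described in the abstract.
mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
mlp.fit(X_train, y_train)
print("prediction accuracy:", mlp.score(X_test, y_test))
```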
In this paper, we present an algorithm for estimating the state of the power grid following a cyber-physical attack. We assume that an adversary attacks an area by: (i) disconnecting some lines within that area (failed lines), and (ii) obstructing the information from within the area from reaching the control center. Given the phase angles of the buses outside the attacked area under the AC power flow model (before and after the attack), the algorithm estimates the phase angles of the buses and detects the failed lines inside the attacked area. The novelty of our approach is the transformation of the line failure detection problem, which is combinatorial in nature, into a convex optimization problem. As a result, our algorithm can detect any number of line failures in a running time that is independent of the number of failures and depends solely on the size of the network. To the best of our knowledge, this is the first convex relaxation for the problem of line failure detection using phase angle measurements under the AC power flow model. We evaluate the performance of our algorithm on the IEEE 118- and 300-bus systems, and show that it estimates the phase angles of the buses with less than 1% error and can detect the line failures with 80% accuracy for single, double, and triple line failures.
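A heavily simplified sketch of the convex-relaxation idea, posed here as l1 (sparse) recovery in cvxpy with random stand-in matrices and a DC-style linear map for brevity; the paper's actual formulation is for the AC model:

```python
import cvxpy as cp
import numpy as np

# Hypothetical sparse-recovery view of line-failure detection: each failed
# line contributes a sparse deviation to the flow equations, so minimizing
# the l1 norm subject to the observed mismatches recovers the failures.
n_bus, n_line = 10, 15
rng = np.random.default_rng(1)
A = rng.standard_normal((n_bus, n_line))  # stand-in for the incidence/flow map
b = rng.standard_normal(n_bus)            # stand-in for observed mismatches

x = cp.Variable(n_line)  # per-line failure indicator, sparse if few lines fail
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == b])
prob.solve()

# Nonzero entries of x flag candidate failed lines.
print(np.where(np.abs(x.value) > 1e-6)[0])
```

The running time of such a convex program depends on the problem size, not on how many entries of x turn out to be nonzero, which is the property the abstract highlights.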
The false data injection attack (FDIA) is a form of cyber-attack capable of affecting the secure and economic operation of the smart grid. Under DC model-based state estimation, this paper analyzes ways of constructing a successful attack vector to fulfill specific targets, i.e., a pre-specified state-variable target or a pre-specified meter target, according to the adversary's intent. The grid operator's historical readings of the meters are considered as a constraint the adversary must respect to avoid detection. Also from the viewpoint of the adversary, we propose to take full advantage of the dual concept of the coefficients in the topology matrix to address the problem that the adversary has no access to some meters. The effectiveness of the proposed method is validated by numerical experiments on the IEEE 14-bus benchmark system.
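A small numerical sketch of the classic DC-model construction underlying such targeted attacks: any attack vector in the column space of the topology matrix, a = Hc, leaves the bad-data-detection residual unchanged (random stand-in matrices below; the paper adds the historical-reading and limited-meter-access constraints on top of this):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 20, 8                       # meters, state variables
H = rng.standard_normal((m, n))    # measurement (topology) matrix
x = rng.standard_normal(n)         # true state
z = H @ x                          # noiseless measurements, for clarity

c = np.zeros(n)
c[3] = 0.5                         # pre-specified bias on state variable 3
a = H @ c                          # attack vector in the column space of H

# Least-squares state estimate and residual, before and after the attack.
est = lambda meas: np.linalg.lstsq(H, meas, rcond=None)[0]
r_before = np.linalg.norm(z - H @ est(z))
r_after = np.linalg.norm((z + a) - H @ est(z + a))
print(r_before, r_after)  # identical residuals: the bias is undetectable
```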
Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this assumption is often violated in practice. The field of domain adaptation addresses situations where this assumption is violated, though the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaptation to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages has been called into question by recent findings about adversaries in learning settings. Adversarial training (and testing) data pose a serious threat in scenarios where an attacker has the opportunity to "poison" the training data or "evade" on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on the preprocessing of data (i.e., dimensionality reduction or feature selection) has largely been ignored in the revamp of adversarial learning research. Furthermore, variable selection, a vital component of any data analysis, has been shown to be particularly susceptible to an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial setting by considering the effects of adversarial data and how to mitigate them through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and known weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.
Deregulated electricity markets rely on a two-settlement system consisting of day-ahead and real-time markets, across which electricity price is volatile. In such markets, locational marginal pricing is widely adopted to set electricity prices and manage transmission congestion. Locational marginal prices are vulnerable to measurement errors. Existing studies show that if the adversaries are omniscient, they can design profitable attack strategies without being detected by the residue-based bad data detectors. This paper focuses on a more realistic setting, in which the attackers have only partial and imperfect information due to their limited resources and restricted physical access to the grid. Specifically, the attackers are assumed to have uncertainties about the state of the grid, and the uncertainties are modeled stochastically. Based on this model, this paper offers a framework for characterizing the optimal stochastic guarantees for the effectiveness of the attacks and the associated pricing impacts.
Cooperative spectrum sensing is often necessary in cognitive radio systems to localize a transmitter by fusing the measurements from multiple sensing radios. However, revealing spectrum sensing information also generally leaks information about the location of the radio that made those measurements. We propose a protocol for performing cooperative spectrum sensing while preserving the privacy of the sensing radios. In this protocol, radios fuse sensing information through a distributed particle filter based on a tree structure. All sensing information is encrypted using public-key cryptography, and one of the radios serves as an anonymizer, whose role is to break the connection between the sensing radios and the public keys they use. We consider a semi-honest (honest-but-curious) adversary model in which there is at most a single adversary that is internal to the sensing network and complies with the specified protocol but wishes to determine information about the other participants. Under this scenario, an adversary may learn the sensing information of some of the radios, but it has no way to tie that information to a particular radio's identity. We test the performance of our proposed distributed, tree-based particle filter using physical measurements of FM broadcast stations.
This paper offers a new approach to modelling the effect of cyber-attacks on reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions, which are not present in non-compromised software. The model is then extended to cover a fault tolerant architecture, such as the 1-out-of-2 software, popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and "cleansing" ("proactive recovery") under different adversary models ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on reliability of diverse software significantly depends on the adversary model. Under synchronized attacks system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model in the assessment of how effective various cyber-security controls are.
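An illustrative back-of-the-envelope version of the headline finding (not the paper's full stochastic model): if each channel of a 1-out-of-2 system is compromised with probability p over a maintenance interval, then

```latex
% System failure requires both channels to be compromised simultaneously.
P_\text{fail}^{\,\text{independent}} \approx p^2,
\qquad
P_\text{fail}^{\,\text{synchronized}} \approx p
```

so perfectly synchronized attacks can degrade system reliability by a factor of roughly 1/p relative to independent attacks, consistent with the order-of-magnitude gap reported above.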
Distributed Denial of Service (DDoS) attacks are some of the most persistent threats on the Internet today. The evolution of DDoS attacks calls for their in-depth analysis: a better understanding of attackers' behavior can provide insights that unveil the patterns and strategies they employ. The prior art on attacker behavior analysis often falls short in two respects: it assumes that adversaries are static, and it makes simplifying assumptions about their behavior that are often not supported by real attack data. In this paper, we take a data-driven approach to designing and validating three DDoS attack models from temporal (e.g., attack magnitudes), spatial (e.g., attacker origin), and spatiotemporal (e.g., attack inter-launching time) perspectives. We design these models based on the analysis of traces consisting of more than 50,000 verified DDoS attacks from industrial mitigation operations. Each model is also validated by testing its effectiveness in accurately predicting future DDoS attacks. Comparisons against simple intuitive models further show that our models can more accurately capture the essential features of DDoS attacks.
Hacking incidents and social issues related to Advanced Persistent Threat (APT) attacks continue to occur, and many antivirus businesses and researchers are making efforts to analyze such attacks in order to prevent or cope with them; host PC security technologies such as firewalls and intrusion detection systems are commonly used to this end. In this study, malicious behavior patterns were extracted using the APIs of PE files, and the FP-Growth algorithm was applied to behavior information generated on the host PC in order to overcome the limitations of previous signature-based intrusion detection systems. We will use this study as fundamental research toward a system that extracts malicious behavior patterns within networks and APIs in the future.
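A small sketch of the pattern-extraction step using the mlxtend implementation of FP-Growth; the API-call traces below are hypothetical examples, not data from the study:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Hypothetical API-call traces extracted from PE files on the host PC;
# each trace is one "transaction" for frequent-pattern mining.
traces = [
    ["CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"],
    ["CreateRemoteThread", "WriteProcessMemory", "RegSetValueEx"],
    ["InternetOpen", "InternetConnect", "HttpSendRequest"],
    ["CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(traces).transform(traces), columns=te.columns_)

# Frequent API-call itemsets are candidate malicious behavior patterns.
patterns = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(patterns)
```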
This paper proposes a method to detect two primary means of using the Domain Name System (DNS) for malicious purposes. We develop machine learning models to detect information exfiltration from compromised machines and the establishment of command & control (C&C) servers via tunneling. We validate our approach through experiments in which we successfully detect malware used in several recent Advanced Persistent Threat (APT) attacks [1]. The novelty of our method lies in its robustness, simplicity, scalability, and ease of deployment in a production environment.
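A hedged sketch of the kind of per-query features such models typically rely on (illustrative only; the paper does not list its exact feature set): exfiltration and tunneling queries tend to carry long, high-entropy subdomains:

```python
import math
from collections import Counter

def dns_features(qname):
    """Hypothetical per-query features for detecting DNS tunneling or
    exfiltration: encoded payloads produce long, high-entropy labels."""
    labels = qname.rstrip(".").split(".")
    subdomain = ".".join(labels[:-2]) if len(labels) > 2 else ""
    counts = Counter(subdomain)
    total = max(len(subdomain), 1)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return {
        "qname_length": len(qname),
        "subdomain_length": len(subdomain),
        "num_labels": len(labels),
        "entropy": entropy,
        "digit_ratio": sum(ch.isdigit() for ch in subdomain) / total,
    }

# A benign lookup vs. a base32-looking exfiltration query.
print(dns_features("www.example.com"))
print(dns_features("mzxw6ytboi2dqmrtgq3tmnzq.evil-c2.net"))
```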
The extensive use of information and communication technologies in power grid systems makes them vulnerable to cyber-attacks. One class of cyber-attack is advanced persistent threats, where highly skilled attackers can steal user authentication information and then move laterally in the network, from host to host in a hidden manner, until they reach an attractive target. Once the presence of the attacker has been detected in the network, appropriate actions should be taken quickly to prevent the attacker from going deeper. This paper presents a game-theoretic approach to optimizing the defense against an invader attempting to use a set of known vulnerabilities to reach critical nodes in the network. First, the network is modeled as a vulnerability multigraph, where the nodes represent physical hosts and the edges the vulnerabilities that the attacker can exploit to move laterally from one host to another. Second, a two-player zero-sum Markov game is built, where the states of the game represent the nodes of the vulnerability multigraph and the transitions correspond to the edge vulnerabilities that the attacker can exploit. The solution of the game gives the optimal strategy to disconnect vulnerable services and thus slow down the attack.
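A compact sketch of solving such a two-player zero-sum Markov game by Shapley-style value iteration, solving the stage matrix game at each state with a linear program; all data structures here are hypothetical placeholders for the vulnerability multigraph:

```python
import numpy as np
from scipy.optimize import linprog

def solve_matrix_game(M):
    """Value of a zero-sum matrix game (row player maximizes), via LP."""
    m, n = M.shape
    # variables: [p_1..p_m, v]; maximize v s.t. p^T M[:, j] >= v for all j,
    # p a probability distribution; linprog minimizes, so use -v.
    c = np.zeros(m + 1); c[-1] = -1.0
    A_ub = np.hstack([-M.T, np.ones((n, 1))])   # v - p^T M[:, j] <= 0
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1)); A_eq[0, :m] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)])
    return res.x[-1]

def shapley_value_iteration(R, P, gamma=0.9, iters=100):
    """Value iteration for a zero-sum Markov game (hypothetical sketch).

    R[s] is the stage payoff matrix at state s (attacker rows, defender
    columns); P[s][a][d] is the successor state. States correspond to the
    nodes of the vulnerability multigraph.
    """
    S = len(R)
    V = np.zeros(S)
    for _ in range(iters):
        for s in range(S):
            # Backed-up payoff matrix: stage payoff plus discounted value
            # of the successor state for each action pair.
            Q = np.array([[R[s][a][d] + gamma * V[P[s][a][d]]
                           for d in range(len(R[s][a]))]
                          for a in range(len(R[s]))])
            V[s] = solve_matrix_game(Q)
    return V
```

The equilibrium mixed strategy recovered at each state tells the defender which vulnerable services to disconnect, and with what probability, to slow the attacker's lateral movement.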
Detecting botnets and advanced persistent threats is a major challenge for network administrators. An important component of such malware is the command and control channel, which enables the malware to respond to controller commands. The detection of malware command and control channels could help prevent further malicious activity by the cyber criminals using the malware. Detection of malware in network traffic has traditionally been carried out by identifying specific patterns in packet payloads, but bot writers now encrypt the command and control payloads, making pattern recognition a less effective form of detection. This paper focuses instead on an effective anomaly-based detection technique for bots and advanced persistent threats, using a data mining approach combined with applied classification algorithms. After additional tuning, the final test on an unseen dataset achieved false positive rates of 0% and malware detection rates of 100% on the two examined malware threats, with promising results on a number of other threats.
We introduce a novel mathematical model that treats network security as a game between cyber attackers and network administrators. The model takes the form of a zero-sum repeated game where each sub-game corresponds to a possible state of the attacker. Our formulation views state as the set of compromised edges in a graph, as opposed to the more traditional node-based view. This provides a more expressive model, since it allows the defender to anticipate the direction of attack. Both players move independently and in continuous time, allowing for the possibility of one player moving several times before the other does. This model shows that defense-in-depth is not always a rational strategy for budget-constrained network administrators. Furthermore, a defender can dissuade a rational attacker from attempting to attack a network if the defense budget is sufficiently high. This means that a network administrator does not need to make their system completely free of vulnerabilities; they only need to ensure that the penalties for being caught outweigh the potential rewards.