Biblio
A recommender system suggests items that might be of interest to users in social networks. Collaborative filtering is an approach based on similarity that recommends items liked by other, similar users. A trust model adopts the users' trust network in place of similarity. A multi-faceted trust model considers multiple, heterogeneous trust relationships among users and recommends items based on the ratings that exist in the network of trustees of a specific facet. This paper applies a genetic algorithm to estimate the parameters of a multi-faceted trust model, in which the trust weights are calculated from the ratings and the trust network for each facet separately. The model was built on the Epinions data set, which includes consumers' opinions, item ratings, and the web-of-trust network. It was used to predict users' ratings for items in different facets, and the root mean squared error of prediction (RMSE) was used as the performance measure. Empirical evaluations demonstrate that multi-faceted models improve the performance of the recommender system.
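As a rough illustration of the general idea (not the paper's actual implementation or data), the sketch below uses a tiny genetic algorithm to search for per-facet trust weights that minimize rating-prediction RMSE on synthetic data; the toy trust networks, the trust-weighted averaging rule in `predict`, and all parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items, n_facets = 20, 15, 3

# Toy data: per-facet trust networks and a sparse rating matrix (0 = unrated).
trust = rng.random((n_facets, n_users, n_users)) < 0.2          # trust[f, u, v]: u trusts v in facet f
ratings = np.where(rng.random((n_users, n_items)) < 0.3,
                   rng.integers(1, 6, (n_users, n_items)), 0.0)
rated = ratings > 0

def predict(weights):
    """Predict ratings as a weighted combination, over facets, of trustees' ratings."""
    preds = np.zeros_like(ratings)
    for f in range(n_facets):
        num = trust[f].astype(float) @ (ratings * rated)         # sum of trustees' ratings
        den = trust[f].astype(float) @ rated.astype(float)       # number of trustees who rated
        facet_pred = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        preds += weights[f] * facet_pred
    return preds

def rmse(weights):
    err = (predict(weights) - ratings)[rated]
    return np.sqrt(np.mean(err ** 2))

# Minimal GA: real-valued chromosomes = facet weights, fitness = -RMSE.
pop = rng.random((30, n_facets))
for gen in range(50):
    fitness = np.array([-rmse(w / w.sum()) for w in pop])
    parents = pop[np.argsort(fitness)[-10:]]                     # selection: keep the 10 best
    children = []
    while len(children) < 20:
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(n_facets) < 0.5, a, b)       # uniform crossover
        child += rng.normal(0, 0.1, n_facets)                    # Gaussian mutation
        children.append(np.clip(child, 1e-6, None))
    pop = np.vstack([parents, children])

best = pop[np.argmin([rmse(w / w.sum()) for w in pop])]
print("best facet weights:", best / best.sum(), "RMSE:", rmse(best / best.sum()))
```

The chromosome here is simply the vector of facet weights; the paper's model may encode additional parameters per facet.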
Trust Management (TM) systems for authentication are vital to the security of online interactions, which are ubiquitous in our everyday lives. Various systems, like the Web PKI (X.509) and PGP's Web of Trust, are used to manage trust in this setting. In recent years, blockchain technology has been introduced as a panacea to our security problems, including that of authentication, without sufficient reasoning as to its merits. In this work, we investigate the merits of using open distributed ledgers (ODLs), such as the one implemented by blockchain technology, for securing TM systems for authentication. We formally model such systems, and explore how blockchain can help mitigate attacks against them. After formal argumentation, we conclude that in the context of Trust Management for authentication, blockchain technology, and ODLs in general, can offer considerable advantages compared to previous approaches. Our analysis is, to the best of our knowledge, the first to formally model and argue about the security of TM systems for authentication based on blockchain technology. To achieve this result, we first provide an abstract model for TM systems for authentication. Then, we show how this model can be conceptually encoded in a blockchain by expressing it as a series of state transitions. As a next step, we examine five prevalent attacks on TM systems, and provide evidence that blockchain-based solutions can be beneficial to the security of such systems by mitigating, or completely negating, such attacks.
The rapid development of cloud computing has resulted in the emergence of numerous web services on the Internet. Selecting a suitable cloud service is becoming a major problem for users, especially non-professionals. Quality of Service (QoS) is considered the criterion for judging web services. Several Collaborative Filtering (CF)-based QoS prediction methods have been proposed in recent years. QoS values observed by different users may vary largely due to network conditions and geographical location. Moreover, QoS data provided by untrusted users will definitely affect the prediction accuracy. However, most existing methods seldom take both facts into consideration. In this paper, we present a trust-aware and location-based approach for web service QoS prediction. A trust value for each user is evaluated before the similarity calculation, and location is taken into account when selecting similar neighbors. A series of experiments is performed on a real-world QoS dataset including 339 service users and 5,825 services. The experimental analysis shows that the accuracy of our method is much higher than that of other CF-based methods.
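A minimal sketch of the flavor of such a predictor, under assumed heuristics: trust is derived from how far a user's reported QoS deviates from per-service medians, and only same-region neighbors contribute to the trust-weighted prediction. The data, the trust heuristic, and the neighbor-selection rule are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_services = 8, 12

# Toy QoS matrix (e.g., response times); 0 marks a missing observation.
qos = rng.gamma(2.0, 0.5, (n_users, n_services)) * (rng.random((n_users, n_services)) < 0.7)
region = rng.integers(0, 3, n_users)            # coarse location label per user

# Assumed trust heuristic: users whose observations deviate wildly from the
# per-service median are trusted less (untrusted users skew predictions).
observed = np.where(qos > 0, qos, np.nan)
service_median = np.nanmedian(observed, axis=0)
deviation = np.nanmean(np.abs(observed - service_median), axis=1)
trust = 1.0 / (1.0 + np.nan_to_num(deviation))

def pearson(u, v):
    both = (qos[u] > 0) & (qos[v] > 0)
    if both.sum() < 2:
        return 0.0
    a, b = qos[u, both], qos[v, both]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def predict(u, s, k=3):
    """Trust-weighted CF prediction using same-region neighbors only."""
    cands = [v for v in range(n_users)
             if v != u and qos[v, s] > 0 and region[v] == region[u]]
    scored = sorted(cands, key=lambda v: trust[v] * pearson(u, v), reverse=True)[:k]
    if not scored:
        return float(service_median[s])
    w = np.array([max(trust[v] * pearson(u, v), 1e-6) for v in scored])
    return float(np.dot(w, qos[scored, s]) / w.sum())

print("predicted QoS for user 0, service 1:", predict(0, 1))
```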
Caching query results is an efficient technique for Web search engines. A state-of-the-art approach named Static-Dynamic Cache (SDC) is widely used in practice. The replacement policy is the key factor in the performance of a cache system and has been widely studied across different research areas, with policies such as LIRS, ARC, CLOCK, SKLRU, and RANDOM. In this paper, we discuss replacement policies for the static-dynamic cache and conduct experiments on real large-scale query logs from two well-known commercial Web search engine companies. The experimental results show that the ARC replacement policy works well with the static-dynamic cache, especially for large-scale query result caches.
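For readers unfamiliar with SDC, the sketch below shows the structure being discussed: a static segment frozen from training-log frequencies plus a dynamic segment whose replacement policy is pluggable (LRU here for brevity; the paper's point is that a policy such as ARC can be slotted into the same position). The query stream and segment sizes are made up.

```python
import random
from collections import Counter, OrderedDict

class StaticDynamicCache:
    """Minimal SDC sketch: a static segment frozen from training-log frequencies
    plus a dynamic segment managed here by LRU (the pluggable replacement policy)."""

    def __init__(self, static_size, dynamic_size, training_queries):
        top = Counter(training_queries).most_common(static_size)
        self.static = {q for q, _ in top}          # read-only, never evicted
        self.dynamic = OrderedDict()               # LRU over the remaining capacity
        self.dynamic_size = dynamic_size
        self.hits = self.misses = 0

    def lookup(self, query):
        if query in self.static:
            self.hits += 1
            return True
        if query in self.dynamic:
            self.dynamic.move_to_end(query)        # refresh recency
            self.hits += 1
            return True
        self.misses += 1
        self.dynamic[query] = True                 # admit the result on a miss
        if len(self.dynamic) > self.dynamic_size:
            self.dynamic.popitem(last=False)       # evict the least recently used entry
        return False

# Tiny usage example with a synthetic query stream.
random.seed(1)
train = [f"q{random.randint(1, 50)}" for _ in range(5000)]
test = [f"q{random.randint(1, 200)}" for _ in range(5000)]
cache = StaticDynamicCache(static_size=20, dynamic_size=30, training_queries=train)
for q in test:
    cache.lookup(q)
print("hit rate:", cache.hits / (cache.hits + cache.misses))
```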
Wikipedia is one of the most popular information platforms on the Internet. The user access pattern to Wikipedia pages depends on their relevance in the current worldwide social discourse. We use publicly available statistics about the top-1000 most popular pages on each day to estimate the efficiency of caches supporting the platform. While the data volumes are moderate, the main goal of Wikipedia caches is to reduce access times for page views and edits. We study the impact of the most popular pages on the achievable cache hit rate in comparison to Zipf request distributions, and we include daily dynamics in popularity.
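A small simulation sketch of the kind of comparison described: Zipf-distributed page requests fed to an LRU cache to observe how the popularity skew drives the achievable hit rate. The exponents, catalog size, and cache size are arbitrary assumptions, not Wikipedia's actual statistics.

```python
import numpy as np
from collections import OrderedDict

rng = np.random.default_rng(7)

def zipf_requests(n_pages, s, n_requests):
    """Bounded Zipf request stream: P(page of rank i) proportional to 1 / i^s."""
    ranks = np.arange(1, n_pages + 1)
    p = 1.0 / ranks ** s
    p /= p.sum()
    return rng.choice(n_pages, size=n_requests, p=p)

def lru_hit_rate(requests, cache_size):
    cache, hits = OrderedDict(), 0
    for page in requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        else:
            cache[page] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)
    return hits / len(requests)

# Compare hit rates for different Zipf exponents at a fixed cache size.
for s in (0.6, 0.8, 1.0):
    reqs = zipf_requests(n_pages=100_000, s=s, n_requests=200_000)
    print(f"s={s}: LRU hit rate = {lru_hit_rate(reqs, cache_size=1_000):.3f}")
```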
Knowledge Graphs (KGs) are becoming core components of most artificial intelligence applications. Linked Data, as a method of publishing KGs, allows applications to traverse within, and even out of, the graph thanks to globally dereferenceable identifiers denoting entities, in the form of IRIs. However, as we show in this work after analyzing several popular datasets (namely DBpedia, LOD Cache, and Web Data Commons JSON-LD data), many entities are represented using literal strings where IRIs should be used, diminishing the advantages of using Linked Data. To remedy this, we propose an approach for identifying such strings and replacing them with their corresponding entity IRIs. The proposed approach identifies relations between entities using both ontological axioms and data profiling information, and converts strings to entity IRIs based on the types of entities linked by each relation. Our approach showed 98% recall and 76% precision in identifying such strings and 97% precision in converting them to their corresponding IRI in the considered KG. Further, we analyzed how the connectivity of the KG increases when new relevant links are added to the entities as a result of our method. Our experiments on a subset of the Spanish DBpedia data show that it could add 25% more links to the KG and improve the overall connectivity by 17%.
A disk-based search system distributes a large index across multiple disks on one or more machines, where documents are typically assigned to disks at random in order to achieve load balancing. However, random distribution degrades clustering, which is required for efficient index compression. Using the GOV2 dataset, we demonstrate the effect of various ordering techniques on index compression, and then quantify the effect of various document distribution approaches on compression and load balancing. We explore runtime performance by simulating a disk-based search system for a scaled-out 10xGOV2 index over ten disks using two standard approaches, document and term distribution, as well as a hybrid approach: small-term distribution. We find that small-term distribution has the best performance, especially in the presence of list caching, and argue that this rarely discussed distribution approach can improve disk-based search performance for many real-world installations.
Modern CDNs cache and deliver a highly-diverse set of traffic classes, including web pages, images, videos and software downloads. It is economically advantageous for a CDN to cache and deliver all traffic classes using a shared distributed cache server infrastructure. However, such sharing of cache resources across multiple traffic classes poses significant cache provisioning challenges that are the focus of this paper. Managing a vast shared caching infrastructure requires careful modeling of user request sequences for each traffic class. Using extensive traces from Akamai's CDN, we show how each traffic class has drastically different object access patterns, object size distributions, and cache resource requirements. We introduce the notion of a footprint descriptor that is a succinct representation of the cache requirements of a request sequence. Leveraging novel connections to Fourier analysis, we develop a footprint descriptor calculus that allows us to predict the cache requirements when different traffic classes are added, subtracted and scaled to within a prediction error of 2.5%. We integrated our footprint calculus in the cache provisioning operations of the production CDN and show how it is used to solve key challenges in cache sizing, traffic mixing, and cache partitioning.
The Internet is becoming more and more content-oriented. CDNs (Content Distribution Networks) have become a popular architecture compatible with the current Internet, and revolutionary new paradigms such as ICN (Information Centric Networking) have been studied. One of the main components in both CDN and ICN is in-network caching. Despite the surge in the use of caches in current and future Internet architectures, analysis of the performance of general cache networks is still quite limited due to the complex interplay among various components and the resulting analytical intractability. For mathematical tractability, we consider 'static' cache policies and study the asymptotic delay performance of those policies in cache networks, focusing in particular on the impact of heterogeneous content popularities and nodes' geographical 'importance' on caching policies. Furthermore, our simulation results suggest that these static policies perform quite similarly to popular 'dynamic' policies such as LFU (Least-Frequently-Used) and LRU (Least-Recently-Used). We believe that our theoretical findings provide useful engineering implications, such as when and how various factors affect caching performance.
Many Linked Open Data applications require fresh copies of RDF data at their local repositories. Since RDF documents constantly change and those changes are not automatically propagated to the LOD applications, it is important to regularly visit the RDF documents to refresh the local copies and keep them up-to-date. For this purpose, crawling strategies determine which RDF documents should be preferentially fetched. Traditional crawling strategies rely only on how an RDF document has been modified in the past. In contrast, we predict on the triple level whether a change will occur in the future. We use the weekly snapshots of the DyLDO dataset as well as the monthly snapshots of the Wikidata dataset. First, we conduct an in-depth analysis of the life span of triples in RDF documents. Through the analysis, we identify which triples are stable and which are ephemeral. We introduce different features based on the triples and apply a simple but effective linear regression model. Second, we propose a novel crawling strategy based on the linear regression model. We conduct two experimental setups where we vary the amount of available bandwidth as well as iteratively observe the quality of the local copies over time. The results demonstrate that the novel crawling strategy outperforms the state of the art in both setups.
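The sketch below illustrates the two ingredients named in the abstract, with assumed features and synthetic data rather than the paper's DyLDO/Wikidata feature set: a linear regression over per-triple features predicting change, and a bandwidth-limited crawling round that refreshes the documents whose triples have the highest aggregate predicted change.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
n_triples = 5_000

# Illustrative per-triple features built from past snapshots (assumed, not the
# paper's exact feature set): how often the triple changed, how long it has
# lived, and the change rate of its source document.
times_changed = rng.poisson(1.0, n_triples)
life_span_weeks = rng.integers(1, 52, n_triples)
doc_change_rate = rng.random(n_triples)
X = np.column_stack([times_changed, life_span_weeks, doc_change_rate])

# Synthetic target: fraction of future snapshots in which the triple changes.
y = np.clip(0.05 * times_changed + 0.5 * doc_change_rate
            - 0.002 * life_span_weeks + rng.normal(0, 0.05, n_triples), 0, 1)

model = LinearRegression().fit(X, y)

# Crawling-strategy sketch: under a bandwidth budget, fetch the documents that
# aggregate the highest predicted change scores over their triples.
doc_of_triple = rng.integers(0, 200, n_triples)       # triple -> document id
pred = model.predict(X)
doc_scores = np.zeros(200)
np.add.at(doc_scores, doc_of_triple, pred)
budget = 20                                           # documents we can refresh this round
to_crawl = np.argsort(doc_scores)[::-1][:budget]
print("documents scheduled for refresh:", to_crawl[:10], "...")
```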
Map-based services are becoming increasingly important in many applications. These services often need to show geospatial objects (e.g., cities and parks) in Web browsers, and being able to retrieve such objects efficiently is critical to achieving a low response time for user queries. In this demonstration we present a browser-based caching technique to store and load geospatial objects on a map in a Web page. The technique employs a hierarchical structure to store and index polygons, and performs intelligent prefetching and cache replacement by utilizing information about the user's recent browser activities. We demonstrate the usage of the technique in an application called TwitterMap for visualizing more than 1 billion tweets in real time. We show its effectiveness by using different replacement policies. The technique is implemented as a general-purpose JavaScript library, making it suitable for other applications as well.
Most popular Web applications rely on persistent databases based on languages like SQL for declarative specification of data models and the operations that read and modify them. As applications scale up in user base, they often face challenges responding quickly enough to the high volume of requests. A common aid is caching of database results in the application's memory space, taking advantage of program-specific knowledge of which caching schemes are sound and useful, embodied in handwritten modifications that make the program less maintainable. These modifications also require nontrivial reasoning about the read-write dependencies across operations. In this paper, we present a compiler optimization that automatically adds sound SQL caching to Web applications coded in the Ur/Web domain-specific functional language, with no modifications required to source code. We use a custom cache implementation that supports concurrent operations without compromising the transactional semantics of the database abstraction. Through experiments with microbenchmarks and production Ur/Web applications, we show that our optimization in many cases enables an easy doubling or more of an application's throughput, requiring nothing more than passing an extra command-line flag to the compiler.
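As a hand-rolled Python analogue of the cache-plus-invalidation idea (the paper's contribution is doing this automatically at compile time for Ur/Web, with transactional semantics preserved), the sketch below caches SQL read results and invalidates them when a write touches one of the tables a cached read depends on. The table annotations are supplied manually here, which is exactly the maintenance burden the compiler optimization removes.

```python
import sqlite3
from collections import defaultdict

class CachingDB:
    """Toy read cache with write invalidation, keyed by the tables a query touches."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}                     # (sql, params) -> rows
        self.by_table = defaultdict(set)    # table name -> cache keys to invalidate

    def query(self, sql, params=(), tables=()):
        key = (sql, params)
        if key not in self.cache:
            self.cache[key] = self.conn.execute(sql, params).fetchall()
            for t in tables:
                self.by_table[t].add(key)
        return self.cache[key]

    def execute(self, sql, params=(), tables=()):
        self.conn.execute(sql, params)
        self.conn.commit()
        for t in tables:                    # drop every cached read that may now be stale
            for key in self.by_table.pop(t, set()):
                self.cache.pop(key, None)

# Usage example.
db = CachingDB(sqlite3.connect(":memory:"))
db.execute("CREATE TABLE users (id INTEGER, name TEXT)", tables=("users",))
db.execute("INSERT INTO users VALUES (1, 'ada')", tables=("users",))
print(db.query("SELECT name FROM users", tables=("users",)))   # miss, hits the DB
print(db.query("SELECT name FROM users", tables=("users",)))   # served from cache
db.execute("INSERT INTO users VALUES (2, 'alan')", tables=("users",))
print(db.query("SELECT name FROM users", tables=("users",)))   # cache was invalidated
```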
End-users in emerging markets experience poor web performance due to a combination of three factors: high server response time, limited edge bandwidth and the complexity of web pages. The absence of cloud infrastructure in developing regions and the limited bandwidth experienced by edge nodes constrain the effectiveness of conventional caching solutions for these contexts. This paper describes the design, implementation and deployment of xCache, a cloud-managed Internet caching architecture that aims to proactively profile popular web pages and maintain the liveness of popular content at software defined edge caches to enhance the cache hit rate with minimal bandwidth overhead. xCache uses a Cloud Controller that continuously analyzes active cloud-managed web pages and derives an object-group representation of web pages based on the objects of a page. Using this object-group representation, xCache computes a bandwidth-aware utility measure to derive the most valuable configuration for each edge cache. Our preliminary real-world deployment across university campuses in three developing regions demonstrates its potential compared to conventional caching by improving cache hit rates by about 15%. Our evaluations of xCache have also shown that it can be applied in conjunction with other web optimization solutions like Shandian, and can improve page load times by more than 50%.
Establishing secret and reliable wireless communication is a challenging task that is of paramount importance. In this paper, we investigate the physical layer security of the legitimate transmission link of a user that assists an Intrusion Detection System (IDS) in detecting eavesdropping and jamming attacks, in the presence of an adversary that is capable of conducting an eavesdropping or a jamming attack. The user faces the challenge of whether to transmit, thus becoming vulnerable to an eavesdropping or jamming attack, or to keep silent, in which case his/her transmission is delayed. The adversary likewise faces the challenge of whether to conduct an eavesdropping or a jamming attack without being detected. We model the interactions between the user and the adversary as a two-state stochastic game. Explicit solutions characterize some properties while highlighting some interesting strategies adopted by the user and the adversary. Results show that our proposed system outperforms current systems in terms of communication secrecy.
High-accuracy localization is a prerequisite for many wireless applications. To obtain accurate location information, it is often required to share users' positional knowledge, and this brings the risk of leaking location information to adversaries during the localization process. This paper develops a theory and algorithms for protecting location secrecy. In particular, we first introduce a location secrecy metric (LSM) for a general measurement model of an eavesdropper. Compared to previous work, the measurement model accounts for parameters such as channel conditions and time offsets in addition to the positions of users. We determine the expression of the LSM for typical scenarios and show how the LSM depends on the capability of an eavesdropper and the quality of the eavesdropper's measurements. Based on the insights gained from the analysis, we consider a case study in a wireless localization network and develop an algorithm that diminishes the eavesdropper's capabilities by exploiting the reciprocity of channels. Numerical results show that the proposed algorithm can effectively increase the LSM and protect location secrecy.
In this paper, machine learning attacks are performed on a novel hybrid delay-based Arbiter Ring Oscillator PUF (AROPUF). The AROPUF exhibits improved results when compared to the traditional Arbiter Physical Unclonable Function (APUF). The challenge-response pairs (CRPs) from both PUFs are fed to a multilayer perceptron (MLP) model with one hidden layer. The results show that the CRPs generated from the proposed AROPUF yield higher training and prediction errors than those of the APUF, thus making it more difficult for the adversary to predict the CRPs.
In this paper, we present an algorithm for estimating the state of the power grid following a cyber-physical attack. We assume that an adversary attacks an area by: (i) disconnecting some lines within that area (failed lines), and (ii) obstructing the information from within the area from reaching the control center. Given the phase angles of the buses outside the attacked area under the AC power flow model (before and after the attack), the algorithm estimates the phase angles of the buses and detects the failed lines inside the attacked area. The novelty of our approach is the transformation of the line failure detection problem, which is combinatorial in nature, into a convex optimization problem. As a result, our algorithm can detect any number of line failures in a running time that is independent of the number of failures and depends solely on the size of the network. To the best of our knowledge, this is the first convex relaxation for the problem of line failure detection using phase angle measurements under the AC power flow model. We evaluate the performance of our algorithm on the IEEE 118- and 300-bus systems, and show that it estimates the phase angles of the buses with less than 1% error, and can detect the line failures with 80% accuracy for single, double, and triple line failures.
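The following is a generic sparse-recovery sketch of the convex-relaxation idea (an l1 penalty replacing the combinatorial search for failed lines), using the third-party cvxpy package on a toy linear model; it is not the paper's AC power flow formulation, and the matrix, noise level, and detection threshold are assumptions.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n_lines, n_meas = 60, 30

# Toy linearized model (an assumption for illustration, not the paper's AC
# formulation): observed boundary phase-angle changes y depend linearly on a
# sparse failure-indicator vector x over the lines inside the attacked area.
A = rng.standard_normal((n_meas, n_lines))
x_true = np.zeros(n_lines)
x_true[[5, 17, 42]] = rng.standard_normal(3)          # three failed lines
y = A @ x_true + 0.01 * rng.standard_normal(n_meas)

# Convex relaxation: replace the combinatorial search for failed lines with an
# l1-regularized least-squares problem, whose runtime does not depend on how
# many lines actually failed.
x = cp.Variable(n_lines)
lam = 0.1
cp.Problem(cp.Minimize(cp.sum_squares(A @ x - y) + lam * cp.norm1(x))).solve()

detected = np.flatnonzero(np.abs(x.value) > 0.05)
print("true failed lines:", np.flatnonzero(x_true))
print("detected lines:   ", detected)
```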
The false data injection attack (FDIA) is a form of cyber-attack capable of affecting the secure and economic operation of the smart grid. Using DC model-based state estimation, this paper analyzes ways of constructing a successful attack vector to fulfill specific targets, i.e., a pre-specified state-variable target and a pre-specified meter target, according to the adversary's intent. The grid operator's historical readings of the meters are considered as a constraint the adversary must respect to avoid being detected. Also from the viewpoint of the adversary, we propose to take full advantage of the dual concept of the coefficients in the topology matrix to handle the problem that the adversary has no access to some meters. The effectiveness of the proposed method is validated by numerical experiments on the IEEE-14 benchmark system.
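The textbook DC-model FDIA construction that the abstract builds on can be sketched in a few lines: injecting a = Hc leaves the least-squares residual unchanged while shifting the estimated state by c. The dimensions and the targeted state index below are arbitrary, and the paper's meter-target and historical-reading constraints are not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_meters = 5, 12

# Toy DC state-estimation setup: measurements z = H x + noise.
H = rng.standard_normal((n_meters, n_states))
x_true = rng.standard_normal(n_states)
z = H @ x_true + 0.01 * rng.standard_normal(n_meters)

def wls_estimate(z):
    return np.linalg.lstsq(H, z, rcond=None)[0]

def residual_norm(z):
    return np.linalg.norm(z - H @ wls_estimate(z))

# Classic FDIA construction: to shift a chosen state variable by a target amount
# without changing the residual, inject a = H @ c with c encoding the shift.
c = np.zeros(n_states)
c[2] = 0.5                     # pre-specified state-variable target (assumed index)
a = H @ c
z_attacked = z + a

print("residual before attack:", residual_norm(z))
print("residual after attack: ", residual_norm(z_attacked))      # unchanged => undetected
print("induced state shift:   ", wls_estimate(z_attacked) - wls_estimate(z))
```

The residual is unchanged because H @ c lies in the column space of H, so residue-based bad data detectors see nothing new.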
Machine learning and data mining algorithms typically assume that the training and testing data are sampled from the same fixed probability distribution; however, this assumption is often violated in practice. The field of domain adaptation addresses the situation where this assumption of a fixed probability distribution across the two domains is violated; however, the difference between the two domains (training/source and testing/target) may not be known a priori. There has been a recent thrust in addressing the problem of learning in the presence of an adversary, which we formulate as a problem of domain adaptation to build a more robust classifier. This is because the overall security of classifiers and their preprocessing stages has been called into question with the recent findings of adversaries in a learning setting. Adversarial training (and testing) data pose a serious threat in scenarios where an attacker has the opportunity to ``poison'' the training or ``evade'' on the testing data set(s) in order to achieve something that is not in the best interest of the classifier. Recent work has begun to show the impact of adversarial data on several classifiers; however, the impact of the adversary on aspects related to the preprocessing of data (i.e., dimensionality reduction or feature selection) has largely been ignored in the resurgence of adversarial learning research. Furthermore, variable selection, which is a vital component of any data analysis, has been shown to be particularly susceptible to an attacker that has knowledge of the task. In this work, we explore avenues for learning resilient classification models in the adversarial learning setting by considering the effects of adversarial data and how to mitigate those effects through optimization. Our model forms a single convex optimization problem that uses the labeled training data from the source domain and the known weaknesses of the model for an adversarial component. We benchmark the proposed approach on synthetic data and show the trade-off between classification accuracy and skew-insensitive statistics.
Deregulated electricity markets rely on a two-settlement system consisting of day-ahead and real-time markets, across which electricity price is volatile. In such markets, locational marginal pricing is widely adopted to set electricity prices and manage transmission congestion. Locational marginal prices are vulnerable to measurement errors. Existing studies show that if the adversaries are omniscient, they can design profitable attack strategies without being detected by the residue-based bad data detectors. This paper focuses on a more realistic setting, in which the attackers have only partial and imperfect information due to their limited resources and restricted physical access to the grid. Specifically, the attackers are assumed to have uncertainties about the state of the grid, and the uncertainties are modeled stochastically. Based on this model, this paper offers a framework for characterizing the optimal stochastic guarantees for the effectiveness of the attacks and the associated pricing impacts.
Cooperative spectrum sensing is often necessary in cognitive radio systems to localize a transmitter by fusing the measurements from multiple sensing radios. However, revealing spectrum sensing information also generally leaks information about the location of the radio that made those measurements. We propose a protocol for performing cooperative spectrum sensing while preserving the privacy of the sensing radios. In this protocol, radios fuse sensing information through a distributed particle filter based on a tree structure. All sensing information is encrypted using public-key cryptography, and one of the radios serves as an anonymizer, whose role is to break the connection between the sensing radios and the public keys they use. We consider a semi-honest (honest-but-curious) adversary model in which there is at most a single adversary that is internal to the sensing network and complies with the specified protocol but wishes to determine information about the other participants. Under this scenario, an adversary may learn the sensing information of some of the radios, but it does not have any way to tie that information to a particular radio's identity. We test the performance of our proposed distributed, tree-based particle filter using physical measurements of FM broadcast stations.
This paper offers a new approach to modelling the effect of cyber-attacks on reliability of software used in industrial control applications. The model is based on the view that successful cyber-attacks introduce failure regions, which are not present in non-compromised software. The model is then extended to cover a fault tolerant architecture, such as the 1-out-of-2 software, popular for building industrial protection systems. The model is used to study the effectiveness of software maintenance policies such as patching and "cleansing" ("proactive recovery") under different adversary models ranging from independent attacks to sophisticated synchronized attacks on the channels. We demonstrate that the effect of attacks on reliability of diverse software significantly depends on the adversary model. Under synchronized attacks system reliability may be more than an order of magnitude worse than under independent attacks on the channels. These findings, although not surprising, highlight the importance of using an adequate adversary model in the assessment of how effective various cyber-security controls are.
Distributed Denial of Service (DDoS) attacks are some of the most persistent threats on the Internet today. The evolution of DDoS attacks calls for an in-depth analysis of those attacks. A better understanding of the attackers' behavior can provide insights to unveil patterns and strategies utilized by attackers. The prior art on attacker behavior analysis often falls short in two respects: it assumes that adversaries are static, and it makes certain simplifying assumptions about their behavior which often are not supported by real attack data. In this paper, we take a data-driven approach to designing and validating three DDoS attack models from temporal (e.g., attack magnitudes), spatial (e.g., attacker origin), and spatiotemporal (e.g., attack inter-launching time) perspectives. We design these models based on the analysis of traces consisting of more than 50,000 verified DDoS attacks from industrial mitigation operations. Each model is also validated by testing its effectiveness in accurately predicting future DDoS attacks. Comparisons against simple intuitive models further show that our models can more accurately capture the essential features of DDoS attacks.
APT (Advanced Persistent Threat) attacks continue to cause hacking and social issues, and a number of antivirus businesses and researchers are making efforts to analyze such attacks. To prevent or cope with APT attacks, host-PC security technologies such as firewalls and intrusion detection systems are used. In this study, malicious behavior patterns were extracted by using the APIs of PE files. Moreover, we used the FP-Growth algorithm to extract behavior information generated on the host PC in order to overcome the limitations of previous signature-based intrusion detection systems. We will utilize this study as fundamental research toward a system that extracts malicious behavior patterns within networks and APIs in the future.
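A minimal sketch of frequent-pattern mining over API-call transactions with FP-Growth, using the third-party mlxtend package; the API names are common process-injection examples chosen for illustration, not the patterns extracted in the study.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Hypothetical per-process API-call transactions extracted from PE/host traces.
transactions = [
    ["CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx", "OpenProcess"],
    ["CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"],
    ["RegSetValueEx", "CreateFile", "WriteFile"],
    ["CreateRemoteThread", "WriteProcessMemory", "OpenProcess"],
    ["CreateFile", "WriteFile", "InternetOpenUrl"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit_transform(transactions), columns=encoder.columns_)

# Frequent API co-occurrence patterns: candidate "malicious behavior patterns".
patterns = fpgrowth(onehot, min_support=0.4, use_colnames=True)
print(patterns.sort_values("support", ascending=False))
```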
This paper proposes a method to detect two primary means of using the Domain Name System (DNS) for malicious purposes. We develop machine learning models to detect information exfiltration from compromised machines and the establishment of command & control (C&C) servers via tunneling. We validate our approach by experiments in which we successfully detect malware used in several recent Advanced Persistent Threat (APT) attacks [1]. The novelty of our method is its robustness, simplicity, scalability, and ease of deployment in a production environment.
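A toy sketch of the feature-based detection idea, under assumed features (query length, subdomain entropy, label depth, digit ratio) and a hand-made two-class sample; the paper's actual features, model, and data are not reproduced here.

```python
import math
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def entropy(s):
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def features(qname):
    """Illustrative per-query features: length, subdomain entropy, label depth,
    and the fraction of numeric characters (typical tunneling indicators)."""
    labels = qname.rstrip(".").split(".")
    sub = ".".join(labels[:-2]) or "x"
    digits = sum(ch.isdigit() for ch in qname)
    return [len(qname), entropy(sub), len(labels), digits / len(qname)]

# Tiny hand-made sample: benign lookups vs. encoded-looking tunneling queries.
benign = ["www.example.com", "mail.google.com", "cdn.jsdelivr.net"]
tunnel = ["a3f9c2d47b1e.ex.tunnel.example.org",
          "9f2d8a7c3b5e4f6a1d0c.data.exfil.example.net"]
X = np.array([features(q) for q in benign + tunnel])
y = np.array([0] * len(benign) + [1] * len(tunnel))

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Expected to be flagged as tunneling-like (class 1) given the toy training set.
print(clf.predict([features("b7e1a94d0c2f.ex.tunnel.example.org")]))
```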