Biblio
Mobile Ad hoc NETworks (MANETs) are collections of mobile nodes that can communicate with each other over the wireless medium without any fixed infrastructure. In a MANET, any node can join or leave the network at any time, which makes MANETs vulnerable to malicious attackers. Hence, it is necessary to develop an efficient intrusion-detection system to safeguard the MANET from attacks. In this paper, an Enhanced Adaptive Acknowledgement with Digital Signature Algorithm (EAACK-DSA) is proposed, which can detect and isolate malicious nodes. The algorithm is based on acknowledgement packets, and all acknowledgement packets are digitally signed before transmission. The proposed algorithm can be integrated with any source routing protocol, and EAACK-DSA achieves better malicious-behavior detection than conventional approaches.
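A minimal sketch of the acknowledgement-signing step using Python's cryptography package and DSA, as the abstract suggests; the packet format and key distribution shown here are hypothetical, and the paper's actual protocol modes are not modeled.

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import dsa

# each node would hold its own key pair; one is generated here for illustration
private_key = dsa.generate_private_key(key_size=2048)
public_key = private_key.public_key()

ack = b"ACK|route=n1->n2->n3|seq=42"          # hypothetical acknowledgement payload
signature = private_key.sign(ack, hashes.SHA256())

# the receiving node verifies before trusting the acknowledgement;
# verify() raises InvalidSignature if the packet was forged or altered
public_key.verify(signature, ack, hashes.SHA256())
```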
In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via word embeddings automatically mined over a background tweet corpus. Through experimentation on a large data stream comprising 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.
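A toy sketch of the expansion idea, assuming pre-mined word vectors (the values below are hypothetical; a real system would use embeddings trained on the background tweet corpus and a tuned similarity threshold):

```python
import numpy as np

# toy word vectors standing in for embeddings mined from a background corpus
emb = {w: np.asarray(v, dtype=float) for w, v in {
    "quake": [0.9, 0.1], "earthquake": [0.88, 0.15],
    "tremor": [0.8, 0.2], "pizza": [0.1, 0.9]}.items()}

def expand_tweet(tokens, emb, threshold=0.95):
    """Append paraphrase candidates whose cosine similarity clears the threshold."""
    expanded = list(tokens)
    for tok in tokens:
        v = emb.get(tok)
        if v is None:
            continue
        for word, u in emb.items():
            cos = v @ u / (np.linalg.norm(v) * np.linalg.norm(u))
            if word != tok and cos >= threshold:
                expanded.append(word)
    return expanded

print(expand_tweet(["quake", "downtown"], emb))  # quake pulls in earthquake, tremor
```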
We present RENE, a novel encoding scheme for short ranges on ternary content-addressable memory (TCAM), which, unlike previous solutions, does not impose row expansion and uses a number of bits proportional to the maximal range length. We provide a theoretical analysis showing that our encoding comes closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique can be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that, using TCAM, one can solve such problems at much higher rates than previously suggested solutions and outperform known lower bounds in traditional memory models. Our experiments show that the translation process of RENE on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates up to four orders of magnitude higher than the best prior solutions.
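For context, a sketch of the classic range-to-prefix expansion that motivates RENE: a single range over w-bit fields can blow up into as many as 2w-2 ternary rows, which is exactly the row expansion RENE avoids. (RENE's actual encoding is not reproduced here.)

```python
def range_to_prefixes(lo, hi, width):
    """Cover the integer range [lo, hi] with ternary prefixes (classic expansion)."""
    prefixes = []
    while lo <= hi:
        k = 0                        # grow the aligned block while it still fits
        while k < width and lo % (1 << (k + 1)) == 0 and lo + (1 << (k + 1)) - 1 <= hi:
            k += 1
        if k == width:
            prefixes.append("*" * width)
        else:
            prefixes.append(format(lo >> k, "b").zfill(width - k) + "*" * k)
        lo += 1 << k
    return prefixes

print(range_to_prefixes(1, 14, 4))   # ['0001', '001*', '01**', '10**', '110*', '1110']
```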
Providing recommendations on social systems has been in the spotlight of both academia and industry for some time. Social network giants such as Facebook, LinkedIn, and Myspace are eager to find the silver bullet of recommendation. These applications let users form social networks through their day-to-day social interactions, and today's online experience depends increasingly on these social connections. One of the main concerns for a social network is establishing a successful business plan that makes the network profitable. Running a business on any platform requires a sound business plan with key components, such as advertising the products or services of other companies, which serves as a form of marketing for those external businesses. In this study, we present a comprehensive framework for an advertisement recommender system for social networks. The framework uses semantic logic to produce its recommendations, a capability that differentiates its recommender component from classical recommender methods. In brief, the framework proposed in this study is designed to generate advertisement recommendations in a simple and effective way for social network users.
With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming platforms such as MapReduce and its variants are popular frameworks for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. In particular, we describe algorithms for kd-trees, range trees, and BBD-trees that require only O(1) rounds of communication for both preprocessing and querying, while staying competitive with their classical counterparts in terms of running time and workload. Our algorithms are randomized, but they can be made deterministic at some increase in their running time and workload while keeping the number of communication rounds constant.
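A minimal single-machine sketch of the classical kd-tree the paper parallelizes; the O(1)-round MPC construction and querying are the paper's contribution and are not modeled here.

```python
def build_kd(points, depth=0):
    """Classical 2-d kd-tree: split on alternating axes at the median."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build_kd(points[:mid], depth + 1),
            "right": build_kd(points[mid + 1:], depth + 1)}

def range_query(node, lo, hi, out):
    """Collect all points p with lo[i] <= p[i] <= hi[i] on both axes."""
    if node is None:
        return
    p, axis = node["point"], node["axis"]
    if all(lo[i] <= p[i] <= hi[i] for i in range(2)):
        out.append(p)
    if lo[axis] <= p[axis]:            # the query box may extend left of the split
        range_query(node["left"], lo, hi, out)
    if p[axis] <= hi[axis]:            # ...and/or right of it
        range_query(node["right"], lo, hi, out)

tree = build_kd([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])
hits = []
range_query(tree, (3, 1), (8, 5), hits)
print(hits)                            # contains (5, 4), (7, 2), (8, 1)
```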
We study the problem of approximate nearest neighbor search in the d-dimensional Hamming space {0,1}^d. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes k rounds of parallel accesses to the data structure for a given k. For any k ≥ 1, we give a simple randomized algorithm solving approximate nearest neighbor search using k rounds of parallel memory accesses, with O(k(log d)^{1/k}) accesses in total. We also give a more sophisticated randomized algorithm using O(k + ((1/k) log d)^{O(1/k)}) memory accesses in k rounds, for large enough k. Both algorithms use data structures of size polynomial in n, the number of points in the database. We prove an Ω((1/k)(log d)^{1/k}) lower bound on the total number of memory accesses required by any randomized algorithm solving approximate nearest neighbor search within k ≤ (log log d)/(2 log log log d) rounds of parallel memory accesses on any data structure of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant number of rounds k. Our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in the sense that the lower bound on memory accesses for any k_1 rounds can be matched by the algorithm within k_2 = O(k_1) rounds. In the extreme, for some large enough k = Θ((log log d)/(log log log d)), our second algorithm matches the Θ((log log d)/(log log log d)) tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev.
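Restating the round-access tradeoff from the abstract in display form (no new content, just the two bounds side by side):

```latex
\[
\text{upper bound: } O\!\left(k(\log d)^{1/k}\right) \text{ accesses in } k \text{ rounds},
\qquad
\text{lower bound: } \Omega\!\left(\tfrac{1}{k}(\log d)^{1/k}\right)
\text{ for } k \le \frac{\log\log d}{2\log\log\log d}.
\]
```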
Due to the "curse of dimensionality", processing nearest neighbor (NN) queries in high-dimensional spaces is very expensive; hence, approximate approaches such as Locality-Sensitive Hashing (LSH) are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target the L1 and L2 spaces, whereas, as shown in previous work, fractional distance metrics (Lp metrics with 0 < p < 1) can provide more insightful results than the usual L1 and L2 metrics for data mining and multimedia applications. However, none of the existing work can support multiple fractional distance metrics using one index. In this paper, we propose LazyLSH, which answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees. Unlike previous LSH approaches, which need to build one dedicated index for every query space, LazyLSH uses a single base index to support computations in multiple Lp spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.
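For background, a sketch of the standard p-stable LSH family for L2 (Datar et al.) that such systems build on; LazyLSH's contribution, reusing one base index across multiple Lp spaces, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_l2_hash(dim, w=4.0):
    """One p-stable LSH function h(v) = floor((a.v + b) / w) for the L2 metric."""
    a = rng.standard_normal(dim)       # Gaussian projections are 2-stable
    b = rng.uniform(0, w)
    return lambda v: int(np.floor((a @ v + b) / w))

h = make_l2_hash(dim=8)
v = rng.standard_normal(8)
print(h(v), h(v + 0.01))               # nearby points usually share a bucket
```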
With its advantages of compact representation and efficient comparison, binary hashing has been extensively investigated for approximate nearest neighbor search. In this paper, we propose a novel and general hashing framework, which simultaneously considers a new linear pairwise distance-preserving objective and a point-wise constraint. The direct distance-preserving objective aims to preserve the linear relationship between the Euclidean distance and the Hamming distance of data points. Based on different point-wise constraints, we propose two methods to instantiate this framework. The first is a pseudo-supervised hashing method, which uses existing unsupervised hashing methods to generate binary codes as pseudo-supervised information. The second is an unsupervised hashing method, in which quantization loss is considered. We validate our framework on two large-scale datasets. The experiments demonstrate that our pseudo-supervised method achieves consistent improvement over state-of-the-art unsupervised hashing methods, while our unsupervised method outperforms the state-of-the-art methods.
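A quick sketch of why binary codes compare efficiently: Hamming distances reduce to XOR plus popcount (here via numpy on packed codes; the paper's distance-preserving objective itself is not implemented).

```python
import numpy as np

def hamming_distances(codes, query):
    """codes: (n, B) uint8 array of packed binary codes; query: (B,) uint8."""
    return np.unpackbits(np.bitwise_xor(codes, query), axis=1).sum(axis=1)

codes = np.packbits(np.random.default_rng(0).integers(0, 2, (4, 64)), axis=1)
print(hamming_distances(codes, codes[0]))   # first entry is 0: distance to itself
```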
Today's applications call for finding the spatial objects nearest to a specified location that simultaneously satisfy given keyword constraints. The best existing solution for such queries relies on the IR2-tree, which has deficiencies that seriously impact its efficiency. To overcome these deficiencies, a new access method called the Spatial Inverted Index (SI) is developed. The SI extends the conventional inverted index to cope with multidimensional data and comes with algorithms that answer nearest neighbor queries with keywords in real time; range queries with keywords can likewise be answered in real time using the SI.
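A toy sketch of the inverted-index-with-coordinates idea (a naive illustration, not the paper's compressed SI structure): posting lists are intersected on keywords and the survivors ranked by distance.

```python
from collections import defaultdict
import math

class SpatialInvertedIndex:
    """Naive illustration: one posting list of (id, point) per keyword."""
    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, obj_id, point, keywords):
        for kw in keywords:
            self.postings[kw].append((obj_id, point))

    def nn_with_keywords(self, query_point, keywords):
        # candidates must carry *all* query keywords; then take the closest one
        lists = [set(self.postings[kw]) for kw in keywords]
        candidates = set.intersection(*lists) if lists else set()
        return min(candidates, key=lambda c: math.dist(query_point, c[1]), default=None)

si = SpatialInvertedIndex()
si.add("cafe", (1, 2), {"coffee", "wifi"})
si.add("pub", (4, 1), {"beer", "wifi"})
print(si.nn_with_keywords((0, 0), {"wifi"}))   # ('cafe', (1, 2))
```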
This paper describes an approach in which group testing helps enforce security and privacy in identification. We detail a particular scheme based on embedding and group testing. We add a second layer of defense: group vectors, where each group vector represents a set of dataset vectors. Whereas the selected embedding poorly protects the data when used alone, combining it with the group testing approach makes it much harder to reconstruct the data. Even when a curious server and user collude to disclose the secret parameters, they cannot accurately recover the data. Another byproduct of our approach is that it reduces the complexity of the search and the required storage space. We demonstrate the interest of our approach on a benchmark biometric dataset, where we verify our theoretical analysis with real data.
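A rough sketch of the search-complexity byproduct: represent each group of dataset vectors by one aggregated group vector, test the query against groups first, and only scan members of the promising groups. The grouping and aggregation below are hypothetical; the paper's embedding and privacy layers are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 32))                   # dataset vectors
groups = np.array_split(rng.permutation(1000), 50)    # 50 random groups of 20
group_vecs = np.stack([X[g].mean(axis=0) for g in groups])

def search(q, n_probe=5):
    """Score groups first, then scan only members of the best-scoring groups."""
    best = np.argsort(group_vecs @ q)[-n_probe:]
    cand = np.concatenate([groups[i] for i in best])
    return cand[np.argmax(X[cand] @ q)]

q = X[123] + 0.05 * rng.standard_normal(32)
print(search(q))          # ideally 123; group probing trades exactness for speed
```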
A standard model for Recommender Systems is the matrix completion setting: given a partially known matrix of ratings given by users (rows) to items (columns), infer the unknown ratings. Over the last decades, a few attempts were made to tackle this objective with Neural Networks, but recently an architecture based on Autoencoders proved to be a promising approach. In this paper, we enhance that architecture (i) by using a loss function adapted to input data with missing values, and (ii) by incorporating side information. The experiments demonstrate that while side information only slightly improves the test error averaged over all users/items, it has more impact on cold users/items.
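A minimal sketch of point (i), a loss that ignores missing entries (numpy, with hypothetical values); the paper's autoencoder architecture and side-information channel are not reproduced.

```python
import numpy as np

def masked_mse(pred, target, mask):
    """MSE over observed ratings only (mask == 1); missing entries contribute
    nothing to the loss or its gradient."""
    diff = (pred - target) * mask
    return (diff ** 2).sum() / mask.sum()

target = np.array([[5., 0., 3.], [0., 4., 0.]])   # 0 = unrated
mask = (target > 0).astype(float)
pred = np.full_like(target, 3.0)
print(masked_mse(pred, target, mask))             # (4 + 0 + 1) / 3 = 1.67
```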
Preparing recommendations for unknown users, or recommendations that correctly respond to the short-term needs of a particular user, is one of the fundamental problems in e-commerce. Most common Recommender Systems assume that user identification is explicit. In this paper, a Session-Aware Recommender System approach is presented in which no straightforward user information is required. The recommendation process is based only on user activity within a single session, defined as a sequence of events. This information is incorporated into the recommendation process through explicit context modeling with factorization methods and through a novel approach based on a Recurrent Neural Network (RNN). Compared to the session modeling approach, the RNN directly models the dependencies in a user's observed sequential behavior through its recurrent structure. The evaluation discusses results based on sessions from a real-life system with ephemeral items (identified only by the set of their attributes) for the task of top-n recommendation.
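A hedged PyTorch sketch in the spirit of session-based RNN recommenders (GRU over the event sequence, scoring the next item); the paper's exact architecture, attribute-based item representation, and training procedure may differ.

```python
import torch
import torch.nn as nn

class SessionRNN(nn.Module):
    """GRU over the event sequence of a single session; scores the next item."""
    def __init__(self, n_items, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_items, emb_dim)
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_items)

    def forward(self, session):                    # (batch, seq_len) item ids
        h, _ = self.gru(self.emb(session))
        return self.out(h[:, -1, :])               # logits over the item catalog

model = SessionRNN(n_items=1000)
scores = model(torch.tensor([[12, 7, 98, 5]]))     # one session of four events
print(scores.topk(3).indices)                      # top-3 next-item candidates
```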
The shift from the host-centric to the information-centric paradigm brings many benefits, including native security, enhanced mobility, and scalability. The corresponding information-centric networking (ICN) also presents several important challenges, such as closest-replica routing, client privacy, and client preference collection. The majority of these challenges have received the research community’s attention; however, no mechanisms have been proposed for effective client preference collection. In the era of big data analytics and recommender systems, customer preferences are essential for providers such as Amazon and Netflix. However, with content served from in-network caches, the ICN paradigm indirectly undermines the gathering of these essential individualized preferences. In this paper, we discuss the requirements for client preference collection and present potential mechanisms for achieving it successfully.
We propose an efficient recommendation algorithm that incorporates side information about users' trust and distrust social relationships into the learning process of a Joint Non-negative Matrix Factorization technique based on Signed Graphs, namely JNMF-SG. The key idea of this study is to generate clusters based on signed graphs, using positive and negative weights for the trust and distrust relationships, respectively. Using a spectral clustering approach for signed graphs, clusters are extracted under the condition that users with positive connections should lie close together, while users with negative connections should lie far apart. We then propose a Joint Non-negative Matrix Factorization framework that generates the final recommendations using the user-item and user-cluster associations over the joint factorization. In our experiments on a dataset from a real-world social media platform, we show that we significantly increase recommendation accuracy compared to state-of-the-art methods that also consider trust and distrust side information in matrix factorization.
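A hedged sketch of the signed spectral step using the standard signed Laplacian (Kunegis-style); the paper's precise clustering objective and the joint factorization itself are not reproduced, and the toy adjacency below is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

def signed_spectral_clusters(A, k):
    """Spectral clustering with the signed Laplacian L = D_abs - A, so distrust
    (negative) edges push users apart and trust (positive) edges pull them together."""
    d = np.abs(A).sum(axis=1)
    L = np.diag(d) - A
    _, vecs = np.linalg.eigh(L)                # eigenvectors, ascending eigenvalues
    return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, :k])

A = np.array([[0, 1, -1, -1],                  # users 0,1 trust each other,
              [1, 0, -1, -1],                  # users 2,3 trust each other,
              [-1, -1, 0, 1],                  # the two pairs distrust each other
              [-1, -1, 1, 0]], dtype=float)
print(signed_spectral_clusters(A, 2))          # e.g. [0, 0, 1, 1]
```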
The cold-start problem in recommender systems refers to the inability to make reliable recommendations if a critical mass of items has not yet been rated. To bypass this problem, existing research has focused on developing more reliable prediction models for situations in which only a few item ratings exist. However, most of these approaches depend on adjusting the algorithm that determines a recommendation. We present a complementary approach that does not require any adjustments to the recommendation algorithm. We draw on motivation theory and reward users for rating items. In particular, we instantiate different gamification patterns and examine their effect on the average user's number of provided report ratings. Our results confirm the positive effect of instantiating gamification patterns on the number of received report ratings.
This tutorial offers a rich blend of theory and practice regarding dimensionality reduction methods, addressing the information overload problem in recommender systems. This problem affects our everyday experience while searching for knowledge on a topic. Naive Collaborative Filtering cannot deal with challenging issues such as scalability, noise, and sparsity. We can address all of the aforementioned challenges by applying matrix and tensor decomposition methods. These methods have proven to be among the most accurate (e.g., in the Netflix Prize) and efficient for handling big data. For each method (SVD, SVD++, timeSVD++, HOSVD, CUR, etc.) we will provide a detailed theoretical mathematical background and a step-by-step analysis, using an integrated toy example that runs throughout all parts of the tutorial, helping the audience clearly understand the differences among factorisation methods.
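In the spirit of the tutorial's toy examples, a minimal plain-SVD illustration (the ratings below are hypothetical); SVD++, timeSVD++, HOSVD, and CUR add further machinery on top of this basic idea.

```python
import numpy as np

R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)      # toy user-item ratings, 0 = unknown

U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2                                          # keep the two strongest latent factors
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.round(R_hat, 1))                      # low-rank estimates fill in the 0s
```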
User modeling of individual users on Social Web platforms such as Twitter plays a significant role in providing personalized recommendations and filtering interesting information from social streams. Recently, researchers proposed the use of concepts (e.g., DBpedia entities) for representing user interests instead of word-based approaches, since Knowledge Bases such as DBpedia provide cross-domain background knowledge about concepts and can thus be used for extending user interest profiles. Even so, not all concepts are covered by a Knowledge Base, especially on microblogging platforms such as Twitter, where new concepts and topics emerge every day. In this short paper, instead of using concepts alone, we propose using synsets from WordNet together with concepts from DBpedia for representing user interests. We evaluate our proposed user modeling strategies by comparing them with other bag-of-concepts approaches. The results show that using synsets and concepts together for representing user interests significantly improves the quality of user modeling in the context of link recommendations on Twitter.
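A tiny sketch of the synset side of such profiles via NLTK's WordNet interface (the DBpedia concept-linking half and any word-sense disambiguation step are omitted):

```python
from nltk.corpus import wordnet as wn   # needs: import nltk; nltk.download("wordnet")

def synset_profile(tokens):
    """Collect WordNet synset ids for the tokens of a user's tweets."""
    profile = set()
    for tok in tokens:
        for syn in wn.synsets(tok):
            profile.add(syn.name())     # e.g. 'car.n.01'
    return profile

print(sorted(synset_profile(["car"]))[:3])
```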
The Wikidata platform is a crowdsourced, structured knowledge base aiming to provide integrated, free and language-agnostic facts which are, amongst others, used by Wikipedias. Users who actively enter, review and revise data on Wikidata are assisted by a property-suggesting system which presents users with properties that might also be applicable to a given item. We argue that evaluating and subsequently improving this recommendation mechanism, and hence assisting users, can directly contribute to an even more integrated, consistent and extensive knowledge base serving a huge variety of applications. However, the quality and usefulness of such recommendations have not been evaluated yet. In this work, we provide the first evaluation of different approaches aiming to provide users with property recommendations in the process of curating information on Wikidata. We compare the approach currently deployed on Wikidata with two state-of-the-art recommendation approaches stemming from the fields of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well with regard to recall and precision, reaching a recall@7 of 79.71% and a precision@7 of 27.97%. We also find that, in general, incorporating contextual as well as classifying information into the computation of property recommendations can further improve performance significantly.
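A naive co-occurrence baseline for property suggestion, for illustration only; this is not the algorithm deployed on Wikidata. The property ids are real Wikidata properties (P31 instance-of, P569 date-of-birth, P570 date-of-death, P106 occupation, P856 official-website) used purely as example data.

```python
from collections import Counter

def recommend_properties(item_props, catalog, top_n=7):
    """Score properties that frequently co-occur with the item's current properties
    on other items in the catalog."""
    scores = Counter()
    for props in catalog:
        if item_props & props:
            scores.update(props - item_props)
    return [p for p, _ in scores.most_common(top_n)]

catalog = [{"P31", "P569", "P570"}, {"P31", "P569", "P106"}, {"P31", "P856"}]
print(recommend_properties({"P31", "P569"}, catalog))   # ['P570', 'P106', 'P856']
```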
The Netflix experience is driven by a number of recommendation algorithms: personalized ranking, page generation, similarity, ratings, search, etc. On January 6th, 2016 we simultaneously launched Netflix in 130 new countries around the world, bringing the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this talk, we will highlight the four most interesting challenges we encountered in making our algorithms operate globally and how addressing them improved our ability to connect members worldwide with stories they'll love. In particular, we will dive into the problems of uneven availability across catalogs, balancing personal and cultural tastes, handling language, and tracking the quality of recommendations. Uneven catalog availability is a challenge because many recommendation algorithms assume that people could interact with any item and then use the absence of interaction, implicitly or explicitly, as negative information in the model. However, this assumption does not hold globally and across time where item availability differs. Running algorithms globally requires a notion of location so that we can handle local variations in taste while also providing a good basis for personalization. Language is another challenge in recommending video content because people can typically only enjoy content that has assets (audio, subtitles) in languages they understand. The preferences for how people enjoy such content also vary between people and depend on their familiarity with a language. Also, while we would like our recommendations to work well for every one of our members, tracking quality becomes difficult: with so many members in so many countries speaking so many languages, it can be hard to determine when an algorithm or system is performing sub-optimally for some subset of them. Thus, to support this global launch, we examined each and every algorithm that is part of our service and began to address these challenges.
Recommender systems must find items that match the heterogeneous preferences of their users. Customizable recommenders allow users to directly manipulate the system's algorithm in order to help it match those preferences. However, customizing may demand a certain degree of skill, and new users in particular may struggle to effectively customize the system. In user studies of two different systems, I show that there is considerable heterogeneity in the way that new users try to customize a recommender, even within groups of users with similar underlying preferences. Furthermore, I show that this heterogeneity persists beyond the first few interactions with the recommender. System designs should account for this heterogeneity so that new users can both receive good recommendations in their early interactions and learn how to effectively customize the system for their preferences.
Interactive proofs model a world where a verifier delegates computation to an untrustworthy prover, verifying the prover's claims before accepting them. These proofs have applications to delegation of computation, probabilistically checkable proofs, crowdsourcing, and more. In some of these applications, the verifier may pay the prover based on the quality of his work. Rational proofs, introduced by Azar and Micali (2012), are an interactive proof model in which the prover is rational rather than untrustworthy: he may lie, but only to increase his payment. This allows the verifier to leverage the greed of the prover to obtain better protocols: while rational proofs are no more powerful than interactive proofs, the protocols are simpler and more efficient. Azar and Micali posed as an open problem whether multiple provers are more powerful than one for rational proofs. We provide a model that extends rational proofs to allow multiple provers. In this model, a verifier can cross-check the answers received by asking several provers. The verifier can pay the provers according to the quality of their work, incentivizing them to provide correct information. We analyze rational proofs with multiple provers from a complexity-theoretic point of view. We fully characterize this model by giving tight upper and lower bounds on its power. Along the way, we resolve Azar and Micali's open problem in the affirmative, showing that multiple rational provers are strictly more powerful than one (under standard complexity-theoretic assumptions). We further show that the full power of rational proofs with multiple provers can be achieved using only two provers and five rounds of interaction. Finally, we consider more demanding models where the verifier wants the provers' payment to decrease significantly when they are lying, and fully characterize the power of the model when the payment gap must be noticeable (i.e., at least 1/p where p is a polynomial).
Suppose that you have n truly random bits x = (x_1, …, x_n) and you wish to use them to generate m ≫ n pseudorandom bits y = (y_1, …, y_m) using a local mapping, i.e., each y_i should depend on at most d = O(1) bits of x. In the polynomial regime of m = n^s, s > 1, the only known solution, originating from (Goldreich, ECCC 2000), is based on Random Local Functions: compute y_i by applying some fixed (public) d-ary predicate P to a random (public) tuple of distinct inputs (x_{i_1}, …, x_{i_d}). Our goal in this paper is to understand, for any value of s, how the pseudorandomness of the resulting sequence depends on the choice of the underlying predicate. We derive the following results: (1) We show that pseudorandomness against F_2-linear adversaries (i.e., the distribution y has low bias) is achieved if the predicate is (a) k = Ω(s)-resilient, i.e., uncorrelated with any k-subset of its inputs, and (b) has algebraic degree Ω(s) even after fixing Ω(s) of its inputs. We also show that these requirements are necessary, and so they form a tight characterization (up to constants) of security against linear attacks. Our positive result shows that a d-local low-bias generator can have output length n^{Ω(d)}, answering an open question of Mossel, Shpilka and Trevisan (FOCS, 2003). Our negative result shows that a candidate pseudorandom generator proposed by the first author (Computational Complexity, 2015) and by O’Donnell and Witmer (CCC 2014) is insecure. We use similar techniques to refute a conjecture of Feldman, Perkins and Vempala (STOC 2015) regarding the hardness of planted constraint satisfaction problems. (2) Motivated by the cryptanalysis literature, we consider security against algebraic attacks. We provide the first theoretical treatment of such attacks by formalizing a general notion of algebraic inversion and distinguishing attacks based on the Polynomial Calculus proof system. We show that algebraic attacks succeed if and only if there exists a non-zero polynomial Q of degree e = O(s) whose roots cover the roots of P or cover the roots of P's complement. As a corollary, we obtain the first example of a predicate P for which the generated sequence y passes all linear tests but fails some polynomial-time computable test, answering an open question posed by the first author (Question 4.9, Computational Complexity 2015).
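A small sketch of the Goldreich-style construction itself, assuming a fixed public predicate and public random tuples; the example predicate is the well-known 5-ary XOR-AND ("tri-sum-and" style) candidate from this literature, not necessarily one the paper certifies as secure.

```python
import random

def local_prg(x, m, d, predicate):
    """Random local function: each output bit applies the public predicate P
    to a random public d-tuple of distinct input positions."""
    coins = random.Random(0)           # public randomness fixing the tuples
    return [predicate([x[i] for i in coins.sample(range(len(x)), d)])
            for _ in range(m)]

# 5-ary candidate predicate: XOR of three bits plus one AND term
P = lambda z: z[0] ^ z[1] ^ z[2] ^ (z[3] & z[4])

x = [random.getrandbits(1) for _ in range(64)]     # n = 64 truly random bits
print(local_prg(x, m=256, d=5, predicate=P)[:16])  # first 16 of the m > n outputs
```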
Motivated by applications in cryptography, we introduce and study the problem of distribution design. The goal of distribution design is to find a joint distribution on n random variables that satisfies a given set of constraints on the marginal distributions. Each constraint can either require that two sequences of variables be identically distributed or, alternatively, that the two sequences have disjoint supports. We present several positive and negative results on the existence and efficiency of solutions for a given set of constraints. Distribution design can be seen as a strict generalization of several well-studied problems in cryptography, including secret sharing, garbling schemes, and non-interactive protocols for secure multiparty computation. We further motivate the problem and our results by demonstrating their usefulness towards realizing non-interactive protocols for ad-hoc secure multiparty computation, in which any subset of the parties may choose to participate and the identity of the participants should remain hidden to the extent possible.
Many lattice-based cryptosystems rely on the security of the Ring Learning With Errors (Ring-LWE) problem. The most critical and computationally intensive operation of these Ring-LWE-based cryptosystems is polynomial multiplication. In this paper, we exploit the number theoretic transform (NTT) to build a high-speed polynomial multiplier for Ring-LWE-based public key cryptosystems. We present a versatile pipelined polynomial multiplication architecture that calculates the product of two n-degree polynomials in about (n lg n)/4 + n/2 clock cycles. In addition, we introduce several optimization techniques to reduce the required ROM storage. Experimental results on a Spartan-6 FPGA show that the proposed hardware architecture achieves an average speedup of 2.25x over the state-of-the-art high-speed design, while saving up to 47.06% of memory blocks.
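A hedged Python sketch of the NTT-based multiplication idea that such hardware accelerates, with toy parameters. This computes the cyclic product mod x^n - 1; Ring-LWE schemes use the negacyclic variant (mod x^n + 1) with a 2n-th root of unity, and the paper's pipelining and ROM optimizations are hardware-level concerns not modeled here.

```python
# Toy parameters: q = 257 is prime, n = 8 divides q - 1, and omega = 4 is a
# primitive 8th root of unity mod 257 (4^4 = 256 = -1 mod 257).
Q, N, OMEGA = 257, 8, 4

def ntt(a, omega):
    """Recursive decimation-in-time number theoretic transform over Z_Q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % Q)
    odd = ntt(a[1::2], omega * omega % Q)
    out, w = [0] * n, 1
    for i in range(n // 2):
        t = w * odd[i] % Q
        out[i] = (even[i] + t) % Q
        out[i + n // 2] = (even[i] - t) % Q
        w = w * omega % Q
    return out

def polymul(a, b):
    """Multiply two length-N coefficient vectors mod (x^N - 1, Q) in O(N log N)."""
    fc = [x * y % Q for x, y in zip(ntt(a, OMEGA), ntt(b, OMEGA))]
    c = ntt(fc, pow(OMEGA, Q - 2, Q))       # inverse transform uses omega^{-1}
    n_inv = pow(N, Q - 2, Q)
    return [x * n_inv % Q for x in c]

print(polymul([1, 1, 0, 0, 0, 0, 0, 0], [1, 2, 3, 0, 0, 0, 0, 0]))
# (1 + x)(1 + 2x + 3x^2) = 1 + 3x + 5x^2 + 3x^3  ->  [1, 3, 5, 3, 0, 0, 0, 0]
```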
We ask whether it is possible to anonymously communicate a large amount of data using only public (non-anonymous) communication together with a small anonymous channel. We believe this is a central question in the theory of anonymous communication, and to the best of our knowledge this is the first formal study in this direction. Towards this goal, we introduce the novel concept of anonymous steganography: think of a leaker Lea who wants to leak a large document to Joe the journalist. Using anonymous steganography, Lea can embed this document in innocent-looking communication on some popular website (such as cat videos on YouTube or funny memes on 9GAG). Then Lea provides Joe with a short decoding key dk which, when applied to the entire website, recovers the document while hiding the identity of Lea among the large number of users of the website. Our contributions include: introducing and formally defining anonymous steganography; a construction showing that anonymous steganography is possible (which uses recent results in circuit obfuscation); and a lower bound on the number of bits needed to bootstrap anonymous communication.