Visible to the public Biblio

Filters: Keyword is nearest neighbor search  [Clear All Filters]
2018-06-11
Luo, X., Chen, K., Pang, G., Shou, L., Chen, G..  2017.  Visible Nearest Neighbor Search for Objects Moving on Consecutive Trajectories. 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC). :1296–1303.

A visible nearest neighbor (VNN) query returns the k nearest objects that are visible to a query point, which is used to support various applications such as route planning, target monitoring, and antenna placement. However, with the proliferation of wireless communications and advances in positioning technology for mobile equipments, efficiently searching for VNN among moving objects are required. While most previous work on VNN query focused on static objects, in this paper, we treats the objects as moving consecutively when indexing them, and study the visible nearest neighbor query for moving objects (MVNN) . Assuming that the objects are represented as trajectories given by linear functions of time, we propose a scheme which indexes the moving objects by time-parameterized R-tree (TPR-tree) and obstacles by R-tree. The paper offers four heuristics for visibility and space pruning. New algorithms, Post-pruning and United-pruning, are developed for efficiently solving MVNN queries with all four heuristics. The effectiveness and efficiency of our solutions are verified by extensive experiments over synthetic datasets on real road network.

Fujiwara, Yasuhiro, Marumo, Naoki, Blondel, Mathieu, Takeuchi, Koh, Kim, Hideaki, Iwata, Tomoharu, Ueda, Naonori.  2017.  Scaling Locally Linear Embedding. Proceedings of the 2017 ACM International Conference on Management of Data. :1479–1492.
Locally Linear Embedding (LLE) is a popular approach to dimensionality reduction as it can effectively represent nonlinear structures of high-dimensional data. For dimensionality reduction, it computes a nearest neighbor graph from a given dataset where edge weights are obtained by applying the Lagrange multiplier method, and it then computes eigenvectors of the LLE kernel where the edge weights are used to obtain the kernel. Although LLE is used in many applications, its computation cost is significantly high. This is because, in obtaining edge weights, its computation cost is cubic in the number of edges to each data point. In addition, the computation cost in obtaining the eigenvectors of the LLE kernel is cubic in the number of data points. Our approach, Ripple, is based on two ideas: (1) it incrementally updates the edge weights by exploiting the Woodbury formula and (2) it efficiently computes eigenvectors of the LLE kernel by exploiting the LU decomposition-based inverse power method. Experiments show that Ripple is significantly faster than the original approach of LLE by guaranteeing the same results of dimensionality reduction.
Liu, Yingfan, Cheng, Hong, Cui, Jiangtao.  2017.  PQBF: I/O-Efficient Approximate Nearest Neighbor Search by Product Quantization. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. :667–676.
Approximate nearest neighbor (ANN) search in high-dimensional space plays an essential role in many multimedia applications. Recently, product quantization (PQ) based methods for ANN search have attracted enormous attention in the community of computer vision, due to its good balance between accuracy and space requirement. PQ based methods embed a high-dimensional vector into a short binary code (called PQ code), and the squared Euclidean distance is estimated by asymmetric quantizer distance (AQD) with pretty high precision. Thus, ANN search in the original space can be converted to similarity search on AQD using the PQ approach. All existing PQ methods are in-memory solutions, which may not handle massive data if they cannot fit entirely in memory. In this paper, we propose an I/O-efficient PQ based solution for ANN search. We design an index called PQB+-forest to support efficient similarity search on AQD. PQB+-forest first creates a number of partitions of the PQ codes by a coarse quantizer and then builds a B+-tree, called PQB+-tree, for each partition. The search process is greatly expedited by focusing on a few selected partitions that are closest to the query, as well as by the pruning power of PQB+-trees. According to the experiments conducted on two large-scale data sets containing up to 1 billion vectors, our method outperforms its competitors, including the state-of-the-art PQ method and the state-of-the-art LSH methods for ANN search.
Zhang, Peng-Fei, Li, Chuan-Xiang, Liu, Meng-Yuan, Nie, Liqiang, Xu, Xin-Shun.  2017.  Semi-Relaxation Supervised Hashing for Cross-Modal Retrieval. Proceedings of the 2017 ACM on Multimedia Conference. :1762–1770.
Recently, some cross-modal hashing methods have been devised for cross-modal search task. Essentially, given a similarity matrix, most of these methods tackle a discrete optimization problem by separating it into two stages, i.e., first relaxing the binary constraints and finding a solution of the relaxed optimization problem, then quantizing the solution to obtain the binary codes. This scheme will generate large quantization error. Some discrete optimization methods have been proposed to tackle this; however, the generation of the binary codes is independent of the features in the original space, which makes it not robust to noise. To consider these problems, in this paper, we propose a novel supervised cross-modal hashing method—Semi-Relaxation Supervised Hashing (SRSH). It can learn the hash functions and the binary codes simultaneously. At the same time, to tackle the optimization problem, it relaxes a part of binary constraints, instead of all of them, by introducing an intermediate representation variable. By doing this, the quantization error can be reduced and the optimization problem can also be easily solved by an iterative algorithm proposed in this paper. Extensive experimental results on three benchmark datasets demonstrate that SRSH can obtain competitive results and outperform state-of-the-art unsupervised and supervised cross-modal hashing methods.
2017-08-02
Moran, Sean, McCreadie, Richard, Macdonald, Craig, Ounis, Iadh.  2016.  Enhancing First Story Detection Using Word Embeddings. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :821–824.

In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via automatically mined word embeddings over a background tweet corpus. Through experimentation on a large data stream comprised of 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.

Bremler-Barr, Anat, Harchol, Yotam, Hay, David, Hel-Or, Yacov.  2016.  Encoding Short Ranges in TCAM Without Expansion: Efficient Algorithm and Applications. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :35–46.

We present RENE –- a novel encoding scheme for short ranges on Ternary content addressable memory (TCAM), which, unlike previous solutions, does not impose row expansion, and uses bits proportionally to the maximal range length. We provide theoretical analysis to show that our encoding is the closest to the lower bound of number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also, how the same technique could be used to efficiently solve other hard problems such as the nearest-neighbor search problem and its variants. We show that using TCAM, one could solve such problems in much higher rates than previously suggested solutions, and outperform known lower bounds in traditional memory models. We show by experiments that the translation process of RENE on switch hardware induces only a negligible 2.5% latency overhead. Our nearest neighbor implementation on a TCAM device provides search rates that are up to four orders of magnitude higher than previous best prior-art solutions.

Pazahr, Ali, Zapater, J. Javier Samper, Sánchez, Francisco García, Botella, Carmen, Martinez, Rafael.  2016.  Semantically-enhanced Advertisement Recommender Systems in Social Networks. Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services. :179–189.

Providing recommendations on social systems has been in the spotlight of both academics and industry for some time already. Social network giants like Facebook, LinkedIn, Myspace, etc., are eager to find the silver bullet of recommendation. These applications permit clients to shape a few certain social networks through their day-by-day social cooperative communications. In the meantime, today's online experience depends progressively on social association. One of the main concerns in social network is establishing a successful business plan to make more profit from the social network. Doing a business on every platform needs a good business plan with some important solutions such as advertise the products or services of other companies which would be a kind of marketing for those external businesses. In this study a philosophy of a system speaking to of a comprehensive structure of advertisement recommender system for social networks will be presented. The framework uses a semantic logic to provide the recommended products and this capability can differentiate the recommender part of the framework from classical recommender methods. Briefly, the framework proposed in this study has been designed in a form that can generate advertisement recommendations in a simplified and effective way for social network users.

Agarwal, Pankaj K., Fox, Kyle, Munagala, Kamesh, Nath, Abhinandan.  2016.  Parallel Algorithms for Constructing Range and Nearest-Neighbor Searching Data Structures. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. :429–440.

With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming platforms such as MapReduce and its variants are popular frameworks for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. In particular, we describe algorithms for \$kd\$-trees, range trees, and BBD-trees that only require O(1) rounds of communication for both preprocessing and querying while staying competitive in terms of running time and workload to their classical counterparts. Our algorithms are randomized, but they can be made deterministic at some increase in their running time and workload while keeping the number of rounds of communication to be constant.

Liu, Mingmou, Pan, Xiaoyin, Yin, Yitong.  2016.  Randomized Approximate Nearest Neighbor Search with Limited Adaptivity. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :23–33.

We study the problem of approximate nearest neighbor search in \$d\$-dimensional Hamming space \0,1\d. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes k rounds of parallel accesses to the data structure for a given k. For any k ≥ 1, we give a simple randomized algorithm solving the approximate nearest neighbor search using k rounds of parallel memory accesses, with O(k(log d)1/k) accesses in total. We also give a more sophisticated randomized algorithm using O(k+(1/k log d)O(1/k)) memory accesses in k rounds for large enough k. Both algorithms use data structures of size polynomial in n, the number of points in the database. We prove an Ω(1/k(log d)1/k) lower bound for the total number of memory accesses required by any randomized algorithm solving the approximate nearest neighbor search within k ≤ (log log d)/(2 log log log d) rounds of parallel memory accesses on any data structures of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant round k. And our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in a sense that the lower bound of memory accesses for any k1 rounds can be matched by the algorithm within k2=O(k1) rounds. In the extreme, for some large enough k=Θ((log log d)/(log log log d)), our second algorithm matches the Θ((log log d)/(log log log d)) tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev.

Zheng, Yuxin, Guo, Qi, Tung, Anthony K.H., Wu, Sai.  2016.  LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index. Proceedings of the 2016 International Conference on Management of Data. :2023–2037.

Due to the "curse of dimensionality" problem, it is very expensive to process the nearest neighbor (NN) query in high-dimensional spaces; and hence, approximate approaches, such as Locality-Sensitive Hashing (LSH), are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target at the L1 and L2 spaces, while as shown in previous work, the fractional distance metrics (Lp metrics with 0 textless p textless 1) can provide more insightful results than the usual L1 and L2 metrics for data mining and multimedia applications. However, none of the existing work can support multiple fractional distance metrics using one index. In this paper, we propose LazyLSH that answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees. Different from previous LSH approaches which need to build one dedicated index for every query space, LazyLSH uses a single base index to support the computations in multiple Lp spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.

Wang, Min, Zhou, Wengang, Tian, Qi, Zha, Zhengjun, Li, Houqiang.  2016.  Linear Distance Preserving Pseudo-Supervised and Unsupervised Hashing. Proceedings of the 2016 ACM on Multimedia Conference. :1257–1266.

With the advantage in compact representation and efficient comparison, binary hashing has been extensively investigated for approximate nearest neighbor search. In this paper, we propose a novel and general hashing framework, which simultaneously considers a new linear pair-wise distance preserving objective and point-wise constraint. The direct distance preserving objective aims to keep the linear relationships between the Euclidean distance and the Hamming distance of data points. Based on different point-wise constraints, we propose two methods to instantiate this framework. The first one is a pseudo-supervised hashing method, which uses existing unsupervised hashing methods to generate binary codes as pseudo-supervised information. The second one is an unsupervised hashing method, in which quantization loss is considered. We validate our framework on two large-scale datasets. The experiments demonstrate that our pseudo-supervised method achieves consistent improvement for the state-of-the-art unsupervised hashing methods, while our unsupervised method outperforms the state-of-the-art methods.

Shejawal, Pooja, Pansare, Jayshree R..  2016.  Nearest Neighbor Search Technique Using Keywords and Threshold. Proceedings of the ACM Symposium on Women in Research 2016. :7–11.

Today's applications asking for finding spatial protests nearest to a predefined area in the meantime fulfill limitation of keywords. Best answer for such questions depends on the IR2-tree, which has some inadequacies that truly affect system s efficiency. To defeat those inadequacies another access strategy is produced called the Spatial-inverted Index (SI) that extends the modified file to adapt to multidimensional information, and accompanies calculations that can answer closest neighbor queries with keywords continuously. This new technique SI is produced broadens the capacities of routine modified record makes do with multidimensional information, alongside the arrangement of using so as to move reach queries replied SI results to calculation which tackles the issue continuously.

Iscen, Ahmet, Furon, Teddy.  2016.  Group Testing for Identification with Privacy. Proceedings of the 4th ACM Workshop on Information Hiding and Multimedia Security. :51–56.

This paper describes an approach where group testing helps in enforcing security and privacy in identification. We detail a particular scheme based on embedding and group testing. We add a second layer of defense, group vectors, where each group vector represents a set of dataset vectors. Whereas the selected embedding poorly protects the data when used alone, the group testing approach makes it much harder to reconstruct the data when combined with the embedding. Even when curious server and user collude to disclose the secret parameters, they cannot accurately recover the data. Another byproduct of our approach is that it reduces the complexity of the search and the required storage space. We show the interest of our work in a benchmark biometrics dataset, where we verify our theoretical analysis with real data.

2017-05-16
Yan, Ting-Kun, Xu, Xin-Shun, Guo, Shanqing, Huang, Zi, Wang, Xiao-Lin.  2016.  Supervised Robust Discrete Multimodal Hashing for Cross-Media Retrieval. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :1271–1280.

Recently, multimodal hashing techniques have received considerable attention due to their low storage cost and fast query speed for multimodal data retrieval. Many methods have been proposed; however, there are still some problems that need to be further considered. For example, some of these methods just use a similarity matrix for learning hash functions which will discard some useful information contained in original data; some of them relax binary constraints or separate the process of learning hash functions and binary codes into two independent stages to bypass the obstacle of handling the discrete constraints on binary codes for optimization, which may generate large quantization error; some of them are not robust to noise. All these problems may degrade the performance of a model. To consider these problems, in this paper, we propose a novel supervised hashing framework for cross-modal retrieval, i.e., Supervised Robust Discrete Multimodal Hashing (SRDMH). Specifically, SRDMH tries to make final binary codes preserve label information as same as that in original data so that it can leverage more label information to supervise the binary codes learning. In addition, it learns hashing functions and binary codes directly instead of relaxing the binary constraints so as to avoid large quantization error problem. Moreover, to make it robust and easy to solve, we further integrate a flexible l2,p loss with nonlinear kernel embedding and an intermediate presentation of each instance. Finally, an alternating algorithm is proposed to solve the optimization problem in SRDMH. Extensive experiments are conducted on three benchmark data sets. The results demonstrate that the proposed method (SRDMH) outperforms or is comparable to several state-of-the-art methods for cross-modal retrieval task.