Visible to the public Biblio

Filters: Keyword is locality sensitive hashing  [Clear All Filters]
2021-08-31
Hu, Hongsheng, Dobbie, Gillian, Salcic, Zoran, Liu, Meng, Zhang, Jianbing, Zhang, Xuyun.  2020.  A Locality Sensitive Hashing Based Approach for Federated Recommender System. 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing (CCGRID). :836–842.
The recommender system is an important application in big data analytics because accurate recommendation items or high-valued suggestions can bring high profit to both commercial companies and customers. To make precise recommendations, a recommender system often needs large and fine-grained data for training. In the current big data era, data often exist in the form of isolated islands, and it is difficult to integrate the data scattered due to privacy security concerns. Moreover, privacy laws and regulations make it harder to share data. Therefore, designing a privacy-preserving recommender system is of paramount importance. Existing privacy-preserving recommender system models mainly adapt cryptography approaches to achieve privacy preservation. However, cryptography approaches have heavy overhead when performing encryption and decryption operations and they lack a good level of flexibility. In this paper, we propose a Locality Sensitive Hashing (LSH) based approach for federated recommender system. Our proposed efficient and scalable federated recommender system can make full use of multiple source data from different data owners while guaranteeing preservation of privacy of contributing parties. Extensive experiments on real-world benchmark datasets show that our approach can achieve both high time efficiency and accuracy under small privacy budgets.
2020-11-16
Anju, J., Shreelekshmi, R..  2019.  Modified Feature Descriptors to enhance Secure Content-based Image Retrieval in Cloud. 2019 2nd International Conference on Intelligent Computing, Instrumentation and Control Technologies (ICICICT). 1:674–680.
With the emergence of cloud, content-based image retrieval (CBIR) on encrypted domain gain enormous importance due to the ever increasing need for ensuring confidentiality, authentication, integrity and privacy of data. CBIR on outsourced encrypted images can be done by extracting features from unencrypted images and generating searchable encrypted index based on it. Visual descriptors like color descriptors, shape and texture descriptors, etc. are employed for similarity search. Since visual descriptors used to represent an image have crucial role in retrieving most similar results, an attempt to combine them has been made in this paper. The effect of combining different visual descriptors on retrieval precision in secure CBIR scheme proposed by Xia et al. is analyzed. Experimental results show that combining visual descriptors can significantly enhance retrieval precision of the secure CBIR scheme.
2020-06-08
Fang, Bo, Hua, Zhongyun, Huang, Hejiao.  2019.  Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket. 2019 14th International Conference on Computer Science Education (ICCSE). :5–10.
Nearest neighbor search (NNS) is one of the current popular research directions, which widely used in machine learning, pattern recognition, image detection and so on. In the low dimension data, based on tree search method can get good results. But when the data dimension goes up, that will produce a curse of dimensional. The proposed Locality-Sensitive Hashing algorithm (LSH) greatly improves the efficiency of nearest neighbor query for high dimensional data. But the algorithm relies on the building a large number of hash table, which makes the space complexity very high. C2LSH based on dynamic collision improves the disadvantage of LSH, but its disadvantage is that it needs to detect the collision times of a large number of data points which Increased query time. Therefore, Based on LSH algorithm, later researchers put forward many improved algorithms, but still not ideal.In this paper, we put forward Locality-Sensitive Hashing Scheme Based on Heap Sort of Hash Bucket (HSLSH) algorithm aiming at the shortcomings of LSH and C2LSH. Its main idea is to take advantage of the efficiency of heapsort in massive data sorting to improve the efficiency of nearest neighbor query. It only needs to rely on a small number of hash functions can not only overcome the shortcoming of LSH need to build a large number of hash table, and avoids defects of C2LSH. Experiments show that our algorithm is more than 20% better than C2LSH in query accuracy and 40% percent lower in query time.
2018-06-11
Sun, Yuanyuan, Hua, Yu, Liu, Xue, Cao, Shunde, Zuo, Pengfei.  2017.  DLSH: A Distribution-aware LSH Scheme for Approximate Nearest Neighbor Query in Cloud Computing. Proceedings of the 2017 Symposium on Cloud Computing. :242–255.
Cloud computing needs to process and analyze massive high-dimensional data in a real-time manner. Approximate queries in cloud computing systems can provide timely queried results with acceptable accuracy, thus alleviating the consumption of a large amount of resources. Locality Sensitive Hashing (LSH) is able to maintain the data locality and support approximate queries. However, due to randomly choosing hash functions, LSH has to use too many functions to guarantee the query accuracy. The extra computation and storage overheads exacerbate the real performance of LSH. In order to reduce the overheads and deliver high performance, we propose a distribution-aware scheme, called DLSH, to offer cost-effective approximate nearest neighbor query service for cloud computing. The idea of DLSH is to leverage the principal components of the data distribution as the projection vectors of hash functions in LSH, further quantify the weight of each hash function and adjust the interval value in each hash table. We then refine the queried result set based on the hit frequency to significantly decrease the time overhead of distance computation. Extensive experiments in a large-scale cloud computing testbed demonstrate significant improvements in terms of multiple system performance metrics. We have released the source code of DLSH for public use.
2018-02-14
Awad, A., Matthews, A., Qiao, Y., Lee, B..  2017.  Chaotic Searchable Encryption for Mobile Cloud Storage. IEEE Transactions on Cloud Computing. PP:1–1.

This paper considers the security problem of outsourcing storage from user devices to the cloud. A secure searchable encryption scheme is presented to enable searching of encrypted user data in the cloud. The scheme simultaneously supports fuzzy keyword searching and matched results ranking, which are two important factors in facilitating practical searchable encryption. A chaotic fuzzy transformation method is proposed to support secure fuzzy keyword indexing, storage and query. A secure posting list is also created to rank the matched results while maintaining the privacy and confidentiality of the user data, and saving the resources of the user mobile devices. Comprehensive tests have been performed and the experimental results show that the proposed scheme is efficient and suitable for a secure searchable cloud storage system.

2017-08-02
Moran, Sean, McCreadie, Richard, Macdonald, Craig, Ounis, Iadh.  2016.  Enhancing First Story Detection Using Word Embeddings. Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval. :821–824.

In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via automatically mined word embeddings over a background tweet corpus. Through experimentation on a large data stream comprised of 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.

Zheng, Yuxin, Guo, Qi, Tung, Anthony K.H., Wu, Sai.  2016.  LazyLSH: Approximate Nearest Neighbor Search for Multiple Distance Functions with a Single Index. Proceedings of the 2016 International Conference on Management of Data. :2023–2037.

Due to the "curse of dimensionality" problem, it is very expensive to process the nearest neighbor (NN) query in high-dimensional spaces; and hence, approximate approaches, such as Locality-Sensitive Hashing (LSH), are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target at the L1 and L2 spaces, while as shown in previous work, the fractional distance metrics (Lp metrics with 0 textless p textless 1) can provide more insightful results than the usual L1 and L2 metrics for data mining and multimedia applications. However, none of the existing work can support multiple fractional distance metrics using one index. In this paper, we propose LazyLSH that answers approximate nearest neighbor queries for multiple Lp metrics with theoretical guarantees. Different from previous LSH approaches which need to build one dedicated index for every query space, LazyLSH uses a single base index to support the computations in multiple Lp spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.

2015-05-06
Zhongming Jin, Cheng Li, Yue Lin, Deng Cai.  2014.  Density Sensitive Hashing. Cybernetics, IEEE Transactions on. 44:1362-1371.

Nearest neighbor search is a fundamental problem in various research fields like machine learning, data mining and pattern recognition. Recently, hashing-based approaches, for example, locality sensitive hashing (LSH), are proved to be effective for scalable high dimensional nearest neighbor search. Many hashing algorithms found their theoretic root in random projection. Since these algorithms generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) are required in order to achieve both high precision and recall. To address this limitation, we propose a novel hashing algorithm called density sensitive hashing (DSH) in this paper. DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids the purely random projections selection and uses those projective functions which best agree with the distribution of the data. Extensive experimental results on real-world data sets have shown that the proposed method achieves better performance compared to the state-of-the-art hashing approaches.