Biblio

Filters: Keyword is LSH
2020-09-28
Chen, Lvhao, Liao, Xiaofeng, Mu, Nankun, Wu, Jiahui, Junqing, Junqing.  2019.  Privacy-Preserving Fuzzy Multi-Keyword Search for Multiple Data Owners in Cloud Computing. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :2166–2171.
With the development of cloud computing, more users are deciding to store information on cloud servers. Because the cloud server is not secure, many documents should be encrypted before being sent to the cloud to avoid information leakage. However, this means that plaintext search techniques cannot be directly applied to ciphertext search. Many searchable encryption schemes based on a single-data-owner model have therefore been proposed. In practice, however, users want to search over encrypted documents originating from various data owners. This paper puts forward a privacy-preserving fuzzy multi-keyword search scheme (PPFMKS) for multiple data owners. To support fuzzy multi-keyword and accurate search, secure indexes based on Locality-Sensitive Hashing (LSH) and Bloom Filter (BF) are established. To guarantee search privacy under the multiple-data-owner model, a new encryption method is proposed that allows different data owners to encrypt files with different keys. This method also reduces the high cost caused by inconvenient key management.
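The abstract does not say which LSH family or index layout the scheme uses, so the following is only a rough sketch of how an LSH-plus-Bloom-filter keyword index can tolerate misspellings. It assumes MinHash over character bigrams as the LSH step and omits encryption entirely; the names `bigrams`, `lsh_keys`, `BloomFilter`, `build_index`, and `match_score` are hypothetical and not taken from the paper.

```python
import hashlib


def _h(data: str, salt: int) -> int:
    """Salted, process-independent 64-bit hash (assumed helper)."""
    digest = hashlib.sha256(f"{salt}:{data}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bigrams(word: str) -> set:
    """Character bigrams of a keyword, with start/end markers."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def lsh_keys(word: str, num_hashes: int = 8) -> list:
    """One MinHash value per salted hash; similar bigram sets tend to share keys."""
    grams = bigrams(word)
    return [f"{i}:{min(_h(g, i) for g in grams)}" for i in range(num_hashes)]


class BloomFilter:
    """Plain (unencrypted) Bloom filter over string keys."""

    def __init__(self, size_bits: int = 1024, num_probes: int = 3):
        self.size = size_bits
        self.num_probes = num_probes
        self.bits = [False] * size_bits

    def _positions(self, key: str) -> list:
        return [_h(key, 1000 + j) % self.size for j in range(self.num_probes)]

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def __contains__(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


def build_index(keywords, num_hashes: int = 8) -> BloomFilter:
    """Insert every LSH key of every keyword into one Bloom filter."""
    index = BloomFilter()
    for word in keywords:
        for key in lsh_keys(word, num_hashes):
            index.add(key)
    return index


def match_score(index: BloomFilter, query: str, num_hashes: int = 8) -> float:
    """Fraction of the query's LSH keys found in the index (1.0 for indexed words)."""
    keys = lsh_keys(query, num_hashes)
    return sum(key in index for key in keys) / len(keys)
```

For example, after `index = build_index(["network", "privacy"])`, the call `match_score(index, "network")` returns 1.0, because a Bloom filter never produces false negatives. A misspelling such as `"netwrok"` shares some bigrams with the indexed word and will typically score above an unrelated word, although Bloom-filter false positives can inflate any score.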
2019-01-21
Yao, S., Niu, B., Liu, J..  2018.  Enhancing Sampling and Counting Method for Audio Retrieval with Time-Stretch Resistance. 2018 IEEE Fourth International Conference on Multimedia Big Data (BigMM). :1–5.

An ideal audio retrieval method should be not only highly efficient in identifying an audio track from a massive audio dataset, but also robust to any distortion. Unfortunately, none of the audio retrieval methods is robust to all types of distortions. An audio retrieval method depends on both the audio fingerprint and the retrieval strategy, and especially on how they are combined. We argue that the Sampling and Counting Method (SC), a state-of-the-art audio retrieval method, would be a promising step towards an ideal audio retrieval method if we could make it robust to time-stretch and pitch-stretch. Towards this objective, this paper proposes a turning point alignment method to enhance SC with resistance to time-stretch, which makes Philips and Philips-like fingerprints resistant to time-stretch. Experimental results show that our approach can resist time-stretch from 70% to 130%, which is on a par with the state-of-the-art methods. It also marginally improves the retrieval performance under various noise distortions.

2017-04-24
Zhang, Xuyun, Leckie, Christopher, Dou, Wanchun, Chen, Jinjun, Kotagiri, Ramamohanarao, Salcic, Zoran.  2016.  Scalable Local-Recoding Anonymization Using Locality Sensitive Hashing for Big Data Privacy Preservation. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :1793–1802.

While cloud computing has become an attractive platform for supporting data intensive applications, a major obstacle to the adoption of cloud computing in sectors such as health and defense is the privacy risk associated with releasing datasets to third parties in the cloud for analysis. A widely-adopted technique for data privacy preservation is to anonymize data via local recoding. However, most existing local-recoding techniques are either serial or distributed without directly optimizing scalability, thus rendering them unsuitable for big data applications. In this paper, we propose a highly scalable approach to local-recoding anonymization in cloud computing, based on Locality Sensitive Hashing (LSH). Specifically, a novel semantic distance metric is presented for use with LSH to measure the similarity between two data records. Then, LSH with the MinHash function family can be employed to divide datasets into multiple partitions for use with MapReduce to parallelize computation while preserving similarity. By using our efficient LSH-based scheme, we can anonymize each partition through the use of a recursive agglomerative k-member clustering algorithm. Extensive experiments on real-life datasets show that our approach significantly improves the scalability and time-efficiency of local-recoding anonymization by orders of magnitude over existing approaches.
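The MinHash partitioning step can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it treats each record as a non-empty set of string tokens, uses a single band of the signature as the partition key, and leaves out the paper's semantic distance metric, MapReduce, and the k-member clustering stage. The names `make_minhash_family`, `minhash_signature`, and `partition_by_lsh` are hypothetical.

```python
import hashlib
import random
from collections import defaultdict

PRIME = (1 << 61) - 1  # Mersenne prime used as the modulus for universal hashing


def _stable_hash(token: str) -> int:
    """Process-independent integer hash of a token (Python's hash() is salted)."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PRIME


def make_minhash_family(num_hashes: int, seed: int = 0) -> list:
    """Draw (a, b) pairs defining hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, family) -> tuple:
    """MinHash signature of a non-empty set of string tokens."""
    hashed = [_stable_hash(item) for item in items]
    return tuple(min((a * h + b) % PRIME for h in hashed) for a, b in family)


def partition_by_lsh(records: dict, family, rows_per_band: int) -> dict:
    """Group record ids by the first band of their signature.

    Records with high Jaccard similarity are likely to share the band
    and therefore land in the same partition.
    """
    partitions = defaultdict(list)
    for record_id, items in records.items():
        signature = minhash_signature(items, family)
        partitions[signature[:rows_per_band]].append(record_id)
    return dict(partitions)
```

A smaller `rows_per_band` makes partitions coarser (more records collide), while a larger value keeps only very similar records together. Each resulting partition could then be handed to a separate worker for anonymization.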

2015-05-06
Zhongming Jin, Cheng Li, Yue Lin, Deng Cai.  2014.  Density Sensitive Hashing. Cybernetics, IEEE Transactions on. 44:1362-1371.

Nearest neighbor search is a fundamental problem in various research fields like machine learning, data mining and pattern recognition. Recently, hashing-based approaches, for example, locality sensitive hashing (LSH), have proved to be effective for scalable high dimensional nearest neighbor search. Many hashing algorithms find their theoretical root in random projection. Since these algorithms generate the hash tables (projections) randomly, a large number of hash tables (i.e., long codewords) are required in order to achieve both high precision and recall. To address this limitation, we propose a novel hashing algorithm called density sensitive hashing (DSH) in this paper. DSH can be regarded as an extension of LSH. By exploring the geometric structure of the data, DSH avoids selecting purely random projections and uses those projective functions which best agree with the distribution of the data. Extensive experimental results on real-world data sets have shown that the proposed method achieves better performance compared to the state-of-the-art hashing approaches.
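For context, the random-projection baseline that the abstract contrasts DSH with can be sketched as below. This is sign-random-projection LSH only, not DSH itself: the hyperplanes are drawn at random rather than fitted to the data distribution. The function names are hypothetical.

```python
import random


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> list:
    """Draw num_bits random Gaussian hyperplane normals in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(vector, hyperplanes) -> tuple:
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(w * x for w, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(code_a, code_b) -> int:
    """Number of differing bits; smaller means a smaller angle is likely."""
    return sum(a != b for a, b in zip(code_a, code_b))
```

Because every bit depends only on the sign of a dot product, a vector and any positive multiple of it receive the same code, and vectors separated by a small angle differ in few bits on average. The long codes this baseline needs for good precision and recall are the limitation DSH is designed to address.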

Mokhtar, B., Eltoweissy, M..  2014.  Towards a Data Semantics Management System for Internet Traffic. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

Although current Internet operations generate voluminous data, they remain largely oblivious of traffic data semantics. This poses many inefficiencies and challenges due to emergent or anomalous behavior impacting the vast array of Internet elements such as services and protocols. In this paper, we propose a Data Semantics Management System (DSMS) for learning Internet traffic data semantics to enable smarter semantics-driven networking operations. We extract networking semantics and build and utilize a dynamic ontology of network concepts to better recognize and act upon emergent or abnormal behavior. Our DSMS utilizes: (1) Latent Dirichlet Allocation algorithm (LDA) for latent features extraction and semantics reasoning; (2) big tables as a cloud-like data storage technique to maintain large-scale data; and (3) Locality Sensitive Hashing algorithm (LSH) for reducing data dimensionality. Our preliminary evaluation using real Internet traffic shows the efficacy of DSMS for learning behavior of normal and abnormal traffic data and for accurately detecting anomalies at low cost.