Biblio

Found 97 results

Filters: Keyword is query processing

2022-07-15

Giesser, Patrick, Stechschulte, Gabriel, Costa Vaz, Anna da, Kaufmann, Michael. 2021. Implementing Efficient and Scalable In-Database Linear Regression in SQL. 2021 IEEE International Conference on Big Data (Big Data). :5125–5132.

Relational database management systems not only support larger-than-memory data processing and very advanced query optimization, but also offer the benefits of data security, privacy, and consistency. When machine learning on large data sets is processed directly on an existing SQL database server, the data does not need to be exported and transferred to a separate big data processing platform. To achieve this, we implement a linear regression algorithm using SQL code generation such that the computation can be performed server-side and directly in the RDBMS. Our method and its implementation, programmed in Python, solve linear regression (LR) using the ordinary least squares (OLS) method directly in the RDBMS using SQL code generation, leaving most of the processing in the database. Only the matrix of the system of equations, whose size is equal to the number of variables squared, is transferred from the SQL server to the Python client to be solved for OLS regression. For evaluation purposes, our LR implementation was tested with artificially generated datasets and compared to an existing Python library (Scikit Learn). We found that our implementation consistently solves OLS regression faster than Scikit Learn for datasets with more than 10,000 input rows and fewer than 64 columns. Moreover, under the same test conditions where the computation is larger than memory, our implementation returned a result quickly, while Scikit Learn returned an out-of-memory error. We conclude that SQL is a promising tool for in-database processing of large-volume, low-dimensional data sets with a particular class of machine learning algorithms, namely those that can be efficiently solved with map-reduce queries, such as OLS regression.
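The idea in this abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes SQLite as a stand-in RDBMS, a hypothetical table `obs`, and hypothetical helper names `ols_via_sql` and `solve`. The generated SQL computes every entry of X^T X and X^T y server-side, so only a small (k x k) system reaches the client.

```python
import sqlite3

def ols_via_sql(conn, table, features, target):
    """Fit OLS by letting the database aggregate X^T X and X^T y."""
    cols = ["1"] + list(features)  # "1" is the intercept column
    k = len(cols)
    # One SUM(...) expression per entry of X^T X, then per entry of X^T y.
    exprs = [f"SUM(({a}) * ({b}))" for a in cols for b in cols]
    exprs += [f"SUM(({a}) * ({target}))" for a in cols]
    row = conn.execute(f"SELECT {', '.join(exprs)} FROM {table}").fetchone()
    xtx = [list(row[i * k:(i + 1) * k]) for i in range(k)]
    xty = list(row[k * k:])
    return solve(xtx, xty)  # only k*k + k numbers left the database

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Hypothetical demo data: y = 3 + 2*x1 - 1*x2, with no noise.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (x1 REAL, x2 REAL, y REAL)")
rows = [(x1, x2, 3.0 + 2.0 * x1 - 1.0 * x2)
        for x1 in range(5) for x2 in range(5)]
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", rows)
coef = ols_via_sql(conn, "obs", ["x1", "x2"], "y")  # [intercept, b1, b2]
```

The table and column names are interpolated into the SQL string, so this sketch is only suitable for trusted identifiers.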

2022-06-10

Ge, Yurun, Bertozzi, Andrea L.. 2021. Active Learning for the Subgraph Matching Problem. 2021 IEEE International Conference on Big Data (Big Data). :2641–2649.

The subgraph matching problem arises in a number of modern machine learning applications including segmented images and meshes of 3D objects for pattern recognition, bio-chemical reactions and security applications. This graph-based problem can have a very large and complex solution space, especially when the world graph has many more nodes and edges than the template. In a real use-case scenario, analysts may need to query additional information about template nodes or world nodes to reduce the problem size and the solution space. Currently, this query process is done by hand, based on the personal experience of analysts. By analogy to the well-known active learning problem in machine learning classification, we present a machine-based active learning problem for the subgraph matching problem in which the machine suggests optimal template target nodes that would be most likely to reduce the solution space when it is otherwise overly large and complex. The humans in the loop can then include additional information about those target nodes. We present some case studies for both synthetic and real-world datasets for multichannel subgraph matching.

2022-01-10

Radhakrishnan, Sangeetha, Akila, A.. 2021. Securing Distributed Database Using Elongated RSA Algorithm. 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS). 1:1931–1936.

Securing data, managing authorised user access, and maintaining data privacy are some of the problems associated with data stored in a database. The security of stored data is a major concern that must be taken seriously, because users are sensitive about the data they share. User data can be protected with cryptography, which is considered the conventional method. The Advanced Encryption Standard (AES), Data Encryption Standard (DES), Twofish, Rivest-Shamir-Adleman (RSA), Attribute-Based Encryption (ABE), and Blowfish are some of the available cryptographic algorithms. These algorithms are classified as symmetric or asymmetric. A symmetric-key algorithm uses the same key for encryption and decryption, whereas an asymmetric algorithm uses two keys. In this paper, one of the asymmetric algorithms, RSA, is implemented with an educational dataset. To secure the distributed database, an extended version of the RSA algorithm is implemented as the proposed work.
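The two-key property of an asymmetric algorithm can be shown with a textbook RSA sketch. This is plain RSA with toy primes and no padding, not the extended variant proposed in the paper (whose details the abstract does not give), and it must not be used for real security. The function names are hypothetical.

```python
import math

def make_keys(p, q, e=17):
    """Derive a textbook RSA key pair from two primes (toy sizes only)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p - 1) * (q - 1)")
    d = pow(e, -1, phi)  # modular inverse, Python 3.8+
    return (n, e), (n, d)

def encrypt(public_key, message):
    n, e = public_key
    return pow(message, e, n)

def decrypt(private_key, ciphertext):
    n, d = private_key
    return pow(ciphertext, d, n)

# The public key encrypts; only the matching private key decrypts.
public_key, private_key = make_keys(61, 53)
ciphertext = encrypt(public_key, 65)
recovered = decrypt(private_key, ciphertext)
```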

2021-03-09

Rahmati, A., Moosavi-Dezfooli, S.-M., Frossard, P., Dai, H.. 2020. GeoDA: A Geometric Framework for Black-Box Adversarial Attacks. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :8443–8452.

Adversarial examples are known as carefully perturbed images fooling image classifiers. We propose a geometric framework to generate adversarial examples in one of the most challenging black-box settings where the adversary can only generate a small number of queries, each of them returning the top-1 label of the classifier. Our framework is based on the observation that the decision boundary of deep networks usually has a small mean curvature in the vicinity of data samples. We propose an effective iterative algorithm to generate query-efficient black-box perturbations with small ℓp norms, whose effectiveness is confirmed via experimental evaluations on state-of-the-art natural image classifiers. Moreover, for p=2, we theoretically show that our algorithm actually converges to the minimal perturbation when the curvature of the decision boundary is bounded. We also obtain the optimal distribution of the queries over the iterations of the algorithm. Finally, experimental results confirm that our principled black-box attack algorithm performs better than state-of-the-art algorithms as it generates smaller perturbations with a reduced number of queries.

2021-02-22

Li, M., Zhang, Y., Sun, Y., Wang, W., Tsang, I. W., Lin, X.. 2020. I/O Efficient Approximate Nearest Neighbour Search based on Learned Functions. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :289–300.

Approximate nearest neighbour search (ANNS) in high dimensional space is a fundamental problem in many applications, such as multimedia databases, computer vision and information retrieval. Among many solutions, data-sensitive hashing-based methods are effective for this problem, yet few of them are designed for external storage scenarios and hence they are not optimized for I/O efficiency during query processing. In this paper, we introduce a novel data-sensitive indexing and query processing framework for ANNS with an emphasis on optimizing the I/O efficiency, especially the sequential I/Os. The proposed index consists of several lists of point IDs, ordered by values that are obtained by learned hashing (i.e., mapping) functions on each corresponding data point. The functions are learned from the data and approximately preserve the order in the high-dimensional space. We consider two instantiations of the functions (linear and non-linear), both learned from the data with novel objective functions. We also develop an I/O efficient ANNS framework based on the index. Comprehensive experiments on six benchmark datasets show that our proposed methods with learned index structure perform much better than the state-of-the-art external memory-based ANNS methods in terms of I/O efficiency and accuracy.

Bashyam, K. G. Renga, Vadhiyar, S.. 2020. Fast Scalable Approximate Nearest Neighbor Search for High-dimensional Data. 2020 IEEE International Conference on Cluster Computing (CLUSTER). :294–302.

K-Nearest Neighbor (k-NN) search is one of the most commonly used approaches for similarity search. It finds extensive applications in machine learning and data mining. This era of big data warrants efficiently scaling k-NN search algorithms for billion-scale datasets with high dimensionality. In this paper, we propose a solution towards this end where we use vantage point trees for partitioning the dataset across multiple processes and exploit an existing graph-based sequential approximate k-NN search algorithm called HNSW (Hierarchical Navigable Small World) for searching locally within a process. Our hybrid MPI-OpenMP solution employs techniques including exploiting MPI one-sided communication for reducing communication times and partition replication for better load balancing across processes. We demonstrate computation of k-NN for 10,000 queries in the order of seconds using our approach on 8000 cores on a dataset with a billion points in a 128-dimensional space. We also show 10X speedup over a completely k-d tree-based solution for the same dataset, thus demonstrating better suitability of our solution for high dimensional datasets. Our solution shows almost linear strong scaling.
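A rough sketch of vantage-point partitioning follows, assuming the standard VP-tree construction (pick a vantage point, split at the median distance). How the paper selects vantage points, replicates partitions, or maps them to MPI processes is not stated in the abstract, so `vp_partition` and its parameters are hypothetical.

```python
import math
import random

def vp_partition(points, depth, rng):
    """Split points into up to 2**depth parts by distance to a vantage point."""
    if depth == 0 or len(points) < 2:
        return [points]
    vantage = rng.choice(points)
    # Order by distance to the vantage point and cut at the median, so the
    # inner ball and the outer shell hold roughly equal numbers of points.
    ordered = sorted(points, key=lambda p: math.dist(p, vantage))
    mid = len(ordered) // 2
    near, far = ordered[:mid], ordered[mid:]
    return (vp_partition(near, depth - 1, rng)
            + vp_partition(far, depth - 1, rng))

# Hypothetical demo: 1000 random 8-dimensional points into 8 partitions,
# each of which could then be indexed locally (for example with HNSW).
rng = random.Random(0)
data = [tuple(rng.random() for _ in range(8)) for _ in range(1000)]
parts = vp_partition(data, 3, rng)
```

Splitting at the median keeps partitions balanced in size, which matters when each partition is assigned to one process.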

Fang, S., Kennedy, S., Wang, C., Wang, B., Pei, Q., Liu, X.. 2020. Sparser: Secure Nearest Neighbor Search with Space-filling Curves. IEEE INFOCOM 2020 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :370–375.

Nearest neighbor search, a classic way of identifying similar data, can be applied to various areas, including databases, machine learning, natural language processing, software engineering, etc. Secure nearest neighbor search aims to find nearest neighbors to a given query point over encrypted data without accessing data in plaintext. It provides privacy protection to datasets when nearest neighbor queries need to be operated by an untrusted party (e.g., a public server). While different solutions have been proposed to support nearest neighbor queries on encrypted data, these existing solutions still encounter critical drawbacks either in efficiency or privacy. In light of the limitations in the current literature, we propose a novel approximate nearest neighbor search solution, referred to as Sparser, by leveraging a combination of space-filling curves, perturbation, and Order-Preserving Encryption. The advantages of Sparser are twofold: strengthening privacy and improving efficiency. Specifically, Sparser pre-processes plaintext data with space-filling curves and perturbation, such that data is sparse, which mitigates leakage abuse attacks and renders stronger privacy. In addition to privacy enhancement, Sparser can efficiently find approximate nearest neighbors over encrypted data in logarithmic time. Through extensive experiments over real-world datasets, we demonstrate that Sparser can achieve strong privacy protection under leakage abuse attacks and minimize search time.

Lei, X., Tu, G.-H., Liu, A. X., Xie, T.. 2020. Fast and Secure kNN Query Processing in Cloud Computing. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

Advances in sensing and tracking technology lead to the proliferation of location-based services. Location service providers (LSPs) often resort to commercial public clouds to store the tremendous geospatial data and process location-based queries from data users. To protect the privacy of the LSP's geospatial data and the data user's query location against the untrusted cloud, both must be encrypted before being sent to the cloud. Nevertheless, it is not easy to design a fast and secure location-based query processing scheme over the encrypted data. In this paper, we propose a Fast and Secure kNN (FSkNN) scheme to support secure k nearest neighbor (kNN) search in cloud computing. We reveal the inherent connection between an SkNN protocol and a secure range query protocol and further describe how to construct FSkNN based on a secure range query protocol. FSkNN leverages a customized accuracy-assured strategy to ensure the result accuracy and adopts a data structure named random Bloom filter (RBF) to build a secure index for efficient searching. We formally prove the security of FSkNN under the random oracle model. Our evaluation results show that FSkNN is highly practical.

Kornaropoulos, E. M., Papamanthou, C., Tamassia, R.. 2020. The State of the Uniform: Attacks on Encrypted Databases Beyond the Uniform Query Distribution. 2020 IEEE Symposium on Security and Privacy (SP). :1223–1240.

Recent foundational work on leakage-abuse attacks on encrypted databases has broadened our understanding of what an adversary can accomplish with a standard leakage profile. Nevertheless, all known value reconstruction attacks succeed under strong assumptions that may not hold in the real world. The most prevalent assumption is that queries are issued uniformly at random by the client. We present the first value reconstruction attacks that succeed without any knowledge about the query or data distribution. Our approach uses the search-pattern leakage, which exists in all known structured encryption schemes but has not been fully exploited so far. At the core of our method lies a support size estimator, a technique that utilizes the repetition of search tokens with the same response to estimate distances between encrypted values without any assumptions about the underlying distribution. We develop distribution-agnostic reconstruction attacks for both range queries and k-nearest-neighbor (k-NN) queries based on information extracted from the search-pattern leakage. Our new range attack follows a different algorithmic approach than state-of-the-art attacks, which are fine-tuned to succeed under the uniformly distributed queries. Instead, we reconstruct plaintext values under a variety of skewed query distributions and even outperform the accuracy of previous approaches under the uniform query distribution. Our new k-NN attack succeeds with far fewer samples than previous attacks and scales to much larger values of k. We demonstrate the effectiveness of our attacks by experimentally testing them on a wide range of query distributions and database densities, both unknown to the adversary.

Alzahrani, A., Feki, J.. 2020. Toward a Natural Language-Based Approach for the Specification of Decisional-Users Requirements. 2020 3rd International Conference on Computer Applications Information Security (ICCAIS). :1–6.

The number of organizations adopting Data Warehouse (DW) technology along with data analytics in order to improve the effectiveness of their decision-making processes is steadily increasing. Despite the efforts invested, DW design remains a challenging research domain. More precisely, the design quality of the DW depends on several aspects; among them, the requirement-gathering phase is a critical and complex task. In this context, we propose a natural language (NL) template-based design approach with two aims. Firstly, it facilitates the involvement of decision-makers in the early step of the DW design; indeed, using NL is a good and natural means to encourage the decision-makers to express their requirements as query-like English sentences. Secondly, our approach aims to generate a DW multidimensional schema from a set of gathered requirements (as OLAP: On-Line Analytical Processing queries, written according to the suggested NL templates). This approach is built around: (i) two NL-templates for specifying multidimensional components, and (ii) a set of five heuristic rules for extracting the multidimensional concepts from requirements. We are currently developing a software prototype that accepts the decision-makers' requirements and then automatically identifies the multidimensional components of the DW model.

2021-01-28

Kariyappa, S., Qureshi, M. K.. 2020. Defending Against Model Stealing Attacks With Adaptive Misinformation. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :767–775.

Deep Neural Networks (DNNs) are susceptible to model stealing attacks, which allow a data-limited adversary with no knowledge of the training dataset to clone the functionality of a target model, just by using black-box query access. Such attacks are typically carried out by querying the target model using inputs that are synthetically generated or sampled from a surrogate dataset to construct a labeled dataset. The adversary can use this labeled dataset to train a clone model, which achieves a classification accuracy comparable to that of the target model. We propose "Adaptive Misinformation" to defend against such model stealing attacks. We identify that all existing model stealing attacks invariably query the target model with Out-Of-Distribution (OOD) inputs. By selectively sending incorrect predictions for OOD queries, our defense substantially degrades the accuracy of the attacker's clone model (by up to 40%), while minimally impacting the accuracy (< 0.5%) for benign users. Compared to existing defenses, our defense has a significantly better security vs accuracy trade-off and incurs minimal computational overhead.
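The selective-response idea can be sketched as follows. The abstract does not say how OOD queries are detected or how the incorrect prediction is chosen, so this sketch assumes a maximum-probability threshold as the OOD test and returns the least likely class as the misleading answer; `serve_prediction` and `threshold` are hypothetical names.

```python
def serve_prediction(probs, threshold=0.9):
    """Return the label sent to the client for one query.

    probs: the model's class probabilities for the query.
    Assumption: a low top probability marks the query as out-of-distribution.
    """
    classes = range(len(probs))
    top = max(classes, key=probs.__getitem__)
    if probs[top] >= threshold:
        return top  # confident, treated as in-distribution: honest answer
    # Treated as OOD: answer with the least likely class instead.
    return min(classes, key=probs.__getitem__)

honest = serve_prediction([0.05, 0.93, 0.02])     # confident query
misleading = serve_prediction([0.40, 0.35, 0.25])  # uncertain query
```

Benign in-distribution queries mostly clear the threshold and are answered honestly, which is why such a defense can leave benign accuracy nearly unchanged.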

2021-01-25

Arthy, R., Daniel, E., Maran, T. G., Praveen, M.. 2020. A Hybrid Secure Keyword Search Scheme in Encrypted Graph for Social Media Database. 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). :1000–1004.

Privacy preservation is a challenging task given the huge amount of data available in social media. Data stored in a distributed or cloud environment must be kept confidential. In addition, representing the voluminous data as a graph makes it convenient to perform keyword search. The proposed work initially reads the data corresponding to social media and converts it into a graph. To protect the data from active attacks, the Advanced Encryption Standard algorithm is used to perform graph encryption. Later, the search operation is done using two algorithms: the kNK keyword search algorithm and the top-k nearest keyword search algorithm. The first scheme is used to fetch all the data corresponding to the keyword. The second scheme is used to fetch the nearest neighbor, which increases the efficiency of the search process. Here, a shortest path algorithm is used to find the minimum distance, and the results are produced based on the minimum value. The proposed algorithm shows high performance for graph generation and searching and moderate performance for graph encryption.

2021-01-18

Sun, J., Ma, J., Quan, J., Zhu, X., I, C.. 2019. A Fuzzy String Matching Scheme Resistant to Statistical Attack. 2019 International Conference on Networking and Network Applications (NaNA). :396–402.

A fuzzy query scheme based on a vector index uses a Bloom filter to construct the vector index for keywords. A statistical attack that exploits deviations in the frequency distribution of the vector index can then disclose sensitive information. S-BF, a fuzzy query scheme for encrypted databases that uses a noise vector to resist this statistical attack, is introduced. The noise vector removes the deviation in the frequency distribution of the vector index, which defeats statistical attacks on the index. Lab experiments demonstrate that the S-BF scheme achieves secure fuzzy queries with strong privacy protection for encrypted cloud databases without loss of fuzzy query efficiency.

Yadav, M. K., Gugal, D., Matkar, S., Waghmare, S.. 2019. Encrypted Keyword Search in Cloud Computing using Fuzzy Logic. 2019 1st International Conference on Innovations in Information and Communication Technology (ICIICT). :1–4.

Research and development and information management professionals routinely employ simple keyword searches or more complex Boolean queries when using databases such as PubMed and Ovid and search engines like Google to find the information they need. While it satisfies the basic needs of the researcher, basic search is limited, which can adversely affect both precision and recall, decreasing productivity and damaging the researchers' ability to discover new insights. The cloud service providers who store users' data may access sensitive information without proper authority. A basic approach to preserving data confidentiality is to encrypt the data. Data encryption also demands the protection of keyword privacy, since keywords usually contain vital information related to the files. Encrypting the keywords protects keyword privacy. Fuzzy keyword search enhances system usability by matching files exactly, or the nearest possible files, against the keywords entered by the user based on similar semantics. Encrypted keyword search in the cloud using this logic allows the user, on entering keywords, to receive the best possible files in a more secure manner while protecting the user's documents.

2021-01-11

Huang, K., Yang, T.. 2020. Additive and Subtractive Cuckoo Filters. 2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS). :1–10.

Bloom filters (BFs) are fast and space-efficient data structures used for set membership queries in many applications. BFs are required to satisfy three key requirements: low space cost, high-speed lookups, and fast updates. Prior works do not satisfy these requirements at the same time. The standard BF does not support deletions of items, and the variants that support deletions incur additional space or performance overhead. The state-of-the-art cuckoo filter (CF) has high performance with seemingly low space cost. However, the CF suffers from a critical issue of varying space cost per item. This is because the exclusive-OR (XOR) operation used by the CF requires the total number of buckets to be a power of two, leading to space inflation. To address the issue, in this paper we propose a scalable variant of the cuckoo filter called the additive and subtractive cuckoo filter (ASCF). We aim to improve the space efficiency while sustaining comparably high performance. The ASCF uses the addition and subtraction (ADD/SUB) operations instead of the XOR operation to compute an item's two candidate bucket indexes based on its fingerprint. Experimental results show that the ASCF achieves both low space cost and high performance. Compared to the CF, the ASCF reduces space cost per item by up to 1.9x while maintaining the same lookup and update throughput. In addition, the ASCF outperforms other filters in both space cost and performance.
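One way ADD/SUB indexing can avoid the power-of-two restriction is sketched below. The two-block layout is an assumption made here so that the mapping is invertible from a bucket index and a fingerprint alone; the abstract does not describe the paper's exact construction, and all names are hypothetical.

```python
import hashlib

def _hash(data: bytes, salt: bytes) -> int:
    return int.from_bytes(hashlib.sha256(salt + data).digest()[:8], "big")

def fingerprint(item: bytes, bits: int = 12) -> int:
    return _hash(item, b"fp") % (1 << bits)

def _offset(fp: int, n: int) -> int:
    return _hash(fp.to_bytes(2, "big"), b"off") % n

def candidate_buckets(item: bytes, n: int):
    """Return (fingerprint, bucket in block 0, bucket in block 1).

    The table holds 2 * n buckets; n can be any positive integer.
    """
    fp = fingerprint(item)
    i1 = _hash(item, b"idx") % n
    i2 = (i1 + _offset(fp, n)) % n  # ADD moves from block 0 to block 1
    return fp, i1, n + i2

def alternate_bucket(index: int, fp: int, n: int) -> int:
    """Find the other candidate bucket from an index and a stored fingerprint."""
    if index < n:  # currently in block 0: add the offset
        return n + (index + _offset(fp, n)) % n
    return (index - n - _offset(fp, n)) % n  # block 1: subtract it

fp, b0, b1 = candidate_buckets(b"example-item", 1000)  # 1000 is not 2**k
```

With XOR, `index ^ hash(fp)` only stays inside the table when the bucket count is a power of two; modular addition and subtraction have no such constraint.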

2020-12-11

Kumar, S., Vasthimal, D. K.. 2019. Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200–205.

Real-time discovery of all different types of unique attributes within unstructured data is a challenging problem to solve when dealing with multiple petabytes of unstructured data every day. Popular discovery solutions, such as creating offline jobs to uniquely identify attributes or running aggregation queries on raw data sets, limit real-time discovery use cases and often result in poor resource utilization. Discovery must be treated as a problem parallel to that of storing raw data sets efficiently in back-end big data systems. Solving the discovery problem by creating a parallel discovery data store infrastructure has multiple benefits, as it allows one to channel the actual search queries against the raw data set in a much more funneled manner instead of spreading them across the entire data set. Such focused search queries and data separation are far more performant and require less compute and a smaller memory footprint.

Zhang, W., Byna, S., Niu, C., Chen, Y.. 2019. Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83–92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, sparse arrays, etc.) have been developed as part of efforts to provide an efficient method for locating target data. However, how to choose an efficient indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation of metadata, metadata queries, and corresponding data structures. In this study, we perform a systematic study of the metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of the metadata in the dataset. We also study possible metadata queries based on the discovery of the metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings. These findings provide a guideline and offer insights for developing metadata indexing methodologies for scientific applications.
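The trade-off in the final recommendation can be illustrated with a small sketch: a trie answers prefix queries by walking one path and collecting the subtree, while a hash table (a Python `dict` here) answers only exact-match lookups directly. The attribute names and record IDs below are made up for illustration.

```python
class Trie:
    """Minimal trie mapping string keys to lists of record IDs."""

    def __init__(self):
        self.children = {}
        self.values = []

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.values.append(value)

    def with_prefix(self, prefix):
        """Return all values whose key starts with the given prefix."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [node]
        while stack:  # collect everything below the prefix node
            current = stack.pop()
            found.extend(current.values)
            stack.extend(current.children.values())
        return found

# Hypothetical metadata attributes mapped to record IDs.
records = [("ra_min", 1), ("ra_max", 2), ("dec_min", 3), ("run_id", 4)]
index, exact = Trie(), {}
for name, record_id in records:
    index.insert(name, record_id)
    exact[name] = record_id

prefix_hits = sorted(index.with_prefix("ra_"))  # prefix query: use the trie
exact_hit = exact["run_id"]                     # exact match: use the dict
```

A suffix query can reuse the same structure by inserting reversed keys and searching for the reversed suffix.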

2020-11-16

Zhang, C., Xu, C., Xu, J., Tang, Y., Choi, B.. 2019. GEM^2-Tree: A Gas-Efficient Structure for Authenticated Range Queries in Blockchain. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :842–853.

Blockchain technology has attracted much attention due to the great success of cryptocurrencies. Owing to its immutability property and consensus protocol, blockchain offers a new solution for trusted storage and computation services. To scale up the services, prior research has suggested a hybrid storage architecture, where only small meta-data are stored on-chain and the raw data are outsourced to off-chain storage. To protect data integrity, a cryptographic proof can be constructed online for queries over the data stored in the system. However, the previous schemes only support simple key-value queries. In this paper, we take the first step toward studying authenticated range queries in the hybrid-storage blockchain. The key challenge lies in how to design an authenticated data structure (ADS) that can be efficiently maintained by the blockchain, in which a unique gas cost model is employed. By analyzing the performance of the existing techniques, we propose a novel ADS, called GEM^2-tree, which is not only gas-efficient but also effective in supporting authenticated queries. To further reduce the ADS maintenance cost without sacrificing much query performance, we also propose an optimized structure, GEM^2*-tree, by designing a two-level index structure. Theoretical analysis and empirical evaluation validate the performance of the proposed ADSs.

Shen, N., Yeh, J., Chen, C., Chen, Y., Zhang, Y.. 2019. Ensuring Query Completeness in Outsourced Database Using Order-Preserving Encryption. 2019 IEEE Intl Conf on Parallel Distributed Processing with Applications, Big Data Cloud Computing, Sustainable Computing Communications, Social Computing Networking (ISPA/BDCloud/SocialCom/SustainCom). :776–783.

Database outsourcing has become the preferred option for business owners, who benefit from its flexibility, reliability, and low cost. However, because database service providers cannot always be fully trusted and data owners no longer have direct control over their own data, how to make the outsourced data secure has become a hot research topic. From the data integrity protection aspect, the client wants to make sure the data returned is correct, complete, and up-to-date. Previous research has put more effort into data correctness, while data completeness is still a challenging problem to solve. Some existing works have tried to protect the completeness of data. Unfortunately, these solutions are not considered to fully solve the problem because of their high communication or computation overhead. The implementations and limitations of existing works are further discussed in this paper. From the data confidentiality protection aspect, order-preserving encryption (OPE) is a widely used encryption scheme. It allows the client to perform range queries and some other operations such as GROUP BY and ORDER BY over the OPE encrypted data. Therefore, it is worthwhile to develop a solution that allows the user to verify query completeness for an OPE encrypted database so that both data confidentiality and completeness are protected. Inspired by this motivation, we propose a new data completeness protecting scheme that inserts fake tuples into databases. Both the real and fake tuples are OPE encrypted and thus the cloud server cannot distinguish between them. While our new scheme is much more efficient than all existing approaches, the level of security protection remains the same.

Roisum, H., Urizar, L., Yeh, J., Salisbury, K., Magette, M.. 2019. Completeness Integrity Protection for Outsourced Databases Using Semantic Fake Data. 2019 4th International Conference on Communication and Information Systems (ICCIS). :222–228.

As cloud storage and computing gain popularity, data entrusted to the cloud has the potential to be exposed to more people and is thus more vulnerable to attacks. It is important to develop mechanisms to protect data privacy and integrity so that clients can safely outsource their data to the cloud. We present a method for ensuring data completeness, which is one facet of the data integrity problem. Our approach converts a standard database to a Completeness Protected Database (CPDB) by inserting some semantic fake data before outsourcing it to the cloud. These fake data are initially produced using our generating function, which uses Order Preserving Encryption and allows the user to regenerate these fake data and match them to fake data returned from a range query to check for completeness. The CPDB is innovative in the following ways: (1) fake data is deterministically generated but is semantically indistinguishable from other existing data; (2) since fake data is generated by deterministic functions, data owners do not need to locally store the fake data that have been inserted; instead, they can re-generate fake data using the functions; (3) no costly data encryption/signature is used in our scheme, compared to previous work which encrypts/signs the entire database.
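The regenerate-and-match check can be sketched as follows. This simplified version works on plain integer keys and derives fake keys with HMAC; it leaves out Order Preserving Encryption and the semantic plausibility of fake data, both of which the paper relies on. All names and parameters are hypothetical.

```python
import hashlib
import hmac

def fake_keys(secret: bytes, count: int, domain: int) -> set:
    """Deterministically derive fake keys; the owner stores only the secret."""
    keys = set()
    for i in range(count):
        digest = hmac.new(secret, str(i).encode(), hashlib.sha256).digest()
        keys.add(int.from_bytes(digest[:8], "big") % domain)
    return keys

def range_query(table, lo, hi):
    """What an honest server returns for the range [lo, hi]."""
    return [key for key in table if lo <= key <= hi]

def is_complete(result, secret, count, domain, lo, hi) -> bool:
    """Regenerate the fake keys in range and check that all were returned."""
    expected = {k for k in fake_keys(secret, count, domain) if lo <= k <= hi}
    return expected.issubset(result)

# Hypothetical demo: real keys mixed with regenerable fake keys.
SECRET, COUNT, DOMAIN = b"owner-secret", 200, 10_000
real = set(range(0, DOMAIN, 7))
table = sorted(real | fake_keys(SECRET, COUNT, DOMAIN))
honest = range_query(table, 1_000, 9_000)
honest_ok = is_complete(honest, SECRET, COUNT, DOMAIN, 1_000, 9_000)
```

The check is probabilistic: a server that cannot tell real from fake tuples and silently drops part of a result is likely, but not certain, to drop a fake tuple and be detected.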

2020-10-26

Eryonucu, Cihan, Ayday, Erman, Zeydan, Engin. 2018. A Demonstration of Privacy-Preserving Aggregate Queries for Optimal Location Selection. 2018 IEEE 19th International Symposium on "A World of Wireless, Mobile and Multimedia Networks" (WoWMoM). :1–3.

In recent years, service providers, such as mobile operators providing wireless services, have collected enormous amounts of location data as mobile phone usage has increased. Vertical businesses, such as banks, may want to use this location information for their own scenarios. However, service providers cannot directly provide these private data to the vertical businesses because of privacy and legal issues. In this demo, we show how privacy-preserving solutions can be applied to such location-based queries without revealing each organization's sensitive data. In our demonstration, we use a partially homomorphic cryptosystem in our protocols and show the practicality and feasibility of our proposed solution.
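To show what a partially homomorphic cryptosystem offers for aggregate queries, here is a toy Paillier sketch in which two encrypted counts are added without decrypting either one. The abstract does not name the cryptosystem used in the demo, so Paillier is an assumption chosen as a well-known additively homomorphic scheme. The primes are far too small for real use, and the counts are made up.

```python
import math
import random

def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid because g = n + 1 (Python 3.8+)
    return n, (lam, mu)

def encrypt(n, message, rng):
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, message, n2) * pow(r, n, n2)) % n2

def decrypt(n, private_key, ciphertext):
    lam, mu = private_key
    u = pow(ciphertext, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

rng = random.Random(42)
n, private_key = keygen(293, 433)
# Hypothetical per-region customer counts held by the service provider.
c_region_a = encrypt(n, 120, rng)
c_region_b = encrypt(n, 345, rng)
total = decrypt(n, private_key, add_encrypted(n, c_region_a, c_region_b))
```

Only the holder of the private key learns the aggregate; the party that combines the ciphertexts sees neither the individual counts nor the sum.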

2020-09-28

Chen, Lvhao, Liao, Xiaofeng, Mu, Nankun, Wu, Jiahui, Junqing, Junqing. 2019. Privacy-Preserving Fuzzy Multi-Keyword Search for Multiple Data Owners in Cloud Computing. 2019 IEEE Symposium Series on Computational Intelligence (SSCI). :2166–2171.

With the development of cloud computing, more users decide to store information on the cloud server. Owing to the cloud server's insecurity, many documents should be encrypted before being sent to the cloud to avoid information leakage. Nevertheless, this leads to the problem that plaintext search techniques cannot be directly applied to ciphertext search. In this case, many searchable encryption schemes based on a single data owner model have been proposed. In practice, however, users want to search encrypted documents originating from various data owners. This paper puts forward a privacy-preserving scheme based on fuzzy multi-keyword search (PPFMKS) for multiple data owners. To support both fuzzy multi-keyword search and accurate search, secure indexes based on Locality-Sensitive Hashing (LSH) and Bloom Filter (BF) are established. To guarantee search privacy under the multiple data owners model, a new encryption method that allows different data owners to encrypt files with different keys is proposed. This method also addresses the high cost caused by inconvenient key management.
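
As a rough illustration of the index structure mentioned above, the sketch below stores keyword character bigrams in a Bloom filter and scores a possibly misspelled query by the fraction of its bigrams found. It is a plaintext simplification: the LSH step, the encryption, and the multi-owner key handling of the abstract's scheme are all omitted, and every name and parameter here is hypothetical.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter backed by a Python integer bitmask."""

    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size = size
        self.hashes = hashes
        self.bits = 0

    def _positions(self, item: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: str) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def bigrams(word: str) -> set[str]:
    """Character bigrams with boundary markers, e.g. '#n', 'ne', ..., 'k#'."""
    padded = f"#{word.lower()}#"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def build_index(keywords) -> BloomFilter:
    index = BloomFilter()
    for keyword in keywords:
        for gram in bigrams(keyword):
            index.add(gram)
    return index


def fuzzy_score(index: BloomFilter, query: str) -> float:
    """Fraction of the query's bigrams present in the index (0.0 to 1.0)."""
    grams = bigrams(query)
    return sum(gram in index for gram in grams) / len(grams)
```

An exact keyword always scores 1.0 because a Bloom filter has no false negatives, while a misspelling such as a transposed letter pair still shares most of its bigrams with the indexed keyword.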

Liu, Qin, Pei, Shuyu, Xie, Kang, Wu, Jie, Peng, Tao, Wang, Guojun. 2018. Achieving Secure and Effective Search Services in Cloud Computing. 2018 17th IEEE International Conference On Trust, Security And Privacy In Computing And Communications/ 12th IEEE International Conference On Big Data Science And Engineering (TrustCom/BigDataSE). :1386–1391.

One critical challenge of today's cloud services is how to provide an effective search service while preserving user privacy. In this paper, we propose a wildcard-based multi-keyword fuzzy search (WMFS) scheme over encrypted data, which tolerates keyword misspellings by exploiting the indecomposable property of primes. Compared with existing secure fuzzy search schemes, our WMFS scheme has the following merits: 1) Efficiency. It eliminates the requirement of a predefined dictionary and thus supports updates efficiently. 2) High accuracy. It eliminates the false positives and false negatives introduced by specific data structures and thus allows the user to retrieve files as accurately as possible. 3) Flexibility. It gives the user great flexibility to specify different search patterns, including keyword and substring matching. Extensive experiments on a real data set demonstrate the effectiveness and efficiency of our scheme.

Gao, Meng-Qi, Han, Jian-Min, Lu, Jian-Feng, Peng, Hao, Hu, Zhao-Long. 2018. Incentive Mechanism for User Collaboration on Trajectory Privacy Preservation. 2018 IEEE SmartWorld, Ubiquitous Intelligence Computing, Advanced Trusted Computing, Scalable Computing Communications, Cloud Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). :1976–1981.

The collaborative trajectory privacy preservation (CTPP) scheme is an effective method for continuous queries. However, collaborating with other users incurs a cost. Therefore, some rational and selfish users will not choose to collaborate, which results in the disclosure of users' privacy. To solve this problem, this paper proposes a collaboration incentive mechanism that rewards collaborative users and punishes non-collaborative users. The paper models the interactions of users participating in CTPP as a repeated game and analyzes the utility of the participating users. The analytical results show that CTPP with the proposed incentive mechanism can maximize users' payoffs. Experiments show that the proposed mechanism can effectively encourage collaboration among users and effectively preserve trajectory privacy for continuous query users.

2020-09-04

Song, Chengru, Xu, Changqiao, Yang, Shujie, Zhou, Zan, Gong, Changhui. 2019. A Black-Box Approach to Generate Adversarial Examples Against Deep Neural Networks for High Dimensional Input. 2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC). :473–479.

Generating adversarial samples is gathering much attention as an intuitive approach to evaluate the robustness of learning models. Extensive recent works have demonstrated that numerous advanced image classifiers are defenseless against adversarial perturbations in the white-box setting. However, the white-box setting assumes attackers to have prior knowledge of model parameters, which are generally inaccessible in real-world cases. In this paper, we concentrate on the hard-label black-box setting, where attackers can only pose queries to probe the model parameters responsible for classifying different images. Therefore, the issue is converted into minimizing a non-continuous function. A black-box approach is proposed to address both the massive number of queries and the non-continuous step function problem by applying a combination of a linear fine-grained search, Fibonacci search, and a zeroth-order optimization algorithm. However, the input dimension of an image is so high that the estimation of the gradient is noisy. Hence, we adopt a zeroth-order optimization method in high dimensions. The approach converts the calculation of the gradient into a linear regression model and extracts the dimensions that are more significant. Experimental results illustrate that our approach can reduce the number of queries and effectively accelerate convergence of the optimization method.
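
The core idea of zeroth-order optimization, estimating a gradient using only function queries, can be sketched as follows. This is a plain coordinate-wise central-difference estimator used as a simpler stand-in; it is not the regression-based estimator or the Fibonacci search described in the abstract, and the function name and step size are hypothetical.

```python
from typing import Callable, Sequence


def zeroth_order_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    delta: float = 1e-4,
) -> list[float]:
    """Estimate the gradient of f at x from queries to f alone.

    Each coordinate costs two queries, so the total is 2 * len(x).
    """
    grad = []
    for j in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[j] += delta
        backward[j] -= delta
        grad.append((f(forward) - f(backward)) / (2 * delta))
    return grad
```

The query count grows linearly with the input dimension, which is why high-dimensional inputs such as images motivate approaches that restrict the estimate to the more significant dimensions.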