Biblio
Shortest path queries on road networks are widely used in location-based services (LBS), e.g., finding the shortest route from my home to the airport through Google Maps. However, when a large number of path queries arrive concurrently or within a short period, an LBS provider (e.g., Google Maps) must endure a high workload, which may lead to long response times for users. Path caching services are therefore used to accelerate large-scale path query processing: they store historical path results and reuse them to answer incoming queries directly. However, most existing path caches are organized around the nodes of paths, so the underlying road network topology is still needed to answer a path query whose origin or destination lies on an edge. To overcome this limitation, we propose in this paper an edge-based shortest path cache that can efficiently handle queries without any road information, which is much more practical in the real world. We achieve this by designing a new edge-based path cache structure, an efficient R-tree-based cache lookup algorithm, and a greedy cache construction algorithm. Extensive experiments on a real road network and real point-of-interest datasets show the efficiency, scalability, and applicability of our proposed caching techniques.
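A minimal sketch of the edge-based idea, assuming paths are cached as sequences of edge IDs (the class and its simple inverted index are illustrative; the paper uses an R-tree for lookup): a query whose origin and destination lie on edges is answered from the cache alone, since any sub-path of a cached shortest path is itself shortest.

```python
# Illustrative edge-based path cache: paths are lists of edge IDs, and a
# query (origin_edge, dest_edge) hits the cache if some cached path
# contains both edges in travel order.

class EdgePathCache:
    def __init__(self):
        self.paths = []              # cached paths, each a list of edge IDs
        self.edge_to_paths = {}      # edge ID -> indices of paths containing it

    def add_path(self, edge_seq):
        idx = len(self.paths)
        self.paths.append(edge_seq)
        for e in edge_seq:
            self.edge_to_paths.setdefault(e, set()).add(idx)

    def lookup(self, origin_edge, dest_edge):
        """Return a cached sub-path from origin_edge to dest_edge, or None."""
        candidates = self.edge_to_paths.get(origin_edge, set()) & \
                     self.edge_to_paths.get(dest_edge, set())
        for idx in candidates:
            p = self.paths[idx]
            i, j = p.index(origin_edge), p.index(dest_edge)
            if i <= j:               # edges appear in travel order
                return p[i:j + 1]    # sub-path of a shortest path is shortest
        return None

cache = EdgePathCache()
cache.add_path(["e1", "e4", "e7", "e9"])
print(cache.lookup("e4", "e9"))      # ['e4', 'e7', 'e9']
```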
Stealing confidential information from a database has become a severe vulnerability issue for web applications. Such attacks can be prevented by defining a whitelist of the SQL queries a web application issues and detecting queries not in the list. For large-scale web applications, the whitelist must be generated automatically, because manually defining numerous query patterns is impractical for developers. Conventional methods for automated generation cannot detect attacks immediately because of the long time required to collect legitimate queries, and they require application-specific implementations that reduce their versatility. As described herein, we propose a method to generate a whitelist automatically from the queries issued during web application tests. Because the method is independent of specific applications, it improves both timeliness against attacks and versatility across multiple applications.
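A hedged sketch of the two-phase idea (the normalization rules and names are illustrative, not the paper's implementation): queries captured while the test suite runs are reduced to literal-free patterns, and at runtime any query whose pattern was never observed is treated as a potential attack.

```python
import re

def normalize(sql):
    """Replace string and numeric literals with placeholders."""
    sql = re.sub(r"'(?:[^']|'')*'", "?", sql)       # string literals
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)    # numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()

whitelist = set()

# Phase 1: collect patterns while the application test suite runs.
for q in ["SELECT * FROM users WHERE id = 42",
          "SELECT * FROM users WHERE name = 'alice'"]:
    whitelist.add(normalize(q))

# Phase 2: detect at runtime.
def is_legitimate(sql):
    return normalize(sql) in whitelist

print(is_legitimate("SELECT * FROM users WHERE id = 7"))           # True
print(is_legitimate("SELECT * FROM users WHERE id = 7 OR 1 = 1"))  # False
```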
Searchable Encryption (SE) schemes provide security and privacy for cloud data. Existing SE approaches enable multiple users to perform search operations using schemes such as Broadcast Encryption (BE) and Attribute-Based Encryption (ABE). However, these schemes do not allow multiple users to search over the encrypted data of multiple owners. Some SE schemes involve a Proxy Server (PS) that allows multiple users to perform the search operation, but these approaches place a huge computational burden on the PS due to the repeated encryption of user queries for transformation, which is needed to ensure that a user's query is searchable over the encrypted data of multiple owners. To eliminate this computational burden on the PS, this paper proposes a secure proxy server approach that performs the search operation without transforming user queries. The approach also returns the top-k documents most relevant to a user query using Euclidean distance similarity. An experimental study shows that the approach is efficient with respect to search time and accuracy.
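The top-k ranking step can be illustrated in isolation; this is a plaintext sketch only, since in the actual scheme the distance comparison is carried out over encrypted representations:

```python
import heapq, math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(query_vec, doc_vecs, k):
    """Return the k document IDs closest to the query vector."""
    return heapq.nsmallest(k, doc_vecs,
                           key=lambda d: euclidean(query_vec, doc_vecs[d]))

docs = {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.85, 0.2]}
print(top_k([1.0, 0.0], docs, 2))   # ['d1', 'd3']
```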
Outsourcing data to the cloud has become common IT practice due to its significant benefits, but security and privacy concerns remain critical obstacles to wider cloud adoption. Although data encryption can mitigate the problem, it reduces the functionality of query processing, e.g., disabling SQL queries. Several schemes have been proposed to enable one-dimensional queries on encrypted data, but multi-dimensional range queries have not been well addressed. In this paper, we propose a secure and scalable scheme that supports multi-dimensional range queries over encrypted data. The proposed scheme has three salient features: (1) Privacy: the server cannot learn the contents of queries or data records during query processing. (2) Efficiency: we utilize hierarchical cubes to encode multi-dimensional data records and construct a secure tree index on top of this encoding to achieve sublinear query time. (3) Verifiability: our scheme allows users to verify the correctness and completeness of the query results, addressing the server's potentially malicious behavior. We perform a formal security analysis and comprehensive experimental evaluations. The results on real datasets demonstrate that our scheme achieves practical performance while guaranteeing data privacy and result integrity.
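A toy, unencrypted sketch of the hierarchical-cube encoding (illustrative only; the real scheme runs over encrypted codes inside a secure tree index): each 2-D point is tagged with the ID of the cube containing it at every level of the hierarchy, a query rectangle is decomposed into a small set of fully contained cubes, and matching reduces to set intersection on cube IDs.

```python
LEVELS = 3           # level l splits each axis into 2**l cells
DOMAIN = 8           # integer coordinates in [0, DOMAIN)

def cube_ids(x, y):
    """IDs (level, cell_x, cell_y) of the cubes containing point (x, y)."""
    return {(l, x * 2**l // DOMAIN, y * 2**l // DOMAIN) for l in range(LEVELS + 1)}

def cover(qx0, qx1, qy0, qy1, l=0, cx=0, cy=0):
    """Decompose the rectangle [qx0,qx1) x [qy0,qy1) into contained cubes."""
    size = DOMAIN // 2**l
    x0, y0 = cx * size, cy * size
    if x0 + size <= qx0 or x0 >= qx1 or y0 + size <= qy0 or y0 >= qy1:
        return set()                                   # cube disjoint from query
    if qx0 <= x0 and x0 + size <= qx1 and qy0 <= y0 and y0 + size <= qy1:
        return {(l, cx, cy)}                           # cube fully inside query
    if l == LEVELS:                                    # unreachable for integer-
        return set()                                   # aligned query bounds
    return set().union(*(cover(qx0, qx1, qy0, qy1, l + 1, 2*cx + dx, 2*cy + dy)
                         for dx in (0, 1) for dy in (0, 1)))

# Point (3, 5) falls inside the level-2 cube covering [2,4) x [4,6):
print(bool(cube_ids(3, 5) & cover(2, 4, 4, 6)))        # True
print(bool(cube_ids(0, 0) & cover(2, 4, 4, 6)))        # False
```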
In recent years, real-world attacks against PKI have occurred frequently. For example, certificates for malicious domains issued by compromised CAs are widespread, and revoked certificates are still trusted by clients. Despite extensive research on improving the security of SSL/TLS connections, some problems remain unsolved. On one hand, although log-based schemes provide a certificate audit service to quickly detect CA misbehavior, the security and data consistency of the log servers themselves are ignored. On the other hand, revocation checking is neglected because certificate revocation mechanisms are incomplete, insecure, and inefficient; moreover, existing revocation checking schemes are centralized, which creates security bottlenecks. In this paper, we propose CertChain, a blockchain-based public and efficient audit scheme for TLS connections. Specifically, we propose a dependability-rank-based consensus protocol for our blockchain system and a new data structure to support forward traceability of certificates. Furthermore, we present a method that uses a dual counting bloom filter (DCBF) to eliminate false positives, achieving economical space usage and efficient queries for certificate revocation checking. Security analysis and experimental results demonstrate that CertChain is practical with moderate overhead.
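For background, a minimal counting Bloom filter in Python; the paper's DCBF pairs two such filters to eliminate false positives, a refinement omitted here, and the parameters below are illustrative.

```python
import hashlib

class CountingBloomFilter:
    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.counters = [0] * m

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, item):                 # e.g. a revoked certificate serial
        for p in self._positions(item):
            self.counters[p] += 1

    def remove(self, item):              # counters (unlike bits) allow deletion
        for p in self._positions(item):
            self.counters[p] -= 1

    def might_contain(self, item):
        return all(self.counters[p] > 0 for p in self._positions(item))

cbf = CountingBloomFilter()
cbf.add("serial:0xDEADBEEF")
print(cbf.might_contain("serial:0xDEADBEEF"))   # True
print(cbf.might_contain("serial:0x1234"))       # False (w.h.p.)
```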
Query authentication has been extensively studied to ensure the integrity of query results for outsourced databases, which are often not fully trusted. However, access control, another important security concern, is largely ignored by existing work. Notably, recent breakthroughs in cryptography have enabled fine-grained access control over outsourced data. In this paper, we take the first step toward authenticating relational queries with fine-grained access control. The key challenge is how to protect information confidentiality during query authentication, which is essential to many critical applications. To address this challenge, we propose a novel access-policy-preserving (APP) signature as the primitive authenticated data structure. A useful property of the APP signature is that it can be used to derive customized signatures for unauthorized users, proving inaccessibility while achieving zero-knowledge confidentiality. We also propose a grid-index-based tree structure that aggregates APP signatures for efficient range and join query authentication. In addition, several optimization techniques are proposed to further improve authentication performance. Security analysis and performance evaluation show that the proposed solutions and techniques are robust and efficient under various system settings.
SQL injection is a well-known method of executing SQL queries to retrieve sensitive information from a website's database. It poses a threat to poorly coded applications, and SQL injection was still ranked among the top 10 web vulnerabilities in 2018. To keep track of the vulnerabilities a website faces, we employ Acunetix, a tool that finds the vulnerabilities of a specific website and also suggests preventive measures. Using this setup, we discover vulnerabilities in an actual website. Such a real-world exercise would be useful for instructional use in a foundational cybersecurity course.
The number of Internet users increases day by day, and with it the demand for web services and for mobile and desktop web applications; the chances of a system being hacked increase as well. Web applications maintain data in a backend database from which results are retrieved, and because a web application can be accessed from anywhere in the world, it must be available to all of its users. SQL injection is nowadays one of the foremost threats to web application security: through it, attackers can steal confidential information. In this paper, we discuss a SQL injection attack detection method based on removing the parameter values of the SQL query, and we present our results.
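A hedged sketch of the idea described above (illustrative, not the paper's exact algorithm): strip the parameter values from the dynamically built query and from a reference query built with a harmless dummy value; if the two skeletons differ, the input has altered the query structure and is treated as an injection.

```python
import re

def skeleton(sql):
    sql = re.sub(r"'(?:[^']|'')*'", "", sql)     # drop string values
    sql = re.sub(r"\b\d+\b", "", sql)            # drop numeric values
    return re.sub(r"\s+", " ", sql).strip().lower()

def is_injection(template, user_input):
    actual = template.format(v=user_input)
    reference = template.format(v="x")           # benign dummy value
    return skeleton(actual) != skeleton(reference)

tpl = "SELECT * FROM accounts WHERE user = '{v}'"
print(is_injection(tpl, "alice"))                # False
print(is_injection(tpl, "' OR '1'='1"))          # True
```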
Motivated by the study of matrix elimination orderings in combinatorial scientific computing, we utilize graph sketching and local sampling to give a data structure that provides access to approximate fill degrees of a matrix undergoing elimination in polylogarithmic time per elimination and query. We then study the problem of using this data structure in the minimum degree algorithm, which is a widely-used heuristic for producing elimination orderings for sparse matrices by repeatedly eliminating the vertex with (approximate) minimum fill degree. This leads to a nearly-linear time algorithm for generating approximate greedy minimum degree orderings. Despite extensive studies of algorithms for elimination orderings in combinatorial scientific computing, our result is the first rigorous incorporation of randomized tools in this setting, as well as the first nearly-linear time algorithm for producing elimination orderings with provable approximation guarantees. While our sketching data structure readily works in the oblivious adversary model, by repeatedly querying and greedily updating itself, it enters the adaptive adversarial model where the underlying sketches become prone to failure due to dependency issues with their internal randomness. We show how to use an additional sampling procedure to circumvent this problem and to create an independent access sequence. Our technique for decorrelating interleaved queries and updates to this randomized data structure may be of independent interest.
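For context, the classical greedy heuristic that this work accelerates can be written in a few lines of (quadratic-time) Python; the paper's contribution is to approximate the degree bookkeeping below in polylogarithmic time per step via sketching, yielding a nearly-linear total. This is a plain exact version, not the paper's algorithm.

```python
def min_degree_ordering(adj):
    """adj: dict vertex -> set of neighbors (undirected). Returns an ordering."""
    adj = {v: set(ns) for v, ns in adj.items()}   # work on a copy
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # vertex of minimum degree
        nbrs = adj.pop(v)
        for u in nbrs:                            # eliminate v: clique its
            adj[u] |= nbrs - {u}                  # remaining neighbors (fill)
            adj[u].discard(v)
        order.append(v)
    return order

g = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2, 4}, 4: {2, 3, 5}, 5: {4}}
print(min_degree_ordering(g))                     # [5, 1, 2, 3, 4]
```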
Audit logs are widely used in information systems nowadays. In cloud computing and cloud storage environments, audit logs must be encrypted and outsourced to remote servers to protect the confidentiality of data and the privacy of users. Searchable encrypted audit logs support search over the logs without decrypting them. In this paper, we propose a privacy-preserving and unforgeable searchable encrypted audit log scheme based on PEKS. Only the trusted data owner can generate encrypted audit logs containing access permissions for users. The semi-honest server verifies the audit logs in a searchable-encryption manner before granting operation rights to users and storing the logs. The data owner can perform fine-grained conjunctive queries on the stored audit logs and accepts only valid ones. The scheme is immune to tampering or fabrication carried out through collusion between server and user. A concrete implementation of the scheme is presented in detail; its correctness is proved, and its security properties, including privacy preservation, searchability, verifiability, and unforgeability, are analyzed. Further evaluation of the computational load shows that the design is efficient.
As more and more technologies to store and analyze massive amounts of data become available, it is extremely important to de-identify privacy-sensitive data so that further analysis can be conducted by different parties. For example, data must go through a de-identification process before being transferred to institutes for further value-added analysis. Privacy protection issues associated with the release of data and with data mining have thus become a popular field of study in the domain of big data. As a strict and verifiable definition of privacy, differential privacy has attracted noteworthy attention and widespread research in recent years. Nevertheless, differential privacy is impractical for most applications because of the cost of generating synthetic datasets for data queries, and its definition of data protection through randomized noise is abstract to users. We therefore design a pragmatic DP-based system for data de-identification and for estimating the risk of data disclosure, in which a DP-based noise addition mechanism generates synthetic datasets whose disclosure risk can be evaluated before they are released to buyers/consumers.
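A minimal sketch of the DP building block referred to above, the Laplace mechanism for a count query (the data and epsilon value are illustrative): noise with scale sensitivity/epsilon is added to the true count before release, and a count query has sensitivity 1.

```python
import numpy as np

def dp_count(data, predicate, epsilon):
    """Differentially private count via the Laplace mechanism."""
    true_count = sum(1 for x in data if predicate(x))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = [23, 35, 41, 29, 52, 67, 31]
print(dp_count(ages, lambda a: a >= 40, epsilon=0.5))   # noisy count near 3
```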
Differential privacy is an approach that preserves patient privacy while permitting researchers access to medical data. This paper presents mechanisms that satisfy differential privacy while answering a given workload of range queries. Representing the input data as a vector of counts, these methods partition the vector according to relationships between the data and the ranges of the given queries. After partitioning the vector into buckets, the count of each bucket is estimated privately and split among the bucket's positions to answer the given query set. The performance of the proposed method was evaluated using different workloads over several attributes. The results show that partitioning the vector based on the data produces more accurate answers, while partitioning it based on the given workload improves privacy. This paper's two main contributions are: (1) improving earlier work on partitioning mechanisms by building a greedy algorithm that partitions the counts vector efficiently, and (2) an adaptive algorithm that considers the sensitivity of the given queries before producing results.
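A simplified sketch of the bucketing step (the paper's greedy and adaptive partitioning strategies are not reproduced; the bucket boundaries below are assumed given): Laplace noise is added to each bucket's total, and the noisy total is split uniformly across the bucket's positions.

```python
import numpy as np

def private_counts(counts, buckets, epsilon):
    """counts: vector of counts; buckets: list of (start, end) index ranges."""
    out = np.zeros(len(counts))
    for start, end in buckets:
        total = counts[start:end].sum()
        noisy = total + np.random.laplace(scale=1.0 / epsilon)
        out[start:end] = noisy / (end - start)    # uniform split over positions
    return out

counts = np.array([5, 7, 6, 0, 1, 0, 9, 8], dtype=float)
buckets = [(0, 3), (3, 6), (6, 8)]                # e.g. from a greedy partition
print(private_counts(counts, buckets, epsilon=1.0))
```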
The rapid development of the Internet has resulted in massive information overload. This information is usually represented by high-dimensional feature vectors in applications such as recognition, classification, and retrieval, which need efficient indexing and search methods for large-scale, high-dimensional databases, typically a challenging task. Some efforts have addressed this problem to an extent, but most are implemented on a single machine and so cannot handle large-scale databases. In this paper, we present a novel data index structure and nearest neighbor search algorithm implemented on Apache Spark. We impose a grid on the database and index data by non-empty grid cells. This grid-based index structure is simple and easy to implement in parallel. Moreover, we build a scalable KNN graph on the grid cells, which increases the efficiency of the index structure at low cost in a parallel implementation. Finally, experiments on both public and synthetic databases show that the proposed methods achieve high overall performance in both efficiency and accuracy.
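A single-machine sketch of the grid index described above (the paper shards this structure across Spark partitions and augments it with a KNN graph over cells, both omitted here): points hash to grid cells, and a query inspects its own cell plus adjacent cells, which is exact only when the true nearest neighbor lies within that neighborhood.

```python
from collections import defaultdict
import math

CELL = 1.0                                        # grid cell width (assumed)

def cell_of(p):
    return (int(math.floor(p[0] / CELL)), int(math.floor(p[1] / CELL)))

def build_index(points):
    index = defaultdict(list)
    for p in points:
        index[cell_of(p)].append(p)               # only non-empty cells stored
    return index

def nearest(index, q):
    cx, cy = cell_of(q)
    candidates = [p for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                  for p in index.get((cx + dx, cy + dy), [])]
    return min(candidates,
               key=lambda p: (p[0]-q[0])**2 + (p[1]-q[1])**2, default=None)

idx = build_index([(0.2, 0.3), (1.4, 1.1), (3.7, 2.9)])
print(nearest(idx, (1.0, 1.0)))                   # (1.4, 1.1)
```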
The growing volume of data and its increasing complexity require ever more efficient and faster information retrieval techniques. Approximate nearest neighbor search algorithms based on hashing have been proposed for querying high-dimensional datasets because of their high retrieval speed and low storage cost. Recent studies promote the use of Convolutional Neural Network (CNN) features with hashing techniques to improve search accuracy. However, challenges remain in finding a practical and efficient solution for indexing CNN features, such as the heavy training process needed to achieve accurate query results and the critical dependency on data parameters. In this work we run exhaustive experiments to compare recent methods that produce a better representation of the data space at lower computational cost and with better accuracy, computing the best data-parameter values for optimal sub-space projection by exploring the correlations among CNN feature attributes using fractal theory. We give an overview of these techniques and present our comparative experiments on data representation and retrieval performance.
With the advancement of sensor electronic devices, wireless sensor networks have attracted more and more attention. Range queries have become a significant part of sensor networks due to their availability and convenience. However, it is challenging to process range queries while protecting sensitive data from disclosure. Existing work focuses mainly on privacy-preserving range queries but neglects the damage done by collusion attacks, probability attacks, and differential attacks. In this paper, we propose PERQ, a privacy-preserving, energy-efficient, multi-dimensional range query protocol that not only achieves data privacy but also withstands collusion, probability, and differential attacks. Generalized distance-based and modular arithmetic range query mechanisms are used. In addition, a novel cyclic modular verification scheme is proposed to verify data integrity. Extensive theoretical analysis and experimental results confirm the high performance of PERQ in terms of energy efficiency, security, and accountability requirements.
The mitigation of insider threats against databases is a challenging problem, as insiders often have legitimate access privileges to sensitive data. Conventional security mechanisms, such as authentication and access control, may therefore be insufficient to protect databases against insider threats and need to be complemented with techniques that support real-time detection of access anomalies. Existing real-time anomaly detection techniques consider anomalies in references to database entities and in the amounts of accessed data, but they are unable to track access frequencies. According to recent security reports, an increase in an insider's access frequency is an indicator of potential data misuse and may result from malicious intent to steal or corrupt data. In this paper, we propose techniques for tracking users' access frequencies and detecting related anomalous activities in real time. We present detailed algorithms for constructing accurate profiles that describe the access patterns of database users and for matching these users' subsequent accesses against the profiles. Our methods report and log mismatches as anomalies that may need further investigation. We evaluated our techniques on the OLTP-Benchmark; the results indicate that our techniques are very effective in detecting anomalies.
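A hedged sketch of frequency-based detection (the profile representation and z-score threshold are illustrative assumptions, not the paper's algorithms): a profile stores each user's historical per-window access counts for a table, and a new window whose count deviates too far from the profile is flagged.

```python
from collections import defaultdict
import statistics

profiles = defaultdict(list)    # (user, table) -> past per-window access counts

def train(user, table, window_counts):
    profiles[(user, table)].extend(window_counts)

def is_anomalous(user, table, count, z_threshold=3.0):
    hist = profiles[(user, table)]
    mu = statistics.mean(hist)
    sigma = statistics.pstdev(hist) or 1.0        # avoid divide-by-zero
    return (count - mu) / sigma > z_threshold     # flag frequency spikes

train("alice", "payroll", [4, 6, 5, 7, 5, 6, 4, 5])
print(is_anomalous("alice", "payroll", 6))        # False: normal volume
print(is_anomalous("alice", "payroll", 40))       # True: frequency spike
```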