Biblio
One of the challenges in supplying the communities with wider access to scientific databases is the need for knowledge of database languages like Structured Query Language (SQL). Although the SQL language has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it might not be practical to make the public aware of the structure of databases. There is a need for novice users to query relational databases using their natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Through social media, which makes it possible to interact with a wide section of the population, and with the help of natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept to enable easy search and retrieval of data from several environmental data centers for the scientific community through social media.Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to applicable data results on the project's search catalog tailored to their key words. This automated identification of relevant data from various petascale archives at ORNL could increase convenience, access, and use of the project's data by the broader community. In this paper we discuss how some data-intensive projects at ORNL are using innovative ways to help in data discovery.
Stealing confidential information from a database has become a severe vulnerability issue for web applications. The attacks can be prevented by defining a whitelist of SQL queries issued by web applications and detecting queries not in list. For large-scale web applications, automated generation of the whitelist is conducted because manually defining numerous query patterns is impractical for developers. Conventional methods for automated generation are unable to detect attacks immediately because of the long time required for collecting legitimate queries. Moreover, they require application-specific implementations that reduce the versatility of the methods. As described herein, we propose a method to generate a whitelist automatically using queries issued during web application tests. Our proposed method uses the queries generated during application tests. It is independent of specific applications, which yields improved timeliness against attacks and versatility for multiple applications.
Data outsourcing to cloud has been a common IT practice nowadays due to its significant benefits. Meanwhile, security and privacy concerns are critical obstacles to hinder the further adoption of cloud. Although data encryption can mitigate the problem, it reduces the functionality of query processing, e.g., disabling SQL queries. Several schemes have been proposed to enable one-dimensional query on encrypted data, but multi-dimensional range query has not been well addressed. In this paper, we propose a secure and scalable scheme that can support multi-dimensional range queries over encrypted data. The proposed scheme has three salient features: (1) Privacy: the server cannot learn the contents of queries and data records during query processing. (2) Efficiency: we utilize hierarchical cubes to encode multi-dimensional data records and construct a secure tree index on top of such encoding to achieve sublinear query time. (3) Verifiability: our scheme allows users to verify the correctness and completeness of the query results to address server's malicious behaviors. We perform formal security analysis and comprehensive experimental evaluations. The results on real datasets demonstrate that our scheme achieves practical performance while guaranteeing data privacy and result integrity.
Web-Based applications are becoming more increasingly technically complex and sophisticated. The very nature of their feature-rich design and their capability to collate, process, and disseminate information over the Internet or from within an intranet makes them a popular target for attack. According to Open Web Application Security Project (OWASP) Top Ten Cheat sheet-2017, SQL Injection Attack is at peak among online attacks. This can be attributed primarily to lack of awareness on software security. Developing effective SQL injection detection approaches has been a challenge in spite of extensive research in this area. In this paper, we propose a signature based SQL injection attack detection framework by integrating fingerprinting method and Pattern Matching to distinguish genuine SQL queries from malicious queries. Our framework monitors SQL queries to the database and compares them against a dataset of signatures from known SQL injection attacks. If the fingerprint method cannot determine the legitimacy of query alone, then the Aho Corasick algorithm is invoked to ascertain whether attack signatures appear in the queries. The initial experimental results of our framework indicate the approach can identify wide variety of SQL injection attacks with negligible impact on performance.
Aiming at the problem of internal attackers of database system, anomaly detection method of user behaviour is used to detect the internal attackers of database system. With using Discrete-time Markov Chains (DTMC), an anomaly detection system of user behavior is proposed, which can detect the internal threats of database system. First, we make an analysis on SQL queries, which are user behavior features. Then, we use DTMC model extract behavior features of a normal user and the detected user and make a comparison between them. If the deviation of features is beyond threshold, the detected user behavior is judged as an anomaly behavior. The experiments are used to test the feasibility of the detction system. The experimental results show that this detction system can detect normal and abnormal user behavior precisely and effectively.