Biblio | CPS-VO

Devarakonda, Ranjeet, Giansiracusa, Michael, Kumar, Jitendra. 2018. Machine Learning and Social Media to Mine and Disseminate Big Scientific Data. 2018 IEEE International Conference on Big Data (Big Data). :5312–5315.

One challenge in giving communities wider access to scientific databases is the need to know database languages such as Structured Query Language (SQL). Although SQL has been published in many forms, not everybody is able to write SQL queries. Another challenge is that it might not be practical to make the public aware of the structure of the databases. Novice users therefore need a way to query relational databases in natural language. To solve this problem, many natural language interfaces to structured databases have been developed. The goal is to provide a more intuitive method for generating database queries and delivering responses. Using social media, which makes it possible to interact with a wide section of the population, together with natural language processing, researchers at the Atmospheric Radiation Measurement (ARM) Data Center at Oak Ridge National Laboratory (ORNL) have developed a concept that enables easy search and retrieval of data from several environmental data centers for the scientific community. Using a machine learning framework that maps natural language text to thousands of datasets, instruments, variables, and data streams, the prototype system would allow users to request data through Twitter and receive a link (via tweet) to applicable data results on the project's search catalog, tailored to their keywords. This automated identification of relevant data from various petascale archives at ORNL could increase convenience, access, and use of the project's data by the broader community. In this paper we discuss how some data-intensive projects at ORNL are using innovative approaches to support data discovery.

Kim, J., Moon, I., Lee, K., Suh, S. C., Kim, I. 2015. Scalable Security Event Aggregation for Situation Analysis. 2015 IEEE First International Conference on Big Data Computing Service and Applications. :14–23.

Cyber-attacks have evolved to become more sophisticated, combining attack methodologies for greater impact. For instance, Advanced Persistent Threats (APTs) employ a set of stealthy hacking processes running over a long period of time, making them much harder to detect. With this trend, big-data security analytics has gained greater attention, since identifying such recent attacks requires large-scale data processing and analysis. In this paper, we present SEAS-MR (Security Event Aggregation System over MapReduce), which facilitates scalable security event aggregation for comprehensive situation analysis. The system provides three core functions: (i) periodic aggregation, (ii) on-demand aggregation, and (iii) query support for effective analysis. We describe our design and implementation of the system over MapReduce and high-level query languages, and report experimental results collected across extensive settings on a Hadoop cluster to evaluate performance and design impacts.
