Visible to the public Biblio

Filters: Keyword is unstructured data  [Clear All Filters]
2020-12-28
Riaz, S., Khan, A. H., Haroon, M., Latif, S., Bhatti, S..  2020.  Big Data Security and Privacy: Current Challenges and Future Research perspective in Cloud Environment. 2020 International Conference on Information Management and Technology (ICIMTech). :977—982.

Cloud computing is an Internet-based technology that emerging rapidly in the last few years due to popular and demanded services required by various institutions, organizations, and individuals. structured, unstructured, semistructured data is transfer at a record pace on to the cloud server. These institutions, businesses, and organizations are shifting more and more increasing workloads on cloud server, due to high cost, space and maintenance issues from big data, cloud computing will become a potential choice for the storage of data. In Cloud Environment, It is obvious that data is not secure completely yet from inside and outside attacks and intrusions because cloud servers are under the control of a third party. The Security of data becomes an important aspect due to the storage of sensitive data in a cloud environment. In this paper, we give an overview of characteristics and state of art of big data and data security & privacy top threats, open issues and current challenges and their impact on business are discussed for future research perspective and review & analysis of previous and recent frameworks and architectures for data security that are continuously established against threats to enhance how to keep and store data in the cloud environment.

2020-08-13
Wang, Tianyi, Chow, Kam Pui.  2019.  Automatic Tagging of Cyber Threat Intelligence Unstructured Data using Semantics Extraction. 2019 IEEE International Conference on Intelligence and Security Informatics (ISI). :197—199.
Threat intelligence, information about potential or current attacks to an organization, is an important component in cyber security territory. As new threats consecutively occurring, cyber security professionals always keep an eye on the latest threat intelligence in order to continuously lower the security risks for their organizations. Cyber threat intelligence is usually conveyed by structured data like CVE entities and unstructured data like articles and reports. Structured data are always under certain patterns that can be easily analyzed, while unstructured data have more difficulties to find fixed patterns to analyze. There exists plenty of methods and algorithms on information extraction from structured data, but no current work is complete or suitable for semantics extraction upon unstructured cyber threat intelligence data. In this paper, we introduce an idea of automatic tagging applying JAPE feature within GATE framework to perform semantics extraction upon cyber threat intelligence unstructured data such as articles and reports. We extract token entities from each cyber threat intelligence article or report and evaluate the usefulness of them. A threat intelligence ontology then can be constructed with the useful entities extracted from related resources and provide convenience for professionals to find latest useful threat intelligence they need.
2015-05-05
Baughman, A.K., Chuang, W., Dixon, K.R., Benz, Z., Basilico, J..  2014.  DeepQA Jeopardy! Gamification: A Machine-Learning Perspective. Computational Intelligence and AI in Games, IEEE Transactions on. 6:55-66.

DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models, trained through machine learning. After the 2011 historic milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. As opposed to the original Watson Jeopardy!, whose class distribution of positive-to-negative labels is 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100 000. Such initial extreme class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contribute toward the challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications to increase the computed true labels. In addition, algorithm engineering, such as an implementation of the Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable to overcome the challenges in this first Watson gamification for real-world business problems.