Biblio

Found 350 results

Filters: Keyword is data mining
2017-03-07
Isah, H., Neagu, D., Trundle, P..  2015.  Bipartite network model for inferring hidden ties in crime data. 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :994–1001.

Certain crimes are difficult for individuals to commit alone; instead they are carefully organised by groups of associates and affiliates loosely connected to each other, with a single individual or a small group coordinating the overall actions. A common starting point in understanding the structural organisation of criminal groups is to identify the criminals and their associates. In many criminal datasets, however, there is no direct connection among the criminals. In this paper, we investigate ties and community structure in crime data in order to understand the operations of both traditional and cyber criminals, as well as to predict the existence of organised criminal networks. Our contributions are twofold: we propose a bipartite network model for inferring hidden ties between actors who initiated an illegal interaction and the objects affected by that interaction, and we then validate the method in two case studies on pharmaceutical crime and underground forum data using standard network algorithms for structural and community analysis. The vertex-level metrics and community analysis results indicate the significance of our work in understanding the operations and structure of organised criminal networks that were not immediately obvious in the data. Identifying these groups and mapping their relationships to one another is essential for devising more effective disruption strategies in the future.
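
As a rough illustration of the general idea described in this abstract (not the authors' implementation), the sketch below builds a small hypothetical bipartite actor-object network with networkx, projects it onto the actor set so that actors sharing an object acquire an inferred tie, and reports simple vertex-level metrics and communities. All node names and edges are made up.

```python
# Minimal sketch: inferring hidden actor-actor ties from a bipartite
# actor-object network (hypothetical data, illustrative only).
import networkx as nx
from networkx.algorithms import bipartite, community

B = nx.Graph()
actors = ["actor_A", "actor_B", "actor_C", "actor_D"]
objects = ["drug_X", "drug_Y", "forum_post_1", "forum_post_2"]
B.add_nodes_from(actors, bipartite=0)
B.add_nodes_from(objects, bipartite=1)
B.add_edges_from([
    ("actor_A", "drug_X"), ("actor_B", "drug_X"),            # shared object -> inferred tie
    ("actor_B", "forum_post_1"), ("actor_C", "forum_post_1"),
    ("actor_D", "drug_Y"),
    ("actor_D", "forum_post_2"), ("actor_A", "forum_post_2"),
])

# Project onto the actor set: actors become linked when they touch the same object.
inferred = bipartite.weighted_projected_graph(B, actors)

print("Inferred ties:", list(inferred.edges(data=True)))
print("Degree centrality:", nx.degree_centrality(inferred))
print("Communities:", [sorted(c) for c in community.greedy_modularity_communities(inferred)])
```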

2017-02-14
D. L. Schales, X. Hu, J. Jang, R. Sailer, M. P. Stoecklin, T. Wang.  2015.  "FCCE: Highly scalable distributed Feature Collection and Correlation Engine for low latency big data analytics". 2015 IEEE 31st International Conference on Data Engineering. :1316-1327.

In this paper, we present the design, architecture, and implementation of a novel analysis engine, called Feature Collection and Correlation Engine (FCCE), that finds correlations across a diverse set of data types spanning over large time windows with very small latency and with minimal access to raw data. FCCE scales well to collecting, extracting, and querying features from geographically distributed large data sets. FCCE has been deployed in a large production network with over 450,000 workstations for 3 years, ingesting more than 2 billion events per day and providing low latency query responses for various analytics. We explore two security analytics use cases to demonstrate how we utilize the deployment of FCCE on large diverse data sets in the cyber security domain: 1) detecting fluxing domain names of potential botnet activity and identifying all the devices in the production network querying these names, and 2) detecting advanced persistent threat infection. Both evaluation results and our experience with real-world applications show that FCCE yields superior performance over existing approaches, and excels in the challenging cyber security domain by correlating multiple features and deriving security intelligence.

2017-02-23
A. Soliman, L. Bahri, B. Carminati, E. Ferrari, S. Girdzijauskas.  2015.  "DIVa: Decentralized identity validation for social networks". 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :383-391.

Online Social Networks exploit a lightweight process to identify their users so as to facilitate their fast adoption. However, such convenience comes at the price of exposing legitimate users to various threats created by fake accounts. Therefore, there is a crucial need to empower users with tools that help them assign a level of trust to whomever they interact with. To cope with this issue, in this paper we introduce a novel model, DIVa, that leverages mining techniques to find correlations among user profile attributes. These correlations are discovered not from the user population as a whole, but from individual communities, where the correlations are more pronounced. DIVa exploits a decentralized learning approach and ensures privacy preservation, as each node in the OSN independently processes its local data and is required to know only its direct neighbors. Extensive experiments using real-world OSN datasets show that DIVa is able to extract fine-grained, community-aware correlations among profile attributes with average improvements of up to 50% over the global approach.
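
The contrast between global and community-aware correlations can be illustrated with a tiny, generic rule-confidence computation (a stand-in for DIVa's mining, not the paper's algorithm). The profile rows, community labels, and the example rule below are invented.

```python
# Sketch: confidence of a profile-attribute rule, computed globally vs. per community
# (hypothetical data; illustrative only).

profiles = [  # (community, city, employer)
    ("c1", "Stockholm", "KTH"), ("c1", "Stockholm", "KTH"), ("c1", "Stockholm", "Ericsson"),
    ("c2", "Milan", "UniMi"),   ("c2", "Milan", "UniMi"),   ("c2", "Rome", "UniMi"),
]

def confidence(rows):
    """Confidence of the rule city=Stockholm -> employer=KTH on the given rows."""
    antecedent = [r for r in rows if r[1] == "Stockholm"]
    if not antecedent:
        return 0.0
    return sum(1 for r in antecedent if r[2] == "KTH") / len(antecedent)

print("global confidence:", confidence(profiles))
for comm in sorted({p[0] for p in profiles}):
    rows = [p for p in profiles if p[0] == comm]
    print(f"community {comm} confidence:", confidence(rows))
```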

2017-03-07
Macdonald, M., Frank, R., Mei, J., Monk, B..  2015.  Identifying digital threats in a hacker web forum. 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). :926–933.

Information threatening the security of critical infrastructures is exchanged over the Internet through communication platforms, such as online discussion forums. This information can be used by malicious hackers to attack critical computer networks and data systems. Much of the literature on the hacking of critical infrastructure has focused on developing typologies of cyber-attacks, but has not examined the communication activities of the actors involved. To address this gap in the literature, the language of hackers was analyzed to identify potential threats against critical infrastructures using automated analysis tools. First, discussion posts were collected from a selected hacker forum using a customized web crawler. Posts were analyzed using a part-of-speech tagger, which helped determine a list of keywords used to query the data. Next, a sentiment analysis tool scored these keywords, which were then analyzed to determine the effectiveness of this method.
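
A rough sketch of that kind of pipeline (not the authors' tooling): part-of-speech tagging pulls candidate keywords from a forum post, and VADER then scores the sentences that contain them. The sample post is invented, and the NLTK resources must be downloaded first.

```python
# Sketch: keyword extraction via POS tagging, then sentiment scoring (illustrative only).
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

for pkg in ("punkt", "averaged_perceptron_tagger", "vader_lexicon"):
    nltk.download(pkg, quiet=True)

post = "The SCADA login page is trivially exploitable. I hate how easy it is to dump the database."

tagged = nltk.pos_tag(nltk.word_tokenize(post))
keywords = [w for w, tag in tagged if tag.startswith("NN")]   # keep nouns as candidate keywords
print("candidate keywords:", keywords)

sia = SentimentIntensityAnalyzer()
for sentence in nltk.sent_tokenize(post):
    print(sentence, "->", sia.polarity_scores(sentence))
```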

2017-02-27
Li-xiong, Z., Xiao-lin, X., Jia, L., Lu, Z., Xuan-chen, P., Zhi-yuan, M., Li-hong, Z..  2015.  Malicious URL prediction based on community detection. 2015 International Conference on Cyber Security of Smart Cities, Industrial Control System and Communications (SSIC). :1–7.

Traditional anti-virus technology is primarily based on static analysis and dynamic monitoring. However, both techniques depend heavily on application files, which increases the risk of attack and wastes time and network bandwidth. In this study, we propose a new graph-based method through which we can preliminarily detect malicious URLs without the application file. First, relationships between URLs are found through the relationships between people and URLs. Then association rules are mined, with a confidence value for each frequent URL. Next, a network of URLs is built from the association rules. Once the network of URLs is built, we cluster the data using modularity to detect communities, where each community represents a different type of URL. We assume that if a URL is associated with one of these communities, the URL is probably malicious. In our experiments we successfully captured 82% of malicious samples, a higher capture rate than that of traditional methods.
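
A minimal sketch of the general pipeline the abstract describes, with invented browsing data: co-visitation counts stand in for the people-URL relationship, edges are kept when rule confidence clears a threshold, and networkx modularity communities group the URLs. Not the authors' implementation; the threshold and known-malicious seed are assumptions.

```python
# Sketch: URL association graph from user visits + modularity communities (illustrative only).
from itertools import combinations
from collections import Counter
import networkx as nx
from networkx.algorithms import community

visits = {  # hypothetical user -> set of visited URLs
    "u1": {"a.com", "evil1.biz", "evil2.biz"},
    "u2": {"evil1.biz", "evil2.biz"},
    "u3": {"a.com", "b.org"},
    "u4": {"evil2.biz", "evil3.biz", "evil1.biz"},
}

url_count = Counter(u for urls in visits.values() for u in urls)
pair_count = Counter()
for urls in visits.values():
    for u, v in combinations(sorted(urls), 2):
        pair_count[(u, v)] += 1

G = nx.Graph()
for (u, v), n in pair_count.items():
    conf = max(n / url_count[u], n / url_count[v])   # rule confidence in either direction
    if conf >= 0.5:
        G.add_edge(u, v, weight=conf)

known_malicious = {"evil1.biz"}
for comm in community.greedy_modularity_communities(G):
    suspicious = bool(comm & known_malicious)
    print(sorted(comm), "-> probably malicious" if suspicious else "-> no malicious association")
```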

2017-03-07
Kumar, B., Kumar, P., Mundra, A., Kabra, S..  2015.  DC scanner: Detecting phishing attack. 2015 Third International Conference on Image Information Processing (ICIIP). :271–276.

Data mining has been used as a technology in various applications of engineering, the sciences, and other fields to analyse data from systems and to solve problems. Its applications further extend to detecting cyber-attacks. We present our work, a simple and low-effort approach similar to data mining, which detects email-based phishing attacks. This work digs into the HTML content of emails and the web pages they refer to. The domains and domain-authority details of these links, as well as the script code associated with the web pages, are also analyzed to conclude on the probability of a phishing attack.
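
As a loose illustration of one signal such a system might inspect (not the paper's actual checks), the sketch below parses an email's HTML with the standard library and flags anchors whose visible text names a different domain than the href actually points to, a common phishing trick. The sample HTML is made up.

```python
# Sketch: flag anchors whose visible text and href point to different domains (illustrative only).
from html.parser import HTMLParser
from urllib.parse import urlparse

class AnchorCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.anchors, self._href = [], None
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href", "")
    def handle_data(self, data):
        if self._href is not None:          # first text chunk inside the anchor
            self.anchors.append((self._href, data.strip()))
            self._href = None

email_html = '<p>Please verify: <a href="http://evil.example.ru/login">www.mybank.com</a></p>'

parser = AnchorCollector()
parser.feed(email_html)
for href, text in parser.anchors:
    href_domain = urlparse(href).netloc
    if text and "." in text and not text.strip("/").endswith(href_domain):
        print(f"suspicious link: text says '{text}' but points to '{href_domain}'")
```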

Poornachandran, P., Sreeram, R., Krishnan, M. R., Pal, S., Sankar, A. U. P., Ashok, A..  2015.  Internet of Vulnerable Things (IoVT): Detecting Vulnerable SOHO Routers. 2015 International Conference on Information Technology (ICIT). :119–123.

There has been a rampant surge in the compromise of consumer-grade, small-scale routers in the last couple of years. Attackers are able to manipulate the Domain Name System (DNS) settings of these devices, hence making them capable of initiating different man-in-the-middle attacks. In this study we aim to explore and comprehend the current state of these attacks. Focusing on the Indian Autonomous System Number (ASN) space, we performed scans over 3 months to successfully find vulnerable routers and extracted the DNS information from these vulnerable routers. In this paper we present the methodology followed for scanning, a detailed analysis report of the information we were able to collect, and an insight into the current trends in the attack patterns. We conclude by proposing recommendations for mitigating these attacks.

2017-02-23
P. Jain, S. Nandanwar.  2015.  "Securing the Clustered Database Using Data Modification Technique". 2015 International Conference on Computational Intelligence and Communication Networks (CICN). :1163-1166.

In the new era of information and communication technology (ICT), everyone wants to store or share their data and information in online media, such as cloud databases, mobile databases, grid databases, drives, etc. When data is stored in online media, the main problem that arises is data privacy, because different types of hackers, attackers, or crackers want to disclose that private information publicly. Security is a continuous process of protecting data or information from attacks. To secure this information from such unauthorized people, we propose and implement a technique based on the data modification concept, using the Iris database in the Weka tool. This paper thus provides high privacy in distributed clustered database environments.

A. Rahmani, A. Amine, M. R. Hamou.  2015.  "De-identification of Textual Data Using Immune System for Privacy Preserving in Big Data". 2015 IEEE International Conference on Computational Intelligence Communication Technology. :112-116.

With the growing observed success of big data use, many challenges have appeared. Timeliness, scalability and privacy are the main problems that researchers attempt to figure out. Privacy preserving is now a highly active domain of research, and many works and concepts have seen the light within this theme. One of these concepts is de-identification techniques. De-identification is a specific area that consists of finding and removing sensitive information, either by replacing it, encrypting it or adding noise to it, using several techniques such as cryptography and data mining. In this report, we present a new model of de-identification of textual data using a specific immune system algorithm known as CLONALG.

2017-02-14
A. Oprea, Z. Li, T. F. Yen, S. H. Chin, S. Alrwais.  2015.  "Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data". 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks. :45-56.

Recent years have seen the rise of sophisticated attacks, including advanced persistent threats (APT), which pose severe risks to organizations and governments. Additionally, new malware strains appear at a higher rate than ever before. Since many of these malware strains evade existing security products, traditional defenses deployed by enterprises today often fail at detecting infections at an early stage. We address the problem of detecting early-stage APT infection by proposing a new framework based on belief propagation inspired by graph theory. We demonstrate that our techniques perform well on two large datasets. We achieve high accuracy on two months of DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of web proxy logs collected at the border of a large enterprise and identify hundreds of malicious domains overlooked by state-of-the-art security products.
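
To give a feel for the guilt-by-association idea behind such graph-based scoring (this is a toy averaging scheme, not the paper's belief-propagation framework), the sketch below spreads suspicion from a seed domain across a hypothetical host-domain contact graph built from log entries.

```python
# Toy guilt-by-association score propagation over a host-domain contact graph
# (a simplified stand-in for belief propagation; data is invented).
edges = [  # (internal host, external domain) pairs seen in DNS/proxy logs
    ("host1", "update.evil-cnc.biz"), ("host1", "cdn.example.com"),
    ("host2", "update.evil-cnc.biz"), ("host2", "rare-dropper.info"),
    ("host3", "cdn.example.com"),
]
seeds = {"update.evil-cnc.biz": 1.0}            # domains already known to be malicious

score = {n: seeds.get(n, 0.0) for e in edges for n in e}
neighbors = {n: set() for n in score}
for h, d in edges:
    neighbors[h].add(d)
    neighbors[d].add(h)

for _ in range(5):                              # a few rounds of neighbor averaging
    new = {}
    for n in score:
        if n in seeds:                          # keep seed beliefs fixed
            new[n] = 1.0
        else:
            new[n] = 0.5 * sum(score[m] for m in neighbors[n]) / len(neighbors[n])
    score = new

for n, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(f"{n:22s} suspicion={s:.2f}")
```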

J. Vukalović, D. Delija.  2015.  "Advanced Persistent Threats - detection and defense". 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). :1324-1330.

The term “Advanced Persistent Threat” refers to a well-organized, malicious group of people who launch stealthy attacks against computer systems of specific targets, such as governments, companies or the military. The attacks themselves are long-lasting, difficult to expose and often use very advanced hacking techniques. Since they are advanced in nature, prolonged and persistent, the organizations behind them have to possess a high level of knowledge, advanced tools and competent personnel to execute them. The attacks are usually performed in several phases - reconnaissance, preparation, execution, gaining access, information gathering and connection maintenance. In each of the phases attacks can be detected with different probabilities. There are several ways to increase the level of security of an organization in order to counter these incidents. First and foremost, it is necessary to educate users and system administrators on different attack vectors and provide them with knowledge and protection so that the attacks are unsuccessful. Second, implement strict security policies. That includes access control and restrictions (to information or the network), protecting information by encrypting it and installing the latest security upgrades. Finally, it is possible to use software IDS tools to detect such anomalies (e.g. Snort, OSSEC, Sguil).

F. Quader, V. Janeja, J. Stauffer.  2015.  "Persistent threat pattern discovery". 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). :179-181.

Advanced Persistent Threat (APT) is a complex (Advanced) cyber-attack (Threat) against specific targets over long periods of time (Persistent) carried out by nation states or terrorist groups with highly sophisticated levels of expertise to establish entries into organizations, which are critical to a country's socio-economic status. The key identifier in such persistent threats is that patterns are long term, could be high priority, and occur consistently over a period of time. This paper focuses on identifying persistent threat patterns in network data, particularly data collected from Intrusion Detection Systems. We utilize Association Rule Mining (ARM) to detect persistent threat patterns on network data. We identify potential persistent threat patterns, which are frequent but at the same time unusual as compared with the other frequent patterns.
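
A toy version of the "frequent yet persistent" idea (not the paper's ARM configuration): count support for (source, destination, signature) patterns in IDS records and keep those that are both frequent and recur across many time windows. The alert records below are invented.

```python
# Sketch: flag alert patterns that are frequent AND recur across time windows
# (a simplified stand-in for association-rule mining on IDS data; data is invented).
from collections import Counter, defaultdict

alerts = [  # (day, source, destination, signature)
    (1, "10.0.0.5", "172.16.0.9", "SSH brute force"),
    (1, "10.0.0.7", "172.16.0.2", "Port scan"),
    (2, "10.0.0.5", "172.16.0.9", "SSH brute force"),
    (3, "10.0.0.5", "172.16.0.9", "SSH brute force"),
    (3, "10.0.0.8", "172.16.0.3", "Port scan"),
]

support = Counter((s, d, sig) for _, s, d, sig in alerts)
days_seen = defaultdict(set)
for day, s, d, sig in alerts:
    days_seen[(s, d, sig)].add(day)

MIN_SUPPORT, MIN_DAYS = 2, 2
for pattern, count in support.items():
    if count >= MIN_SUPPORT and len(days_seen[pattern]) >= MIN_DAYS:
        print("persistent pattern:", pattern, f"support={count}, days={sorted(days_seen[pattern])}")
```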

2017-02-23
G. D'Angelo, S. Rampone, F. Palmieri.  2015.  "An Artificial Intelligence-Based Trust Model for Pervasive Computing". 2015 10th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC). :701-706.

Pervasive Computing is one of the latest and most advanced paradigms currently available in the computing arena. Its ability to provide the distribution of computational services within environments where people live, work or socialize makes issues such as privacy, trust and identity more challenging compared to traditional computing environments. In this work we review these general issues and propose a Pervasive Computing architecture based on a simple but effective trust model that is better able to cope with them. The proposed architecture combines some Artificial Intelligence techniques to achieve close resemblance with human-like decision making. Accordingly, the Apriori algorithm is first used in order to extract the behavioral patterns adopted by the users during their network interactions. A Naïve Bayes classifier is then used for final decision making, expressed in terms of the probability of user trustworthiness. To validate our approach we applied it to some typical ubiquitous computing scenarios. The obtained results demonstrate the usefulness of such an approach and its competitiveness against other existing ones.
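
A rough analogue of the second stage of that pipeline (the Apriori pattern-extraction step is skipped here and replaced by hand-picked binary behaviour flags; not the authors' system): a Naive Bayes classifier turns observed behaviour patterns into a probability of trustworthiness. The features and labels are invented.

```python
# Sketch: binary behaviour flags feeding a Naive Bayes trust score (illustrative only).
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Binary behavioural features per past interaction:
# [kept_appointments, completed_file_transfer, flagged_by_peers]
X = np.array([
    [1, 1, 0], [1, 1, 0], [1, 0, 0],   # interactions later judged trustworthy
    [0, 0, 1], [0, 1, 1], [0, 0, 1],   # interactions later judged untrustworthy
])
y = np.array([1, 1, 1, 0, 0, 0])        # 1 = trustworthy outcome

clf = BernoulliNB().fit(X, y)

new_user = np.array([[1, 0, 1]])        # behaviour pattern of an unknown user
p_trust = clf.predict_proba(new_user)[0, 1]
print(f"estimated probability the user is trustworthy: {p_trust:.2f}")
```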

2017-03-08
Bottazzi, G., Italiano, G. F..  2015.  Fast Mining of Large-Scale Logs for Botnet Detection: A Field Study. 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing. :1989–1996.

Botnets are considered one of the most dangerous species of network-based attack today because they involve the use of very large coordinated groups of hosts simultaneously. The behavioral analysis of computer networks is at the basis of modern botnet detection methods, in order to intercept traffic generated by malware for which signatures do not exist yet. Defining a pattern of features to be placed at the basis of behavioral analysis puts the emphasis on the quantity and quality of information to be caught and used to mark data streams as normal or abnormal. The problem is even more evident if we consider extensive computer networks or clouds. With the present paper we intend to show how heuristics applied to large-scale proxy logs, considering a typical phase of the life cycle of botnets such as the search for C&C servers through AGDs (Algorithmically Generated Domains), may provide effective and extremely rapid results. The present work introduces some novel paradigms. The first is that some of the elements of the supply chain of botnets could be completed without any interaction with the Internet, mostly in the presence of wide computer networks and/or clouds. The second is that behind a large number of workstations there are usually "human beings", and it is unlikely that their behaviors will cause marked changes in their interaction with the Internet in a fairly narrow time frame. Finally, AGDs can, at the moment, highlight common lexical features that are detectable quickly and without using any black/white list.
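
Purely as an illustration of the kind of lexical feature such heuristics can exploit (not the paper's actual heuristics), the sketch below scores domain names by character entropy, digit ratio, and length; algorithmically generated domains tend to score high. The scoring weights, threshold, and example domains are all assumptions.

```python
# Sketch: simple lexical scoring of domain names to spot likely AGDs (illustrative only).
import math
from collections import Counter

def entropy(s: str) -> float:
    counts = Counter(s)
    return -sum(c / len(s) * math.log2(c / len(s)) for c in counts.values())

def agd_score(domain: str) -> float:
    label = domain.split(".")[0]                 # ignore the TLD
    digits = sum(ch.isdigit() for ch in label)
    return entropy(label) + 2 * digits / len(label) + len(label) / 20

for d in ["google.com", "x3k9q2vj7trwz81f.info", "wikipedia.org", "qpzm4782lkdjs.biz"]:
    score = agd_score(d)
    print(f"{d:25s} score={score:.2f}", "<- likely AGD" if score > 4.0 else "")
```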

2017-03-07
Cordero, C. G., Vasilomanolakis, E., Milanov, N., Koch, C., Hausheer, D., Mühlhäuser, M..  2015.  ID2T: A DIY dataset creation toolkit for Intrusion Detection Systems. 2015 IEEE Conference on Communications and Network Security (CNS). :739–740.

Intrusion Detection Systems (IDSs) are an important defense tool against the sophisticated and ever-growing network attacks. These systems need to be evaluated against high quality datasets for correctly assessing their usefulness and comparing their performance. We present an Intrusion Detection Dataset Toolkit (ID2T) for the creation of labeled datasets containing user-defined synthetic attacks. The architecture of the toolkit is provided for examination, and an example of an injected attack, in real network traffic, is visualized and analyzed. We further discuss the ability of the toolkit to create realistic synthetic attacks of high quality and low bias.

2015-05-06
Perzyk, Marcin, Kochanski, Andrzej, Kozlowski, Jacek, Soroczynski, Artur, Biernacki, Robert.  2014.  Comparison of Data Mining Tools for Significance Analysis of Process Parameters in Applications to Process Fault Diagnosis. Inf. Sci.. 259:380–392.

This paper presents an evaluation of various methodologies used to determine relative significances of input variables in data-driven models. Significance analysis applied to manufacturing process parameters can be a useful tool in fault diagnosis for various types of manufacturing processes. It can also be applied to building models that are used in process control. The relative significances of input variables can be determined by various data mining methods, including relatively simple statistical procedures as well as more advanced machine learning systems. Several methodologies suitable for carrying out classification tasks which are characteristic of fault diagnosis were evaluated and compared from the viewpoint of their accuracy, robustness of results and applicability. Two types of testing data were used: synthetic data with assumed dependencies and real data obtained from the foundry industry. The simple statistical method based on contingency tables revealed the best overall performance, whereas advanced machine learning models, such as ANNs and SVMs, appeared to be of less value.
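
To make the contingency-table approach concrete (a generic sketch, not the authors' procedure or data), the code below cross-tabulates a discretized process parameter against a fault flag, computes the chi-square statistic with scipy, and uses Cramér's V as a rough relative-significance measure. The counts are invented.

```python
# Sketch: significance of a process parameter via a contingency table (illustrative data).
import numpy as np
from scipy.stats import chi2_contingency

# Rows: pouring temperature binned as {low, medium, high}; columns: {no fault, fault}.
table = np.array([
    [40,  5],    # low temperature
    [55,  3],    # medium temperature
    [20, 17],    # high temperature
])

chi2, p_value, dof, expected = chi2_contingency(table)
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))

print(f"chi2={chi2:.1f}, p={p_value:.4f}, Cramér's V={cramers_v:.2f}")
```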

2015-05-05
Lei Xu, Chunxiao Jiang, Jian Wang, Jian Yuan, Yong Ren.  2014.  Information Security in Big Data: Privacy and Data Mining. Access, IEEE. 2:1149-1176.

The growing popularity and development of data mining technologies bring serious threats to the security of individuals' sensitive information. An emerging research topic in data mining, known as privacy-preserving data mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in such a way that data mining algorithms can be performed effectively without compromising the security of sensitive information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk brought by data mining operations, while in fact, unwanted disclosure of sensitive information may also happen in the process of data collecting, data publishing, and information (i.e., the data mining results) delivering. In this paper, we view the privacy issues related to data mining from a wider perspective and investigate various approaches that can help to protect sensitive information. In particular, we identify four different types of users involved in data mining applications, namely, data provider, data collector, data miner, and decision maker. For each type of user, we discuss his privacy concerns and the methods that can be adopted to protect sensitive information. We briefly introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. Besides exploring the privacy-preserving approaches for each type of user, we also review the game theoretical approaches, which are proposed for analyzing the interactions among different users in a data mining scenario, each of whom has his own valuation on the sensitive information. By differentiating the responsibilities of different users with respect to the security of sensitive information, we would like to provide some useful insights into the study of PPDM.

2016-12-05
Eric Yuan, Naeem Esfahani, Sam Malek.  2014.  Automated Mining of Software Component Interactions for Self-Adaptation. SEAMS 2014 Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. :27-36.

A self-adaptive software system should be able to monitor and analyze its runtime behavior and make adaptation decisions accordingly to meet certain desirable objectives. Traditional software adaptation techniques and recent “models@runtime” approaches usually require an a priori model for a system’s dynamic behavior. Oftentimes the model is difficult to define and labor-intensive to maintain, and tends to get out of date due to adaptation and architecture decay. We propose an alternative approach that does not require defining the system’s behavior model beforehand, but instead involves mining software component interactions from system execution traces to build a probabilistic usage model, which is in turn used to analyze, plan, and execute adaptations. Our preliminary evaluation of the approach against an Emergency Deployment System shows that the associations mining model can be used to effectively address a variety of adaptation needs, including (1) safely applying dynamic changes to a running software system without creating inconsistencies, (2) identifying potentially malicious (abnormal) behavior for self-protection, and (3) our ongoing research on improving deployment of software components in a distributed setting for performance self-optimization.
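
As a loose sketch of the flavor of such a probabilistic usage model (not the authors' system), the code below mines pairwise "A is followed by B" probabilities from hypothetical component-interaction traces; a planner could treat low-probability interactions as candidates for abnormal behavior. The trace contents are invented.

```python
# Sketch: probabilistic usage model from component-interaction traces (illustrative only).
from collections import Counter, defaultdict

traces = [  # each trace is an ordered list of observed component interactions
    ["UI->Auth", "Auth->DB", "UI->Report", "Report->DB"],
    ["UI->Auth", "Auth->DB", "UI->Report", "Report->DB"],
    ["UI->Auth", "Auth->DB", "UI->Admin", "Admin->DB"],
]

pair_counts = defaultdict(Counter)
for trace in traces:
    for a, b in zip(trace, trace[1:]):
        pair_counts[a][b] += 1

def prob_next(a, b):
    total = sum(pair_counts[a].values())
    return pair_counts[a][b] / total if total else 0.0

print("P(Auth->DB follows UI->Auth) =", prob_next("UI->Auth", "Auth->DB"))
print("P(UI->Admin follows Auth->DB) =", prob_next("Auth->DB", "UI->Admin"))  # rarer -> possibly abnormal
```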

2015-05-06
Bou-Harb, E., Debbabi, M., Assi, C..  2014.  Behavioral analytics for inferring large-scale orchestrated probing events. Computer Communications Workshops (INFOCOM WKSHPS), 2014 IEEE Conference on. :506-511.

The significant dependence on cyberspace has indeed brought new risks that often compromise, exploit and damage invaluable data and systems. Thus, the capability to proactively infer malicious activities is of paramount importance. In this context, inferring probing events, which are commonly the first stage of any cyber attack, renders a promising tactic to achieve that task. For the past three years, we have been receiving 12 GB of daily malicious real darknet data (i.e., Internet traffic destined to half a million routable yet unallocated IP addresses) from more than 12 countries. This paper exploits such data to propose a novel approach that aims at capturing the behavior of the probing sources in an attempt to infer their orchestration (i.e., coordination) pattern. The latter defines a recently discovered characteristic of a new phenomenon of probing events that could be ominously leveraged to cause drastic Internet-wide and enterprise impacts as precursors of various cyber attacks. To accomplish its goals, the proposed approach leverages various signal and statistical techniques, information theoretical metrics, fuzzy approaches with real malware traffic and data mining methods. The approach is validated through one use case that arguably proves that a previously analyzed orchestrated probing event from last year is indeed still active, yet operating in a stealthy, very low rate mode. We envision that the proposed approach, which is tailored towards darknet data that is frequently, abundantly and effectively used to generate cyber threat intelligence, could be used by network security analysts, emergency response teams and/or observers of cyber events to infer large-scale orchestrated probing events for early cyber attack warning and notification.

Khobragade, P.K., Malik, L.G..  2014.  Data Generation and Analysis for Digital Forensic Application Using Data Mining. Communication Systems and Network Technologies (CSNT), 2014 Fourth International Conference on. :458-462.

Cyber crime generates huge volumes of log and transactional data, which leads to plenty of data to store and analyze. Forensic investigators must spend a great deal of time finding clues in and analyzing this data. Network forensic analysis involves network traces and the detection of attacks. The traces include Intrusion Detection System and firewall logs, logs generated by network services and applications, and packet captures by sniffers. Because a network generates a lot of data for every action, it is difficult for forensic investigators to find clues and analyze that data. Network forensics deals with the monitoring, capturing, recording, and analysis of network traffic for detecting intrusions and investigating them. This paper focuses on data collection from the cyber system and the web browser. FTK 4.0 is discussed for memory forensic analysis and remote system forensics, the results of which are to be used as evidence for aiding investigation.

2015-04-30
Katkar, V.D., Bhatia, D.S..  2014.  Lightweight approach for detection of denial of service attacks using numeric to binary preprocessing. Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014 International Conference on. :207-212.

Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks exhaust the resources of a server/service and make it unavailable for legitimate users. With the increasing use of online services and attacks on these services, the importance of Intrusion Detection Systems (IDS) for the detection of DoS/DDoS attacks has also grown. The detection accuracy and CPU utilization of a data-mining-based IDS are directly proportional to the quality of the training dataset used to train it. Various preprocessing methods, such as normalization, discretization and fuzzification, are used by researchers to improve the quality of the training dataset. This paper evaluates the effect of various data preprocessing methods on the detection accuracy of DoS/DDoS attack detection IDS and shows that the numeric-to-binary preprocessing method performs better than other methods. Experimental results obtained using the KDD 99 dataset are provided to support the efficiency of the proposed combination.
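
One simple reading of "numeric to binary" preprocessing is per-feature thresholding (another reading is a bit-level expansion of each value); the sketch below shows the thresholding variant on invented connection features and is not necessarily the paper's exact transformation. The binarized matrix would then feed whatever classifier the IDS trains.

```python
# Sketch: per-feature numeric-to-binary preprocessing by thresholding at the column mean
# (one possible interpretation; illustrative data only).
import numpy as np

X = np.array([            # hypothetical connection features: duration, bytes, packet count
    [0.1,   200,   3],
    [12.0, 9000, 140],
    [0.2,   150,   2],
    [9.5,  8000, 120],
])

X_binary = (X > X.mean(axis=0)).astype(int)   # 1 if above the feature's mean, else 0
print(X_binary)
```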

2015-05-05
Rocha, T.S., Souto, E..  2014.  ETSSDetector: A Tool to Automatically Detect Cross-Site Scripting Vulnerabilities. Network Computing and Applications (NCA), 2014 IEEE 13th International Symposium on. :306-309.

The inappropriate use of features intended to improve the usability and interactivity of web applications has resulted in the emergence of various threats, including Cross-Site Scripting (XSS) attacks. In this work, we developed ETSSDetector, a generic and modular web vulnerability scanner that automatically analyzes web applications to find XSS vulnerabilities. ETSSDetector is able to identify and analyze all data entry points of the application and generate specific code injection tests for each one. The results show that correctly filling the input fields with only valid information ensures better effectiveness of the tests, increasing the detection rate of XSS attacks.
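
A bare-bones illustration of one check such a scanner performs (not ETSSDetector itself): submit a unique marker string to a query parameter and see whether it comes back unescaped in the response. The target URL and parameter name are placeholders, and such probes should only be run against applications you are authorized to test.

```python
# Sketch: naive reflected-XSS probe for a single query parameter (illustrative only;
# the URL and parameter name below are placeholders, not a real target).
import uuid
import requests

target = "http://testphp.example/search"          # hypothetical data entry point
param = "q"
marker = f"<xss{uuid.uuid4().hex[:8]}>"           # unique payload so reflection is unambiguous

resp = requests.get(target, params={param: marker}, timeout=10)
if marker in resp.text:
    print(f"parameter '{param}' reflects input unescaped -> possible XSS")
else:
    print(f"no unescaped reflection of '{param}' detected")
```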

Dey, L., Mahajan, D., Gupta, H..  2014.  Obtaining Technology Insights from Large and Heterogeneous Document Collections. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:102-109.

Keeping up with rapid advances in research in various fields of Engineering and Technology is a challenging task. Decision makers including academics, program managers, venture capital investors, industry leaders and funding agencies not only need to be abreast of the latest developments but also be able to assess the effect of growth in certain areas on their core business. Though analyst agencies like Gartner, McKinsey etc. provide such reports for some areas, thought leaders of all organisations still need to amass data from heterogeneous collections like research publications, analyst reports, patent applications, competitor information etc. to help them finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics and information visualization to aid the process of retrieval and analytics. In this paper, we present our work on automated topical analysis and insight generation from large heterogeneous text collections of publications and patents. While most of the earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We have presented several methods and techniques that help in the analysis and better comprehension of search results. We have also presented methods for generating insights about emerging and popular trends in research, along with contextual differences between academic research and patenting profiles. We also present novel techniques to present topic evolution that help users understand how a particular area has evolved over time.

Babour, A., Khan, J.I..  2014.  Tweet Sentiment Analytics with Context Sensitive Tone-Word Lexicon. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:392-399.

In this paper we propose a Twitter sentiment analytics approach that mines for opinion polarity about a given topic. Most current semantic sentiment analytics depends on polarity lexicons. However, many key tone words are frequently bipolar. In this paper we demonstrate a technique which can accommodate the bipolarity of tone words through a context-sensitive tone lexicon learning mechanism, where the context is modeled by the semantic neighborhood of the main target. Performance analysis shows that the ability to contextualize the tone word polarity significantly improves the accuracy.

Falcon, R., Abielmona, R., Billings, S., Plachkov, A., Abbass, H..  2014.  Risk management with hard-soft data fusion in maritime domain awareness. Computational Intelligence for Security and Defense Applications (CISDA), 2014 Seventh IEEE Symposium on. :1-8.

Enhanced situational awareness is integral to risk management and response evaluation. Dynamic systems that incorporate both hard and soft data sources allow for comprehensive situational frameworks which can supplement physical models with conceptual notions of risk. The processing of widely available semi-structured textual data sources can produce soft information that is readily consumable by such a framework. In this paper, we augment the situational awareness capabilities of a recently proposed risk management framework (RMF) with the incorporation of soft data. We illustrate the beneficial role of the hard-soft data fusion in the characterization and evaluation of potential vessels in distress within Maritime Domain Awareness (MDA) scenarios. Risk features pertaining to maritime vessels are defined a priori and then quantified in real time using both hard (e.g., Automatic Identification System, Douglas Sea Scale) as well as soft (e.g., historical records of worldwide maritime incidents) data sources. A risk-aware metric to quantify the effectiveness of the hard-soft fusion process is also proposed. Though illustrated with MDA scenarios, the proposed hard-soft fusion methodology within the RMF can be readily applied to other domains.