Biblio

Found 55 results

Filters: Keyword is Hidden Markov models

2020-08-24

Sophakan, Natnaree, Sathitwiriyawong, Chanboon. 2019. A Secured OpenFlow-Based Software Defined Networking Using Dynamic Bayesian Network. 2019 19th International Conference on Control, Automation and Systems (ICCAS). :1517–1522.

OpenFlow has been the main standard protocol of software defined networking (SDN) since the launch of this new networking paradigm. It is a programmable network protocol that controls traffic flows among switches and routers regardless of their platforms. Its security relies on the optional implementation of Transport Layer Security (TLS) which has been proven vulnerable. The aim of this research was to develop a secured OpenFlow, so-called Secured-OF. A stateful firewall was used to store state information for further analysis. Dynamic Bayesian Network (DBN) was used to learn denial-of-service attack and distributed denial-of-service attack. It analyzes packet states to determine the nature of an attack and adds that piece of information to the flow table entry. The proposed Secured-OF model in Ryu controller was evaluated with several performance metrics. The analytical evaluation of the proposed Secured-OF scheme was performed on an emulated network. The results showed that the proposed Secured-OF scheme offers a high attack detection accuracy at 99.5%. In conclusion, it was able to improve the security of the OpenFlow controller dramatically with trivial performance degradation compared to an SDN with no security implementation.

2020-07-06

Chai, Yadeng, Liu, Yong. 2019. Natural Spoken Instructions Understanding for Robot with Dependency Parsing. 2019 IEEE 9th Annual International Conference on CYBER Technology in Automation, Control, and Intelligent Systems (CYBER). :866–871.

This paper presents a method based on syntactic information that can be used for intent determination and slot filling in a spoken language understanding system, including the spoken instruction understanding module of a robot. Some studies in recent years attempt to solve the problem of spoken language understanding via syntactic information. This research extends these approaches and is based on dependency parsing. In this model, the inputs to the neural network are vectors generated from a dependency parse tree, which we call window vectors. These vectors contain dependency features that improve the performance of the syntactic-based model. The model has been evaluated on the benchmark ATIS task, and the results show that it outperforms many other syntactic-based approaches; in slot filling in particular, its performance is on par with some state-of-the-art deep learning algorithms of recent years. The model has also been evaluated on FBM3, a dataset of the RoCKIn@Home competition. The overall rate of correctly understanding robot instructions is quite good but still not acceptable for practical use, which is caused by the small scale of FBM3.

2020-05-18

Nambiar, Sindhya K, Leons, Antony, Jose, Soniya, Arunsree. 2019. Natural Language Processing Based Part of Speech Tagger using Hidden Markov Model. 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC). :782–785.

In various natural language processing applications, part-of-speech (POS) tagging is performed as a preprocessing step. Various techniques have been explored to make POS tagging accurate, but not much work has been done for Indian languages. This paper describes methods for building a part-of-speech tagger using a hidden Markov model. A supervised learning approach is implemented in which already tagged sentences in Malayalam are used to build the hidden Markov model.
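The supervised approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy English corpus, the tag names, the `<s>` start symbol, the additive smoothing constant, and the function names `train_hmm` and `viterbi` are all assumptions made for illustration. Transition and emission probabilities are estimated by counting over tagged sentences, and Viterbi decoding picks the most likely tag sequence.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences, smoothing=1e-3):
    """Count tag transitions and word emissions from tagged sentences."""
    trans = defaultdict(Counter)  # trans[previous_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # hypothetical sentence-start symbol
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab, smoothing

def log_prob(counter, key, n_outcomes, smoothing):
    """Additively smoothed log probability of `key` under `counter`."""
    total = sum(counter.values())
    return math.log((counter[key] + smoothing) / (total + smoothing * n_outcomes))

def viterbi(words, model):
    """Return the most likely tag sequence for a non-empty list of words."""
    trans, emit, tags, vocab, s = model
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 slot for unseen words
    best = {t: log_prob(trans["<s>"], t, n_tags, s)
               + log_prob(emit[t], words[0], n_words, s) for t in tags}
    backpointers = []
    for word in words[1:]:
        step, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[p] + log_prob(trans[p], t, n_tags, s))
            step[t] = (best[prev] + log_prob(trans[prev], t, n_tags, s)
                       + log_prob(emit[t], word, n_words, s))
            pointers[t] = prev
        best = step
        backpointers.append(pointers)
    tag = max(best, key=best.get)
    path = [tag]
    for pointers in reversed(backpointers):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Toy tagged corpus (an assumption; the paper uses tagged Malayalam sentences).
corpus = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
model = train_hmm(corpus)
print(viterbi(["a", "cat", "barks"], model))
```

Working in log space avoids numerical underflow on longer sentences, and the smoothing term keeps unseen words and transitions from receiving zero probability.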

2020-04-06

Li, Jiabin, Xue, Zhi. 2019. Distributed Threat Intelligence Sharing System: A New Sight of P2P Botnet Detection. 2019 2nd International Conference on Computer Applications Information Security (ICCAIS). :1–6.

Botnet has been evolving over time since its birth. Nowadays, P2P (Peer-to-Peer) botnet has become a main threat to cyberspace security, owing to its strong concealment and easy expansibility. In order to effectively detect P2P botnet, researchers often focus on the analysis of network traffic. For the sake of enriching P2P botnet detection methods, the author puts forward a new sight of applying distributed threat intelligence sharing system to P2P botnet detection. This system aims to fight against distributed botnet by using distributed methods itself, and then to detect botnet in real time. To fulfill the goal of botnet detection, there are 3 important parts: the threat intelligence sharing and evaluating system, the BAV quantitative TI model, and the AHP and HMM based analysis algorithm. Theoretically, this method should work on different types of distributed cyber threat besides P2P botnet.

2020-03-18

Li, Tao, Guo, Yuanbo, Ju, Ankang. 2019. A Self-Attention-Based Approach for Named Entity Recognition in Cybersecurity. 2019 15th International Conference on Computational Intelligence and Security (CIS). :147–150.

With the cybersecurity situation growing more and more complex, data-driven security has become indispensable. A large amount of cybersecurity data exists in textual sources, and analyzing it is difficult for both security analysts and machines. To convert the textual information into structured data for further automatic analysis, we extract cybersecurity-related entities and propose a self-attention-based neural network model for named entity recognition in cybersecurity. Because the word feature alone is not enough to identify an entity, we introduce a CNN to extract character features, which are then concatenated with the word feature. We then add a self-attention mechanism to the existing BiLSTM-CRF model. Finally, we evaluate the proposed model on the labelled dataset and obtain better performance than the previous entity extraction model.

2020-03-09

Li, Zhixin, Liu, Lei, Kong, Degang. 2019. Virtual Machine Failure Prediction Method Based on AdaBoost-Hidden Markov Model. 2019 International Conference on Intelligent Transportation, Big Data Smart City (ICITBS). :700–703.

The failure prediction method of virtual machines (VM) guarantees reliability to cloud platforms. However, the uncertainty of VM security state will affect the reliability and task processing capabilities of the entire cloud platform. In this study, a failure prediction method of VM based on AdaBoost-Hidden Markov Model was proposed to improve the reliability of VMs and overall performance of cloud platforms. This method analyzed the deep relationship between the observation state and the hidden state of the VM through the hidden Markov model, proved the influence of the AdaBoost algorithm on the hidden Markov model (HMM), and realized the prediction of the VM failure state. Results show that the proposed method adapts to the complex dynamic cloud platform environment, can effectively predict the failure state of VMs, and improve the predictive ability of VM security state.

2020-02-10

Ishtiaq, Asra, Islam, Muhammad Arshad, Azhar Iqbal, Muhammad, Aleem, Muhammad, Ahmed, Usman. 2019. Graph Centrality Based Spam SMS Detection. 2019 16th International Bhurban Conference on Applied Sciences and Technology (IBCAST). :629–633.

The use of short messages such as SMS, tweets and status updates has increased tremendously. Because of their popularity and ease of use, many companies use them for advertising. Hackers also use SMS to defraud users and steal personal information. In this paper, the use of graph centrality metrics is proposed for spam SMS detection. The graph centrality measures degree, closeness, and eccentricity are used for classification of SMS. Graphs for each class are created using labeled SMS, and an unlabeled SMS is then classified using the centrality scores of the tokens it contains. Our results show that the highest precision and recall are achieved by using degree centrality: 0.81 precision and 0.76 recall for spam messages.
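A rough sketch of the degree-centrality variant is given here. The abstract does not say how the token graphs are built or how token scores are combined, so the choices below are assumptions: tokens that co-occur in the same message are linked, and a message is scored by summing the centralities of its tokens in each class graph. The function names and toy messages are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(messages):
    """Build a token co-occurrence graph as an adjacency map (assumed construction)."""
    adjacency = defaultdict(set)
    for message in messages:
        tokens = set(message.lower().split())
        for token in tokens:
            adjacency.setdefault(token, set())  # keep isolated tokens as nodes
        for a, b in combinations(sorted(tokens), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def degree_centrality(adjacency):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(adjacency)
    if n <= 1:
        return {node: 0.0 for node in adjacency}
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}

def classify(message, centrality_by_class):
    """Pick the class whose graph gives the message's tokens the highest total centrality."""
    tokens = set(message.lower().split())
    scores = {label: sum(centrality.get(token, 0.0) for token in tokens)
              for label, centrality in centrality_by_class.items()}
    return max(scores, key=scores.get), scores

# Toy labeled messages (illustrative only).
labeled = {
    "spam": ["win free prize now", "free cash prize call now"],
    "ham": ["see you at lunch", "call me after lunch"],
}
centrality_by_class = {label: degree_centrality(build_graph(messages))
                       for label, messages in labeled.items()}

print(classify("free prize", centrality_by_class))
print(classify("lunch at noon", centrality_by_class))
```

Closeness or eccentricity could be substituted for `degree_centrality` without changing the rest of the pipeline.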

2020-01-28

Xuan, Shichang, Wang, Huanhong, Gao, Duo, Chung, Ilyong, Wang, Wei, Yang, Wu. 2019. Network Penetration Identification Method Based on Interactive Behavior Analysis. 2019 Seventh International Conference on Advanced Cloud and Big Data (CBD). :210–215.

The Internet has gradually penetrated into the national economy, politics, culture, military, education and other fields. Due to its openness, interconnectivity and other characteristics, the Internet is vulnerable to all kinds of malicious attacks. The research uses a honeynet to collect attacker information and proposes a network penetration recognition technology based on interactive behavior analysis. Sebek technology is used to capture the attacker's keystroke record, and time series modeling of the keystroke sequences of the interaction behavior is proposed using a Recurrent Neural Network. The attack recognition method is constructed using Long Short-Term Memory, which solves the problems of vanishing gradients, exploding gradients and insufficient long-term memory in ordinary Recurrent Neural Networks. Finally, the experiment verifies that the long short-term memory network achieves a high accuracy rate in recognizing penetration attacks.

2019-12-30

Kim, Sunbin, Kim, Hyeoncheol. 2019. Deep Explanation Model for Facial Expression Recognition Through Facial Action Coding Unit. 2019 IEEE International Conference on Big Data and Smart Computing (BigComp). :1–4.

Facial expression is the most powerful and natural non-verbal emotional communication method. Facial Expression Recognition (FER) has significance in machine learning tasks. Deep learning models perform well in FER tasks, but they do not provide any justification for their decisions. Based on the hypothesis that facial expression is a combination of facial muscle movements, we find that Facial Action Coding Units (AUs) and emotion labels are related in the CK+ dataset. In this paper, we propose a model which utilises AUs to explain a Convolutional Neural Network (CNN) model's classification results. The CNN model is trained on the CK+ dataset and classifies emotion based on extracted features. The explanation model classifies the multiple AUs using the extracted features and emotion classes from the CNN model. Our experiment shows that, with only the features and emotion classes obtained from the CNN model, the explanation model generates AUs very well.

2019-10-15

Panagiotakis, C., Papadakis, H., Fragopoulou, P.. 2018. Detection of Hurriedly Created Abnormal Profiles in Recommender Systems. 2018 International Conference on Intelligent Systems (IS). :499–506.

Recommender systems try to predict the preferences of users for specific items. These systems suffer from profile injection attacks, where the attackers have some prior knowledge of the system ratings and their goal is to promote or demote a particular item introducing abnormal (anomalous) ratings. The detection of both cases is a challenging problem. In this paper, we propose a framework to spot anomalous rating profiles (outliers), where the outliers hurriedly create a profile that injects into the system either random ratings or specific ratings, without any prior knowledge of the existing ratings. The proposed detection method is based on the unpredictable behavior of the outliers in a validation set, on the user-item rating matrix and on the similarity between users. The proposed system is totally unsupervised, and in the last step it uses the k-means clustering method automatically spotting the spurious profiles. For the cases where labeling sample data is available, a random forest classifier is trained to show how supervised methods outperforms unsupervised ones. Experimental results on the MovieLens 100k and the MovieLens 1M datasets demonstrate the high performance of the proposed schemata.

2019-04-05

Bapat, R., Mandya, A., Liu, X., Abraham, B., Brown, D. E., Kang, H., Veeraraghavan, M.. 2018. Identifying Malicious Botnet Traffic Using Logistic Regression. 2018 Systems and Information Engineering Design Symposium (SIEDS). :266–271.

An important source of cyber-attacks is malware, which proliferates in different forms such as botnets. The botnet malware typically looks for vulnerable devices across the Internet, rather than targeting specific individuals, companies or industries. It attempts to infect as many connected devices as possible, using their resources for automated tasks that may cause significant economic and social harm while being hidden to the user and device. Thus, it becomes very difficult to detect such activity. A considerable amount of research has been conducted to detect and prevent botnet infestation. In this paper, we attempt to create a foundation for an anomaly-based intrusion detection system using a statistical learning method to improve network security and reduce human involvement in botnet detection. We focus on identifying the best features to detect botnet activity within network traffic using a lightweight logistic regression model. The network traffic is processed by Bro, a popular network monitoring framework which provides aggregate statistics about the packets exchanged between a source and destination over a certain time interval. These statistics serve as features to a logistic regression model responsible for classifying malicious and benign traffic. Our model is easy to implement and simple to interpret. We characterized and modeled 8 different botnet families separately and as a mixed dataset. Finally, we measured the performance of our model on multiple parameters using F1 score, accuracy and Area Under Curve (AUC).

2019-01-21

Wang, X., Hou, Y., Huang, X., Li, D., Tao, X., Xu, J.. 2018. Security Analysis of Key Extraction from Physical Measurements with Multiple Adversaries. 2018 IEEE International Conference on Communications Workshops (ICC Workshops). :1–6.

In this paper, security of secret key extraction scheme is evaluated for private communication between legitimate wireless devices. Multiple adversaries that distribute around these legitimate wireless devices eavesdrop on the data transmitted between them, and deduce the secret key. Conditional min-entropy given the view of those adversaries is utilized as security evaluation metric in this paper. Besides, the wiretap channel model and hidden Markov model (HMM) are regarded as the channel model and a dynamic programming approach is used to approximate conditional min-entropy. Two algorithms are proposed to mathematically calculate the conditional min-entropy by combining the Viterbi algorithm with the Forward algorithm. Optimal method with multiple adversaries (OME) algorithm is proposed firstly, which has superior performance but exponential computation complexity. To reduce this complexity, suboptimal method with multiple adversaries (SOME) algorithm is proposed, using performance degradation for the computation complexity reduction. In addition to the theoretical analysis, simulation results further show that the OME algorithm indeed has superior performance as well as the SOME algorithm has more efficient computation.
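One plausible way the Viterbi and Forward algorithms can be combined, shown here only as a sketch and not as the paper's OME or SOME algorithm: for a single observation sequence y, the most likely hidden sequence has posterior probability max_x P(x, y) / P(y). Viterbi gives the numerator, the Forward algorithm gives the denominator, and the negative base-2 logarithm of the ratio is the min-entropy of the hidden sequence given that observation. The two-state HMM parameters below are assumed toy values.

```python
import math

def forward_likelihood(obs, start, trans, emit):
    """P(obs): sum over all hidden paths, computed with the Forward algorithm."""
    states = range(len(start))
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

def viterbi_best_joint(obs, start, trans, emit):
    """max over hidden paths of P(path, obs), computed with the Viterbi recursion."""
    states = range(len(start))
    delta = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        delta = [max(delta[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states]
    return max(delta)

def min_entropy_given_obs(obs, start, trans, emit):
    """-log2 of the posterior probability of the most likely hidden path."""
    best = viterbi_best_joint(obs, start, trans, emit)
    total = forward_likelihood(obs, start, trans, emit)
    return -math.log2(best / total)

# Toy two-state HMM (illustrative values, not taken from the paper).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.2, 0.8]]  # emit[state][symbol]

print(min_entropy_given_obs([0, 0, 1], start, trans, emit))
```

Both recursions run in time linear in the sequence length and quadratic in the number of states. The only difference between them is whether paths are summed or maximized at each step.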

2018-12-10

Schonherr, L., Zeiler, S., Kolossa, D.. 2017. Spoofing detection via simultaneous verification of audio-visual synchronicity and transcription. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). :591–598.

Acoustic speaker recognition systems are very vulnerable to spoofing attacks via replayed or synthesized utterances. One possible countermeasure is audio-visual speaker recognition. Nevertheless, the addition of the visual stream alone does not prevent spoofing attacks completely and only provides further information to assess the authenticity of the utterance. Many systems consider audio and video modalities independently and can easily be spoofed by imitating only a single modality or by a bimodal replay attack with a victim's photograph or video. Therefore, we propose the simultaneous verification of the data synchronicity and the transcription in a challenge-response setup. We use coupled hidden Markov models (CHMMs) for a text-dependent spoofing detection and introduce new features that provide information about the transcriptions of the utterance and the synchronicity of both streams. We evaluate the features for various spoofing scenarios and show that the combination of the features leads to a more robust recognition, also in comparison to the baseline method. Additionally, by evaluating the data on unseen speakers, we show the spoofing detection to be applicable in speaker-independent use-cases.

2018-11-14

Teoh, T. T., Nguwi, Y. Y., Elovici, Y., Cheung, N. M., Ng, W. L.. 2017. Analyst Intuition Based Hidden Markov Model on High Speed, Temporal Cyber Security Big Data. 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). :2080–2083.

Hidden Markov Models (HMMs) are probabilistic models that can be used for forecasting time series data. They have seen success in various domains such as finance [1-5], bioinformatics [6-8], healthcare [9-11], agriculture [12-14], and artificial intelligence [15-17]. However, few uses of HMMs in cyber security have been reported to date. We believe that the predictive and probabilistic properties of HMMs, together with their ability to model different naturally occurring states, form a good basis for modeling cyber security data. This motivates the present work, which provides the initial results of our attempts to predict security attacks using HMMs. Large network datasets representing cyber security attacks have been used in this work to establish an expert system. The characteristics of attackers' IP addresses can be extracted from our integrated datasets to generate statistical data. The cyber security expert provides the weight of each attribute and forms a scoring system by annotating the log history. We applied an HMM to distinguish between a cyber security attack, unsure, and no attack by first breaking the data into 3 clusters using Fuzzy K-means (FKM), then manually labeling a small amount of data (Analyst Intuition), and finally using an HMM state-based approach. Our results are very encouraging compared to finding anomalies in a cyber security log, which generally creates a huge number of false detections.

2018-06-20

Pranamulia, R., Asnar, Y., Perdana, R. S.. 2017. Profile hidden Markov model for malware classification - usage of system call sequence for malware classification. 2017 International Conference on Data and Software Engineering (ICoDSE). :1–5.

Malware technology makes it difficult for malware analysts to detect identical malware files that use different obfuscation techniques. In this paper, we try to tackle that problem by analyzing the sequence of system calls made by an executable file. Malware files that are actually the same should have almost identical, or at least similar, sequences of system calls. We create a model for each malware class, consisting of malware from different families, based on its sequence of system calls. The method used in this paper is the profile hidden Markov model, a very well-known tool in bioinformatics for comparing DNA and protein sequences. The malware classes that we build are the trojan and worm classes. Accuracy for these classes is fairly high, above 90%, but the false positive rate is also high, at around 37%.

2018-05-09

Yu, L., Wang, Q., Barrineau, G., Oakley, J., Brooks, R. R., Wang, K. C.. 2017. TARN: A SDN-based traffic analysis resistant network architecture. 2017 12th International Conference on Malicious and Unwanted Software (MALWARE). :91–98.

Destination IP prefix-based routing protocols are core to Internet routing today. Internet autonomous systems (AS) possess fixed IP prefixes, while packets carry the intended destination AS's prefix in their headers, in clear text. As a result, network communications can be easily identified using IP addresses and become targets of a wide variety of attacks, such as DNS/IP filtering, distributed Denial-of-Service (DDoS) attacks, man-in-the-middle (MITM) attacks, etc. In this work, we explore an alternative network architecture that fundamentally removes such vulnerabilities by disassociating the relationship between IP prefixes and destination networks, and by allowing any end-to-end communication session to have dynamic, short-lived, and pseudo-random IP addresses drawn from a range of IP prefixes rather than one. The concept is seemingly impossible to realize in today's Internet. We demonstrate how this is doable today with three different strategies using software defined networking (SDN), and how this can be done at scale to transform the Internet addressing and routing paradigms with the novel concept of a distributed software defined Internet exchange (SDX). The solution works with both IPv4 and IPv6, whereas the latter provides higher degrees of IP addressing freedom. Prototypes based on Open vSwitches (OVS) have been implemented for experimentation across the PEERING BGP testbed. The SDX solution not only provides a technically sustainable pathway towards large-scale traffic analysis resistant network (TARN) support, it also unveils a new business model for customer-driven, customizable and trustable end-to-end network services.

Shan-Shan, J., Ya-Bin, X.. 2017. The APT detection method in SDN. 2017 3rd IEEE International Conference on Computer and Communications (ICCC). :1240–1245.

SDN is a new network framework which can be controlled and defined by software programming, and OpenFlow is the communication protocol between the SDN control plane and the data plane. Because of the centralized control of SDN, the network is more vulnerable to APT than a traditional network. After analyzing in depth the process of APT at each stage in SDN, this paper proposes an APT detection method based on HMM, which can fully reflect the relationship between attack behavior and APT stage. Experiments show that the method detects APT in SDN more accurately and with less overhead.

2018-02-27

Stefanova, Z., Ramachandran, K.. 2017. Network Attribute Selection, Classification and Accuracy (NASCA) Procedure for Intrusion Detection Systems. 2017 IEEE International Symposium on Technologies for Homeland Security (HST). :1–7.

With the progressive development of network applications and software dependency, we need to discover more advanced methods for protecting our systems. Each industry is equally affected, and regardless of whether we consider the vulnerability of the government or each individual household or company, we have to find a sophisticated and secure way to defend our systems. The starting point is to create a reliable intrusion detection mechanism that will help us to identify the attack at a very early stage; otherwise in the cyber security space the intrusion can affect the system negatively, which can cause enormous consequences and damage the system's privacy, security or financial stability. This paper proposes a concise, and easy to use statistical learning procedure, abbreviated NASCA, which is a four-stage intrusion detection method that can successfully detect unwanted intrusion to our systems. The model is static, but it can be adapted to a dynamic set up.

2017-11-20

Du, H., Jung, T., Jian, X., Hu, Y., Hou, J., Li, X. Y.. 2016. User-Demand-Oriented Privacy-Preservation in Video Delivering. 2016 12th International Conference on Mobile Ad-Hoc and Sensor Networks (MSN). :145–151.

This paper presents a framework for privacy-preserving video delivery system to fulfill users' privacy demands. The proposed framework leverages the inference channels in sensitive behavior prediction and object tracking in a video surveillance system for the sequence privacy protection. For such a goal, we need to capture different pieces of evidence which are used to infer the identity. The temporal, spatial and context features are extracted from the surveillance video as the observations to perceive the privacy demands and their correlations. Taking advantage of quantifying various evidence and utility, we let users subscribe videos with a viewer-dependent pattern. We implement a prototype system for off-line and on-line requirements in two typical monitoring scenarios to construct extensive experiments. The evaluation results show that our system can efficiently satisfy users' privacy demands while saving over 25% more video information compared to traditional video privacy protection schemes.

2017-09-15

Vemparala, Swapna, Di Troia, Fabio, Visaggio, Corrado Aaron, Austin, Thomas H., Stamp, Mark. 2016. Malware Detection Using Dynamic Birthmarks. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :41–46.

In this paper, we compare the effectiveness of Hidden Markov Models (HMMs) with that of Profile Hidden Markov Models (PHMMs), where both are trained on sequences of API calls. We compare our results to static analysis using HMMs trained on sequences of opcodes, and show that dynamic analysis achieves significantly stronger results in many cases. Furthermore, in comparing our two dynamic analysis approaches, we find that using PHMMs consistently outperforms our technique based on HMMs.

2017-03-08

Chammas, E., Mokbel, C., Likforman-Sulem, L.. 2015. Arabic handwritten document preprocessing and recognition. 2015 13th International Conference on Document Analysis and Recognition (ICDAR). :451–455.

Arabic handwritten documents present specific challenges due to the cursive nature of the writing and the presence of diacritical marks. Moreover, one of the largest labeled databases of Arabic handwritten documents, the OpenHart-NIST database, includes specific noise, namely guidelines, that has to be addressed. We propose several approaches to process these documents. First, a guideline detection approach based on K-means has been developed that detects the documents that include guidelines. We then propose a series of preprocessing steps at text-line level to reduce the noise effects. For text-lines including guidelines, a guideline removal preprocessing is described and existing keystroke restoration approaches are assessed. In addition, we propose a preprocessing step that combines noise removal and deskewing by removing line fragments from neighboring text lines while searching for the principal orientation of the text-line. We provide recognition results showing the significant improvement brought by the proposed processing steps.

2017-03-07

Baba, Asif Iqbal, Jaeger, Manfred, Lu, Hua, Pedersen, Torben Bach, Ku, Wei-Shinn, Xie, Xike. 2016. Learning-Based Cleansing for Indoor RFID Data. Proceedings of the 2016 International Conference on Management of Data. :925–936.

RFID is widely used for object tracking in indoor environments, e.g., airport baggage tracking. Analyzing RFID data offers insight into the underlying tracking systems as well as the associated business processes. However, the inherent uncertainty in RFID data, including noise (cross readings) and incompleteness (missing readings), pose challenges to high-level RFID data querying and analysis. In this paper, we address these challenges by proposing a learning-based data cleansing approach that, unlike existing approaches, requires no detailed prior knowledge about the spatio-temporal properties of the indoor space and the RFID reader deployment. Requiring only minimal information about RFID deployment, the approach learns relevant knowledge from raw RFID data and uses it to cleanse the data. In particular, we model raw RFID readings as time series that are sparse because the indoor space is only partly covered by a limited number of RFID readers. We propose the Indoor RFID Multi-variate Hidden Markov Model (IR-MHMM) to capture the uncertainties of indoor RFID data as well as the correlation of moving object locations and object RFID readings. We propose three state space design methods for IR-MHMM that enable the learning of parameters while contending with raw RFID data time series. We solely use raw uncleansed RFID data for the learning of model parameters, requiring no special labeled data or ground truth. The resulting IR-MHMM based RFID data cleansing approach is able to recover missing readings and reduce cross readings with high effectiveness and efficiency, as demonstrated by extensive experimental studies with both synthetic and real data. Given enough indoor RFID data for learning, the proposed approach achieves a data cleansing accuracy comparable to or even better than state-of-the-art techniques requiring very detailed prior knowledge, making our solution superior in terms of both effectiveness and employability.

2017-02-14

Hsieh, C. H., Lai, C. M., Mao, C. H., Kao, T. C., Lee, K. C.. 2015. AD2: Anomaly detection on active directory log data for insider threat monitoring. 2015 International Carnahan Conference on Security Technology (ICCST). :287–292.

In cyber security monitoring, it is not rare that what you see cannot be fully believed. Due to various camouflage tricks, such as packing or virtual private networks (VPN), detecting an "advanced persistent threat" (APT) with only a signature-based malware detection system becomes more and more intractable. On the other hand, by carefully modeling users' successive behaviors in their daily routines, the probability that an account generates certain operations can be estimated and used in anomaly detection. To the best of our knowledge, this project is the first to propose a behavioral analytic framework dedicated to analyzing Active Directory domain service logs and monitoring potential insider threats. Experiments on a real dataset not only show that the proposed idea explores a new, feasible direction for cyber security monitoring, but also give a guideline on how to deploy this framework in various environments.

2015-05-05

Dey, L., Mahajan, D., Gupta, H.. 2014. Obtaining Technology Insights from Large and Heterogeneous Document Collections. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:102–109.

Keeping up with rapid advances in research in various fields of Engineering and Technology is a challenging task. Decision makers including academics, program managers, venture capital investors, industry leaders and funding agencies not only need to be abreast of latest developments but also be able to assess the effect of growth in certain areas on their core business. Though analyst agencies like Gartner, McKinsey etc. provide such reports for some areas, thought leaders of all organisations still need to amass data from heterogeneous collections like research publications, analyst reports, patent applications, competitor information etc. to help them finalize their own strategies. Text mining and data analytics researchers have been looking at integrating statistics, text analytics and information visualization to aid the process of retrieval and analytics. In this paper, we present our work on automated topical analysis and insight generation from large heterogeneous text collections of publications and patents. While most of the earlier work in this area provides search-based platforms, ours is an integrated platform for search and analysis. We have presented several methods and techniques that help in analysis and better comprehension of search results. We have also presented methods for generating insights about emerging and popular trends in research along with contextual differences between academic research and patenting profiles. We also present novel techniques to present topic evolution that helps users understand how a particular area has evolved over time.

Falcon, R., Abielmona, R., Billings, S., Plachkov, A., Abbass, H.. 2014. Risk management with hard-soft data fusion in maritime domain awareness. Computational Intelligence for Security and Defense Applications (CISDA), 2014 Seventh IEEE Symposium on. :1–8.

Enhanced situational awareness is integral to risk management and response evaluation. Dynamic systems that incorporate both hard and soft data sources allow for comprehensive situational frameworks which can supplement physical models with conceptual notions of risk. The processing of widely available semi-structured textual data sources can produce soft information that is readily consumable by such a framework. In this paper, we augment the situational awareness capabilities of a recently proposed risk management framework (RMF) with the incorporation of soft data. We illustrate the beneficial role of the hard-soft data fusion in the characterization and evaluation of potential vessels in distress within Maritime Domain Awareness (MDA) scenarios. Risk features pertaining to maritime vessels are defined a priori and then quantified in real time using both hard (e.g., Automatic Identification System, Douglas Sea Scale) as well as soft (e.g., historical records of worldwide maritime incidents) data sources. A risk-aware metric to quantify the effectiveness of the hard-soft fusion process is also proposed. Though illustrated with MDA scenarios, the proposed hard-soft fusion methodology within the RMF can be readily applied to other domains.