Visible to the public Biblio

Found 789 results

Filters: Keyword is learning (artificial intelligence)  [Clear All Filters]
2015-05-05
Koch, S., John, M., Worner, M., Muller, A., Ertl, T..  2014.  VarifocalReader #x2014; In-Depth Visual Analysis of Large Text Documents. Visualization and Computer Graphics, IEEE Transactions on. 20:1723-1732.

Interactive visualization provides valuable support for exploring, analyzing, and understanding textual documents. Certain tasks, however, require that insights derived from visual abstractions are verified by a human expert perusing the source text. So far, this problem is typically solved by offering overview-detail techniques, which present different views with different levels of abstractions. This often leads to problems with visual continuity. Focus-context techniques, on the other hand, succeed in accentuating interesting subsections of large text documents but are normally not suited for integrating visual abstractions. With VarifocalReader we present a technique that helps to solve some of these approaches' problems by combining characteristics from both. In particular, our method simplifies working with large and potentially complex text documents by simultaneously offering abstract representations of varying detail, based on the inherent structure of the document, and access to the text itself. In addition, VarifocalReader supports intra-document exploration through advanced navigation concepts and facilitates visual analysis tasks. The approach enables users to apply machine learning techniques and search mechanisms as well as to assess and adapt these techniques. This helps to extract entities, concepts and other artifacts from texts. In combination with the automatic generation of intermediate text levels through topic segmentation for thematic orientation, users can test hypotheses or develop interesting new research questions. To illustrate the advantages of our approach, we provide usage examples from literature studies.

Baughman, A.K., Chuang, W., Dixon, K.R., Benz, Z., Basilico, J..  2014.  DeepQA Jeopardy! Gamification: A Machine-Learning Perspective. Computational Intelligence and AI in Games, IEEE Transactions on. 6:55-66.

DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models, trained through machine learning. After the 2011 historic milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. As opposed to the original Watson Jeopardy!, whose class distribution of positive-to-negative labels is 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100 000. Such initial extreme class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contribute toward the challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications to increase the computed true labels. In addition, algorithm engineering, such as an implementation of the Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable to overcome the challenges in this first Watson gamification for real-world business problems.

Babour, A., Khan, J.I..  2014.  Tweet Sentiment Analytics with Context Sensitive Tone-Word Lexicon. Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014 IEEE/WIC/ACM International Joint Conferences on. 1:392-399.

In this paper we propose a twitter sentiment analytics that mines for opinion polarity about a given topic. Most of current semantic sentiment analytics depends on polarity lexicons. However, many key tone words are frequently bipolar. In this paper we demonstrate a technique which can accommodate the bipolarity of tone words by context sensitive tone lexicon learning mechanism where the context is modeled by the semantic neighborhood of the main target. Performance analysis shows that ability to contextualize the tone word polarity significantly improves the accuracy.

Yanfei Guo, Lama, P., Changjun Jiang, Xiaobo Zhou.  2014.  Automated and Agile Server ParameterTuning by Coordinated Learning and Control. Parallel and Distributed Systems, IEEE Transactions on. 25:876-886.

Automated server parameter tuning is crucial to performance and availability of Internet applications hosted in cloud environments. It is challenging due to high dynamics and burstiness of workloads, multi-tier service architecture, and virtualized server infrastructure. In this paper, we investigate automated and agile server parameter tuning for maximizing effective throughput of multi-tier Internet applications. A recent study proposed a reinforcement learning based server parameter tuning approach for minimizing average response time of multi-tier applications. Reinforcement learning is a decision making process determining the parameter tuning direction based on trial-and-error, instead of quantitative values for agile parameter tuning. It relies on a predefined adjustment value for each tuning action. However it is nontrivial or even infeasible to find an optimal value under highly dynamic and bursty workloads. We design a neural fuzzy control based approach that combines the strengths of fast online learning and self-adaptiveness of neural networks and fuzzy control. Due to the model independence, it is robust to highly dynamic and bursty workloads. It is agile in server parameter tuning due to its quantitative control outputs. We implemented the new approach on a testbed of virtualized data center hosting RUBiS and WikiBench benchmark applications. Experimental results demonstrate that the new approach significantly outperforms the reinforcement learning based approach for both improving effective system throughput and minimizing average response time.
 

2015-05-04
Boukhtouta, A., Lakhdari, N.-E., Debbabi, M..  2014.  Inferring Malware Family through Application Protocol Sequences Signature. New Technologies, Mobility and Security (NTMS), 2014 6th International Conference on. :1-5.

The dazzling emergence of cyber-threats exert today's cyberspace, which needs practical and efficient capabilities for malware traffic detection. In this paper, we propose an extension to an initial research effort, namely, towards fingerprinting malicious traffic by putting an emphasis on the attribution of maliciousness to malware families. The proposed technique in the previous work establishes a synergy between automatic dynamic analysis of malware and machine learning to fingerprint badness in network traffic. Machine learning algorithms are used with features that exploit only high-level properties of traffic packets (e.g. packet headers). Besides, the detection of malicious packets, we want to enhance fingerprinting capability with the identification of malware families responsible in the generation of malicious packets. The identification of the underlying malware family is derived from a sequence of application protocols, which is used as a signature to the family in question. Furthermore, our results show that our technique achieves promising malware family identification rate with low false positives.

Khosmood, F., Nico, P.L., Woolery, J..  2014.  User identification through command history analysis. Computational Intelligence in Cyber Security (CICS), 2014 IEEE Symposium on. :1-7.

As any veteran of the editor wars can attest, Unix users can be fiercely and irrationally attached to the commands they use and the manner in which they use them. In this work, we investigate the problem of identifying users out of a large set of candidates (25-97) through their command-line histories. Using standard algorithms and feature sets inspired by natural language authorship attribution literature, we demonstrate conclusively that individual users can be identified with a high degree of accuracy through their command-line behavior. Further, we report on the best performing feature combinations, from the many thousands that are possible, both in terms of accuracy and generality. We validate our work by experimenting on three user corpora comprising data gathered over three decades at three distinct locations. These are the Greenberg user profile corpus (168 users), Schonlau masquerading corpus (50 users) and Cal Poly command history corpus (97 users). The first two are well known corpora published in 1991 and 2001 respectively. The last is developed by the authors in a year-long study in 2014 and represents the most recent corpus of its kind. For a 50 user configuration, we find feature sets that can successfully identify users with over 90% accuracy on the Cal Poly, Greenberg and one variant of the Schonlau corpus, and over 87% on the other Schonlau variant.

Balakrishnan, R., Parekh, R..  2014.  Learning to predict subject-line opens for large-scale email marketing. Big Data (Big Data), 2014 IEEE International Conference on. :579-584.

Billions of dollars of services and goods are sold through email marketing. Subject lines have a strong influence on open rates of the e-mails, as the consumers often open e-mails based on the subject. Traditionally, the e-mail-subject lines are compiled based on the best assessment of the human editors. We propose a method to help the editors by predicting subject line open rates by learning from past subject lines. The method derives different types of features from subject lines based on keywords, performance of past subject lines and syntax. Furthermore, we evaluate the contribution of individual subject-line keywords to overall open rates based on an iterative method-namely Attribution Scoring - and use this for improved predictions. A random forest based model is trained to combine these features to predict the performance. We use a dataset of more than a hundred thousand different subject lines with many billions of impressions to train and test the method. The proposed method shows significant improvement in prediction accuracy over the baselines for both new as well as already used subject lines.
 

Yuxi Liu, Hatzinakos, D..  2014.  Human acoustic fingerprints: A novel biometric modality for mobile security. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :3784-3788.

Recently, the demand for more robust protection against unauthorized use of mobile devices has been rapidly growing. This paper presents a novel biometric modality Transient Evoked Otoacoustic Emission (TEOAE) for mobile security. Prior works have investigated TEOAE for biometrics in a setting where an individual is to be identified among a pre-enrolled identity gallery. However, this limits the applicability to mobile environment, where attacks in most cases are from imposters unknown to the system before. Therefore, we employ an unsupervised learning approach based on Autoencoder Neural Network to tackle such blind recognition problem. The learning model is trained upon a generic dataset and used to verify an individual in a random population. We also introduce the framework of mobile biometric system considering practical application. Experiments show the merits of the proposed method and system performance is further evaluated by cross-validation with an average EER 2.41% achieved.

2015-05-01
Hammoud, R.I., Sahin, C.S., Blasch, E.P., Rhodes, B.J..  2014.  Multi-source Multi-modal Activity Recognition in Aerial Video Surveillance. Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on. :237-244.

Recognizing activities in wide aerial/overhead imagery remains a challenging problem due in part to low-resolution video and cluttered scenes with a large number of moving objects. In the context of this research, we deal with two un-synchronized data sources collected in real-world operating scenarios: full-motion videos (FMV) and analyst call-outs (ACO) in the form of chat messages (voice-to-text) made by a human watching the streamed FMV from an aerial platform. We present a multi-source multi-modal activity/event recognition system for surveillance applications, consisting of: (1) detecting and tracking multiple dynamic targets from a moving platform, (2) representing FMV target tracks and chat messages as graphs of attributes, (3) associating FMV tracks and chat messages using a probabilistic graph-based matching approach, and (4) detecting spatial-temporal activity boundaries. We also present an activity pattern learning framework which uses the multi-source associated data as training to index a large archive of FMV videos. Finally, we describe a multi-intelligence user interface for querying an index of activities of interest (AOIs) by movement type and geo-location, and for playing-back a summary of associated text (ACO) and activity video segments of targets-of-interest (TOIs) (in both pixel and geo-coordinates). Such tools help the end-user to quickly search, browse, and prepare mission reports from multi-source data.

Yuxi Liu, Hatzinakos, D..  2014.  Earprint: Transient Evoked Otoacoustic Emission for Biometrics. Information Forensics and Security, IEEE Transactions on. 9:2291-2301.

Biometrics is attracting increasing attention in privacy and security concerned issues, such as access control and remote financial transaction. However, advanced forgery and spoofing techniques are threatening the reliability of conventional biometric modalities. This has been motivating our investigation of a novel yet promising modality transient evoked otoacoustic emission (TEOAE), which is an acoustic response generated from cochlea after a click stimulus. Unlike conventional modalities that are easily accessible or captured, TEOAE is naturally immune to replay and falsification attacks as a physiological outcome from human auditory system. In this paper, we resort to wavelet analysis to derive the time-frequency representation of such nonstationary signal, which reveals individual uniqueness and long-term reproducibility. A machine learning technique linear discriminant analysis is subsequently utilized to reduce intrasubject variability and further capture intersubject differentiation features. Considering practical application, we also introduce a complete framework of the biometric system in both verification and identification modes. Comparative experiments on a TEOAE data set of biometric setting show the merits of the proposed method. Performance is further improved with fusion of information from both ears.

2015-04-30
El Masri, A., Wechsler, H., Likarish, P., Kang, B.B..  2014.  Identifying users with application-specific command streams. Privacy, Security and Trust (PST), 2014 Twelfth Annual International Conference on. :232-238.

This paper proposes and describes an active authentication model based on user profiles built from user-issued commands when interacting with GUI-based application. Previous behavioral models derived from user issued commands were limited to analyzing the user's interaction with the *Nix (Linux or Unix) command shell program. Human-computer interaction (HCI) research has explored the idea of building users profiles based on their behavioral patterns when interacting with such graphical interfaces. It did so by analyzing the user's keystroke and/or mouse dynamics. However, none had explored the idea of creating profiles by capturing users' usage characteristics when interacting with a specific application beyond how a user strikes the keyboard or moves the mouse across the screen. We obtain and utilize a dataset of user command streams collected from working with Microsoft (MS) Word to serve as a test bed. User profiles are first built using MS Word commands and identification takes place using machine learning algorithms. Best performance in terms of both accuracy and Area under the Curve (AUC) for Receiver Operating Characteristic (ROC) curve is reported using Random Forests (RF) and AdaBoost with random forests.

Hao Wang, Haibin Ouyang, Liqun Gao, Wei Qin.  2014.  Opposition-based learning harmony search algorithm with mutation for solving global optimization problems. Control and Decision Conference (2014 CCDC), The 26th Chinese. :1090-1094.

This paper develops an opposition-based learning harmony search algorithm with mutation (OLHS-M) for solving global continuous optimization problems. The proposed method is different from the original harmony search (HS) in three aspects. Firstly, opposition-based learning technique is incorporated to the process of improvisation to enlarge the algorithm search space. Then, a new modified mutation strategy is instead of the original pitch adjustment operation of HS to further improve the search ability of HS. Effective self-adaptive strategy is presented to fine-tune the key control parameters (e.g. harmony memory consideration rate HMCR, and pitch adjustment rate PAR) to balance the local and global search in the evolution of the search process. Numerical results demonstrate that the proposed algorithm performs much better than the existing improved HS variants that reported in recent literature in terms of the solution quality and the stability.

Yexing Li, Xinye Cai, Zhun Fan, Qingfu Zhang.  2014.  An external archive guided multiobjective evolutionary approach based on decomposition for continuous optimization. Evolutionary Computation (CEC), 2014 IEEE Congress on. :1124-1130.

In this paper, we propose a decomposition based multiobjective evolutionary algorithm that extracts information from an external archive to guide the evolutionary search for continuous optimization problem. The proposed algorithm used a mechanism to identify the promising regions(subproblems) through learning information from the external archive to guide evolutionary search process. In order to demonstrate the performance of the algorithm, we conduct experiments to compare it with other decomposition based approaches. The results validate that our proposed algorithm is very competitive.

2014-09-17
Colbaugh, R., Glass, K..  2013.  Moving target defense for adaptive adversaries. Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on. :50-55.

Machine learning (ML) plays a central role in the solution of many security problems, for example enabling malicious and innocent activities to be rapidly and accurately distinguished and appropriate actions to be taken. Unfortunately, a standard assumption in ML - that the training and test data are identically distributed - is typically violated in security applications, leading to degraded algorithm performance and reduced security. Previous research has attempted to address this challenge by developing ML algorithms which are either robust to differences between training and test data or are able to predict and account for these differences. This paper adopts a different approach, developing a class of moving target (MT) defenses that are difficult for adversaries to reverse-engineer, which in turn decreases the adversaries' ability to generate training/test data differences that benefit them. We leverage the coevolutionary relationship between attackers and defenders to derive a simple, flexible MT defense strategy which is optimal or nearly optimal for a broad range of security problems. Case studies involving two distinct cyber defense applications demonstrate that the proposed MT algorithm outperforms standard static methods, offering effective defense against intelligent, adaptive adversaries.