Biblio

Found 765 results

Filters: Keyword is Training
2015-05-06
Bao, C., Forte, D., Srivastava, A.  2014.  On application of one-class SVM to reverse engineering-based hardware Trojan detection. Quality Electronic Design (ISQED), 2014 15th International Symposium on. :47-54.

Due to design and fabrication outsourcing to foundries, the problem of malicious modifications to integrated circuits, known as hardware Trojans, has attracted attention in academia as well as industry. To reduce the risks associated with Trojans, researchers have proposed different approaches to detect them. Among these, test-time detection approaches have drawn the greatest attention, and most assume the existence of a “golden model”. Prior works suggest using reverse-engineering to identify such Trojan-free ICs for the golden model, but they do not state how to do this efficiently. In this paper, we propose an innovative and robust reverse-engineering approach to identify Trojan-free ICs. We adapt a well-studied machine learning method, the one-class support vector machine, to solve our problem. Simulation results using state-of-the-art tools on several publicly available circuits show that our approach can detect hardware Trojans with a high accuracy rate across different modeling and algorithm parameters.
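
As a rough illustration of the underlying technique (not the authors' feature pipeline), the sketch below trains scikit-learn's one-class SVM on synthetic stand-ins for measurements from Trojan-free ICs and flags outliers; all feature values are fabricated.

```python
# Minimal sketch of one-class novelty detection for Trojan screening.
# Feature vectors (e.g., per-gate layout measurements) are hypothetical
# stand-ins; the paper derives them from reverse-engineered IC images.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_clean = rng.normal(0.0, 1.0, size=(200, 8))    # stand-in for Trojan-free IC features
X_suspect = rng.normal(0.8, 1.5, size=(20, 8))   # stand-in for possibly modified ICs

scaler = StandardScaler().fit(X_clean)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(scaler.transform(X_clean))

# +1 = consistent with the Trojan-free class, -1 = flagged as anomalous
print(ocsvm.predict(scaler.transform(X_suspect)))
```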

Stevanovic, M., Pedersen, J.M.  2014.  An efficient flow-based botnet detection using supervised machine learning. Computing, Networking and Communications (ICNC), 2014 International Conference on. :797-801.

Botnet detection represents one of the most crucial prerequisites of successful botnet neutralization. This paper explores how accurate and timely detection can be achieved by using supervised machine learning as the tool for inferring malicious botnet traffic. To this end, the paper introduces a novel flow-based detection system that relies on supervised machine learning for identifying botnet network traffic. For use in the system we consider eight highly regarded machine learning algorithms and indicate the best-performing one. Furthermore, the paper evaluates how much traffic needs to be observed per flow in order to capture the patterns of malicious traffic. The proposed system has been tested through a series of experiments using traffic traces originating from two well-known P2P botnets and diverse non-malicious applications. The results indicate that the system is able to detect botnet traffic accurately and in a timely manner using purely flow-based traffic analysis and supervised machine learning. Additionally, the results show that, in order to achieve accurate detection, traffic flows need to be monitored for only a limited time period and a limited number of packets per flow. This indicates the strong potential of using the proposed approach within a future on-line detection framework.
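
A minimal sketch of the flow-based classification step, assuming synthetic per-flow statistics; the paper's actual feature set, traffic traces, and the eight compared algorithms are not reproduced here.

```python
# Illustrative flow-based classifier; the feature columns are hypothetical
# stand-ins for per-flow statistics computed from traffic traces.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
# columns: duration, packets, bytes, mean inter-arrival time, mean payload size
X = rng.random((1000, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)   # 1 = botnet flow, 0 = benign (toy rule)

clf = RandomForestClassifier(n_estimators=100, random_state=1)
print(cross_val_score(clf, X, y, cv=5, scoring="f1").mean())
```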

Tuia, D., Munoz-Mari, J., Rojo-Alvarez, J.L., Martinez-Ramon, M., Camps-Valls, G.  2014.  Explicit Recursive and Adaptive Filtering in Reproducing Kernel Hilbert Spaces. Neural Networks and Learning Systems, IEEE Transactions on. 25:1413-1419.

This brief presents a methodology for developing recursive filters in reproducing kernel Hilbert spaces. Unlike previous approaches that exploit the kernel trick on filtered and then mapped samples, we explicitly define the model recursivity in the Hilbert space. For that, we exploit some properties of functional analysis and recursive computation of dot products without the need for preimaging or a training dataset. We illustrate the feasibility of the methodology in the particular case of the γ-filter, an infinite impulse response filter with controlled stability and memory depth. Different algorithmic formulations emerge from the signal model. Experiments in chaotic and electroencephalographic time series prediction, complex nonlinear system identification, and adaptive antenna array processing demonstrate the potential of the approach for scenarios where recursivity and nonlinearity have to be readily combined.
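
For reference, the plain (non-kernel) γ-filter is easy to state in code; the brief's contribution is carrying this recursion out in the RKHS rather than the linear version sketched here. Coefficients and input below are arbitrary.

```python
# Linear gamma-filter reference; mu controls stability and memory depth.
import numpy as np

def gamma_filter(u, w, mu):
    """y(n) = sum_k w[k] * x_k(n), with the gamma memory recursion
    x_0(n) = u(n);  x_k(n) = (1-mu)*x_k(n-1) + mu*x_{k-1}(n-1)."""
    K = len(w)
    x = np.zeros(K)              # gamma memory taps x_0 .. x_{K-1}
    y = np.zeros(len(u))
    for n, un in enumerate(u):
        x_prev = x.copy()
        x[0] = un
        for k in range(1, K):
            x[k] = (1 - mu) * x_prev[k] + mu * x_prev[k - 1]
        y[n] = w @ x
    return y

u = np.sin(0.1 * np.arange(200))
print(gamma_filter(u, w=np.array([0.5, 0.3, 0.2]), mu=0.7)[:5])
```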

Huang, T., Drake, B., Aalfs, D., Vidakovic, B.  2014.  Nonlinear Adaptive Filtering with Dimension Reduction in the Wavelet Domain. Data Compression Conference (DCC), 2014. :408-408.

Recent advances in adaptive filter theory and the hardware for signal acquisition have led to the realization that purely linear algorithms are often not adequate in these domains. Nonlinearities in the input space have become apparent with today's real-world problems, and algorithms that process the data must keep pace with the advances in signal acquisition. Recently, kernel adaptive (online) filtering algorithms have been proposed that make no assumptions regarding the linearity of the input space. Additionally, advances in wavelet data compression/dimension reduction have led to new algorithms that are appropriate for producing a hybrid nonlinear filtering framework. In this paper we utilize a combination of wavelet dimension reduction and kernel adaptive filtering. We derive algorithms in which the dimension of the data is reduced by a wavelet transform, followed by kernel adaptive filtering on the reduced-domain data to find the appropriate model parameters, demonstrating improved minimization of the mean-squared error (MSE). Another important feature of our methods is that the wavelet filter is also chosen based on the data, on the fly. In particular, it is shown that by using a few optimal wavelet coefficients from the constructed wavelet filter, for both training and testing data sets, as the input to the kernel adaptive filter, the learning curve (MSE) converges near the optimum. We demonstrate these algorithms on simulated data and a real data set from food processing.
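
A hedged sketch of the two-stage idea: a single-level Haar transform keeps only the largest coefficients, which then feed a standard kernel LMS (KLMS) filter. The paper's data-driven wavelet selection and exact algorithms are not reproduced; all data is synthetic.

```python
# Wavelet dimension reduction followed by a simple kernel adaptive filter.
import numpy as np

def haar_reduce(x, keep):
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # approximation coefficients
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # detail coefficients
    c = np.concatenate([a, d])
    idx = np.argsort(np.abs(c))[-keep:]    # keep the largest coefficients
    return c[np.sort(idx)]

def klms(X, d, eta=0.2, sigma=1.0):
    """Kernel LMS: grow a Gaussian-kernel expansion one sample at a time."""
    centers, alphas, preds = [], [], []
    for x, target in zip(X, d):
        if centers:
            k = np.exp(-np.sum((np.array(centers) - x) ** 2, axis=1) / (2 * sigma**2))
            y = float(k @ np.array(alphas))
        else:
            y = 0.0
        preds.append(y)
        centers.append(x)
        alphas.append(eta * (target - y))  # new coefficient from the prediction error
    return np.array(preds)

rng = np.random.default_rng(2)
windows = rng.random((100, 16))            # sliding windows of a signal
targets = windows.sum(axis=1)              # toy regression target
X_red = np.array([haar_reduce(w, keep=4) for w in windows])
print(np.mean((klms(X_red, targets) - targets) ** 2))
```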

2015-05-05
Baughman, A.K., Chuang, W., Dixon, K.R., Benz, Z., Basilico, J.  2014.  DeepQA Jeopardy! Gamification: A Machine-Learning Perspective. Computational Intelligence and AI in Games, IEEE Transactions on. 6:55-66.

DeepQA is a large-scale natural language processing (NLP) question-and-answer system that responds across a breadth of structured and unstructured data, from hundreds of analytics that are combined with over 50 models trained through machine learning. After the historic 2011 milestone of defeating the two best human players in the Jeopardy! game show, the technology behind IBM Watson, DeepQA, is undergoing gamification into real-world business problems. Gamifying a business domain for Watson is a composite of functional, content, and training adaptation for nongame play. During domain gamification for medical, financial, government, or any other business, each system change affects the machine-learning process. Whereas the original Watson Jeopardy! had a class distribution of positive-to-negative labels of 1:100, in adaptation the computed training instances, question-and-answer pairs transformed into true-false labels, result in a very low positive-to-negative ratio of 1:100,000. Such extreme initial class imbalance during domain gamification poses a big challenge for the Watson machine-learning pipelines. The combination of ingested corpus sets, question-and-answer pairs, configuration settings, and NLP algorithms contributes to this challenging data state. We propose several data engineering techniques, such as answer key vetting and expansion, source ingestion, oversampling classes, and question set modifications, to increase the computed true labels. In addition, algorithm engineering, such as an implementation of Newton-Raphson logistic regression with a regularization term, relaxes the constraints of class imbalance during training adaptation. We conclude by empirically demonstrating that data and algorithm engineering are complementary and indispensable for overcoming the challenges in this first Watson gamification for real-world business problems.
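
The algorithm-engineering step, Newton-Raphson (IRLS) logistic regression with an L2 regularization term, can be sketched directly; the data below is synthetic with a deliberately skewed class ratio, not Watson training data.

```python
# Newton-Raphson (IRLS) logistic regression with L2 regularization.
import numpy as np

def newton_logreg(X, y, lam=1.0, iters=20):
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - p) - lam * w                     # regularized gradient
        R = p * (1 - p)                                    # IRLS weights
        H = X.T @ (X * R[:, None]) + lam * np.eye(X.shape[1])
        w += np.linalg.solve(H, grad)                      # Newton step; lam keeps H well-conditioned
    return w

rng = np.random.default_rng(3)
X = rng.normal(size=(5000, 4))
y = (rng.random(5000) < 0.01).astype(float)   # roughly 1:100 positive-to-negative ratio
print(newton_logreg(X, y))
```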

Shar, L., Briand, L., Tan, H.  2014.  Web Application Vulnerability Prediction using Hybrid Program Analysis and Machine Learning. Dependable and Secure Computing, IEEE Transactions on. PP:1-1.

Due to limited time and resources, web software engineers need support in identifying vulnerable code. A practical approach to predicting vulnerable code would enable them to prioritize security auditing efforts. In this paper, we propose using a set of hybrid (static+dynamic) code attributes that characterize input validation and input sanitization code patterns and are expected to be significant indicators of web application vulnerabilities. Because static and dynamic program analyses complement each other, both techniques are used to extract the proposed attributes in an accurate and scalable way. Current vulnerability prediction techniques rely on the availability of data labeled with vulnerability information for training. For many real-world applications, past vulnerability data is often not available, or at least not complete. Hence, to cover both situations, whether labeled past data is fully available or not, we apply both supervised and semi-supervised learning when building vulnerability predictors based on hybrid code attributes. Given that semi-supervised learning is entirely unexplored in this domain, we describe how to use this learning scheme effectively for vulnerability prediction. We performed empirical case studies on seven open source projects where we built and evaluated supervised and semi-supervised models. When cross-validated with fully available labeled data, the supervised models achieve an average of 77 percent recall and 5 percent probability of false alarm for predicting SQL injection, cross-site scripting, remote code execution, and file inclusion vulnerabilities. With a low amount of labeled data, the semi-supervised model showed an average improvement of 24 percent higher recall and 3 percent lower probability of false alarm over the supervised model, suggesting that semi-supervised learning may be a preferable solution for many real-world applications where vulnerability data is missing.
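
To make the semi-supervised setting concrete, here is a minimal sketch using scikit-learn's label spreading (one semi-supervised scheme; the paper does not prescribe this particular one), with unlabeled samples marked -1 and fabricated stand-ins for the hybrid code attributes.

```python
# Semi-supervised prediction with mostly unlabeled samples (-1).
import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(4)
X = rng.random((300, 6))          # e.g., counts of validation/sanitization patterns (invented)
y = rng.integers(0, 2, 300)       # 1 = vulnerable, 0 = safe
y_partial = y.copy()
y_partial[30:] = -1               # only the first 30 labels are known

model = LabelSpreading(kernel="knn", n_neighbors=7).fit(X, y_partial)
print((model.transduction_[30:] == y[30:]).mean())   # accuracy on the unlabeled part
```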

2015-05-04
Okuno, S., Asai, H., Yamana, H.  2014.  A challenge of authorship identification for ten-thousand-scale microblog users. Big Data (Big Data), 2014 IEEE International Conference on. :52-54.

Internet security issues require authorship identification for all kinds of internet content; however, authorship identification for microblog users is much harder than for other documents because microblog texts are very short. Moreover, when the number of candidates becomes large, i.e., big data, identification takes a long time. Our proposed method solves these problems. The experimental results show that our method successfully identifies authorship with 53.2% precision out of 10,000 microblog users in almost half the execution time of the previous method.
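
The abstract does not detail the method, so the sketch below is only a common short-text authorship baseline: character n-gram TF-IDF features with a linear SVM; the posts and authors are invented.

```python
# Common authorship-identification baseline for short texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

posts = ["just shipped it!!", "Reading again...", "coffee first, code later", "Deadline moved :("]
authors = ["a1", "a2", "a1", "a2"]

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # robust for very short texts
    LinearSVC(),
)
clf.fit(posts, authors)
print(clf.predict(["shipped the build!!"]))
```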

Marukatat, R., Somkiadcharoen, R., Nalintasnai, R., Aramboonpong, T.  2014.  Authorship Attribution Analysis of Thai Online Messages. Information Science and Applications (ICISA), 2014 International Conference on. :1-4.

This paper presents a framework to identify the authors of Thai online messages. The identification is based on 53 writing attributes, and the selected algorithms are the support vector machine (SVM) and the C4.5 decision tree. Experimental results indicate that the overall accuracies achieved by the SVM and the C4.5 were 79% and 75%, respectively; this difference was not statistically significant (at the 95% confidence level). As for identifying individual authors, in some cases the SVM was clearly better than the C4.5, but in other cases neither could distinguish one author from another.
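
A sketch of the reported comparison, with scikit-learn's CART decision tree standing in for C4.5 and the 53 writing attributes simulated:

```python
# Compare an RBF SVM against a decision tree on simulated stylometric features.
import numpy as np
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.random((400, 53))        # 53 writing attributes per message (fabricated)
y = rng.integers(0, 10, 400)     # 10 candidate authors

for name, clf in [("SVM", SVC(kernel="rbf")), ("tree", DecisionTreeClassifier())]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())
```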

Pratanwanich, N., Lio, P.  2014.  Who Wrote This? Textual Modeling with Authorship Attribution in Big Data. Data Mining Workshop (ICDMW), 2014 IEEE International Conference on. :645-652.

By representing large corpora with concise and meaningful elements, topic-based generative models aim to reduce dimension and capture the content of documents. These techniques originally analyze the words in documents, but their extensions now accommodate metadata such as authorship information, which has proven useful for textual modeling. Learning authorship makes it possible to extract author interests and to assign authors to anonymous texts. The Author-Topic (AT) model, an unsupervised learning technique, successfully exploits authorship information to model both documents and author interests using topic representations. However, the AT model assumes that each author contributes equally to multiple-author documents. To overcome this limitation, we assume that authors contribute to a document in different degrees, modeled with a Dirichlet distribution. This automatically transforms the unsupervised AT model into the Supervised Author-Topic (SAT) model, which enables authorship prediction on anonymous texts. The SAT model outperforms the AT model for identifying authors of documents written by either single or multiple authors, with a better Receiver Operating Characteristic (ROC) curve and a significantly higher Area Under the Curve (AUC). The SAT model not only achieves performance competitive with state-of-the-art techniques such as random forests but also retains the characteristics of unsupervised models for information discovery, i.e., word distributions of topics, author interests, and author contributions.
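
A hedged sketch of the unsupervised AT baseline using gensim's AuthorTopicModel (assuming that API; the SAT extension itself is not available off the shelf, and the toy corpus is invented):

```python
# Unsupervised Author-Topic model: documents and author interests as topics.
from gensim.corpora import Dictionary
from gensim.models import AuthorTopicModel

docs = [["malware", "detection", "traffic"],
        ["topic", "model", "authorship"],
        ["traffic", "flow", "botnet"]]
author2doc = {"alice": [0, 2], "bob": [1]}   # which documents each author wrote

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
model = AuthorTopicModel(corpus=corpus, num_topics=2,
                         id2word=dictionary, author2doc=author2doc)
print(model.get_author_topics("alice"))      # author interests as a topic mixture
```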

Luque, J., Anguera, X.  2014.  On the modeling of natural vocal emotion expressions through binary key. Signal Processing Conference (EUSIPCO), 2014 Proceedings of the 22nd European. :1562-1566.

This work presents a novel method to estimate naturally expressed emotions in speech through binary acoustic modeling. Standard acoustic features are mapped to a binary value representation, and a support vector regression model is used to correlate them with three continuous emotional dimensions. Three different sets of speech features, two based on spectral parameters and one on prosody, are compared on the VAM corpus, a set of spontaneous dialogues from a German TV talk show. The regression analysis, in terms of correlation coefficient and mean absolute error, shows that binary key modeling is able to successfully capture speaker emotion characteristics. The proposed algorithm obtains results comparable to those reported in the literature while relying on a much smaller set of acoustic descriptors. Furthermore, we also report preliminary results based on the combination of the binary models, which brings further performance improvements.
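
A minimal sketch of the pipeline's shape, with a crude median threshold standing in for the paper's binary-key construction and one continuous dimension (valence) as the regression target; all features are synthetic.

```python
# Binarized acoustic features feeding a support vector regressor.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(6)
feats = rng.normal(size=(150, 20))       # stand-in for spectral/prosodic descriptors
valence = rng.uniform(-1, 1, 150)        # one of three continuous emotional dimensions

binary_key = (feats > np.median(feats, axis=0)).astype(float)   # crude binarization
svr = SVR(kernel="rbf").fit(binary_key, valence)
print(svr.predict(binary_key[:3]))
```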

Liu, Y., Hatzinakos, D.  2014.  Human acoustic fingerprints: A novel biometric modality for mobile security. Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. :3784-3788.

Recently, the demand for more robust protection against unauthorized use of mobile devices has been growing rapidly. This paper presents a novel biometric modality, Transient Evoked Otoacoustic Emission (TEOAE), for mobile security. Prior works have investigated TEOAE for biometrics in a setting where an individual is to be identified among a pre-enrolled identity gallery. However, this limits the applicability to the mobile environment, where attacks in most cases come from imposters previously unknown to the system. Therefore, we employ an unsupervised learning approach based on an autoencoder neural network to tackle this blind recognition problem. The learning model is trained on a generic dataset and used to verify an individual in a random population. We also introduce a mobile biometric system framework designed with practical application in mind. Experiments show the merits of the proposed method, and system performance is further evaluated by cross-validation, achieving an average EER of 2.41%.
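
One way to realize this verification idea, sketched with an MLP regressor trained to reconstruct its input as a stand-in for the paper's autoencoder network; acceptance is by reconstruction error, and both the threshold and the features are fabricated.

```python
# Autoencoder-style verifier: accept if a sample reconstructs well.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(7)
X_generic = rng.normal(size=(500, 32))   # stand-in for generic TEOAE-like features
ae = MLPRegressor(hidden_layer_sizes=(16, 4, 16), max_iter=2000, random_state=7)
ae.fit(X_generic, X_generic)             # learn to reconstruct the input

def verify(sample, threshold=1.5):
    err = np.mean((ae.predict(sample[None, :]) - sample) ** 2)
    return err < threshold               # accept if reconstruction error is small

print(verify(X_generic[0]), verify(rng.normal(3.0, 1.0, 32)))
```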

2015-04-30
Frauenstein, E.D., Von Solms, R.  2014.  Combatting phishing: A holistic human approach. Information Security for South Africa (ISSA), 2014. :1-10.

Phishing remains a lucrative market for cyber criminals, mostly because of the vulnerable human element. Through emails and spoofed websites, phishers exploit almost any opportunity, using major events, considerable financial awards, fake warnings, and the trusted reputation of established organizations as a basis to gain their victims' trust. For many years, humans have often been referred to as the 'weakest link' in protecting information. To gain their victims' trust, phishers continue to use sophisticated-looking emails and spoofed websites to trick them, relying on their victims' lack of knowledge, lax security behavior, and organizations' inadequate security measures for protecting themselves and their clients. As such, phishing security controls and vulnerabilities can arguably be classified into three main elements, namely human factors (H), organizational aspects (O), and technological controls (T). All three of these elements share the common feature of human involvement, and as such, security gaps are inevitable. Each element also functions as both a security control and a security vulnerability. A holistic framework for combatting phishing is therefore required, whereby the human feature in all three of these elements is enhanced by means of a security education, training and awareness programme. This paper discusses the educational factors required to form part of such a holistic framework, addressing the HOT elements as well as the relationships between them. The development of this framework uses the principles of design science to ensure that it is developed with rigor. Furthermore, this paper reports on the verification of the framework.

Katkar, V.D., Bhatia, D.S.  2014.  Lightweight approach for detection of denial of service attacks using numeric to binary preprocessing. Circuits, Systems, Communication and Information Technology Applications (CSCITA), 2014 International Conference on. :207-212.

Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks exhaust the resources of a server/service and make it unavailable for legitimate users. With the increasing use of online services and attacks on these services, the importance of Intrusion Detection Systems (IDS) for detecting DoS/DDoS attacks has also grown. The detection accuracy and CPU utilization of a data-mining-based IDS are directly proportional to the quality of the training dataset used to train it. Various preprocessing methods, such as normalization, discretization, and fuzzification, are used by researchers to improve the quality of the training dataset. This paper evaluates the effect of various data preprocessing methods on the detection accuracy of a DoS/DDoS attack detection IDS and shows that the numeric-to-binary preprocessing method performs better than the others. Experimental results obtained using the KDD 99 dataset are provided to support the efficiency of the proposed combination.
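
One plausible reading of the numeric-to-binary step, sketched with a simple nonzero threshold ahead of a lightweight classifier; the KDD 99 features are simulated and the paper's exact encoding may differ.

```python
# Numeric-to-binary preprocessing ahead of a lightweight classifier.
import numpy as np
from sklearn.preprocessing import Binarizer
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(8)
X = rng.exponential(size=(2000, 10))       # simulated numeric connection features
X[rng.random((2000, 10)) < 0.6] = 0.0      # many attributes are often zero
y = rng.integers(0, 2, 2000)               # 1 = DoS/DDoS, 0 = normal (toy labels)

X_bin = Binarizer(threshold=0.0).fit_transform(X)   # numeric -> {0, 1}
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```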

2014-09-26
Rossow, C., Dietrich, C.J., Grier, C., Kreibich, C., Paxson, V., Pohlmann, N., Bos, H., van Steen, M.  2012.  Prudent Practices for Designing Malware Experiments: Status Quo and Outlook. Security and Privacy (SP), 2012 IEEE Symposium on. :65-79.

Malware researchers rely on the observation of malicious code in execution to collect datasets for a wide array of experiments, including generation of detection models, study of longitudinal behavior, and validation of prior research. For such research to reflect prudent science, the work needs to address a number of concerns relating to the correct and representative use of the datasets, presentation of methodology in a fashion sufficiently transparent to enable reproducibility, and due consideration of the need not to harm others. In this paper we study the methodological rigor and prudence in 36 academic publications from 2006 to 2011 that rely on malware execution. 40% of these papers appeared in the six highest-ranked academic security conferences. We find frequent shortcomings, including problematic assumptions regarding the use of execution-driven datasets (25% of the papers), absence of description of security precautions taken during experiments (71% of the articles), and oftentimes insufficient description of the experimental setup. Deficiencies occur in top-tier venues and elsewhere alike, highlighting a need for the community to improve its handling of malware datasets. In the hope of aiding authors, reviewers, and readers, we frame guidelines regarding transparency, realism, correctness, and safety for collecting and using malware datasets.

2014-09-17
Colbaugh, R., Glass, K.  2013.  Moving target defense for adaptive adversaries. Intelligence and Security Informatics (ISI), 2013 IEEE International Conference on. :50-55.

Machine learning (ML) plays a central role in the solution of many security problems, for example enabling malicious and innocent activities to be rapidly and accurately distinguished and appropriate actions to be taken. Unfortunately, a standard assumption in ML - that the training and test data are identically distributed - is typically violated in security applications, leading to degraded algorithm performance and reduced security. Previous research has attempted to address this challenge by developing ML algorithms which are either robust to differences between training and test data or are able to predict and account for these differences. This paper adopts a different approach, developing a class of moving target (MT) defenses that are difficult for adversaries to reverse-engineer, which in turn decreases the adversaries' ability to generate training/test data differences that benefit them. We leverage the coevolutionary relationship between attackers and defenders to derive a simple, flexible MT defense strategy which is optimal or nearly optimal for a broad range of security problems. Case studies involving two distinct cyber defense applications demonstrate that the proposed MT algorithm outperforms standard static methods, offering effective defense against intelligent, adaptive adversaries.
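
A toy illustration of the moving-target idea (not the paper's coevolutionary derivation): serve each query from a randomly chosen member of a diverse detector pool, so that a probing adversary faces a shifting decision surface that is harder to reverse-engineer.

```python
# Randomized selection among diverse detectors as a moving-target defense.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(9)
X = rng.normal(size=(500, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # 1 = malicious, 0 = innocent (toy rule)

pool = [LogisticRegression().fit(X, y),
        DecisionTreeClassifier(max_depth=4).fit(X, y),
        SVC().fit(X, y)]

def mt_predict(x):
    # Each query is answered by a randomly drawn detector from the pool.
    return pool[rng.integers(len(pool))].predict(x[None, :])[0]

print([mt_predict(X[i]) for i in range(5)])
```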