Biblio
Support vector machines (SVMs) have been widely used for classification in machine learning and data mining. However, SVMs face significant challenges in large-scale classification tasks. Recent progress has enabled additive-kernel versions of SVM to solve such large-scale problems nearly as fast as a linear classifier. This paper proposes a new accelerated mini-batch stochastic gradient descent algorithm for SVM classification with an additive kernel (AK-ASGD). On the one hand, the gradient is approximated by the sum of a scalar polynomial function for each feature dimension; on the other hand, Nesterov's acceleration strategy is used. Experimental results on benchmark large-scale classification data sets show that the proposed algorithm achieves higher testing accuracy and a faster convergence rate.
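For illustration, here is a minimal Python sketch of the acceleration component only: mini-batch SGD with Nesterov momentum on a plain linear SVM. The exact linear hinge-loss gradient stands in for the paper's per-dimension polynomial gradient approximation, and all hyperparameter values are placeholders.

```python
import numpy as np

def nesterov_svm(X, y, lam=1e-4, lr=0.1, epochs=20, batch=64, momentum=0.9, seed=0):
    """Mini-batch SGD with Nesterov momentum for a linear SVM (hinge loss).

    Labels y are assumed to be in {-1, +1}. The exact linear-SVM gradient
    stands in for AK-ASGD's additive-kernel polynomial approximation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)                       # velocity (momentum buffer)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            w_ahead = w + momentum * v    # Nesterov look-ahead point
            Xb, yb = X[idx], y[idx]
            margins = yb * (Xb @ w_ahead)
            active = margins < 1          # points violating the margin
            grad = lam * w_ahead - (yb[active, None] * Xb[active]).sum(0) / len(idx)
            v = momentum * v - lr * grad  # update velocity, then parameters
            w = w + v
    return w
```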
We are currently witnessing the development of increasingly effective author identification systems (AISs) that have the potential to track users across the internet based on their writing style. In this paper, we discuss two methods for providing user anonymity with respect to writing style: Adversarial Stylometry and Adversarial Authorship. With Adversarial Stylometry, a user attempts to obfuscate their writing style by consciously altering it. With Adversarial Authorship, a user selects an author cluster target (ACT) and writes toward this target with the intention of subverting an AIS so that the user's writing sample will be misclassified. Our results show that Adversarial Authorship via interactive evolutionary hill-climbing outperforms Adversarial Stylometry.
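As a rough illustration of the hill-climbing component only (leaving out the human in the loop), the sketch below greedily accepts candidate rewrites whose stylometric feature vectors move closer to the chosen ACT centroid; `propose_edit` is a hypothetical callback, not part of the paper, and in the interactive setting the author plays this role.

```python
import numpy as np

def hill_climb_to_act(features, act_centroid, propose_edit, steps=200, seed=0):
    """Greedy hill-climbing of a draft's stylometric feature vector toward
    an author cluster target (ACT) centroid."""
    rng = np.random.default_rng(seed)
    best = features
    best_dist = np.linalg.norm(best - act_centroid)
    for _ in range(steps):
        cand = propose_edit(best, rng)            # feature vector of a candidate rewrite
        dist = np.linalg.norm(cand - act_centroid)
        if dist < best_dist:                      # keep only improving rewrites
            best, best_dist = cand, dist
    return best, best_dist
```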
Honeypots are servers or systems built to mimic critical parts of a network, distracting attackers while logging their information to develop attack profiles. This paper discusses the design and implementation of a honeypot disguised as a REpresentational State Transfer (REST) Application Programming Interface (API). We discuss the motivation for this work, design features of the honeypot, and experimental performance results under various traffic conditions. We also present analyses of both a distributed denial of service (DDoS) attack and a cross-site scripting (XSS) malware insertion attempt against this honeypot.
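A minimal sketch of the general idea, using only the Python standard library (the endpoint, port, and response shape are invented for illustration and do not reflect the paper's actual honeypot): a fake REST endpoint that answers plausibly while logging every request for later attack profiling.

```python
import json, logging
from http.server import BaseHTTPRequestHandler, HTTPServer

logging.basicConfig(filename="honeypot.log", level=logging.INFO)

class FakeREST(BaseHTTPRequestHandler):
    """Responds like a small REST API while recording requester details."""
    def do_GET(self):
        logging.info("GET %s from %s headers=%s",
                     self.path, self.client_address[0], dict(self.headers))
        body = json.dumps({"status": "ok", "data": []}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):     # silence default stderr logging
        pass

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), FakeREST).serve_forever()
```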
This article considers an approach to static analysis of program code and the general principles of static analyzer operation. The authors identify the most important syntactic and semantic information in programs that can be used to find errors in source code. A general methodology for the development of diagnostic rules is proposed, which will improve the effectiveness of static code analyzers.
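To make the notion of a diagnostic rule concrete, here is a toy example in Python (the article is language-agnostic and does not contain this rule): a purely syntactic check implemented over the abstract syntax tree.

```python
import ast

class MutableDefaultRule(ast.NodeVisitor):
    """Toy diagnostic rule: flag mutable default arguments, a classic bug
    pattern detectable from syntactic information alone."""
    def __init__(self):
        self.findings = []

    def visit_FunctionDef(self, node):
        for default in node.args.defaults:
            if isinstance(default, (ast.List, ast.Dict, ast.Set)):
                self.findings.append(
                    f"line {default.lineno}: mutable default in '{node.name}'")
        self.generic_visit(node)

src = "def f(x, acc=[]):\n    acc.append(x)\n    return acc\n"
rule = MutableDefaultRule()
rule.visit(ast.parse(src))
print("\n".join(rule.findings))   # line 1: mutable default in 'f'
```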
Testing and fixing Web Application Firewalls (WAFs) are two relevant and complementary challenges for security analysts. Automated testing helps to cost-effectively detect vulnerabilities in a WAF by generating effective test cases, i.e., attacks. Once vulnerabilities have been identified, the WAF needs to be fixed by augmenting its rule set to filter attacks without blocking legitimate requests. However, existing research suggests that rule sets are very difficult to understand and too complex to be manually fixed. In this paper, we formalise the problem of fixing vulnerable WAFs as a combinatorial optimisation problem. To solve it, we propose an automated approach that combines machine learning with multi-objective genetic algorithms. Given a set of legitimate requests and bypassing SQL injection attacks, our approach automatically infers regular expressions that, when added to the WAF's rule set, prevent many attacks while letting legitimate requests go through. Our empirical evaluation based on both open-source and proprietary WAFs shows that the generated filter rules are effective at blocking previously identified and successful SQL injection attacks (recall between 54.6% and 98.3%), while triggering in most cases no or few false positives (false positive rate between 0% and 2%).
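The two optimisation objectives can be illustrated with a short Python sketch (the candidate rule, requests, and attacks below are toy examples, not the paper's data): each candidate regular expression is scored by the attack recall it achieves and by the false-positive rate it induces on legitimate requests.

```python
import re

def rule_objectives(pattern, attacks, legit):
    """Score a candidate filter rule on the two objectives a multi-objective
    search would trade off: attack recall (maximize) and false-positive
    rate on legitimate traffic (minimize)."""
    rx = re.compile(pattern, re.IGNORECASE)
    blocked = sum(1 for a in attacks if rx.search(a))
    false_pos = sum(1 for r in legit if rx.search(r))
    return blocked / len(attacks), false_pos / len(legit)

# Toy example with one SQL-injection-flavoured candidate rule.
attacks = ["id=1 OR 1=1--", "name=' UNION SELECT password FROM users--"]
legit = ["id=42", "name=O'Brien"]
print(rule_objectives(r"(union\s+select|or\s+1=1)", attacks, legit))  # (1.0, 0.0)
```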
Recently, conversational agents have attracted the attention of many companies such as IBM, Facebook, Google, and Amazon, which have focused on developing tools or APIs (Application Programming Interfaces) that let developers create their own chatbots. In this paper, we focus on new approaches to evaluating such systems and present recommendations resulting from the evaluation of a real chatbot use case. Testing conversational agents or chatbots is not a trivial task, due to the multitude of aspects/tasks (e.g., natural language understanding, dialog management, and response generation) that must be considered both separately and in combination. Also, the creation of a general testing tool is a challenge, since evaluation is very sensitive to the application context. Finally, exhaustive testing can be a tedious task for the project team, which creates the need for a tool to perform it automatically. This paper opens a discussion about how testing tools for conversational systems are essential both to ensure that such systems function well and to guide interface designers in developing consistent conversational interfaces.
Ensuring software security is essential for developing reliable software. Software can suffer from security problems due to weaknesses in code constructs introduced during development. Our goal is to relate software security to different code constructs so that developers can become aware early of coding weaknesses that might be related to a software vulnerability. In this study, we chose Java nano-patterns as code constructs: method-level patterns defined on the attributes of Java methods. This study aims to find the correlation between software vulnerability and these method-level structural code constructs. For our first case study, we identified the vulnerable methods in 39 versions of three major releases of Apache Tomcat. We extracted nano-patterns from the affected methods of these releases. We also extracted nano-patterns from the non-vulnerable methods of Apache Tomcat; for this, we selected the last version of each of the three major releases (6.0.45 for release 6, 7.0.69 for release 7, and 8.0.33 for release 8) as the non-vulnerable versions. Then, we compared the nano-pattern distributions in vulnerable versus non-vulnerable methods. In our second case study, we extracted nano-patterns from the affected methods of three vulnerable J2EE web applications: Blueblog 1.0, Personalblog 1.2.6, and Roller 0.9.9, all of which were deliberately made vulnerable for testing purposes. We found that some nano-patterns, such as objCreator, staticFieldReader, typeManipulator, looper, exceptions, localWriter, and arrReader, are more prevalent in affected methods, whereas others, such as straightLine, are more prevalent in non-affected methods. We conclude that nano-patterns can be used as indicators of the vulnerability-proneness of code.
We use model-based testing techniques to detect logical vulnerabilities in implementations of the Wi-Fi handshake. This reveals new fingerprinting techniques, multiple downgrade attacks, and Denial of Service (DoS) vulnerabilities. Stations use the Wi-Fi handshake to securely connect with wireless networks. In this handshake, mutually supported capabilities are determined, and fresh pairwise keys are negotiated. As a result, a proper implementation of the Wi-Fi handshake is essential in protecting all subsequent traffic. To detect the presence of erroneous behaviour, we propose a model-based technique that generates a set of representative test cases. These tests cover all states of the Wi-Fi handshake, and explore various edge cases in each state. We then treat the implementation under test as a black box, and execute all generated tests. Determining whether a failed test introduces a security weakness is done manually. We tested 12 implementations using this approach, and discovered irregularities in all of them. Our findings include fingerprinting mechanisms, DoS attacks, and downgrade attacks where an adversary can force usage of the insecure WPA-TKIP cipher. Finally, we explain how one of our downgrade attacks highlights incorrect claims made in the 802.11 standard.
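As a rough sketch of the test-generation idea (the states, messages, and mutations below are simplified placeholders, not the paper's actual model of the 802.11 handshake), one can enumerate the happy path through the handshake and mutate one step at a time, so that every state is exercised with every edge case while earlier messages stay well-formed.

```python
from itertools import product

# A heavily simplified handshake model: each state and the message it
# expects (hypothetical labels, not the full 802.11 state machine).
MODEL = {
    "IDLE":      ["AssocReq"],
    "ASSOC":     ["Msg1"],
    "PTK-START": ["Msg3"],
    "DONE":      [],
}
ORDER = ["IDLE", "ASSOC", "PTK-START", "DONE"]
EDGE_CASES = ["replayed", "wrong-MIC", "plaintext"]   # per-message mutations

def generate_tests():
    """Yield test cases covering every state with every edge-case mutation
    applied to that state's expected message."""
    happy = [MODEL[s][0] for s in ORDER[:-1]]
    for i, mutation in product(range(len(happy)), EDGE_CASES):
        test = list(happy)
        test[i] = f"{happy[i]}[{mutation}]"           # mutate exactly one step
        yield test

for case in generate_tests():
    print(" -> ".join(case))
```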
In a world where traditional notions of privacy are increasingly challenged by the myriad companies that collect and analyze our data, it is important that decision-making entities are held accountable for unfair treatments arising from irresponsible data usage. Unfortunately, a lack of appropriate methodologies and tools means that even identifying unfair or discriminatory effects can be a challenge in practice. We introduce the unwarranted associations (UA) framework, a principled methodology for the discovery of unfair, discriminatory, or offensive user treatment in data-driven applications. The UA framework unifies and rationalizes a number of prior attempts at formalizing algorithmic fairness. It uniquely combines multiple investigative primitives and fairness metrics with broad applicability, granular exploration of unfair treatment in user subgroups, and incorporation of natural notions of utility that may account for observed disparities. We instantiate the UA framework in FairTest, the first comprehensive tool that helps developers check data-driven applications for unfair user treatment. It enables scalable and statistically rigorous investigation of associations between application outcomes (such as prices or premiums) and sensitive user attributes (such as race or gender). Furthermore, FairTest provides debugging capabilities that let programmers rule out potential confounders for observed unfair effects. We report on use of FairTest to investigate and in some cases address disparate impact, offensive labeling, and uneven rates of algorithmic error in four data-driven applications. As examples, our results reveal subtle biases against older populations in the distribution of error in a predictive health application and offensive racial labeling in an image tagger.
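To make the core investigative primitive concrete, here is one possible association check in Python (a chi-square test via pandas/SciPy; FairTest's actual metrics and API may differ): it tests whether an application outcome and a sensitive attribute are statistically associated, optionally within a user subgroup.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def check_association(df, outcome, attr, subgroup=None):
    """Chi-square test of association between an outcome column and a
    sensitive-attribute column, optionally restricted to a subgroup
    selected by a boolean predicate over the DataFrame."""
    if subgroup is not None:
        df = df[subgroup(df)]
    table = pd.crosstab(df[outcome], df[attr])   # contingency table
    chi2, pval, _, _ = chi2_contingency(table)
    return chi2, pval

# Hypothetical usage: is the error rate associated with age bracket
# among users of a predictive health application?
# chi2, p = check_association(df, "prediction_error", "age_bracket")
```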
In this paper, a novel feature selection method for detecting botnets in their Command and Control (C&C) phase is presented. A major problem is that researchers have proposed features based on their expertise, but there is no method for evaluating these features, and some of them may yield a lower detection rate than others. To this end, we find the feature set, based on connections of botnets in their C&C phase, that maximizes the detection rate for these botnets. A Genetic Algorithm (GA) was used to select the set of features that gives the highest detection rate. We used the machine learning algorithm C4.5 to classify connections as belonging to a botnet or not. The datasets used in this paper were extracted from the ISOT and ISCX repositories. Tests were performed to find the best parameters for the GA and for C4.5. We also performed experiments to obtain the best set of features for each botnet analyzed (specific), as well as for each type of botnet (general). The results show a considerable reduction in the number of features and a higher detection rate than the related work.
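A condensed Python sketch of the approach follows, with scikit-learn's DecisionTreeClassifier (entropy criterion) standing in for C4.5 and all GA parameters chosen arbitrarily: individuals are feature bitmasks, and fitness is cross-validated classification accuracy on the selected features.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def ga_feature_selection(X, y, pop=20, gens=30, p_mut=0.05, seed=0):
    """Toy GA over feature bitmasks; fitness is CV accuracy of a decision
    tree (a stand-in for C4.5)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    population = rng.integers(0, 2, size=(pop, d)).astype(bool)

    def fitness(mask):
        if not mask.any():
            return 0.0
        clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
        return cross_val_score(clf, X[:, mask], y, cv=3).mean()

    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        parents = population[np.argsort(scores)[-pop // 2:]]   # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, d)                 # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(d) < p_mut           # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(m) for m in population])
    return population[scores.argmax()], scores.max()
```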
Hierarchical Graph Neuron (HGN) is an extension of the network-centric Graph Neuron (GN) algorithm, which is used to perform parallel distributed pattern recognition. In this research, the HGN scheme is used to classify intrusion attacks in computer networks. Patterns of intrusion attacks are preprocessed in three steps: selecting attributes using information gain attribute evaluation, discretizing the selected attributes using a supervised entropy-based discretization method, and selecting the training data using the K-Means clustering algorithm. After the preprocessing stage, the HGN scheme is deployed to classify intrusion attacks using the KDD Cup 99 dataset. The classification results are measured in terms of accuracy rate, detection rate, false positive rate, and true negative rate. The test results show that the HGN scheme is promising and stable in classifying intrusion attack patterns, with an accuracy rate reaching 96.27%, a detection rate reaching 99.20%, a true negative rate below 15.73%, and a false positive rate as low as 0.80%.
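The three preprocessing steps might look as follows in Python (a sketch with stand-ins: scikit-learn's mutual information for information gain, quantile binning in place of entropy-based MDLP discretization, and nearest-to-centroid selection of training records).

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.cluster import KMeans

def preprocess(X, y, n_features=10, n_bins=5, n_train=1000, seed=0):
    """Three-step preprocessing sketch for X (numeric array) and y (labels)."""
    # 1. Rank attributes by (an estimate of) information gain; keep the top ones.
    gain = mutual_info_classif(X, y, random_state=seed)
    top = np.argsort(gain)[-n_features:]
    X = X[:, top]
    # 2. Discretize the selected attributes (quantile bins stand in for
    #    supervised entropy-based discretization).
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")
    X = disc.fit_transform(X)
    # 3. Pick one representative record per K-Means cluster as training data.
    km = KMeans(n_clusters=min(n_train, len(X)), n_init=10, random_state=seed).fit(X)
    reps = np.array([np.argmin(((X - c) ** 2).sum(1)) for c in km.cluster_centers_])
    return X[reps], np.asarray(y)[reps], top
```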
The Internet of Things (IoT) era envisions billions of interconnected devices capable of providing new interactions between the physical and digital worlds, offering a new range of content and services. At the fundamental level, IoT nodes are physical devices that exist in the real world, consisting of networking, sensor, and processing components. Some application examples, including mobile and pervasive computing and sensor networks, require distributed device deployments that feed information into databases for exploitation. While the data can be centralized, there are advantages, such as system resiliency and security, to adopting a decentralized architecture that pushes computation and storage to the network edge and onto the IoT devices themselves. However, these devices tend to be far more limited in computational power than traditional racked servers. This research explores running the Cassandra distributed database on IoT-representative device specifications. Experiments, conducted on both virtual machines and Raspberry Pis to simulate IoT devices, examined latency issues with network compression, processing workloads, and various memory and node configurations in laboratory settings. We demonstrate that distributed databases are feasible on Raspberry Pis as IoT-representative devices and present findings that may help in application design.
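For instance, a simple write-latency probe against such a cluster could look like this with the DataStax Python driver (the node addresses, keyspace, and schema are placeholders, not the paper's setup).

```python
import time
from cassandra.cluster import Cluster   # pip install cassandra-driver

# Connect to a cluster of Raspberry-Pi-class nodes (placeholder addresses).
cluster = Cluster(["192.168.1.10", "192.168.1.11", "192.168.1.12"])
session = cluster.connect()
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS iot
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2}""")
session.execute("""
    CREATE TABLE IF NOT EXISTS iot.readings (
        sensor_id int, ts timestamp, value double,
        PRIMARY KEY (sensor_id, ts))""")

insert = session.prepare(
    "INSERT INTO iot.readings (sensor_id, ts, value) VALUES (?, toTimestamp(now()), ?)")
latencies = []
for i in range(1000):                    # crude single-client write probe
    t0 = time.perf_counter()
    session.execute(insert, (i % 8, float(i)))
    latencies.append(time.perf_counter() - t0)
print(f"median write latency: {sorted(latencies)[len(latencies)//2]*1e3:.1f} ms")
```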
The Android operating system is constantly overwhelmed by new sophisticated threats and new zero-day attacks. While aggressive malware, for instance malicious behaviors that encrypt data files or lock the GUI, makes no attempt to hide the infection from users (who can then try to disinfect the device), there is also malware that aims to perform its malicious actions stealthily, i.e., without manifesting its presence to the user. This kind of malware is harder to recognize, because users are not aware of its presence. In this paper we propose FormalDroid, a tool able to detect silent malicious behaviors and to localize the malicious payload in an Android application. Evaluating real-world malware samples, we obtain an accuracy of 0.94.
The security of computer programs and systems is a very critical issue. With the number of attacks launched on computer networks and software, businesses and IT professionals are taking steps to ensure that their information systems are as secure as possible. However, many programmers do not think about adding security to their programs until their projects are near completion. This is a major mistake, because a system is only as secure as its weakest link. If security is viewed as an afterthought, it is highly likely that the resulting system will have a large number of vulnerabilities, which could be exploited by attackers. One of the reasons programmers overlook adding security to their code is that it is viewed as a complicated or time-consuming process. This paper presents a tool that will help programmers think more about security and add security tactics to their code with ease. We created a model that learns from existing open source projects and documentation using machine learning and text mining techniques. Our tool contains a module that runs in the background to analyze code as the programmer types and offers suggestions about where security could be included. In addition, our tool fetches existing open source implementations of cryptographic algorithms and sample code from repositories to aid programmers in adding security easily to their projects.
There is a widening chasm between the ease of creating software and the difficulty of "building security in". This paper reviews the approach, the findings, and recent experiments from a seven-year effort to enable consistency across a large, diverse development organization and software portfolio via policies, guidance, automated tools, and services. Experience shows that developing secure software is an elusive goal for most. It requires every team to know and apply a wide range of security knowledge in the context of what software is being built, how the software will be used, and the projected threats in the environment where the software will operate. The drive for better outcomes for secure development and increased developer productivity led to experiments to augment developer knowledge and eventually realize the goal of "building the right security in".
Smartphones have become the pervasive personal computing platform. Recent years have thus witnessed exponential growth in research and development of secure and usable authentication schemes for smartphones. Several explicit (e.g., PIN-based) and/or implicit (e.g., biometrics-based) authentication methods have been designed and published in the literature, and some of them have been embedded in commercial mobile products as well. However, published studies report only the brighter side of the proposed scheme(s), e.g., the high accuracy attained by the proposed mechanism, while other associated operational issues, such as computational overhead, robustness to different environmental conditions/attacks, and usability, are intentionally or unintentionally ignored. More specifically, most publicly available frameworks do not discuss or explore any evaluation criteria, usability measures, or environment-related measures beyond accuracy under zero-effort attacks. Thus, their baseline operation usually gives a false sense of progress. This paper therefore presents guidelines for researchers on designing, implementing, and evaluating smartphone user authentication methods so as to have a positive impact on future technological developments.
Web applications have become the leading solution for systems that need to be globally accessible, distributed, and cost-effective, and they support a wide diversity of content. At the same time, web application security has always been a major concern, given that 60% of Internet attacks target the web application platform. One of the biggest threats to this technology is the Cross-Site Scripting (XSS) attack, which occurs frequently and consistently appears in the Top 10 list of the Open Web Application Security Project (OWASP). The vulnerabilities exploited by this attack arise from an absence of checking, testing, and attention to secure coding practices. There are several alternatives for preventing the attacks associated with this threat; a Network Intrusion Detection System can be used as one solution to mitigate XSS attacks. This paper investigates XSS attack recognition and detection using regular expression pattern matching and a preprocessing method. Experiments are conducted on a testbed with the aim of revealing the behaviour of the attack.
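A minimal sketch of the detection idea in Python (the signatures and the repeated-URL-decoding preprocessing step are illustrative; the paper's actual patterns and preprocessing may differ):

```python
import re
from urllib.parse import unquote_plus

# A few illustrative signatures; real rule sets are far larger.
XSS_PATTERNS = [
    re.compile(r"<\s*script[^>]*>", re.IGNORECASE),
    re.compile(r"on\w+\s*=", re.IGNORECASE),      # inline event handlers
    re.compile(r"javascript\s*:", re.IGNORECASE),
]

def preprocess(payload: str) -> str:
    """Normalize before matching: repeated URL-decoding defeats %-encoding tricks."""
    prev = None
    while payload != prev:
        prev, payload = payload, unquote_plus(payload)
    return payload.lower()

def is_xss(payload: str) -> bool:
    decoded = preprocess(payload)
    return any(p.search(decoded) for p in XSS_PATTERNS)

print(is_xss("q=%3Cscript%3Ealert(1)%3C%2Fscript%3E"))   # True
print(is_xss("q=hello+world"))                            # False
```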
This paper investigates practical strategies for distributing payload across images with content-adaptive steganography and for pooling outputs of a single-image detector for steganalysis. Adopting a statistical model for the detector's output, the steganographer minimizes the power of the most powerful detector of an omniscient Warden, while the Warden, informed by the payload spreading strategy, detects with the likelihood ratio test in the form of a matched filter. Experimental results with state-of-the-art content-adaptive additive embedding schemes and rich models are included to show the relevance of the results.
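Under the Gaussian shift model commonly adopted in this line of work (an assumption here; the paper's exact model may differ), with detector output $x_i$ for image $i$ and expected shift $\mu(a_i)$ when payload $a_i$ is embedded, the Warden's likelihood ratio test reduces to a matched filter:

```latex
% Assumed model for the single-image detector outputs:
%   H_0: x_i ~ N(0, sigma^2),   H_1: x_i ~ N(mu(a_i), sigma^2)
H_0:\; x_i \sim \mathcal{N}(0,\sigma^2), \qquad
H_1:\; x_i \sim \mathcal{N}(\mu(a_i),\sigma^2)
% The log-likelihood ratio is then linear in the outputs, i.e. a matched
% filter correlating the observations with the expected shifts:
\Lambda(\mathbf{x}) \;=\; \sum_{i=1}^{n} \mu(a_i)\, x_i
\;\underset{H_0}{\overset{H_1}{\gtrless}}\; \tau
```

Against this test, the steganographer's payload-spreading strategy $(a_1,\dots,a_n)$ is chosen to minimize the power of the most powerful detector.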
Several security requirements identification methods have been proposed by researchers in up-front requirements engineering (RE). However, in open source software (OSS) projects, developers use lightweight representations and refine requirements frequently by writing comments. They also tend to discuss security aspects in comments by providing code snippets, attachments, and links to external resources. Since most security requirements identification methods in up-front RE are based on textual information retrieval techniques, these methods are not suitable for OSS projects or just-in-time RE. In our study, we propose a new model based on logistic regression to identify security requirements in OSS projects. We used five metrics to build security requirements identification models and tested the performance of these metrics by applying the models to three OSS projects. Our results show that four out of the five metrics achieved high performance in intra-project testing.
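A minimal sketch of such a model in Python with scikit-learn (the five metric names below are placeholders; the paper defines its own metrics):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support

# One row per requirement/comment, one column per metric (placeholder names).
METRICS = ["keyword_density", "has_code_snippet", "has_attachment",
           "external_links", "comment_length"]

def train_security_identifier(X_train, y_train, X_test, y_test):
    """Fit a logistic-regression identifier for security requirements and
    report precision/recall/F1 on a held-out project (intra- or
    cross-project testing depends on how the split is made)."""
    clf = LogisticRegression(max_iter=1000, class_weight="balanced")
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    p, r, f1, _ = precision_recall_fscore_support(y_test, pred, average="binary")
    return clf, (p, r, f1)
```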
In this paper we present a case study of applying fitness dimensions in API design assessment. We argue that API assessment is company specific and should take into consideration various stakeholders in the API ecosystem. We identified new fitness dimensions and introduced the notion of design considerations for fitness dimensions such as priorities, tradeoffs, and technical versus cognitive classification.
Software security is an important aspect of ensuring software quality. The goal of this study is to help developers evaluate software security using traceable patterns and software metrics during development. The concept of traceable patterns is similar to that of design patterns, but traceable patterns can be automatically recognized and extracted from source code. If these patterns can predict vulnerable code better than traditional software metrics, they can be used in a vulnerability prediction model to classify code as vulnerable or not. This study explores the performance of some code patterns in vulnerability prediction, compares them with traditional software metrics, and uses the findings to build an effective vulnerability prediction model. We evaluate security vulnerabilities reported for Apache Tomcat, Apache CXF, and three stand-alone Java web applications. We use machine learning and statistical techniques to predict vulnerabilities, using traceable patterns and metrics as features. We found that patterns have a lower false negative rate and higher recall in detecting vulnerable code than traditional software metrics.
In-vehicle networks such as Controller Area Network (CAN), FlexRay, and Ethernet are now subject to major security threats, whereby unauthorized entities can take control of the whole vehicle. This can pose very serious dangers, including accidents. Security features such as encryption and message authentication are being implemented in vehicle networks to counteract these issues. This paper proposes a set of novel validation techniques to ensure that vehicle network security is foolproof. Security validation against requirements, as well as security validation using white-box, black-box, and grey-box approaches, are put forward. Test system architecture, validation of message authentication, decoding patterns from vehicle network data, using diagnostics as a security loophole, V2V/V2X loopholes, and gateway module security testing are considered in detail. The aim of this paper is to put forward a set of tools and methods for finding and reporting any security loopholes in an in-vehicle network security implementation.
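As one concrete example of what validating message authentication involves, the sketch below implements a SecOC-style truncated MAC over a CAN payload (HMAC-SHA256 stands in for the CMAC typically used, and the key and freshness-counter handling are simplified placeholders): a test system can check that frames with stale counters or altered payloads are rejected.

```python
import hmac, hashlib, struct

KEY = b"\x00" * 16          # placeholder; real keys come from secure storage

def authenticate(can_id: int, payload: bytes, counter: int) -> bytes:
    """Append a truncated MAC over (ID, freshness counter, payload),
    loosely following the AUTOSAR SecOC pattern."""
    msg = struct.pack(">IQ", can_id, counter) + payload
    tag = hmac.new(KEY, msg, hashlib.sha256).digest()[:4]   # 32-bit truncated MAC
    return payload + tag

def verify(can_id: int, frame: bytes, counter: int) -> bool:
    payload, tag = frame[:-4], frame[-4:]
    expected = authenticate(can_id, payload, counter)[-4:]
    return hmac.compare_digest(tag, expected)

frame = authenticate(0x123, b"\x01\x02\x03\x04", counter=7)
print(verify(0x123, frame, counter=7))    # True
print(verify(0x123, frame, counter=8))    # False: stale counter, i.e. a replay
```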
This paper describes a novel aerospace electronic component risk assessment methodology and supporting virtual laboratory structure designed to augment existing supply chain management practices and aid in Microelectronics Trust Assurance. This toolkit and methodology apply structure to the unclear and evolving risk assessment problem, allowing quantification of key risks affecting both advanced and obsolete systems that rely on semiconductor technologies. The impacts of logistics & supply chain risk, technology & counterfeit risk, and faulty component risk on trusted and non-trusted procurement options are quantified. The benefits of component testing on part reliability are assessed and incorporated into counterfeit mitigation calculations. This toolkit and methodology seek to assist acquisition staff by providing actionable decision data regarding the increasing threat of counterfeit components: assessing the risks faced by systems, identifying mitigation strategies to reduce those risks, and resolving them through the optimal test and procurement path based on the component criticality risk tolerance of the program.