Biblio
Defect prediction is an active topic in software quality assurance that can help developers find potential bugs and make better use of resources. To improve prediction performance, this paper introduces cross-entropy, a common measure for natural language, as a new code metric for defect prediction tasks, and proposes a framework called DefectLearner for this process. We first build a recurrent neural network language model to learn regularities in source code from a software repository. Based on the trained model, the cross-entropy of each component can be calculated. To evaluate its ability to discriminate defect-proneness, cross-entropy is compared with 20 widely used metrics on 12 open-source projects. The experimental results show that the cross-entropy metric is more discriminative than 50% of the traditional metrics. In addition, we combine cross-entropy with traditional metric suites for more accurate defect prediction. With cross-entropy added, the performance of the prediction models improves by an average of 2.8% in F1-score.
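As a rough illustration of the metric, the sketch below computes the per-token cross-entropy of a code snippet under a language model. A toy unigram model stands in for the trained RNN; the corpus, snippet, and the prob interface are all hypothetical.

    import math
    from collections import Counter

    def cross_entropy(tokens, prob):
        """Average negative log2-probability of each token given its
        preceding context; lower values mean the code looks more
        'natural' to the language model."""
        total = sum(math.log2(prob(tok, tokens[:i]))
                    for i, tok in enumerate(tokens))
        return -total / len(tokens)

    # Stand-in for the trained RNN: a unigram model over a toy corpus.
    corpus = "if x > 0 : return x else : return 0".split()
    counts = Counter(corpus)
    unigram = lambda tok, ctx: counts.get(tok, 0.5) / len(corpus)

    print(cross_entropy("return x if x > 0 else 0".split(), unigram))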
Intellectual property (IP) and integrated circuit (IC) piracy are of increasing concern to IP/IC providers because of the globalization of IC design flows and supply chains. Such globalization is driven by the costs associated with the design, fabrication, and testing of integrated circuits, and it opens avenues for piracy. To protect designs against IC piracy, we propose a fingerprinting scheme based on side-channel power analysis and machine learning methods. The proposed method distinguishes ICs that realize a modified netlist yet the same functionality, and it incurs no hardware overhead. We specifically focus on the ability to detect minimal design variations, as quantified by the number of logic gates changed. The accuracy of the proposed scheme is greater than 96 percent, and typically 99 percent, in detecting one or more gate-level netlist changes. Additionally, the effect of temperature is investigated as part of this work. The results show 95.4 percent accuracy in detecting the exact number of gate changes when the data and the classifier use the same temperature, while training with different temperatures results in 33.6 percent accuracy. This shows the effectiveness of building temperature-dependent classifiers from simulations at known operating temperatures.
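The abstract does not name the classifier, so the sketch below uses a random forest as a stand-in: one feature vector of power-trace statistics per measured IC, labeled with the number of gate changes. The synthetic data is a placeholder for real side-channel measurements.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # X: one row of side-channel power-trace features per measured IC;
    # y: number of gate-level netlist changes (0 = golden design).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 64))
    y = rng.integers(0, 4, size=400)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X_tr, y_tr)
    print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))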
Analysts often face a challenging situation when formally verifying the implementation of a security protocol: they need to build a model of the protocol from poorly documented or undocumented code, with little or no help from the developers. Security protocol implementations frequently use services provided by libraries coded in the C programming language; automatic tools for code-level reverse engineering offer good support for comprehending the behavior of code in object-oriented languages but are ineffective for dealing with libraries in C. Here we propose a systematic, yet human-dependent, approach that combines the capabilities of state-of-the-art tools to help the analyst retrieve, step by step, the security protocol specifications from a library in C. Those specifications can then be used to create the formal model needed to carry out the analysis.
Data dependency flow has been reformulated as a context-free grammar (CFG) reachability problem, and the idea has been explored for detecting some web vulnerabilities, particularly cross-site scripting (XSS) and access control flaws. However, reformulating SQL injection vulnerability (SQLIV) detection as a grammar reachability problem has not been investigated. In this paper, concepts of data dependency flow are used to reformulate SQLIV detection as a CFG reachability problem. The paper consequently defines a reachability analysis strategy for SQLIV detection.
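Stripped of the grammar machinery, the core of such an analysis is reachability from untrusted sources to query-building sinks over data-dependency edges. The sketch below shows that plain-reachability core; the full grammar-reachability formulation adds matching constraints that a simple BFS does not capture, and the edge data is hypothetical.

    from collections import defaultdict, deque

    def reachable_sinks(edges, sources, sinks):
        """BFS over data-dependency edges: a SQL injection candidate is
        any sink (query construction point) reachable from a source
        (untrusted input) without passing through a sanitizer node."""
        graph = defaultdict(list)
        for src, dst in edges:
            graph[src].append(dst)
        seen, queue = set(sources), deque(sources)
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return [s for s in sinks if s in seen]

    # Hypothetical dependency edges extracted from a PHP script.
    edges = [("$_GET['id']", "$id"), ("$id", "mysql_query")]
    print(reachable_sinks(edges, {"$_GET['id']"}, {"mysql_query"}))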
This paper describes an approach to detecting malicious code introduced by insiders, which can compromise the data integrity of a program. The approach identifies security spots in a program, which may be either malicious or benign code; malicious code is then detected by reviewing each security spot to determine which it is. Integrity breach conditions (IBCs) for object-oriented programs are specified to identify security spots in the programs. The IBCs are specified by means of the concepts of coupling within an object or between objects. A prototype tool was developed to validate the approach with a case study.
Software insecurity has been identified as one of the leading causes of security breaches. In this paper, we revisit one strategy for addressing software insecurity: the use of software quality metrics. We utilize a multilayer deep feedforward network to examine whether there is a combination of metrics that can predict the appearance of security-related bugs. We also apply traditional machine learning algorithms such as decision trees, random forests, naïve Bayes, and support vector machines, and compare the results with those of the deep learning technique. The results demonstrate that it is possible to develop an effective predictive model that forecasts software insecurity from software metrics using deep learning. All the models generated show an accuracy of more than sixty percent, with deep learning leading the list. This finding indicates that deep learning methods combined with software metrics can be tapped to create a better forecasting model, thereby aiding software developers in predicting security bugs.
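A minimal version of such an experiment could look like the following, with scikit-learn's MLPClassifier standing in for the multilayer deep feedforward network and synthetic data standing in for the real metric suites and bug labels.

    import numpy as np
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # X: software quality metrics per module (e.g., complexity, coupling);
    # y: 1 if the module later had a security-related bug.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 20))
    y = (X[:, 0] + X[:, 3] + rng.normal(size=500) > 0).astype(int)

    model = make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                      random_state=1),
    )
    print("cv accuracy:", cross_val_score(model, X, y, cv=5).mean())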
The papers in this special section explore recent advancements in parallel graph processing. In the sphere of modern data science and data-driven applications, graph algorithms have achieved a pivotal place in advancing the state of scientific discovery and knowledge. Nearly three centuries of ideas have made graph theory and its applications a mature area of computational science. Yet, today we find ourselves at a crossroads between theory and application. Spurred by the digital revolution, data from a diverse range of high-throughput channels and devices, from across internet-scale applications, are starting to mark a new era in data-driven computing and discovery. Building robust graph models and implementing scalable graph application frameworks in the context of this new era are proving to be significant challenges. Concomitant with the digital revolution, we have also experienced an explosion in computing architectures, with a broad range of multicores, manycores, heterogeneous platforms, and hardware accelerators (CPUs, GPUs) being actively developed and deployed within servers and multinode clusters. Recent advances have started to show that, in more than one way, these two fields, graph theory and architectures, are capable of benefiting from and in fact spurring new research directions in one another. This special section aims to introduce some of the new avenues of cutting-edge research happening at the intersection of graph algorithm design and its implementation on advanced parallel architectures.
An increasing number of Internet-scale applications, such as video streaming, generate huge amounts of wide-area traffic. Such traffic, carried over the unreliable Internet without bandwidth guarantees, suffers unpredictable network performance, which is unappealing to application providers. Fortunately, Internet giants like Google and Microsoft are increasingly deploying private wide-area networks (WANs) to connect their global datacenters. Such high-speed private WANs are reliable and can provide predictable network performance. In this paper, we propose a new type of service, inter-datacenter network as a service (iDaaS), where traditional application providers can reserve bandwidth from these Internet giants to guarantee their wide-area traffic. Specifically, we design a bandwidth trading market among multiple iDaaS providers and application providers, and concentrate on the essential bandwidth pricing problem. The challenging issue is that the bandwidth price of each iDaaS provider is influenced not only by other iDaaS providers but also by the application providers. To address this issue, we characterize the interaction between iDaaS providers and application providers using a Stackelberg game model, and analyze the existence and uniqueness of the equilibrium. We further present an efficient bandwidth pricing algorithm by blending the advantages of a geometric Nash bargaining solution with the demand segmentation method. For comparison, we present two bandwidth reservation algorithms, in which each iDaaS provider's bandwidth is reserved in a weighted fair manner and a max-min fair manner, respectively. Finally, we conduct comprehensive trace-driven experiments. The evaluation results show that our proposed algorithms not only ensure the revenue of iDaaS providers but also provide bandwidth guarantees for application providers at a lower unit bandwidth price.
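To make the leader-follower structure concrete, here is a deliberately simplified, single-leader sketch: the iDaaS provider posts a unit price, each application provider responds with a linear demand curve, and the leader searches for the revenue-maximizing price within its capacity. The demand parameters are invented, and the paper's multi-provider game and bargaining-based algorithm are not reproduced here.

    import numpy as np

    # Follower i responds to price p with demand d_i(p) = max(0, a_i - b_i*p).
    a = np.array([100.0, 80.0, 60.0])   # hypothetical demand intercepts
    b = np.array([2.0, 1.5, 1.0])       # hypothetical price sensitivities
    capacity = 150.0                     # leader's link capacity

    best_price, best_revenue = 0.0, 0.0
    for p in np.linspace(0.0, 60.0, 6001):
        demand = np.maximum(0.0, a - b * p).sum()
        if demand <= capacity and p * demand > best_revenue:
            best_price, best_revenue = p, p * demand
    print(f"price={best_price:.2f}, revenue={best_revenue:.2f}")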
Machine learning (ML) algorithms provide good solutions for many security-sensitive applications, yet they themselves face the threat of adversarial attacks. A key problem in machine learning is therefore how to design feature selection algorithms that are robust against such attacks. Current research on defending against evasion attacks mainly focuses on the wrapped adversarial feature selection algorithm (WAFS), which depends on the classification algorithm and has a very high time cost for large-scale data. In contrast, the mRMR (minimum Redundancy Maximum Relevance) algorithm is one of the most popular filter algorithms, selecting features without considering any classifier during the selection process. In this paper, we propose a novel adversary-aware feature selection algorithm under the filter model based on mRMR, named FAFS. On the one hand, the algorithm takes into account the correlation between each feature and the label, as well as the redundancy between features; on the other hand, when selecting features, it considers not only the generalization ability in the absence of attack but also the robustness under attack. The performance of four algorithms, i.e., mRMR, TWFS (Traditional Wrapped Feature Selection), WAFS, and FAFS, is evaluated on spam filtering and malicious PDF detection in the perfect-knowledge attack scenario. The experimental results show that FAFS performs better under evasion attacks, with lower time cost and comparable classification accuracy.
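For reference, the base mRMR step that FAFS builds on can be sketched as a greedy loop over mutual-information scores; the adversary-aware robustness term that distinguishes FAFS is omitted here, and the toy data assumes discretized feature values.

    import numpy as np
    from sklearn.feature_selection import mutual_info_classif
    from sklearn.metrics import mutual_info_score

    def mrmr(X, y, k):
        """Greedy minimum-Redundancy Maximum-Relevance selection: at each
        step pick the feature whose relevance to the label, minus its mean
        redundancy with already-selected features, is highest."""
        relevance = mutual_info_classif(X, y, random_state=0)
        selected, remaining = [], list(range(X.shape[1]))
        while len(selected) < k and remaining:
            scores = []
            for j in remaining:
                redundancy = (np.mean([mutual_info_score(X[:, j], X[:, s])
                                       for s in selected])
                              if selected else 0.0)
                scores.append(relevance[j] - redundancy)
            selected.append(remaining.pop(int(np.argmax(scores))))
        return selected

    # Toy discretized data: features 0 and 1 jointly determine the label.
    X = np.random.default_rng(2).integers(0, 5, size=(300, 10))
    y = (X[:, 0] + X[:, 1] > 4).astype(int)
    print(mrmr(X, y, 3))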
Collaborative filtering (CF) has made it possible to build personalized recommendation models leveraging the collective data of large user groups, albeit with prescribed models that cannot easily leverage the existence of known behavioral models in particular settings. In this paper, we facilitate the combination of CF with existing behavioral models by introducing Bayesian Behavioral Collaborative Filtering (BBCF). BBCF works by embedding arbitrary (black-box) probabilistic models of human behavior in a latent variable Bayesian framework capable of collectively leveraging behavioral models trained on all users for personalized recommendation. There are three key advantages of BBCF compared to traditional CF and non-CF methods: (1) BBCF can leverage highly specialized behavioral models for specific CF use cases that may outperform the generic models used in standard CF, (2) the behavioral models used in BBCF may offer enhanced interpretability and explainability compared to generic CF methods, and (3) compared to non-CF methods that would train a behavioral model per specific user and thus may suffer when individual user data is limited, BBCF leverages the data of all users, enabling strong performance across the data availability spectrum, including the near cold-start case. Experimentally, we compare BBCF to individual and global behavioral models as well as CF techniques; our evaluation domains span sequential and non-sequential tasks with a range of behavioral models for individual users, tasks, or goal-oriented behavior. Our results demonstrate that BBCF is competitive with, if not better than, existing methods while still offering the interpretability and explainability benefits intrinsic to many behavioral models.
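The central mechanism, treating pretrained behavioral models as latent components weighted by how well each explains a user's history, can be sketched in a few lines. The two likelihood functions below are invented stand-ins for the black-box behavioral models the paper embeds.

    import numpy as np

    def posterior_mix(models, user_history, item):
        """Minimal sketch of the BBCF idea: weight each pretrained
        behavioral model by the likelihood it assigns to this user's
        history, then mix the models' predictions for a new event.
        `models` maps a name to a likelihood function p(event | model)."""
        weights = np.array([
            np.prod([m(e) for e in user_history]) for m in models.values()
        ])
        weights /= weights.sum()
        return sum(w * m(item) for w, m in zip(weights, models.values()))

    # Two hypothetical behavioral models scoring a binary "will click" event.
    models = {"habitual": lambda e: 0.9 if e else 0.1,
              "explorer": lambda e: 0.4 if e else 0.6}
    print(posterior_mix(models, user_history=[1, 1, 0, 1], item=1))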
Predicting software faults before software testing activities can help with the rational allocation of time and resources. Software metrics are used for software fault prediction due to their close relationship with software faults. Thanks to their non-linear fitting ability, neural networks are increasingly used in prediction models. We first filter the metric set of the embedded software by statistical methods to reduce the dimensionality of the model input. Then we build a back-propagation neural network with a simple structure but good performance and apply it to two practical embedded software projects. The verification results show that the model has good ability to predict software faults.
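The two-step pipeline, statistical filtering of the metric set followed by a small back-propagation network, might look like the following sketch; the filter choice (an F-test), the network shape, and the data are all assumptions.

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # Synthetic stand-in: 30 raw software metrics per module, binary
    # fault / no-fault label.
    rng = np.random.default_rng(3)
    X = rng.normal(size=(300, 30))
    y = (X[:, 2] - X[:, 7] > 0).astype(int)

    model = make_pipeline(
        SelectKBest(f_classif, k=8),          # statistical filtering step
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                      random_state=3),        # small BP network
    )
    model.fit(X, y)
    print("training accuracy:", model.score(X, y))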
Testing, an indispensable part of software engineering, is itself an art and a science that has emerged as a discipline over time. If testing finds defects, testers diminish risk by providing awareness of the defects and solutions to deal with them before release. If testing finds no defects, it provides assurance that, under the tested conditions, the system functions correctly. To guarantee that enough testing has been done, the major risk areas need to be tested: we have to identify the risks, analyse them, and control them, and we need to categorize the risk items to decide the extent of testing to be covered. Moreover, the implementation of structured metrics is lagging in software testing; efficient metrics are necessary to evaluate and manage the testing process and to make testing part of the engineering discipline. This paper proposes risk-based testing using the FMEA technique and provides an ideal set of metrics to help ensure an effective testing process.
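In FMEA-style risk-based testing, each risk item is typically scored by a Risk Priority Number, the product of severity, occurrence, and detection ratings, and the ranking then drives how much testing each item receives. The items and ratings below are hypothetical.

    def risk_priority_number(severity, occurrence, detection):
        """Classic FMEA score: each factor is rated 1-10, and a higher
        RPN means the failure mode deserves more testing effort."""
        return severity * occurrence * detection

    # Hypothetical risk items for a payment module.
    items = [
        ("authentication bypass", 9, 3, 4),
        ("UI misalignment",       2, 6, 2),
        ("double charge",        10, 2, 5),
    ]
    ranked = sorted(items, key=lambda it: -risk_priority_number(*it[1:]))
    for name, s, o, d in ranked:
        print(f"{name}: RPN={risk_priority_number(s, o, d)}")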
The Cisco Adaptive Security Appliance (ASA) 5500 Series Firewall is amongst the most popular and technically advanced appliances for securing organisational networks and systems. One of its most valuable features is its threat detection function, which is available on every version of the firewall running software version 8.0(2) or higher. Threat detection operates at layers 3 and 4 to determine a baseline for network traffic, analysing packet drop statistics and generating threat reports based on traffic patterns. Despite producing a large volume of statistical information relating to several security events, further effort is required to mine this information, report the more significant findings visually, and conclude the security status of the network. There are several commercial off-the-shelf tools available to undertake this task; however, they are expensive and may require a cloud subscription. Furthermore, if the information transmitted over the network is sensitive or requires confidentiality, the involvement of a third party or a third-party tool may place organisational security at risk. Therefore, this paper presents a fuzzy-logic-aided intelligent threat detection solution, which is a cost-free, intuitive, and comprehensible solution that enhances and simplifies the threat detection process for all. In particular, it employs a fuzzy reasoning system based on the threat detection statistics and presents results/threats through a purpose-built dashboard user interface, for ease of understanding by administrators and users. The paper further demonstrates the successful utilisation of a fuzzy reasoning system for selected and prioritised security events in basic threat detection, although the approach can be extended to encompass more complex situations, such as complete basic threat detection, advanced threat detection, scanning threat detection, and customised feature-based threat detection.
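A fuzzy reasoning step of this kind can be sketched without any library: fuzzify a couple of ASA statistics with triangular membership functions, combine rule strengths, and defuzzify to a crisp threat score. The input statistics, membership breakpoints, and rules below are illustrative, not the paper's actual rule base.

    def tri(x, a, b, c):
        """Triangular membership function over [a, c] peaking at b."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def threat_level(drop_rate, scan_events):
        """Tiny Mamdani-style sketch: two ASA statistics in, one crisp
        threat score out (0-100)."""
        high_drops = tri(drop_rate, 40, 70, 100)    # % of packets dropped
        low_drops = tri(drop_rate, 0, 10, 50)
        many_scans = tri(scan_events, 20, 60, 100)  # scan events / minute
        # Rule strengths (min for AND), defuzzified as a weighted average
        # of representative output scores (90 = high threat, 10 = low).
        rules = [(min(high_drops, many_scans), 90), (low_drops, 10)]
        total = sum(w for w, _ in rules)
        return sum(w * s for w, s in rules) / total if total else 0.0

    print(threat_level(drop_rate=65, scan_events=70))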
The power of artificial neural networks to form predictive models for phenomena that exhibit non-linear relationships is a given fact. Despite this advantage, artificial neural networks are known to suffer drawbacks such as long training times and computational intensity. The researchers propose a two-tiered approach to enhance the learning performance of artificial neural networks for time-series phenomena whose data exhibits predictable changes that recur every calendar year. This paper focuses on the initial results of the first phase of the proposed algorithm, which incorporates clustering and classification prior to application of the backpropagation algorithm. The 2016–2017 zonal load data of France is used as the data set. K-means is chosen as the clustering algorithm, and a comparison is made between naïve Bayes and k-nearest neighbors to determine the better classifier for this data set. The initial results show that electrical load behavior is not necessarily reflective of calendar clustering, even without using the min-max temperature recorded during the inclusive months. Simulating the day-type classification process using one cluster, the initial results show that k-nearest neighbors outperforms the naïve Bayes classifier for this data set and that the best feature for classification into day type is the daily min-max load. These classified load data are expected to reduce training time and improve the overall performance of short-term load demand predictive models in a future paper.
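The first tier, clustering daily load curves into day types and then training a classifier to assign new days to those types, can be sketched as follows; the synthetic curves and cluster count stand in for the France zonal data and its tuning.

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    # Phase 1 of the two-tiered approach: cluster daily load curves,
    # then train a classifier to assign new days to a cluster (day type)
    # before any neural-network training.
    rng = np.random.default_rng(4)
    daily_loads = rng.normal(size=(365, 24))    # 24 hourly readings/day

    kmeans = KMeans(n_clusters=4, n_init=10, random_state=4)
    day_type = kmeans.fit(daily_loads).labels_

    # The abstract's best classification feature: the daily min-max load.
    features = np.column_stack([daily_loads.min(axis=1),
                                daily_loads.max(axis=1)])
    knn = KNeighborsClassifier(n_neighbors=5).fit(features, day_type)
    print("day type of a new day:", knn.predict(features[:1]))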
Diversity has been identified as one of the key dimensions of recommendation utility that should be considered besides the overall accuracy of the system. A common diversification approach is to re-rank results produced by a baseline recommendation engine according to a diversification criterion. The intent-aware framework is one of the frameworks proposed for recommendation diversification. It assumes the existence of a set of aspects associated with items, which also represent user intents, and it promotes diversity across these aspects to address user expectations more accurately. In this paper we consider item-based collaborative filtering and suggest that the traditional view of item similarity lacks a user perspective. We argue that user preferences towards different aspects should be reflected in the recommendations produced by the system. We incorporate the intent-aware framework into the item-based recommendation algorithm by injecting a personalised intent-aware covariance into the item similarity measure, and we explore the impact of this change on the performance of the algorithm. Our experiments show that the proposed method improves both the accuracy and the diversity of recommendations, offering a better accuracy/diversity tradeoff than existing solutions.
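One way to picture a personalised, aspect-weighted similarity is to blend per-aspect item-item covariance matrices with a user's aspect preference distribution, as in the sketch below; the matrices, aspects, and weights are invented, and the paper's exact covariance construction may differ.

    import numpy as np

    def personalised_similarity(cov_per_aspect, user_aspect_prefs):
        """Blend per-aspect item-item covariance matrices with a user's
        aspect preferences, so similarity reflects the aspects that
        user actually cares about."""
        sim = np.zeros_like(next(iter(cov_per_aspect.values())))
        for aspect, cov in cov_per_aspect.items():
            sim += user_aspect_prefs.get(aspect, 0.0) * cov
        return sim

    cov_per_aspect = {   # hypothetical covariance per genre, 3 items
        "drama":  np.array([[1.0, 0.6, 0.1],
                            [0.6, 1.0, 0.2],
                            [0.1, 0.2, 1.0]]),
        "comedy": np.array([[1.0, 0.1, 0.7],
                            [0.1, 1.0, 0.3],
                            [0.7, 0.3, 1.0]]),
    }
    prefs = {"drama": 0.8, "comedy": 0.2}   # one user's aspect weights
    print(personalised_similarity(cov_per_aspect, prefs))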
To add functionality and enhance the usability of web applications, JavaScript (JS) is frequently used. Despite the many advantages and uses of JS, an annoying fact is that many recent cyberattacks, such as drive-by-download attacks, exploit vulnerabilities through JS code. In general, malicious JS code is not easy to detect, because it sneakily exploits vulnerabilities of browsers and plugin software and attacks the visitors of a web site without their knowledge. To protect users from such threats, the development of an accurate detection system for malicious JS is urgently needed. Conventional approaches often employ signature- and heuristic-based methods, which are prone to zero-day attacks, i.e., they cause many false negatives and/or false positives. To address this problem, this paper adopts Doc2Vec, a machine-learning approach to feature learning: a neural network model that can learn the context information of texts. The extracted features are given to a classifier model (e.g., an SVM or a neural network), which judges the maliciousness of a JS code sample. In the performance evaluation, we use the D3M dataset (Drive-by-Download Data by Marionette) for malicious JS code and JSUPACK for benign code, for both training and testing, and we compare the performance against other feature learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and faster classification in malicious JS code detection than conventional approaches.
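An end-to-end miniature of the pipeline, embedding tokenized JS with gensim's Doc2Vec and classifying with an SVM, is sketched below. The two token sequences are toy placeholders, not samples from D3M or JSUPACK.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.svm import SVC

    # Tiny end-to-end sketch: embed tokenized JS code with Doc2Vec,
    # then classify the embeddings.
    corpus = [
        (["var", "x", "=", "1", ";"], 0),                          # benign
        (["eval", "(", "unescape", "(", "payload", ")", ")"], 1),  # malicious
    ] * 20
    docs = [TaggedDocument(words, [i])
            for i, (words, _) in enumerate(corpus)]

    d2v = Doc2Vec(docs, vector_size=32, min_count=1, epochs=40)
    X = [d2v.infer_vector(words) for words, _ in corpus]
    y = [label for _, label in corpus]

    clf = SVC().fit(X, y)
    print(clf.predict([d2v.infer_vector(["eval", "(", "payload", ")"])]))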
Cyber-physical systems (CPS) are mostly deployed in security-critical applications where their failures can cause serious consequences; it is therefore critical to evaluate their availability. In this paper, an architecture model of CPS is established from an object-oriented perspective: the system is a unified whole formed by various independent objects (including sensors, controllers, and actuators) connected through communication. The paper then presents the Object-oriented Timed Petri Net to model the system. This modeling method can describe both the whole system and the characteristics of each object. The availability of the system is then analyzed using the mathematical analysis methods and simulation tools of Petri nets. Finally, a concrete case is given to verify the feasibility of the modeling method for CPS availability analysis.
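As a flavor of Petri-net-based availability analysis, the sketch below runs a two-place failure/repair net and reports the fraction of time the "up" place is marked. The net structure and firing rates are invented and far simpler than the paper's Object-oriented Timed Petri Net.

    import random

    def simulate(transitions, marking, steps=100000):
        """Crude stochastic Petri-net run: each step, fire at most one
        enabled transition with its per-step probability, and record
        how often the 'up' place is marked (an availability estimate)."""
        up_time = 0
        for _ in range(steps):
            for t in transitions:
                enabled = all(marking[p] > 0 for p in t["in"])
                if enabled and random.random() < t["rate"]:
                    for p in t["in"]:
                        marking[p] -= 1
                    for p in t["out"]:
                        marking[p] += 1
                    break
            up_time += marking["up"] > 0
        return up_time / steps

    # Failure fires rarely, repair fires quickly; rates are illustrative.
    transitions = [{"in": ["up"], "out": ["down"], "rate": 0.01},
                   {"in": ["down"], "out": ["up"], "rate": 0.2}]
    print("availability:", simulate(transitions, {"up": 1, "down": 0}))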
With rapid advances in the fields of the Internet of Things and autonomous systems, the network security of cyber-physical systems (CPS) becomes more and more important. This paper focuses on real-time security evaluation for unmanned aircraft systems, which are cyber-physical systems relying on information communication and control systems to achieve autonomous decision making. Our problem formulation is motivated by scenarios involving autonomous unmanned aerial vehicles (UAVs) working continuously under data-driven attacks in open, uncertain, and even hostile environments. We first investigate the state estimation method in CPS integrated with a data-driven attack model, and then propose a real-time security scoring algorithm to evaluate the security condition of unmanned aircraft systems under different threat patterns, considering the vulnerability of the systems and the consequences brought by data attacks. Our simulation on a UAV illustrates the efficiency and reliability of the algorithm.
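A common building block for this kind of scoring is a residual (innovation) test on the state estimator: measurements that deviate too far from the predicted state suggest data injection. The sketch below turns the fraction of in-threshold steps into a crude score; the chi-square threshold, noise model, and attack window are all illustrative, not the paper's algorithm.

    import numpy as np

    def security_score(measurements, predictions, cov, threshold=9.21):
        """Fraction of time steps whose normalized innovation passes a
        chi-square test (threshold 9.21 = 2 dof at 99%); lower scores
        suggest ongoing data-injection attacks."""
        inv_cov = np.linalg.inv(cov)
        clean = 0
        for z, z_hat in zip(measurements, predictions):
            r = z - z_hat                    # innovation
            if r @ inv_cov @ r <= threshold:
                clean += 1
        return clean / len(measurements)

    rng = np.random.default_rng(5)
    preds = rng.normal(size=(100, 2))        # estimator predictions
    meas = preds + rng.normal(scale=0.5, size=(100, 2))
    meas[60:70] += 3.0                       # injected attack window
    print("score:", security_score(meas, preds, 0.25 * np.eye(2)))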