Biblio
Software quality testing has always been a crucial part of the software development process, and lately there has been a rise in the usage of testing applications. While a well-planned and well-executed test, regardless of its nature - automated or manual - is a key factor when deciding on the results of the test, it is often not enough to give a deeper and more thorough view of the whole process. That can be achieved with properly selected software metrics, which can be used for sound risk assessment and evaluation of the development. This paper considers the metrics most commonly used when measuring a performed test and examines metrics that can be applied in the development process.
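To make the kind of metrics this paper surveys concrete, here is a minimal sketch of two commonly used test-process metrics (pass rate and defect removal efficiency); the function names and the input numbers are illustrative, not taken from the paper.

```python
# Illustrative computations of common test-process metrics; the input
# numbers are hypothetical and not taken from the paper.

def pass_rate(passed: int, executed: int) -> float:
    """Share of executed test cases that passed."""
    return passed / executed

def defect_removal_efficiency(found_before_release: int, found_total: int) -> float:
    """Fraction of all known defects caught before release."""
    return found_before_release / found_total

print(f"Pass rate: {pass_rate(472, 500):.1%}")
print(f"DRE:       {defect_removal_efficiency(95, 110):.1%}")
```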
Connected devices and their role within our daily and business lives gain more and more impact. In addition, various derivations of Cyber-Physical Systems (CPS) reach new business fields, like smart healthcare or Industry 4.0. Although these systems bring many advantages for users by extending the overall functionality of existing systems, they come with several challenges, especially for system engineers and architects. One key challenge consists in achieving a sufficiently high level of security within the CPS environment, as sensitive data and safety-critical functions are often integral parts of CPS. Being systems of systems (SoS), CPS exhibit a complexity, unpredictability and heterogeneity that complicate analyzing the overall level of security, as well as providing a way to detect ongoing attacks. Usually, security metrics and frameworks provide an effective tool to measure the level of security of a given component or system. Although several comprehensive surveys exist, an assessment of the effectiveness of the existing solutions for CPS environments is insufficiently investigated in the literature. In this work, we address this gap by benchmarking a carefully selected variety of existing security metrics in terms of their usability for CPS. Accordingly, we pinpoint critical CPS challenges and qualitatively assess the effectiveness of the existing metrics for CPS.
The emergence of Industrial Cyber-Physical Systems (ICPS) in today's business world is still steadily progressing to new dimensions. Although they bring many new advantages to business processes and enable automation and a wider range of service capability, they also pose a variety of new challenges. One major challenge introduced by such Systems-of-Systems (SoS) lies in the security aspect. As security has not played such a significant role in traditional embedded system engineering, a generic way to measure the level of security within an ICPS would provide a significant benefit for system engineers and involved stakeholders. Even though many security metrics and frameworks exist, most of them insufficiently consider the SoS context and the challenges of such environments. Therefore, we aim to define a security metric for ICPS that measures the level of security during system design, testing, and integration as well as at runtime. For this, we focus on a semantic point of view, which on the one hand has not been considered in security metric definitions yet, and on the other hand allows us to handle the complexity of SoS architectures. Furthermore, our approach allows us to combine the critical characteristics of an ICPS, such as uncertainty, required reliability, multi-criticality and safety aspects.
Recent changes to greenhouse gas emission policies are catalyzing the electric vehicle (EV) market, making it readily accessible to consumers. While there are challenges that arise with dense deployment of EVs, one of the major future concerns is the cyber security threat. In this paper, cyber security threats in the form of tampering with an EV battery's State of Charge (SOC) were explored. A Back Propagation (BP) Neural Network (NN) was trained and tested on experimental data to estimate the SOC of a battery under normal operation and cyber-attack scenarios. NeuralWare software was used to run the scenarios. Different statistical metrics of the predicted values were compared against the actual values of the specific battery tested to measure the stability and accuracy of the proposed BP network under different operating conditions. The results showed that the BP NN was able to capture and detect false entries due to a cyber-attack on its network.
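A minimal sketch of the idea, assuming a residual-based detection scheme: train a back-propagation network to estimate SOC, then flag reported values whose estimation residual is abnormally large as possible tampering. This uses scikit-learn and synthetic data; the paper itself used NeuralWare and experimental battery measurements, and its exact detection rule is not reproduced here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Hypothetical inputs: [voltage, current, temperature]; toy SOC relation.
X = rng.uniform([3.0, -2.0, 15.0], [4.2, 2.0, 40.0], size=(2000, 3))
soc = (X[:, 0] - 3.0) / 1.2 * 100 + rng.normal(0, 1.0, 2000)

model = MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0)
model.fit(X[:1500], soc[:1500])

# Simulated cyber-attack: every 10th reported SOC is shifted by +20 points.
reported = soc[1500:].copy()
attacked = np.arange(500) % 10 == 0
reported[attacked] += 20.0

# Flag entries whose NN estimate disagrees strongly with the reported SOC.
residual = np.abs(model.predict(X[1500:]) - reported)
threshold = 3 * residual[~attacked].std()  # calibrated on clean samples for the demo
print(f"Flagged {(residual > threshold).sum()} suspicious of {attacked.sum()} tampered")
```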
Context: Software security is an imperative aspect of software quality. Early detection of vulnerable code during development can better ensure the security of the codebase and minimize testing efforts. Although traditional software metrics are used for early detection of vulnerabilities, they do not clearly address the granularity level of the issue to precisely pinpoint vulnerabilities. The goal of this study is to employ method-level traceable patterns (nano-patterns) in vulnerability prediction and empirically compare their performance with traditional software metrics. The concept of nano-patterns is similar to design patterns, but these constructs can be automatically recognized and extracted from source code. If nano-patterns can better predict vulnerable methods compared to software metrics, they can be used in developing vulnerability prediction models with better accuracy. Aims: This study explores the performance of method-level patterns in vulnerability prediction. We also compare them with method-level software metrics. Method: We studied vulnerabilities reported for two major releases of Apache Tomcat (6 and 7), Apache CXF, and two stand-alone Java web applications. We used three machine learning techniques to predict vulnerabilities using nano-patterns as features. We applied the same techniques using method-level software metrics as features and compared their performance with nano-patterns. Results: We found that nano-patterns show lower false negative rates for classifying vulnerable methods (for Tomcat 6, 21% vs 34.7%) and therefore, have higher recall in predicting vulnerable code than the software metrics used. On the other hand, software metrics show higher precision than nano-patterns (79.4% vs 76.6%). Conclusion: In summary, we suggest developers use nano-patterns as features for vulnerability prediction to augment existing approaches as these code constructs outperform standard metrics in terms of prediction recall.
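A sketch of the study's comparison under stated assumptions: train the same classifier once on "nano-pattern" features and once on method-level metrics, then compare recall and precision. The feature matrices below are random stand-ins; the paper extracted real features from Tomcat, CXF, and two web applications, and used three learners rather than the single random forest shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(1)
y = (rng.random(1000) < 0.1).astype(int)                # ~10% vulnerable methods
X_nano = rng.integers(0, 2, (1000, 40)) + y[:, None]    # binary pattern flags
X_metrics = rng.normal(size=(1000, 12)) + y[:, None]    # e.g. SLOC, complexity

for name, X in [("nano-patterns", X_nano), ("software metrics", X_metrics)]:
    pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)
    print(f"{name:16s} recall={recall_score(y, pred):.2f} "
          f"precision={precision_score(y, pred):.2f}")
```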
Cloud Storage Brokers (CSB) provide seamless and concurrent access to multiple Cloud Storage Services (CSS) while abstracting cloud complexities from end-users. However, this multi-cloud strategy faces several security challenges, including enlarged attack surfaces, malicious insider threats, security complexities due to the integration of disparate components, and API interoperability issues. Novel security approaches are imperative to tackle these issues. Therefore, this paper proposes CSBAuditor, a novel cloud security system that continuously audits CSB resources to detect malicious activities and unauthorized changes, e.g. bucket policy misconfigurations, and remediates these anomalies. The cloud state is maintained via a continuous snapshotting mechanism, thereby ensuring fault tolerance. We adopt the principles of chaos engineering by integrating BrokerMonkey, a component that continuously injects failure into our reference CSB system, CloudRAID. Hence, CSBAuditor is continuously tested for efficiency, i.e. its ability to detect the changes injected by BrokerMonkey. CSBAuditor employs security metrics for risk analysis by computing severity scores for detected vulnerabilities using the Common Configuration Scoring System, thereby overcoming the limitation of insufficient security metrics in existing cloud auditing schemes. CSBAuditor has been tested using various strategies, including chaos engineering failure injection strategies. Our experimental evaluation validates the efficiency of our approach against the aforementioned security issues, with a detection and recovery rate of over 96%.
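A minimal sketch of the audit loop under assumed simplifications: compare an expected-state snapshot with the observed state and score each deviation. Resource names and the severity value are invented for illustration; the paper scores misconfigurations with the Common Configuration Scoring System (CCSS), whose real formulas are not reproduced here.

```python
# Expected cloud state (snapshot) vs. observed state; invented example data.
EXPECTED = {"bucket-a/policy": "private", "bucket-b/policy": "private"}
SEVERITY = {"policy_changed": 7.5}  # hypothetical CCSS-style base score

def audit(observed: dict) -> list:
    """Return (resource, severity) findings for every deviation from the snapshot."""
    findings = []
    for resource, expected_value in EXPECTED.items():
        if observed.get(resource) != expected_value:
            findings.append((resource, SEVERITY["policy_changed"]))
    return findings

# BrokerMonkey-style failure injection: flip one bucket policy to public.
observed = dict(EXPECTED, **{"bucket-b/policy": "public-read"})
for resource, score in audit(observed):
    print(f"ALERT {resource}: severity {score} -> trigger remediation")
```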
Confidentiality, Integrity, and Availability are the principal keys to building any secure software. Considering these security principles during the different software development phases would reduce software vulnerabilities. This paper measures the impact of different software quality metrics on Confidentiality, Integrity, and Availability for any given object-oriented PHP application with a list of reported vulnerabilities. The National Vulnerability Database was used to provide the impact scores on confidentiality, integrity, and availability for the reported vulnerabilities of the selected applications. This paper studies these scores and their correlation with 25 code metrics for the given vulnerable source code. The achieved results were able to attribute 23.7% of the variability in 'Integrity' to four metrics: Vocabulary Used in Code, Card and Agresti, Intelligent Content, and Efferent Coupling. The Length (Halstead) metric alone could predict about 24.2% of the observed variability in 'Availability'. The results indicate no significant correlation of 'Confidentiality' with the tested code metrics.
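The "variability explained" figures above are R² values from regression. As a sketch of that analysis with synthetic data (the paper used real NVD scores and real PHP code metrics), regress an availability impact score on the Halstead Length metric and report R²:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
halstead_length = rng.uniform(50, 5000, 300)                       # synthetic metric
availability_impact = 0.001 * halstead_length + rng.normal(0, 1.5, 300)

model = LinearRegression().fit(halstead_length[:, None], availability_impact)
r2 = model.score(halstead_length[:, None], availability_impact)    # R^2
print(f"Variability in availability impact explained: {r2:.1%}")
```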
The "aging" phenomenon occurs after the long-term running of software, with the fault rate rising and running efficiency dropping. As there is no corresponding testing type for this phenomenon among conventional software tests, "software runtime accumulative testing" is proposed. Through analyzing several examples of software aging causing serious accidents, software is placed in the system environment required for running and the occurrence mechanism of software aging is analyzed. In addition, corresponding testing contents and recommended testing methods are designed with regard to all factors causing software aging, and the testing process and key points of testing requirement analysis for carrying out runtime accumulative testing are summarized, thereby providing a method and guidance for carrying out "software runtime accumulative testing" in software engineering.
Predicting software faults before software testing activities can help with the rational distribution of time and resources. Software metrics are used for software fault prediction due to their close relationship with software faults. Thanks to their non-linear fitting ability, neural networks are increasingly used in prediction models. We first filter the metric set of the embedded software by statistical methods to reduce the dimensionality of the model input. Then we build a back propagation neural network with a simple structure but good performance and apply it to two practical embedded software projects. The verification results show that the model has a good ability to predict software faults.
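A sketch of the described pipeline, with assumptions made explicit: the statistical filter is stood in by univariate F-tests, and the data is synthetic rather than the paper's two embedded projects.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(800, 20))                            # 20 candidate metrics
y = (X[:, 0] + X[:, 3] + rng.normal(0, 1, 800)) > 1.0     # faulty modules

# Step 1: statistically filter the metric set to reduce input dimensions.
X_sel = SelectKBest(f_classif, k=5).fit_transform(X, y)
X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)

# Step 2: small back-propagation network for fault prediction.
bpnn = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
bpnn.fit(X_tr, y_tr)
print(f"Fault-prediction accuracy on held-out modules: {bpnn.score(X_te, y_te):.2f}")
```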
The 2018 Biometric Technology Rally was an evaluation, sponsored by the U.S. Department of Homeland Security, Science and Technology Directorate (DHS S&T), that challenged industry to provide face or face/iris systems capable of unmanned traveler identification in a high-throughput security environment. Selected systems were installed at the Maryland Test Facility (MdTF), a DHS S&T affiliated biometrics testing laboratory, and evaluated using a population of 363 naive human subjects recruited from the general public. The performance of each system was examined based on measured throughput, capture capability, matching capability, and user satisfaction metrics. This research documents the performance of unmanned face and face/iris systems required to maintain an average total subject interaction time of less than 10 seconds. The results highlight discrepancies between the performance of biometric systems as anticipated by the system designers and the measured performance, indicating an incomplete understanding of the main determinants of system performance. Our research shows that failure-to-acquire errors, unpredicted by system designers, were the main driver of non-identification rates, rather than failure-to-match errors, which were better predicted. This outcome indicates the need for a renewed focus on reducing the failure-to-acquire rate in high-throughput, unmanned biometric systems.
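Why acquisition errors dominate is easy to see in the standard single-attempt error decomposition (the usual ISO/IEC 19795-style relation, applied here by analogy; the report's exact model is not reproduced):

$$\mathrm{NIR} \;=\; \mathrm{FTA} \;+\; (1 - \mathrm{FTA})\cdot \mathrm{FNMR}$$

Every failure to acquire contributes in full to the non-identification rate, while matching errors act only on the successfully acquired remainder.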
Testing, an indispensable part of software engineering, is itself an art and a science that emerged as a discipline over time. If testing finds defects, testers diminish risk by providing awareness of the defects and solutions to deal with them before release. If testing does not find any defects, it assures that under certain conditions the system functions correctly. To guarantee that enough testing has been done, the major risk areas need to be tested: we have to identify the risks, analyse and control them, and categorize the risk items to decide the extent of testing to be covered. Moreover, the implementation of structured metrics is lagging in software testing; efficient metrics are necessary to evaluate and manage the testing process and to make testing part of the engineering discipline. This paper proposes risk-based testing using the FMEA technique and provides an ideal set of metrics that ensure an effective testing process.
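FMEA prioritizes risk items by the standard Risk Priority Number, RPN = severity × occurrence × detection (each rated 1-10). A minimal sketch with invented risk items, not taken from the paper:

```python
# FMEA-style risk scoring for test planning: rank items by RPN = S * O * D.
risk_items = [
    # (risk item, severity, occurrence, detection) - illustrative ratings
    ("payment timeout handling", 9, 4, 6),
    ("report CSV export",        3, 5, 2),
    ("session fixation",         8, 3, 7),
]

ranked = sorted(risk_items, key=lambda it: it[1] * it[2] * it[3], reverse=True)
for name, s, o, d in ranked:
    print(f"RPN={s * o * d:3d}  {name}  (S={s}, O={o}, D={d})")
```

Items with the highest RPN receive the deepest test coverage first.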
As deep learning (DL) is widely applied in various fields, it is becoming a key driving force in industry. Although it has achieved great success in artificial intelligence tasks, like traditional software it has defects, and once it fails, unpredictable accidents and losses can be caused. In this paper, we propose a test case generation technique based on an adversarial sample generation algorithm for image classification deep neural networks (DNNs), which can generate a large number of good test cases for the testing of DNNs, especially when test cases are insufficient. We briefly introduce our method and implement the framework. We conduct experiments on some classic DNN models and datasets. We further evaluate the test set using a coverage metric based on the states of the DNN.
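A minimal sketch of adversarial test-case generation of the FGSM family, which is the kind of algorithm the paper builds on (the paper's exact algorithm and coverage metric are not reproduced). The toy model and PyTorch usage are assumptions for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in classifier
loss_fn = nn.CrossEntropyLoss()

def fgsm_test_case(x: torch.Tensor, label: torch.Tensor, eps: float = 0.1):
    """Perturb x in the gradient-sign direction to produce a stressing test case."""
    x = x.clone().requires_grad_(True)
    loss = loss_fn(model(x), label)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

x = torch.rand(1, 1, 28, 28)                 # placeholder "image"
adv = fgsm_test_case(x, torch.tensor([3]))   # new test case near the original
print("max per-pixel change:", (adv - x).abs().max().item())
```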
In a transient distributed cloud computing environment, software is vulnerable to attack, which can compromise software functional completeness, so functional testing is necessary. In order to solve the problem of the high overhead and high complexity of unsupervised test methods, an intelligent evaluation method for transient analysis software function testing based on an active deep learning algorithm is proposed. Firstly, the active deep learning mathematical model for the software function test is constructed using an association rule mining method, and the correlation dimension characteristics of software function failure are analyzed. Then, the reliability of the software is measured by the spectral density distribution method of software functional completeness. The intelligent evaluation model for transient analysis software function testing is established in the transient distributed cloud computing environment, and function testing and intelligent reliability evaluation are realized. Finally, the performance of the transient analysis software is verified by a simulation experiment. The results show that, with this method, functional integrity issues are localized with high accuracy and the intelligent evaluation adapts well, ensuring the safe and reliable operation of the software.
In recent years, more and more testing criteria for deep learning systems have been proposed to ensure system robustness and reliability. These criteria were defined based on different perspectives of diversity. However, there is no comprehensive investigation of the most essential diversities that should be considered by a testing criterion for deep learning systems. Therefore, in this paper, we conduct an empirical study to investigate the relation between test diversities and erroneous behaviors of deep learning models. We define five metrics to reflect diversities in neuron activities and leverage metamorphic testing to detect erroneous behaviors. We investigate the correlation between the metrics and erroneous behaviors. We also go a step further and measure the quality of test suites under the guidance of the defined metrics. Our results provide comprehensive insights into the essential diversities a testing criterion needs in order to exhibit good fault detection ability.
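For flavor, here is one simple activation-diversity measure in the spirit of the paper: the fraction of neurons activated above a threshold anywhere in the test suite (plain neuron coverage). The paper defines five diversity metrics of its own, which are not reproduced here; the threshold and activation matrix below are assumptions.

```python
import numpy as np

def neuron_coverage(activations: np.ndarray, threshold: float = 0.5) -> float:
    """activations: (num_test_inputs, num_neurons) matrix of neuron activations."""
    covered = (activations > threshold).any(axis=0)   # neuron fired at least once
    return covered.mean()

acts = np.random.default_rng(4).random((100, 512))    # toy activation matrix
print(f"Neuron coverage of this test suite: {neuron_coverage(acts):.1%}")
```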
Several assessment techniques and methodologies exist to analyze the security of an application dynamically. However, they are either focused on a particular product or are mainly concerned with the assessment process rather than the product's security confidence. Most crucially, they tend to assess the security of a target application as a standalone artifact without assessing its host infrastructure. Such attempts can undervalue the overall security posture, since the infrastructure becomes crucial when it hosts a critical application. We present an ontology-based security model that aims to provide the necessary knowledge, including network settings, application configurations, testing techniques and tools, and security metrics, to evaluate the security aptitude of a critical application in the context of its hosting infrastructure. The objective is to integrate current good practices and standards in security testing and virtualization to furnish an on-demand, test-ready virtual target infrastructure to execute the critical application, and to initiate a context-aware and quantifiable security assessment process in an automated manner. Furthermore, we present a security assessment architecture that reflects how the ontology can be integrated into a standard process.
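An illustrative encoding of the ontology idea: describe an application together with its hosting infrastructure as triples, then query the combined knowledge before assessing security. This sketch uses rdflib with an invented namespace and invented facts; the paper's actual ontology is not reproduced.

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/sec#")  # hypothetical namespace
g = Graph()
g.add((EX.criticalApp, EX.hostedOn, EX.vmHost1))
g.add((EX.vmHost1, EX.exposesPort, EX.port22))
g.add((EX.criticalApp, EX.testedWith, EX.zapScanner))

# Which infrastructure facts are in scope when assessing the application?
q = """SELECT ?p ?o WHERE { ?app ex:hostedOn ?h . ?h ?p ?o . }"""
for p, o in g.query(q, initNs={"ex": EX}):
    print(p, o)
```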
Software vulnerabilities often remain hidden until an attacker exploits the weak/insecure code. Therefore, testing the software from a vulnerability discovery perspective becomes challenging for developers if they do not inspect their code thoroughly (which is time-consuming). We propose that vulnerability prediction using certain software metrics can support the testing process by identifying vulnerable code-components (e.g., functions, classes, etc.). Once a code-component is predicted as vulnerable, the developers can focus their testing efforts on it, thereby avoiding the time/effort required for testing the entire application. The current paper presents a study that compares how software metrics perform as vulnerability predictors for software projects developed in two different languages (Java vs Python). The goal of this research is to analyze the vulnerability prediction performance of software metrics for different programming languages. We designed and conducted experiments on security vulnerabilities reported for three Java projects (Apache Tomcat 6, Tomcat 7, Apache CXF) and two Python projects (Django and Keystone). In this paper, we focus on a specific type of code component: Functions. We apply Machine Learning models for predicting vulnerable functions. Overall results show that software metrics-based vulnerability prediction is more useful for Java projects than Python projects (i.e., software metrics when used as features were able to predict Java vulnerable functions with a higher recall and precision compared to Python vulnerable functions prediction).
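A sketch of the cross-language comparison setup: run the same metric-based classifier per project and report recall by language. The projects listed and the feature data are placeholders for the paper's Java (Tomcat 6/7, CXF) and Python (Django, Keystone) function-level datasets.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(5)
projects = {"tomcat6": "Java", "cxf": "Java", "django": "Python"}

for project, language in projects.items():
    y = (rng.random(600) < 0.08).astype(int)             # vulnerable functions
    X = rng.normal(size=(600, 10)) + 0.8 * y[:, None]    # function-level metrics
    pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)
    print(f"{language:6s} {project:8s} recall={recall_score(y, pred):.2f}")
```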
Currently, the complexity of software quality and testing is increasing exponentially, with a huge number of challenges knocking at the door, especially when testing a mission-critical application in banking and other critical domains, or facing new technology trends with decentralized and non-integrated testing tools. From practical experience, software testing has become costly and effort-intensive, with unlimited scope. This thesis promotes the Scalable Quality and Testing Lab (SQTL), a centralized quality and testing platform which integrates powerful manual, automation and business intelligence tools. SQTL helps quality engineers (QE) effectively organize, manage and control all testing activities in one centralized lab, starting from creating test cases, then executing different testing types such as web, security and others, and finally analyzing and displaying the results of all testing activities in an interactive dashboard, which allows QEs to forecast new bugs, especially those related to security. The centralized SQTL aims to empower QEs during the testing cycle, helping them achieve a greater level of software quality in minimum time, effort and cost, and decrease the defect density metric.
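The defect density metric mentioned above is defects found divided by code size. A small sketch of how a dashboard could track it per release; the release data is invented, and the thesis describes the platform rather than this code:

```python
# Per-release defect density = defects / KLOC; invented example data.
releases = {"R1": (42, 85.0), "R2": (31, 97.5), "R3": (18, 101.2)}

for release, (defects, kloc) in releases.items():
    print(f"{release}: defect density = {defects / kloc:.2f} defects/KLOC")
```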
Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.
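A sketch of the evaluation setup under stated assumptions: rebalance a severely skewed training set with two of the paper's six sampling methods (borderline-SMOTE2 and Random Undersampling) and score with the three metrics used in the study. The data is synthetic and the learner is a plain logistic regression, whereas the paper trained three learners in Apache Spark on the POST and SlowlorisBig datasets.

```python
from imblearn.metrics import geometric_mean_score
from imblearn.over_sampling import BorderlineSMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Severely imbalanced synthetic data (~0.5% minority class).
X, y = make_classification(n_samples=20000, weights=[0.995], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("borderline-SMOTE2", BorderlineSMOTE(kind="borderline-2")),
                      ("random undersampling", RandomUnderSampler())]:
    X_bal, y_bal = sampler.fit_resample(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    score, pred = clf.predict_proba(X_te)[:, 1], clf.predict(X_te)
    print(f"{name:22s} AUC={roc_auc_score(y_te, score):.3f} "
          f"AUPRC={average_precision_score(y_te, score):.3f} "
          f"G-mean={geometric_mean_score(y_te, pred):.3f}")
```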