Biblio
Safety- and security-critical developers have long recognized the importance of applying a high degree of scrutiny to a system’s (or subsystem’s) I/O messages. However, lack of care in the development of message-handling components can lead to an increase, rather than a decrease, in the attack surface. On the DARPA Cyber-Assured Systems Engineering (CASE) program, we have focused our research effort on identifying cyber vulnerabilities early in system development, in particular at the Architecture development phase, and then automatically synthesizing components that mitigate against the identified vulnerabilities from high-level specifications. This approach is highly compatible with the goals of the LangSec community. Advances in formal methods have allowed us to produce hardware/software implementations that are both performant and guaranteed correct. With these tools, we can synthesize high-assurance “building blocks” that can be composed automatically with high confidence to create trustworthy systems, using a method we call Security-Enhancing Architectural Transformations. Our synthesis-focused approach provides a higherleverage insertion point for formal methods than is possible with post facto analytic methods, as the formal methods tools directly contribute to the implementation of the system, without requiring developers to become formal methods experts. Our techniques encompass Systems, Hardware, and Software Development, as well as Hardware/Software Co-Design/CoAssurance. We illustrate our method and tools with an example that implements security-improving transformations on system architectures expressed using the Architecture Analysis and Design Language (AADL). We show how message-handling components can be synthesized from high-level regular or context-free language specifications, as well as a novel specification language for self-describing messages called Contiguity Types, and verified to meet arithmetic constraints extracted from the AADL model. Finally, we guarantee that the intent of the message processing logic is accurately reflected in the application binary code through the use of the verified CakeML compiler, in the case of software, or the Restricted Algorithmic C toolchain with ACL2-based formal verification, in the case of hardware/software co-design.
Software Defined Networking (SDN) is a networking paradigm that has been very popular due to its advantages over traditional networks with regard to scalability, flexibility, and its ability to solve many security issues. Nevertheless, SDN networks are exposed to new security threats and attacks, especially Distributed Denial of Service (DDoS) attacks. For this aim, we have proposed a model able to detect and mitigate attacks automatically in SDN networks using Machine Learning (ML). Different than other approaches found in literature which use the native flow features only for attack detection, our model extends the native features. The extended flow features are the average flow packet size, the number of flows to the same host as the current flow in the last 5 seconds, and the number of flows to the same host and port as the current flow in the last 5 seconds. Six ML algorithms were evaluated, namely Logistic Regression (LR), Naive Bayes (NB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree (DT), and Random Forest (RF). The experiments showed that RF is the best performing ML algorithm. Also, results showed that our model is able to detect attacks accurately and quickly, with a low probability of dropping normal traffic.
Software developers can use diverse techniques and tools to reduce the number of vulnerabilities, but the effectiveness of existing solutions in real projects is questionable. For example, Static Analysis Tools (SATs) report potential vulnerabilities by analyzing code patterns, and Software Metrics (SMs) can be used to predict vulnerabilities based on high-level characteristics of the code. In theory, both approaches can be applied from the early stages of the development process, but it is well known that they fail to detect critical vulnerabilities and raise a large number of false alarms. This paper studies the hypothesis of using Machine Learning (ML) to combine alerts from SATs with SMs to predict vulnerabilities in a large software project (under development for many years). In practice, we use four ML algorithms, alerts from two SATs, and a large number of SMs to predict whether a source code file is vulnerable or not (binary classification) and to predict the vulnerability category (multiclass classification). Results show that one can achieve either high precision or high recall, but not both at the same time. To understand the reason, we analyze and compare snippets of source code, demonstrating that vulnerable and non-vulnerable files share similar characteristics, making it hard to distinguish vulnerable from non-vulnerable code based on SAT alerts and SMs.