CMU SoS Lablet Quarterly Executive Summary - January 2023
A. Fundamental Research
High level report of result or partial result that helped move security science forward-- In most cases it should point to a "hard problem". These are the most important research accomplishments of the Lablet in the previous quarter.
Jonathan Aldrich
PUBLIC ACCOMPLISHMENT HIGHLIGHTS
Blockchains have been proposed to support transactions on distributed, shared state, but hackers have exploited security vulnerabilities in existing programs. We applied user-centered design in the creation of Obsidian, a new language that uses typestate and linearity to support stronger safety guarantees than current approaches for programming blockchain systems.
We continued to explore some outgrowths of our Obsidian work on usable progamming languages. In particular, we are interested in reasoning more kinds of things, including lightweight specifications and information flow--and we would like to study how to make this reasoning usable, as in our prior Obsidian work. In detail:
- Static information flow control (IFC) systems provide the ability to restrict data flows within a program, enabling vulnerable functionality or confidential data to be statically isolated from unsecured data or program logic. Despite the wide applicability of IFC as a mechanism for guaranteeing confidentiality and integrity--- the fundamental properties on which computer security relies--- existing IFC systems have seen little use, requiring users to reason about complicated mechanisms such as lattices of security labels and dual notions of confidentiality and integrity within these lattices. We propose a system that diverges significantly from previous work on information flow control, opting to reason directly about the data that programmers already work with. In doing so, we naturally and seamlessly combine the clasically separate notions of confidentiality and integrity into one unified framework, further simplifying reasoning. In an arXiv paper and accompanying presentation at HATRA'22, we motivate and showcase our work through two case studies on TLS private key management: one for Rocket, a popular Rust web framework, and another for Conduit, a server implementation for the Matrix messaging service written in Rust.
We have continued to work on the theory for this new approach. In the near future, we hope to do Wizard of Oz studies to study how well programmers can understand our information flow specifications, and to make them more usable.
- We have also started looking at adapting liquid types to the setting of Java and studying how to make them more usable. Liquid types allow users to add lightweight specifications in the form of logical expressions to types. The logical expressions are chosen to be decidable, so the user experience is similar to type systems in many ways. The student doing the work published a workshop paper at HATRA '21 on a usability study with her system, before coming to CMU and before joining the lablet group; as part of the lablet project, we are now looking at extensions to increase expressiveness and improve usability of the idea. While the current work is in the context of Java, it is very much aligned with the goals of Obsidian and the liquid types could be adapted to the context of smart contracts in the future.
- Finally, we have started exploring extending Obsidian or other smart contract languages with gradual specification and verification. In a separate NSF-funded project, we have recently developed gradual verification [Wise et al., OOPSLA 2020; DiVincenzo et al, arXiv:2210.02428], an approach to combining static and dynamic techniques to support verification in a context where some code is specified and verified and others is not. This approach seems ideal in a smart contract setting, where a verified contract may interact with other contracts that are verified, as well as some that are unverified; in our SoS work we've started to look at design considerations and applications in Obsidian and related languages.
Lujo Bauer
Securing Safety-Critical Machine Learning Algorithms
Public Accomplishments
ML models have shown promise in classifying raw executable files (binaries) as malicious or benign with high accuracy. This has led to the increasing influence of ML-based classification methods in academic and real-world malware detection, a critical tool in cybersecurity. However, previous work provoked caution by creating variants of malicious binaries, referred to as adversarial examples, that are transformed in a functionality-preserving way to evade detection.
We investigated the effectiveness of using adversarial training methods to create malware classification models that are more robust to some state-of-the-art attacks. To train our most robust models, we significantly increased the efficiency and scale of creating adversarial examples to make adversarial training practical, a first for raw-binary malware detectors. We then analyzed the effects of varying the length of adversarial training and various versions of attacks to train with. We found that data augmentation does not deter state-of-the-art attacks, but using a generic gradient-guided method used in other discrete domains does improve robustness. We also showed that in most cases, models can be made more robust to malware-domain attacks by adversarially training with lower-effort versions of the same attack. In the best case, we reduced one state-of-the-art attack's success rate from 90% to 5%. We also found that training with some attacks can increase robustness to other types of attacks.
Adversarial training for raw-binary malware classifiers.
Lorrie Cranor
Characterizing user behavior and anticipating its effects on computer security with a Security Behavior Observatory
PUBLIC ACCOMPLISHMENT HIGHLIGHTS
The SBO addresses the hard problem of “Understanding and Accounting for Human Behavior” by collecting data directly from people’s own home computers, thereby capturing people’s computing behavior “in the wild.”
Paper Submitted: Accurate and Generalizable Behavior-Based Models for Predicting User Exposure to Malicious Websites. Kyle Crichton, Jin-Dong Dong, Akira Yamada, Yukikio Sawaya, Lorrie Cranor, and Nicolas Christin. Submitted to USENIX Security 2023.
- This project first evaluates a previously created algorithm using the SBO dataset and an external dataset as comparison points. Our findings validate the initial results.
- We then improve upon these methods by employing a new sequential representation of user browsing, rather than the aggregate session metrics, to better capture browsing data.
- Using this new data structure along with a Long Short-Term Memory (LSTM) prediction model, a substantial increase in predictive accuracy (AUC score of 0.994) is achieved that reaches the level that could make it viable in a number of real-world security use cases.
- These improvements are reproducible across two disparate datasets, one of home computer users in the United States (SBO) and another of computer users in Japan.
- Additional features present in the SBO dataset is tested to compare predictive ability of models deployed in the user's browser to that deployed on the network but only minor differences are observed.
David Garlan
Model-Based Explanation For Human-in-the-Loop Security
PUBLIC ACCOMPLISHMENT HIGHLIGHTS
We completed publication of the material that we reported last quarter, as above: We have developed a framework that uses various statistical approaches commonly used in machine learning for simplifying explanations of plans made in large trade-off spaces. The approach combinds principle component analysis (PCA), decision trees, and classification to understand key factors in deciding which plans to choose. The approach can allow explanations to focus on factors that really impacted the choice of plan, reducing that amount of information and context a human would need to understand to comprehend an explanation. We have several publications about this currently under review.
We have been generalizing the approach of using machine learning techniques to reduce dimensionality that needs to be considered in explanation to industrial manufacturing. Under a project with Lockheed Martin, we have begun working with colleagues who are applying planning techniques to industrial manufacturing pipelines. For example, creating some manufactured widget might be done with different types of milling or printing machines, with different materials. Planning in this instance allows the manufacturing process to be agile wrt changes in material costs, supplier schedules, and changing priorities. Explanation is required in this domain as well, and we have shown that the techniques delivered under this project can also apply in the manufacturing domain.
Joshua Sunshine
Security Science Research Experience for Undergraduates
ACCOMPLISHMENTS
The Security Science Research Experience for Undergraduates is funding four students to work with Carnegie Mellon Researchers in Summer 2022:
- Emily Chang, University of Virginia, "picoCTF Cybersecurity & Education Research through Online Gaming," Advisors: Hanan Hibshi and Maverick Woo.
- Patrick May, College of Wooster, "Developer Awareness of Secure Programming Practices." Advisor: Hanan Hibshi.
- Lyric Sampson, Alabama A&M University, "AI Ethics in Open Source," Advisors: James Herbsleb and Laura Dabbish
- Daniel Verdi do Amarante, University of Richmond, "Natural Test Case Generation Using Deep Learning," Advisors: Rohan Padhye and Vincent Hellendoorn
The Summer 2022 Security Science Research Experience for Undergraduates is now complete. The four funded students and many additional students from the NSF-funded Research Experiences for Undergraduates in Software Engineering (REUSE) program met attended a presentation from Adam Tagert about research at the NSA.