Biblio
Sandboxes are increasingly important building materials for secure software systems. In recognition of their potential to improve the security posture of many systems at various points in the development lifecycle, researchers have spent the last several decades developing, improving, and evaluating sandboxing techniques. What has been done in this space? Where are the barriers to advancement? What are the gaps in these efforts? We systematically analyze a decade of sandbox research from five top-tier security and systems conferences using qualitative content analysis, statistical clustering, and graph-based metrics to answer these questions and more. We find that the term “sandbox” currently has no widely accepted or acceptable definition. We use our broad scope to propose the first concise and comprehensive definition for “sandbox” that consistently encompasses research sandboxes. We learn that the sandboxing landscape covers a range of deployment options and policy enforcement techniques collectively capable of defending diverse sets of components while mitigating a wide range of vulnerabilities. Researchers consistently make security, performance, and applicability claims about their sandboxes and tend to narrowly define the claims to ensure they can be evaluated. Those claims are validated using multi-faceted strategies spanning proof, analysis, benchmark suites, case studies, and argumentation. However, we find two areas for improvement: (1) the arguments researchers present are often ad hoc and (2) sandbox usability is mostly uncharted territory. We propose ways to structure arguments to ensure they fully support their corresponding claims and suggest lightweight means of evaluating sandbox usability.
It is more expensive and time-consuming to build modern software without extensive supply chains. Supply chains reduce these development costs, but typically in exchange for increased security risk. In particular, it is often difficult to understand or verify what a software component delivered by a third party does or could do. Such a component could contain unwanted behaviors, vulnerabilities, or malicious code, many of which become incorporated in applications utilizing the component. Sandboxes provide relief by encapsulating a component and imposing a security policy on it. This limits the operations the component can perform without as much need to trust or verify the component. Instead, a component user must trust or verify the relatively simple sandbox. Given this appealing prospect, researchers have spent the last few decades developing new sandboxing techniques and sandboxes. However, while sandboxes have been adopted in practice, they are not as pervasive as they could be. Why are sandboxes not achieving ubiquity at the same rate as extensive supply chains? This thesis advances our understanding of and overcomes some barriers to sandbox adoption. We systematically analyze ten years (2004–2014) of sandboxing research from top-tier security and systems conferences. We uncover two barriers: (1) sandboxes are often validated using relatively subjective techniques and (2) usability for sandbox deployers is often ignored by the studied community. We then focus on the Java sandbox to empirically study its use within the open source community. We find features in the sandbox that benign applications do not use, which have promoted a thriving exploit landscape. We develop runtime monitors for the Java Virtual Machine (JVM) to turn off these features, stopping all known sandbox-escaping JVM exploits without breaking benign applications. Furthermore, we find that the sandbox contains a high degree of complexity that benign applications need but that hampers sandbox use.
When studying the sandbox’s use, we did not find a single application that successfully deployed the sandbox for security purposes, which motivated us to overcome the complexity that benign applications rely on via tooling. We develop and evaluate a series of tools to automate the most complex tasks, which currently require error-prone manual effort. Our tools help users derive, express, and refine a security policy and impose it on targeted Java application JARs and classes. This tooling is evaluated through case studies with industrial collaborators where we sandbox components that were previously difficult to sandbox securely. Finally, we observe that design and implementation complexity causes sandbox developers to accidentally create vulnerable sandboxes. Thus, we develop and evaluate a sandboxing technique that leverages existing cloud computing environments to execute untrusted computations. Malicious outcomes produced by the computations are contained by ephemeral virtual machines. We describe a field trial using this technique with Adobe Reader and compare the new sandbox to existing sandboxes using a qualitative framework we developed.
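The policy-derivation step mentioned above (deriving and expressing a security policy for Java JARs) might look roughly like the following. This is a minimal sketch, not the thesis's tooling: the function name `derive_policy`, the shape of the observation tuples, and the example JAR path are all hypothetical. It assumes that permission checks observed while exercising a component have already been logged, and it emits one least-privilege `grant` block per code base in standard Java policy-file syntax.

```python
from collections import defaultdict


def derive_policy(observations):
    """Build Java policy-file text from observed permission checks.

    `observations` is an iterable of (code_base, permission_class, target,
    actions) string tuples; use "" for permissions that take no actions.
    The tuple layout is an assumption made for this sketch.
    """
    grants = defaultdict(set)
    for code_base, perm_class, target, actions in observations:
        # A set drops repeated observations of the same permission.
        grants[code_base].add((perm_class, target, actions))

    blocks = []
    for code_base in sorted(grants):
        lines = [f'grant codeBase "{code_base}" {{']
        for perm_class, target, actions in sorted(grants[code_base]):
            entry = f'    permission {perm_class} "{target}"'
            if actions:
                entry += f', "{actions}"'
            lines.append(entry + ";")
        lines.append("};")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical observations for a hypothetical JAR.
    observed = [
        ("file:/opt/app/lib/parser.jar", "java.io.FilePermission", "/tmp/-", "read"),
        ("file:/opt/app/lib/parser.jar", "java.lang.RuntimePermission", "getClassLoader", ""),
    ]
    print(derive_policy(observed))
```

A policy derived this way grants only what was observed, so a user would still need to refine it by hand, for example by removing permissions that were exercised but should not be allowed.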
The ubiquitously installed Java Runtime Environment (JRE) provides a complex, flexible set of mechanisms that support the execution of untrusted code inside a secure sandbox. However, many recent exploits have successfully escaped the sandbox, allowing attackers to infect numerous Java hosts. We hypothesize that the Java security model affords developers more flexibility than they need or use in practice, and thus its complexity compromises security without improving practical functionality. We describe an empirical study of the ways benign open-source Java applications use and interact with the Java security manager. We found that developers regularly misunderstand or misuse Java security mechanisms, that benign programs do not use all of the vast flexibility afforded by the Java security model, and that there are clear differences between the ways benign and exploit programs interact with the security manager. We validate these results by deriving two restrictions on application behavior that restrict (1) security manager modifications and (2) privilege escalation. We demonstrate that enforcing these rules at runtime stops a representative proportion of modern Java 7 exploits without breaking backwards compatibility with benign applications. These practical rules should be enforced in the JRE to fortify the Java sandbox.
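The two restrictions above can be illustrated with a small language-neutral model. This is a sketch under stated assumptions, not the monitors described in the abstract: the class `SandboxMonitor`, its method names, and the integer privilege levels are hypothetical, and the exact form of each rule is assumed. Rule 1 is modeled as "once a security manager is installed, it cannot be replaced or removed", and rule 2 as "while a manager is installed, code cannot load code more privileged than itself".

```python
class RuleViolation(Exception):
    """Raised when a monitored operation breaks one of the two rules."""


class SandboxMonitor:
    """Toy model of a runtime monitor enforcing the two restrictions."""

    def __init__(self):
        self.manager = None   # currently installed security manager, if any
        self.locked = False   # True once a manager has been installed

    def set_security_manager(self, manager):
        # Rule 1 (assumed form): an installed manager may not be
        # replaced or removed.
        if self.locked:
            raise RuleViolation("security manager modification blocked")
        self.manager = manager
        self.locked = manager is not None

    def load_class(self, requester_privilege, loaded_privilege):
        # Rule 2 (assumed form): while a manager is installed, code may
        # not load code that is more privileged than itself.
        if self.locked and loaded_privilege > requester_privilege:
            raise RuleViolation("privilege escalation blocked")


if __name__ == "__main__":
    monitor = SandboxMonitor()
    monitor.set_security_manager("restrictive-policy")
    try:
        monitor.set_security_manager(None)  # typical exploit step: disable the manager
    except RuleViolation as err:
        print("blocked:", err)
```

The model only shows why both rules matter together: disabling the manager is the usual goal of an exploit, and loading more privileged code is a common way to reach it.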
Sandboxes impose a security policy, isolating applications and their components from the rest of a system. While many sandboxing techniques exist, state-of-the-art sandboxes generally perform their functions within the system that is being defended. As a result, when the sandbox fails or is bypassed, the security of the surrounding system can no longer be assured. We experiment with the idea of in-nimbo sandboxing, encapsulating untrusted computations away from the system we are trying to protect. The idea is to delegate computations that may be vulnerable or malicious to virtual machine instances in a cloud computing environment.
This may not reduce the possibility of an in-situ sandbox compromise, but it could significantly reduce the consequences should that possibility be realized. To achieve this advantage, there are additional requirements, including (1) a regulated channel between the local and cloud environments that supports interaction with the encapsulated application and (2) a performance design that keeps latency in excess of the in-situ baseline acceptably low.
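The first requirement, a regulated channel, can be sketched as a filter that the local side applies to every message arriving from the cloud instance. This is only an illustration under assumptions: the JSON message format, the message type names in `ALLOWED_TYPES`, the size cap, and the function `regulate` are all hypothetical and are not taken from the described prototype.

```python
import json

# Hypothetical policy: the only message types the local side accepts,
# and an upper bound on message size.
ALLOWED_TYPES = {"open_document", "render_update", "user_input"}
MAX_PAYLOAD_BYTES = 64 * 1024


def regulate(raw: bytes) -> dict:
    """Return the decoded message if it passes the channel policy.

    Raises ValueError for oversized, malformed, or disallowed messages.
    """
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large")
    # Both decoding errors and JSON parse errors are ValueError subclasses.
    message = json.loads(raw.decode("utf-8"))
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    msg_type = message.get("type")
    if not isinstance(msg_type, str) or msg_type not in ALLOWED_TYPES:
        raise ValueError("message type not allowed")
    return message


if __name__ == "__main__":
    print(regulate(b'{"type": "render_update", "page": 3}'))
```

Keeping the accepted vocabulary small is the point of the design: anything the remote application sends that is not explicitly expected is dropped before it reaches the protected system.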
To test the feasibility of the idea, we built an in-nimbo sandbox for Adobe Reader, an application that historically has been subject to significant attacks. We undertook a prototype deployment with PDF users in a large aerospace firm. In addition to thwarting several examples of existing PDF-based malware, we found that the added increment of latency, perhaps surprisingly, does not overly impair the user experience with respect to performance or usability.
To help users create stronger text-based passwords, many web sites have deployed password meters that provide visual feedback on password strength. Although these meters are in wide use, their effects on the security and usability of passwords have not been well studied. We present a 2,931-subject study of password creation in the presence of 14 password meters. We found that meters with a variety of visual appearances led users to create longer passwords. However, significant increases in resistance to a password-cracking algorithm were only achieved using meters that scored passwords stringently. These stringent meters also led participants to include more digits, symbols, and uppercase letters. Password meters also affected the act of password creation. Participants who saw stringent meters spent longer creating their password and were more likely to change their password while entering it, yet they were also more likely to find the password meter annoying. However, the most stringent meter and those without visual bars caused participants to place less importance on satisfying the meter. Participants who saw more lenient meters tried to fill the meter and were averse to choosing passwords a meter deemed “bad” or “poor.” Our findings can serve as guidelines for administrators seeking to nudge users towards stronger passwords.