Biblio
Malicious domains are key components of a variety of cyber attacks. Several recent techniques have been proposed to identify malicious domains through analysis of DNS data. The general approach is to build classifiers based on DNS-related local domain features. One potential problem is that many local features, e.g., domain name patterns and temporal patterns, tend not to be robust: attackers can easily alter these features to evade detection without significantly affecting their attack capabilities. In this paper, we take a complementary approach. Instead of focusing on local features, we propose to discover and analyze global associations among domains. The key challenges are (1) to build meaningful associations among domains and (2) to use these associations to reason about the potential maliciousness of domains. For the first challenge, we take advantage of the modus operandi of attackers. To avoid detection, malicious domains exhibit dynamic behavior, for example, by frequently changing malicious domain-IP resolutions and creating new domains. This makes it very likely for attackers to reuse resources. Indeed, it is commonly observed that over a period of time multiple malicious domains are hosted on the same IPs and multiple IPs host the same malicious domains, which creates intrinsic associations among them. For the second challenge, we develop a graph-based inference technique over associated domains. Our approach is based on the intuition that a domain having strong associations with known malicious domains is likely to be malicious itself. Carefully established associations enable the discovery of a large set of new malicious domains from a very small set of previously known malicious ones. Our experiments over a public passive DNS database show that the proposed technique achieves high true positive rates (over 95%) while maintaining low false positive rates (less than 0.5%). Further, even with a small set of known malicious domains (a couple of hundred), our technique can discover a large set of potentially malicious domains (up to tens of thousands).
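A minimal sketch of this kind of graph-based inference, assuming hypothetical passive-DNS resolution records and a toy propagation rule (the paper's actual association construction, edge weighting, and inference algorithm may differ): domains sharing resolved IPs are linked in an association graph, and maliciousness scores are propagated from a small seed set of known malicious domains.

```python
from collections import defaultdict

# Hypothetical passive-DNS records: (domain, ip) resolution pairs.
pdns = [("evil1.com", "1.2.3.4"), ("evil2.com", "1.2.3.4"),
        ("evil2.com", "5.6.7.8"), ("benign.com", "9.9.9.9"),
        ("unknown.com", "5.6.7.8")]
seeds = {"evil1.com"}  # small set of known malicious domains

# Build the domain-domain association graph: two domains are associated
# (with weight = number of shared IPs) if they resolve to a common IP.
ip_to_domains = defaultdict(set)
for dom, ip in pdns:
    ip_to_domains[ip].add(dom)
assoc = defaultdict(lambda: defaultdict(int))
for doms in ip_to_domains.values():
    for a in doms:
        for b in doms:
            if a != b:
                assoc[a][b] += 1

# Propagate maliciousness scores from the seeds over the association graph.
scores = {d: (1.0 if d in seeds else 0.0) for d, _ in pdns}
for _ in range(10):  # a few iterations suffice for this toy graph
    new = {}
    for d in scores:
        nbrs = assoc[d]
        total = sum(nbrs.values())
        neigh = sum(w * scores[n] for n, w in nbrs.items()) / total if total else 0.0
        # keep seeds pinned to 1.0; other domains take a damped weighted
        # average of their neighbours' scores
        new[d] = 1.0 if d in seeds else 0.5 * neigh
    scores = new

print(sorted(scores.items(), key=lambda kv: -kv[1]))  # highest-scoring domains first
```

In this toy run, domains that share IPs with the seed (directly or transitively) end up with nonzero scores, while the isolated benign domain stays at zero.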
Mobile code distribution relies on digital signatures to guarantee code authenticity. Unfortunately, standard signature schemes are not well suited for use in conjunction with program transformation techniques, such as aspect-oriented programming. With these techniques, code development is performed in sequence by multiple teams of programmers. This is fundamentally different from traditional single-developer/single-user models, where users can verify end-to-end (i.e., developer-to-user) authenticity of the code using digital signatures. To address this limitation, we introduce FLEX, a flexible code authentication framework for mobile applications. FLEX allows semi-trusted intermediaries to modify mobile code without invalidating the developer's signature, as long as the modification complies with a "contract" issued by the developer. We introduce formal definitions for secure code modification, and show that our instantiation of FLEX is secure under these definitions. Although FLEX can be instantiated using any language, we design AMJ, a novel programming language that supports code annotations, and implement a FLEX prototype based on our new language.
Databases have become one of the most important components in modern software systems. For example, web services, cloud computing systems, and online transaction processing systems all rely heavily on databases. To abstract the complexity of accessing a database, developers make use of Object-Relational Mapping (ORM) frameworks. ORM frameworks provide an abstraction layer between the application logic and the underlying database. Such an abstraction layer automatically maps objects in object-oriented languages to database records, which significantly reduces the amount of boilerplate code that needs to be written. Despite the advantages of using ORM frameworks, we observed several difficulties in maintaining ORM code (i.e., code that makes use of ORM frameworks) while working with our industrial partner. After conducting studies on other open source systems, we find that such difficulties are common in other Java systems. Our study finds that i) ORM cannot completely encapsulate database accesses in objects or abstract the underlying database technology, which may cause ORM code changes to be more scattered; ii) ORM code changes are more frequent than regular code changes, but there is a lack of tools that help developers verify ORM code at compilation time; and iii) changes to ORM code are more commonly made for performance or security reasons, yet traditional static code analyzers need to be extended to capture the peculiarities of ORM code in order to detect such problems. Our study highlights the hidden maintenance costs of using ORM frameworks and provides some initial insights about potential approaches to help maintain ORM code. Future studies should carefully examine ORM code, especially given the rising use of ORM in modern software systems.
Cloud providers are in a position to greatly improve the trust clients have in network services: IaaS platforms can isolate services so they cannot leak data, and can help verify that they are securely deployed. We describe a new system called CQSTR that allows clients to verify a service's security properties. CQSTR provides a new cloud container abstraction similar to Linux containers but for VM clusters within IaaS clouds. Cloud containers enforce constraints on what software can run, and control where and how much data can be communicated across service boundaries. With CQSTR, IaaS providers can make assertions about the security properties of a service running in the cloud. We investigate implementations of CQSTR on both Amazon AWS and OpenStack. With AWS, we build on virtual private clouds to limit network access and on authorization mechanisms to limit storage access. However, with AWS certain security properties can be checked only by monitoring audit logs for violations after the fact. We modified OpenStack to implement the full CQSTR model with only modest code changes. We show how to use CQSTR to build more secure deployments of the data analytics frameworks PredictionIO, PacketPig, and SpamAssassin. In experiments on CloudLab we found that the performance impact of CQSTR on applications is near zero.
In today's enterprise networks, there are many ways for a determined attacker to obtain a foothold, bypass current protection technologies, and attack the intended target. Over several years we have developed the Self-shielding Dynamic Network Architecture (SDNA) technology, which prevents an attacker from targeting, entering, or spreading through an enterprise network by adding dynamics that present a changing view of the network over space and time. SDNA was developed with the support of government sponsored research and development and corporate internal resources. The SDNA technology was purchased by Cryptonite, LLC in 2015 and has been developed into a robust product offering called Cryptonite NXT. In this paper, we describe the journey and lessons learned along the course of feasibility demonstration, technology development, security testing, productization, and deployment in a production network.
Unlike a random, run-of-the-mill website infection, in a strategic web attack the adversary carefully chooses a compromise target that is frequently visited by an organization or a group of individuals, for the purpose of getting a step closer to that organization or collecting information from the group. This type of attack, called a "watering hole" attack, has been increasingly utilized by APT actors to get into the internal networks of big companies and government agencies or to monitor politically oriented groups. Despite its importance, little has been done so far to understand how the attack works, not to mention concrete steps to counter this threat. In this paper, we report our first step toward better understanding this emerging threat, through systematically discovering and analyzing new watering hole instances and attack campaigns. This was made possible by a carefully designed methodology, which repeatedly monitors a large number of potential watering hole targets to detect unusual changes that could be indicative of strategic compromises. Running this system on the HTTP traffic generated from visits to 61K websites for over 5 years, we were able to discover and confirm 17 watering holes and 6 campaigns never reported before. Given that so far merely 29 watering holes have been reported by blogs and technical reports, our findings contribute to the research on this attack vector by adding 59% more attack instances, along with information about how they work, to the public knowledge. Analyzing the new watering holes allows us to gain a deeper understanding of these attacks, including repeated compromises of political websites, their long lifetimes, a unique evasion strategy (leveraging other compromised sites to serve attack payloads), and new exploit techniques (no malware delivery, web-only information gathering). Our study also brings to light interesting new observations, including the discovery of a recent JSONP attack on an NGO website, which has been widely reported and apparently forced the attack to stop.
Because of rampant security breaches in IoT devices, searching for vulnerabilities in massive IoT ecosystems is more crucial than ever. Recent studies have demonstrated that control-flow graph (CFG) based bug search techniques can be effective and accurate across IoT devices with different architectures. However, these CFG-based bug search approaches are far from scalable to the enormous number of IoT devices in the wild, due to their expensive graph matching overhead. Inspired by the rich experience in image and video search, we propose a new bug search scheme which addresses the scalability challenge in existing cross-platform bug search techniques and further improves search accuracy. Unlike existing techniques that conduct searches directly on raw features (CFGs) from the binary code, we convert the CFGs into high-level numeric feature vectors. Compared with the CFG features, high-level numeric feature vectors are more robust to code variation across different architectures, and they enable real-time search using state-of-the-art hashing techniques. We have implemented a bug search engine, Genius, and compared it with state-of-the-art bug search approaches. Experimental results show that Genius outperforms baseline approaches for various query loads in terms of speed and accuracy. We also evaluated Genius on a real-world dataset of 33,045 devices collected from public sources and our own system. The experiment showed that Genius can finish a search within 1 second on average over 8,126 firmware images containing 420,558,702 functions. By looking at only the top 50 candidates in the search results, we found 38 potentially vulnerable firmware images across 5 vendors, and confirmed 23 of them by manual analysis. We also found that it took only 0.1 seconds on average to finish searching for all 154 vulnerabilities in the two latest commercial firmware images from D-LINK; 103 of these vulnerabilities are potentially present in the two images, and 16 of them were confirmed.
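The core idea of trading exact graph matching for numeric feature vectors plus hashing can be illustrated with a toy sketch; the function names, feature choices, and the random-hyperplane LSH below are illustrative assumptions, not the actual Genius feature encoding or indexing pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-function numeric feature vectors, e.g. counts of basic
# blocks, edges, calls, and arithmetic instructions extracted from each CFG.
functions = {
    "firmware_a/strcpy_wrapper": np.array([12, 17, 3, 40], float),
    "firmware_b/strcpy_wrapper": np.array([13, 18, 3, 42], float),
    "firmware_c/crc_table_init": np.array([3, 2, 40, 5], float),
}
query = np.array([12, 17, 4, 41], float)  # features of a known-vulnerable function

# Random-hyperplane LSH: the sign pattern of random projections is a compact
# hash; vectors with a small angle between them usually (probabilistically,
# not always) land in the same bucket.
planes = rng.normal(size=(16, 4))

def lsh_bucket(v):
    return tuple((planes @ v) > 0)

index = {}
for name, vec in functions.items():
    index.setdefault(lsh_bucket(vec), []).append(name)

# Candidates from the query's bucket are then ranked by cosine similarity,
# avoiding any pairwise graph matching at query time.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cands = index.get(lsh_bucket(query), [])
print(sorted(((cosine(query, functions[c]), c) for c in cands), reverse=True))
```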
We present a code- and input-sensitive sanitization synthesis approach for repairing string vulnerabilities that are common in web applications. The synthesized sanitization patch modifies the user input in an optimal way while guaranteeing that the repaired web application is not vulnerable. Given a web application, an input pattern and an attack pattern, we use automata-based static string analysis techniques to compute a sanitization signature that characterizes safe input values that obey the given input pattern and are safe with respect to the given attack pattern. Using the sanitization signature, we synthesize an optimal sanitization patch that converts malicious user inputs to benign ones with minimal editing. When the generated patch is added to the web application, it is guaranteed that the repaired web application is no longer vulnerable. We present refinements to previous sanitization synthesis algorithms that reduce the runtime sanitization cost significantly. We evaluate our approach on open source web applications using common input and attack patterns, demonstrating the effectiveness of our approach.
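As a much-simplified illustration of turning a malicious input into a benign one with few edits, the toy sanitizer below uses hypothetical regex input/attack patterns and a greedy character-deletion rule; the paper's automata-based sanitization signatures and provably optimal (minimal-edit) patches are considerably more general.

```python
import re

# Hypothetical patterns: an allowed input alphabet and an XSS-style attack pattern.
INPUT_PATTERN = re.compile(r"^[\w@.()\s<>/=-]*$")
ATTACK_PATTERN = re.compile(r"<\s*script.*?>", re.I | re.S)

def sanitize(value: str) -> str:
    """Toy sanitizer: delete a small number of characters so the value stays
    within the allowed alphabet and no longer matches the attack pattern."""
    if not INPUT_PATTERN.match(value):
        # drop characters outside the allowed alphabet
        value = "".join(ch for ch in value if INPUT_PATTERN.match(ch))
    while (m := ATTACK_PATTERN.search(value)):
        # greedy edit: removing the leading '<' of the match is enough to
        # break this particular attack pattern
        value = value[:m.start()] + value[m.start() + 1:]
    return value

print(sanitize("hello <script>alert(1)</script> world"))
# -> "hello script>alert(1)</script> world" (the attack pattern no longer matches)
```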
There is an increasing trend for data owners to store their data on a third-party cloud server and to buy services from the cloud server to provide information to other users. To ensure confidentiality, the data is usually encrypted; therefore, an encrypted-data search scheme with privacy preservation is of paramount importance. Predicate encryption (PE) is one of the attractive solutions due to its attribute-hiding merit. However, as the cloud is not always trusted, verifying the search results is also crucial. We first propose a generic construction of a Publicly Verifiable Predicate Encryption (PVPE) scheme to provide verification for PE, and we reduce the security of PVPE to the security of PE. Then, from a practical point of view, to decrease the communication and computation overhead, we propose an improved PVPE scheme that trades off a small probability of error.
In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case where each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and to focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both an in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.
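LDPMiner's two-phase protocol for set-valued data is more involved, but its underlying primitive can be sketched as generalized randomized response over a fixed candidate set with unbiased frequency estimation; the item domain, ε, and user counts below are made up for illustration.

```python
import math, random
from collections import Counter

random.seed(1)
epsilon = 2.0
candidates = ["news", "mail", "maps", "video", "shop"]  # hypothetical item domain
k = len(candidates)
p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)  # probability of truthful report
q = (1 - p) / (k - 1)                                # probability of each other item

def perturb(item):
    """Generalized randomized response: report the true item with probability p,
    otherwise report one of the other k-1 items uniformly at random."""
    if random.random() < p:
        return item
    return random.choice([c for c in candidates if c != item])

# Simulate users whose true items are skewed toward a few heavy hitters.
true_items = (["news"] * 5000 + ["mail"] * 3000 + ["maps"] * 1000
              + ["video"] * 700 + ["shop"] * 300)
reports = Counter(perturb(x) for x in true_items)

# Unbiased estimate of each item's true count from the noisy reports:
# E[reports[c]] = n_c * p + (n - n_c) * q, so n_c ≈ (reports[c] - n*q) / (p - q).
n = len(true_items)
estimates = {c: (reports[c] - n * q) / (p - q) for c in candidates}
print(sorted(estimates.items(), key=lambda kv: -kv[1]))  # heavy hitters first
```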
Differential privacy is a precise mathematical constraint meant to ensure privacy of individual pieces of information in a database even while queries are answered about the aggregate. Intuitive as it is, one must come to terms with what differential privacy does and does not guarantee. For example, the definition prevents a strong adversary who knows all but one entry in the database from further inferring about the last one. This strong adversary assumption can be overlooked, resulting in misinterpretation of the privacy guarantee of differential privacy. Herein we give an equivalent definition of privacy using mutual information that makes plain some of the subtleties of differential privacy. The mutual-information formulation of differential privacy is in fact sandwiched between ε-differential privacy and (ε,δ)-differential privacy in terms of its strength. In contrast to previous works using unconditional mutual information, differential privacy is fundamentally related to conditional mutual information, accompanied by a maximization over the database distribution. The conceptual advantage of using mutual information, aside from yielding a simpler and more intuitive definition of differential privacy, is that its properties are well understood. Several properties of differential privacy are easily verified for the mutual information alternative, such as composition theorems.
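As a hedged restatement of the conditional-mutual-information form of the definition (the exact units, nats versus bits, and constants follow the paper): with database $X^n=(X_1,\dots,X_n)$ drawn from $P_{X^n}$, $X^{-i}$ denoting all entries other than the $i$-th, and $Y$ the mechanism output, ε-MI differential privacy requires

```latex
% epsilon-MI differential privacy (as commonly stated; constants per the paper):
% the mechanism output Y leaks at most epsilon about any single entry X_i,
% even conditioned on all other entries X^{-i}, for the worst-case database
% distribution P_{X^n}.
\[
  \sup_{i,\; P_{X^n}} \; I\!\left(X_i ;\, Y \,\middle|\, X^{-i}\right) \;\le\; \epsilon .
\]
```

The sandwich result in the abstract then places this condition between ε-differential privacy and (ε,δ)-differential privacy in strength.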
The phenomenon of metal-insulator transition (MIT) in strongly correlated oxides, such as NbO2, has shown oscillation behavior in recent experiments. In this work, an MIT-based two-terminal device is proposed as a compact oscillation neuron for the parallel read operation from a resistive synaptic array. The weighted sum is represented by the frequency of the oscillation neuron. Compared to the complex CMOS integrate-and-fire neuron with tens of transistors, the oscillation neuron achieves significant area reduction, thereby alleviating the column pitch matching problem of the peripheral circuitry in resistive memories. Firstly, the impact of MIT device characteristics on the weighted sum accuracy is investigated when the oscillation neuron is connected to a single resistive synaptic device. Secondly, the array-level performance is explored when the oscillation neurons are connected to the resistive synaptic array. To address the interference of oscillation between columns in simple cross-point arrays, a 2-transistor-1-resistor (2T1R) array architecture is proposed with a negligible increase in array area. Finally, a circuit-level benchmark of the proposed oscillation neuron against the CMOS neuron is performed. At the single-neuron level, the oscillation neuron shows a >12.5X reduction in area. At the 128×128 array level, the oscillation neuron shows reductions of ~4% in total area, >30% in latency, ~5X in energy, and ~40X in leakage power, demonstrating its advantage for integration into the resistive synaptic array for neuro-inspired computing.
Convolutional Neural Networks (CNNs) are a powerful technique widely used in computer vision, but they demand far more computation and memory resources than traditional solutions. The emerging metal-oxide resistive random-access memory (RRAM) and RRAM crossbars have shown great potential for neuromorphic applications with high energy efficiency. However, the interfaces between analog RRAM crossbars and digital peripheral functions, namely Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs), consume most of the area and energy of RRAM-based CNN designs due to the large amount of intermediate data in CNNs. In this paper, we propose an energy-efficient structure for RRAM-based CNNs. Based on an analysis of the data distribution, a quantization method is proposed to convert the intermediate data to 1 bit and eliminate the DACs. An energy-efficient structure using input data as selection signals is proposed to reduce the ADC cost of merging results from multiple crossbars. The experimental results show that the proposed method and structure can save 80% of the area and more than 95% of the energy while maintaining the same or comparable CNN classification accuracy on MNIST.
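A minimal sketch of the 1-bit quantization idea, with a made-up threshold rule (the paper derives its quantization from its own analysis of the data distribution): intermediate activations are binarized so that the next crossbar's inputs need no DAC.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical intermediate feature map produced by one RRAM crossbar layer
# (after ReLU, so values are non-negative and heavily skewed toward zero).
activations = np.maximum(rng.normal(0.1, 0.5, size=(8, 8, 16)), 0.0)

# Pick a threshold from the observed data distribution (here: a high
# percentile of the non-zero activations) so that roughly the informative
# activations survive binarization; the paper's rule may differ.
threshold = np.percentile(activations[activations > 0], 70)

# 1-bit quantization: the next crossbar sees only {0, 1} inputs, so its
# wordline drivers need no DAC, and multi-crossbar results can be merged with
# simple selection logic instead of full-precision conversion.
binary_inputs = (activations >= threshold).astype(np.uint8)

print("threshold:", round(float(threshold), 3),
      "fraction of ones:", round(float(binary_inputs.mean()), 3))
```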
Users’ online behaviors such as ratings and examination of items are recognized as one of the most valuable sources of information for learning users’ preferences in order to make personalized recommendations. But most previous works focus on modeling only one type of users’ behaviors such as numerical ratings or browsing records, which are referred to as explicit feedback and implicit feedback, respectively. In this article, we study a Semisupervised Collaborative Recommendation (SSCR) problem with labeled feedback (for explicit feedback) and unlabeled feedback (for implicit feedback), in analogy to the well-known Semisupervised Learning (SSL) setting with labeled instances and unlabeled instances. SSCR is associated with two fundamental challenges, that is, heterogeneity of two types of users’ feedback and uncertainty of the unlabeled feedback. As a response, we design a novel Self-Transfer Learning (sTL) algorithm to iteratively identify and integrate likely positive unlabeled feedback, which is inspired by the general forward/backward process in machine learning. The merit of sTL is its ability to learn users’ preferences from heterogeneous behaviors in a joint and selective manner. We conduct extensive empirical studies of sTL and several very competitive baselines on three large datasets. The experimental results show that our sTL is significantly better than the state-of-the-art methods.
As the use of social media technologies proliferates in organizations, it is important to understand the nefarious behaviors, such as cyberbullying, that may accompany such technology use and how to discourage these behaviors. We draw from neutralization theory and the criminological theory of general deterrence to develop and empirically test a research model that explains why cyberbullying may occur and how the behavior may be discouraged. We created a research model of three second-order formative constructs to examine their predictive influence on intentions to cyberbully. We used PLS-SEM to analyze the responses of 174 Facebook users in two different cyberbullying scenarios. Our model suggests that neutralization techniques enable cyberbullying behavior and that, while sanction certainty is an important deterrent, sanction severity appears ineffective. We discuss the theoretical and practical implications of our model and results.
We study the value of data privacy in a game-theoretic model of trading private data, where a data collector purchases private data from strategic data subjects (individuals) through an incentive mechanism. The private data of each individual represents her knowledge about an underlying state, which is the information the data collector desires to learn. Unlike most existing work on privacy-aware surveys, our model does not assume the data collector to be trustworthy. Instead, each individual takes full control of her own data privacy and reports only a privacy-preserving version of her data. In this paper, the value of ε units of privacy is measured by the minimum payment over all nonnegative payment mechanisms under which an individual's best response at a Nash equilibrium is to report her data with a privacy level of ε. The higher ε is, the less private the reported data is. We derive lower and upper bounds on the value of privacy which are asymptotically tight as the number of data subjects becomes large. Specifically, the lower bound ensures that it is impossible to buy ε units of privacy with a smaller payment, and the upper bound is given by an achievable payment mechanism that we design. Based on these fundamental limits, we further derive lower and upper bounds on the minimum total payment for the data collector to achieve a given learning accuracy target, and show that the total payment of the designed mechanism is at most one individual's payment away from the minimum.
Centralized spectrum management is one of the key dynamic spectrum access (DSA) mechanisms proposed to govern spectrum sharing between government incumbent users (IUs) and commercial secondary users (SUs). In current centralized DSA designs, the operation data of both government IUs and commercial SUs needs to be shared with a central server. However, the operation data of government IUs is often classified information, and SU operation data may also be a commercial secret. The current system design fails to satisfy the privacy requirements of both IUs and SUs, since the central server is not necessarily trustworthy enough to hold such sensitive operation data. To address the privacy issue, this paper presents a privacy-preserving centralized DSA system (P2-SAS), which realizes the complex spectrum allocation process of DSA through efficient secure multi-party computation. In P2-SAS, none of the IU or SU operation data is exposed to any snooping party, including the central server itself. We formally prove the correctness and privacy-preserving property of P2-SAS and evaluate its scalability and practicality in experiments based on real-world data. Experiment results show that P2-SAS can respond to an SU's spectrum request in 6.96 seconds with a communication overhead of less than 4 MB.
TLS and SSH are two of the most commonly used protocols for securing Internet traffic. Many of the implementations of these protocols rely on the cryptographic primitives provided in the OpenSSL library. In this work we disclose a vulnerability in OpenSSL, affecting all versions and forks (e.g. LibreSSL and BoringSSL) since roughly October 2005, which renders the implementation of the DSA signature scheme vulnerable to cache-based side-channel attacks. Exploiting the software defect, we demonstrate the first published cache-based key-recovery attack on these protocols: 260 SSH-2 handshakes to extract a 1024/160-bit DSA host key from an OpenSSH server, and 580 TLS 1.2 handshakes to extract a 2048/256-bit DSA key from an stunnel server.
Go is a programming language developed at Google, with channel-based concurrent features based on CSP. Go can detect global communication deadlocks at runtime when all threads of execution are blocked, but deadlocks in other paths of execution could be undetected. We present a new static analyser for concurrent Go code to find potential communication errors such as communication mismatch and deadlocks at compile time. Our tool extracts the communication operations as session types, which are then converted into Communicating Finite State Machines (CFSMs). Finally, we apply a recent theoretical result on choreography synthesis to generate a global graph representing the overall communication pattern of a concurrent program. If the synthesis is successful, then the program is free from communication errors. We have implemented the technique in a tool, and applied it to analyse common Go concurrency patterns and an open source application with over 700 lines of code.
Sybil attacks, in which an adversary creates a large number of identities, present a formidable problem for the robustness of recommendation systems. One promising method of sybil detection is to use data from social network ties to implicitly infer trust. Previous work along this dimension typically a) assumes that it is difficult/costly for an adversary to create edges to honest nodes in the network; and b) limits the amount of damage done per such edge, using conductance-based methods. However, these methods fail to detect a simple class of sybil attacks which have been identified in online systems. Indeed, conductance-based methods seem inherently unable to do so, as they are based on the assumption that creating many edges to honest nodes is difficult, which fails to hold in real-world settings. We create a sybil defense system that accounts for the adversary's ability to launch such attacks yet provably withstands them by: (1) not assuming any restriction on the number of edges an adversary can form, but instead making the much weaker assumption that creating edges from sybils to most honest nodes is difficult, while allowing the remaining honest nodes to be freely connected to; (2) relaxing the goal from classifying all nodes as honest or sybil to classifying the "core" nodes of the network as honest and classifying no sybil nodes as honest; and (3) exploiting a property of social networks new to sybil detection, namely, that nodes can be embedded in low-dimensional spaces.
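A toy sketch of the low-dimensional embedding idea on a synthetic graph (the graph construction, normalization, and use of a trusted honest centroid below are illustrative assumptions, not the paper's algorithm or guarantees): nodes are embedded via the leading eigenvectors of a normalized adjacency matrix, and sybils, which attach to only a few honest nodes, tend to land away from the honest core.

```python
import numpy as np

rng = np.random.default_rng(0)
n_honest, n_sybil = 30, 15
n = n_honest + n_sybil

# Hypothetical social graph: a well-connected honest region, a sybil region,
# and attack edges that reach only a few honest nodes.
A = np.zeros((n, n))
def connect(i, j):
    A[i, j] = A[j, i] = 1

for i in range(n_honest):                        # dense honest region
    for j in rng.choice(n_honest, 8, replace=False):
        if i != j:
            connect(i, j)
for i in range(n_honest, n):                     # sybil region
    for j in rng.choice(np.arange(n_honest, n), 5, replace=False):
        if i != j:
            connect(i, j)
for i in range(n_honest, n):                     # attack edges hit only nodes 0..2
    connect(i, int(rng.integers(0, 3)))

# Embed nodes in a low-dimensional space using the top eigenvectors of the
# degree-normalized adjacency matrix; the honest core clusters together.
d = A.sum(axis=1)
M = A / np.sqrt(np.outer(d, d))
vals, vecs = np.linalg.eigh(M)
embedding = vecs[:, -2:]                         # 2-D embedding (top-2 eigenvectors)

honest_center = embedding[:n_honest].mean(axis=0)  # centroid of a trusted seed region
dist = np.linalg.norm(embedding - honest_center, axis=1)
print("mean distance to honest centroid -- honest:",
      round(float(dist[:n_honest].mean()), 3),
      "sybil:", round(float(dist[n_honest:].mean()), 3))
```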
Cyber-Physical Embedded Systems (CPESs) are distributed embedded systems integrated with various actuators and sensors. When it comes to CPES security, the most significant problem is the security of Embedded Sensor Networks (ESNs). With the continuous growth of ESNs, the security of transferring data from sensors to their destinations has become an important research area. Due to limitations in power, storage, and processing capabilities, existing security mechanisms for wired or wireless networks cannot be applied directly to ESNs. Meanwhile, ESNs are likely to face various kinds of attacks in industrial scenarios. Therefore, there is a need to develop new techniques or modify current security mechanisms to overcome these problems. In this article, we focus on Intrusion Detection (ID) techniques and propose a new attack-defense game model to detect malicious nodes using a repeated game approach. As a direct consequence of the game model, attackers and defenders adopt different strategies to achieve optimal payoffs. Importantly, false detection and missed detection are taken into consideration in Intrusion Detection Systems (IDSs), and a game tree model is introduced to address this problem. In addition, we analyze and prove the existence of pure and mixed Nash equilibria. Simulations show that the proposed model can reduce energy consumption by up to 50% compared with the existing All Monitor (AM) model and improve the detection rate by 10% to 15% compared with the existing Cluster Head (CH) monitor model.
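The flavor of the equilibrium analysis can be illustrated with a 2x2 attack-defense stage game solved via the standard indifference conditions for a mixed Nash equilibrium; the payoff numbers below are hypothetical and unrelated to the article's model.

```python
from fractions import Fraction as F

# Hypothetical payoffs (attacker_payoff, defender_payoff) for a 2x2 stage game
# between an attacker (Attack / Idle) and a defender (Monitor / NotMonitor).
payoff = {
    ("Attack", "Monitor"):    (F(-2), F(3)),
    ("Attack", "NotMonitor"): (F(5),  F(-5)),
    ("Idle",   "Monitor"):    (F(0),  F(-1)),   # monitoring has a cost even without attack
    ("Idle",   "NotMonitor"): (F(0),  F(0)),
}

# Mixed Nash equilibrium via indifference conditions.
# q = Pr[defender monitors] must make the attacker indifferent between Attack and Idle.
num = payoff[("Idle", "NotMonitor")][0] - payoff[("Attack", "NotMonitor")][0]
den = (payoff[("Attack", "Monitor")][0] - payoff[("Attack", "NotMonitor")][0]
       - payoff[("Idle", "Monitor")][0] + payoff[("Idle", "NotMonitor")][0])
q = num / den

# p = Pr[attacker attacks] must make the defender indifferent between Monitor and NotMonitor.
num = payoff[("Idle", "NotMonitor")][1] - payoff[("Idle", "Monitor")][1]
den = (payoff[("Attack", "Monitor")][1] - payoff[("Idle", "Monitor")][1]
       - payoff[("Attack", "NotMonitor")][1] + payoff[("Idle", "NotMonitor")][1])
p = num / den

print(f"defender monitors with probability {q}, attacker attacks with probability {p}")
# -> 5/7 and 1/9 for these hypothetical payoffs
```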
In the age of the IoT (Internet of Things), random number generators play an important role in generating encryption keys and authenticating embedded equipment. The random numbers are required to be statistically uniformly distributed and unpredictable, and a physical true random number generator (TRNG) is used to satisfy these requirements. In this paper, we implement a TRNG using SR latches on a 40 nm CMOS ASIC. This TRNG generates random numbers by exclusive-ORing (XORing) the outputs of 256 SR latches. We evaluate the generated random numbers using statistical tests in accordance with BSI AIS 20/31, an IID (Independent and Identically Distributed) test, and entropy estimation in accordance with NIST SP800-90B, while varying the supply voltage and environmental temperature within their rated ranges. As a result, the TRNG passed all the tests except in a few cases, and from this experiment we found that the TRNG is robust against environmental change. The power consumption is 18.8 microwatts at 2.5 MHz. This TRNG is suitable for embedded systems to improve security in IoT systems.
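The debiasing effect of XORing many latch outputs can be sketched with a toy software model (biased Bernoulli bits standing in for metastability-based SR latch outputs; this illustrates only the XOR combination, not the actual circuit or the AIS 20/31 / SP800-90B evaluation).

```python
import random

random.seed(3)

# Toy model: each of 256 SR latches produces a biased bit; XORing them yields
# a far less biased output bit, since the residual bias |2p-1| of the XOR is
# the product of the per-latch biases.
latch_biases = [random.uniform(0.3, 0.7) for _ in range(256)]  # per-latch P(bit = 1)

def latch_bit(bias):
    return 1 if random.random() < bias else 0

def trng_bit():
    b = 0
    for bias in latch_biases:
        b ^= latch_bit(bias)
    return b

samples = [trng_bit() for _ in range(20000)]
print("fraction of ones:", sum(samples) / len(samples))  # very close to 0.5
```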
In this research, we aim to suggest a method for designing trustworthy PRVAs (product recommendation virtual agents). We define an agent's trustworthiness as being determined by the user's emotion and the knowledgeableness that the user perceives in the agent. We also propose a user inner-state transition model for increasing trust: to increase trust, we aim to shift the user's emotion toward positive through emotional contagion and to raise the agent's perceived knowledgeableness by increasing its knowledge. We carried out two experiments to examine this model. In experiment 1, the PRVAs recommended package tours and became highly knowledgeable in the latter half of ten recommendations. In experiment 2, the PRVAs recommended the same package tours and expressed a positive emotion in the latter half. As a result, participants' inner states transitioned as we expected, showing that this model is valuable for PRVA recommendation.
By maintaining the data in main memory, in-memory databases dramatically reduce the I/O cost of transaction processing. However, for recovery purposes, in-memory systems still need to flush the log to disk, which incurs a substantial number of I/Os. Recently, command logging has been proposed to replace the traditional data log (e.g., ARIES logging) in in-memory databases. Instead of recording how the tuples are updated, command logging only tracks the transactions that are being executed, thereby effectively reducing the size of the log and improving the performance. However, when a failure occurs, all the transactions in the log after the last checkpoint must be redone sequentially and this significantly increases the cost of recovery. In this paper, we first extend the command logging technique to a distributed system, where all the nodes can perform their recovery in parallel. We show that in a distributed system, the only bottleneck of recovery caused by command logging is the synchronization process that attempts to resolve the data dependency among the transactions. We then propose an adaptive logging approach by combining data logging and command logging. The percentage of data logging versus command logging becomes a tuning knob between the performance of transaction processing and recovery to meet different OLTP requirements, and a model is proposed to guide such tuning. Our experimental study compares the performance of our proposed adaptive logging, ARIES-style data logging and command logging on top of H-Store. The results show that adaptive logging can achieve a 10x boost for recovery and a transaction throughput that is comparable to that of command logging.
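A toy illustration of the tuning knob between log size and recovery cost, with made-up cost parameters and a hypothetical ranking heuristic (the paper's actual tuning model and the distributed synchronization bottleneck are not captured here).

```python
import random

random.seed(7)

# Toy model of the adaptive-logging trade-off (all parameters hypothetical):
# command logging writes a tiny log entry but the transaction must be fully
# re-executed (and synchronized with its dependencies) during recovery;
# data logging writes the touched tuples but replays cheaply.
CMD_LOG_BYTES, DATA_LOG_BYTES_PER_TUPLE = 64, 256
CMD_REDO_MS_PER_TX, DATA_REDO_MS_PER_TUPLE = 2.0, 0.05

def plan(transactions, data_log_fraction):
    """Data-log the `data_log_fraction` most recovery-expensive transactions and
    command-log the rest; return (total log bytes, total redo time in ms)."""
    ranked = sorted(transactions, key=lambda t: t["tuples"] * t["dependencies"], reverse=True)
    cutoff = int(len(ranked) * data_log_fraction)
    log_bytes = redo_ms = 0.0
    for i, t in enumerate(ranked):
        if i < cutoff:   # data logging
            log_bytes += t["tuples"] * DATA_LOG_BYTES_PER_TUPLE
            redo_ms += t["tuples"] * DATA_REDO_MS_PER_TUPLE
        else:            # command logging
            log_bytes += CMD_LOG_BYTES
            redo_ms += CMD_REDO_MS_PER_TX * t["dependencies"]
    return log_bytes, redo_ms

txs = [{"tuples": random.randint(1, 50), "dependencies": random.randint(1, 8)}
       for _ in range(10000)]
for frac in (0.0, 0.1, 0.5, 1.0):   # the tuning knob
    b, r = plan(txs, frac)
    print(f"data-log fraction {frac:.1f}: log {b/1e6:6.1f} MB, redo {r/1e3:7.1f} s")
```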
Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment, losing volatile data, and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when a program has been formally proved to satisfy a property, the proof is foiled by these external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real-world programs, including those in our evaluation, which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique capable of automatically proving that a program correctly recovers from a crash, via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and that every recovered execution simulates some crash-free execution. Our encoding is designed to enable one to apply existing abstraction techniques in order to do the work that is necessary to prove recoverability. We have implemented our technique in a tool called Eleven82, capable of analyzing C programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, our tool is able to discover bugs or prove that these fragments are crash recoverable.