Measuring and Improving Management of Today's PKI - UMD - October 2016
PI(s): David Levin
Researchers: Frank Cangialosi (UMD, undergraduate)
PROJECT OVERVIEW
Authentication allows a user to know, when they go to a website, that they are truly communicating with whom they expect, and not an impersonator. This critical property is made possible with a set of cryptographic and networking protocols collectively referred to as a public key infrastructure (PKI). While online use of the PKI is mostly automated, there is a surprising amount of human intervention in management tasks that are crucial to its proper operation. This project studies the following questions: Are administrators doing what users of the Web need them to do in order to ensure security? And, how can we help facilitate or automate these tasks?
We are performing internet-wide measurements of how online certificates are actively being managed, including how quickly and thoroughly administrators revoke their certificates after a potential key compromise, and what role third-party hosting services play. In particular, we find that CDNs (content distribution networks)—which serve content for many of the most popular websites—appear to have access to content providers' private keys, violating the fundamental assumption of PKIs (i.e., no one shares their private keys). We are performing the first widespread analyses of the extent to which websites are sharing their private keys, and exploring what impact this has on the management of the PKI and on users' privacy and security in general.
HARD PROBLEM(S) ADDRESSED
Metrics; Human Behavior.
ACCOMPLISHMENT HIGHLIGHTS
- We analyzed invalid certificates in the Web's PKI and have found that almost 88% of SSL/TLS certificates advertised over the past three years are invalid. Because measurement studies of the HTTPS ecosystem generally focus only on valid certificates, this means that the vast majority of certificates available in the public Web had not been studied. Through our analysis, we have demonstrated that despite their invalidity, much can be understood from invalid certificates, including the ability to track end-user devices by the certificates they give out. We demonstrated that invalid SSL certificates allow us to uniquely track over 6.7M devices. Taken together, our results open up a heretofore largely-ignored portion of the Internet to further study.
During this past quarter, we investigated a discrepancy between the datasets we use in this study, which reported dramatically different sets of hosts. We concluded that this is largely due to blacklisting (by either the source or the target of the measurements), and updated our analysis to account for it. We are curating our analysis scripts and datasets to make them publicly available. This work was accepted to the ACM Internet Measurement Conference (ACM IMC 2016).
- Key sharing is strictly forbidden (and typically assumed not to happen), but in reality many websites, including the majority of the most popular ones, are hosted at least in part by third parties such as content distribution networks (CDNs) or web hosting services. Put simply: administrators of websites that handle critically sensitive user data are giving their private keys to third parties. Critically, this sharing of keys is undetectable by most users, and widely unknown even among researchers. We performed a wide-scale measurement study of administrators' decisions regarding key sharing with third-party hosting services and the impact this sharing has on management. We found widespread key sharing and outsourcing of key management, and that third-party providers are slightly more thorough (though slower) in reacting to key compromise.
During this past quarter, we refined our methods of determining who hosts a given certificate, and re-performed our analysis in light of these new methods. This has resulted in an unprecedented dataset of (a) who hosts and who manages given certificates, and (b) which domain names and Autonomous Systems are owned by the same companies. We are finalizing the curation of this dataset to make it publicly available.
This work was accepted to the ACM Conference on Computer and Communications Security (ACM CCS 2016).
- Browsers must periodically download revocation information from CAs, or else the efforts of website administrators to revoke their certificates would be wasted. Unfortunately, browser developers are reluctant to do this, as it consumes bandwidth and potentially increases page load times. We have developed techniques for more efficiently disseminating revocation information. Our initial designs include probabilistic data structures known as Bloom filters that browsers can download and query to determine if a given certificate has been revoked. One of the key concerns with Bloom filters is false positives, which in this case would correspond to a browser believing that a certificate has been revoked when in fact it has not (a fail-safe default, but also a potential source of inefficiency). We have thus developed techniques for identifying false positives and disseminating those as well. We are also investigating group signature-based schemes that permit multiple CAs to sign their portion of a Bloom filter that aggregates all of the CAs' revocation information. We have developed a Firefox plugin that implements our initial design.
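The Bloom filter idea described above can be illustrated with a minimal Python sketch. This is not the project's design or its Firefox plugin: the class and function names (BloomFilter, build_revocation_data, is_revoked), the use of SHA-256 with double hashing, the byte-string certificate identifiers, and the 1% target false-positive rate are all assumptions made for illustration. The sketch assumes the publisher knows the set of currently valid certificates, so it can find the ones the filter wrongly flags and ship them alongside the filter.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter using double hashing over SHA-256 (illustrative)."""

    def __init__(self, capacity, fp_rate):
        # Standard sizing formulas: m bits and k hash functions.
        self.m = max(8, math.ceil(-capacity * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / capacity * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_revocation_data(revoked, valid, fp_rate=0.01):
    """Publisher side (hypothetical): build the filter and its false-positive list."""
    bloom = BloomFilter(max(1, len(revoked)), fp_rate)
    for cert_id in revoked:
        bloom.add(cert_id)
    # Valid certificates that the filter would wrongly report as revoked.
    false_positives = {c for c in valid if c in bloom}
    return bloom, false_positives


def is_revoked(cert_id, bloom, false_positives):
    """Client side (hypothetical): consult the filter, then the exception list."""
    return cert_id in bloom and cert_id not in false_positives


if __name__ == "__main__":
    revoked = {f"revoked-{i}".encode() for i in range(1000)}
    valid = {f"valid-{i}".encode() for i in range(5000)}
    bloom, fps = build_revocation_data(revoked, valid)
    print("filter bytes:", len(bloom.bits), "false positives listed:", len(fps))
```

In this sketch a revoked certificate is never missed, because Bloom filters have no false negatives, and a known valid certificate is never reported as revoked, because every false positive among the known valid set is listed explicitly. A certificate unknown to the publisher can still be flagged incorrectly, which is the fail-safe case.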
COMMUNITY INTERACTION
This quarter, Levin presented the results to groups of graduate and undergraduate students at UMD, to students and faculty at several other U.S. universities, and to international collaborators at the University of Jordan in Amman, Jordan. Levin also presented these results (particularly those pertaining to key sharing) to CloudFlare, one of the largest CDNs to host HTTPS content.