Measuring and Improving Management of Today's PKI - UMD - January 2016
PI(s): David Levin
Researchers: Frank Cangialosi (UMD, undergraduate)
PROJECT OVERVIEW
Authentication allows a user to know that, when they go to a website, they are truly communicating with the party they expect, and not an impersonator. This critical property is made possible by a set of cryptographic and networking protocols collectively referred to as a public key infrastructure (PKI). While online use of the PKI is mostly automated, there is a surprising amount of human intervention in management tasks that are crucial to its proper operation. This project studies two questions: Are administrators doing what users of the Web need them to do in order to ensure security? And how can we help facilitate or automate these tasks?
We are performing Internet-wide measurements of how online certificates are actively being managed, including how quickly and thoroughly administrators revoke their certificates after a potential key compromise, and what role third-party hosting services play. In particular, we find that CDNs (content distribution networks), which serve content for many of the most popular websites, appear to have access to content providers' private keys, violating the fundamental assumption of PKIs (i.e., no one shares their private keys). We are performing the first widespread analyses of the extent to which websites are sharing their private keys, and exploring what impact this has on the management of the PKI and on users' privacy and security in general.
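To make "how quickly and thoroughly" concrete, the following minimal sketch computes two summary numbers for a set of certificates: the fraction that were revoked, and the median delay in days between the disclosure of a compromise and revocation. The record layout, field names, and sample dates are hypothetical and chosen only for illustration; they are not the project's data or measurement pipeline.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional, Sequence, Tuple


@dataclass
class CertRecord:
    """Hypothetical record for one certificate observed in a scan."""
    domain: str
    revoked_on: Optional[date]  # None if the certificate was never revoked


def revocation_summary(
    records: Sequence[CertRecord], disclosed_on: date
) -> Tuple[float, Optional[float]]:
    """Return (fraction revoked, median days from disclosure to revocation)."""
    # Delay in days for every certificate that was eventually revoked.
    delays = [
        (r.revoked_on - disclosed_on).days
        for r in records
        if r.revoked_on is not None
    ]
    fraction = len(delays) / len(records) if records else 0.0
    median_delay = median(delays) if delays else None
    return fraction, median_delay


# Illustrative data only: an assumed disclosure date and four made-up certificates.
disclosed = date(2020, 1, 1)
sample = [
    CertRecord("a.example", date(2020, 1, 3)),
    CertRecord("b.example", date(2020, 1, 11)),
    CertRecord("c.example", None),
    CertRecord("d.example", date(2020, 1, 5)),
]
fraction_revoked, median_days = revocation_summary(sample, disclosed)
```

Computing the same two numbers separately for certificates grouped by who manages them is one simple way to compare reaction speed (median delay) against thoroughness (fraction revoked).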
HARD PROBLEM(S) ADDRESSED
Metrics; Human Behavior.
ACCOMPLISHMENT HIGHLIGHTS
- Developed techniques for determining whether a hosting domain manages its customers' certificates or the customers manage them themselves. The techniques are based on measuring how the customers' certificates were revoked after a well-known key compromise event. This approach is the first of its kind, and provides unique insight into how the Web's PKI is managed. Our results thus far indicate that, when third-party hosting services manage their customers' certificates, they typically react more slowly than individual customers who manage their own, but eventually revoke more of the certificates (when they revoke any at all). We generated a ground-truth dataset and evaluated the approach against it, showing high classification accuracy. Some domains remain difficult to classify; we are currently investigating how best to handle them.
- Refined our technique for determining whether two domains are owned by the same company, a problem we refer to as the "domain equivalence problem." We have incorporated more data, which allowed us to select a new set of features. We now achieve a false positive rate of ~3% and a false negative rate of ~12%. We are currently building a more efficient implementation so that we can apply the technique to all the third-party hosting domains in our dataset, as well as all the domains they host. This will let us determine, among other things, how many different companies (not just how many different domains) appear together on a given certificate.
- Began investigating the quality of key types and sources of randomness for servers among the most popular websites.
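For reference, the false positive and false negative rates quoted for the domain equivalence classifier above are defined over labeled pairs of domains. The sketch below shows the arithmetic under the assumption that "positive" means "same owner"; the function name and the sample labels are hypothetical and do not reproduce the project's data or results.

```python
from typing import Sequence, Tuple


def error_rates(
    truth: Sequence[bool], predicted: Sequence[bool]
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate).

    truth[i] is True when domain pair i really has the same owner;
    predicted[i] is the classifier's answer for that pair.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    # Pairs wrongly merged (different owners, predicted same owner).
    false_pos = sum(1 for t, p in zip(truth, predicted) if not t and p)
    # Pairs wrongly split (same owner, predicted different owners).
    false_neg = sum(1 for t, p in zip(truth, predicted) if t and not p)
    negatives = sum(1 for t in truth if not t)
    positives = len(truth) - negatives
    fpr = false_pos / negatives if negatives else 0.0
    fnr = false_neg / positives if positives else 0.0
    return fpr, fnr


# Illustrative labels only: 4 same-owner pairs and 5 different-owner pairs.
truth = [True, True, True, True, False, False, False, False, False]
predicted = [True, True, True, False, False, False, False, False, True]
fpr, fnr = error_rates(truth, predicted)
```

A false positive here merges two unrelated companies, while a false negative splits one company in two, so the two rates affect a count of companies per certificate in opposite directions.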
COMMUNITY INTERACTION
This quarter, Levin presented the results to groups of graduate and undergraduate students at UMD.