Privacy policies are used to communicate company data practices to consumers and must be accurate and comprehensive. Each policy author is free to use their own nomenclature when describing data practices, which leads to different ways in which similar information types are described across policies. A formal ontology can help policy authors, users and regulators consistently check how data practice descriptions relate to other interpretations of information types. In this paper, we describe an empirical method for manually constructing an information type ontology from privacy policies. The method consists of seven heuristics that explain how to infer hypernym, meronym and synonym relationships from information type phrases, which we discovered using grounded analysis of five privacy policies. The method was evaluated on 50 mobile privacy policies which produced an ontology consisting of 355 unique information type names. Based on the manual results, we describe an automated technique consisting of 14 reusable semantic rules to extract hypernymy, meronymy, and synonymy relations from information type phrases. The technique was evaluated on the manually constructed ontology to yield .95 precision and .51 recall.
|