An Evaluation of Constituency-based Hyponymy Extraction from Privacy Policies
Title | An Evaluation of Constituency-based Hyponymy Extraction from Privacy Policies |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Morgan Evans, Jaspreet Bhatia, Sudarshan Wadkar, Travis Breaux |
Conference Name | 25th IEEE International Requirements Engineering Conference |
Conference Location | Lisbon, Spain |
Keywords | compliance, hypernym, hyponym, natural language processing, ontology, privacy policy |
Abstract | Requirements analysts can model regulated data practices to identify and reason about risks of noncompliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 30 privacy policies from six domains (shopping, telecommunication, social networks, employment, health, and news). From this dataset, three semantic and four lexical categories of hyponymy emerged based on category completeness and word order. Among these, we identified and empirically evaluated 72 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns match information type hyponyms with an average precision of 0.72 and recall of 0.74. |
Citation Key | node-36394 |
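The abstract describes matching Tregex patterns against constituency parse trees to extract information type hyponyms. The paper's 72 patterns are not reproduced here. As a minimal sketch, assuming a hand-written Penn Treebank-style parse and a single hypothetical "X such as Y" pattern (roughly what a Tregex expression like `NP < (NP=hyper $+ (PP < (JJ < such) < (IN < as) < NP=hypo))` expresses), the Python below walks the tree and emits (hypernym, hyponym) pairs. The names `parse`, `match_such_as`, and the example phrase are illustrative assumptions, not the authors' implementation or the Tregex API.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        # A leaf is a node with no children; its label is the word itself.
        if not self.children:
            return [self.label]
        return [w for c in self.children for w in c.leaves()]


def parse(s: str) -> Tree:
    """Parse a Penn Treebank-style bracketed string into a Tree."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            t = Tree(tokens[pos])  # constituent label, e.g. NP, PP, NN
            pos += 1
            while tokens[pos] != ")":
                t.children.append(node())
            pos += 1  # consume the closing ")"
            return t
        t = Tree(tokens[pos])  # bare word (leaf)
        pos += 1
        return t

    return node()


def subtrees(t: Tree) -> Iterator[Tree]:
    # Pre-order traversal over every node in the tree.
    yield t
    for c in t.children:
        yield from subtrees(c)


def is_word(t: Tree, tag: str, word: str) -> bool:
    # True if t is a preterminal such as (JJ such).
    return (t.label == tag and len(t.children) == 1
            and t.children[0].label.lower() == word)


def match_such_as(root: Tree) -> List[Tuple[str, str]]:
    """Hypothetical 'X such as Y' hyponymy pattern (one pattern only)."""
    pairs = []
    for np in subtrees(root):
        if np.label != "NP":
            continue
        kids = np.children
        # Look for an NP immediately followed by a PP sister.
        for left, right in zip(kids, kids[1:]):
            if left.label != "NP" or right.label != "PP":
                continue
            pk = right.children
            if (len(pk) == 3 and is_word(pk[0], "JJ", "such")
                    and is_word(pk[1], "IN", "as") and pk[2].label == "NP"):
                hyper = " ".join(left.leaves())
                # Coordinated NPs are separate hyponyms; else use the whole NP.
                items = [c for c in pk[2].children if c.label == "NP"] or [pk[2]]
                for item in items:
                    pairs.append((hyper, " ".join(item.leaves())))
    return pairs


tree = parse("(ROOT (NP (NP (NN information)) (PP (JJ such) (IN as) "
             "(NP (NP (NN name)) (CC and) (NP (NN email) (NN address))))))")
print(match_such_as(tree))
# [('information', 'name'), ('information', 'email address')]
```

Unlike Tregex's `<` operator, which does not constrain the order of children, this sketch checks the PP's children in a fixed order, so it is a simplified approximation of one pattern. A fuller extractor would take trees from a constituency parser and delegate matching to Tregex itself.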