Title: An Evaluation of Constituency-based Hyponymy Extraction from Privacy Policies
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Morgan Evans, Jaspreet Bhatia, Sudarshan Wadkar, Travis Breaux
Conference Name: 25th IEEE International Requirements Engineering Conference
Conference Location: Lisbon, Spain
Keywords: compliance, hypernym, hyponym, natural language processing, ontology, privacy policy
Abstract

Requirements analysts can model regulated data practices to identify and reason about risks of noncompliance. If terminology is inconsistent or ambiguous, however, these models and their conclusions will be unreliable. To study this problem, we investigated an approach to automatically construct an information type ontology by identifying information type hyponymy in privacy policies using Tregex patterns. Tregex is a utility to match regular expressions against constituency parse trees, which are hierarchical expressions of natural language clauses, including noun and verb phrases. We discovered the Tregex patterns by applying content analysis to 30 privacy policies from six domains (shopping, telecommunication, social networks, employment, health, and news). From this dataset, three semantic and four lexical categories of hyponymy emerged based on category completeness and word order. Among these, we identified and empirically evaluated 72 Tregex patterns to automate the extraction of hyponyms from privacy policies. The patterns match information type hyponyms with an average precision of 0.72 and recall of 0.74.
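
To make the idea of matching a pattern against a constituency parse tree concrete, the following is a minimal sketch in plain Python. It does not use Tregex and does not reproduce any of the 72 patterns from the paper. The tree is hand-written with simplified labels, and the names `TREE`, `extract_hyponyms`, and `precision_recall` are hypothetical, introduced only for illustration. The single rule it encodes is the familiar "X such as Y and Z" construction, followed by the standard precision and recall calculation over extracted pairs.

```python
# Toy constituency tree as nested tuples: (label, child, child, ...).
# Leaves are plain strings. Hand-written for illustration (an assumption);
# a real pipeline would obtain the tree from a constituency parser.
TREE = (
    "NP",
    ("NP", ("JJ", "contact"), ("NN", "information")),
    ("PP",
        ("JJ", "such"), ("IN", "as"),
        ("NP",
            ("NP", ("NN", "email"), ("NN", "address")),
            ("CC", "and"),
            ("NP", ("NN", "phone"), ("NN", "number")))),
)


def words(node):
    """Return the leaf words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    out = []
    for child in node[1:]:
        out.extend(words(child))
    return out


def subtrees(node):
    """Yield every non-leaf node in the tree, parents before children."""
    if isinstance(node, str):
        return
    yield node
    for child in node[1:]:
        yield from subtrees(child)


def extract_hyponyms(tree):
    """Find (hyponym, hypernym) pairs for the shape NP -> NP (PP such as NP)."""
    pairs = []
    for node in subtrees(tree):
        children = node[1:]
        if node[0] != "NP" or len(children) != 2:
            continue
        head, pp = children
        if isinstance(head, str) or isinstance(pp, str):
            continue
        if head[0] != "NP" or pp[0] != "PP":
            continue
        if words(pp)[:2] != ["such", "as"]:
            continue
        listing = pp[-1]
        if isinstance(listing, str) or listing[0] != "NP":
            continue
        # A coordinated list contributes one hyponym per nested NP;
        # otherwise the whole NP is treated as a single hyponym.
        items = [c for c in listing[1:] if not isinstance(c, str) and c[0] == "NP"]
        if not items:
            items = [listing]
        hypernym = " ".join(words(head))
        for item in items:
            pairs.append((" ".join(words(item)), hypernym))
    return pairs


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|, over sets of pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Calling `extract_hyponyms(TREE)` yields one pair per item in the coordinated list, each paired with the head noun phrase as its hypernym. Comparing such output against a manually annotated set with `precision_recall` gives the kind of per-pattern scores that the abstract reports as averages.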

Citation Key: node-36394