Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence
Title | Enhanced Privacy and Data Protection using Natural Language Processing and Artificial Intelligence |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Martinelli, F., Marulli, F., Mercaldo, F., Marrone, S., Santone, A. |
Conference Name | 2020 International Joint Conference on Neural Networks (IJCNN) |
Keywords | artificial intelligence, artificial intelligence systems, artificial intelligence training, data obfuscation procedures, data privacy, data protection, digital documents, expert systems, Human Behavior, knowledge transfer, machine learning, Medical services, natural language processing, physical documents, privacy, privacy preservation, pubcrawl, Resiliency, Scalability, Sensitive Data Extraction, sensitive information, specific-domain knowledge data sets, Task Analysis, text analysis, unlabeled domain-specific documents corpora, unsupervised machine learning |
Abstract | Artificial Intelligence systems have brought significant benefits to users and society, but the ever-increasing volume of data they consume opens the door to privacy and security leaks. Severe threats to the right to privacy have obliged governments to enact specific regulations ensuring privacy preservation in any kind of transaction involving sensitive information. For digital and/or physical documents containing sensitive information, the right to privacy can be preserved by data obfuscation procedures. Recognizing sensitive information for obfuscation is typically entrusted to the experience of human experts, who are overwhelmed by the ever-increasing number of documents to process. Artificial intelligence could effectively reduce the effort required of human officers and speed up these processes. However, until enough knowledge is available in a machine-readable format, effective automatic systems cannot be developed. In this work we propose a methodology for transferring and leveraging general knowledge across specific-domain tasks. We built, from scratch, specific-domain knowledge data sets for training artificial intelligence models that support human experts in privacy-preserving tasks. We applied a mixture of natural language processing techniques to unlabeled domain-specific document corpora to automatically obtain labeled documents in which sensitive information is recognized and tagged. We performed preliminary tests on just over 10,000 documents from the healthcare and justice domains. Human experts supported us during validation. The results, measured in terms of precision, recall and F1-score across these two domains, were promising and encourage further investigation. |
DOI | 10.1109/IJCNN48605.2020.9206801 |
Citation Key | martinelli_enhanced_2020 |
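
The abstract reports results in terms of precision, recall and F1-score but does not say how tagged sensitive items were matched against the experts' annotations. The sketch below is one plausible reading, not the authors' evaluation code: it assumes exact-match scoring over tagged spans, and the `Span` layout, the label names and the sample offsets are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical span representation: (start offset, end offset, label).
Span = Tuple[int, int, str]


def precision_recall_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted sensitive-information spans against expert annotations.

    Assumption: a prediction counts as correct only if offsets and label
    match a gold span exactly.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Made-up example data, not taken from the paper.
gold = {(0, 10, "NAME"), (25, 35, "DATE"), (50, 61, "ID")}
predicted = {(0, 10, "NAME"), (25, 35, "DATE"), (70, 80, "NAME"), (90, 95, "ID")}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Exact-match scoring is the strictest option. A partial-overlap criterion would give higher scores on the same predictions, so the matching rule should be stated alongside any reported numbers.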