Automated U.S diplomatic cables security classification: Topic model pruning vs. classification based on clusters

Submitted by K_Hooper on Wed, 01/10/2018 - 10:14am

Title	Automated U.S diplomatic cables security classification: Topic model pruning vs. classification based on clusters
Publication Type	Conference Paper
Year of Publication	2017
Authors	Alzhrani, K., Rudd, E. M., Chow, C. E., Boult, T. E.
Conference Name	2017 IEEE International Symposium on Technologies for Homeland Security (HST)
Date Published	April
Keywords	adversarial environment, automated US diplomatic cables security classification, automatic unstructured text security class detection, cyberattacks, data infrastructure protection, data leak prevention system, data protection, DLP system, Edward Snowden incident, Electronic mail, email leakage, feature extraction, government data processing, Human Behavior, insider threats, natural language processing, pattern classification, printers, protection mechanism, pubcrawl, Resiliency, Scalability, security, security of data, sensitive information leaks, sensitive text data, Sensitivity, Springs, text analysis, text leakage, text security classification, topic model pruning, Training, untrusted channels, US government, WikiLeaks dataset
Abstract	The U.S. Government has been a target of cyberattacks from all over the world. Recently, former President Obama accused the Russian government of leaking emails to WikiLeaks and declared that the U.S. might be forced to respond. While Russia denied involvement, it is clear that the U.S. has to take defensive measures to protect its data infrastructure. Insider threats have also caused other sensitive information leaks, including the infamous Edward Snowden incident. Most of the recent leaks were in the form of text. Due to the nature of text data, security classifications are assigned manually. In an adversarial environment, insiders can leak texts through email, printers, or any untrusted channel. The optimal defense is to automatically detect the security class of unstructured text and enforce the appropriate protection mechanism without degrading services or daily tasks. Unfortunately, existing Data Leak Prevention (DLP) systems are not well suited for detecting unstructured texts. In this paper, we compare two recent approaches in the literature for text security classification, evaluating them on actual sensitive text data from the WikiLeaks dataset.
DOI	10.1109/THS.2017.7943471
Citation Key	alzhrani_automated_2017

Groups:

Science of Security VO