Automated U.S diplomatic cables security classification: Topic model pruning vs. classification based on clusters

Title: Automated U.S diplomatic cables security classification: Topic model pruning vs. classification based on clusters
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Alzhrani, K., Rudd, E. M., Chow, C. E., Boult, T. E.
Conference Name: 2017 IEEE International Symposium on Technologies for Homeland Security (HST)
Date Published: April
Keywords: adversarial environment, automated US diplomatic cables security classification, automatic unstructured text security class detection, cyberattacks, data infrastructure protection, data leak prevention system, data protection, DLP system, Edward Snowden incident, Electronic mail, email leakage, feature extraction, government data processing, Human Behavior, insider threats, natural language processing, pattern classification, printers, protection mechanism, pubcrawl, Resiliency, Scalability, security, security of data, sensitive information leaks, sensitive text data, Sensitivity, Springs, text analysis, text leakage, text security classification, topic model pruning, Training, untrusted channels, US government, WikiLeaks dataset
Abstract: The U.S. Government has been the target of cyberattacks from all over the world. Recently, former President Obama accused the Russian government of leaking emails to WikiLeaks and declared that the U.S. might be forced to respond. While Russia denied involvement, it is clear that the U.S. has to take defensive measures to protect its data infrastructure. Insider threats have also caused sensitive information leaks, including the infamous Edward Snowden incident. Most of the recent leaks were in the form of text. Due to the nature of text data, security classifications are assigned manually. In an adversarial environment, insiders can leak texts through email, printers, or any untrusted channel. The optimal defense is to automatically detect the security class of unstructured text and enforce the appropriate protection mechanism without degrading services or daily tasks. Unfortunately, existing Data Leak Prevention (DLP) systems are not well suited for detecting unstructured texts. In this paper, we compare two recent approaches in the literature for text security classification, evaluating them on actual sensitive text data from the WikiLeaks dataset.
DOI: 10.1109/THS.2017.7943471
Citation Key: alzhrani_automated_2017