Pattern Recognition and Reconstruction: Detecting Malicious Deletions in Textual Communications

Title: Pattern Recognition and Reconstruction: Detecting Malicious Deletions in Textual Communications
Publication Type: Conference Paper
Year of Publication: 2021
Authors: Solanke, Abiodun A.; Chen, Xihui; Ramírez-Cruz, Yunior
Conference Name: 2021 IEEE International Conference on Big Data (Big Data)
Keywords: Big Data, data deletion, digital forensics, feature extraction, Forensic Artificial Intelligence, Image edge detection, Linguistics, metadata, privacy, Scalability, Semantics, text mining, Variational Graph Autoencoders
Abstract: Digital forensic artifacts aim to provide evidence from digital sources for attributing blame to suspects, assessing their intents, corroborating their statements or alibis, and so on. Textual data is a significant source of such artifacts and can take various forms, including communications: e-mails, memos, tweets, and text messages are all examples of textual communications. Complex statistical, linguistic, and other scientific procedures can be applied manually to this data to uncover significant clues that point the way to factual information. While expert investigators can undertake this task, critical information may still be missed or overlooked. The primary objective of this work is to aid investigators by partially automating the detection of suspicious e-mail deletions. Our approach consists of building a dynamic graph to represent the temporal evolution of communications, and then using a Variational Graph Autoencoder to detect possible e-mail deletions in this graph. Our model uses multiple types of features to represent node and edge attributes; some are based on message metadata, and the rest are extracted from message contents using natural language processing and text mining techniques. We use the autoencoder to detect missing edges, which we interpret as potential deletions, and to reconstruct their features, from which we formulate hypotheses about the topics of the deleted messages. We conducted an empirical evaluation of our model on the Enron e-mail dataset, which shows that it accurately detects a significant proportion of missing communications and reconstructs the corresponding topic vectors.
DOI: 10.1109/BigData52589.2021.9671921
Citation Key: solanke_pattern_2021
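The detection idea in the abstract (a time-evolving communication graph, a Variational Graph Autoencoder, and missing edges read as possible deletions) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an already-trained encoder and shows only the standard VGAE reparameterization step and inner-product decoder, omits the metadata/NLP node and edge features and the topic-vector reconstruction, and treats edges as undirected. The fixed-window snapshot scheme, the toy embeddings, the 0.9 threshold, and all function names (`build_snapshots`, `sample_latent`, `edge_probability`, `candidate_deletions`) are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical e-mail record: (sender, recipient, unix_timestamp).
Email = tuple[str, str, int]
Edge = tuple[str, str]


def build_snapshots(emails: list[Email], window: int) -> dict[int, set[Edge]]:
    """Bucket e-mails into fixed-length time windows (one graph snapshot each).

    Edges are stored undirected (endpoint names sorted) to match the
    symmetric inner-product decoder below.
    """
    snapshots: dict[int, set[Edge]] = defaultdict(set)
    for sender, recipient, ts in emails:
        u, v = sorted((sender, recipient))
        snapshots[ts // window].add((u, v))
    return dict(snapshots)


def sample_latent(mu: list[float], logvar: list[float], rng: random.Random) -> list[float]:
    """Reparameterization trick: z = mu + exp(logvar / 2) * eps, eps ~ N(0, 1)."""
    return [m + math.exp(lv / 2) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def edge_probability(z_u: list[float], z_v: list[float]) -> float:
    """Inner-product decoder: sigmoid(z_u . z_v), computed in a numerically stable way."""
    dot = sum(a * b for a, b in zip(z_u, z_v))
    if dot >= 0:
        return 1.0 / (1.0 + math.exp(-dot))
    e = math.exp(dot)
    return e / (1.0 + e)


def candidate_deletions(
    z: dict[str, list[float]], observed: set[Edge], threshold: float = 0.9
) -> list[tuple[str, str, float]]:
    """Flag node pairs that the decoder reconstructs with high probability
    but that have no observed edge in the snapshot (possible deletions)."""
    flagged = []
    for u, v in combinations(sorted(z), 2):
        if (u, v) in observed:
            continue
        p = edge_probability(z[u], z[v])
        if p >= threshold:
            flagged.append((u, v, p))
    # Most suspicious pairs first.
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy latent means (e.g. the encoder's mu at inference time); values are made up.
    z = {"alice": [2.0, 0.0], "bob": [2.0, 0.5], "carol": [-2.0, 0.0]}
    emails = [("alice", "carol", 10), ("carol", "bob", 20), ("alice", "bob", 150)]
    snapshots = build_snapshots(emails, window=100)
    print(candidate_deletions(z, snapshots[0]))
```

With these toy values, the first snapshot contains only alice-carol and bob-carol, so the sketch flags the unobserved alice-bob pair, whose decoded probability is sigmoid(4.0). At inference time the latent means are commonly used directly rather than sampled; `sample_latent` is included only to show the training-time reparameterization. Because the inner-product decoder is symmetric, a directed treatment of sender and recipient would need a different decoder.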