Title | Towards a neural language model for signature extraction from forensic logs |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Thaler, S., Menkovski, V., Petkovic, M. |
Conference Name | 2017 5th International Symposium on Digital Forensic and Security (ISDFS) |
Date Published | April |
Keywords | Clustering algorithms, complex relationship learning, Data analysis, digital forensics, error-prone, forensic log analysis, Forensics, handcrafted algorithms, heuristics, Human Behavior, knowledge based systems, learning (artificial intelligence), log line clustering, log message, natural language processing, natural language text, neural language model, neural nets, Neural networks, nonmutable part identification, pattern clustering, Predictive models, pubcrawl, Resiliency, rule-based approaches, rule-based systems, Scalability, signature extraction frameworks, Software, text analysis, use cases |
Abstract | Signature extraction is a critical preprocessing step in forensic log analysis because it enables sophisticated analysis techniques to be applied to logs. Currently, most signature extraction frameworks use either rule-based approaches or handcrafted algorithms. Rule-based systems are error-prone and require high maintenance effort. Handcrafted algorithms use heuristics and tend to work well only for specialized use cases. In this paper, we present a novel approach to extract signatures from forensic logs that is based on a neural language model. This language model learns to identify mutable and non-mutable parts in a log message. We use this information to extract signatures. Neural language models have been shown to work extremely well for learning complex relationships in natural language text. We experimentally demonstrate that our model can detect which parts are mutable with an accuracy of 86.4%. We also show how extracted signatures can be used for clustering log lines. |
DOI | 10.1109/ISDFS.2017.7916497 |
Citation Key | thaler_towards_2017 |