Biblio
Inculcating an attacker mindset (i.e., learning to think like an attacker) is essential for engineers and administrators who want to improve the overall security of software. Describing the approach that adversaries use to discover and exploit vulnerabilities to infiltrate software systems can help inform such an attacker mindset. Aims: Our goal is to assist developers and administrators in inculcating an attacker mindset by proposing an approach to codify attacker behavior in a cybersecurity penetration testing competition. Method: We use an existing multimodal dataset of events captured during the 2018 National Collegiate Penetration Testing Competition (CPTC'18) to characterize the approach a team of attackers used to discover and exploit vulnerabilities. Results: We collected 44 events to characterize the approach that one of the participating teams took to discover and exploit seven vulnerabilities. We used the MITRE ATT&CK™ framework to codify the approach in terms of tactics and techniques. Conclusions: We show that characterizing an attacker's campaign as a chronological sequence of MITRE ATT&CK™ tactics and techniques is feasible. We hope that such a characterization can inform the attacker mindset of engineers and administrators in their pursuit of engineering secure software systems.
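As a minimal sketch of what such a codification could look like, the snippet below (Python) represents observed events as timestamped records tagged with an ATT&CK tactic and technique, then orders them into a campaign timeline. The events, tactic names, and technique IDs are illustrative assumptions, not entries from the CPTC'18 dataset.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class AttackEvent:
    """One observed event mapped to a MITRE ATT&CK tactic and technique."""
    timestamp: datetime
    tactic: str       # ATT&CK tactic, e.g. "Reconnaissance"
    technique: str    # ATT&CK technique ID, e.g. "T1595"
    note: str         # free-text description of the observation

def campaign_timeline(events):
    """Return the campaign as a chronological sequence of (tactic, technique) pairs."""
    return [(e.tactic, e.technique) for e in sorted(events, key=lambda e: e.timestamp)]

# Illustrative events only; not drawn from the CPTC'18 dataset.
events = [
    AttackEvent(datetime(2018, 11, 3, 10, 5), "Reconnaissance", "T1595", "port scan of target subnet"),
    AttackEvent(datetime(2018, 11, 3, 10, 42), "Initial Access", "T1190", "exploit of an exposed web service"),
    AttackEvent(datetime(2018, 11, 3, 11, 15), "Credential Access", "T1110", "password guessing against SSH"),
]

print(campaign_timeline(events))
```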
When reasoning about software security, researchers and practitioners use the phrase “attack surface” as a metaphor for risk. Enumerate and minimize the ways attackers can break in, the metaphor says, and risk is reduced and the system is better protected. But software systems are much more complicated than their surfaces. We propose function- and file-level attack surface metrics—proximity and risky walk—that enable fine-grained risk assessment. Our risky walk metric is highly configurable: we use PageRank on a probability-weighted call graph to simulate attacker behavior of finding or exploiting a vulnerability. We provide evidence-based guidance for deploying these metrics, including an extensive parameter tuning study. We conducted an empirical study on two large open source projects, FFmpeg and Wireshark, to investigate the potential correlation between our metrics and historical post-release vulnerabilities. We found our metrics to be statistically significantly associated with vulnerable functions/files with a small-to-large Cohen’s d effect size. Our prediction model achieved an increase of 36% (in FFmpeg) and 27% (in Wireshark) in the average value of F2-measure over a base model built with SLOC and coupling metrics. Our prediction model outperformed comparable models from prior literature with notable improvements: 58% reduction in false negative rate, 81% reduction in false positive rate, and 548% increase in F2-measure. These metrics advance vulnerability prevention by (a) being flexible in terms of granularity, (b) outperforming comparable models from the vulnerability prediction literature, and (c) being tunable so that practitioners can tailor the metrics to their products and better assess security risk.
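The risky walk idea lends itself to a short illustration. The sketch below runs PageRank over a small, probability-weighted call graph using networkx; functions that a weighted random walk visits often receive higher scores. The function names, edge weights, and damping factor are assumptions made for this example and do not reflect the call graphs or tuned parameters used in the study.

```python
import networkx as nx

# Toy call graph: nodes are functions, edge weights are assumed probabilities
# that attacker-controlled input flows from caller to callee.
call_graph = nx.DiGraph()
call_graph.add_weighted_edges_from([
    ("parse_input", "decode_frame", 0.7),
    ("parse_input", "log_error", 0.3),
    ("decode_frame", "alloc_buffer", 0.9),
    ("decode_frame", "log_error", 0.1),
    ("alloc_buffer", "memcpy_wrapper", 1.0),
])

# PageRank on the weighted graph approximates how often a probability-weighted
# random walk (a stand-in for attacker exploration) visits each function.
scores = nx.pagerank(call_graph, alpha=0.85, weight="weight")
for fn, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{fn:20s} {score:.3f}")
```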
Cyberinfrastructure is increasingly becoming the target of a wide spectrum of attacks, from Denial of Service to large-scale defacement of the digital presence of an organization. Intrusion Detection Systems (IDSs) provide administrators a defensive edge over intruders lodging such malicious attacks. However, with the sheer number of different IDSs available, one has to objectively assess the capabilities of different IDSs to select an IDS that meets specific organizational requirements. A prerequisite for such an objective assessment is the implicit comparability of IDS literature. In this study, we review IDS literature to understand its implicit comparability from the perspective of the metrics used in the empirical evaluation of IDSs. We identified 22 metrics commonly used in the empirical evaluation of IDSs and constructed search terms to retrieve papers that mention each metric. We manually reviewed a sample of 495 papers and found 159 of them to be relevant. We then estimated the number of relevant papers in the entire set of papers retrieved from IEEE. We found that, in the evaluation of IDSs, multiple different metrics are used and the trade-off between metrics is rarely considered. In a retrospective analysis of the IDS literature, we found that the evaluation criteria have been improving over time, albeit marginally. Such inconsistencies in the use of evaluation metrics may preclude direct comparison of one IDS to another.
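To make the trade-off concrete, the sketch below computes a handful of commonly reported IDS evaluation metrics from a single confusion matrix; the counts are invented for illustration and the metrics shown are only a subset of the 22 identified in the study. Reporting, say, only the detection rate can hide a high false positive rate, which is exactly the kind of trade-off the review found to be rarely considered.

```python
# Illustrative confusion-matrix counts for an IDS evaluation (not from the study).
tp, fp, fn, tn = 90, 40, 10, 860

detection_rate = tp / (tp + fn)                  # recall / true positive rate
false_positive_rate = fp / (fp + tn)
precision = tp / (tp + fp)
f1 = 2 * precision * detection_rate / (precision + detection_rate)
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(f"detection rate={detection_rate:.2f}  FPR={false_positive_rate:.2f}  "
      f"precision={precision:.2f}  F1={f1:.2f}  accuracy={accuracy:.2f}")
```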