Biblio
Cyberinfrastructure is increasingly becoming target of a wide spectrum of attacks from Denial of
Service to large-scale defacement of the digital presence of an organization. Intrusion Detection System
(IDSs) provide administrators a defensive edge over intruders lodging such malicious attacks. However,
with the sheer number of different IDSs available, one has to objectively assess the capabilities of different
IDSs to select an IDS that meets specific organizational requirements. A prerequisite to enable such
an objective assessment is the implicit comparability of IDS literature. In this study, we review IDS
literature to understand the implicit comparability of IDS literature from the perspective of metrics
used in the empirical evaluation of the IDS. We identified 22 metrics commonly used in the empirical
evaluation of IDS and constructed search terms to retrieve papers that mention the metric. We manually
reviewed a sample of 495 papers and found 159 of them to be relevant. We then estimated the number
of relevant papers in the entire set of papers retrieved from IEEE. We found that, in the evaluation
of IDSs, multiple different metrics are used and the trade-off between metrics is rarely considered. In
a retrospective analysis of the IDS literature, we found the the evaluation criteria has been improving
over time, albeit marginally. The inconsistencies in the use of evaluation metrics may not enable direct
comparison of one IDS to another.
When reasoning about software security, researchers and practitioners use the phrase “attack surface” as a metaphor for risk. Enumerate and minimize the ways attackers can break in then risk is reduced and the system is better pro- tected, the metaphor says. But software systems are much more complicated than their surfaces. We propose function- and file-level attack surface metrics—proximity and risky walk—that enable fine-grained risk assessment. Our risky walk metric is highly configurable: we use PageRank on a probability-weighted call graph to simulate attacker be- havior of finding or exploiting a vulnerability. We provide evidence-based guidance for deploying these metrics, includ- ing an extensive parameter tuning study. We conducted an empirical study on two large open source projects, FFmpeg and Wireshark, to investigate the potential correlation be- tween our metrics and historical post-release vulnerabilities. We found our metrics to be statistically significantly asso- ciated with vulnerable functions/files with a small-to-large Cohen’s d effect size. Our prediction model achieved an increase of 36% (in FFmpeg) and 27% (in Wireshark) in the average value of F 2 -measure over a base model built with SLOC and coupling metrics. Our prediction model outperformed comparable models from prior literature with notable improvements: 58% reduction in false negative rate, 81% reduction in false positive rate, and 548% increase in F 2 -measure. These metrics advance vulnerability prevention by (a) being flexible in terms of granularity, (b) performing better than vulnerability prediction literature, and (c) being tunable so that practitioners can tailor the metrics to their products and better assess security risk.