Hard Problems: Predictive Security Metrics (ACM)
SoS Newsletter - Advanced Book Block
Predictive security metrics are a hard problem in the Science of Security. A survey of the ACM Digital Library found nearly three hundred scholarly articles on security-metrics research published in 2014. The bibliographical citations in this series are limited to works actually published by ACM. A separate listing of works that research these areas but were not published by ACM, and are therefore subject to intellectual-property restrictions on the use of abstracts, appears under the heading “Citations for Hard Problems.”
Arpan Chakraborty, Brent Harrison, Pu Yang, David Roberts, Robert St. Amant; Exploring Key-Level Analytics for Computational Modeling of Typing Behavior; HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, April 2014, Article No. 34. Doi: 10.1145/2600176.2600210 Abstract: Typing is a human activity that can be affected by a number of situational and task-specific factors. Changes in typing behavior resulting from the manipulation of such factors can be predictably observed through key-level input analytics. Here we present a study designed to explore these relationships. Participants play a typing game in which letter composition, word length and number of words appearing together are varied across levels. Inter-keystroke timings and other higher order statistics (such as bursts and pauses), as well as typing strategies, are analyzed from game logs to find the best set of metrics that quantify the effect that different experimental factors have on observable metrics. Beyond task-specific factors, we also study the effects of habituation by recording changes in performance with practice. Currently a work in progress, this research aims at developing a predictive model of human typing. We believe this insight can lead to the development of novel security proofs for interactive systems that can be deployed on existing infrastructure with minimal overhead. Possible applications of such predictive capabilities include anomalous behavior detection, authentication using typing signatures, bot detection using word challenges, etc.
Keywords: cognitive modeling, typing, user interfaces (ID#: 15-4403)
URL: http://doi.acm.org/10.1145/2600176.2600210
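The key-level analytics the Chakraborty et al. abstract describes can be illustrated with a minimal sketch. The Python below is our illustration, not the authors' code; the 500 ms pause threshold is an assumed parameter. It derives inter-keystroke intervals and burst/pause counts from a timestamped key-down log:
```python
import numpy as np

def keystroke_features(timestamps_ms, pause_threshold_ms=500):
    """Inter-keystroke timings plus burst/pause counts from key-down times."""
    t = np.sort(np.asarray(timestamps_ms, dtype=float))
    ikis = np.diff(t)                        # inter-keystroke intervals (ms)
    pauses = ikis > pause_threshold_ms       # gaps treated as pauses (assumed cutoff)
    n_pauses = int(pauses.sum())
    return {
        "mean_iki_ms": float(ikis.mean()) if ikis.size else 0.0,
        "std_iki_ms": float(ikis.std()) if ikis.size else 0.0,
        "n_pauses": n_pauses,
        "n_bursts": n_pauses + 1 if t.size else 0,  # maximal pause-free runs
    }

# Two bursts separated by a ~1 s pause:
print(keystroke_features([0, 120, 250, 380, 1400, 1520, 1650]))
```
Features of this kind, aggregated per level or per participant, are the sort of metrics a predictive typing model would be trained on.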
Arild B. Torjusen, Habtamu Abie, Ebenezer Paintsil, Denis Trcek, Åsmund Skomedal; Towards Run-Time Verification of Adaptive Security for IoT in eHealth; ECSAW '14 Proceedings of the 2014 European Conference on Software Architecture Workshops, August 2014, Article No. 4. Doi: 10.1145/2642803.2642807 Abstract: This paper integrates run-time verification enablers in the feedback adaptation loop of the ASSET adaptive security framework for Internet of Things (IoT) in eHealth settings and instantiates the resulting framework with Colored Petri Nets. The run-time enablers make machine-readable formal models of a system state and context available at run-time. In addition, they make requirements that define the objectives of verification available at run-time as formal specifications and enable dynamic context monitoring and adaptation. Run-time adaptive behavior that deviates from the normal mode of operation of the system represents a major threat to the sustainability of critical eHealth services. Therefore, the integration of run-time enablers into the ASSET adaptive framework could lead to a sustainable security framework for IoT in eHealth.
Keywords: Adaptive Security, Formal Run-time Verification, IoT, eHealth (ID#: 15-4404)
URL: http://doi.acm.org/10.1145/2642803.2642807
William Herlands, Thomas Hobson, Paula J. Donovan; Effective Entropy: Security-Centric Metric for Memory Randomization Techniques; CSET'14 Proceedings of the 7th USENIX Conference on Cyber Security Experimentation and Test, August 2014, Pages 5-5. Doi: (none provided) Abstract: User space memory randomization techniques are an emerging field of cyber defensive technology which attempts to protect computing systems by randomizing the layout of memory. Quantitative metrics are needed to evaluate their effectiveness at securing systems against modern adversaries and to compare between randomization technologies. We introduce Effective Entropy, a measure of entropy in user space memory which quantitatively considers an adversary's ability to leverage low entropy regions of memory via absolute and dynamic intersection connections. Effective Entropy is indicative of adversary workload and enables comparison between different randomization techniques. Using Effective Entropy, we present a comparison of static Address Space Layout Randomization (ASLR), Position Independent Executable (PIE) ASLR, and a theoretical fine grain randomization technique.
Keywords: (not provided) (ID#: 15-4405)
URL: http://dl.acm.org/citation.cfm?id=2671214.2671219
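As a point of reference for Effective Entropy, the sketch below (a simplification of ours, not the paper's metric) computes plain Shannon entropy over observed base addresses of a randomized memory region; Effective Entropy refines this baseline by discounting entropy an adversary can recover through correlated references, which the toy estimator does not model:
```python
from collections import Counter
import math

def shannon_entropy_bits(samples):
    """Empirical Shannon entropy (in bits) of observed base addresses."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Four equally likely page-aligned bases -> 2 bits of observed entropy.
bases = [0x7F0000, 0x7F1000, 0x7F2000, 0x7F3000] * 25
print(shannon_entropy_bits(bases))  # 2.0
```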
Shouhuai Xu; Cybersecurity Dynamics; HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, April 2014, Article No. 14. Doi: 10.1145/2600176.2600190 Abstract: We explore the emerging field of Cybersecurity Dynamics, a candidate foundation for the Science of Cybersecurity.
Keywords: cybersecurity dynamics, security analysis, security model (ID#: 15-4406)
URL: http://doi.acm.org/10.1145/2600176.2600190
Michael Sherman, Gradeigh Clark, Yulong Yang, Shridatt Sugrim, Arttu Modig, Janne Lindqvist, Antti Oulasvirta, Teemu Roos; User-Generated Free-Form Gestures for Authentication: Security and Memorability; MobiSys '14 Proceedings of the 12th Annual International Conference On Mobile Systems, Applications, and Services, June 2014, Pages 176-189. Doi: 10.1145/2594368.2594375 Abstract: This paper studies the security and memorability of free-form multitouch gestures for mobile authentication. Towards this end, we collected a dataset with a generate-test-retest paradigm where participants (N=63) generated free-form gestures, repeated them, and were later retested for memory. Half of the participants decided to generate one-finger gestures, and the other half generated multi-finger gestures. Although there has been recent work on template-based gestures, there are yet no metrics to analyze security of either template or free-form gestures. For example, entropy-based metrics used for text-based passwords are not suitable for capturing the security and memorability of free-form gestures. Hence, we modify a recently proposed metric for analyzing information capacity of continuous full-body movements for this purpose. Our metric computes the estimated mutual information in repeated sets of gestures. Surprisingly, one-finger gestures had higher average mutual information. Gestures with many hard angles and turns had the highest mutual information. The best-remembered gestures included signatures and simple angular shapes. We also implemented a multitouch recognizer to evaluate the practicality of free-form gestures in a real authentication system and how they perform against shoulder surfing attacks. We discuss strategies for generating secure and memorable free-form gestures. We conclude that free-form gestures present a robust method for mobile authentication.
Keywords: gestures, memorability, mutual information, security (ID#: 15-4407)
URL: http://doi.acm.org/10.1145/2594368.2594375
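The information-capacity metric Sherman et al. adapt is more elaborate than can be shown here, but the toy sketch below conveys the flavor of scoring repeated gestures by mutual information: given two equal-length repetitions of a trace, discretize the points onto a coarse grid (the grid size is our assumption) and apply a plug-in MI estimator from scikit-learn:
```python
import numpy as np
from sklearn.metrics import mutual_info_score

def grid_labels(trace, n_bins=8):
    """Map (x, y) points in [0,1]^2 to cell indices of an n_bins x n_bins grid."""
    xy = np.clip(np.asarray(trace), 0.0, 0.999)
    cells = (xy * n_bins).astype(int)
    return cells[:, 0] * n_bins + cells[:, 1]

rng = np.random.default_rng(0)
rep1 = rng.random((200, 2))                                   # first repetition
rep2 = np.clip(rep1 + rng.normal(0, 0.02, rep1.shape), 0, 1)  # noisy retest
print(mutual_info_score(grid_labels(rep1), grid_labels(rep2)))  # MI in nats
```
Higher MI between repetitions indicates a gesture that is both rich in structure and reliably reproduced, the combination the paper associates with secure, memorable gestures.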
A. M. Mora, P. De las Cuevas, J. J. Merelo, S. Zamarripa, M. Juan, A. I. Esparcia-Alcázar, M. Burvall, H. Arfwedson, Z. Hodaie; MUSES: A corporate user-centric system which applies computational intelligence methods; SAC '14 Proceedings of the 29th Annual ACM Symposium on Applied Computing, March 2014, Pages 1719-1723. Doi: 10.1145/2554850.2555059 Abstract: This work presents the description of the architecture of a novel enterprise security system, still in development, which can prevent and deal with the security flaws derived from the users in a company. Thus, the Multiplatform Usable Endpoint Security system (MUSES) considers diverse factors such as the information distribution, the type of accesses, the context where the users are, the category of users, or the mix between personal and private data, among others. This system includes an event correlator and a risk and trust analysis engine to perform the decision process. MUSES follows a set of defined security rules, according to the enterprise security policies, but it is able to self-adapt its decisions and even create new security rules depending on the user behaviour, the specific device, and the situation or context. To this aim, MUSES applies machine learning and computational intelligence techniques, which can also be used to predict potentially unsafe or dangerous user behaviour.
Keywords: BYOD, enterprise security, event correlation, multiplatform, risk and trust analysis, security policies, self-adaptation, user-centric system (ID#: 15-4408)
URL: http://doi.acm.org/10.1145/2554850.2555059
Cornel Barna, Mark Shtern, Michael Smit, Vassilios Tzerpos, Marin Litoiu; Mitigating DoS Attacks Using Performance Model-Driven Adaptive Algorithms; ACM Transactions on Autonomous and Adaptive Systems (TAAS), Volume 9, Issue 1, March 2014, Article No. 3. Doi: 10.1145/2567926 Abstract: Denial of Service (DoS) attacks overwhelm online services, preventing legitimate users from accessing a service, often with impact on revenue or consumer trust. Approaches exist to filter network-level attacks, but application-level attacks are harder to detect at the firewall. Filtering at this level can be computationally expensive and difficult to scale, while still producing false positives that block legitimate users. This article presents a model-based adaptive architecture and algorithm for detecting DoS attacks at the web application level and mitigating them. Using a performance model to predict the impact of arriving requests, a decision engine adaptively generates rules for filtering traffic and sending suspicious traffic for further review, where the end user is given the opportunity to demonstrate they are a legitimate user. If no legitimate user responds to the challenge, the request is dropped. Experiments performed on a scalable implementation demonstrate effective mitigation of attacks launched using a real-world DoS attack tool.
Keywords: Denial of service, DoS attack mitigation, distributed denial of service, layered queuing network, model-based adaptation, performance model (ID#: 15-4409)
URL: http://doi.acm.org/10.1145/2567926
Julien Freudiger, Shantanu Rane, Alejandro E. Brito, Ersin Uzun; Privacy Preserving Data Quality Assessment for High-Fidelity Data Sharing; WISCS '14 Proceedings of the 2014 ACM Workshop on Information Sharing & Collaborative Security, November 2014, Pages 21-29. Doi: 10.1145/2663876.2663885 Abstract: In a data-driven economy that struggles to cope with the volume and diversity of information, data quality assessment has become a necessary precursor to data analytics. Real-world data often contains inconsistencies, conflicts and errors. Such dirty data increases processing costs and has a negative impact on analytics. Assessing the quality of a dataset is especially important when a party is considering acquisition of data held by an untrusted entity. In this scenario, it is necessary to consider privacy risks of the stakeholders. This paper examines challenges in privacy-preserving data quality assessment. A two-party scenario is considered, consisting of a client that wishes to test data quality and a server that holds the dataset. Privacy-preserving protocols are presented for testing important data quality metrics: completeness, consistency, uniqueness, timeliness and validity. For semi-honest parties, the protocols ensure that the client does not discover any information about the data other than the value of the quality metric. The server does not discover the parameters of the client's query, the specific attributes being tested and the computed value of the data quality metric. The proposed protocols employ additively homomorphic encryption in conjunction with condensed data representations such as counting hash tables and histograms, serving as efficient alternatives to solutions based on private set intersection.
Keywords: cryptographic protocols, data quality assessment, privacy and confidentiality (ID#: 15-4410)
URL: http://doi.acm.org/10.1145/2663876.2663885
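The protocols rest on additively homomorphic encryption. The toy Paillier implementation below (insecure key sizes, purely illustrative) shows the property they exploit: the product of two ciphertexts decrypts to the sum of the plaintexts, so encrypted per-record quality indicators can be aggregated without decrypting any of them:
```python
import math, random

p, q = 293, 433                      # toy primes; real keys use 1024+ bit primes
n, n2 = p * q, (p * q) ** 2
lam = math.lcm(p - 1, q - 1)
g = n + 1                            # standard choice that simplifies decryption
mu = pow(lam, -1, n)

def enc(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:       # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def dec(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

# Homomorphic addition: multiplying ciphertexts sums the plaintexts.
print(dec(enc(17) * enc(25) % n2))   # 42
```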
Shweta Subramani, Mladen Vouk, Laurie Williams; An Analysis of Fedora Security Profile; HotSoS '14 Proceedings of the 2014 Symposium and Bootcamp on the Science of Security, April 2014, Article No. 35. Doi: 10.1145/2600176.2600211 Abstract: This paper examines security faults/vulnerabilities reported for Fedora. Results indicate that, at least in some situations, a roughly constant fault-discovery rate may be used to guide estimation of residual vulnerabilities in an already released product, as well as possibly guide testing of the next version of the product.
Keywords: Fedora, detection, non-operational testing, prediction, security faults, vulnerabilities (ID#: 15-4411)
URL: http://doi.acm.org/10.1145/2600176.2600211
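On a constant-rate reading of that finding (our interpretation, with invented counts), residual-vulnerability estimation reduces to a linear extrapolation of cumulative discoveries:
```python
import numpy as np

months = np.arange(1, 13)
cumulative = np.array([3, 7, 9, 14, 17, 21, 24, 29, 31, 36, 39, 43])  # invented counts

rate, intercept = np.polyfit(months, cumulative, 1)   # discoveries per month
horizon = 24                                          # months of remaining support
expected_more = (rate * horizon + intercept) - cumulative[-1]
print(f"rate ~ {rate:.2f}/month; "
      f"expected further discoveries by month {horizon}: {expected_more:.0f}")
```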
Omer Tripp, Julia Rubin; A Bayesian Approach to Privacy Enforcement in Smartphones; SEC'14 Proceedings of the 23rd USENIX Conference on Security Symposium, August 2014, Pages 175-190. Doi: (none provided) Abstract: Mobile apps often require access to private data, such as the device ID or location. At the same time, popular platforms like Android and iOS have limited support for user privacy. This frequently leads to unauthorized disclosure of private information by mobile apps, e.g. for advertising and analytics purposes. This paper addresses the problem of privacy enforcement in mobile systems, which we formulate as a classification problem: When arriving at a privacy sink (e.g., database update or outgoing web message), the runtime system must classify the sink's behavior as either legitimate or illegitimate. The traditional approach of information-flow (or taint) tracking applies "binary" classification, whereby information release is legitimate iff there is no data flow from a privacy source to sink arguments. While this is a useful heuristic, it also leads to false alarms. We propose to address privacy enforcement as a learning problem, relaxing binary judgments into a quantitative/probabilistic mode of reasoning. Specifically, we propose a Bayesian notion of statistical classification, which conditions the judgment whether a release point is legitimate on the evidence arising at that point. In our concrete approach, implemented as the BAYESDROID system that is soon to be featured in a commercial product, the evidence refers to the similarity between the data values about to be released and the private data stored on the device. Compared to TaintDroid, a state-of-the-art taint-based tool for privacy enforcement, BAYESDROID is substantially more accurate. Applied to 54 top-popular Google Play apps, BAYESDROID is able to detect 27 privacy violations with only 1 false alarm.
Keywords: (not provided) (ID#: 15-4412)
URL: http://dl.acm.org/citation.cfm?id=2671225.2671237
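A drastically simplified sketch of evidence-based sink classification in the spirit of the abstract (not the BAYESDROID implementation; the private values, payload, and similarity threshold are all illustrative, and the real system conditions a Bayesian judgment on richer evidence than a single string similarity):
```python
from difflib import SequenceMatcher

PRIVATE_VALUES = {"device_id": "358240051111110", "email": "alice@example.com"}

def classify_release(payload, threshold=0.7):
    """Flag a privacy sink when the outgoing payload resembles a private value."""
    for name, secret in PRIVATE_VALUES.items():
        sim = SequenceMatcher(None, payload, secret).ratio()
        if sim >= threshold:
            return ("illegitimate", name, round(sim, 2))
    return ("legitimate", None, 0.0)

print(classify_release("id=358240051111110&ad=42"))  # device-ID leak is flagged
```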
Reid Priedhorsky, Aron Culotta, Sara Y. Del Valle; Inferring the Origin Locations of Tweets with Quantitative Confidence; CSCW '14 Proceedings of the 17th ACM Conference On Computer Supported Cooperative Work & Social Computing, February 2014, Pages 1523-1536. Doi: 10.1145/2531602.2531607 Abstract: Social Internet content plays an increasingly critical role in many domains, including public health, disaster management, and politics. However, its utility is limited by missing geographic information; for example, fewer than 1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable, content-based approach to estimate the location of tweets using a novel yet simple variant of gaussian mixture models. Further, because real-world applications depend on quantified uncertainty for such estimates, we propose novel metrics of accuracy, precision, and calibration, and we evaluate our approach accordingly. Experiments on 13 million global, comprehensively multi-lingual tweets show that our approach yields reliable, well-calibrated results competitive with previous computationally intensive methods. We also show that a relatively small amount of training data is required for good estimates (roughly 30,000 tweets) and that models are quite time-invariant (effective on tweets many weeks newer than the training set). Finally, we show that toponyms and languages with small geographic footprint provide the most useful location signals.
Keywords: gaussian mixture models, geo-location, location inference, metrics, twitter, uncertainty quantification (ID#: 15-4413)
URL: http://doi.acm.org/10.1145/2531602.2531607
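A minimal sketch of the core estimator, assuming scikit-learn and synthetic coordinates (the paper's GMM variant and its calibration metrics are not reproduced): fit a Gaussian mixture to located training points and report the dominant component's mean as the estimate, with its covariance as the quantified uncertainty:
```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Synthetic (lon, lat) training points around two frequented locations.
pts = np.vstack([rng.normal([-74.0, 40.7], 0.05, (300, 2)),
                 rng.normal([-87.6, 41.9], 0.05, (100, 2))])

gmm = GaussianMixture(n_components=2, random_state=0).fit(pts)
best = np.argmax(gmm.weights_)
print("estimate:", gmm.means_[best])
print("variance (uncertainty):", gmm.covariances_[best].diagonal())
```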
Rinkaj Goyal, Pravin Chandra, Yogesh Singh; Why Interaction Between Metrics Should be Considered in the Development of Software Quality Models: A Preliminary Study; ACM SIGSOFT Software Engineering Notes, Volume 39 Issue 4, July 2014, Pages 1-4. Doi: 10.1145/2632434.2659853 Abstract: This study examines the need to consider interactions between the measurements (metrics) of different quality factors in the development of software quality models. Though the correlation between metrics has been explored to a considerable depth in the development of these models, consideration of interactions between predictors is comparatively new in software engineering. This preliminary study is supported by statistically proven results that differentiate interaction effects from correlations. The issues raised here will assist analysts to improve empirical analyses by incorporating interactions in software quality model development, where amalgamating effects between different characteristics or subcharacteristics are observed.
Keywords: empirical software engineering, interaction, metrics, quality models, regression analysis, software fault prediction models (ID#: 15-4414)
URL: http://doi.acm.org/10.1145/2632434.2659853
Richard J. Oentaryo, Ee-Peng Lim, Jia-Wei Low, David Lo, Michael Finegold; Predicting Response in Mobile Advertising with Hierarchical Importance-Aware Factorization Machine; WSDM '14 Proceedings of the 7th ACM International Conference On Web Search And Data Mining, February 2014, Pages 123-132. Doi: 10.1145/2556195.2556240 Abstract: Mobile advertising has recently seen dramatic growth, fueled by the global proliferation of mobile phones and devices. The task of predicting ad response is thus crucial for maximizing business revenue. However, ad response data change dynamically over time, and are subject to cold-start situations in which limited history hinders reliable prediction. There is also a need for a robust regression estimation for high prediction accuracy, and good ranking to distinguish the impacts of different ads. To this end, we develop a Hierarchical Importance-aware Factorization Machine (HIFM), which provides an effective generic latent factor framework that incorporates importance weights and hierarchical learning. Comprehensive empirical studies on a real-world mobile advertising dataset show that HIFM outperforms the contemporary temporal latent factor models. The results also demonstrate the efficacy of the HIFM's importance-aware and hierarchical learning in improving the overall prediction and prediction in cold-start scenarios, respectively.
Keywords: factorization machine, hierarchy, importance weight, mobile advertising, response prediction (ID#: 15-4415)
URL: http://doi.acm.org/10.1145/2556195.2556240
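For context, the second-order factorization machine at the base of HIFM predicts with Rendle's O(kn) identity; the sketch below shows the plain FM prediction only (random toy parameters), without the hierarchical or importance-aware extensions the paper contributes:
```python
import numpy as np

def fm_predict(x, w0, w, V):
    """y = w0 + w.x + 0.5 * sum_f [ (V^T x)_f^2 - ((V*V)^T (x*x))_f ]."""
    s = V.T @ x                   # shape (k,): per-factor weighted sums
    s2 = (V ** 2).T @ (x ** 2)    # shape (k,): per-factor sums of squares
    return w0 + w @ x + 0.5 * np.sum(s ** 2 - s2)

rng = np.random.default_rng(0)
n, k = 6, 3                       # features, latent factors
x = rng.random(n)
w0, w, V = 0.1, rng.normal(0, 0.1, n), rng.normal(0, 0.1, (n, k))
print(fm_predict(x, w0, w, V))    # predicted response (e.g., click propensity)
```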
Thomas Fritz, Andrew Begel, Sebastian C. Müller, Serap Yigit-Elliott, Manuela Züger; Using Psycho-Physiological Measures to Assess Task Difficulty in Software Development; ICSE 2014 Proceedings of the 36th International Conference on Software Engineering, May 2014, Pages 402-413. Doi: 10.1145/2568225.2568266 Abstract: Software developers make programming mistakes that cause serious bugs for their customers. Existing work to detect problematic software focuses mainly on post hoc identification of correlations between bug fixes and code. We propose a new approach to address this problem --- detect when software developers are experiencing difficulty while they work on their programming tasks, and stop them before they can introduce bugs into the code. In this paper, we investigate a novel approach to classify the difficulty of code comprehension tasks using data from psycho-physiological sensors. We present the results of a study we conducted with 15 professional programmers to see how well an eye-tracker, an electrodermal activity sensor, and an electroencephalography sensor could be used to predict whether developers would find a task to be difficult. We can predict nominal task difficulty (easy/difficult) for a new developer with 64.99% precision and 64.58% recall, and for a new task with 84.38% precision and 69.79% recall. We can improve the Naive Bayes classifier's performance if we train it on just the eye-tracking data over the entire dataset, or by using a sliding-window data collection schema with a 55-second time window. Our work brings the community closer to a viable and reliable measure of task difficulty that could power the next generation of programming support tools.
Keywords: psycho-physiological, study, task difficulty (ID#: 15-4416)
URL: http://doi.acm.org/10.1145/2568225.2568266
Elvis S. Liu, Georgios K. Theodoropoulos; Interest Management for Distributed Virtual Environments: A Survey; ACM Computing Surveys (CSUR), Volume 46 Issue 4, April 2014, Article No. 51. Doi: 10.1145/2535417 Abstract: The past two decades have witnessed an explosion in the deployment of large-scale distributed simulations and distributed virtual environments in different domains, including military and academic simulation systems, social media, and commercial applications such as massively multiplayer online games. As these systems become larger, more data intensive, and more latency sensitive, the optimisation of the flow of data, a paradigm referred to as interest management, has become increasingly critical to address the scalability requirements and enable their successful deployment. Numerous interest management schemes have been proposed for different application scenarios. This article provides a comprehensive survey of the state of the art in the design of interest management algorithms and systems. The scope of the survey includes current and historical projects providing a taxonomy of the existing schemes and summarising their key features. Identifying the primary requirements of interest management, the article discusses the trade-offs involved in the design of existing approaches.
Keywords: Interest management, data distribution management, distributed virtual environments, high-level architecture, massively multiplayer online games (ID#: 15-4417)
URL: http://doi.acm.org/10.1145/2535417
Tony Ohmann, Michael Herzberg, Sebastian Fiss, Armand Halbert, Marc Palyart, Ivan Beschastnikh, Yuriy Brun; Behavioral Resource-Aware Model Inference; ASE '14 Proceedings of the 29th ACM/IEEE International Conference On Automated Software Engineering, September 2014, Pages 19-30. Doi: 10.1145/2642937.2642988 Abstract: Software bugs often arise because of differences between what developers think their system does and what the system actually does. These differences frustrate debugging and comprehension efforts. We describe Perfume, an automated approach for inferring behavioral, resource-aware models of software systems from logs of their executions. These finite state machine models ease understanding of system behavior and resource use. Perfume improves on the state of the art in model inference by differentiating behaviorally similar executions that differ in resource consumption. For example, Perfume separates otherwise identical requests that hit a cache from those that miss it, which can aid understanding how the cache affects system behavior and removing cache-related bugs. A small user study demonstrates that using Perfume is more effective than using logs and another model inference tool for system comprehension. A case study on the TCP protocol demonstrates that Perfume models can help understand non-trivial protocol behavior. Perfume models capture key system properties and improve system comprehension, while being reasonably robust to noise likely to occur in real-world executions.
Keywords: comprehension, debugging, log analysis, model inference, performance, perfume, system understanding (ID#: 15-4418)
URL: http://doi.acm.org/10.1145/2642937.2642988
Moshe Lichman, Padhraic Smyth; Modeling Human Location Data with Mixtures of Kernel Densities; KDD '14 Proceedings of the 20th ACM SIGKDD International Conference On Knowledge Discovery And Data Mining, August 2014, Pages 35-44. Doi: 10.1145/2623330.2623681 Abstract: Location-based data is increasingly prevalent with the rapid increase and adoption of mobile devices. In this paper we address the problem of learning spatial density models, focusing specifically on individual-level data. Modeling and predicting a spatial distribution for an individual is a challenging problem given both (a) the typical sparsity of data at the individual level and (b) the heterogeneity of spatial mobility patterns across individuals. We investigate the application of kernel density estimation (KDE) to this problem using a mixture model approach that can interpolate between an individual's data and broader patterns in the population as a whole. The mixture-KDE approach is evaluated on two large geolocation/check-in data sets, from Twitter and Gowalla, with comparisons to non-KDE baselines, using both log-likelihood and detection of simulated identity theft as evaluation metrics. Our experimental results indicate that the mixture-KDE method provides a useful and accurate methodology for capturing and predicting individual-level spatial patterns in the presence of noisy and sparse data.
Keywords: anomaly/novelty detection, kernel density estimation, probabilistic methods, social media, spatial, user modeling (ID#: 15-4419)
URL: http://doi.acm.org/10.1145/2623330.2623681
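A minimal sketch of the mixture idea, assuming SciPy and a fixed mixing weight (the paper learns such weights; the coordinates are synthetic): interpolate between a sparse individual-level kernel density estimate and the population-level one:
```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
population = rng.normal(0.0, 5.0, (2, 5000))             # (2, n) rows: lon, lat
individual = rng.normal([[1.0], [2.0]], 0.3, (2, 20))    # sparse per-user data

kde_pop, kde_ind = gaussian_kde(population), gaussian_kde(individual)
lam = 0.7                                                # weight on individual model

def mixture_density(points):
    """Interpolated density: individual KDE backed off to the population KDE."""
    return lam * kde_ind(points) + (1 - lam) * kde_pop(points)

print(mixture_density(np.array([[1.0], [2.0]])))         # density near user's home
```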
Ramesh A., Anusha J., Clarence J.M. Tauro; A Novel, Generalized Recommender System for Social Media Using the Collaborative-Filtering Technique; ACM SIGSOFT Software Engineering Notes, Volume 39 Issue 3, May 2014, Pages 1-4. Doi: 10.1145/2597716.2597721 Abstract: Our goal in this paper is to discuss various methods available for Recommender Systems and describe an end-to-end approach for designing a Recommender System for social media using the collaborative-filtering approach. We will discuss the scope of contributions made in the recommender-system field, pros and cons for the collaborative-filtering approach, and current trends and challenges involved in the market with respect to the implementation of collaborative filtering.
Keywords: algorithms, collaborative filtering, recommendation, recommender systems, social media (ID#: 15-4420)
URL: http://doi.acm.org/10.1145/2597716.2597721
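A minimal user-based collaborative-filtering sketch of the kind the paper discusses (toy ratings matrix; zeros denote unrated items): predict a missing rating as a cosine-similarity-weighted average over the users who rated that item:
```python
import numpy as np

R = np.array([[5, 3, 0, 1],      # rows: users, columns: items, 0 = unrated
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [1, 0, 4, 4]], dtype=float)

def predict(user, item):
    rated = R[:, item] > 0                       # users who rated the item
    sims = np.array([np.dot(R[user], R[v]) /
                     (np.linalg.norm(R[user]) * np.linalg.norm(R[v]))
                     for v in range(len(R))])
    sims[user] = 0.0                             # exclude the target user
    w = sims * rated
    return (w @ R[:, item]) / w.sum() if w.sum() else 0.0

print(predict(1, 1))  # predicted rating of user 1 for item 1
```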
Anna C. Squicciarini, Cornelia Caragea, Rahul Balakavi; Analyzing Images' Privacy for the Modern Web; HT '14 Proceedings of the 25th ACM Conference On Hypertext And Social Media, September 2014, Pages 136-147. Doi: 10.1145/2631775.2631803 Abstract: Images are now one of the most common forms of content shared in online user-contributed sites and social Web 2.0 applications. In this paper, we present an extensive study exploring the privacy and sharing needs of users' uploaded images. We develop learning models to estimate adequate privacy settings for newly uploaded images, based on carefully selected image-specific features. We focus on a set of visual-content features and on tags. We identify the smallest set of features that, by themselves or combined with others, can perform well in properly predicting the degree of sensitivity of users' images. We consider both the case of binary privacy settings (i.e. public, private), as well as the case of more complex privacy options characterized by multiple sharing options. Our results show that with few carefully selected features, one may achieve extremely high accuracy, especially when high-quality tags are available.
Keywords: privacy, image analysis (ID#: 15-4421)
URL: http://doi.acm.org/10.1145/2631775.2631803
Ramin Moazeni, Daniel Link, Barry Boehm; COCOMO II Parameters and IDPD: Bilateral Relevances; ICSSP 2014 Proceedings of the 2014 International Conference on Software and System Process, May 2014, Pages 20-24. Doi: 10.1145/2600821.2600847 Abstract: The phenomenon called Incremental Development Productivity Decline (IDPD) is presumed to be present in all incremental software projects to some extent. COCOMO II is a popular parametric cost estimation model that has not yet been adapted to account for the challenges that IDPD poses to cost estimation. Instead, its cost drivers and scale factors stay constant throughout the increments of a project. While a simple response could be to make these parameters variable per increment, questions are raised as to whether the existing parameters are enough to predict the behavior of an incrementally developed project even in that case. Individual COCOMO II parameters are evaluated with regard to their development over the course of increments and how they influence IDPD. The reverse is also done. In light of data collected in recent experimental projects, additional new variable parameters that either extend COCOMO II or could stand on their own are proposed.
Keywords: IDPD, Parametric cost estimation, cost drivers, incremental development, scale factors (ID#: 15-4422)
URL: http://doi.acm.org/10.1145/2600821.2600847
Nicolás E. Bordenabe, Konstantinos Chatzikokolakis, Catuscia Palamidessi; Optimal Geo-Indistinguishable Mechanisms for Location Privacy; CCS '14 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, November 2014, Pages 251-262. Doi: 10.1145/2660267.2660345 Abstract: We consider the geo-indistinguishability approach to location privacy, and the trade-off with respect to utility. We show that, given a desired degree of geo-indistinguishability, it is possible to construct a mechanism that minimizes the service quality loss, using linear programming techniques. In addition we show that, under certain conditions, such a mechanism also provides optimal privacy in the sense of Shokri et al. Furthermore, we propose a method to reduce the number of constraints of the linear program from cubic to quadratic, maintaining the privacy guarantees and without significantly affecting the utility of the generated mechanism. This considerably reduces the time required to solve the linear program, thus significantly enlarging the location sets for which the optimal mechanisms can be computed.
Keywords: differential privacy, geo-indistinguishability, linear optimization, location obfuscation, location privacy (ID#: 15-4423)
URL: http://doi.acm.org/10.1145/2660267.2660345
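The paper's contribution is the LP-constructed optimal mechanism, which is too involved to reproduce here; as a simpler reference point, the sketch below implements the standard planar Laplace mechanism of Andrés et al. (the mechanism geo-indistinguishability was introduced with), drawing 2D Laplacian noise via the Lambert W function:
```python
import numpy as np
from scipy.special import lambertw

def planar_laplace(x, y, eps, rng=np.random.default_rng()):
    """Perturb a planar location to achieve eps-geo-indistinguishability."""
    theta = rng.uniform(0, 2 * np.pi)            # uniform direction
    p = rng.uniform(0, 1)
    # Inverse CDF of the radial law C(r) = 1 - (1 + eps*r) * exp(-eps*r).
    r = -(lambertw((p - 1) / np.e, k=-1).real + 1) / eps
    return x + r * np.cos(theta), y + r * np.sin(theta)

# eps = 0.01 per metre: noise on the order of a few hundred metres.
print(planar_laplace(0.0, 0.0, eps=0.01))
```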
Elizeu Santos-Neto, Tatiana Pontes, Jussara Almeida, Matei Ripeanu; On the Choice of Data Sources to Improve Content Discoverability via Textual Feature Optimization; HT '14 Proceedings of the 25th ACM Conference On Hypertext And Social Media, September 2014, Pages 273-278. Doi: 10.1145/2631775.2631815 Abstract: A large portion of the audience of video content items on the web currently comes from keyword-based search and/or tag-based navigation. Thus, the textual features of this content (e.g., the title, description, and tags) can directly impact the view count of a particular content item, and ultimately the advertisement generated revenue. More importantly, the textual features can generally be optimized to attract more search traffic. This study makes progress on the problem of automating tag selection for online video content with the goal of increasing viewership. It brings two key insights: first, based on evidence that existing tags for YouTube videos can be improved by an automated tag recommender, even for a sample of well curated movies, it explores the impact of using information mined from repositories created by different production modes (e.g., peer- and expert-produced); second, this study performs a preliminary characterization of the factors that impact the quality of the tag recommendation pipeline for different input data sources.
Keywords: peer-production, social tagging, video popularity (ID#: 15-4423)
URL: http://doi.acm.org/10.1145/2631775.2631815
Yu Zheng, Abhishek Basak, Swarup Bhunia; CACI: Dynamic Current Analysis Towards Robust Recycled Chip Identification; DAC '14 Proceedings of the 51st Annual Design Automation Conference, June 2014, Pages 1-6. Doi: 10.1145/2593069.2593102 Abstract: Rising incidences of counterfeit chips in the supply chain have posed a serious threat to the semiconductor industry. Recycling of used chips constitutes a major form of counterfeiting attacks. If undetected, they can lead to serious consequences including system performance/reliability issues during field operation and potential revenue/reputation loss for a trusted manufacturer. Existing validation approaches based on path delay analysis suffer from reduced robustness and sensitivity under large process variations. On the other hand, existing design solutions based on aging sensors require additional design/verification efforts and cannot be applied to legacy chips. In this paper, we present a novel recycled chip identification approach, CACI, that exploits differential aging in self-similar modules (e.g., different parts of an adder) to isolate aged chips under large inter- and intra-die process variations. It compares dynamic current (IDDT) signatures between two adjacent similar circuit structures in a chip. We derive an isolation metric based on multiple current comparisons to provide high level of confidence. CACI does not rely on any embedded structures for authentication, thus it comes at virtually zero design overhead and can be applied to chips already in the market. Through extensive simulations, we show that for 15% inter- and 10% intra-die variations in threshold voltage for a 45nm CMOS process, over 97% of recycled chips can be reliably identified.
Keywords: BTI, Counterfeiting attack, Hardware security, Recycled chip (ID#: 15-4424)
URL: http://doi.acm.org/10.1145/2593069.2593102
Ron Eyal, Avi Rosenfeld, Sigal Sina, Sarit Kraus; Predicting and Identifying Missing Node Information in Social Networks; ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 8 Issue 3, June 2014, Article No. 14. Doi: 10.1145/2536775 Abstract: In recent years, social networks have surged in popularity. One key aspect of social network research is identifying important missing information that is not explicitly represented in the network, or is not visible to all. To date, this line of research typically focused on finding the connections that are missing between nodes, a challenge typically termed as the link prediction problem. This article introduces the missing node identification problem, where missing members in the social network structure must be identified. In this problem, indications of missing nodes are assumed to exist. Given these indications and a partial network, we must assess which indications originate from the same missing node and determine the full network structure. Toward solving this problem, we present the missing node identification by spectral clustering algorithm (MISC), an approach based on a spectral clustering algorithm, combined with nodes’ pairwise affinity measures that were adopted from link prediction research. We evaluate the performance of our approach in different problem settings and scenarios, using real-life data from Facebook. The results show that our approach has beneficial results and can be effective in solving the missing node identification problem. In addition, this article also presents R-MISC, which uses a sparse matrix representation, efficient algorithms for calculating the nodes’ pairwise affinity, and a proprietary dimension reduction technique to enable scaling the MISC algorithm to large networks of more than 100,000 nodes. Last, we consider problem settings where some of the indications are unknown. Two algorithms are suggested for this problem: speculative MISC, based on MISC, and missing link completion, based on classical link prediction literature. We show that speculative MISC outperforms missing link completion.
Keywords: Social networks, missing nodes, spectral clustering (ID#: 15-4425)
URL: http://doi.acm.org/10.1145/2536775
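A schematic sketch of MISC's clustering step, assuming the pairwise affinities between missing-node indications have already been computed (the block-structured affinity matrix below is made up): indications that cluster together are attributed to the same missing node:
```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Affinities between 6 indications; the two blocks suggest two missing nodes.
A = np.array([[1.0, 0.9, 0.8, 0.1, 0.2, 0.1],
              [0.9, 1.0, 0.7, 0.2, 0.1, 0.1],
              [0.8, 0.7, 1.0, 0.1, 0.1, 0.2],
              [0.1, 0.2, 0.1, 1.0, 0.9, 0.8],
              [0.2, 0.1, 0.1, 0.9, 1.0, 0.7],
              [0.1, 0.1, 0.2, 0.8, 0.7, 1.0]])

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(labels)  # e.g. [0 0 0 1 1 1]: one group of indications per missing node
```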
Tung Thanh Nguyen, Evelyn Duesterwald, Tim Klinger, P. Santhanam, Tien N. Nguyen; Characterizing Defect Trends in Software Support; ICSE Companion 2014 Companion Proceedings of the 36th International Conference on Software Engineering, May 2014, Pages 508-511. Doi: 10.1145/2591062.2591112 Abstract: We present an empirical analysis of defect arrival data in the operational phase of multiple software products. We find that the shape of the defect curves is sufficiently determined by three external and readily available release cycle attributes: the product type, the license model, and the cycle time between releases. This finding provides new insights into the driving forces affecting the specifics of defect curves and opens up new opportunities for software support organizations to reduce the cost of maintaining defect arrival models for individual products. In addition, it allows the possibility of predicting the defect arrival rate of one product from another with similar known attributes.
Keywords: Empirical study, operational phase, post release defects modeling (ID#: 15-4426)
URL: http://doi.acm.org/10.1145/2591062.2591112
Emre Sarigol, David Garcia, Frank Schweitzer; Online Privacy as a Collective Phenomenon; COSN '14 Proceedings of the Second ACM Conference On Online Social Networks, October 2014, Pages 95-106. Doi: 10.1145/2660460.2660470 Abstract: The problem of online privacy is often reduced to individual decisions to hide or reveal personal information in online social networks (OSNs). However, with the increasing use of OSNs, it becomes more important to understand the role of the social network in disclosing personal information that a user has not revealed voluntarily: How much of our private information do our friends disclose about us, and how much of our privacy is lost simply because of online social interaction? Without strong technical effort, an OSN may be able to exploit the assortativity of human private features, this way constructing shadow profiles with information that users chose not to share. Furthermore, because many users share their phone and email contact lists, this allows an OSN to create full shadow profiles for people who do not even have an account for this OSN. We empirically test the feasibility of constructing shadow profiles of sexual orientation for users and non-users, using data from more than 3 million accounts of a single OSN. We quantify a lower bound for the predictive power derived from the social network of a user, to demonstrate how the predictability of sexual orientation increases with the size of this network and the tendency to share personal information. This allows us to define a privacy leak factor that links individual privacy loss with the decision of other individuals to disclose information. Our statistical analysis reveals that some individuals are at a higher risk of privacy loss, as prediction accuracy increases for users with a larger and more homogeneous first- and second-order neighborhood of their social network. While we do not provide evidence that shadow profiles exist at all, our results show that the disclosure of private information is not restricted to an individual choice, but becomes a collective decision that has implications for policy and privacy regulation.
Keywords: prediction, privacy, shadow profiles (ID#: 15-4427)
URL: http://doi.acm.org/10.1145/2660460.2660470
Vasileios Kagklis, Vassilios S. Verykios, Giannis Tzimas, Athanasios K. Tsakalidis; Knowledge Sanitization on the Web; WIMS '14 Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), June 2014, Article No. 4. Doi: 10.1145/2611040.2611044 Abstract: The widespread use of the Internet caused the rapid growth of data on the Web. As data on the Web grew, so did the perils arising from applications of data mining. Privacy preserving data mining (PPDM) is the field that investigates techniques to preserve the privacy of data and patterns. Knowledge Hiding, a subfield of PPDM, aims at preserving the sensitive patterns included in the data that are going to be published. A wide variety of techniques fall under the umbrella of Knowledge Hiding, such as frequent pattern hiding, sequence hiding, classification rule hiding and so on. In this tutorial we create a taxonomy for the frequent itemset hiding techniques. We also provide, as examples for each category, representative works that appeared recently and fall into each one of these categories. Then, we focus on a detailed overview of a specific category, the so-called linear programming-based techniques. Finally, we make a quantitative and qualitative comparison among some of the existing techniques that are classified into this category.
Keywords: Frequent Itemset Hiding, Knowledge Hiding, LP-Based Hiding Approaches, Privacy Preserving Data Mining (ID#: 15-4428)
URL: http://doi.acm.org/10.1145/2611040.2611044
Fei Xing, Haihang You; Workload Aware Utilization Optimization for a Petaflop Supercomputer: Evidence Based Assessment Using Statistical Methods; XSEDE '14 Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, July 2014, Article No. 50. Doi: 10.1145/2616498.2616536 Abstract: Nowadays, computing resources like supercomputers are shared by many users. Most systems are equipped with batch systems as their resource managers. From a user's perspective, the overall turnaround of each submitted job is measured by time-to-solution, which is the sum of batch queuing time and execution time. On a busy machine, most jobs spend more time waiting in the batch queue than executing, yet this is rarely a topic of performance tuning and optimization in parallel computing. We propose a workload-aware method to systematically predict jobs' batch-queue waiting-time patterns and thereby help users optimize utilization and improve productivity. With workload data gathered from a supercomputer, we apply a Bayesian framework to predict the temporal trend of the probability of long batch-queue waits. Thus, not only can the workload of the machine be predicted; we can also provide users with a monthly updated reference chart that suggests job submissions with better-chosen CPU counts and running-time requests, avoiding long waits in the batch queue. Our experiment shows that the model makes over 89% correct predictions for all cases we have tested.
Keywords: Kraken, near repeat, queuing time, workload (ID#: 15-4429)
URL: http://doi.acm.org/10.1145/2616498.2616536
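The paper's model is not reproduced here, but the sketch below shows the kind of Bayesian updating involved, under a deliberately simplified reading of ours: treat "the job waited longer than a threshold" as a Bernoulli outcome per request class and maintain a Beta posterior over the long-wait probability:
```python
from scipy.stats import beta

prior_a, prior_b = 1.0, 1.0     # uniform Beta(1, 1) prior
long_waits, total = 34, 120     # this month's jobs in one (CPU, walltime) class

post = beta(prior_a + long_waits, prior_b + total - long_waits)
print(f"P(long wait) ~ {post.mean():.2f} "
      f"(95% CI {post.ppf(0.025):.2f}-{post.ppf(0.975):.2f})")
```
Tabulating such posteriors per request class is one way a monthly reference chart like the one the abstract mentions could be produced.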
Stratis Ioannidis, Andrea Montanari, Udi Weinsberg, Smriti Bhagat, Nadia Fawaz, Nina Taft; Privacy Tradeoffs in Predictive Analytics; SIGMETRICS '14 The 2014 ACM International Conference On Measurement And Modeling Of Computer Systems, June 2014, Pages 57-69. Doi: 10.1145/2591971.2592011 Abstract: Online services routinely mine user data to predict user preferences, make recommendations, and place targeted ads. Recent research has demonstrated that several private user attributes (such as political affiliation, sexual orientation, and gender) can be inferred from such data. Can a privacy-conscious user benefit from personalization while simultaneously protecting her private attributes? We study this question in the context of a rating prediction service based on matrix factorization. We construct a protocol of interactions between the service and users that has remarkable optimality properties: it is privacy-preserving, in that no inference algorithm can succeed in inferring a user's private attribute with a probability better than random guessing; it has maximal accuracy, in that no other privacy-preserving protocol improves rating prediction; and, finally, it involves a minimal disclosure, as the prediction accuracy strictly decreases when the service reveals less information. We extensively evaluate our protocol using several rating datasets, demonstrating that it successfully blocks the inference of gender, age and political affiliation, while incurring less than 5% decrease in the accuracy of rating prediction.
Keywords: matrix factorization, privacy-preserving protocols (ID#: 15-4430)
URL: http://doi.acm.org/10.1145/2591971.2592011
Ramin Moazeni, Daniel Link, Celia Chen, Barry Boehm; Software Domains in Incremental Development Productivity Decline; ICSSP 2014 Proceedings of the 2014 International Conference on Software and System Process, May 2014, Pages 75-83. Doi: 10.1145/2600821.2600830 Abstract: This research paper expands on a previously introduced phenomenon called Incremental Development Productivity Decline (IDPD) that is presumed to be present in all incremental software projects to some extent. Incremental models are now being used by many organizations in order to reduce development risks. Incremental development has become the most common method of software development. Therefore its characteristics inevitably influence the productivity of projects. Based on their observed IDPD, incrementally developed projects are split into several major IDPD categories. Different ways of measuring productivity are presented and evaluated in order to come to a definition or set of definitions that is suitable to these categories of projects. Data has been collected and analyzed, indicating the degree of IDPD associated with each category. Several hypotheses have undergone preliminary evaluations regarding the existence, stability and category-dependence of IDPD with encouraging results. Further data collection and hypothesis testing is underway.
Keywords: Software engineering, incremental development, productivity decline, statistics (ID#: 15-4431)
URL: http://doi.acm.org/10.1145/2600821.2600830
Sangho Lee, Changhee Jung, Santosh Pande; Detecting Memory Leaks Through Introspective Dynamic Behavior Modelling Using Machine Learning; ICSE 2014 Proceedings of the 36th International Conference on Software Engineering, May 2014, Pages 814-824. Doi: 10.1145/2568225.2568307 Abstract: This paper expands staleness-based memory leak detection by presenting a machine learning-based framework. The proposed framework is based on an idea that object staleness can be better leveraged in regard to similarity of objects; i.e., an object is more likely to have leaked if it shows significantly high staleness not observed from other similar objects with the same allocation context. A central part of the proposed framework is the modeling of heap objects. To this end, the framework observes the staleness of objects during a representative run of an application. From the observed data, the framework generates training examples, which also contain instances of hypothetical leaks. Via machine learning, the proposed framework replaces the error-prone user-definable staleness predicates used in previous research with a model-based prediction. The framework was tested using both synthetic and real-world examples. Evaluation with synthetic leakage workloads of SPEC2006 benchmarks shows that the proposed method achieves the optimal accuracy permitted by staleness-based leak detection. Moreover, by incorporating allocation context into the model, the proposed method achieves higher accuracy than is possible with object staleness alone. Evaluation with real-world memory leaks demonstrates that the proposed method is effective for detecting previously reported bugs with high accuracy.
Keywords: Machine learning, Memory leak detection, Runtime analysis (ID#: 15-4432)
URL: http://doi.acm.org/10.1145/2568225.2568307
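A stripped-down heuristic sketch of staleness-based suspicion; the paper's learned, model-based prediction replaces exactly the kind of hand-set threshold used below. Heap objects are grouped by allocation site and staleness outliers are flagged within each group (site names and observations are invented):
```python
import numpy as np
from collections import defaultdict

# (allocation_site, staleness_seconds) observations from a profiled run.
objs = [("parse_buf", 2), ("parse_buf", 3), ("parse_buf", 4),
        ("parse_buf", 2), ("parse_buf", 3), ("parse_buf", 900),
        ("cache_put", 300), ("cache_put", 310), ("cache_put", 295),
        ("cache_put", 305), ("cache_put", 302), ("cache_put", 298)]

by_site = defaultdict(list)
for site, staleness in objs:
    by_site[site].append(staleness)

for site, vals in by_site.items():
    v = np.asarray(vals, dtype=float)
    z = (v - v.mean()) / (v.std() + 1e-9)     # within-site z-scores
    for s, zi in zip(vals, z):
        if zi > 2.0:                          # illustrative threshold
            print(f"suspect leak: site={site} staleness={s}s (z={zi:.1f})")
```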
Mardé Helbig, Andries P. Engelbrecht; Benchmarks for Dynamic Multi-Objective Optimisation Algorithms; ACM Computing Surveys (CSUR), Volume 46 Issue 3, January 2014, Article No. 37. Doi: 10.1145/2517649 Abstract: Algorithms that solve Dynamic Multi-Objective Optimisation Problems (DMOOPs) should be tested on benchmark functions to determine whether the algorithm can overcome specific difficulties that can occur in real-world problems. However, for Dynamic Multi-Objective Optimisation (DMOO), no standard benchmark functions are used. A number of DMOOPs have been proposed in recent years. However, no comprehensive overview of DMOOPs exist in the literature. Therefore, choosing which benchmark functions to use is not a trivial task. This article seeks to address this gap in the DMOO literature by providing a comprehensive overview of proposed DMOOPs, and proposing characteristics that an ideal DMOO benchmark function suite should exhibit. In addition, DMOOPs are proposed for each characteristic. Shortcomings of current DMOOPs that do not address certain characteristics of an ideal benchmark suite are highlighted. These identified shortcomings are addressed by proposing new DMOO benchmark functions with complicated Pareto-Optimal Sets (POSs), and approaches to develop DMOOPs with either an isolated or deceptive Pareto-Optimal Front (POF). In addition, DMOO application areas and real-world DMOOPs are discussed.
Keywords: Dynamic multi-objective optimisation, benchmark functions, complex Pareto-optimal set, deceptive Pareto-optimal front, ideal benchmark function suite, isolated Pareto-optimal front (ID#: 15-4433)
URL: http://doi.acm.org/10.1145/2517649
Yu Zhang, Daby Sow, Deepak Turaga, Mihaela van der Schaar; A Fast Online Learning Algorithm for Distributed Mining of BigData; ACM SIGMETRICS Performance Evaluation Review, Volume 41 Issue 4, March 2014, Pages 90-93. Doi: 10.1145/2627534.2627562 Abstract: BigData analytics require that distributed mining of numerous data streams is performed in real-time. Unique challenges associated with designing such distributed mining systems are: online adaptation to incoming data characteristics, online processing of large amounts of heterogeneous data, limited data access and communication capabilities between distributed learners, etc. We propose a general framework for distributed data mining and develop an efficient online learning algorithm based on this. Our framework consists of an ensemble learner and multiple local learners, which can only access different parts of the incoming data. By exploiting the correlations of the learning models among local learners, our proposed learning algorithms can optimize the prediction accuracy while requiring significantly less information exchange and computational complexity than existing state-of-the-art learning solutions.
Keywords: (not provided) (ID#: 15-4434)
URL: http://doi.acm.org/10.1145/2627534.2627562
Sudarshan Srinivasan, Victor Hazlewood, Gregory D. Peterson; Descriptive Data Analysis of File Transfer Data; XSEDE '14 Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, July 2014, Article No. 37. Doi: 10.1145/2616498.2616550 Abstract: There are millions of files and multi-terabytes of data transferred to and from the University of Tennessee's National Institute for Computational Sciences each month. New capabilities available with GridFTP version 5.2.2 include additional transfer log information previously unavailable in prior versions implemented within XSEDE. The transfer log data now available includes identification of source and destination endpoints which unlocks a wealth of information that can be used to detail GridFTP activities across the Internet. This information can be used for a wide variety of reports of interest to individual XSEDE Service Providers and to XSEDE Operations. In this paper, we discuss the new capabilities available for transfer logs in GridFTP 5.2.2, our initial attempt to organize, analyze, and report on this file transfer data for NICS, and its applicability to XSEDE Service Providers. Analysis of this new information can provide insight into effective and efficient utilization of GridFTP resources including identification of potential areas of GridFTP file transfer improvement (e.g., network and server tuning) and potential predictive analysis to improve efficiency.
Keywords: Log transfer, data analysis, database loading (ID#: 15-4435)
URL: http://doi.acm.org/10.1145/2616498.2616550
Robert Lagerström, Mathias Ekstedt; Extending a General Theory of Software to Engineering; GTSE 2014 Proceedings of the 3rd SEMAT Workshop on General Theories of Software Engineering, June 2014, Pages 36-39. Doi: 10.1145/2593752.2593759 Abstract: In this paper, we briefly describe a general theory of software used in order to model and predict the current and future quality of software systems and their environment. The general theory is described using a class model containing classes such as application component, business service, and infrastructure function as well as attributes such as modifiability, cost, and availability. We also elaborate how this general theory of software can be extended into a general theory of software engineering by adding engineering activities, roles, and requirements.
Keywords: General theory, Software engineering, Software systems, and Software quality prediction (ID#: 15-4436)
URL: http://doi.acm.org/10.1145/2593752.2593759
Xiuchao Wu, Kenneth N. Brown, Cormac J. Sreenan; Data Pre-Forwarding for Opportunistic Data Collection in Wireless Sensor Networks; ACM Transactions on Sensor Networks (TOSN), Volume 11 Issue 1, November 2014, Article No. 8. Doi: 10.1145/2629369 Abstract: Opportunistic data collection in wireless sensor networks uses passing smartphones to collect data from sensor nodes, thus avoiding the cost of multiple static sink nodes. Based on the observed mobility patterns of smartphone users, sensor data should be preforwarded to the nodes that are visited more frequently with the aim of improving network throughput. In this article, we construct a formal network model and an associated theoretical optimization problem to maximize the throughput subject to energy constraints of sensor nodes. Since a centralized controller is not available in opportunistic data collection, data pre-forwarding (DPF) must operate as a distributed mechanism in which each node decides when and where to forward data based on local information. Hence, we develop a simple distributed DPF mechanism with two heuristic algorithms, implement this proposal in Contiki-OS, and evaluate it thoroughly. We demonstrate empirically, in simulations, that our approach is close to the optimal solution obtained by a centralized algorithm. We also demonstrate that this approach performs well in scenarios based on real mobility traces of smartphone users. Finally, we evaluate our proposal on a small laboratory testbed, demonstrating that the distributed DPF mechanism with heuristic algorithms performs as predicted by simulations, and thus that it is a viable technique for opportunistic data collection through smartphones.
Keywords: Wireless sensor network, data pre-forwarding, human mobility, opportunistic data collection, routing, smartphone (ID#: 15-4437)
URL: http://doi.acm.org/10.1145/2629369
Qiang Fu, Jieming Zhu, Wenlu Hu, Jian-Guang Lou, Rui Ding, Qingwei Lin, Dongmei Zhang, Tao Xie; Where Do Developers Log? An Empirical Study on Logging Practices in Industry; ICSE Companion 2014 Companion Proceedings of the 36th International Conference on Software Engineering, May 2014, Pages 24-33. Doi: 10.1145/2591062.2591175 Abstract: System logs are widely used in various tasks of software system management. It is crucial to avoid logging too little or too much. To achieve so, developers need to make informed decisions on where to log and what to log in their logging practices during development. However, there exists no work on studying such logging practices in industry or helping developers make informed decisions. To fill this significant gap, in this paper, we systematically study the logging practices of developers in industry, with focus on where developers log. We obtain six valuable findings by conducting source code analysis on two large industrial systems (2.5M and 10.4M LOC, respectively) at Microsoft. We further validate these findings via a questionnaire survey with 54 experienced developers in Microsoft. In addition, our study demonstrates the high accuracy of up to 90% F-Score in predicting where to log.
Keywords: Logging practice, automatic logging, developer survey (ID#: 15-4438)
URL: http://doi.acm.org/10.1145/2591062.2591175
Martina Maggio, Federico Terraneo, Alberto Leva; Task Scheduling: A Control-Theoretical Viewpoint for a General and Flexible Solution; ACM Transactions on Embedded Computing Systems (TECS) - Regular Papers, Volume 13 Issue 4, November 2014, Article No. 76. Doi: 10.1145/2560015 Abstract: This article presents a new approach to the design of task scheduling algorithms, where system-theoretical methodologies are used throughout. The proposal implies a significant perspective shift with respect to mainstream design practices, but yields large payoffs in terms of simplicity, flexibility, solution uniformity for different problems, and possibility to formally assess the results also in the presence of unpredictable run-time situations. A complete implementation example is illustrated, together with various comparative tests, and a methodological treatise of the matter.
Keywords: Task scheduling, control-based system design, discrete-time dynamic systems, feedback control, formal assessment (ID#: 15-4439)
URL: http://doi.acm.org/10.1145/2560015
Sebastian Zander, Lachlan L.H. Andrew, Grenville Armitage; Capturing Ghosts: Predicting the Used IPv4 Space by Inferring Unobserved Addresses; IMC '14 Proceedings of the 2014 Conference on Internet Measurement Conference, November 2014, Pages 319-332. Doi: 10.1145/2663716.2663718 Abstract: The pool of unused routable IPv4 prefixes is dwindling, with less than 4% remaining for allocation at the end of June 2014. Yet the adoption of IPv6 remains slow. We demonstrate a new capture-recapture technique for improved estimation of the size of "IPv4 reserves" (allocated yet unused IPv4 addresses or routable prefixes) from multiple incomplete data sources. A key contribution of our approach is the plausible estimation of both observed and unobserved-yet-active (ghost) IPv4 address space. This significantly improves our community's understanding of IPv4 address space exhaustion and likely pressure for IPv6 adoption. Using "ping scans", network traces and server logs we estimate that 6.3 million /24 subnets and 1.2 billion IPv4 addresses are currently in use (roughly 60% and 45% of the publicly routed space respectively). We also show how utilisation has changed over the last 2--3 years and provide an up-to-date estimate of potentially-usable remaining IPv4 space.
Keywords: capture-recapture, used ipv4 space (ID#: 15-4440)
URL: http://doi.acm.org/10.1145/2663716.2663718
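The classic two-sample capture-recapture (Chapman) estimator below illustrates the principle the paper extends to multiple incomplete data sources; the vantage-point counts are invented:
```python
def chapman_estimate(n1, n2, m):
    """n1, n2: /24s seen by each vantage point; m: /24s seen by both."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Toy numbers: two data sources each observe subsets of the active /24 subnets.
print(round(chapman_estimate(4_200_000, 3_800_000, 2_500_000)))
```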
Note:
Articles listed on these pages have been found on publicly available internet pages and are cited with links to those pages. Some of the information included herein has been reprinted with permission from the authors or data repositories. Direct any requests for removal of links or modifications to specific citations via email to news@scienceofsecurity.net. Please include the ID# of the specific citation in your correspondence.