Bibliography
Phishing is one of the most dangerous information security threats in the world today, with losses topping 5.9 billion dollars in 2013. Evolving from the original concept of phishing, spear phishing also attempts to scam individuals online; however, it uses personalized mail to yield a far higher success rate. This paper suggests an increased threat of spear phishing success due to the presence of social media. Assessing this new threat is important not only to individuals, but also to companies whose employees may be specifically targeted through their social media accounts. The paper presents the design and implementation of an architecture to determine the phishing susceptibility of a user through their social media accounts, along with methods to reduce the threat. Preliminary testing shows that social media provides a publicly accessible resource for assessing targeted individuals for phishing attacks through their accounts.
Mobile malware has recently become an acute problem. Existing solutions either base static reasoning on syntactic properties, such as exception handlers or configuration fields, or compute data-flow reachability over the program, which leads to scalability challenges. We explore a new and complementary category of features, which strikes a middle ground between the above two categories. This new category focuses on security-relevant operations (communication, lifecycle, etc.), and in particular their multiplicity and happens-before order, as a means to distinguish between malicious and benign applications. Computing these features requires semantic, yet lightweight, modeling of the program's behavior. We have created a malware detection system for Android, MassDroid, that collects traces of security-relevant operations from the call graph via a scalable form of data-flow analysis. These are reduced to happens-before and multiplicity features, then fed into a supervised learning engine to obtain a malicious/benign classification. MassDroid also embodies a novel reporting interface, containing pointers into the code that serve as evidence supporting the determination. We have applied MassDroid to 35,000 Android apps from the wild. The results are highly encouraging, with an F-score of 95% in standard testing and above 90% when applied to previously unseen malware signatures. MassDroid is also efficient, requiring about two minutes per app. MassDroid is publicly available as a cloud service for malware detection.
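To make the feature categories concrete, here is a minimal sketch of how multiplicity and happens-before features could be derived from an ordered trace of security-relevant operations. The operation names and feature encoding are illustrative assumptions, not MassDroid's actual taxonomy or implementation.

```python
from itertools import combinations

# Illustrative set of security-relevant operation kinds (assumed, not MassDroid's taxonomy).
OPS = ["net_send", "read_contacts", "read_sms", "exec_code", "lifecycle_start"]

def extract_features(trace):
    """Turn an ordered trace of operation names into multiplicity and
    happens-before features, as a flat dict suitable for a classifier."""
    features = {}
    # Multiplicity: how many times each operation occurs in the trace.
    for op in OPS:
        features[f"count:{op}"] = trace.count(op)
    # Positions of first/last occurrence of each operation (None if absent).
    first = {op: min((i for i, t in enumerate(trace) if t == op), default=None) for op in OPS}
    last = {op: max((i for i, t in enumerate(trace) if t == op), default=None) for op in OPS}
    # Happens-before: 1 if some occurrence of a precedes some occurrence of b.
    for a, b in combinations(OPS, 2):
        features[f"hb:{a}<{b}"] = int(first[a] is not None and last[b] is not None and first[a] < last[b])
        features[f"hb:{b}<{a}"] = int(first[b] is not None and last[a] is not None and first[b] < last[a])
    return features

# Example: reading contacts before sending data over the network.
print(extract_features(["lifecycle_start", "read_contacts", "net_send"]))
```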
Cryptographic hash functions are used to protect the integrity of information. Hash functions are often designed by using existing block ciphers as compression functions, owing to the challenges and difficulties encountered in constructing new hash functions from scratch. However, the key generation required for the encryption process incurs a large computational cost, which affects the efficiency of the hash function. This paper proposes a new, secure, and efficient compression function based on a pseudorandom function that takes two n-bit inputs and produces one n-bit output (i.e., it compresses 2n bits to n bits). In addition, a new keyed hash function with three variants is proposed (PinTar 128 bits, 256 bits, and 512 bits), which uses the proposed compression function as its underlying building block. Statistical analysis shows that the compression function is an efficient one-way random function. Similarly, statistical analysis of the keyed hash function shows that it has a strong avalanche property and is resistant to exhaustive key search attacks. The proposed keyed hash function can be used as a candidate for developing security systems.
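The PinTar construction itself is not reproduced here; the sketch below only illustrates the general pattern of iterating a 2n-to-n compression step built from a pseudorandom function, using HMAC-SHA-256 as an assumed stand-in PRF and a simplified Merkle-Damgård-style iteration.

```python
import hashlib
import hmac

BLOCK = 32  # n = 256 bits: each compression step consumes one 32-byte message block

def compress(chaining, block, key):
    """Assumed stand-in 2n-to-n compression step: a PRF (HMAC-SHA-256) applied to
    the concatenation of the n-bit chaining value and the n-bit message block."""
    return hmac.new(key, chaining + block, hashlib.sha256).digest()

def pad(message):
    """Simplified Merkle-Damgard-style padding: 0x80, zeros, then 64-bit length."""
    padded = message + b"\x80"
    padded += b"\x00" * (-(len(padded) + 8) % BLOCK)
    return padded + len(message).to_bytes(8, "big")

def keyed_hash(key, message):
    chaining = b"\x00" * BLOCK  # fixed initial value
    msg = pad(message)
    for i in range(0, len(msg), BLOCK):
        chaining = compress(chaining, msg[i:i + BLOCK], key)
    return chaining

print(keyed_hash(b"secret-key", b"hello world").hex())
```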
Recent years have seen an exponential growth of the collection and processing of data from heterogeneous sources for a variety of purposes. Several methods and techniques have been proposed to transform and fuse data into "useful" information. However, the security aspects concerning the fusion of sensitive data are often overlooked. This paper investigates the problem of data fusion and derived data control. In particular, we identify the requirements for regulating the fusion process and eliciting restrictions on the access and usage of derived data. Based on these requirements, we propose an attribute-based policy framework to control the fusion of data from different information sources and under the control of different authorities. The framework comprises two types of policies: access control policies, which define the authorizations governing the resources used in the fusion process, and fusion policies, which define constraints on allowed fusion processes. We also discuss how such policies can be obtained for derived data.
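As a toy illustration of the two policy types, the following sketch evaluates invented attribute-based access control and fusion policies before permitting a fusion and labelling the derived data. The attribute names, policy syntax, and evaluation logic are assumptions made for illustration only, not the paper's formal framework.

```python
# Toy illustration of the two policy types (invented syntax and attributes).
# An access control policy guards each source; a fusion policy constrains which
# combinations of sources may be fused and what restrictions the derived data carries.

access_policies = {
    "energy_readings": lambda subj: subj.get("role") == "analyst",
    "billing_records": lambda subj: subj.get("role") == "analyst" and subj.get("clearance", 0) >= 2,
}

fusion_policies = [
    # (allowed source set, predicate on the fusion purpose, restrictions for the derived data)
    ({"energy_readings", "billing_records"},
     lambda purpose: purpose == "fraud_detection",
     {"sensitivity": "high", "retention_days": 30}),
]

def authorize_fusion(subject, sources, purpose):
    """Check source-level access, then look for a fusion policy permitting the combination.
    Returns the restrictions to attach to the derived data, or raises if fusion is denied."""
    for s in sources:
        if not access_policies[s](subject):
            raise PermissionError(f"access to {s} denied")
    for srcs, allowed, derived_restrictions in fusion_policies:
        if set(sources) <= srcs and allowed(purpose):
            return derived_restrictions
    raise PermissionError("no fusion policy permits this combination")

print(authorize_fusion({"role": "analyst", "clearance": 2},
                       ["energy_readings", "billing_records"], "fraud_detection"))
```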
The collaborative nature of content development has given rise to the novel problem of multiple ownership in access control, in which a shared resource is administered simultaneously by co-owners who may have conflicting privacy preferences and/or sharing needs. Prior work has focused on the design of unsupervised conflict resolution mechanisms. Driven by the need for human consent in organizational settings, this paper explores interactive policy negotiation, an approach complementary to that of prior work. Specifically, we propose an extension of Relationship-Based Access Control (ReBAC) to support multiple ownership, in which a policy negotiation protocol is in place for co-owners to come up with and give consent to an access control policy in a structured manner. During negotiation, the draft policy is assessed against formally defined availability criteria; verifying that a policy meets these criteria is computationally hard, belonging to the second level of the polynomial hierarchy. We devised two algorithms for verifying policy satisfiability, both employing a modern SAT solver for solving subproblems. The performance is found to be adequate for mid-sized organizations.
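The paper's verification algorithms are not reproduced here; the fragment below merely shows the basic pattern of handing a policy-satisfiability subproblem to an off-the-shelf SAT solver (the python-sat package), with an invented propositional encoding of two co-owners' requirements over accessor attributes.

```python
# pip install python-sat
from pysat.solvers import Glucose3

# Invented encoding: boolean variables describe a prospective accessor.
# 1 = is_colleague, 2 = is_family, 3 = is_manager
solver = Glucose3()
solver.add_clause([1, 3])    # Co-owner A: only colleagues or managers may access
solver.add_clause([-3])      # Co-owner B: no managers
solver.add_clause([-1, -2])  # Background constraint: nobody is both colleague and family

if solver.solve():
    print("draft policy is satisfiable, e.g. accessor:", solver.get_model())
else:
    print("co-owners' requirements are jointly unsatisfiable")
solver.delete()
```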
With data becoming available in larger quantities and at higher rates, new data processing paradigms have been proposed to handle high-volume, fast-moving data. Data Stream Processing is one such paradigm, wherein transient data streams flow through sets of continuous queries, only returning results when the data is of interest to the querier. To avoid the large costs associated with maintaining the infrastructure required for processing these data streams, many companies outsource their computation to third-party cloud services. This outsourcing, however, can lead to private data being accessed by parties that a data provider may not trust. The literature offers solutions to this confidentiality and access control problem, but they fall short of providing a complete solution due to either immense overheads or the trust requirements placed on these third-party services. To address these issues, we have developed PolyStream, an enhancement to existing data stream management systems that enables data providers to specify attribute-based access control policies that are cryptographically enforced while simultaneously allowing many types of in-network data processing. We detail the access control models and mechanisms used by PolyStream, and describe a novel use of security punctuations that enables flexible, online policy management and key distribution. We detail how queries are submitted and executed using an unmodified Data Stream Management System, and show through an extensive evaluation that PolyStream yields a 550x performance gain over StreamForce, the state-of-the-art system presented at CODASPY 2014, while providing greater functionality to the querier.
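PolyStream's actual cryptographic enforcement is not reproduced here; the toy sketch below only conveys the idea of interleaving a "security punctuation" that announces a policy and key material with encrypted tuples in the stream, using the cryptography package's Fernet cipher as an assumed stand-in.

```python
# pip install cryptography
from cryptography.fernet import Fernet

def make_stream(tuples, policy):
    """Yield a security punctuation announcing the policy and key, then encrypted tuples.
    In a real deployment the key would be wrapped per authorized attribute set,
    not carried in the clear as in this toy sketch."""
    key = Fernet.generate_key()
    f = Fernet(key)
    yield {"type": "security_punctuation", "policy": policy, "key": key}
    for t in tuples:
        yield {"type": "data", "ciphertext": f.encrypt(repr(t).encode())}

def consume(stream, subject_attrs):
    f = None
    for item in stream:
        if item["type"] == "security_punctuation":
            # Take the key only if the subject's attributes satisfy the announced policy.
            if item["policy"](subject_attrs):
                f = Fernet(item["key"])
        elif f is not None:
            print("decrypted:", f.decrypt(item["ciphertext"]).decode())

policy = lambda attrs: attrs.get("team") == "analytics"
consume(make_stream([("sensor1", 21.5), ("sensor2", 19.0)], policy), {"team": "analytics"})
```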
Multi-core processors are widely used in mobile devices due to their high performance and good energy efficiency. To maintain cache coherency among cores, mobile multi-core systems integrate new hardware, the ARM CCI. In this study, we focus on the security aspects of mobile multi-core. We monitor the cache coherency operations that occur during the inter-core communication of PSL-related processes. With simple analysis, we can extract Android PSL information. Preliminary results show that we can efficiently identify PSL patterns. This is a significant security violation in terms of confidentiality. In addition, because mobile multi-cores are already prevalent, the attack is practical and can easily spread.
In recent years, the number of new examples of malware has continued to increase. To create effective countermeasures, security specialists often must manually inspect vast sandbox logs produced by dynamic analysis. Meanwhile, antivirus vendors usually publish malware analysis reports on their websites. Because malware analysis reports and sandbox logs have no direct connection, security specialists analyzing sandbox logs cannot benefit from the information described in such expert reports. To address this issue, we developed a system called ReGenerator that automates the generation of reports related to sandbox logs by making use of existing reports published by antivirus vendors. Our system combines several techniques, including Jaccard similarity, Natural Language Processing (NLP), and Natural Language Generation (NLG), to produce concise, human-readable reports describing malicious behavior for security specialists.
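As a rough illustration of the similarity step, the snippet below computes Jaccard similarity between token sets from a made-up sandbox log entry and candidate vendor report sentences; the tokenization and data are invented and far simpler than ReGenerator's actual pipeline.

```python
import re

def tokens(text):
    """Lowercase the text and split it into a set of simple word/number tokens."""
    return set(re.findall(r"[a-z0-9.]+", text.lower()))

def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented snippets standing in for a sandbox log and two vendor report sentences.
sandbox_log = "CreateFile C:\\Windows\\svchost.exe ; RegSetValue Run ; connect evil.example.com:443"
report_sentences = [
    "The malware drops a copy of itself as svchost.exe and adds a Run registry key.",
    "It displays a ransom note and encrypts documents on removable drives.",
]

log_tokens = tokens(sandbox_log)
best = max(report_sentences, key=lambda s: jaccard(log_tokens, tokens(s)))
print("closest report sentence:", best)
```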
This paper presents a new type of online password guessing attack called "WiPING" (Wi-Fi signal-based PIN Guessing attack), which guesses a victim's PIN (Personal Identification Number) within a small number of unlock attempts. WiPING uses wireless signal patterns identified from observing the sequential finger movements involved in typing a PIN to unlock a mobile device. A list of possible PIN candidates is generated from the wireless signal patterns and used to improve the performance of PIN guessing attacks. We implemented a proof-of-concept attack to demonstrate the feasibility of WiPING. Our results showed that WiPING could be practically effective: while pure guessing attacks failed against all 20 tested PINs, WiPING successfully guessed two of them.
In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. Information extraction (IE) is the default technique for turning unstructured text into structured fact databases, but systematically applying IE techniques to web input requires highly complex systems, ranging from focused crawlers over quality assurance methods that cope with the HTML input to long pipelines of natural language processing and IE algorithms. Although tools exist for each of these steps, their seamless, flexible, and scalable combination into a web-scale end-to-end text analytics system is still a true challenge. In this paper, we report our experiences from building such a system for comparing the "web view" on health-related topics with that derived from a controlled scientific corpus, i.e., Medline. The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere. We describe a practical approach to seed generation that led us to crawl a corpus of ~1 TB of web pages highly enriched for the biomedical domain. Pages were run through a complex pipeline of best-of-breed tools for a multitude of necessary tasks, such as HTML repair, boilerplate detection, sentence detection, linguistic annotation, parsing, and eventually named entity recognition for several types of entities. Results are compared with those from running the same pipeline (without the web-related tasks) on a corpus of 24 million scientific abstracts and a third corpus made of ~250K scientific full texts. We evaluate the scalability, quality, and robustness of the employed methods and tools. The focus of this paper is to provide a large, real-life use case to inspire future research into robust, easy-to-use, and scalable methods for domain-specific IE at web scale.
We study a sensor network setting in which samples are encrypted individually using different keys and maintained on cloud storage. For large systems, e.g., those that generate several million samples per day, fine-grained sharing of encrypted samples is challenging. Existing solutions, such as Attribute-Based Encryption (ABE) and the Key Aggregation Cryptosystem (KAC), can be utilized to address the challenge, but only to a certain extent. They are often computationally expensive and thus unlikely to operate at scale. We propose an algorithmic enhancement and two heuristics to improve KAC's key reconstruction cost, while preserving its provable security. The improvement is particularly significant for range and down-sampling queries, accelerating the reconstruction cost from quadratic to linear running time. An experimental study shows that for queries of 32k samples, the proposed fast reconstruction techniques speed up the original KAC by at least 90 times on range and down-sampling queries, and by eight times on general (arbitrary) queries. It also shows that, at the expense of splitting the query into 16 sub-queries and correspondingly issuing that number of different aggregated keys, reconstruction time can be reduced by 19 times. As such, the proposed techniques make KAC more applicable in practical scenarios such as sensor networks or the Internet of Things.
With the prevalence of personal Bluetooth devices, potential breaches of user privacy have become an increasing concern. To date, sniffing Bluetooth traffic has been widely considered an extremely intricate task due to Bluetooth's indiscoverable mode, vendor-dependent adaptive hopping behavior, and the interference in the open 2.4 GHz band. In this paper, we present BlueEar, a practical Bluetooth traffic sniffer. BlueEar features a novel dual-radio architecture where two Bluetooth-compliant radios coordinate with each other on learning the hopping sequence of indiscoverable Bluetooth networks, predicting adaptive hopping behavior, and mitigating the impacts of RF interference. Experimental results show that BlueEar can consistently maintain a packet capture rate higher than 90% in real-world environments, where the target Bluetooth network exhibits diverse hopping behaviors in the presence of dynamic interference from coexisting Wi-Fi devices. In addition, we discuss the privacy implications of the BlueEar system, and present a practical countermeasure that effectively reduces the packet capture rate of the sniffer to 20%. The proposed countermeasure can be easily implemented on the Bluetooth master device while requiring no modification to slave devices such as keyboards and headsets.
Pseudo-random number generators (PRNGs) are a critical infrastructure for cryptography and security of many computer applications. At the same time, PRNGs are surprisingly difficult to design, implement, and debug. This paper presents the first static analysis technique specifically for quality assurance of cryptographic PRNG implementations. The analysis targets a particular kind of implementation defect, the entropy loss. Entropy loss occurs when the entropy contained in the PRNG seed is not utilized to the full extent for generating the pseudo-random output stream. The Debian OpenSSL disaster, probably the most prominent PRNG-related security incident, was one but not the only manifestation of such a defect. Together with the static analysis technique, we present its implementation, a tool named Entroposcope. The tool offers a high degree of automation and practicality. We have applied the tool to five real-world PRNGs of different designs and show that it effectively detects both known and previously unknown instances of entropy loss.
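To make the defect class concrete, here is a minimal, invented Python illustration of the general shape of an entropy-loss bug: a full seed is gathered, but only a fraction of it influences the generator state. This is not the Debian OpenSSL bug and not what Entroposcope analyzes (it targets real PRNG implementations); it only sketches the pattern such an analysis looks for.

```python
import hashlib
import os

def seed_prng_buggy(pool_len=32):
    """Entropy-loss pattern: 32 bytes of OS entropy are gathered, but only a single
    byte actually reaches the PRNG state, so the output stream carries at most
    8 bits of entropy despite the healthy-looking seeding code."""
    seed = os.urandom(pool_len)
    state = hashlib.sha256(seed[:1]).digest()   # bug: 31 of 32 seed bytes never used
    return state

def seed_prng_fixed(pool_len=32):
    """Correct variant: the full seed feeds the state."""
    seed = os.urandom(pool_len)
    return hashlib.sha256(seed).digest()

print(seed_prng_buggy().hex())
print(seed_prng_fixed().hex())
```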
Here we model the indirect costs of deploying security controls in small-to-medium enterprises (SMEs) to manage cyber threats. SMEs may not have the in-house skills and collective capacity to operate controls efficiently, resulting in inadvertent data leakage and exposure to compromise. Aside from financial costs, attempts to maintain security can impact morale, system performance, and retraining requirements, which are modelled here. Managing the overall complexity and effectiveness of an SME's security controls has the potential to reduce unintended leakage. The UK Cyber Essentials Scheme informs basic control definitions, and Available Responsibility Budget (ARB) is modelled to understand how controls can be prioritised for both security and usability. Human factors of security and practical experience of security management for SMEs inform the modelling of deployment challenges across a set of SME archetypes differing in size, complexity, and use of IT. Simple combinations of controls are matched to archetypes, balancing capabilities to protect data assets with the effort demands placed upon employees. Experiments indicate that two-factor authentication can be readily adopted by many SMEs and their employees to protect core assets, followed by correct access privileges and anti-malware software. Service and technology providers emerge as playing an important role in improving access to usable security controls for SMEs.
Vulnerability detection is difficult and time-consuming work, so making full use of unlabeled data is both necessary and helpful. Accordingly, this paper proposes a method to predict buffer overflows based on semi-supervised learning. We first employ ANTLR to extract ASTs from C/C++ source files; then, according to the 22 buffer overflow attribute taxonomies, a 22-dimensional vector is extracted from every function in the AST; finally, these vectors are used to train a classifier that predicts buffer overflow vulnerabilities. The experiments and evaluation indicate that our method is correct and efficient.
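The abstract does not name a specific learner; the sketch below uses scikit-learn's self-training wrapper as one plausible semi-supervised setup over 22-dimensional per-function vectors, with entirely synthetic data standing in for the AST-derived features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

# Synthetic data standing in for the 22-dimensional per-function vectors derived
# from the AST attribute taxonomies; values and labels here are random.
rng = np.random.default_rng(0)
X = rng.random((200, 22))
y = rng.integers(0, 2, size=200)   # 1 = buffer overflow, 0 = safe
y[50:] = -1                        # -1 marks the unlabeled functions

# Self-training wraps a base classifier and iteratively pseudo-labels the
# unlabeled vectors: one common semi-supervised strategy (assumed, not the paper's).
clf = SelfTrainingClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
clf.fit(X, y)
print(clf.predict(rng.random((3, 22))))
```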
Malware evolves perpetually and relies on increasingly sophisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
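Venn-Abers predictors are not implemented here; as a rough stand-in, the sketch below uses scikit-learn's isotonic calibration and treats a worsening Brier score on a newer batch of labeled samples as a crude signal that the model may be going stale. All names and data are synthetic assumptions.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

# Synthetic "old" training data and a shifted "new" window standing in for
# a nonstationary malware population.
rng = np.random.default_rng(0)
X_old, y_old = rng.random((500, 10)), rng.integers(0, 2, 500)
X_new, y_new = rng.random((200, 10)) + 0.3, rng.integers(0, 2, 200)

# Isotonic calibration here is only a stand-in for Venn-Abers calibration.
model = CalibratedClassifierCV(LogisticRegression(max_iter=1000), method="isotonic", cv=5)
model.fit(X_old, y_old)

score_old = brier_score_loss(y_old, model.predict_proba(X_old)[:, 1])
score_new = brier_score_loss(y_new, model.predict_proba(X_new)[:, 1])
if score_new > 1.5 * score_old:
    print("calibration degraded on recent samples: consider retraining")
```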
Vehicular users are expected to consume large amounts of data, for both entertainment and navigation purposes. This will put a strain on cellular networks, which will be able to cope with such a load only if proper caching is in place; this in turn begs the question of which caching architecture is the best-suited to deal with vehicular content consumption. In this paper, we leverage a large-scale, crowd-sourced trace to (i) characterize the vehicular traffic demand, in terms of overall magnitude and content breakup; (ii) assess how different caching approaches perform against such a real-world load; (iii) study the effect of recommendation systems and local content items. We define a price-of-fog metric, expressing the additional caching capacity to deploy when moving from traditional, centralized caching architectures to a "fog computing" approach, where caches are closer to the network edge. We find that for location-specific items, such as the ones that vehicular users are most likely to request, such a price almost disappears. Vehicular networks thus make a strong case for the adoption of mobile-edge caching, as we are able to reap the benefit thereof – including a reduction in the distance travelled by data, within the core network – with little or none of the associated disadvantages.
Popular anonymity mechanisms such as Tor provide low communication latency but are vulnerable to traffic analysis attacks that can de-anonymize users. Moreover, known traffic-analysis-resistant techniques such as Dissent are impractical for use in latency-sensitive settings such as wireless networks. In this paper, we propose PriFi, a low-latency protocol for anonymous communication in local area networks that is provably secure against traffic analysis attacks. This allows members of an organization to access the Internet anonymously while they are on-site, via privacy-preserving WiFi networking, or off-site, via privacy-preserving virtual private networking (VPN). PriFi reduces communication latency using a client/relay/server architecture in which a set of servers computes cryptographic material in parallel with the clients to minimize unnecessary communication latency. We also propose a technique for protecting against equivocation attacks, with which a malicious relay might de-anonymize clients. This is achieved without adding extra latency by encrypting client messages based on the history of all messages they have received so far. As a result, any equivocation attempt makes the communication unintelligible, preserving clients' anonymity while holding the servers accountable.
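PriFi's actual cryptographic construction is not reproduced here; the fragment below only illustrates the stated idea of binding each client's encryption to the history of messages it has received, by deriving a symmetric key from a hash of that history (Fernet is an assumed stand-in cipher).

```python
import base64
import hashlib
from cryptography.fernet import Fernet

def history_key(history):
    """Derive a symmetric key from the hash of all messages seen so far.
    If a malicious relay equivocates (shows different histories to different
    clients), the derived keys diverge and subsequent traffic becomes
    unintelligible instead of silently de-anonymizing anyone."""
    digest = hashlib.sha256(b"".join(history)).digest()
    return base64.urlsafe_b64encode(digest)   # 32 bytes -> valid Fernet key

history = [b"round-1: broadcast", b"round-2: broadcast"]
token = Fernet(history_key(history)).encrypt(b"client message for round 3")

# A receiver holding the same history derives the same key and can decrypt.
print(Fernet(history_key(history)).decrypt(token))
```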
Workflow-centric tracing captures the workflow of causally-related events (e.g., work done to process a request) within and among the components of a distributed system. As distributed systems grow in scale and complexity, such tracing is becoming a critical tool for understanding distributed system behavior. Yet, there is a fundamental lack of clarity about how such infrastructures should be designed to provide maximum benefit for important management tasks, such as resource accounting and diagnosis. Without research into this important issue, there is a danger that workflow-centric tracing will not reach its full potential. To help, this paper distills the design space of workflow-centric tracing and describes key design choices that can help or hinder a tracing infrastructure's utility for important tasks. Our design space and the design choices we suggest are based on our experiences developing several previous workflow-centric tracing infrastructures.
Processing smart grid data for analytics purposes brings about a series of privacy-related risks. In order to allow for the most suitable mitigation strategies, reasonable privacy risks need to be addressed by taking into consideration the perspective of each smart grid stakeholder separately. In this context, we use the notion of privacy concerns to reflect potential privacy risks from the perspective of different smart grid stakeholders. Privacy concerns help to derive privacy goals, which we represent using the goal structuring notation. Goals represented in this way can be addressed more comprehensibly through technical and non-technical strategies and solutions. The thread of argumentation, from concerns to goals to strategies and solutions, is presented in the form of a privacy case, which is analogous to the safety case used in the automotive domain. We provide an exemplar privacy case for the smart grid developed as part of the Aspern Smart City Research project.
The last decade has witnessed a wide adoption of connected mobile devices able to capture the context of their owners from embedded sensors (GPS, Wi-Fi, Bluetooth, accelerometers). The advent of mobile and pervasive computing has enabled rich social and contextual applications, but the use of such technologies raises severe privacy issues and challenges. The privacy threats come from diverse adversaries, ranging from curious service providers and other users of the same service to eavesdroppers and curious applications running on the device. The information that can be collected from mobile device owners includes their locations, their social relationships, and their current activity. All of this, once analyzed and combined together through inference, can be very telling about the users' private lives. In this talk, we will describe privacy threats in mobile and pervasive networks. We will also show how to quantify the privacy of the users of such networks and explain how information on co-location can be taken into account. We will describe the role that privacy enhancing technologies (PETs) can play and describe some of them. We will also explain how to prevent apps from sifting through too much personal data under Android. We will conclude by mentioning the privacy and security challenges raised by the quantified self and digital medicine.
The mainstream approach to protecting the privacy of mobile users in location-based services (LBSs) is to alter (e.g., perturb, hide, and so on) the users’ actual locations in order to reduce exposed sensitive information. In order to be effective, a location-privacy preserving mechanism must consider both the privacy and utility requirements of each user, as well as the user’s overall exposed locations (which contribute to the adversary’s background knowledge). In this article, we propose a methodology that enables the design of optimal user-centric location obfuscation mechanisms respecting each individual user’s service quality requirements, while maximizing the expected error that the optimal adversary incurs in reconstructing the user’s actual trace. A key advantage of a user-centric mechanism is that it does not depend on third-party proxies or anonymizers; thus, it can be directly integrated in the mobile devices that users employ to access LBSs. Our methodology is based on the mutual optimization of user/adversary objectives (maximizing location privacy versus minimizing localization error) formalized as a Stackelberg Bayesian game. This formalization makes our solution robust against any location inference attack, that is, the adversary cannot decrease the user’s privacy by designing a better inference algorithm as long as the obfuscation mechanism is designed according to our privacy games. We develop two linear programs that solve the location privacy game and output the optimal obfuscation strategy and its corresponding optimal inference attack. These linear programs are used to design location privacy–preserving mechanisms that consider the correlation between past, current, and future locations of the user, thus can be tuned to protect different privacy objectives along the user’s location trace. We illustrate the efficacy of the optimal location privacy–preserving mechanisms obtained with our approach against real location traces, showing their performance in protecting users’ different location privacy objectives.
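To give a flavor of the linear-programming step, here is a deliberately over-simplified sketch: three locations on a line, a fixed naive adversary that simply reports the observed pseudo-location (so the Stackelberg inner optimization over inference attacks is omitted), and a single global service-quality constraint. All numbers are invented; this is not the paper's formulation.

```python
import numpy as np
from scipy.optimize import linprog

# Grossly simplified location-obfuscation LP (assumptions: naive adversary,
# identical distance metric for privacy gain and quality loss).
locs = np.array([0.0, 1.0, 2.0])
n = len(locs)
prior = np.full(n, 1.0 / n)                       # prior over true locations
dist = np.abs(locs[:, None] - locs[None, :])      # |true - reported|
Q_MAX = 0.5                                       # max tolerated expected quality loss

# Decision variables x[s, r], flattened row-major: obfuscation probabilities.
c = -(prior[:, None] * dist).ravel()              # maximize expected adversary error
A_ub = (prior[:, None] * dist).ravel()[None, :]   # quality loss uses the same distance here
b_ub = [Q_MAX]
A_eq = np.zeros((n, n * n))
for s in range(n):
    A_eq[s, s * n:(s + 1) * n] = 1.0              # each row of x is a probability distribution
b_eq = np.ones(n)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1), method="highs")
print(res.x.reshape(n, n).round(3))               # optimal obfuscation matrix
```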
A major challenge for utilities is energy theft, wherein malicious actors steal energy for financial gain. One such form of theft in the smart grid is the fraudulent amplification of energy generation measurements from DERs, such as photovoltaics. It is important to detect this form of malicious activity, but in a way that ensures the privacy of customers. Failing to consider privacy could, for example, result in a backlash from customers and a heavily curtailed deployment of services. In this short paper, we present a novel privacy-preserving approach to the detection of manipulated DER generation measurements.
As information systems become increasingly interdependent, there is an increased need to share cybersecurity data across government agencies and companies, and within and across industrial sectors. This sharing includes threat, vulnerability and incident reporting data, among other data. For cyberattacks that include sociotechnical vectors, such as phishing or watering hole attacks, this increased sharing could expose customer and employee personal data to increased privacy risk. In the US, privacy risk arises when the government voluntarily receives data from companies without meaningful consent from individuals, or without a lawful procedure that protects an individual's right to due process. In this paper, we describe a study to examine the trade-off between the need for potentially sensitive data, which we call incident data usage, and the perceived privacy risk of sharing that data with the government. The study is comprised of two parts: a data usage estimate built from a survey of 76 security professionals with mean eight years' experience; and a privacy risk estimate that measures privacy risk using an ordinal likelihood scale and nominal data types in factorial vignettes. The privacy risk estimate also factors in data purposes with different levels of societal benefit, including terrorism, imminent threat of death, economic harm, and loss of intellectual property. The results show which data types are high-usage, low-risk versus those that are low-usage, high-risk. We discuss the implications of these results and recommend future work to improve privacy when data must be shared despite the increased risk to privacy.