Bibliography
Recent advances in smart devices and wearable technologies have greatly enlarged the variety of personal data people can track. Applications and services can leverage such data to provide better life support, but they also pose privacy and security threats. Obfuscation schemes have consequently been developed to retain data access while mitigating risks. Going beyond the binary choice between releasing raw data and not releasing it at all, we examine the effect of adding a data obfuscation option on users' disclosure decisions when configuring applications' access, and how that effect varies with data types and application contexts. Our online user experiment shows that users are less likely to block data access when the obfuscation option is available, except for location data. This effect differs significantly between applications for domain-specific dynamic tracking data, but not for generic personal traits. We further unpack the role of context and discuss the design opportunities.
Mobile devices, such as smartphones, have become a common tool in our daily routine. Mobile applications (a.k.a. apps) increasingly demand access to contextual information. For instance, apps require data about the user's environment as well as their profile in order to adapt themselves (interfaces, services, content) to this context. Mobile apps with this behavior are known as context-aware applications (CAS). Several software infrastructures have been created to support the development of CAS. However, most of them do not store the contextual data, since mobile devices are resource-constrained. Nor are they built with the privacy of contextual data in mind, even though apps may expose contextual data without user consent. This paper addresses these topics by extending an existing middleware platform that supports the development of mobile context-aware applications. Our extension aims to store and process the contextual data generated by many mobile devices, using the computational power of the cloud, and to support the definition of privacy policies that prevent the dissemination of unauthorized contextual data.
One of the main concerns for smartphone users is the quality of the apps they download. Before installing any app from the market, users first check its rating and reviews. However, these ratings are not computed by experts and, most of the time, are not associated with malicious behavior. In this work, we present an IDS/rating system based on a game-theoretic model with crowdsourcing. Our results show that, with minor control over the error in categorizing users and the fraction of experts in the crowd, our system provides proper ratings while flagging all malicious apps.
Increasing cyber-security presents an ongoing challenge to security professionals. Research continually suggests that online users are a weak link in information security. This research explores the relationship between cyber-security and cultural, personality, and demographic variables. The study was conducted in four different countries and presents a multi-cultural view of cyber-security. In particular, it looks at how behavior, self-efficacy, and privacy attitude are affected by culture compared with other psychological and demographic variables (such as gender and computer expertise). It also examines what kinds of data people tend to share online and how culture affects these choices. This work supports the idea of developing personality-based UI design to increase users' cyber-security. Its results show that certain personality traits affect users' cyber-security-related behavior across different cultures, which further reinforces their contribution relative to cultural effects.
Large-scale datacenters are becoming the compute and data platform of large enterprises, but their scale makes it difficult to secure the applications running within them. We motivate this setting using a complex real-world scenario and propose a data-driven approach to taming this complexity. We discuss several machine learning problems that arise, in particular focusing on inducing so-called whitelist communication policies from observing masses of communications among networked computing nodes. Briefly, a whitelist policy specifies which machines, or groups of machines, can talk to which. We present some of the challenges and opportunities, such as noisy and incomplete data, non-stationarity, lack of supervision, and challenges of evaluation, and describe some of the approaches we have found promising.
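To make the whitelist notion concrete, here is a minimal sketch of inducing and checking a group-level policy from observed flows; the flow-record fields and the role-based grouping are illustrative assumptions, not the paper's data model or learning pipeline.

```python
# Minimal sketch of a group-level whitelist communication policy.
# Flow records (src, dst, port) and the role grouping are assumptions.

def induce_policy(flows, group_of):
    """Lift observed (src, dst, port) flows to group-level whitelist rules."""
    return {(group_of[src], group_of[dst], port) for src, dst, port in flows}

def is_allowed(rules, group_of, src, dst, port):
    """A flow is allowed iff its group-level triple is whitelisted."""
    return (group_of[src], group_of[dst], port) in rules

# Usage: machines grouped by role; unseen group pairs are denied by default.
group_of = {"web1": "web", "web2": "web", "db1": "db", "dev1": "dev"}
policy = induce_policy([("web1", "db1", 5432), ("web2", "db1", 5432)], group_of)
print(is_allowed(policy, group_of, "web2", "db1", 5432))  # True
print(is_allowed(policy, group_of, "dev1", "db1", 5432))  # False
```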
We develop and evaluate a data hiding method that enables smartphones to encrypt and embed sensitive information into carrier streams of sensor data. Our evaluation considers multiple handsets and a variety of data types, and we demonstrate that our method has a computational cost that allows real-time data hiding on smartphones with negligible distortion of the carrier stream. These characteristics make it suitable for smartphone applications involving privacy-sensitive data such as medical monitoring systems and digital forensics tools.
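As a concrete (and deliberately simple) illustration of carrier-stream embedding, the sketch below hides payload bits in the least significant bits of integer sensor samples; the paper's actual embedding scheme is not specified here, and LSB substitution is used only to show why distortion of the carrier can stay negligible.

```python
# Illustrative sketch: hide an (already encrypted) payload in a sensor
# stream by overwriting the least significant bit of each integer sample.
# Distortion is at most one LSB per sample.

def embed(samples, payload_bits):
    if len(payload_bits) > len(samples):
        raise ValueError("carrier stream too short for payload")
    out = list(samples)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | bit  # overwrite LSB with payload bit
    return out

def extract(samples, n_bits):
    return [s & 1 for s in samples[:n_bits]]

# Usage: payload bits would come from an encrypted message in practice.
carrier = [512, 498, 503, 510, 495, 501, 507, 499]
bits = [1, 0, 1, 1, 0, 1, 0, 0]
stego = embed(carrier, bits)
assert extract(stego, len(bits)) == bits
assert all(abs(a - b) <= 1 for a, b in zip(carrier, stego))
```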
Induction is a successful approach for verification of hardware and software systems. A common practice is to model a system using logical formulas, and then use a decision procedure to verify that some logical formula is an inductive safety invariant for the system. A key ingredient in this approach is coming up with the inductive invariant, a task known as invariant inference. This is a major difficulty, and it is often left for humans or addressed by sound but incomplete abstract interpretation. This paper is motivated by the problem of inductive invariants in shape analysis and in distributed protocols. This paper approaches the general problem of inferring first-order inductive invariants by restricting the language L of candidate invariants. Notice that the problem of invariant inference in a restricted language L differs from the safety problem, since a system may be safe and still not have any inductive invariant in L that proves safety. Clearly, if L is finite (and if testing an inductive invariant is decidable), then inferring invariants in L is decidable. This paper presents some interesting cases when inferring inductive invariants in L is decidable even when L is an infinite language of universal formulas. Decidability is obtained by restricting L and defining a suitable well-quasi-order on the state space. We also present some undecidability results that show that our restrictions are necessary. We further present a framework for systematically constructing infinite languages while keeping the invariant inference problem decidable. We illustrate our approach by showing the decidability of inferring invariants for programs manipulating linked-lists, and for distributed protocols.
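For reference, the standard conditions for an inductive safety invariant, stated here in common notation rather than the paper's own: given initial states Init, transition relation TR, and safety property Safe, a formula I is an inductive safety invariant when

$$\mathit{Init} \Rightarrow I, \qquad I \wedge \mathit{TR} \Rightarrow I', \qquad I \Rightarrow \mathit{Safe},$$

where I' denotes I evaluated over the post-transition state. Invariant inference in a language L asks whether such an I exists within L, which is why a safe system may still lack an inductive invariant expressible in L.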
The safety of information inside cloud networks is of interest to network administrators. In a new insider attack, inside attackers merge confidential information with videos using digital video steganography. The video can then be uploaded to video websites and the information distributed online, which can cost firms millions in damages. Standard behavior-based exfiltration detection does not always prevent these attacks: this form of steganography is almost invisible, and existing compressed-video steganalysis detects only small-payload watermarks. We develop a detection strategy using distributed algorithms to classify videos, and compare existing algorithms to new ones. We find that our approach improves on behavior-based exfiltration detection and on existing online video steganalysis.
A normal computer infected with malware is difficult to detect. Several approaches in recent years analyze the behavior of malware and obtain good results. Malware traffic may be detected, but it is very common to misdetect normal traffic as malicious and generate false positives, especially when the methods are tested in real, large networks. These detection errors arise because malware changes and rapidly adapts its domains and patterns to mimic normal connections. To better detect malware infections and separate them from normal traffic, we propose to detect the behavior of the group of connections generated by the malware. Malware usually generates several related connections simultaneously and therefore shows a group pattern. Based on previous experiments, this paper suggests that the behavior of a group of connections can be modelled as a directed cyclic graph with special properties, such as its internal patterns, relationships, frequencies, and sequences of connections. By training the group models on known traffic, it may be possible to better distinguish between a malware connection and a normal connection.
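As a toy illustration of the group-of-connections idea, the sketch below turns an ordered sequence of connection events into a directed graph whose edges carry transition frequencies; the event keys and derived features are assumptions for illustration, not the paper's exact model.

```python
# Toy sketch: model a host's connection sequence as a directed graph with
# transition frequencies. Repeated patterns (cycles) show up as
# high-probability edges returning to earlier nodes.

from collections import Counter

def build_connection_graph(events):
    """events: ordered list of connection keys, e.g. (dst_ip, dst_port, proto)."""
    edges = Counter(zip(events, events[1:]))  # frequency of each transition
    total = sum(edges.values())
    return {edge: n / total for edge, n in edges.items()}

events = [("1.2.3.4", 443, "tcp"), ("5.6.7.8", 80, "tcp"),
          ("1.2.3.4", 443, "tcp"), ("5.6.7.8", 80, "tcp"),
          ("1.2.3.4", 443, "tcp")]
graph = build_connection_graph(events)
for (src, dst), p in graph.items():
    print(src, "->", dst, f"p={p:.2f}")
```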
This paper describes an approach for detecting the presence of domain name system (DNS) tunnels in network traffic. DNS tunneling is a common technique hackers use to establish command-and-control nodes and to exfiltrate data from networks. To generate training data sufficient to build models of DNS tunneling activity, a penetration testing effort was employed. We extracted features from this data and trained random forest classifiers to distinguish normal DNS activity from tunneling activity. The classifiers successfully detected the presence of the tunnels we trained on, as well as four other types of tunnels that were not part of the training set.
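A hedged sketch of the general approach: extract simple per-query features and train a random forest. The features below (query length, character entropy, label count) are common choices in the literature and are assumptions here, not necessarily the paper's feature set.

```python
# Sketch: random forest separating normal DNS names from tunneling names.

import math
from collections import Counter
from sklearn.ensemble import RandomForestClassifier

def entropy(s):
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def features(qname):
    labels = qname.rstrip(".").split(".")
    return [len(qname), entropy(qname), len(labels)]

normal = ["www.example.com", "mail.example.com", "cdn.example.org"]
tunnel = ["aGVsbG8gd29ybGQx.t.evil.io", "c2VjcmV0ZGF0YQzz.t.evil.io"]
X = [features(q) for q in normal + tunnel]
y = [0] * len(normal) + [1] * len(tunnel)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([features("ZXhmaWwtY2h1bms0.t.evil.io")]))  # -> [1]
```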
Steganography is the science of hiding data within data, whether for the legitimate purpose of secret communication or with the bad intention of leaking sensitive confidential data or embedding malicious code or URLs. Many different carrier file formats can be used to hide such data (network traffic, audio, image, etc.), but the most common steganography carrier is an image, since embedding secret data within images is considered the best and easiest way to hide all types of secret files (another image, text, video, a virus, a URL, etc.). To the human eye, the changes in the image's appearance caused by the hidden data can be imperceptible; images can contain more than what we see with our eyes. Many solutions have therefore been proposed to help detect such hidden data, but each has its own strengths and weaknesses, being limited to one type of image and a specific hiding technique, and most are unable to extract the hidden data. This paper proposes a novel detection approach that concentrates on detecting any kind of hidden URL in all types of images and extracting the hidden URL from a carrier image that used the LSB (least significant bit) hiding technique.
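Since the paper targets the LSB technique specifically, here is a minimal, self-contained illustration of LSB URL embedding and extraction over raw pixel channel values; the paper's actual detection pipeline and image handling are not reproduced.

```python
# LSB hiding illustration: a URL's bits are written into the least
# significant bits of pixel channel values, so each channel changes by <= 1,
# which is why the change is visually imperceptible.

def embed_url(pixels, url):
    data = url.encode() + b"\x00"               # NUL terminator marks the end
    bits = [(byte >> i) & 1 for byte in data for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return out

def extract_url(pixels):
    data = bytearray()
    for i in range(0, len(pixels) - 7, 8):
        byte = 0
        for j in range(8):                      # rebuild each byte, MSB first
            byte = (byte << 1) | (pixels[i + j] & 1)
        if byte == 0:
            return bytes(data).decode()
        data.append(byte)
    raise ValueError("no terminator found")

pixels = bytes(range(256)) * 20                  # stand-in for image channels
stego = embed_url(pixels, "http://example.com")
print(extract_url(stego))                        # http://example.com
```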
We present D-ForenRIA, a distributed forensic tool that automatically reconstructs user sessions in Rich Internet Applications (RIAs) using solely the full HTTP traces of the sessions as input. D-ForenRIA automatically recovers each browser state, reconstructs the DOMs, and re-creates screenshots of what was displayed to the user. The tool also recovers every action the user took in each state, including the user-input data. Our application domain is security forensics, where months-old sessions must sometimes be reconstructed quickly for immediate inspection. We will demonstrate our tool on a series of RIAs, including a vulnerable banking application created by IBM Security for testing purposes. In that case study, the attacker visits the vulnerable web site and exploits several vulnerabilities (SQL-injections, XSS...) to gain access to private information and to perform unauthorized transactions. D-ForenRIA can reconstruct the session, including screenshots of all pages seen by the attacker, the DOM of each page, the steps taken for the unauthorized login, and the inputs the attacker used for the SQL-injection attack. D-ForenRIA is made efficient by applying advanced reconstruction techniques and by using several browsers concurrently to speed up the reconstruction process. Although we developed D-ForenRIA in the context of security forensics, the tool can also be useful in other contexts such as aiding RIA debugging and automated RIA scanning.
Collaborative filtering plays an essential role in a recommender system, which recommends a list of items to a user by learning behavior patterns from the user rating matrix. However, if an attacker has some auxiliary knowledge about a user's purchase history, he/she can infer more information about this user, which poses a great threat to user privacy. Some methods adopt differential privacy algorithms in collaborative filtering by adding noise to the rating matrix. Although they provide theoretically private results, the influence on recommendation accuracy is not discussed. In this paper, we solve the privacy problem in recommender systems in a different way, by applying the differential privacy method within the recommendation procedure itself. We design two differentially private recommender algorithms with sampling, named Differentially Private Item-Based Recommendation with sampling (DP-IR for short) and Differentially Private User-Based Recommendation with sampling (DP-UR for short). Both algorithms are based on the exponential mechanism with a carefully designed quality function. Theoretical analyses of the privacy of these algorithms are presented. We also investigate the accuracy of the proposed methods and give theoretical results. Experiments are performed on real datasets to verify our methods.
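For reference, a generic implementation of the exponential mechanism the abstract builds on: an item is sampled with probability proportional to exp(ε·q/(2Δ)) for quality score q and sensitivity Δ. The quality scores below are placeholders, not the paper's carefully designed quality function.

```python
# Generic exponential mechanism: sample an item with probability
# proportional to exp(eps * quality / (2 * sensitivity)).

import math, random

def exponential_mechanism(items, quality, eps, sensitivity=1.0):
    weights = [math.exp(eps * quality[i] / (2.0 * sensitivity)) for i in items]
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for item, w in zip(items, weights):
        acc += w
        if r <= acc:
            return item
    return items[-1]

items = ["A", "B", "C", "D"]
quality = {"A": 0.9, "B": 0.7, "C": 0.2, "D": 0.1}
picks = [exponential_mechanism(items, quality, eps=1.0) for _ in range(10)]
print(picks)  # high-quality items dominate, but low-quality ones still appear
```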
As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspect of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that such spatiotemporal information can easily be used as the basis for inferences about a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these with simulations on a small example.
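For reference, the ε-differential privacy guarantee analyzed here, in its standard textbook form (not notation taken from the paper): a randomized mechanism $M$ is $\varepsilon$-differentially private if, for all neighboring inputs $D, D'$ (here, origin-destination demands differing in one driver) and every measurable set of outputs $S$,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S].$$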
There are two broad approaches to differentially private data analysis. The interactive approach aims at developing customized differentially private algorithms for various data mining tasks. The non-interactive approach aims at developing differentially private algorithms that can output a synopsis of the input dataset, which can then be used to support various data mining tasks. In this paper we study the effectiveness of the two approaches on differentially private k-means clustering. We develop techniques to analyze the empirical error behaviors of the existing interactive and non-interactive approaches. Based on the analysis, we propose an improvement of DPLloyd, a differentially private version of Lloyd's algorithm. We also propose a non-interactive approach, EUGkM, which publishes a differentially private synopsis for k-means clustering. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of our improvement to DPLloyd and of the proposed EUGkM algorithm.
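A rough sketch of one DPLloyd-style iteration, i.e., a Lloyd update with Laplace noise added to the per-cluster counts and coordinate sums; the sensitivity bound and per-iteration budget split below are simplified assumptions (points scaled into [-1, 1]^d), not the calibration analyzed in the paper.

```python
# One noisy Lloyd iteration in the DPLloyd style.

import math, random

def laplace(scale):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_lloyd_step(points, centers, eps_iter):
    d = len(points[0])
    sums = [[0.0] * d for _ in centers]
    counts = [0.0] * len(centers)
    for p in points:                       # assign each point to nearest center
        j = min(range(len(centers)),
                key=lambda c: sum((p[i] - centers[c][i]) ** 2 for i in range(d)))
        counts[j] += 1.0
        for i in range(d):
            sums[j][i] += p[i]
    scale = (d + 1.0) / eps_iter           # one count + d sums per point
    new_centers = []
    for j in range(len(centers)):
        noisy_n = max(counts[j] + laplace(scale), 1.0)   # noisy count
        new_centers.append([(sums[j][i] + laplace(scale)) / noisy_n
                            for i in range(d)])           # noisy sums
    return new_centers
```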
In settings where data instances are generated sequentially or in streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications such as training anomaly detectors, the data streams may consist of private information or transactions and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies can trade off privacy, label complexity, and generalization performance.
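A generic differentially private SGD step with mini-batching, in the spirit of the paper's second strategy: the per-example gradient of a logistic loss is norm-clipped and Gaussian noise is added to the averaged update. The loss, clipping bound, and noise scale are illustrative assumptions, not the paper's exact algorithms.

```python
# Noisy mini-batch SGD step for a linear classifier with logistic loss.

import math, random

def clip(grad, bound):
    norm = math.sqrt(sum(g * g for g in grad))
    return [g * bound / norm for g in grad] if norm > bound else grad

def private_sgd_step(w, batch, lr=0.1, clip_bound=1.0, noise_std=1.0):
    d = len(w)
    acc = [0.0] * d
    for x, y in batch:                      # y in {-1, +1}
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        coeff = -y / (1.0 + math.exp(margin))   # d/dw log(1 + exp(-y w.x))
        g = clip([coeff * xi for xi in x], clip_bound)
        for i in range(d):
            acc[i] += g[i]
    for i in range(d):
        noisy = acc[i] + random.gauss(0.0, noise_std * clip_bound)
        w[i] -= lr * noisy / len(batch)     # mini-batching averages the noise
    return w
```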
Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points of interest). Unlike metrics computed only from the number of (unique) visits to a location, i.e., frequency, LE also captures the diversity of users' visits and is thus more accurate than other metrics. Current solutions for computing LE require full access to users' past visits to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset impacts multiple records of the database, namely all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both the local and global sensitivity of LE, and show that satisfying ε-differential privacy requires introducing so much noise that the published results become useless. Hence, we propose a thresholding technique to limit the number of visits per user, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve the original data distribution without compromising location privacy.
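For reference, the standard definition of location entropy (common notation, not necessarily the paper's): with $c_{u,l}$ the number of visits of user $u$ to location $l$ and $p_{u,l}$ the fraction of $l$'s visits made by $u$,

$$LE(l) = -\sum_{u} p_{u,l} \log p_{u,l}, \qquad p_{u,l} = \frac{c_{u,l}}{\sum_{u'} c_{u',l}}.$$

Removing one user changes $c_{u,l}$ at every location that user visited, which is why the sensitivity of LE is large and a visit-thresholding step is needed.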
We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of last year's tutorial, with more information about web-based malware and malware targeting the Android market.
The wide use of cloud computing and of data outsourcing raises important concerns with regard to data security, resulting in the necessity of protection mechanisms such as encryption of sensitive data. The recent major theoretical breakthrough of finding the Holy Grail of encryption, i.e., fully homomorphic encryption, guarantees the privacy of queries and their results on encrypted data. However, only a few studies propose a practical performance evaluation of the use of homomorphic encryption schemes to perform database queries. In this paper, we propose and analyse, in the context of a secure framework for a generic database query interpreter, two different methods in which client requests are dynamically executed on homomorphically encrypted data. Dynamic compilation of the requests makes it possible to take advantage of the different optimizations performed during an off-line step on an intermediate code representation, taking the form of boolean circuits, and, moreover, to specialize the execution using runtime information. For the returned encrypted results, we also assess the complexity and the efficiency of the different protocols proposed in the literature in terms of overall execution time, accuracy, and communication overhead.
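To illustrate the boolean-circuit intermediate representation mentioned above, the toy sketch below compiles an equality predicate (as in a WHERE clause) into XNOR gates followed by an AND reduction; the gates run on plaintext bits here, whereas under FHE each gate would be evaluated homomorphically on encrypted bits.

```python
# Equality test as a boolean circuit: bitwise XNOR, then AND-reduce.

def to_bits(value, width=8):
    return [(value >> i) & 1 for i in range(width)]

def xnor(a, b):
    return 1 - (a ^ b)

def equals_circuit(x_bits, y_bits):
    """Outputs 1 iff all bit pairs match."""
    out = 1
    for a, b in zip(x_bits, y_bits):
        out &= xnor(a, b)
    return out

# Usage: 'SELECT ... WHERE age = 42' compiles the constant 42 into the
# circuit; the server would evaluate it per encrypted row without seeing ages.
print(equals_circuit(to_bits(42), to_bits(42)))  # 1 (match)
print(equals_circuit(to_bits(37), to_bits(42)))  # 0
```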
Using the sparsity of the signal, compressed sensing theory makes it possible to sample and compress data at a rate lower than the Nyquist sampling rate; the signal must, however, have a sparse representation. Based on this theory, this article puts forward a sparsity-adaptive algorithm for the detection and reconstruction of underwater target radiation signals. The received underwater target radiation signal first has its noise energy converted into signal energy by a stochastic resonance system; then, based on the Gerschgorin disk criterion, the number of underwater target radiation signals is estimated in order to determine the optimal sparsity level for compressed sensing; finally, the original signal is detected and reconstructed using the compressed sensing technique. Simulation results show that this method can effectively detect underwater target radiation signals, even under low signal-to-noise ratio (SNR).
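The generic compressed-sensing recovery step can be sketched as follows: a k-sparse signal is recovered from m < n random measurements via orthogonal matching pursuit. This is a standard illustration only; the stochastic resonance stage and the Gerschgorin-based estimation of the number of signals are not modelled, and the Gaussian measurement matrix is an assumption.

```python
# Recover a k-sparse x from y = Phi @ x with orthogonal matching pursuit.

import numpy as np

def omp(Phi, y, k):
    residual, support = y.copy(), []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))  # most correlated atom
        support.append(j)
        sub = Phi[:, support]
        coef, *_ = np.linalg.lstsq(sub, y, rcond=None)
        residual = y - sub @ coef                     # project out support
    x_hat = np.zeros(Phi.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n, m, k = 128, 40, 3          # k plays the role of the estimated signal count
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x = np.zeros(n)
x[[5, 40, 90]] = [1.0, -2.0, 1.5]
x_hat = omp(Phi, Phi @ x, k)
print(np.round(x_hat[[5, 40, 90]], 3))   # close to the original nonzeros
```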
Outsourcing huge amounts of local data to remote cloud servers has become a significant trend for industries. Leveraging the considerable cloud storage space, industries can also submit the outsourced data to cloud computing. How to use the data for computation without loss of privacy and confidentiality is one of the crucial security problems. Searchable encryption has been proposed to protect the confidentiality of outsourced data and the privacy of the corresponding data queries. This technique, however, supports only search functionality and may not be fully applicable to real-world cloud computing scenarios in which secure data search, sharing, and computation are all needed. This work presents a novel encrypted cloud-based data share and search system without loss of user privacy and data confidentiality. The new system not only enables users to make conjunctive keyword queries over encrypted data, but also allows encrypted data to be efficiently shared among different users, multiple times, without the need for the "download, decrypt, then re-encrypt" mode. Of independent interest, our system provides secure keyword update, so that users can freely and securely update a data item's keyword field. It is worth mentioning that none of the above functionalities incurs any expansion of ciphertext size; that is, the size of the ciphertext remains constant while it is searched, shared, and keyword-updated. The system is proven secure, and our efficiency analysis shows its great potential for use in large-scale databases.
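As a much-simplified illustration of conjunctive keyword search over encrypted records, the toy below uses HMAC-based keyword tokens, a basic searchable-symmetric-encryption idea; the paper's actual construction, including its sharing, keyword-update, and constant-ciphertext-size properties, is not captured here.

```python
# Toy conjunctive keyword search: the server stores opaque HMAC tags and
# matches query tokens without learning the keywords themselves.

import hmac, hashlib

def token(key, keyword):
    return hmac.new(key, keyword.encode(), hashlib.sha256).digest()

def index_record(key, record_id, keywords):
    return record_id, {token(key, w) for w in keywords}

def conjunctive_search(index, query_tokens):
    """Return ids of records whose tag set contains every query token."""
    return [rid for rid, tags in index if set(query_tokens) <= tags]

key = b"\x01" * 32                         # client-held secret key
index = [index_record(key, "doc1", ["finance", "2024", "internal"]),
         index_record(key, "doc2", ["finance", "2023"])]
q = [token(key, "finance"), token(key, "2024")]
print(conjunctive_search(index, q))        # ['doc1']
```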
Resource discovery in unstructured peer-to-peer networks causes a search query to be flooded throughout the network via random nodes, leading to security and privacy issues: the owner of the search query has no control over the transmission of the query through the network. Although algorithms have been proposed for policy-compliant query or data routing in a network, these algorithms mainly deal with authentic route computation and do not provide mechanisms to actually verify the network paths taken by the query. In this work, we propose an approach for verifying the network paths taken by a search query during resource discovery and for detecting malicious forwarding of the search query. Our approach aims to be secure and yet very scalable, even in the presence of a huge number of nodes in the network.
Recommendation systems have become popular in our daily life. It is well known that the more personal data users release, the better the quality of recommendation. However, such services raise serious privacy concerns for users. In this paper, focusing on matrix factorization-based recommendation systems, we propose the first privacy-preserving matrix factorization using fully homomorphic encryption. On input of encrypted users' ratings, our protocol performs matrix factorization over the encrypted data and returns encrypted outputs, so that the recommendation system learns nothing about rating values or the resulting user/item profiles. It provides a way to obfuscate the number and list of items a user rated without harming the accuracy of recommendation, and additionally protects the recommender's tuning parameters for business benefit while allowing the recommender to optimize the parameters for quality of service. To overcome the performance degradation caused by the use of fully homomorphic encryption, we introduce a novel data structure to perform computations over encrypted vectors, which are essential operations for matrix factorization, partly through secure two-party computation. With this data structure, the proposed protocol requires dozens of times less computation than previous works. Our experiments on a personal computer with a 3.4 GHz 6-core CPU and 64 GB RAM show that the proposed protocol runs in 1.5 minutes per iteration. It is more efficient than the work of Nikolaenko et al. presented at CCS 2013, which took about 170 minutes on two servers with 1.9 GHz 16-core CPUs and 128 GB RAM.
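For orientation, this is the plaintext gradient-descent matrix-factorization update that such a protocol must evaluate under encryption; the learning rate, regularization, and profile dimension are illustrative, and the FHE and two-party-computation layers are omitted entirely.

```python
# Plaintext SGD update for matrix factorization: for each observed rating
# r_ui, nudge user profile p_u and item profile q_i toward r_ui.

def mf_epoch(ratings, P, Q, lr=0.05, reg=0.02):
    """ratings: list of (u, i, r); P, Q: dicts of profile vectors (lists)."""
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        err = r - pred
        for k in range(len(P[u])):
            pu, qi = P[u][k], Q[i][k]
            P[u][k] += lr * (err * qi - reg * pu)
            Q[i][k] += lr * (err * pu - reg * qi)

P = {"u1": [0.1, 0.2], "u2": [0.05, 0.1]}
Q = {"i1": [0.1, 0.1], "i2": [0.2, 0.05]}
data = [("u1", "i1", 5.0), ("u1", "i2", 1.0), ("u2", "i1", 4.0)]
for _ in range(500):
    mf_epoch(data, P, Q)
# Prediction for (u1, i1) approaches the observed rating 5.0.
print(round(sum(p * q for p, q in zip(P["u1"], Q["i1"])), 2))
```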
Embedded virtualization has emerged as a valuable way to reduce costs, improve software quality, and decrease design time. Additionally, virtualization can strengthen the overall system's security from several perspectives. One is security through separation, where the hypervisor ensures that one domain cannot compromise the execution of other domains. At the same time, advances in the development of IoT applications have opened discussions about the security flaws introduced by IoT devices. In a few years, billions of these devices will be connected to the cloud, exchanging information, which is an opportunity for hackers to exploit their vulnerabilities and endanger applications connected to such devices. At this point, it is natural to consider virtualization as a possible approach to IoT security. In this paper we discuss how embedded virtualization could take place on IoT devices as a sound solution for security.
The emphasis on exhaustive passive capture of images using wearable cameras like Autographer, a practice often known as lifelogging, has brought to the foreground the challenge of preserving privacy, in addition to that of presenting the vast number of images in a meaningful way. In this paper, we present a user study to understand the importance of an array of factors that are likely to influence lifeloggers to share their lifelog images within their online circles. The findings are a step forward in the emerging area at the intersection of HCI and privacy, helping to explore design directions for privacy-mediating techniques in lifelogging applications.