Biblio
Online-activity-generated digital traces provide opportunities for novel services and unique insights as demonstrated in, for example, research on mining software repositories. The inability to link these traces within and among systems, such as Twitter, GitHub, or Reddit, inhibit the advances in this area. Furthermore, no single approach to integrate data from these disparate sources is likely to work. We aim to design Foreseer, an extensible framework, to design and evaluate identity matching techniques for public, large, and low-accuracy operational data. Foreseer consists of three functionally independent components designed to address the issues of discovery and preparation, storage and representation, and analysis and linking of traces from disparate online sources. The framework includes a domain specific language for manipulating traces, generating insights, and building novel services. We have applied it in a pilot study of roughly 10TB of data from Twitter, Reddit, and StackExchange including roughly 6M distinct entities and, using basic matching techniques, found roughly 83,000 matches among these sources. We plan to add additional entity extraction and identification algorithms, data from other sources, and design tools for facilitating dynamic ingestion and tagging of incoming data on a more robust infrastructure using Apache Spark or another distributed processing framework. We will then evaluate the utility and effectiveness of the framework in applications ranging from identifying malicious contributors in software repositories to the evaluation of the utility of privacy preservation schemes.
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level-in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.
In ad-hoc networks, data messages are transmitted from a source wireless node to a destination one along a wireless multihop transmission route consisting of a sequence of intermediate wireless nodes. Each intermediate wireless node forwards data messages to its next-hop wireless node. Here, a wireless signal carrying the data message is broadcasted by using an omni antenna and it is not difficult for a eavesdropper wireless node to overhear the wireless signal to get the data message. Some researches show that it is useful to transmit noise wireless signal which collide to the data message wireless signal in order for interfering the overhearing. However, some special devices such as directional antennas and/or high computation power for complicated signal processing are required. For wireless multihop networks with huge number of wireless nodes, small and cheap wireless nodes are mandatory for construction of the network. This paper proposes the method for interfering the overhearing by the eavesdropper wireless nodes where routing protocol and data message transmission protocol with cooperative noise signal transmissions by 1-hop and 2-hop neighbor wireless nodes of each intermediate wireless node.
With the rising interest of expedient, safe, and high-efficient transportation, vehicular ad hoc networks (VANETs) have turned into a critical technology in smart transportation systems. Because of the high mobility of nodes, VANETs are vulnerable to security attacks. In this paper, we propose a novel framework of software-defined VANETs with trust management. Specifically, we separate the forwarding plane in VANETs from the control plane, which is responsible for the control functionality, such as routing protocols and trust management in VANETs. Using the on-demand distance vector routing (TAODV) protocol as an example, we present a routing protocol named software-defined trust based ad hoc on-demand distance vector routing (SD-TAODV). Simulation results are presented to show the effectiveness of the proposed software-defined VANETs with trust management.
Drones have quickly become ubiquitous for both recreational and serious use. As is frequently the case with new technology in general, their rapid adoption already far exceeds our legal, policy, and social ability to cope with such issues as privacy and interference with well-established commercial and military air space. While the FAA has issued rulings, they will almost certainly be challenged in court as disputes arise, for example, when property owners shoot drones down. It is clear that drones will provide a critical role in smart cities and be connected to, if not directly a part of the IoT (Internet of Things). Drones will provide an essential role in providing network relay connectivity and situational awareness, particularly in disaster assessment and recovery scenarios. As is typical for new network technologies, the deployment of the drone hardware far exceeds our research in protocols – extending our previous understanding of MANETs (mobile ad hoc networks) and DTNs (disruption tolerant networks) – and more importantly, management, control, resilience, security, and privacy concerns. This keynote address will discuss these challenges and consider future research directions.
We propose a distributed and adaptive trust evaluation algorithm (DATEA) to calculate the trust between nodes. First, calculate the communication trust by using the number of data packets between nodes, and predict the trust based on the trend of this value, calculate the comprehensive trust by combining the history trust with the predict value; calculate the energy trust based on the residual energy of nodes; calculate the direct trust by using the communication trust and energy trust. Second, calculate the recommendation trust based on the recommendation reliability and the recommendation familiarity; put forward the adaptively weighting method, and calculate the integrate direct trust by combining the direct trust with recommendation trust. Third, according to the integrate direct trust, considering the factor of trust propagation distance, the indirect trust between nodes is calculated. Simulation experiments show that the proposed algorithm can effectively avoid the attacks of malicious nodes, besides, the calculated direct trust and indirect trust about normal nodes are more conformable to the actual situation.
Mobile Ad Hoc Network (MANET) is a multi-hop temporary and autonomic network comprised of a set of mobile nodes. MANETs have the features of non-center, dynamically changing topology, multi-hop routing, mobile nodes, limited resources and so on, which make it face more threats. Trust evaluation is used to support nodes to cooperate in a secure and trustworthy way through evaluating the trust of participating nodes in MANETs. However, many trust evaluation models proposed for MANETs still have many problems and shortcomings. In this paper, we review the existing researches, then analyze and compare the proposed trust evaluation models by presenting and applying uniform criteria in order to point out a number of open issues and challenges and suggest future research trends.
In this paper, we investigate the impact of information-theoretic secrecy constraint on the capacity and delay of mobile ad hoc networks (MANETs) with mobile legitimate nodes and static eavesdroppers whose location and channel state information (CSI) are both unknown. We assume n legitimate nodes move according to the fast i.i.d. mobility pattern and each desires to communicate with one randomly selected destination node. There are also nv static eavesdroppers located uniformly in the network and we assume the number of eavesdroppers is much larger than that of legitimate nodes, i.e., v textgreater 1. We propose a novel simple secure communication model, i.e., the secure protocol model, and prove its equivalence to the widely accepted secure physical model under a few technical assumptions. Based on the proposed model, a framework of analyzing the secrecy capacity and delay in MANETs is established. Given a delay constraint D, we find that the optimal secrecy throughput capacity is [EQUATION](W((D/n))(2/3), where W is the data rate of each link. We observe that: 1) the capacity-delay tradeoff is independent of the number of eavesdroppers, which indicates that adding more eavesdroppers will not degenerate the performance of the legitimate network as long as v textgreater 1; 2) the capacity-delay tradeoff of our paper outperforms the previous result Θ((1/nψe)) in [11], where ψe = nv–1 = ω(1) is the density of the eavesdroppers. Throughout this paper, for functions f(n) and G(n), we denote f(n) = o(g(n)) if limn→∞ (f(n)/g(n)) = 0; f(n) = ω(g(n)) if g(n) = o(f(n)); f(n) = O(g(n)) if there is a positive constant c such that f(n) ≤ cg(n) for sufficiently large n; f(n) = Ω(g(n))if g(n) = O(f(n)); f(n) = Θ(g(n) if both f(n) = O(g(n)) and f(n) = Omega;(g(n)) hold. Besides, the order notation [EQUATION] omits the polylogarithmic factors for better readability.
A Mobile Ad hoc Network (MANET) is a spontaneous network consisting of wireless nodes which are mobile and self-configuring in nature. Devices in MANET can move freely in any direction independently and change its link frequently to other devices. MANET does not have centralized infrastructure and its characteristics makes this network vulnerable to various kinds of attacks. Data transfer is a major problem due to its nature of unreliable wireless medium. Commonly used technique for secure transmission in wireless network is cryptography. Use of cryptography key is often involved in most of cryptographic techniques. Key management is main component in security issues of MANET and various schemes have been proposed for it. In this paper, a study on various kinds of key management techniques in MANET is presented.
Standardization and harmonization efforts have reached a consensus towards using a special-purpose Vehicular Public-Key Infrastructure (VPKI) in upcoming Vehicular Communication (VC) systems. However, there are still several technical challenges with no conclusive answers; one such an important yet open challenge is the acquisition of short-term credentials, pseudonym: how should each vehicle interact with the VPKI, e.g., how frequently and for how long? Should each vehicle itself determine the pseudonym lifetime? Answering these questions is far from trivial. Each choice can affect both the user privacy and the system performance and possibly, as a result, its security. In this paper, we make a novel systematic effort to address this multifaceted question. We craft three generally applicable policies and experimentally evaluate the VPKI system performance, leveraging two large-scale mobility datasets. We consider the most promising, in terms of efficiency, pseudonym acquisition policies; we find that within this class of policies, the most promising policy in terms of privacy protection can be supported with moderate overhead. Moreover, in all cases, this work is the first to provide tangible evidence that the state-of-the-art VPKI can serve sizable areas or domain with modest computing resources.
Denial of service (DoS) attack is a great threaten to privacy-preserving authentication protocols for low-cost devices such as RFID. During such attack, the legal internal states can be consumed by the DoS attack. Then the attacker can observe the behavior of the attacked tag in authentication to break privacy. Due to the inadequate energy and computing power, the low cost devices can hardly defend against the DoS attacks. In this paper, we propose a new insight of the DoS attack on tags and leverage the attacking behavior as a new source of power harvesting. In this way, a low-cost device such as a tag grows more and more powerful under DoS attack. Finally, it can defend against the DoS attack. We further propose a protocol that enables DoS-defending constant-time privacy-preserving authentication.
With the popularity of cloud computing, database outsourcing has been adopted by many companies. However, database owners may not 100% trust their database service providers. As a result, database privacy becomes a key issue for protecting data from the database service providers. Many researches have been conducted to address this issue, but few of them considered the simultaneous transparent support of existing DBMSs (Database Management Systems), applications and RADTs (Rapid Application Development Tools). A transparent framework based on accessing bridge and mobile app for protecting database privacy with PKI (Public Key Infrastructure) is, therefore, proposed to fill the blank. The framework uses PKI as its security base and encrypts sensitive data with data owners' public keys to protect data privacy. Mobile app is used to control private key and decrypt data, so that accessing sensitive data is completely controlled by data owners in a secure and independent channel. Accessing bridge utilizes database accessing middleware standard to transparently support existing DBMSs, applications and RADTs. This paper presents the framework, analyzes its transparency and security, and evaluates its performance via experiments.
Many applications of mobile computing require the computation of dot-product of two vectors. For examples, the dot-product of an individual's genome data and the gene biomarkers of a health center can help detect diseases in m-Health, and that of the interests of two persons can facilitate friend discovery in mobile social networks. Nevertheless, exposing the inputs of dot-product computation discloses sensitive information about the two participants, leading to severe privacy violations. In this paper, we tackle the problem of privacy-preserving dot-product computation targeting mobile computing applications in which secure channels are hardly established, and the computational efficiency is highly desirable. We first propose two basic schemes and then present the corresponding advanced versions to improve efficiency and enhance privacy-protection strength. Furthermore, we theoretically prove that our proposed schemes can simultaneously achieve privacy-preservation, non-repudiation, and accountability. Our numerical results verify the performance of the proposed schemes in terms of communication and computational overheads.
The rise of sensor-equipped smart phones has enabled a variety of classification based applications that provide personalized services based on user data extracted from sensor readings. However, malicious applications aggressively collect sensitive information from inherent user data without permissions. Furthermore, they can mine sensitive information from user data just in the classification process. These privacy threats raise serious privacy concerns. In this paper, we introduce two new privacy concerns which are inherent-data privacy and latent-data privacy. We propose a framework that enables a data-obfuscation mechanism to be developed easily. It preserves latent-data privacy while guaranteeing satisfactory service quality. The proposed framework preserves privacy against powerful adversaries who have knowledge of users' access pattern and the data-obfuscation mechanism. We validate our framework towards a real classification-orientated dataset. The experiment results confirm that our framework is superior to the basic obfuscation mechanism.
The recent proliferation of human-carried mobile devices has given rise to mobile crowd sensing (MCS) systems that outsource the collection of sensory data to the public crowd equipped with various mobile devices. A fundamental issue in such systems is to effectively incentivize worker participation. However, instead of being an isolated module, the incentive mechanism usually interacts with other components which may affect its performance, such as data aggregation component that aggregates workers' data and data perturbation component that protects workers' privacy. Therefore, different from past literature, we capture such interactive effect, and propose INCEPTION, a novel MCS system framework that integrates an incentive, a data aggregation, and a data perturbation mechanism. Specifically, its incentive mechanism selects workers who are more likely to provide reliable data, and compensates their costs for both sensing and privacy leakage. Its data aggregation mechanism also incorporates workers' reliability to generate highly accurate aggregated results, and its data perturbation mechanism ensures satisfactory protection for workers' privacy and desirable accuracy for the final perturbed results. We validate the desirable properties of INCEPTION through theoretical analysis, as well as extensive simulations.
Emergency evacuations during disasters minimize loss of lives and injuries. It is not surprising that emergency evacuation preparedness is mandatory for organizations in many jurisdictions. In the case of corporations, this requirement translates to considerable expenses, consisting of construction costs, equipment, recruitment, retention and training. In addition, required regular evacuation drills cause recurring expenses and loss of productivity. Any automation to assist in these drills and in actual evacuations can mean savings of costs, time and lives. Evacuation assistance systems rely on attendance systems that often fall short in accuracy, particularly in environments with lot of "non-swipers" (customers, visitors, etc.,). A critical question to answer in the case of an emergency is "How many people are still in the building?". This number is calculated by comparing the number of people gathered at assembly point to the last known number of people inside the building. An IoT based system can enhance the answer to that question by providing the number of people in the building, provide their last known locations in an automated fashion and even automate the reconciliation process. Our proposed system detects the people in the building automatically using multiple channels such as WiFi and motion detection. Such a system needs the ability to link specific identifiers to persons reliably. In this paper we present our statistics and heuristics based solutions for linking detected identifiers as belonging to an actual persons in a privacy preserving manner using IoT technologies.
In this paper, we present a new kind of near-optimal double-layered syndrome-trellis codes (STCs) for spatial domain steganography. The STCs can hide longer message or improve the security with the same-length message comparing to the previous double-layered STCs. In our scheme, according to the theoretical deduction we can more precisely divide the secret payload into two parts which will be embedded in the first layer and the second layer of the cover respectively with binary STCs. When embed the message, we encourage to realize the double-layered embedding by ±1 modifications. But in order to further decrease the modifications and improve the time efficient, we allow few pixels to be modified by ±2. Experiment results demonstrate that while applying this double-layered STCs to the adaptive steganographic algorithms, the embedding modifications become more concentrative and the number decreases, consequently the security of steganography is improved.
Cybernetic closed loop regulators are used to model socio-technical systems in adversarial contexts. Cybernetic principles regarding these idealized control loops are applied to show how the incompleteness of system models enables system exploitation. We consider abstractions as a case study of model incompleteness, and we characterize the ways that attackers and defenders interact in such a formalism. We end by arguing that the science of security is most like a military science, whose foundations are analytical and generative rather than normative.
Behavior-based tracking is an unobtrusive technique that allows observers to monitor user activities on the Internet over long periods of time – in spite of changing IP addresses. Previous work has employed supervised classifiers in order to link the sessions of individual users. However, classifiers need labeled training sessions, which are difficult to obtain for observers. In this paper we show how this limitation can be overcome with an unsupervised learning technique. We present a modified k-means algorithm and evaluate it on a realistic dataset that contains the Domain Name System (DNS) queries of 3,862 users. For this purpose, we simulate an observer that tries to track all users, and an Internet Service Provider that assigns a different IP address to every user on every day. The highest tracking accuracy is achieved within the subgroup of highly active users. Almost all sessions of 73% of the users in this subgroup can be linked over a period of 56 days. 19% of the highly active users can be traced completely, i.e., all their sessions are assigned to a single cluster. This fraction increases to 40% for shorter periods of seven days. As service providers may engage in behavior-based tracking to complement their existing profiling efforts, it constitutes a severe privacy threat for users of online services. Users can defend against behavior-based tracking by changing their IP address frequently, but this is cumbersome at the moment.
In this paper we describe and share with the research community, a significant smartphone dataset obtained from an ongoing long-term data collection experiment. The dataset currently contains 10 billion data records from 30 users collected over a period of 1.6 years and an additional 20 users for 6 months (totaling 50 active users currently participating in the experiment). The experiment involves two smartphone agents: SherLock and Moriarty. SherLock collects a wide variety of software and sensor data at a high sample rate. Moriarty perpetrates various attacks on the user and logs its activities, thus providing labels for the SherLock dataset. The primary purpose of the dataset is to help security professionals and academic researchers in developing innovative methods of implicitly detecting malicious behavior in smartphones. Specifically, from data obtainable without superuser (root) privileges. To demonstrate possible uses of the dataset, we perform a basic malware analysis and evaluate a method of continuous user authentication.
Malware evolves perpetually and relies on increasingly so- phisticated attacks to supersede defense strategies. Data-driven approaches to malware detection run the risk of becoming rapidly antiquated. Keeping pace with malware requires models that are periodically enriched with fresh knowledge, commonly known as retraining. In this work, we propose the use of Venn-Abers predictors for assessing the quality of binary classification tasks as a first step towards identifying antiquated models. One of the key benefits behind the use of Venn-Abers predictors is that they are automatically well calibrated and offer probabilistic guidance on the identification of nonstationary populations of malware. Our framework is agnostic to the underlying classification algorithm and can then be used for building better retraining strategies in the presence of concept drift. Results obtained over a timeline-based evaluation with about 90K samples show that our framework can identify when models tend to become obsolete.
Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.
Malicious Android applications pose serious threats to mobile security. They threaten the data confidentiality and system integrity on Android devices. Monitoring runtime activities serves as an important technique for analyzing dynamic app behaviors. We design a triggering relation model for dynamically analyzing network traffic on Android devices. Our model enables one to infer the dependency of outbound network requests from the device. We describe a new machine learning approach for discovering the dependency of network requests. These request-level dependence relations are used to detect stealthy malware activities. Malicious requests are identified due to the lack of dependency with legitimate triggers. Our prototype is evaluated on 14GB network traffic data and system logs collected from an Android tablet. Experimental results show that our solution achieves a high accuracy (99.1%) in detecting malicious requests sent from new malicious apps.
We propose a formalism to model database-driven systems, called database manipulating systems (DMS). The actions of a (DMS) modify the current instance of a relational database by adding new elements into the database, deleting tuples from the relations and adding tuples to the relations. The elements which are modified by an action are chosen by (full) first-order queries. (DMS) is a highly expressive model and can be thought of as a succinct representation of an infinite state relational transition system, in line with similar models proposed in the literature. We propose monadic second order logic (MSO-FO) to reason about sequences of database instances appearing along a run. Unsurprisingly, the linear-time model checking problem of (DMS) against (MSO-FO) is undecidable. Towards decidability, we propose under-approximate model checking of (DMS), where the under-approximation parameter is the "bound on recency". In a k-recency-bounded run, only the most recent k elements in the current active domain may be modified by an action. More runs can be verified by increasing the bound on recency. Our main result shows that recency-bounded model checking of (DMS) against (MSO-FO) is decidable, by a reduction to the satisfiability problem of MSO over nested words.
Software effort estimation (SEE) is a crucial step in software development. Effort data missing usually occurs in real-world data collection. Focusing on the missing data problem, existing SEE methods employ the deletion, ignoring, or imputation strategy to address the problem, where the imputation strategy was found to be more helpful for improving the estimation performance. Current imputation methods in SEE use classical imputation techniques for missing data imputation, yet these imputation techniques have their respective disadvantages and might not be appropriate for effort data. In this paper, we aim to provide an effective solution for the effort data missing problem. Incompletion includes the drive factor missing case and effort label missing case. We introduce the low-rank recovery technique for addressing the drive factor missing case. And we employ the semi-supervised regression technique to perform imputation in the case of effort label missing. We then propose a novel effort data imputation approach, named low-rank recovery and semi-supervised regression imputation (LRSRI). Experiments on 7 widely used software effort datasets indicate that: (1) the proposed approach can obtain better effort data imputation effects than other methods; (2) the imputed data using our approach can apply to multiple estimators well.