Bibliography
Modern OS kernels, including Windows, Linux, and macOS, have all adopted kernel Address Space Layout Randomization (ASLR), which shifts the base address of kernel code and data to a different location on each run. Consequently, when performing introspection or forensic analysis of kernel memory, we cannot use pre-determined addresses to interpret kernel events; instead, we must derandomize the address space layout and use the new addresses. However, few efforts have been made to derandomize the kernel address space, and many questions remain open, such as which approach is more efficient and robust. Therefore, we present the first systematic study of how to derandomize a kernel when given a memory snapshot of a running kernel instance. Unlike the derandomization approaches used in traditional memory exploits, where only remote access is available, introspection and forensics applications can use all the information available in kernel memory to generate signatures and derandomize ASLR. In other words, there exists a large space of solutions to this problem. As such, in this paper we examine a number of typical approaches to generating strong signatures from both kernel code and data, based on insight into how kernel code and data are updated, and compare them from the perspectives of efficiency (in terms of simplicity, speed, etc.) and robustness (e.g., whether the approach is hard to evade or forge). In particular, we have designed four approaches: brute-force code scanning, patched code signature generation, unpatched code signature generation, and a read-only-pointer-based approach, according to the intrinsic behavior of kernel code and data with respect to kernel ASLR. We have obtained encouraging results for each of these approaches, and the corresponding experimental results are reported in this paper.
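As a concrete illustration of the simplest of the four approaches, brute-force code scanning, the sketch below (ours, not the paper's implementation) scans a raw kernel memory snapshot for a known byte signature taken from unpatched kernel code and derives a page-aligned candidate base; the signature bytes, image offset, and page size are hypothetical placeholders.

```python
# Hypothetical signature and offsets; in practice these would come from the
# distribution's unpatched kernel image for the exact version in the snapshot.
SIGNATURE = bytes.fromhex("554889e54157415641554154")  # placeholder function-prologue bytes
KNOWN_OFFSET = 0x1A2B30   # placeholder offset of those bytes inside the kernel image
PAGE = 0x1000             # candidate load addresses are page-aligned

def find_kernel_base(snapshot):
    """Brute-force scan: return the page-aligned offset where the kernel image starts, or None."""
    pos = snapshot.find(SIGNATURE)
    while pos != -1:
        base = pos - KNOWN_OFFSET
        if base >= 0 and base % PAGE == 0:
            return base   # the KASLR slide is this base minus the kernel's default base
        pos = snapshot.find(SIGNATURE, pos + 1)
    return None

# Usage sketch:
# with open("memory.dump", "rb") as f:
#     base = find_kernel_base(f.read())
```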
Social networking is fundamentally shifting the way we communicate, share ideas, and form opinions. People of every age group use social media and e-commerce sites for their needs. Nowadays, many illegal activities are carried out using social networks and instant messaging, and existing systems are not capable of detecting all suspicious words. In this paper, we provide a brief description of the problem, review the frameworks developed so far, and propose a better system that can identify criminal activity through social networking more efficiently. The system uses Ontology-Based Information Extraction (OBIE) to identify the domain of a word and association rule mining to generate rules. A heuristic method checks the user database for malicious users according to predefined elements, and a Naïve Bayes classifier is used to identify the context behind a message or post. The experimental results can be used by a cybercrime department for further action concerning the victim.
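To make the Naïve Bayes step concrete, the following minimal sketch (our illustration, not the paper's system) trains a Naïve Bayes text classifier to label a post as suspicious or benign; the toy messages, the labels, and the use of scikit-learn are assumptions.

```python
# Minimal Naive Bayes message classification sketch; training data is made up.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = ["meet me at the usual spot with the package",
            "happy birthday, see you at dinner tonight"]
labels = ["suspicious", "benign"]   # hypothetical labels

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["bring the package tonight"]))  # classify a new post
```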
We present D-ForenRIA, a distributed forensic tool that automatically reconstructs user sessions in Rich Internet Applications (RIAs), using solely the full HTTP traces of the sessions as input. D-ForenRIA automatically recovers each browser state, reconstructs the DOMs, and re-creates screenshots of what was displayed to the user. The tool also recovers every action taken by the user in each state, including the user-input data. Our application domain is security forensics, where sometimes months-old sessions must be quickly reconstructed for immediate inspection. We will demonstrate our tool on a series of RIAs, including a vulnerable banking application created by IBM Security for testing purposes. In that case study, the attacker visits the vulnerable web site and exploits several vulnerabilities (SQL injections, XSS, ...) to gain access to private information and to perform unauthorized transactions. D-ForenRIA can reconstruct the session, including screenshots of all pages seen by the attacker, the DOM of each page, the steps taken for the unauthorized login, and the inputs the attacker exploited for the SQL-injection attack. D-ForenRIA is made efficient by applying advanced reconstruction techniques and by using several browsers concurrently to speed up the reconstruction process. Although we developed D-ForenRIA in the context of security forensics, the tool can also be useful in other contexts, such as aiding RIA debugging and automated RIA scanning.
The combination of (1) hard-to-eradicate low-level vulnerabilities, (2) a large trusted computing base written in a memory-unsafe language, and (3) a desperate need to provide strong software security guarantees led to the development of protected-module architectures. Such architectures provide strong isolation of protected modules: the security of code and data depends only on a module's own implementation. In this paper we discuss how such protected modules should be written. From an academic perspective it is clear that the future lies with memory-safe languages. Unfortunately, from a business and management perspective, that is a risky path and will remain so in the near future. The use of well-known but memory-unsafe languages such as C and C++ seems inevitable. We argue that the academic world should take another look at the automatic hardening of software written in such languages to mitigate low-level security vulnerabilities. This is a well-studied topic for full applications, but protected-module architectures introduce a new, and much more challenging, environment. Porting existing security measures to a protected-module setting without a thorough security analysis may even harm the security of the modules they are meant to protect.
Collaborative filtering plays an essential role in recommender systems, which recommend a list of items to a user by learning behavior patterns from the user rating matrix. However, if an attacker has some auxiliary knowledge about a user's purchase history, he or she can infer more information about this user, which poses a great threat to user privacy. Some methods adopt differential privacy in collaborative filtering by adding noise to the rating matrix. Although they provide theoretically private results, the influence on recommendation accuracy is not discussed. In this paper, we solve the privacy problem in recommender systems in a different way, by applying differential privacy within the recommendation procedure itself. We design two differentially private recommender algorithms with sampling, named Differentially Private Item-Based Recommendation with sampling (DP-IR for short) and Differentially Private User-Based Recommendation with sampling (DP-UR for short). Both algorithms are based on the exponential mechanism with a carefully designed quality function. Theoretical analyses of the privacy of these algorithms are presented. We also investigate the accuracy of the proposed method and give theoretical results. Experiments are performed on real datasets to verify our methods.
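Both DP-IR and DP-UR are built on the exponential mechanism. The sketch below (ours) shows a generic exponential-mechanism sampling step; the quality function passed in is a placeholder, not the paper's carefully designed one.

```python
# Generic exponential mechanism: pick one item with probability
# proportional to exp(epsilon * quality(item) / (2 * sensitivity)).
import math, random

def exponential_mechanism(items, quality, epsilon, sensitivity):
    weights = [math.exp(epsilon * quality(i) / (2.0 * sensitivity)) for i in items]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for item, w in zip(items, weights):
        acc += w
        if r <= acc:
            return item
    return items[-1]

# Usage sketch (similarity is a hypothetical score table):
# pick = exponential_mechanism(candidates, lambda i: similarity[i], epsilon=1.0, sensitivity=1.0)
```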
Complex networks usually exhibit community structure, with groups of nodes sharing many links with other nodes in the same group and relatively few with the rest of the network. This feature captures valuable information about the organization and even the evolution of the network. Over the last decade, a great number of community detection algorithms have been proposed to deal with increasingly complex networks. However, the problem of doing this in a private manner is rarely considered. In this paper, we solve this problem under differential privacy, a prominent privacy concept for releasing private data. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation and algorithm perturbation. We choose the Louvain method as the back-end community detection algorithm for the input perturbation schemes and propose LouvainDP, which runs the Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design ModDivisive, which uses the exponential mechanism with modularity as the score. We have thoroughly evaluated our techniques on real graphs of different sizes and verified that ModDivisive steadily gives the best modularity and average F1-score on large graphs, while LouvainDP outperforms the remaining input perturbation competitors in certain settings.
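The following sketch (ours, not the authors' code) illustrates the input-perturbation idea behind LouvainDP: randomly group nodes into supernodes, perturb the super-edge weights with two-sided geometric noise, and run Louvain on the noisy super-graph. The grouping, the noise calibration, and the use of NetworkX (>= 2.8 for louvain_communities) are simplifying assumptions.

```python
import math, random
import numpy as np
import networkx as nx
from networkx.algorithms.community import louvain_communities

def louvain_dp(G, group_size, epsilon):
    # 1. Randomly partition nodes into supernodes of roughly `group_size` nodes.
    nodes = list(G.nodes())
    random.shuffle(nodes)
    super_of = {v: i // group_size for i, v in enumerate(nodes)}

    # 2. Build the super-graph with summed inter-supernode edge counts.
    S = nx.Graph()
    S.add_nodes_from(set(super_of.values()))
    for u, v in G.edges():
        su, sv = super_of[u], super_of[v]
        if su == sv:
            continue  # intra-supernode edges ignored in this simplified sketch
        w = S[su][sv]["weight"] if S.has_edge(su, sv) else 0
        S.add_edge(su, sv, weight=w + 1)

    # 3. Add two-sided geometric (discrete Laplace) noise to each super-edge weight.
    p = 1.0 - math.exp(-epsilon)
    for su, sv, data in list(S.edges(data=True)):
        noisy = data["weight"] + int(np.random.geometric(p) - np.random.geometric(p))
        if noisy <= 0:
            S.remove_edge(su, sv)
        else:
            S[su][sv]["weight"] = noisy

    # 4. Run (non-private) Louvain on the noisy super-graph.
    return louvain_communities(S, weight="weight")
```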
Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points of interest). Unlike metrics computed only from the number of (unique) visits to a location, such as frequency, LE also captures the diversity of the users' visits and is thus more accurate than other metrics. Current solutions for computing LE require full access to users' past visits to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset will impact multiple records of the database, i.e., all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both the local and global sensitivity of LE, and show that to satisfy ε-differential privacy a large amount of noise must be introduced, rendering the published results useless. Hence, we propose a thresholding technique to limit the number of users' visits, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve the original data distribution without compromising location privacy.
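For reference, a commonly used formulation of location entropy, written in our notation (the paper's exact definition may differ in details), is:

```latex
% c_{u,l} is the number of visits user u makes to location l.
\[
  p_{u,l} = \frac{c_{u,l}}{\sum_{u'} c_{u',l}}, \qquad
  \mathrm{LE}(l) = -\sum_{u} p_{u,l}\,\log p_{u,l}.
\]
```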
Differential privacy is a precise mathematical constraint meant to ensure the privacy of individual pieces of information in a database, even while queries about the aggregate are being answered. Even so, one must come to terms with what differential privacy does and does not guarantee. For example, the definition prevents a strong adversary who knows all but one entry in the database from further inferring about the last one. This strong-adversary assumption can be overlooked, resulting in misinterpretation of the privacy guarantee of differential privacy. Herein we give an equivalent definition of privacy using mutual information that makes plain some of the subtleties of differential privacy. Mutual-information differential privacy is in fact sandwiched between ε-differential privacy and (ε,δ)-differential privacy in terms of its strength. In contrast to previous works using unconditional mutual information, differential privacy is fundamentally related to conditional mutual information, accompanied by a maximization over the database distribution. The conceptual advantage of using mutual information, aside from yielding a simpler and more intuitive definition of differential privacy, is that its properties are well understood. Several properties of differential privacy are easily verified for the mutual-information alternative, such as composition theorems.
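In our notation (which may differ from the paper's), the two notions being related are classical ε-differential privacy and a conditional mutual-information bound maximized over database distributions, with database X = (X_1, ..., X_n), X_{-i} denoting X with the i-th entry removed, and Y the mechanism's output:

```latex
\[
  \Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\, \Pr[\mathcal{M}(D') \in S]
  \quad \text{for all neighboring } D, D' \text{ and all measurable } S,
\]
\[
  \sup_{P_{X}}\; \max_{i}\; I\!\left(X_i ; Y \mid X_{-i}\right) \;\le\; \varepsilon .
\]
```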
Hardware support for isolated execution (such as Intel SGX) enables development of applications that keep their code and data confidential even while running in a hostile or compromised host. However, automatically verifying that such applications satisfy confidentiality remains challenging. We present a methodology for designing such applications in a way that enables certifying their confidentiality. Our methodology consists of forcing the application to communicate with the external world through a narrow interface, compiling it with runtime checks that aid verification, and linking it with a small runtime that implements the narrow interface. The runtime includes services such as secure communication channels and memory management. We formalize this restriction on the application as Information Release Confinement (IRC), and we show that it allows us to decompose the task of proving confidentiality into (a) one-time, human-assisted functional verification of the runtime to ensure that it does not leak secrets, (b) automatic verification of the application's machine code to ensure that it satisfies IRC and does not directly read or corrupt the runtime's internal state. We present /CONFIDENTIAL: a verifier for IRC that is modular, automatic, and keeps our compiler out of the trusted computing base. Our evaluation suggests that the methodology scales to real-world applications.
Information retrieval methods, especially for multimedia data, have evolved towards the integration of multiple sources of evidence when analyzing the relevance of items for a given user search task. In this context, to attenuate the semantic gap between low-level features extracted from the content of digital objects and high-level semantic concepts (objects, categories, etc.), and to make systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the user effort required to provide relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. The work comprehensively covers the interactive information retrieval literature and also discusses recent advances, major research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement workflows, which integrate multiple sources of information from images, such as visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, for maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal online learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacts the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal relevance-based filtering and reranking is effective in improving result relevance and also boosts diversity promotion methods. Beyond that, through a thorough experimental analysis we have investigated several research questions related to the possibility of improving result diversity while keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search-session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances overall session effectiveness, considering not only relevance and diversity but also how early the user is exposed to the same amount of relevant items and diversity.
The optimal design of a fault-tolerant quantum computer involves finding an appropriate balance between the burden of large-scale integration of noisy components and the load of improving the reliability of hardware technology. This balance can be evaluated by quantitatively modeling the execution of quantum logic operations on realistic quantum hardware with limited computational resources. In this work, we report a complete performance simulation software tool capable of (1) searching the hardware design space by varying resource architecture and technology parameters, (2) synthesizing and scheduling a fault-tolerant quantum algorithm within the hardware constraints, (3) quantifying performance metrics such as the execution time and the failure probability of the algorithm, and (4) analyzing the breakdown of these metrics to highlight the performance bottlenecks and visualizing resource utilization to evaluate the adequacy of the chosen design. Using this tool, we investigate a vast design space for implementing key building blocks of Shor's algorithm to factor a 1,024-bit number with a baseline budget of 1.5 million qubits. We show that a trapped-ion quantum computer designed with twice as many qubits and one-tenth of the baseline infidelity of the communication channel can factor a 2,048-bit integer in less than 5 months.
Smart transportation applications are by nature examples of Vehicular Ad-hoc Network (VANET) applications, where mobile vehicles, roadside units, and transportation infrastructure interplay with one another to provide value-added services. While abundant research has focused on the communication aspect of such mobile ad-hoc networks, little work targets the development of VANET applications. Among popular VANET applications, a dominant direction is to leverage Cloud infrastructure to execute and deliver applications and services. Recent studies have shown that Cloud Computing is not sufficient for many VANET applications due to the mobility of vehicles and the latency-sensitive requirements they impose. To this end, Fog Computing has been proposed to leverage computation infrastructure closer to the network edge to complement Cloud Computing in providing latency-sensitive applications and services. However, application development in Fog environments is much more challenging than in the Cloud due to the distributed nature of Fog systems. In this paper, we investigate how smart transportation applications are developed following the Fog Computing approach, their challenges, and possible mitigations drawn from the state of the art.
Moving target defense (MTD) is an emerging paradigm in which system defenses dynamically mutate in order to decrease the overall system attack surface. Though the concept is promising, implementations have not been widely adopted. The field has been actively researched for over ten years, yet has produced only a small number of widely adopted defenses, most notably address space layout randomization (ASLR), despite the fact that a variety of moving target implementations and proofs-of-concept currently exist. We suspect that this is because the moving target controls break critical system dependencies from the perspective of users and administrators, not just attackers. As a result, the impact of the controls on overall system security is not sufficient to overcome the inconvenience imposed on legitimate system users. In this paper, we analyze a successful MTD approach. We study the control's dependency graphs, showing how we use graph-theoretic and network properties to predict the effectiveness of the selected control.
In the absence of formal specifications or test oracles, automated testing is made possible by the fact that a program must satisfy certain requirements set down by the programming language. This work describes Randoop, an automatic unit test generator that checks for invariants specified by the Java API. Randoop is able to detect violations of invariants specified by the Java API and create error tests that reveal related bugs. Randoop is also able to produce regression tests, meant to be added to regression test suites, that capture expected behavior. We discuss additional extensions that we have made to Randoop that expand its capability to detect violations of specified invariants. We also examine an optimization and a heuristic for making the invariant-checking process more efficient.
Developers often wonder how to implement a certain functionality (e.g., how to parse XML files) using APIs. Obtaining an API usage sequence based on an API-related natural language query is very helpful in this regard. Given a query, existing approaches utilize information retrieval models to search for matching API sequences. These approaches treat queries and APIs as bags of words and lack a deep understanding of the semantics of the query. We propose DeepAPI, a deep-learning-based approach to generating API usage sequences for a given natural language query. Instead of a bag-of-words assumption, it learns the sequence of words in a query and the sequence of associated APIs. DeepAPI adapts a neural language model named RNN Encoder-Decoder. It encodes a word sequence (the user query) into a fixed-length context vector and generates an API sequence based on that context vector. We also augment the RNN Encoder-Decoder by considering the importance of individual APIs. We empirically evaluate our approach with more than 7 million annotated code snippets collected from GitHub. The results show that our approach generates largely accurate API sequences and outperforms related approaches.
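To make the architecture concrete, here is a minimal GRU-based encoder-decoder sketch (ours, not DeepAPI's actual model, vocabulary, or training setup): it encodes a tokenized query into a fixed-length context vector and decodes a sequence of API tokens conditioned on it. Vocabulary sizes and dimensions are placeholders.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, query_vocab, api_vocab, dim=256):
        super().__init__()
        self.embed_q = nn.Embedding(query_vocab, dim)
        self.embed_a = nn.Embedding(api_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, api_vocab)

    def forward(self, query_ids, api_ids):
        # Encode the query into a fixed-length context vector (final hidden state).
        _, context = self.encoder(self.embed_q(query_ids))
        # Decode the API sequence conditioned on that context.
        dec_out, _ = self.decoder(self.embed_a(api_ids), context)
        return self.out(dec_out)   # per-step logits over API tokens

model = Seq2Seq(query_vocab=10000, api_vocab=5000)
logits = model(torch.randint(0, 10000, (2, 8)), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # (batch, api_sequence_length, api_vocab)
```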
Security plays a crucial role in network communication systems and on the Internet. The Kerberos authentication protocol, designed and developed by the Massachusetts Institute of Technology (MIT), provides authentication by encrypting information and allows clients to access servers in a secure manner. This paper describes the design and implementation of Kerberos using the Data Encryption Standard (DES). DES is a private-key cryptosystem that provides security in the communication system. The Java Development Kit is used as the front end and MS Access as the back end for the implementation.
We consider the following natural generalization of binary search: in a given undirected, positively weighted graph, one vertex is a target. The algorithm's task is to identify the target by adaptively querying vertices. In response to querying a node q, the algorithm learns either that q is the target, or is given an edge out of q that lies on a shortest path from q to the target. We study this problem in a general noisy model in which each query independently receives a correct answer with probability p > 1/2 (a known constant), and an (adversarial) incorrect one with probability 1 − p. Our main positive result is that when p = 1 (i.e., all answers are correct), log₂ n queries are always sufficient. For general p, we give an (almost information-theoretically optimal) algorithm that uses, in expectation, no more than (1 − δ) log n / (1 − H(p)) + o(log n) + O(log²(1/δ)) queries, and identifies the target correctly with probability at least 1 − δ. Here, H(p) = −(p log p + (1 − p) log(1 − p)) denotes the binary entropy. The first bound is achieved by an algorithm that iteratively queries a 1-median of the nodes not yet ruled out; the second by careful repeated invocations of a multiplicative weights algorithm. Even for p = 1, we show several hardness results for the problem of determining whether a target can be found using K queries. Our upper bound of log₂ n implies a quasipolynomial-time algorithm for undirected connected graphs; we show that this is best possible under the Strong Exponential Time Hypothesis (SETH). Furthermore, for directed graphs, or for undirected graphs with non-uniform node querying costs, the problem is PSPACE-complete. For a semi-adaptive version, in which one may query r nodes in each of k rounds, we show membership in Σ_{2k−1} in the polynomial hierarchy, and hardness for Σ_{2k−5}.
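Set in LaTeX for readability, the expected-query bound stated in the abstract reads:

```latex
\[
  \mathbb{E}[\#\text{queries}] \;\le\; \frac{(1-\delta)\,\log n}{1 - H(p)} \;+\; o(\log n) \;+\; O\!\left(\log^2 \tfrac{1}{\delta}\right),
  \qquad H(p) = -\bigl(p \log p + (1-p)\log(1-p)\bigr).
\]
```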
There are many techniques for improving software quality. One is the use of automatic static analysis tools. We have observed, however, that despite the low-cost help they offer, these tools are underused and often discourage beginners. There is evidence that personality traits influence the perceived usability of software. Thus, to better support beginners, we need to understand how the workflow of people with different prevalent personality traits varies when using these tools. For this purpose, we observed users' solution strategies and correlated them with their prevalent personality traits in an exploratory study with student participants within a controlled experiment. We gathered data through screen capture and chat protocols, as well as a Big Five personality traits test. We found strong correlations between particular personality traits and different strategies for removing the findings of static code analysis, as well as between personality and tool utilization. Based on that, we offer take-away improvement suggestions. Our results imply that developers should be aware of these solution strategies and use this information to build tools that are more appealing to people with different prevalent personality traits.
Crowdsourcing is a unique and practical approach to obtaining personalized data and content. Its impact is especially significant in providing commentary, reviews, and metadata on a variety of location-based services. In this study, we examine the reliability of the Waze mapping service and its vulnerability to a variety of location-based attacks. Our goals are to understand the severity of the problem, shed light on the general problem of location and device authentication, and explore the efficacy of potential defenses. Our preliminary results already show that a single attacker with limited resources can cause havoc on Waze, producing ``virtual'' congestion and accidents, automatically re-routing user traffic, and compromising user privacy by tracking users' precise movements via software while staying undetected. To defend against these attacks, we propose a proximity-based Sybil detection method to filter out malicious devices.
A distributed detection method is proposed to detect single-stage multi-point (SSMP) attacks on a Cyber-Physical System (CPS). Such attacks aim at compromising two or more sensors or actuators at any one stage of a CPS and could completely compromise the local controller and prevent it from detecting the attack. However, as demonstrated in this work, using the properties of water flow from one stage to the next, a neighboring controller was found to be effective in detecting such attacks. The method is based on physical invariants derived for each stage of the CPS from its design. The attack detection effectiveness of the method was evaluated experimentally against an operational water treatment testbed containing 42 sensors and actuators. Results from the experiments point to the high effectiveness of the method in detecting a variety of SSMP attacks, but also point to its limitations. Distributing the attack detection code among various controllers adds to the scalability of the proposed method.
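As an illustration of a physics-based invariant of the kind derived per stage (this is not the testbed's actual invariant), the sketch below has a neighboring controller check that the observed change in its tank level is consistent with the upstream pump state it is told about; the flow rates, tolerance, and pump name are hypothetical.

```python
# Simplified mass-balance invariant check run each control cycle by a neighboring stage.
def invariant_violated(level_prev, level_now, upstream_pump_on, dt,
                       inflow_rate=0.5, outflow_rate=0.3, tolerance=0.1):
    """Return True if the observed level change contradicts the commanded flows."""
    expected_delta = (inflow_rate if upstream_pump_on else 0.0) * dt - outflow_rate * dt
    observed_delta = level_now - level_prev
    return abs(observed_delta - expected_delta) > tolerance

# Usage sketch: the stage-2 controller evaluates
#   invariant_violated(prev_level, curr_level, pump_P101_on, dt=1.0)
# and raises an alert if stage-1 sensors/actuators appear compromised.
```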
Humans can easily find themselves in high-cost situations where they must choose between suggestions made by an automated decision aid and a conflicting human decision aid. Previous research indicates that humans often rely on automation or other humans, but not both simultaneously. Expanding on previous work conducted by Lyons and Stokes (2012), the current experiment measures how trust in automated or human decision aids differs along with perceived risk and workload. The simulated task required 126 participants to choose the safest route for a military convoy; they were presented with conflicting information from an automated tool and a human. Results demonstrated that as workload increased, trust in automation decreased. As perceived risk increased, trust in the human decision aid increased. Individual differences in dispositional trust correlated with increased trust in both decision aids. These findings can be used to inform training programs for operators who may receive information from human and automated sources. Examples of this context include air traffic control, aviation, and signals intelligence.
Hashing-based methods have attracted considerable attention for efficient cross-modal retrieval on large-scale multimedia data. The core problem of cross-modal hashing is how to effectively integrate heterogeneous features from different modalities to learn hash functions using the available supervised information, e.g., class labels. Existing hashing-based methods generally project heterogeneous features into a common space for hash code generation, and the supervised information is only incrementally used to improve performance. However, these methods may produce ineffective hash codes, due to their failure to explore the discriminative property of the supervised information and to effectively bridge the semantic gap between modalities. To address these challenges, we propose a novel hashing-based method in a linear classification framework, in which modality-specific hash functions are learned to generate unified binary codes, and these binary codes are viewed as representative features for discriminative classification with class labels. An effective optimization algorithm is developed to jointly learn the modality-specific hash functions, the unified binary codes, and a linear classifier. Extensive experiments on three benchmark datasets highlight the advantage of the proposed method and show that it achieves state-of-the-art performance.
The usual approach to security for cloud-hosted applications is strong separation. However, it is often the case that the same data is used by different applications, particularly given the increase in data-driven (`big data' and IoT) applications. We argue that access control for the cloud should no longer be application-specific but should be data-centric, associated with the data that can flow between applications. Indeed, the data may originate outside cloud services, from diverse sources such as medical monitoring, environmental sensing, etc. Information Flow Control (IFC) potentially offers data-centric, system-wide data access control. It has been shown that IFC can be provided at the operating system level as part of a PaaS offering, with an acceptable overhead. In this paper we consider how IFC can be integrated with application-specific access control, transparently to application developers, while building, from simple IFC primitives, access control policies that align with the data management obligations of cloud providers and tenants.
IEEE 802.15.4 is widely used as the lower layers not only of well-known wireless communication standards such as ZigBee, 6LoWPAN, and WirelessHART, but also of customized protocols developed by manufacturers, particularly for various Internet of Things (IoT) devices. Customized protocols are usually neither publicly disclosed nor standardized. Moreover, unlike textual protocols (e.g., HTTP, SMTP, POP3), customized protocols for IoT devices provide no clues such as strings or keywords that are useful for analysis; instead, they use bits or bytes to represent header and body information in order to save power and bandwidth. On the other hand, they often do not employ encryption, fragmentation, or authentication to save cost and effort in implementation. In other words, their security relies only on the confidentiality of the protocol itself. In this paper, we introduce a novel methodology to analyze and reconstruct unknown wireless customized protocols over IEEE 802.15.4. Based on this methodology, we develop an automatic analysis and spoofing tool called WPAN automatic spoofer (WASp) that can be used to understand and reconstruct customized protocols to byte-level accuracy, and to generate packets that can be used to verify analysis results or mount spoofing attacks. The methodology consists of four phases: packet collection, packet grouping, protocol analysis, and packet generation. Except for the packet collection step, all steps are fully automated. Although the customized protocols in use are unknown before the collection phase, we choose two real-world target systems, a smart plug system and a platform screen door (PSD) system, to evaluate our methodology and WASp. In the evaluation, 7,299 and 217 packets are used as datasets for the two target systems, respectively. As a result, on average, WASp is found to reduce the entropy of the legitimate message space by 93.77% and 88.11% for the customized protocols used in the smart plug and PSD systems, respectively. In addition, on average, 48.19% of automatically generated packets are successfully spoofed for the first target system.
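The protocol-analysis phase can be illustrated with a standard per-byte-position entropy computation (our sketch, not WASp's actual algorithm): across captured packets of the same length, constant header fields show near-zero entropy, while counters, sensor values, and checksums show high entropy.

```python
import math
from collections import Counter, defaultdict

def per_position_entropy(packets):
    """packets: iterable of equal-length byte strings; returns entropy (bits) per byte offset."""
    columns = defaultdict(Counter)
    for pkt in packets:
        for offset, byte in enumerate(pkt):
            columns[offset][byte] += 1
    entropies = []
    for offset in sorted(columns):
        counts = columns[offset]
        total = sum(counts.values())
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        entropies.append(h)
    return entropies

# Usage sketch: group captured 802.15.4 payloads by length, then inspect
#   per_position_entropy(group)
# offsets with entropy near 0 are likely fixed header fields; high-entropy offsets
# are candidates for sequence numbers, measurements, or checksums.
```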