Biblio
The Sensor Web is evolving into a complex information space, where large volumes of sensor observation data are often consumed by complex applications. Provenance has become an important issue in the Sensor Web, since it allows applications to answer “what”, “when”, “where”, “who”, “why”, and “how” queries related to observations and consumption processes, which helps determine the usability and reliability of data products. This paper investigates characteristics and requirements of provenance in the Sensor Web and proposes an interoperable approach to building a provenance model for the Sensor Web. Our provenance model extends the W3C PROV Data Model with Sensor Web domain vocabularies. It is developed using Semantic Web technologies and thus allows provenance information of sensor observations to be exposed in the Web of Data using the Linked Data approach. A use case illustrates the applicability of the approach.
Compute-intensive simulations typically charge substantial workloads on an online simulation platform backed by limited computing clusters and storage resources. Some (or most) of the simulations initiated by users may accompany input parameters/files that have been already provided by other (or same) users in the past. Unfortunately, these duplicate simulations may aggravate the performance of the platform by drastic consumption of the limited resources shared by a number of users on the platform. To minimize or avoid conducting repeated simulations, we present a novel system, called SUPERMAN (SimUlation ProvEnance Recycling MANager) that can record simulation provenances and recycle the results of past simulations. This system presents a great opportunity to not only reutilize existing results but also perform various analytics helpful for those who are not familiar with the platform. The system also offers interoperability across other systems by collecting the provenances in a standardized format. In our simulated experiments we found that over half of past computing jobs could be answered without actual executions by our system.
Data provenance provides a way for scientists to observe how experimental data originates, conveys process history, and explains influential factors such as experimental rationale and associated environmental factors from system metrics measured at runtime. The US Department of Energy Office of Science Integrated end-to-end Performance Prediction and Diagnosis for Extreme Scientific Workflows (IPPD) project has developed a provenance harvester that is capable of collecting observations from file based evidence typically produced by distributed applications. To achieve this, file based evidence is extracted and transformed into an intermediate data format inspired in part by W3C CSV on the Web recommendations, called the Harvester Provenance Application Interface (HAPI) syntax. This syntax provides a general means to pre-stage provenance into messages that are both human readable and capable of being written to a provenance store, Provenance Environment (ProvEn). HAPI is being applied to harvest provenance from climate ensemble runs for Accelerated Climate Modeling for Energy (ACME) project funded under the U.S. Department of Energy's Office of Biological and Environmental Research (BER) Earth System Modeling (ESM) program. ACME informally provides provenance in a native form through configuration files, directory structures, and log files that contain success/failure indicators, code traces, and performance measurements. Because of its generic format, HAPI is also being applied to harvest tabular job management provenance from Belle II DIRAC scheduler relational database tables as well as other scientific applications that log provenance related information.
Provenance describes detailed information about the history of a piece of data, containing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of data. Provenance is key to supporting many data management functionalities that are increasingly important in operations such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirements of providing provenance service in situ. The need to remain lightweight and to be always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrument mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
Summary form only given. Strong light-matter coupling has been recently successfully explored in the GHz and THz [1] range with on-chip platforms. New and intriguing quantum optical phenomena have been predicted in the ultrastrong coupling regime [2], when the coupling strength Ω becomes comparable to the unperturbed frequency of the system ω. We recently proposed a new experimental platform where we couple the inter-Landau level transition of an high-mobility 2DEG to the highly subwavelength photonic mode of an LC meta-atom [3] showing very large Ω/ωc = 0.87. Our system benefits from the collective enhancement of the light-matter coupling which comes from the scaling of the coupling Ω ∝ √n, were n is the number of optically active electrons. In our previous experiments [3] and in literature [4] this number varies from 104-103 electrons per meta-atom. We now engineer a new cavity, resonant at 290 GHz, with an extremely reduced effective mode surface Seff = 4 × 10-14 m2 (FE simulations, CST), yielding large field enhancements above 1500 and allowing to enter the few (textless;100) electron regime. It consist of a complementary metasurface with two very sharp metallic tips separated by a 60 nm gap (Fig.1(a, b)) on top of a single triangular quantum well. THz-TDS transmission experiments as a function of the applied magnetic field reveal strong anticrossing of the cavity mode with linear cyclotron dispersion. Measurements for arrays of only 12 cavities are reported in Fig.1(c). On the top horizontal axis we report the number of electrons occupying the topmost Landau level as a function of the magnetic field. At the anticrossing field of B=0.73 T we measure approximately 60 electrons ultra strongly coupled (Ω/ω- textbartextbar
The Internet of Things (IoT) is envisioned to include billions of pervasive and mission-critical sensors and actuators connected to the (public) Internet. This network of smart devices is expected to generate and have access to vast amounts of information, creating unique opportunities for novel applications but, at the same time raising significant privacy and security concerns that impede its further adoption and development. In this paper, we explore the potential of a blockchain-assisted information distribution system for the IoT. We identify key security requirements of such a system and we discuss how they can be satisfied using blockchains and smart contracts. Furthermore, we present a preliminary design of the system and we identify enabling technologies.
Multi-agent simulations are useful for exploring collective patterns of individual behavior in social, biological, economic, network, and physical systems. However, there is no provenance support for multi-agent models (MAMs) in a distributed setting. To this end, we introduce ProvMASS, a novel approach to capture provenance of MAMs in a distributed memory by combining inter-process identification, lightweight coordination of in-memory provenance storage, and adaptive provenance capture. ProvMASS is built on top of the Multi-Agent Spatial Simulation (MASS) library, a framework that combines multi-agent systems with large-scale fine-grained agent-based models, or MAMs. Unlike other environments supporting MAMs, MASS parallelizes simulations with distributed memory, where agents and spatial data are shared application resources. We evaluate our approach with provenance queries to support three use cases and performance measures. Initial results indicate that our approach can support various provenance queries for MAMs at reasonable performance overhead.
Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences.
Provenance counterfeit and packet loss assaults are measured as threats in the large scale wireless sensor networks which are engaged for diverse application domains. The assortments of information source generate necessitate promising the reliability of information such as only truthful information is measured in the decision procedure. Details about the sensor nodes play an major role in finding trust value of sensor nodes. In this paper, a novel lightweight secure provenance method is initiated for improving the security of provenance data transmission. The anticipated system comprises provenance authentication and renovation at the base station by means of Merkle-Hellman knapsack algorithm based protected provenance encoding in the Bloom filter framework. Side Channel Monitoring (SCM) is exploited for noticing the presence of selfish nodes and packet drop behaviors. This lightweight secure provenance method decreases the energy and bandwidth utilization with well-organized storage and secure data transmission. The investigational outcomes establishes the efficacy and competence of the secure provenance secure system by professionally noticing provenance counterfeit and packet drop assaults which can be seen from the assessment in terms of provenance confirmation failure rate, collection error, packet drop rate, space complexity, energy consumption, true positive rate, false positive rate and packet drop attack detection.
One essential requirement for supporting analytics for Big Medical Data systems is the provision of a suitable level of traceability to data or processes ('Items') in large volumes of data. Systems should be designed from the outset to support usage of such Items across the spectrum of medical use and over time in order to promote traceability, to simplify maintenance and to assist analytics. The philosophy proposed in this paper is to design medical data systems using a 'description-driven' approach in which meta-data and the description of medical items are saved alongside the data, simplifying item re-use over time and thereby enabling the traceability of these items over time and their use in analytics. Details are given of a big data system in neuroimaging to demonstrate aspects of provenance data capture, collaborative analysis and longitudinal information traceability. Evidence is presented that the description-driven approach leads to simplicity of design and ease of maintenance following the adoption of a unified approach to Item management.
Provenance for transactional updates is critical for many applications such as auditing and debugging of transactions. Recently, we have introduced MV-semirings, an extension of the semiring provenance model that supports updates and transactions. Furthermore, we have proposed reenactment, a declarative form of replay with provenance capture, as an efficient and non-invasive method for computing this type of provenance. However, this approach is limited to the snapshot isolation (SI) concurrency control protocol while many real world applications apply the read committed version of snapshot isolation (RC-SI) to improve performance at the cost of consistency. We present non trivial extensions of the model and reenactment approach to be able to compute provenance of RC-SI transactions efficiently. In addition, we develop techniques for applying reenactment across multiple RC-SI transactions. Our experiments demonstrate that our implementation in the GProM system supports efficient re-construction and querying of provenance.
We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review. ReproZip was designed to simplify the process of making an existing computational experiment reproducible across platforms, even when the experiment was put together without reproducibility in mind. The tool creates a self-contained package for an experiment by automatically tracking and identifying all its required dependencies. The researcher can share the package with others, who can then use ReproZip to unpack the experiment, reproduce the findings on their favorite operating system, as well as modify the original experiment for reuse in new research, all with little effort. The demo will consist of examples of non-trivial experiments, showing how these can be packed in a Linux machine and reproduced on different machines and operating systems. Demo visitors will also be able to pack and reproduce their own experiments.
Many mobile services consist of two components: a server providing an API, and an application running on smartphones and communicating with the API. An unresolved problem in this design is that it is difficult for the server to authenticate which app is accessing the API. This causes many security problems. For example, the provider of a private network API has to embed secrets in its official app to ensure that only this app can access the API; however, attackers can uncover the secret by reverse-engineering. As another example, malicious apps may send automatic requests to ad servers to commit ad fraud. In this work, we propose a system that allows network API to authenticate the mobile app that sends each request so that the API can make an informed access control decision. Our system, the Mobile Trusted-Origin Policy, consists of two parts: 1) an app provenance mechanism that annotates outgoing HTTP(S) requests with information about which app generated the network traffic, and 2) a code isolation mechanism that separates code within an app that should have different app provenance signatures into mobile origin. As motivation for our work, we present two previously-unknown families of apps that perform click fraud, and examine how the lack of mobile origin information enables the attacks. Based on our observations, we propose Trusted Cross-Origin Requests to handle point (1), which automatically includes mobile origin information in outgoing HTTP requests. Servers may then decide, based on the mobile origin data, whether to process the request or not. We implement a prototype of our system for Android and evaluate its performance, security, and deployability. We find that our system can achieve our security and utility goals with negligible overhead.
Defenders of enterprise networks have a critical need to quickly identify the root causes of malware and data leakage. Increasingly, USB storage devices are the media of choice for data exfiltration, malware propagation, and even cyber-warfare. We observe that a critical aspect of explaining and preventing such attacks is understanding the provenance of data (i.e., the lineage of data from its creation to current state) on USB devices as a means of ensuring their safe usage. Unfortunately, provenance tracking is not offered by even sophisticated modern devices. This work presents ProvUSB, an architecture for fine-grained provenance collection and tracking on smart USB devices. ProvUSB maintains data provenance by recording reads and writes at the block layer and reliably identifying hosts editing those blocks through attestation over the USB channel. Our evaluation finds that ProvUSB imposes a one-time 850 ms overhead during USB enumeration, but approaches nearly-bare-metal runtime performance (90% of throughput) on larger files during normal execution, and less than 0.1% storage overhead for provenance in real-world workloads. ProvUSB thus provides essential new techniques in the defense of computer systems and USB storage devices.
As digital objects become increasingly important in people's lives, people may need to understand the provenance, or lineage and history, of an important digital object, to understand how it was produced. This is particularly important for objects created from large, multi-source collections of personal data. As the metadata describing provenance, Provenance Data, is commonly represented as a labelled directed acyclic graph, the challenge is to create effective interfaces onto such graphs so that people can understand the provenance of key digital objects. This unsolved problem is especially challenging for the case of novice and intermittent users and complex provenance graphs. We tackle this by creating an interface based on a clustering approach. This was designed to enable users to view provenance graphs, and to simplify complex graphs by combining several nodes. Our core contribution is the design of a prototype interface that supports clustering and its analytic evaluation in terms of desirable properties of visualisation interfaces.
Provenance workflows capture movement and transformation of data in complex environments, such as document management in large organizations, content generation and sharing in in social media, scientific computations, etc. Sharing and processing of provenance workflows brings numerous benefits, e.g., improving productivity in an organization, understanding social media interaction patterns, etc. However, directly sharing provenance may also disclose sensitive information such as confidential business practices, or private details about participants in a social network. We propose an algorithm that privately extracts sequential association rules from provenance workflow datasets. Finding such rules has numerous practical applications, such as capacity planning or identifying hot-spots in provenance graphs. Our approach provides good accuracy and strong privacy, by leveraging on the exponential mechanism of differential privacy. We propose an heuristic that identifies promising candidate rules and makes judicious use of the privacy budget. Experimental results show that the our approach is fast and accurate, and clearly outperforms the state-of-the-art. We also identify influential factors in improving accuracy, which helps in choosing promising directions for future improvement.
In this paper, we propose a new approach to diagnosing problems in complex distributed systems. Our approach is based on the insight that many of the trickiest problems are anomalies. For instance, in a network, problems often affect only a small fraction of the traffic (e.g., perhaps a certain subnet), or they only manifest infrequently. Thus, it is quite common for the operator to have “examples” of both working and non-working traffic readily available – perhaps a packet that was misrouted, and a similar packet that was routed correctly. In this case, the cause of the problem is likely to be wherever the two packets were treated differently by the network. We present the design of a debugger that can leverage this information using a novel concept that we call differential provenance. Differential provenance tracks the causal connections between network states and state changes, just like classical provenance, but it can additionally perform root-cause analysis by reasoning about the differences between two provenance trees. We have built a diagnostic tool that is based on differential provenance, and we have used our tool to debug a number of complex, realistic problems in two scenarios: software-defined networks and MapReduce jobs. Our results show that differential provenance can be maintained at relatively low cost, and that it can deliver very precise diagnostic information; in many cases, it can even identify the precise root cause of the problem.
Collecting and processing provenance, i.e., information describing the production process of some end product, is important in various applications, e.g., to assess quality, to ensure reproducibility, or to reinforce trust in the end product. In the past, different types of provenance meta-data have been proposed, each with a different scope. The first part of the proposed tutorial provides an overview and comparison of these different types of provenance. To put provenance to good use, it is essential to be able to interact with and present provenance data in a user-friendly way. Often, users interested in provenance are not necessarily experts in databases or query languages, as they are typically domain experts of the product and production process for which provenance is collected (biologists, journalists, etc.). Furthermore, in some scenarios, it is difficult to use solely queries for analyzing and exploring provenance data. The second part of this tutorial therefore focuses on enabling users to leverage provenance through adapted visualizations. To this end, we will present some fundamental concepts of visualization before we discuss possible visualizations for provenance.
Contactless communications have become omnipresent in our daily lives, from simple access cards to electronic passports. Such systems are particularly vulnerable to relay attacks, in which an adversary relays the messages from a prover to a verifier. Distance-bounding protocols were introduced to counter such attacks. Lately, there has been a very active research trend on improving the security of these protocols, but also on ensuring strong privacy properties with respect to active adversaries and malicious verifiers. In particular, a difficult threat to address is the terrorist fraud, in which a far-away prover cooperates with a nearby accomplice to fool a verifier. The usual defence against this attack is to make it impossible for the accomplice to succeed unless the prover provides him with enough information to recover his secret key and impersonate him later on. However, the mere existence of a long-term secret key is problematic with respect to privacy. In this paper, we propose a novel approach in which the prover does not leak his secret key but a reusable session key along with a group signature on it. This allows the adversary to impersonate him even without knowing his signature key. Based on this approach, we give the first distance-bounding protocol, called SPADE, integrating anonymity, revocability and provable resistance to standard threat models.
The growing popularity and development of data mining technologies bring serious threat to the security of individual,'s sensitive information. An emerging research topic in data mining, known as privacy-preserving data mining (PPDM), has been extensively studied in recent years. The basic idea of PPDM is to modify the data in such a way so as to perform data mining algorithms effectively without compromising the security of sensitive information contained in the data. Current studies of PPDM mainly focus on how to reduce the privacy risk brought by data mining operations, while in fact, unwanted disclosure of sensitive information may also happen in the process of data collecting, data publishing, and information (i.e., the data mining results) delivering. In this paper, we view the privacy issues related to data mining from a wider perspective and investigate various approaches that can help to protect sensitive information. In particular, we identify four different types of users involved in data mining applications, namely, data provider, data collector, data miner, and decision maker. For each type of user, we discuss his privacy concerns and the methods that can be adopted to protect sensitive information. We briefly introduce the basics of related research topics, review state-of-the-art approaches, and present some preliminary thoughts on future research directions. Besides exploring the privacy-preserving approaches for each type of user, we also review the game theoretical approaches, which are proposed for analyzing the interactions among different users in a data mining scenario, each of whom has his own valuation on the sensitive information. By differentiating the responsibilities of different users with respect to security of sensitive information, we would like to provide some useful insights into the study of PPDM.
Assessing the trustworthiness of sensor data and transmitters of this data is critical for quality assurance. Trust evaluation frameworks utilize data provenance along with the sensed data values to compute the trustworthiness of each data item. However, in a sizeable multi-hop sensor network, provenance information requires a large and variable number of bits in each packet, resulting in high energy dissipation due to the extended period of radio communication. In this paper, we design energy-efficient provenance encoding and construction schemes, which we refer to as Probabilistic Provenance Flow (PPF). Our work demonstrates the feasibility of adapting the Probabilistic Packet Marking (PPM) technique in IP traceback to wireless sensor networks. We design two bit-efficient provenance encoding schemes along with a complementary vanilla scheme. Depending on the network size and bit budget, we select the best method based on mathematical approximations and numerical analysis. We integrate PPF with provenance-based trust frameworks and investigate the trade-off between trustworthiness of data items and transmission overhead. We conduct TOSSIM simulations with realistic wireless links, and perform testbed experiments on 15–20 TelosB motes to demonstrate the effectiveness of PPF. Our results show that the encoding schemes of PPF have identical performance with a low bit budget (∼32-bit), requiring 33% fewer packets and 30% less energy than PPM variants to construct provenance. With a twofold increase in bit budget, PPF with the selected encoding scheme reduces energy consumption by 46–60%.
- « first
- ‹ previous
- 1
- 2
- 3