Biblio
Open Science Big Data is emerging as an important area of research and software development. Although there are several high quality frameworks for Big Data, additional capabilities are needed for Open Science Big Data. These include data provenance, citable reusable data, data sources providing links to research literature, relationships to other data and theories, transparent analysis/reproducibility, data privacy, new optimizations/advanced algorithms, data curation, data storage and transfer. An important part of science is explanation of results, ideally leading to theory formation. In this paper, we examine means for supporting the use of theory in big data analytics as well as using big data to assist in theory formation. One approach is to fit data in a way that is compatible with some theory, existing or new. Functional Data Analysis allows precise fitting of data as well as penalties for lack of smoothness or even departure from theoretical expectations. This paper discusses principal differential analysis and related techniques for fitting data where, for example, a time-based process is governed by an ordinary differential equation. Automation in theory formation is also considered. Case studies in the fields of computational economics and finance are considered.
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.
Social animals as found in fish schools, bird flocks, bee hives, and ant colonies are able to solve highly complex problems in nature. This includes foraging for food, constructing astonishingly complex nests, and evading or defending against predators. Remarkably, these animals in many cases use very simple, decentralized communication mechanisms that do not require a single leader. This makes the animals perform surprisingly well, even in dynamically changing environments. The collective intelligence of such animals is known as swarm intelligence and it has inspired popular and very powerful optimization paradigms, including ant colony optimization (ACO) and particle swarm optimization (PSO). The reasons behind their success are often elusive. We are just beginning to understand when and why swarm intelligence algorithms perform well, and how to use swarm intelligence most effectively. Understanding the fundamental working principles that determine their efficiency is a major challenge. This tutorial will give a comprehensive overview of recent theoretical results on swarm intelligence algorithms, with an emphasis on their efficiency (runtime/computational complexity). In particular, the tutorial will show how techniques for the analysis of evolutionary algorithms can be used to analyze swarm intelligence algorithms and how the performance of swarm intelligence algorithms compares to that of evolutionary algorithms. The results shed light on the working principles of swarm intelligence algorithms, identify the impact of parameters and other design choices on performance, and thus help to use swarm intelligence more effectively. The tutorial will be divided into a first, larger part on ACO and a second, smaller part on PSO. For ACO we will consider simple variants of the MAX-MIN ant system. Investigations of example functions in pseudo-Boolean optimization demonstrate that the choices of the pheromone update strategy and the evaporation rate have a drastic impact on the running time. We further consider the performance of ACO on illustrative problems from combinatorial optimization: constructing minimum spanning trees, solving shortest path problems with and without noise, and finding short tours for the TSP. For particle swarm optimization, the tutorial will cover results on PSO for pseudo-Boolean optimization as well as a discussion of theoretical results in continuous spaces.
Dynamic taint analysis can be used as a defense against low-integrity data in applications with untrusted user interfaces. An important example is defense against XSS and injection attacks in programs with web interfaces. Data sanitization is commonly used in this context, and can be treated as a precondition for endorsement in a dynamic integrity taint analysis. However, sanitization is often incomplete in practice. We develop a model of dynamic integrity taint analysis for Java that addresses imperfect sanitization with an in-depth approach. To avoid false positives, results of sanitization are endorsed for access control (aka prospective security), but are tracked and logged for auditing and accountability (aka retrospective security). We show how this heterogeneous prospective/retrospective mechanism can be specified as a uniform policy, separate from code. We then use this policy to establish correctness conditions for a program rewriting algorithm that instruments code for the analysis. The rewriting itself is a model of existing, efficient Java taint analysis tools.
Traditionally, network and system configurations are static. Attackers have plenty of time to exploit the system's vulnerabilities and thus they are able to choose when to launch attacks wisely to maximize the damage. An unpredictable system configuration can significantly lift the bar for attackers to conduct successful attacks. Recent years, moving target defense (MTD) has been advocated for this purpose. An MTD mechanism aims to introduce dynamics to the system through changing its configuration continuously over time, which we call adaptations. Though promising, the dynamic system reconfiguration introduces overhead to the applications currently running in the system. It is critical to determine the right time to conduct adaptations and to balance the overhead afforded and the security levels guaranteed. This problem is known as the MTD timing problem. Little prior work has been done to investigate the right time in making adaptations. In this paper, we take the first step to both theoretically and experimentally study the timing problem in moving target defenses. For a broad family of attacks including DDoS attacks and cloud covert channel attacks, we model this problem as a renewal reward process and propose an optimal algorithm in deciding the right time to make adaptations with the objective of minimizing the long-term cost rate. In our experiments, both DDoS attacks and cloud covert channel attacks are studied. Simulations based on real network traffic traces are conducted and we demonstrate that our proposed algorithm outperforms known adaptation schemes.
In classical runtime analysis it has been observed that certain working principles of an evolutionary algorithm cannot be understood by only looking at the asymptotic order of the runtime, but that more precise estimates are needed. In this work we demonstrate that the same observation applies to black-box complexity analysis. We prove that the unary unbiased black-box complexity of the classic OneMax function class is n ln(n) – cn ± o(n) for a constant c between 0.2539 and 0.2665. Our analysis yields a simple (1+1)-type algorithm achieving this runtime bound via a fitness-dependent mutation strength. When translated into a fixed-budget perspective, our algorithm with the same budget computes a solution that asymptotically is 13% closer to the optimum (given that the budget is at least 0.2675n).
The migration of many current critical infrastructures, such as power grids and transportations systems, into open publicnetworks has posed many challenges in control systems. Modern control systems face uncertainties not only from the physical world but also from the cyber space. In this paper, we propose a hybrid game-theoretic approach to investigate the coupling between cyber security policy and robust control design. We study in detail the case of cascading failures in industrial control systems and provide a set of coupled optimality criteria in the linear-quadratic case. This approach can be further extended to more general cases of parallel cascading failures.