Biblio

Filters: Keyword is Foundations
2021-02-16
Hongbin, Z., Wei, W., Wengdong, S..  2020.  Safety and Damage Assessment Method of Transmission Line Tower in Goaf Based on Artificial Intelligence. 2020 IEEE/IAS Industrial and Commercial Power Systems Asia (I&CPS Asia). :1474–1479.
Transmission line towers built over the mined-out area (goaf) of a coal mine are affected by surface subsidence, which can cause settlement, inclination, and even collapse of the tower, threatening its safe operation. Therefore, a safety and damage assessment method for transmission line towers in goaf, based on artificial intelligence, is proposed. First, a geometric model of the coal seam in the goaf and a structural reliability model of the transmission line tower are constructed to evaluate safety. Then, a random forest algorithm is used to assess damage to the tower so that protective measures can be taken in time. Finally, a finite element simulation model of the tower-foundation interaction is built, and its safety (force) and damage identification are analyzed experimentally. The results show that the proposed method achieves high accuracy in damage assessment and reliable judgment of tower safety within the allowable error.
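
The sketch below shows how a random forest classifier of the kind mentioned in the abstract might be trained on tower monitoring features; the feature names, data, and damage grades are illustrative assumptions, not the authors' dataset or model inputs.

```python
# Minimal sketch of random-forest damage assessment (illustrative only).
# Feature names, data, and labels are assumptions; the paper's actual inputs
# come from its goaf geometry and tower reliability models.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
# Hypothetical monitoring features: foundation settlement (mm),
# tower inclination (deg), member stress ratio.
X = np.column_stack([
    rng.normal(20, 10, n),
    rng.normal(0.5, 0.3, n),
    rng.uniform(0.2, 1.1, n),
])
# Hypothetical damage grade: 0 = safe, 1 = minor damage, 2 = severe.
score = 0.02 * X[:, 0] + 1.5 * X[:, 1] + X[:, 2] + rng.normal(0, 0.2, n)
y = np.digitize(score, bins=[1.6, 2.4])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```
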
2019-02-19
Symons, John.  2018.  Brute facts about emergence. Brute Facts.

This chapter explores the relationship between the concept of emergence, the goal of theoretical completeness, and the Principle of Sufficient Reason. Samuel Alexander and C. D. Broad argued for limits to the power of scientific explanation. Chemical explanation played a central role in their thinking. After Schrödinger's work in the 1920s, their examples seem to fall flat. However, there are more general lessons from the emergentists that need to be explored. There are cases where we know that explanation of some phenomenon is impossible. What are the implications, for emergence, of known limits to the explanatory power of science and of the apparent ineliminability of brute facts? One lesson drawn here is that we must embrace a methodological rather than a metaphysical conception of the Principle of Sufficient Reason.

2018-07-30
Schmittle, Matt, Lukina, Anna, Vacek, Lukas, Das, Jnaneshwar, Buskirk, Christopher P., Rees, Stephen, Sztipanovits, Janos, Grosu, Radu, Kumar, Vijay.  2018.  OpenUAV: A UAV Testbed for the CPS and Robotics Community. Proceedings of the 9th ACM/IEEE International Conference on Cyber-Physical Systems. :130–139.

Multirotor Unmanned Aerial Vehicles (UAV) have grown in popularity for research and education, overcoming challenges associated with fixed wing and ground robots. Unfortunately, extensive physical testing can be expensive and time consuming because of short flight times due to battery constraints and safety precautions. Simulation tools offer a low barrier to entry and enable testing and validation before field trials. However, most of the well-known simulators today have a high barrier to entry due to the need for powerful computers and the time required for initial set up. In this paper, we present OpenUAV, an open-source testbed for UAV education and research that overcomes these barriers. We leverage Containers as a Service (CaaS) technology to enable students and researchers to carry out simulations on the cloud. We have based our framework on open-source tools including ROS, Gazebo, Docker, PX4, and Ansible, and we designed the simulation framework so that it has no special hardware requirements. Two use cases are presented. First, we show how a UAV can navigate around obstacles, and second, we test a multi-UAV swarm formation algorithm. To our knowledge, this is the first open-source, cloud-enabled testbed for UAVs. The code is available on GitHub: https://github.com/Open-UAV.

2018-10-12
Heechul Yun, Michael Bechtel, Elise McEllhiney, Minje Kim.  2018.  DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car. IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA). :11-21.

We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small-scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture (9 layers, 27 million connections, and 250K parameters) and can drive itself in real time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting CNN-based real-time control, from 20 Hz up to 100 Hz depending on the hardware platform. However, we find that shared resource contention remains an important issue that must be considered when applying CNN models on shared-memory-based embedded computing platforms; we observe up to an 11.6X execution time increase in the CNN-based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.
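
The following is a rough sketch of a DAVE-2/PilotNet-style steering network and a fixed-rate control loop, assuming PyTorch; layer sizes follow NVIDIA's published PilotNet description, and the camera/steering callbacks are placeholders, so this is not the DeepPicar code itself.

```python
# Sketch of a DAVE-2/PilotNet-style steering network plus a fixed-rate control
# loop (PyTorch assumed; not the actual DeepPicar implementation).
import time
import torch
import torch.nn as nn

class PilotNetLike(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.LazyLinear(100), nn.ReLU(),
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 10), nn.ReLU(),
            nn.Linear(10, 1),              # predicted steering angle
        )

    def forward(self, x):
        return self.head(self.features(x))

model = PilotNetLike().eval()

def control_loop(get_frame, set_steering, hz=20):
    """Run inference at a fixed rate; get_frame/set_steering are placeholder
    callbacks. Frames are assumed to arrive at a fixed size; cropping/resizing
    to the network's expected input is omitted for brevity."""
    period = 1.0 / hz
    with torch.no_grad():
        while True:
            start = time.time()
            frame = get_frame()            # HxWx3 uint8 camera image (numpy)
            x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            set_steering(model(x).item())
            slack = period - (time.time() - start)
            if slack > 0:                  # a missed deadline means the platform
                time.sleep(slack)          # cannot sustain this control frequency
```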

2018-08-07
Platzer, Andre.  2017.  Logical Foundations of Cyber-Physical Systems.

Cyber-physical systems (CPSs) combine cyber capabilities, such as computation or communication, with physical capabilities, such as motion or other physical processes. Cars, aircraft, and robots are prime examples, because they move physically in space in a way that is determined by discrete computerized control algorithms. Designing these algorithms is challenging because of their tight coupling with physical behavior, and it is vital that they be correct because we rely on them for safety-critical tasks.

This textbook teaches undergraduate students the core principles behind CPSs. It shows them how to develop models and controls; identify safety specifications and critical properties; reason rigorously about CPS models; leverage multi-dynamical systems compositionality to tame CPS complexity; identify required control constraints; verify CPS models of appropriate scale in logic; and develop an intuition for operational effects.

The book is supported with homework exercises, lecture videos, and slides.
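
As a flavor of the safety specifications and rigorous reasoning the book teaches, the following is an illustrative property written in the style of differential dynamic logic (dL), the logic the book is built on; the braking example and its constants are invented here for illustration and are not taken from the text.

```latex
% Illustrative dL-style safety claim (a simplified example, not from the book):
% a car at position x with velocity v and braking rate b > 0 never passes an
% obstacle at m, provided it starts with enough stopping distance.
\[
  v^2 \le 2b\,(m - x) \;\wedge\; b > 0
  \;\rightarrow\;
  \bigl[\,\{x' = v,\ v' = -b \ \&\ v \ge 0\}\,\bigr]\ (x \le m)
\]
```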

2016-04-07
Ke, Liyiming, Li, Bo, Vorobeychik, Yevgeniy.  2016.  Behavioral Experiments in Email Filter Evasion.

Despite decades of effort to combat spam, unwanted and even malicious emails, such as phishing messages that aim to deceive recipients into disclosing sensitive information, still routinely find their way into one's mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion, or manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction, a common practice in applied machine learning, can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.
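
A minimal sketch of the "noisy filter" idea studied here: perturb the spam score before thresholding so an evader gets inconsistent feedback. The model, training texts, and noise level are illustrative assumptions, not the experiment's setup.

```python
# Sketch of a randomized (noisy) spam filter; data and parameters are toy values.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

spam = ["win a free prize now", "cheap meds online", "claim your reward today"]
ham = ["meeting moved to 3pm", "draft of the report attached", "lunch tomorrow?"]
texts, y = spam + ham, [1, 1, 1, 0, 0, 0]

vec = TfidfVectorizer().fit(texts)
clf = LogisticRegression().fit(vec.transform(texts), y)
rng = np.random.default_rng(0)

def filter_email(text, threshold=0.0, noise_std=0.5):
    """Return True if the email is blocked; the score is perturbed with
    Gaussian noise so repeated probes by an evader give inconsistent results."""
    score = clf.decision_function(vec.transform([text]))[0]
    return score + rng.normal(0.0, noise_std) > threshold

print(filter_email("win a cheap prize"))
```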

2017-08-02
Chaidos, Pyrros, Cortier, Veronique, Fuchsbauer, Georg, Galindo, David.  2016.  BeleniosRF: A Non-interactive Receipt-Free Electronic Voting Scheme. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1614–1625.

We propose a new voting scheme, BeleniosRF, that offers both receipt-freeness and end-to-end verifiability. It is receipt-free in a strong sense, meaning that even dishonest voters cannot prove how they voted. We provide a game-based definition of receipt-freeness for voting protocols with non-interactive ballot casting, which we name strong receipt-freeness (sRF). To our knowledge, sRF is the first game-based definition of receipt-freeness in the literature, and it has the merit of being particularly concise and simple. Built upon the Helios protocol, BeleniosRF inherits its simplicity and does not require any anti-coercion strategy from the voters. We implement BeleniosRF and show its feasibility on a number of platforms, including desktop computers and smartphones.

2016-09-26
Richeng Jin, Xiaofan He, Huaiyu Dai.  2016.  Collaborative IDS Configuration: A Two-layer Game Approach. IEEE Global Conference on Communications (GLOBECOM).
2017-11-13
Böhme, Marcel, Pham, Van-Thuan, Roychoudhury, Abhik.  2016.  Coverage-based Greybox Fuzzing As Markov Chain. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1032–1043.

Coverage-based Greybox Fuzzing (CGF) is a random testing approach that requires no program analysis. A new test is generated by slightly mutating a seed input. If the test exercises a new and interesting path, it is added to the set of seeds; otherwise, it is discarded. We observe that most tests exercise the same few "high-frequency" paths and develop strategies to explore significantly more paths with the same number of tests by gravitating towards low-frequency paths. We explain the challenges and opportunities of CGF using a Markov chain model which specifies the probability that fuzzing the seed that exercises path i generates an input that exercises path j. Each state (i.e., seed) has an energy that specifies the number of inputs to be generated from that seed. We show that CGF is considerably more efficient if energy is inversely proportional to the density of the stationary distribution and increases monotonically every time that seed is chosen. Energy is controlled with a power schedule. We implemented the exponential schedule by extending AFL. In 24 hours, AFLFAST exposes 3 previously unreported CVEs that are not exposed by AFL and exposes 6 previously unreported CVEs 7x faster than AFL. AFLFAST produces at least an order of magnitude more unique crashes than AFL.
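
A hedged sketch of the exponential ("FAST") power schedule described above: energy assigned to a seed grows with how often the seed has been chosen and is inversely proportional to how often its path has been exercised. Variable names and the cap are assumptions for illustration; consult the paper and the AFLFAST fork for the exact formula.

```python
# Sketch of an AFLFAST-style exponential power schedule (names and constants
# are illustrative; see the paper/AFL fork for the exact formula).
def fast_energy(base_energy, times_chosen, path_frequency, beta=1.0, cap=1600):
    """
    base_energy   : energy AFL would normally assign the seed (alpha(i))
    times_chosen  : how often this seed has been selected for fuzzing (s(i))
    path_frequency: how many generated inputs exercised this seed's path (f(i))
    Energy grows exponentially in times_chosen and shrinks with the path's
    frequency, steering fuzzing toward low-frequency paths.
    """
    energy = (base_energy / beta) * (2 ** times_chosen) / max(path_frequency, 1)
    return min(int(energy), cap)

# A seed on a rarely exercised path receives far more energy than one on a
# high-frequency ("hot") path, even with the same base energy.
print(fast_energy(16, times_chosen=3, path_frequency=2))    # rare path
print(fast_energy(16, times_chosen=3, path_frequency=500))  # hot path
```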

2016-04-11
Roy Dong, Walid Krichene, Alexandre M. Bayen, S. Shankar Sastry.  2016.  Differential Privacy of Populations in Routing Games. CoRR. abs/1601.04041

As our ground transportation infrastructure modernizes, the large amount of data being measured, transmitted, and stored motivates an analysis of the privacy aspect of these emerging cyber-physical technologies. In this paper, we consider privacy in the routing game, where the origins and destinations of drivers are considered private. This is motivated by the fact that this spatiotemporal information can easily be used as the basis for inferences about a person's activities. More specifically, we consider the differential privacy of the mapping from the amount of flow for each origin-destination pair to the traffic flow measurements on each link of a traffic network. We use a stochastic online learning framework for the population dynamics, which is known to converge to the Nash equilibrium of the routing game. We analyze the sensitivity of this process and provide theoretical guarantees on the convergence rates as well as differential privacy values for these models. We confirm these results with simulations on a small example.

2016-04-07
Aron Laszka, Jian Lou, Yevgeniy Vorobeychik.  2016.  Multi-Defender Strategic Filtering Against Spear-Phishing Attacks. 30th AAAI Conference on Artificial Intelligence (AAAI).

Spear-phishing attacks pose a serious threat to sensitive computer systems, since they sidestep technical security mechanisms by exploiting the carelessness of authorized users. A common way to mitigate such attacks is to use e-mail filters which block e-mails with a maliciousness score above a chosen threshold. Optimal choice of such a threshold involves a tradeoff between the risk from delivered malicious emails and the cost of blocking benign traffic. A further complicating factor is the strategic nature of an attacker, who may selectively target users offering the best value in terms of likelihood of success and resulting access privileges. Previous work on strategic threshold-selection considered a single organization choosing thresholds for all users. In reality, many organizations are potential targets of such attacks, and their incentives need not be well aligned. We therefore consider the problem of strategic threshold-selection by a collection of independent self-interested users. We characterize both Stackelberg multi-defender equilibria, corresponding to short-term strategic dynamics, as well as Nash equilibria of the simultaneous game between all users and the attacker, modeling long-term dynamics, and exhibit a polynomial-time algorithm for computing short-term (Stackelberg) equilibria. We find that while Stackelberg multi-defender equilibrium need not exist, Nash equilibrium always exists, and remarkably, both equilibria are unique and socially optimal.
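
To make the threshold trade-off concrete, here is a small sketch that sweeps a filter threshold and balances expected damage from delivered malicious mail against the cost of blocked benign mail; the loss model and score distributions are toy assumptions, not the paper's game-theoretic formulation.

```python
# Toy threshold sweep illustrating the risk vs. false-positive trade-off
# (score distributions and costs are illustrative assumptions).
import numpy as np

rng = np.random.default_rng(0)
benign_scores = rng.normal(0.2, 0.15, 10_000)   # maliciousness scores of benign mail
malicious_scores = rng.normal(0.7, 0.15, 500)   # scores of spear-phishing mail
damage_per_delivered, cost_per_blocked_benign = 100.0, 1.0

thresholds = np.linspace(0, 1, 101)
losses = [
    damage_per_delivered * np.sum(malicious_scores < t)        # delivered attacks
    + cost_per_blocked_benign * np.sum(benign_scores >= t)     # blocked benign mail
    for t in thresholds
]
best = thresholds[int(np.argmin(losses))]
print(f"loss-minimizing threshold: {best:.2f}")
```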

2016-04-11
Brad Miller, Alex Kantchelian, Michael Carl Tschantz, Sadia Afroz, Rekha Bachwani, Riyaz Faizullabhoy, Ling Huang, Vaishaal Shankar, Tony Wu, George Yiu et al..  2015.  Back to the Future: Malware Detection with Temporally Consistent Labels. CoRR. abs/1510.07338

The malware detection arms race involves constant change: malware changes to evade detection and labels change as detection mechanisms react. Recognizing that malware changes over time, prior work has enforced temporally consistent samples by requiring that training binaries predate evaluation binaries. We present temporally consistent labels, requiring that training labels also predate evaluation binaries since training labels collected after evaluation binaries constitute label knowledge from the future. Using a dataset containing 1.1 million binaries from over 2.5 years, we show that enforcing temporal label consistency decreases detection from 91% to 72% at a 0.5% false positive rate compared to temporal samples alone.

The impact of temporal labeling demonstrates the potential of improved labels to increase detection results. Hence, we present a detector capable of selecting binaries for submission to an expert labeler for review. At a 0.5% false positive rate, our detector achieves a 72% true positive rate without an expert, which increases to 77% and 89% with 10 and 80 expert queries daily, respectively. Additionally, we detect 42% of malicious binaries initially undetected by all 32 antivirus vendors from VirusTotal used in our evaluation. For evaluation at scale, we simulate the human expert labeler and show that our approach is robust against expert labeling errors. Our novel contributions include a scalable malware detector integrating manual review with machine learning and the examination of temporal label consistency.
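
The temporally consistent evaluation described above can be sketched as follows, with pandas assumed and hypothetical column names: both a training binary's first-seen time and the time its label was collected must precede the evaluation cutoff.

```python
# Sketch of temporally consistent training data selection (column names are
# hypothetical, not the paper's dataset schema).
import pandas as pd

def temporally_consistent_split(df, cutoff):
    """df must contain 'first_seen' (when the binary appeared) and
    'label_time' (when its malicious/benign label was collected)."""
    train = df[(df["first_seen"] < cutoff) & (df["label_time"] < cutoff)]
    evaluate = df[df["first_seen"] >= cutoff]
    return train, evaluate

df = pd.DataFrame({
    "sha256": ["a", "b", "c"],
    "first_seen": pd.to_datetime(["2013-01-05", "2013-06-01", "2014-02-10"]),
    "label_time": pd.to_datetime(["2013-02-01", "2014-01-15", "2014-03-01"]),
    "label": [1, 0, 1],
})
# Binary "b" predates the cutoff, but its label does not, so it is excluded
# from training under temporal label consistency.
train, evaluate = temporally_consistent_split(df, pd.Timestamp("2014-01-01"))
print(len(train), "training binaries,", len(evaluate), "evaluation binaries")
```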

2015-11-11
Kantchelian, Alex, Tschantz, Michael Carl, Afroz, Sadia, Miller, Brad, Shankar, Vaishaal, Bachwani, Rekha, Joseph, Anthony D., Tygar, J. D..  2015.  Better Malware Ground Truth: Techniques for Weighting Anti-Virus Vendor Labels. Proceedings of the 8th ACM Workshop on Artificial Intelligence and Security. :45–56.

We examine the problem of aggregating the results of multiple anti-virus (AV) vendors' detectors into a single authoritative ground-truth label for every binary. To do so, we adapt a well-known generative Bayesian model that postulates the existence of a hidden ground truth upon which the AV labels depend. We use training based on Expectation Maximization for this fully unsupervised technique. We evaluate our method using 279,327 distinct binaries from VirusTotal, each of which appeared for the first time between January 2012 and June 2014.

Our evaluation shows that our statistical model is consistently more accurate at predicting the future-derived ground truth than all unweighted rules of the form "k out of n" AV detections. In addition, we evaluate the scenario where partial ground truth is available for model building. We train a logistic regression predictor on the partial label information. Our results show that as few as 100 randomly selected training instances with ground truth are enough to achieve an 80% true positive rate at a 0.1% false positive rate. In comparison, the best unweighted threshold rule provides only a 60% true positive rate at the same false positive rate.
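
A hedged sketch of the kind of unsupervised EM aggregation described above: each vendor is modeled with its own true-positive and false-positive rate, and a hidden per-binary label is inferred. This is a generic two-coin (Dawid-Skene-style) model written for illustration, not the authors' exact formulation.

```python
# Generic two-coin EM aggregation of binary AV verdicts (an illustration of the
# idea, not the paper's exact model). V[i, j] = 1 if vendor j flags binary i.
import numpy as np

def em_aggregate(V, n_iter=50):
    n, m = V.shape
    prior = 0.5                      # P(malicious)
    tpr = np.full(m, 0.8)            # per-vendor P(flag | malicious)
    fpr = np.full(m, 0.1)            # per-vendor P(flag | benign)
    for _ in range(n_iter):
        # E-step: posterior probability that each binary is malicious.
        log_mal = np.log(prior) + V @ np.log(tpr) + (1 - V) @ np.log(1 - tpr)
        log_ben = np.log(1 - prior) + V @ np.log(fpr) + (1 - V) @ np.log(1 - fpr)
        post = 1.0 / (1.0 + np.exp(log_ben - log_mal))
        # M-step: re-estimate the prior and per-vendor rates.
        prior = post.mean()
        tpr = np.clip((post[:, None] * V).sum(0) / post.sum(), 1e-3, 1 - 1e-3)
        fpr = np.clip(((1 - post)[:, None] * V).sum(0) / (1 - post).sum(), 1e-3, 1 - 1e-3)
    return post, tpr, fpr

V = np.array([[1, 1, 0], [0, 0, 0], [1, 0, 1], [1, 1, 1]])
posterior, tpr, fpr = em_aggregate(V)
print(np.round(posterior, 2))
```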

2015-11-12
Lou, Jian, Vorobeychik, Yevgeniy.  2015.  Equilibrium analysis of multi-defender security games. Proceedings of the 24th International Conference on Artificial Intelligence. :596–602.

Stackelberg game models of security have received much attention, with a number of approaches for computing Stackelberg equilibria in games with a single defender protecting a collection of targets. In contrast, multi-defender security games have received significantly less attention, particularly when each defender protects more than a single target. We fill this gap by considering a multi-defender security game, with a focus on theoretical characterizations of equilibria and the price of anarchy. We present the analysis of three models of increasing generality, two in which each defender protects multiple targets. In all models, we find that the defenders often have the incentive to over-protect the targets, at times significantly. Additionally, in the simpler models, we find that the price of anarchy is unbounded, linearly increasing in both the number of defenders and the number of targets per defender. Surprisingly, when we consider a more general model, this result obtains only in a “corner” case in the space of parameters; in most cases, however, the price of anarchy converges to a constant when the number of defenders increases.

Krichene, Walid, Balandat, Maximilian, Tomlin, Claire, Bayen, Alexandre.  2015.  The Hedge Algorithm on a Continuum. Proceedings of the 32nd International Conference on Machine Learning (ICML-15). :824-832.

We consider an online optimization problem on a compact subset S ⊂ R^n (not necessarily convex), in which a decision maker chooses, at each iteration t, a probability distribution x(t) over S, and seeks to minimize a cumulative expected loss ∑_{τ≤t} E_{x(τ)}[ℓ(τ)], where ℓ(t) is a Lipschitz loss function revealed at the end of iteration t. Building on previous work, we propose a generalized Hedge algorithm and show a bound on the regret when the losses are uniformly Lipschitz and S is uniformly fat (a weaker condition than convexity). Finally, we propose a generalization to the dual averaging method on the set of Lebesgue-continuous distributions over S.
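
For reference, the exponential-weights (Hedge) update being generalized takes the following form on a continuum; the notation is illustrative and follows the abstract's symbols, with η_t a step size and densities taken with respect to Lebesgue measure on S.

```latex
% Continuum Hedge / exponential-weights update (illustrative notation).
\[
  x^{(t+1)}(s) \;=\;
  \frac{x^{(t)}(s)\, e^{-\eta_t\, \ell^{(t)}(s)}}
       {\int_S x^{(t)}(u)\, e^{-\eta_t\, \ell^{(t)}(u)}\, \mathrm{d}u},
  \qquad s \in S .
\]
```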

Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Integrity Assurance in Resource-bounded Systems Through Stochastic Message Authentication. Proceedings of the 2015 Symposium and Bootcamp on the Science of Security. :1:1–1:12.

Assuring communication integrity is a central problem in security. However, overhead costs associated with cryptographic primitives used towards this end introduce significant practical implementation challenges for resource-bounded systems, such as cyber-physical systems. For example, many control systems are built on legacy components which are computationally limited but have strict timing constraints. If integrity protection is a binary decision, it may simply be infeasible to introduce into such systems; without it, however, an adversary can forge malicious messages, which can cause significant physical or financial harm. We propose a formal game-theoretic framework for optimal stochastic message authentication, providing provable integrity guarantees for resource-bounded systems based on an existing MAC scheme. We use our framework to investigate attacker deterrence, as well as optimal design of stochastic message authentication schemes when deterrence is impossible. Finally, we provide experimental results on the computational performance of our framework in practice.
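
As a rough illustration of stochastic authentication (a simplification of the idea, not the paper's game-theoretically optimal scheme), the sender attaches a MAC tag only with some probability p, and the receiver verifies whenever a tag is present:

```python
# Sketch of probabilistic message authentication (an illustration of the idea,
# not the paper's optimal scheme).
import hmac, hashlib, os, random

KEY = os.urandom(32)

def send(message: bytes, p: float = 0.3):
    """Attach an HMAC tag with probability p, saving computation on most sends."""
    if random.random() < p:
        tag = hmac.new(KEY, message, hashlib.sha256).digest()
        return message, tag
    return message, None

def receive(message: bytes, tag):
    """Accept untagged messages (the stochastic part); verify any tag present,
    so a forger is caught with probability roughly p per forged message."""
    if tag is None:
        return True
    return hmac.compare_digest(tag, hmac.new(KEY, message, hashlib.sha256).digest())

msg, tag = send(b"valve_setpoint=42")
print(receive(msg, tag))
```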

2016-04-08
Dahan, Mathieu, Amin, Saurabh.  2015.  Network Flow Routing under Strategic Link Disruptions. arXiv preprint arXiv:1512.09335.

This paper considers a 2-player strategic game for network routing under link disruptions. Player 1 (defender) routes flow through a network to maximize her value of effective flow while facing transportation costs. Player 2 (attacker) simultaneously disrupts one or more links to maximize her value of lost flow but also faces a cost of disrupting links. This game is strategically equivalent to a zero-sum game. Linear programming duality and the max-flow min-cut theorem are applied to obtain properties that are satisfied in any mixed Nash equilibrium. In any equilibrium, both players achieve identical payoffs. While the defender's expected transportation cost decreases in attacker's marginal value of lost flow, the attacker's expected cost of attack increases in defender's marginal value of effective flow. Interestingly, the expected amount of effective flow decreases in both these parameters. These results can be viewed as a generalization of the classical max-flow with minimum transportation cost problem to adversarial environments.
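
Since the equilibrium characterization leans on the max-flow min-cut theorem, a small sketch of computing both quantities on a toy network may help; networkx is assumed, and the graph and capacities are illustrative rather than taken from the paper.

```python
# Toy max-flow / min-cut computation of the kind underlying the analysis
# (networkx assumed; network and capacities are illustrative).
import networkx as nx

G = nx.DiGraph()
G.add_edge("s", "a", capacity=3.0)
G.add_edge("s", "b", capacity=2.0)
G.add_edge("a", "t", capacity=2.0)
G.add_edge("b", "t", capacity=3.0)
G.add_edge("a", "b", capacity=1.0)

flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
print("max flow =", flow_value, "= min cut =", cut_value)
```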

2016-04-12
Dong Jin, Illinois Institute of Technology, David Nicol, University of Illinois at Urbana-Champaign.  2015.  Parallel Simulation and Virtual-machine-based Emulation of Software-defined Network. ACM Transactions on Modeling and Computer Simulation. 26(1)

The emerging software-defined networking (SDN) technology decouples the control plane from the data plane in a computer network with open and standardized interfaces, and hence opens up the network designers’ options and ability to innovate. The wide adoption of SDN in industry has motivated the development of large-scale, high-fidelity testbeds for evaluation of systems that incorporate SDN. In this article, we develop a framework to support OpenFlow-based SDN simulation and distributed emulation, by leveraging our prior work on a hybrid network testbed with a parallel network simulator and a virtual-machine-based emulation system. We show how to exploit typical SDN controller behaviors to handle performance issues caused by the centralized controller in parallel discrete-event simulation. In particular, we develop an asynchronous synchronization algorithm for passive SDN controllers and design a two-level architecture for active SDN controllers. We evaluate the system performance, showing good scalability. Finally, we present a case study, using the testbed, to evaluate network verification applications in an SDN-based data center network.

2015-11-12
Xia, Weiyi, Kantarcioglu, Murat, Wan, Zhiyu, Heatherly, Raymond, Vorobeychik, Yevgeniy, Malin, Bradley.  2015.  Process-Driven Data Privacy. Proceedings of the 24th ACM International on Conference on Information and Knowledge Management. :1021–1030.

The quantity of personal data gathered by service providers via our daily activities continues to grow at a rapid pace. The sharing, and the subsequent analysis of, such data can support a wide range of activities, but concerns around privacy often prompt an organization to transform the data to meet certain protection models (e.g., k-anonymity or ε-differential privacy). These models, however, are based on simplistic adversarial frameworks, which can lead to both under- and over-protection. For instance, such models often assume that an adversary attacks a protected record exactly once. We introduce a principled approach to explicitly model the attack process as a series of steps. Specifically, we engineer a factored Markov decision process (FMDP) to optimally plan an attack from the adversary's perspective and assess the privacy risk accordingly. The FMDP captures the uncertainty in the adversary's belief (e.g., the number of identified individuals that match the de-identified data) and enables the analysis of various real-world deterrence mechanisms beyond a traditional protection model, such as a penalty for committing an attack. We present an algorithm to solve the FMDP and illustrate its efficiency by simulating an attack on publicly accessible U.S. census records against a real identified resource of over 500,000 individuals in a voter registry. Our results demonstrate that while traditional privacy models commonly expect an adversary to attack exactly once per record, an optimal attack in our model may involve exploiting none, one, or more individuals in the pool of candidates, depending on context.
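
The planning viewpoint can be illustrated with a deliberately tiny toy model, far simpler than the paper's FMDP: an adversary facing k matching candidates may stop, guess one candidate (risking a penalty), or spend effort narrowing the pool first. States, actions, and payoffs below are invented for illustration only.

```python
# Tiny toy model of planning a re-identification attack as a decision process
# (states, actions, and payoffs are illustrative, not the paper's FMDP).
GAIN, PENALTY, PROBE_COST = 100.0, 50.0, 5.0

def value(k, memo={}):
    """Optimal expected payoff when k candidates match the target record."""
    if k in memo:
        return memo[k]
    stop = 0.0
    attack = (1.0 / k) * GAIN - (1.0 - 1.0 / k) * PENALTY   # guess one candidate
    probe = -PROBE_COST + value(max(1, k // 2)) if k > 1 else attack
    memo[k] = max(stop, attack, probe)
    return memo[k]

for k in (1, 2, 8, 64):
    print(f"{k:3d} candidates -> optimal expected payoff {value(k):7.2f}")
```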

2016-04-08
Abbas, Waseem, Laszka, Aron, Vorobeychik, Yevgeniy, Koutsoukos, Xenofon.  2015.  Scheduling Intrusion Detection Systems in Resource-Bounded Cyber-Physical Systems. Proceedings of the First ACM Workshop on Cyber-Physical Systems-Security and/or PrivaCy. :55–66.

In order to be resilient to attacks, a cyber-physical system (CPS) must be able to detect attacks before they can cause significant damage. To achieve this, intrusion detection systems (IDS) may be deployed, which can detect attacks and alert human operators, who can then intervene. However, the resource-constrained nature of many CPS poses a challenge, since reliable IDS can be computationally expensive. Consequently, computational nodes may not be able to perform intrusion detection continuously, which means that we have to devise a schedule for performing intrusion detection. While a uniformly random schedule may be optimal in a purely cyber system, an optimal schedule for protecting CPS must also take into account the physical properties of the system, since the set of adversarial actions and their consequences depend on the physical system. Here, in the context of water distribution networks, we study IDS scheduling problems in two settings and under constraints on the available battery supplies. In the first problem, the objective is to design, for a given duration of time T, scheduling schemes for IDS so that the probability of detecting an attack is maximized within that duration. We propose efficient heuristic algorithms for this general problem and evaluate them on various networks. In the second problem, our objective is to design scheduling schemes for IDS so that the overall lifetime of the network is maximized while ensuring that an intruder attack is always detected. Various strategies to deal with this problem are presented and evaluated for various networks.

Mathieu Dahan, Saurabh Amin.  2015.  Security Games in Network Flow Problems. CoRR. abs/1512.09335

This paper considers a 2-player strategic game for network routing under link disruptions. Player 1 (defender) routes flow through a network to maximize her value of effective flow while facing transportation costs. Player 2 (attacker) simultaneously disrupts one or more links to maximize her value of lost flow but also faces a cost of disrupting links. This game is strategically equivalent to a zero-sum game. Linear programming duality and the max-flow min-cut theorem are applied to obtain properties that are satisfied in any mixed Nash equilibrium. In any equilibrium, both players achieve identical payoffs. While the defender's expected transportation cost decreases in attacker's marginal value of lost flow, the attacker's expected cost of attack increases in defender's marginal value of effective flow. Interestingly, the expected amount of effective flow decreases in both these parameters. These results can be viewed as a generalization of the classical max-flow with minimum transportation cost problem to adversarial environments.

2016-04-07
Gan, Jiarui, An, Bo, Vorobeychik, Yevgeniy.  2015.  Security Games with Protection Externalities. Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. :914–920.

Stackelberg security games have been widely deployed in recent years to schedule security resources. An assumption in most existing security game models is that one security resource assigned to a target only protects that target. However, in many important real-world security scenarios, when a resource is assigned to a target, it exhibits protection externalities: that is, it also protects other "neighbouring" targets. We investigate such Security Games with Protection Externalities (SPEs). First, we demonstrate that computing a strong Stackelberg equilibrium for an SPE is NP-hard, in contrast with traditional Stackelberg security games, which can be solved in polynomial time. On the positive side, we propose a novel column-generation-based approach, CLASPE, to solve SPEs. CLASPE features the following novelties: 1) a novel mixed-integer linear programming formulation for the slave problem; 2) an extended greedy approach with a constant-factor approximation ratio to speed up the slave problem; and 3) a linear-scale linear programming that efficiently calculates the upper bounds of target-defined subproblems for pruning. Our experimental evaluation demonstrates that CLASPE enables us to scale to realistic-sized SPE problem instances.

2016-04-11
Lina Sela Perelman, Waseem Abbas, Xenofon D. Koutsoukos, Saurabh Amin.  2015.  Sensor placement for fault location identification in water networks: A minimum test cover approach. CoRR. abs/1507.07134

This paper focuses on the optimal sensor placement problem for the identification of pipe failure locations in large-scale urban water systems. The problem involves selecting the minimum number of sensors such that every pipe failure can be uniquely localized. This problem can be viewed as a minimum test cover (MTC) problem, which is NP-hard. We consider two approaches to obtain approximate solutions to this problem. In the first approach, we transform the MTC problem to a minimum set cover (MSC) problem and use the greedy algorithm that exploits the submodularity property of the MSC problem to compute the solution to the MTC problem. In the second approach, we develop a new augmented greedy algorithm for solving the MTC problem. This approach does not require the transformation of the MTC to MSC. Our augmented greedy algorithm provides a significant computational improvement while guaranteeing the same approximation ratio as the first approach. We propose several metrics to evaluate the performance of the sensor placement designs. Finally, we present detailed computational experiments for a number of real water distribution networks.
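
The second stage of the first approach, running the standard greedy algorithm on the resulting set cover instance, can be sketched as follows; the encoding of sensors and covered elements is illustrative, whereas in the paper the elements a sensor "covers" are the pairs of failure events it can distinguish.

```python
# Greedy minimum set cover sketch (illustrative encoding of sensors and the
# failure-event pairs each one can distinguish).
def greedy_set_cover(universe, candidate_sets):
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the candidate covering the most still-uncovered elements.
        best = max(candidate_sets, key=lambda name: len(candidate_sets[name] & uncovered))
        if not candidate_sets[best] & uncovered:
            raise ValueError("remaining elements cannot be covered")
        chosen.append(best)
        uncovered -= candidate_sets[best]
    return chosen

sensors = {
    "s1": {("p1", "p2"), ("p1", "p3")},
    "s2": {("p2", "p3"), ("p1", "p3")},
    "s3": {("p1", "p2"), ("p2", "p3"), ("p3", "p4")},
}
pairs = set().union(*sensors.values())
print(greedy_set_cover(pairs, sensors))
```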

2015-11-12
Li, Bo, Vorobeychik, Yevgeniy, Li, Muqun, Malin, Bradley.  2015.  Iterative Classification for Sanitizing Large-Scale Datasets. SIAM International Conference on Data Mining.

Cheap ubiquitous computing enables the collection of massive amounts of personal data in a wide variety of domains. Many organizations aim to share such data while obscuring features that could disclose identities or other sensitive information. Much of the data now collected exhibits weak structure (e.g., natural language text), and machine learning approaches have been developed to identify and remove sensitive entities in such data. Learning-based approaches are never perfect, and relying upon them to sanitize data can leak sensitive information as a consequence. However, a small amount of risk is permissible in practice, and, thus, our goal is to balance the value of data published and the risk of an adversary discovering leaked sensitive information. We model data sanitization as a game between 1) a publisher who chooses a set of classifiers to apply to data and publishes only instances predicted to be non-sensitive and 2) an attacker who combines machine learning and manual inspection to uncover leaked sensitive entities (e.g., personal names). We introduce an iterative greedy algorithm for the publisher that provably executes no more than a linear number of iterations, and ensures a low utility for a resource-limited adversary. Moreover, using several real-world natural language corpora, we illustrate that our greedy algorithm leaves virtually no automatically identifiable sensitive instances for a state-of-the-art learning algorithm, while sharing over 93% of the original data, and completes after at most 5 iterations.
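
The publisher's iterative loop can be roughly sketched as: train a classifier on the currently withheld (sensitive) instances, withhold anything newly predicted sensitive, and repeat until nothing new is flagged. The sketch below, with scikit-learn assumed and placeholder documents and seed labels, captures that loop rather than the paper's exact algorithm or its guarantees.

```python
# Rough sketch of the publisher's iterative redaction loop (not the paper's
# exact algorithm). Documents and seed labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def iterative_sanitize(documents, seed_sensitive, max_iter=5):
    """Iteratively withhold documents that a classifier predicts to be sensitive."""
    withheld = set(seed_sensitive)
    vec = TfidfVectorizer()
    X = vec.fit_transform(documents)
    for _ in range(max_iter):
        y = [1 if i in withheld else 0 for i in range(len(documents))]
        if len(set(y)) < 2:            # need both classes to train a classifier
            break
        clf = LogisticRegression().fit(X, y)
        newly_flagged = {i for i in range(len(documents))
                         if i not in withheld and clf.predict(X[i])[0] == 1}
        if not newly_flagged:          # fixed point: nothing new to withhold
            break
        withheld |= newly_flagged
    return [d for i, d in enumerate(documents) if i not in withheld]

docs = ["patient John Smith admitted", "weather was sunny",
        "contact Jane Doe for records", "the meeting is at noon"]
print(iterative_sanitize(docs, seed_sensitive={0}))
```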