Biblio

Filters: Keyword is Scalability
2017-05-22
Saab, Farah, Elhajj, Imad, Kayssi, Ayman, Chehab, Ali.  2016.  A Crowdsourcing Game-theoretic Intrusion Detection and Rating System. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :622–625.

One of the main concerns for smartphone users is the quality of apps they download. Before installing any app from the market, users first check its rating and reviews. However, these ratings are not computed by experts and most times are not associated with malicious behavior. In this work, we present an IDS/rating system based on a game theoretic model with crowdsourcing. Our results show that, with minor control over the error in categorizing users and the fraction of experts in the crowd, our system provides proper ratings while flagging all malicious apps.

Xu, Haifeng.  2016.  The Mysteries of Security Games: Equilibrium Computation Becomes Combinatorial Algorithm Design. Proceedings of the 2016 ACM Conference on Economics and Computation. :497–514.

The security game is a basic model for resource allocation in adversarial environments. Here there are two players, a defender and an attacker. The defender wants to allocate her limited resources to defend critical targets and the attacker seeks his most favorable target to attack. In the past decade, there has been a surge of research interest in analyzing and solving security games that are motivated by applications from various domains. Remarkably, these models and their game-theoretic solutions have led to real-world deployments in use by major security agencies like the LAX airport, the US Coast Guard and Federal Air Marshal Service, as well as non-governmental organizations. Across all this research and these applications, equilibrium computation serves as a foundation. This paper examines security games from a theoretical perspective and provides a unified view of various security game models. In particular, each security game can be characterized by a set system E which consists of the defender's pure strategies; the defender's best response problem can be viewed as a combinatorial optimization problem over E. Our framework captures most of the basic security game models in the literature, including all the deployed systems; the set system E arising from various domains encodes standard combinatorial problems like bipartite matching, maximum coverage, min-cost flow, packing problems, etc. Our main result shows that equilibrium computation in security games is essentially a combinatorial problem. In particular, we prove that, for any set system E, the following problems can be reduced to each other in polynomial time: (0) combinatorial optimization over E; (1) computing the minimax equilibrium for zero-sum security games over E; (2) computing the strong Stackelberg equilibrium for security games over E; (3) computing the best or worst (for the defender) Nash equilibrium for security games over E. Therefore, the hardness [polynomial solvability] of any of these problems implies the hardness [polynomial solvability] of all the others. Here, by "games over E" we mean the class of security games with arbitrary payoff structures, but a fixed set E of defender pure strategies. This shows that the complexity of a security game is essentially determined by the set system E. We view drawing these connections as an important conceptual contribution of this paper.

Wright, Mason, Venkatesan, Sridhar, Albanese, Massimiliano, Wellman, Michael P.  2016.  Moving Target Defense Against DDoS Attacks: An Empirical Game-Theoretic Analysis. Proceedings of the 2016 ACM Workshop on Moving Target Defense. :93–104.

Distributed denial-of-service attacks are an increasing problem facing web applications, for which many defense techniques have been proposed, including several moving-target strategies. These strategies typically work by relocating targeted services over time, increasing uncertainty for the attacker, while trying not to disrupt legitimate users or incur excessive costs. Prior work has not shown, however, whether and how a rational defender would choose a moving-target method against an adaptive attacker, and under what conditions. We formulate a denial-of-service scenario as a two-player game, and solve a restricted-strategy version of the game using the methods of empirical game-theoretic analysis. Using agent-based simulation, we evaluate the performance of strategies from prior literature under a variety of attacks and environmental conditions. We find evidence for the strategic stability of various proposed strategies, such as proactive server movement, delayed attack timing, and suspected insider blocking, along with guidelines for when each is likely to be most effective.

Zhu, Xue, Sun, Yuqing.  2016.  Differential Privacy for Collaborative Filtering Recommender Algorithm. Proceedings of the 2016 ACM on International Workshop on Security And Privacy Analytics. :9–16.

Collaborative filtering plays an essential role in a recommender system, which recommends a list of items to a user by learning behavior patterns from the user rating matrix. However, if an attacker has some auxiliary knowledge about a user's purchase history, he/she can infer more information about this user. This poses a great threat to user privacy. Some methods adopt differential privacy algorithms in collaborative filtering by adding noise to the rating matrix. Although they provide theoretically private results, the influence on recommendation accuracy is not discussed. In this paper, we solve the privacy problem in recommender systems in a different way, by applying the differential privacy method within the recommendation procedure. We design two differentially private recommender algorithms with sampling, named Differentially Private Item Based Recommendation with sampling (DP-IR for short) and Differentially Private User Based Recommendation with sampling (DP-UR for short). Both algorithms are based on the exponential mechanism with a carefully designed quality function. Theoretical analyses of the privacy of these algorithms are presented. We also investigate the accuracy of the proposed method and give theoretical results. Experiments are performed on real datasets to verify our methods.
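The exponential mechanism named in this abstract can be sketched generically as follows. This is a minimal illustration of the standard mechanism, not the DP-IR or DP-UR algorithms themselves; the function name, the `quality` callback, and the `sensitivity` parameter are assumptions made for the example, and the paper's own quality function is not reproduced here.

```python
import math
import random


def exponential_mechanism(candidates, quality, epsilon, sensitivity, rng=random):
    """Sample one candidate with probability proportional to
    exp(epsilon * quality(c) / (2 * sensitivity)).

    `quality` and `sensitivity` are hypothetical placeholders: the caller
    must supply a score function and a bound on how much one user's data
    can change any score.
    """
    scores = [quality(c) for c in candidates]
    max_score = max(scores)  # shift by the max so exp() cannot overflow
    weights = [
        math.exp(epsilon * (s - max_score) / (2.0 * sensitivity))
        for s in scores
    ]
    # random.choices draws one item according to the (unnormalized) weights.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

A small epsilon makes the draw close to uniform (more privacy, less accuracy), while a large epsilon concentrates the draw on the highest-scoring candidate.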

Barthe, Gilles, Fong, Noémie, Gaboardi, Marco, Grégoire, Benjamin, Hsu, Justin, Strub, Pierre-Yves.  2016.  Advanced Probabilistic Couplings for Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :55–67.

Differential privacy is a promising formal approach to data privacy, which provides a quantitative bound on the privacy cost of an algorithm that operates on sensitive information. Several tools have been developed for the formal verification of differentially private algorithms, including program logics and type systems. However, these tools do not capture fundamental techniques that have emerged in recent years, and cannot be used for reasoning about cutting-edge differentially private algorithms. Existing techniques fail to handle three broad classes of algorithms: 1) algorithms where privacy depends on accuracy guarantees, 2) algorithms that are analyzed with the advanced composition theorem, which shows slower growth in the privacy cost, 3) algorithms that interactively accept adaptive inputs. We address these limitations with a new formalism extending apRHL, a relational program logic that has been used for proving differential privacy of non-interactive algorithms, and incorporating aHL, a (non-relational) program logic for accuracy properties. We illustrate our approach through a single running example, which exemplifies the three classes of algorithms and explores new variants of the Sparse Vector technique, a well-studied algorithm from the privacy literature. We implement our logic in EasyCrypt, and formally verify privacy. We also introduce a novel coupling technique called optimal subset coupling that may be of independent interest.

Day, Wei-Yen, Li, Ninghui, Lyu, Min.  2016.  Publishing Graph Degree Distribution with Node Differential Privacy. Proceedings of the 2016 International Conference on Management of Data. :123–138.

Graph data publishing under node-differential privacy (node-DP) is challenging due to the huge sensitivity of queries. However, since a node in graph data oftentimes represents a person, node-DP is necessary to achieve personal data protection. In this paper, we investigate the problem of publishing the degree distribution of a graph under node-DP by exploring the projection approach to reduce the sensitivity. We propose two approaches based on aggregation and cumulative histogram to publish the degree distribution. The experiments demonstrate that our approaches greatly reduce the error of approximating the true degree distribution and have significant improvement over existing works. We also present the introspective analysis for understanding the factors of publishing the degree distribution with node-DP.

Nguyen, Hiep H., Imine, Abdessamad, Rusinowitch, Michaël.  2016.  Detecting Communities Under Differential Privacy. Proceedings of the 2016 ACM on Workshop on Privacy in the Electronic Society. :83–93.

Complex networks usually expose community structure with groups of nodes sharing many links with the other nodes in the same group and relatively few with the nodes of the rest. This feature captures valuable information about the organization and even the evolution of the network. Over the last decade, a great number of algorithms for community detection have been proposed to deal with the increasingly complex networks. However, the problem of doing this in a private manner is rarely considered. In this paper, we solve this problem under differential privacy, a prominent privacy concept for releasing private data. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation and algorithm perturbation. We choose Louvain method as the back-end community detection for input perturbation schemes and propose the method LouvainDP which runs Louvain algorithm on a noisy super-graph. For algorithm perturbation, we design ModDivisive using exponential mechanism with the modularity as the score. We have thoroughly evaluated our techniques on real graphs of different sizes and verified that ModDivisive steadily gives the best modularity and avg.F1Score on large graphs while LouvainDP outperforms the remaining input perturbation competitors in certain settings.

Qin, Zhan, Yang, Yin, Yu, Ting, Khalil, Issa, Xiao, Xiaokui, Ren, Kui.  2016.  Heavy Hitter Estimation over Set-Valued Data with Local Differential Privacy. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :192–203.

In local differential privacy (LDP), each user perturbs her data locally before sending the noisy data to a data collector. The latter then analyzes the data to obtain useful statistics. Unlike the setting of centralized differential privacy, in LDP the data collector never gains access to the exact values of sensitive data, which protects not only the privacy of data contributors but also the collector itself against the risk of potential data leakage. Existing LDP solutions in the literature are mostly limited to the case that each user possesses a tuple of numeric or categorical values, and the data collector computes basic statistics such as counts or mean values. To the best of our knowledge, no existing work tackles more complex data mining tasks such as heavy hitter discovery over set-valued data. In this paper, we present a systematic study of heavy hitter mining under LDP. We first review existing solutions, extend them to heavy hitter estimation, and explain why their effectiveness is limited. We then propose LDPMiner, a two-phase mechanism for obtaining accurate heavy hitters with LDP. The main idea is to first gather a candidate set of heavy hitters using a portion of the privacy budget, and focus the remaining budget on refining the candidate set in a second phase, which is much more efficient budget-wise than obtaining the heavy hitters directly from the whole dataset. We provide both in-depth theoretical analysis and extensive experiments to compare LDPMiner against adaptations of previous solutions. The results show that LDPMiner significantly improves over existing methods. More importantly, LDPMiner successfully identifies the majority of true heavy hitters in practical settings.
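The local-perturbation setting described above can be illustrated with binary randomized response, a classic LDP primitive. This is a swapped-in textbook technique chosen for brevity, not LDPMiner or any mechanism from the paper; both function names are assumptions made for the example.

```python
import math
import random


def randomized_response(bit, epsilon, rng=random):
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it.

    Each user runs this locally, so the collector only ever sees noisy bits.
    """
    p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    return bit if rng.random() < p_truth else 1 - bit


def estimate_frequency(reports, epsilon):
    """Unbiased estimate of the fraction of users whose true bit is 1.

    E[observed] = f * p + (1 - f) * (1 - p), solved here for f.
    """
    p = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)
```

The estimate becomes noisier as epsilon shrinks, which is one intuition for why spending the privacy budget carefully across phases matters.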

To, Hien, Nguyen, Kien, Shahabi, Cyrus.  2016.  Differentially Private Publication of Location Entropy. Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. :35:1–35:10.

Location entropy (LE) is a popular metric for measuring the popularity of various locations (e.g., points-of-interest). Unlike other metrics computed from only the number of (unique) visits to a location, namely frequency, LE also captures the diversity of the users' visits, and is thus more accurate than other metrics. Current solutions for computing LE require full access to the past visits of users to locations, which poses privacy threats. This paper discusses, for the first time, the problem of perturbing location entropy for a set of locations according to differential privacy. The problem is challenging because removing a single user from the dataset will impact multiple records of the database; i.e., all the visits made by that user to various locations. Towards this end, we first derive non-trivial, tight bounds for both local and global sensitivity of LE, and show that to satisfy ε-differential privacy, a large amount of noise must be introduced, rendering the published results useless. Hence, we propose a thresholding technique to limit the number of users' visits, which significantly reduces the perturbation error but introduces an approximation error. To achieve better utility, we extend the technique by adopting two weaker notions of privacy: smooth sensitivity (slightly weaker) and crowd-blending (strictly weaker). Extensive experiments on synthetic and real-world datasets show that our proposed techniques preserve original data distribution without compromising location privacy.
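For readers unfamiliar with the metric, the non-private computation of location entropy can be sketched as below, assuming the usual definition: the Shannon entropy of the distribution of a location's visits across users. The function name and the input layout (one user id per visit) are assumptions for the example, and no privacy noise is added here.

```python
import math
from collections import Counter


def location_entropy(visits):
    """Shannon entropy (natural log) of one location's visits across users.

    `visits` is an iterable of user ids, one entry per visit to the location.
    """
    counts = Counter(visits)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p is the fraction of this location's visits made by a given user.
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

Four visits by one user and four visits by four different users have the same frequency, but the second case has higher entropy, which is the diversity effect the abstract refers to.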

Cuff, Paul, Yu, Lanqing.  2016.  Differential Privacy As a Mutual Information Constraint. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :43–54.

Differential privacy is a precise mathematical constraint meant to ensure privacy of individual pieces of information in a database even while queries are being answered about the aggregate. Intuitively, one must come to terms with what differential privacy does and does not guarantee. For example, the definition prevents a strong adversary who knows all but one entry in the database from further inferring about the last one. This strong adversary assumption can be overlooked, resulting in misinterpretation of the privacy guarantee of differential privacy. Herein we give an equivalent definition of privacy using mutual information that makes plain some of the subtleties of differential privacy. The mutual-information differential privacy is in fact sandwiched between ε-differential privacy and (ε,δ)-differential privacy in terms of its strength. In contrast to previous works using unconditional mutual information, differential privacy is fundamentally related to conditional mutual information, accompanied by a maximization over the database distribution. The conceptual advantage of using mutual information, aside from yielding a simpler and more intuitive definition of differential privacy, is that its properties are well understood. Several properties of differential privacy are easily verified for the mutual information alternative, such as composition theorems.
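For reference, the two standard notions that bracket the mutual-information definition can be stated as follows. This is the usual textbook formulation, with \mathcal{M} a randomized mechanism, D and D' any two databases differing in one entry, and S any measurable set of outputs; the notation is an assumption and may differ from the paper's.

```latex
% (\varepsilon,\delta)-differential privacy; setting \delta = 0 gives \varepsilon-differential privacy
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S] + \delta
```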

Hay, Michael, Machanavajjhala, Ashwin, Miklau, Gerome, Chen, Yan, Zhang, Dan.  2016.  Principled Evaluation of Differentially Private Algorithms Using DPBench. Proceedings of the 2016 International Conference on Management of Data. :139–154.

Differential privacy has become the dominant standard in the research community for strong privacy protection. There has been a flood of research into query answering algorithms that meet this standard. Algorithms are becoming increasingly complex, and in particular, the performance of many emerging algorithms is data dependent, meaning the distribution of the noise added to query answers may change depending on the input data. Theoretical analysis typically only considers the worst case, making empirical study of average case performance increasingly important. In this paper we propose a set of evaluation principles which we argue are essential for sound evaluation. Based on these principles we propose DPBench, a novel evaluation framework for standardized evaluation of privacy algorithms. We then apply our benchmark to evaluate algorithms for answering 1- and 2-dimensional range queries. The result is a thorough empirical study of 15 published algorithms on a total of 27 datasets that offers new insights into algorithm behavior (in particular, the influence of dataset scale and shape) and a more complete characterization of the state of the art. Our methodology is able to resolve inconsistencies in prior empirical studies and place algorithm performance in context through comparison to simple baselines. Finally, we pose open research questions which we hope will guide future algorithm design.

Krishnan, Sanjay, Wang, Jiannan, Franklin, Michael J., Goldberg, Ken, Kraska, Tim.  2016.  PrivateClean: Data Cleaning and Differential Privacy. Proceedings of the 2016 International Conference on Management of Data. :937–951.

Recent advances in differential privacy make it possible to guarantee user privacy while preserving the main characteristics of the data. However, most differential privacy mechanisms assume that the underlying dataset is clean. This paper explores the link between data cleaning and differential privacy in a framework we call PrivateClean. PrivateClean includes a technique for creating private datasets of numerical and discrete-valued attributes, a formalism for privacy-preserving data cleaning, and techniques for answering sum, count, and avg queries after cleaning. We show: (1) how the degree of privacy affects subsequent aggregate query accuracy, (2) how privacy potentially amplifies certain types of errors in a dataset, and (3) how this analysis can be used to tune the degree of privacy. The key insight is to maintain a bipartite graph relating dirty values to clean values and use this graph to estimate biases due to the interaction between cleaning and privacy. We validate these results on four datasets with a variety of well-studied cleaning techniques including using functional dependencies, outlier filtering, and resolving inconsistent attributes.

2017-05-19
Nahshon, Yoav, Peterfreund, Liat, Vansummeren, Stijn.  2016.  Incorporating Information Extraction in the Relational Database Model. Proceedings of the 19th International Workshop on Web and Databases. :6:1–6:7.

Modern information extraction pipelines are typically constructed by (1) loading textual data from a database into a special-purpose application, (2) applying a myriad of text-analytics functions to the text, which produce a structured relational table, and (3) storing this table in a database. Obviously, this approach can lead to laborious development processes, complex and tangled programs, and inefficient control flows. Towards solving these deficiencies, we embark on an effort to lay the foundations of a new generation of text-centric database management systems. Concretely, we extend the relational model by incorporating into it the theory of document spanners which provides the means and methods for the model to engage the Information Extraction (IE) tasks. This extended model, called Spannerlog, provides a novel declarative method for defining and manipulating textual data, which makes possible the automation of the typical work method described above. In addition to formally defining Spannerlog and illustrating its usefulness for IE tasks, we also report on initial results concerning its expressive power.

Rheinländer, Astrid, Lehmann, Mario, Kunkel, Anja, Meier, Jörg, Leser, Ulf.  2016.  Potential and Pitfalls of Domain-Specific Information Extraction at Web Scale. Proceedings of the 2016 International Conference on Management of Data. :759–771.

In many domains, a plethora of textual information is available on the web as news reports, blog posts, community portals, etc. Information extraction (IE) is the default technique to turn unstructured text into structured fact databases, but systematically applying IE techniques to web input requires highly complex systems, starting from focused crawlers over quality assurance methods to cope with the HTML input to long pipelines of natural language processing and IE algorithms. Although a number of tools for each of these steps exist, their seamless, flexible, and scalable combination into a web scale end-to-end text analytics system is still a true challenge. In this paper, we report our experiences from building such a system for comparing the "web view" on health related topics with that derived from a controlled scientific corpus, i.e., Medline. The system combines a focused crawler, applying shallow text analysis and classification to maintain focus, with a sophisticated text analytic engine inside the Big Data processing system Stratosphere. We describe a practical approach to seed generation which led us to crawl a corpus of ~1 TB of web pages highly enriched for the biomedical domain. Pages were run through a complex pipeline of best-of-breed tools for a multitude of necessary tasks, such as HTML repair, boilerplate detection, sentence detection, linguistic annotation, parsing, and eventually named entity recognition for several types of entities. Results are compared with those from running the same pipeline (without the web-related tasks) on a corpus of 24 million scientific abstracts and a third corpus made of ~250K scientific full texts. We evaluate scalability, quality, and robustness of the employed methods and tools. The focus of this paper is to provide a large, real-life use case to inspire future research into robust, easy-to-use, and scalable methods for domain-specific IE at web scale.

Sheeba, J. I., Devaneyan, S. Pradeep.  2016.  Recommendation of Keywords Using Swarm Intelligence Techniques. Proceedings of the International Conference on Informatics and Analytics. :8:1–8:5.

Text mining has emerged as an essential tool for revealing the hidden value in data. It is an emerging technique for companies around the world and is suitable for both large, enduring analyses and discrete investigations, since there is a need to track disrupting technologies, explore internal knowledge bases, or review enormous data sets. Most of the information produced from conversation transcripts is in an unstructured format. These data contain ambiguity, redundancy, duplications, typographical errors, and many more problems, so processing and analyzing them is a difficult task. However, several text mining techniques are available to extract keywords from these unstructured conversation transcripts. Keyword extraction is the process of identifying the most significant words in the context, which helps to make decisions much faster. The main objective of the proposed work is to extract keywords from meeting transcripts using Swarm Intelligence (SI) techniques. Here the Stochastic Diffusion Search (SDS) algorithm is used for keyword extraction and the Firefly algorithm is used for clustering. These techniques are applicable to an extensive range of optimization problems and produced better results when compared with the existing technique.

Menzies, Tim.  2016.  How Not to Do It: Anti-patterns for Data Science in Software Engineering. Proceedings of the 38th International Conference on Software Engineering Companion. :887–887.

Many books and papers describe how to do data science. While those texts are useful, it can also be important to reflect on anti-patterns, i.e., common classes of errors seen when large communities of researchers and commercial software engineers use, and misuse, data mining tools. This technical briefing will present those errors and show how to avoid them.

Francis, Leena Mary, Visalatchi, K. C., Sreenath, N.  2016.  End to End Text Recognition from Natural Scene. Proceedings of the International Conference on Informatics and Analytics. :44:1–44:5.

The web has been flooded with multimedia sources such as images, videos, animations, and audio, which has in turn led computer vision researchers to focus on extracting content from these sources. Scene text recognition involves two major steps, namely Text Localization and Text Recognition. This paper provides an end-to-end text recognition approach that extracts the characters alone from a complex natural scene. The various objects are localized using Maximally Stable Extremal Regions (MSER), and edges are identified using the Canny edge detection method. Binary classification is then done using the Connected-Component method, which separates the text and non-text objects. Finally, the stroke analysis method is applied to analyse the style of the character, leading to character recognition. The experimental results were obtained by testing the approach on the ICDAR2015 dataset, wherein text was recognized in most of the scene images with a good precision value.

Wang, Xiangru, Nourashrafeddin, Seyednaser, Milios, Evangelos.  2016.  Relaxing Orthogonality Assumption in Conceptual Text Document Similarity. Proceedings of the 2016 ACM Symposium on Document Engineering. :69–78.

By reflecting the degree of proximity or remoteness of documents, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent from each other and document similarity is computed based on lexical overlap. This assumption is also made in the bag of concepts representation of documents, where the space is formed by concepts. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
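The orthogonality assumption is easy to see in a plain bag-of-words cosine similarity. The sketch below is the traditional baseline only, not any of the five proposed measures; the function name and the whitespace tokenization are assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity of two documents under a bag-of-words model.

    Every distinct word is its own axis, so only exact lexical overlap
    contributes to the dot product.
    """
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under this measure, "car engine" and "automobile motor" score zero because they share no words, even though they are closely related in meaning.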

Lira, Wallace, Gama, Fernando, Barbosa, Hivana, Alves, Ronnie, de Souza, Cleidson.  2016.  VCloud: Adding Interactiveness to Word Clouds for Knowledge Exploration in Large Unstructured Texts. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :193–198.

The identification of relevant information in large text databases is a challenging task. One of the reasons is human beings' limitations in handling large volumes of data. A common solution for scavenging data from texts is the word cloud. A word cloud illustrates word usage in a document by resizing individual words proportionally to how frequently they appear. Even though word clouds are easy to understand, they are not particularly efficient, because they are static. In addition, the presented information lacks context, i.e., words are not explained and they may lead to radically erroneous interpretations. To tackle these problems we developed VCloud, a tool that allows the user to interact with word clouds, thereby allowing informative and interactive data exploration. Furthermore, our tool also allows one to compare two data sets presented as word clouds. We evaluated VCloud using real data about the evolution of gastritis research through the years. The papers indexed by Pubmed related to this medical context were selected for visualization and data analysis using VCloud. A domain expert explored these visualizations and was able to extract useful information from them. This illustrates how VCloud can be a valuable tool for visual text analytics.
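The static sizing step that a basic word cloud performs can be sketched as below. This is a generic illustration, not VCloud's implementation; the function name, the whitespace tokenization, and the font-size range are assumptions made for the example.

```python
from collections import Counter


def word_sizes(text, min_size=10.0, max_size=48.0):
    """Map each word to a font size that grows linearly with its frequency.

    The most frequent word gets `max_size`; rarer words are scaled down
    toward `min_size`. Both bounds are hypothetical display parameters.
    """
    counts = Counter(text.lower().split())
    if not counts:
        return {}
    top = max(counts.values())
    return {
        word: min_size + (max_size - min_size) * count / top
        for word, count in counts.items()
    }
```

The output is a fixed mapping with no context attached to any word, which is the limitation the abstract attributes to static word clouds.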

Hoque, Enamul.  2016.  Visual Text Analytics for Online Conversations: Design, Evaluation, and Applications. Companion Publication of the 21st International Conference on Intelligent User Interfaces. :122–125.

Analyzing and gaining insights from a large amount of textual conversations can be quite challenging for a user, especially when the discussions become very long. During my doctoral research, I have focused on integrating Information Visualization (InfoVis) with Natural Language Processing (NLP) techniques to better support the user's task of exploring and analyzing conversations. For this purpose, I have designed a visual text analytics system that supports the user's exploration, starting from a possibly large set of conversations, then narrowing down to a subset of conversations, and eventually drilling down to a set of comments of one conversation. While so far our approach has been evaluated mainly through lab studies, in my ongoing and future work I plan to evaluate it via online longitudinal studies.

Gupta, Dhruv.  2016.  Event Search and Analytics: Detecting Events in Semantically Annotated Corpora for Search & Analytics. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. :705–705.

In this article, I present the questions that I seek to answer in my PhD research. I propose to analyze natural language text with the help of semantic annotations and to mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; and iii. event analytics.

Hoque, Enamul, Carenini, Giuseppe.  2016.  MultiConVis: A Visual Text Analytics System for Exploring a Collection of Online Conversations. Proceedings of the 21st International Conference on Intelligent User Interfaces. :96–107.

Online conversations, such as blogs, provide a rich amount of information and opinions about popular queries. Given a query, traditional blog sites return a set of conversations often consisting of thousands of comments with complex thread structure. Since the interfaces of these blog sites do not provide any overview of the data, it becomes very difficult for the user to explore and analyze such a large amount of conversational data. In this paper, we present MultiConVis, a visual text analytics system designed to support the exploration of a collection of online conversations. Our system tightly integrates NLP techniques for topic modeling and sentiment analysis with information visualizations, by considering the unique characteristics of online conversations. The resulting interface supports the user's exploration, starting from a possibly large set of conversations, then narrowing down to a subset of conversations, and eventually drilling down to the set of comments of one conversation. Our evaluations through case studies with domain experts and a formal user study with regular blog readers illustrate the potential benefits of our approach, when compared to a traditional blog reading interface.

Bhatia, Jaspreet, Breaux, Travis D., Friedberg, Liora, Hibshi, Hanan, Smullen, Daniel.  2016.  Privacy Risk in Cybersecurity Data Sharing. Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security. :57–64.

As information systems become increasingly interdependent, there is an increased need to share cybersecurity data across government agencies and companies, and within and across industrial sectors. This sharing includes threat, vulnerability and incident reporting data, among other data. For cyberattacks that include sociotechnical vectors, such as phishing or watering hole attacks, this increased sharing could expose customer and employee personal data to increased privacy risk. In the US, privacy risk arises when the government voluntarily receives data from companies without meaningful consent from individuals, or without a lawful procedure that protects an individual's right to due process. In this paper, we describe a study to examine the trade-off between the need for potentially sensitive data, which we call incident data usage, and the perceived privacy risk of sharing that data with the government. The study comprises two parts: a data usage estimate built from a survey of 76 security professionals with a mean of eight years' experience; and a privacy risk estimate that measures privacy risk using an ordinal likelihood scale and nominal data types in factorial vignettes. The privacy risk estimate also factors in data purposes with different levels of societal benefit, including terrorism, imminent threat of death, economic harm, and loss of intellectual property. The results show which data types are high-usage, low-risk versus those that are low-usage, high-risk. We discuss the implications of these results and recommend future work to improve privacy when data must be shared despite the increased risk to privacy.

Ahmed, Irfan, Roussev, Vassil, Johnson, William, Senthivel, Saranyan, Sudhakaran, Sneha.  2016.  A SCADA System Testbed for Cybersecurity and Forensic Research and Pedagogy. Proceedings of the 2nd Annual Industrial Control System Security Workshop. :1–9.

This paper presents a supervisory control and data acquisition (SCADA) testbed recently built at the University of New Orleans. The testbed consists of models of three industrial physical processes: a gas pipeline, a power transmission and distribution system, and a wastewater treatment plant. These systems are fully functional and implemented at small scale. The testbed utilizes real-world industrial equipment such as transformers, programmable logic controllers (PLCs), and aerators, bringing it closer to modeling real-world SCADA systems. Sensors, actuators, and PLCs are deployed at each physical process system for local control and monitoring, and the PLCs are also connected to a computer running human-machine interface (HMI) software for monitoring the status of the physical processes. The testbed is a useful resource for cybersecurity research, forensic research, and education on different aspects of SCADA systems such as PLC programming, protocol analysis, and demonstration of cyber attacks.

Zhang, Sixuan, Yu, Liang, Wakefield, Robin L., Leidner, Dorothy E.  2016.  Friend or Foe: Cyberbullying in Social Network Sites. SIGMIS Database. 47:51–71.

As the use of social media technologies proliferates in organizations, it is important to understand the nefarious behaviors, such as cyberbullying, that may accompany such technology use and how to discourage these behaviors. We draw from neutralization theory and the criminological theory of general deterrence to develop and empirically test a research model to explain why cyberbullying may occur and how the behavior may be discouraged. We created a research model of three second-order formative constructs to examine their predictive influence on intentions to cyberbully. We used PLS-SEM to analyze the responses of 174 Facebook users in two different cyberbullying scenarios. Our model suggests that neutralization techniques enable cyberbullying behavior and that, while sanction certainty is an important deterrent, sanction severity appears ineffective. We discuss the theoretical and practical implications of our model and results.