Biblio

2016-12-06
Nariman Mirzaei, Joshua Garcia, Hamid Bagheri, Alireza Sadeghi, Sam Malek.  2016.  Reducing Combinatorics in GUI Testing of Android Applications. ICSE '16 Proceedings of the 38th International Conference on Software Engineering. :559-570.

The rising popularity of Android and the GUI-driven nature of its apps have motivated the need for applicable automated GUI testing techniques. Although exhaustive testing of all possible combinations is the ideal upper bound in combinatorial testing, it is often infeasible, due to the combinatorial explosion of test cases. This paper presents TrimDroid, a framework for GUI testing of Android apps that uses a novel strategy to generate tests in a combinatorial, yet scalable, fashion. It is backed with automated program analysis and formally rigorous test generation engines. TrimDroid relies on program analysis to extract formal specifications. These specifications express the app’s behavior (i.e., control flow between the various app screens) as well as the GUI elements and their dependencies. The dependencies among the GUI elements comprising the app are used to reduce the number of combinations with the help of a solver. Our experiments have corroborated TrimDroid’s ability to achieve coverage comparable to exhaustive GUI testing using significantly fewer test cases.
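
Because the savings come purely from the dependency structure, a minimal Python sketch (the widget names and dependency groups are invented for illustration; this is not TrimDroid's engine) shows the arithmetic: widgets proven independent are covered by separate, smaller cross-products instead of one exhaustive product.

    # Hypothetical sketch: exhaustive vs. dependency-aware GUI test generation.
    from itertools import product

    widgets = {
        "units":  ["metric", "imperial"],
        "gender": ["male", "female"],
        "height": ["150", "180", "210"],
        "weight": ["50", "80", "110"],
    }

    # Exhaustive: full cross-product of all widget values.
    exhaustive = list(product(*widgets.values()))              # 2*2*3*3 = 36 cases

    # Suppose analysis shows only {units, height, weight} interact, while
    # "gender" is independent: each dependency group is covered separately.
    groups = [["units", "height", "weight"], ["gender"]]
    reduced = []
    for group in groups:
        reduced.extend(product(*(widgets[w] for w in group)))  # 18 + 2 = 20 cases

    print(len(exhaustive), len(reduced))                       # 36 vs. 20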

2016-12-05
Hui Shen, Ram Krishnan, Rocky Slavin, Jianwei Niu.  2016.  Sequence Diagram Aided Privacy Policy Specification. IEEE Transactions on Dependable and Secure Computing. 13(3).

A fundamental problem in the specification of regulatory privacy policies such as the Health Insurance Portability and Accountability Act (HIPAA) in a computer system is to state the policies precisely, consistent with their high-level intuition. In this paper, we propose UML sequence diagrams as a practical means to graphically express privacy policies. A graphical representation allows decision-makers such as application domain experts and security architects to easily verify and confirm the expected behavior. Once intuitively confirmed, our work in this article introduces an algorithmic approach to formalizing the semantics of sequence diagrams in terms of linear temporal logic (LTL) templates. In all the templates, different semantic aspects are expressed as separate, yet simple LTL formulas that can be composed to define the complex semantics of sequence diagrams. The formalization enables us to leverage the analytical powers of automated decision procedures for LTL formulas to determine if a collection of sequence diagrams is consistent, independent, etc. and also to verify if a system design conforms to the privacy policies. We evaluate our approach by modeling and analyzing a substantial subset of HIPAA rules using sequence diagrams.
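
As a hedged illustration (simplified; the paper's actual templates are richer), the semantics of sequence-diagram messages can be captured by small, composable LTL formulas such as:

    G( send(A, B, m) -> F receive(B, m) )    -- a message sent from A to B is eventually received
    !receive(B, m2) U receive(B, m1)         -- ordering: m2 is not received before m1

Conjoining such per-aspect formulas yields the semantics of a whole diagram, which an off-the-shelf LTL decision procedure can then check for consistency or conformance.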

2016-06-19
Victor Heorhiadi, Shriram Rajagopalan, Hani Jamjoom, Michael K. Reiter, Vyas Sekar.  2016.  Gremlin: Systematic resilience testing of microservices. 36th IEEE International Conference on Distributed Computing Systems.

Modern Internet applications are being disaggregated into a microservice-based architecture, with services being updated and deployed hundreds of times a day. The accelerated software life cycle and heterogeneity of language runtimes in a single application necessitates a new approach for testing the resiliency of these applications in production infrastructures. We present Gremlin, a framework for systematically testing the failure-handling capabilities of microservices.  Gremlin is based on the observation that microservices are loosely coupled and thus rely on standard message-exchange patterns over the network. Gremlin allows the operator to easily design tests and executes them by manipulating inter-service messages at the network layer. We show how to use Gremlin to express common failure scenarios and how developers of an enterprise application were able to discover previously unknown bugs in their failure-handling code without modifying the application.
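
As a rough illustration of the message-manipulation idea (this is not Gremlin's actual API; the addresses and fault rates are invented), a test harness could interpose a small TCP proxy between two services and abort or delay a fraction of requests:

    # Illustrative fault-injecting proxy sketch, not Gremlin itself.
    import random, socket, threading, time

    LISTEN, UPSTREAM = ("127.0.0.1", 9000), ("127.0.0.1", 8080)
    ABORT_RATE, DELAY_RATE, DELAY_S = 0.1, 0.2, 3.0   # assumed fault parameters

    def handle(client):
        if random.random() < ABORT_RATE:
            client.close()                    # fault: abort, as if the callee crashed
            return
        if random.random() < DELAY_RATE:
            time.sleep(DELAY_S)               # fault: delay, as if the callee is overloaded
        upstream = socket.create_connection(UPSTREAM)
        upstream.sendall(client.recv(65536))  # single request/response for brevity
        client.sendall(upstream.recv(65536))
        client.close(); upstream.close()

    server = socket.socket()
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind(LISTEN); server.listen(5)
    while True:
        conn, _ = server.accept()
        threading.Thread(target=handle, args=(conn,)).start()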

2016-12-06
Hamid Bagheri, Alireza Sadeghi, Reyhaneh Jabbarvand, Sam Malek.  2016.  Practical, Formal Synthesis and Automatic Enforcement of Security Policies for Android. 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN).

As the dominant mobile computing platform, Android has become a prime target for cyber-security attacks. Many of these attacks are manifested at the application level, and through the exploitation of vulnerabilities in apps downloaded from the popular app stores. Increasingly, sophisticated attacks exploit the vulnerabilities in multiple installed apps, making it extremely difficult to foresee such attacks, as neither the app developers nor the store operators know a priori which apps will be installed together. This paper presents an approach that allows the end-users to safeguard a given bundle of apps installed on their device from such attacks. The approach, realized in a tool, called SEPAR, combines static analysis with lightweight formal methods to automatically infer security-relevant properties from a bundle of apps. It then uses a constraint solver to synthesize possible security exploits, from which fine-grained security policies are derived and automatically enforced to protect a given device. In our experiments with over 4,000 Android apps, SEPAR has proven to be highly effective at detecting previously unknown vulnerabilities as well as preventing their exploitation.

2016-04-11
Haining Chen, Omar Chowdhury, Ninghui Li, Warut Khern-Am-Nuai, Suresh Chari, Ian Molloy, Youngja Park.  2016.  Tri-Modularization of Firewall Policies. ACM Symposium on Access Control Models and Technologies (SACMAT).

Firewall policies are notorious for having misconfiguration errors that can defeat their intended purpose of protecting hosts in the network from malicious users. We believe this is because today's firewall policies are mostly monolithic. Inspired by ideas from modular programming and code refactoring, in this work we introduce three kinds of modules: primary, auxiliary, and template, which facilitate the refactoring of a firewall policy into smaller, reusable, comprehensible, and more manageable components. We present algorithms for generating each of the three modules for a given legacy firewall policy. We also develop ModFP, an automated tool for converting legacy firewall policies represented as access control lists to their modularized format. With the help of ModFP, when examining several real-world policies with sizes ranging from dozens to hundreds of rules, we were able to identify subtle errors.
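
A toy sketch of the refactoring idea (this is not ModFP; the rules and the grouping criterion are invented for illustration) groups a flat ACL into per-host modules so that related rules can be reviewed together:

    # Toy policy refactoring: group a flat ACL into per-destination modules.
    # Rules are illustrative (action, source, destination, port) tuples.
    from collections import defaultdict

    acl = [
        ("allow", "10.0.0.0/8", "web1", 443),
        ("deny",  "0.0.0.0/0",  "web1", 22),
        ("allow", "10.0.0.0/8", "db1",  5432),
        ("deny",  "0.0.0.0/0",  "db1",  5432),
    ]

    modules = defaultdict(list)
    for rule in acl:
        modules[rule[2]].append(rule)    # one "primary module" per protected host

    for host, rules in sorted(modules.items()):
        print(f"module {host}:")
        for r in rules:
            print("   ", r)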

2016-06-20
Mehdi Mashayekhi, Hongying Du, George F. List, Munindar P. Singh.  2016.  Silk: A Simulation Study of Regulating Open Normative Multiagent Systems. Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI). :1–7.

In a multiagent system, a (social) norm describes what the agents may expect from each other.  Norms promote autonomy (an agent need not comply with a norm) and heterogeneity (a norm describes interactions at a high level independent of implementation details). Researchers have studied norm emergence through social learning where the agents interact repeatedly in a graph structure.

In contrast, we consider norm emergence in an open system, where membership can change, and where no predetermined graph structure exists.  We propose Silk, a mechanism wherein a generator monitors interactions among member agents and recommends norms to help resolve conflicts.  Each member decides whether to accept or reject a recommended norm.  Upon exiting the system, a member passes its experience along to incoming members of the same type.  Thus, members develop norms in a hybrid manner to resolve conflicts.

We evaluate Silk via simulation in the traffic domain.  Our results show that social norms promoting conflict resolution emerge in both moderate and selfish societies via our hybrid mechanism.

2018-05-15
Jeremy Daily, Rose Gamble, Stephen Moffitt, Connor Raines, Paul Harris, Jannah Miran, Indrakshi Ray, Subhojeet Mukherjee, Hossein Shirazi, James Johnson.  2016.  Towards a Cyber Assurance Testbed for Heavy Vehicle Electronic Controls. SAE Int. J. Commer. Veh. 9:339-349.

Cyber assurance of heavy trucks is a major concern with new designs as well as with supporting legacy systems. Many cyber security experts and analysts are used to working with traditional information technology (IT) networks and are familiar with a set of technologies that may not be directly useful in the commercial vehicle sector. To help connect security researchers to heavy trucks, a remotely accessible testbed has been prototyped for experimentation with security methodologies and techniques to evaluate and improve on existing technologies, as well as developing domain-specific technologies. The testbed relies on embedded Linux-based node controllers that can simulate the sensor inputs to various heavy vehicle electronic control units (ECUs). The node controller also monitors and affects the flow of network information between the ECUs and the vehicle communications backbone. For example, a node controller acts as a clone that generates analog wheel speed sensor data while at the same time monitors or controls the network traffic on the J1939 and J1708 networks. The architecture and functions of the node controllers are detailed. Sample interaction with the testbed is illustrated, along with a discussion of the challenges of running remote experiments. Incorporating high fidelity hardware in the testbed enables security researchers to advance the state of the art in hardening heavy vehicle ECUs against cyber-attacks. How the testbed can be used for security research is presented along with an example of its use in evaluating seed/key exchange strength and in intrusion detection systems (IDSs).
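
For flavor, here is a hedged sketch of the node-controller pattern using the python-can library; the CAN channel, 29-bit identifier, byte layout, and SPN scaling are illustrative assumptions, not the testbed's actual configuration.

    # Hedged sketch: clone a wheel speed signal while monitoring the backbone.
    # Assumes a SocketCAN interface "can0"; IDs and scaling are illustrative.
    import can, time

    bus = can.interface.Bus(channel="can0", bustype="socketcan")

    def send_wheel_speed(speed_kph):
        # J1939 CCVS-style frame; 1/256 km/h-per-bit scaling, assumed here.
        raw = int(speed_kph * 256) & 0xFFFF
        data = [0xFF, raw & 0xFF, (raw >> 8) & 0xFF] + [0xFF] * 5
        msg = can.Message(arbitration_id=0x18FEF100, data=data,
                          is_extended_id=True)
        bus.send(msg)

    while True:
        send_wheel_speed(72.0)           # clone a 72 km/h wheel speed signal
        frame = bus.recv(timeout=0.1)    # concurrently monitor backbone traffic
        if frame is not None:
            print(hex(frame.arbitration_id), frame.data.hex())
        time.sleep(0.1)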

2016-12-06
Hanan Hibshi, Travis Breaux, Maria Riaz, Laurie Williams.  2016.  A grounded analysis of experts’ decision-making during security assessments. Journal of Cybersecurity, Advance Access.

Security analysis requires specialized knowledge to align threats and vulnerabilities in information technology. To identify mitigations, analysts need to understand how threats, vulnerabilities, and mitigations are composed together to yield security requirements. Despite abundant guidance in the form of checklists and controls about how to secure systems, evidence suggests that security experts do not apply these checklists. Instead, they rely on their prior knowledge and experience to identify security vulnerabilities. To better understand the different effects of checklists, design analysis, and expertise, we conducted a series of interviews to capture and encode the decision-making process of security experts and novices during three security analysis exercises. Participants were asked to analyze three kinds of artifacts: source code, data flow diagrams, and network diagrams, for vulnerabilities, and then to apply a requirements checklist to demonstrate their ability to mitigate vulnerabilities. We framed our study using Situation Awareness, which is a theory about human perception that was used to elicit interviewee responses. The responses were then analyzed using coding theory and grounded analysis. Our results include decision-making patterns that characterize how analysts perceive, comprehend, and project future threats against a system, and how these patterns relate to selecting security mitigations. Based on this analysis, we discovered new theory to measure how security experts and novices apply attack models and how structured and unstructured analysis enables increasing security requirements coverage. We highlight the role of expertise level and requirements composition in affecting security decision-making and we discuss how our method produced new hypotheses about security analysis and decision-making.

2016-12-07
Jaspreet Bhatia, Travis Breaux, Liora Friedberg, Hanan Hibshi, Daniel Smullen.  2016.  Privacy Risk in Cybersecurity Data Sharing. WISCS '16 Proceedings of the 2016 ACM on Workshop on Information Sharing and Collaborative Security.

As information systems become increasingly interdependent, there is an increased need to share cybersecurity data across government agencies and companies, and within and across industrial sectors. This sharing includes threat, vulnerability and incident reporting data, among other data. For cyberattacks that include sociotechnical vectors, such as phishing or watering hole attacks, this increased sharing could expose customer and employee personal data to increased privacy risk. In the US, privacy risk arises when the government voluntarily receives data from companies without meaningful consent from individuals, or without a lawful procedure that protects an individual's right to due process. In this paper, we describe a study to examine the trade-off between the need for potentially sensitive data, which we call incident data usage, and the perceived privacy risk of sharing that data with the government. The study comprises two parts: a data usage estimate built from a survey of 76 security professionals with mean eight years' experience; and a privacy risk estimate that measures privacy risk using an ordinal likelihood scale and nominal data types in factorial vignettes. The privacy risk estimate also factors in data purposes with different levels of societal benefit, including terrorism, imminent threat of death, economic harm, and loss of intellectual property. The results show which data types are high-usage, low-risk versus those that are low-usage, high-risk. We discuss the implications of these results and recommend future work to improve privacy when data must be shared despite the increased risk to privacy.

2017-01-09
Alireza Sadeghi, Hamid Bagheri, Joshua Garcia, Sam Malek.  2016.  A Taxonomy and Qualitative Comparison of Program Analysis Techniques for Security Assessment of Android Software. IEEE Transactions on Software Engineering. 99.

In parallel with the meteoric rise of mobile software, we are witnessing an alarming escalation in the number and sophistication of the security threats targeted at mobile platforms, particularly Android, as the dominant platform. While existing research has made significant progress towards detection and mitigation of Android security threats, gaps and challenges remain. This paper contributes a comprehensive taxonomy to classify and characterize the state-of-the-art research in this area. We have carefully followed the systematic literature review process, and analyzed the results of more than 300 research papers, resulting in the most comprehensive and elaborate investigation of the literature in this area of research. The systematic analysis of the research literature has revealed patterns, trends, and gaps in the existing literature.

2016-12-06
Bradley Schmerl, Jeffrey Gennari, Alireza Sadeghi, Hamid Bagheri, Sam Malek, Javier Camara, David Garlan.  2016.  Architecture Modeling and Analysis of Security in Android Systems. 10th European Conference on Software Architecture (ECSA 2016).

Software architecture modeling is important for analyzing system quality attributes, particularly security. However, such analyses often assume that the architecture is completely known in advance. In many modern domains, especially those that use plugin-based frameworks, it is not possible to have such a complete model because the software system continuously changes. The Android mobile operating system is one such framework, where users can install and uninstall apps at run time. We need ways to model and analyze such architectures that strike a balance between supporting the dynamism of the underlying platforms and enabling analysis, particularly throughout a system’s lifetime. In this paper, we describe a formal architecture style that captures the modifiable architectures of Android systems, and that supports security analysis as a system evolves. We illustrate the use of the style with two security analyses: a predicate-based approach defined over architectural structure that can detect some common security vulnerabilities, and inter-app permission leakage determined by model checking. We also show how the evolving architecture of an Android device can be obtained by analysis of the apps on a device, and provide some performance evaluation that indicates that the architecture can be amenable for use throughout the system’s lifetime.

Hamid Bagheri, Sam Malek.  2016.  Titanium: Efficient Analysis of Evolving Alloy Specifications. FSE 2016: ACM SIGSOFT International Symposium on the Foundations of Software Engineering.

The Alloy specification language, and the corresponding Alloy Analyzer, have received much attention in the last two decades with applications in many areas of software engineering. Increasingly, formal analyses enabled by Alloy are desired for use in an on-line mode, where the specifications are automatically kept in sync with the running, possibly changing, software system. However, given Alloy Analyzer’s reliance on computationally expensive SAT solvers, an important challenge is the time it takes for such analyses to execute at runtime. The fact that in an on-line mode, the analyses are often repeated on slightly revised versions of a given specification, presents us with an opportunity to tackle this challenge. We present Titanium, an extension of Alloy for formal analysis of evolving specifications. By leveraging the results from previous analyses, Titanium narrows the state space of the revised specification, thereby greatly reducing the required computational effort. We describe the semantic basis of Titanium in terms of models specified in relational logic. We show how the approach can be realized atop an existing relational logic model finder. Our experimental results show Titanium achieves a significant speed-up over Alloy Analyzer when applied to the analysis of evolving specifications.

2016-12-08
Hanan Hibshi, Travis Breaux, Christian Wagner.  2016.  Improving Security Requirements Adequacy: An Interval Type-2 Fuzzy Logic Security Assessment System. 2016 IEEE Symposium Series on Computational Intelligence.

Organizations rely on security experts to improve the security of their systems. These professionals use background knowledge and experience to align known threats and vulnerabilities before selecting mitigation options. The substantial depth of expertise in any one area (e.g., databases, networks, operating systems) precludes the possibility that an expert would have complete knowledge about all threats and vulnerabilities. To begin addressing this problem of distributed knowledge, we investigate the challenge of developing a security requirements rule base that mimics human expert reasoning to enable new decision-support systems. In this paper, we show how to collect relevant information from cyber security experts to enable the generation of: (1) interval type-2 fuzzy sets that capture intra- and inter-expert uncertainty around vulnerability levels; and (2) fuzzy logic rules underpinning the decision-making process within the requirements analysis. The proposed method relies on comparative ratings of security requirements in the context of concrete vignettes, providing a novel, interdisciplinary approach to knowledge generation for fuzzy logic systems. The proposed approach is tested by evaluating 52 scenarios with 13 experts to compare their assessments to those of the fuzzy logic decision support system. The initial results show that the system provides reliable assessments to the security analysts, in particular, generating more conservative assessments in 19% of the test scenarios compared to the experts’ ratings. 
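
For readers unfamiliar with the terminology, a minimal sketch of the central data structure, an interval type-2 fuzzy set, where each rating maps to a membership interval rather than a single degree (the membership functions below are invented, not those elicited from the experts):

    # Toy interval type-2 fuzzy set over a security-adequacy rating scale.
    import numpy as np

    x = np.linspace(0, 10, 101)            # rating scale

    def tri(x, a, b, c):                   # triangular membership function
        return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0, 1)

    upper = tri(x, 3, 6, 9)                # widest expert interpretation of "adequate"
    lower = 0.6 * tri(x, 4, 6, 8)          # narrowest expert interpretation

    rating = 5.0
    i = np.searchsorted(x, rating)
    print(f"membership of {rating} in 'adequate': [{lower[i]:.2f}, {upper[i]:.2f}]")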

2016-04-25
Junjie Qian, Witawas Srisa-an, Hong Jiang, Sharad Seth, Du Li, Pan Yi.  2016.  Exploiting FIFO Scheduler to Improve Parallel Garbage Collection Performance. VEE '16 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments.

Recent studies have found that parallel garbage collection performs worse with more CPUs and more collector threads. As part of this work, we further investigate this phenomenon and find that poor scalability is worst in highly scalable Java applications. Our investigation to find the causes clearly reveals that efficient multi-threading in an application can prolong the average object lifespan, which results in less effective garbage collection. We also find that prolonging lifespan is the direct result of Linux's Completely Fair Scheduler due to its round-robin like behavior that can increase the heap contention between the application threads. Instead, if we use pseudo first-in-first-out to schedule application threads in large multicore systems, the garbage collection scalability is significantly improved while the time spent in garbage collection is reduced by as much as 21%. The average execution time of the 24 Java applications used in our study is also reduced by 11%. Based on this observation, we propose two approaches to optimally select scheduling policies based on application scalability profile. Our first approach uses the profile information from one execution to tune the subsequent executions. Our second approach dynamically collects profile information and performs policy selection during execution.
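
The scheduling-policy switch itself is small; a hedged sketch of running a JVM workload under pseudo-FIFO instead of CFS on Linux (requires root or CAP_SYS_NICE; the benchmark jar is a placeholder):

    # Run a workload under SCHED_FIFO instead of the default CFS policy.
    import os, subprocess

    param = os.sched_param(os.sched_get_priority_min(os.SCHED_FIFO))
    os.sched_setscheduler(0, os.SCHED_FIFO, param)   # this process, inherited by children

    # Children inherit the policy, so the JVM below runs under pseudo-FIFO.
    subprocess.run(["java", "-jar", "dacapo.jar", "lusearch"])  # hypothetical benchmark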

Hemank Lamba, Thomas J. Glazier, Bradley Schmerl, Javier Camara, David Garlan, Jurgen Pfeffer.  2016.  A Model-based Approach to Anomaly Detection in Software Architectures. Symposium and Bootcamp on the Science of Security (HotSoS).

In an organization, the interactions users have with software leave patterns or traces of the parts of the systems accessed. These interactions can be associated with the underlying software architecture. The first step in detecting problems like insider threat is to detect those traces that are anomalous. Here, we propose a method to find anomalous users leveraging these interaction traces, categorized by user roles. We propose a model-based approach to cluster user sequences and find outliers. We show that the approach works on a simulation of a large scale system based on an Amazon Web application style.
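
A hedged sketch of the outlier step (toy features and threshold, not the paper's model): represent each user in a role by access counts over architectural components and flag users far from the role centroid.

    # Toy anomaly detection over per-user access traces for one role.
    import numpy as np

    # Rows: users; columns: access counts per architectural component.
    traces = np.array([
        [12, 3, 0, 1], [10, 4, 1, 0], [11, 2, 0, 2],   # typical users
        [0, 1, 14, 9],                                  # anomalous access pattern
    ], dtype=float)

    centroid = traces.mean(axis=0)
    dist = np.linalg.norm(traces - centroid, axis=1)    # distance from role centroid
    threshold = 2 * np.median(dist)                     # crude, illustrative cutoff
    print("anomalous users:", np.where(dist > threshold)[0])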

2017-11-27
Hong, M. Q., Wang, P. Y., Zhao, W. B.  2016.  Homomorphic Encryption Scheme Based on Elliptic Curve Cryptography for Privacy Protection of Cloud Computing. 2016 IEEE 2nd International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS). :152–157.

Cloud computing is becoming the dominant computing model thanks to advantages such as high resource utilization and low cost, but in public environments it is necessary to secure storage and transmission against attacks such as known-plaintext attacks and to provide semantic security. How to ensure data security and privacy preservation has therefore become a major obstacle to its development. The traditional way to solve Secure Multiparty Computation (SMC) problems is to use a Trusted Third Party (TTP); however, TTPs are particularly hard to realize and computationally complex. To protect users' private data, outsourced data are generally stored and processed in the cloud in encrypted form using homomorphic encryption. Accordingly, we propose an Elliptic Curve Cryptography (ECC) based homomorphic encryption scheme for the SMC problem that dramatically reduces computation and communication cost. Comparison experiments between the ECC-based homomorphic encryption and the RSA and Paillier encryption algorithms show that the scheme has advantages in energy consumption, communication cost, and privacy protection. As further evidence, the ECC-based homomorphic encryption scheme is applied to computations over earthquake GPS data, and it is proved that the scheme is feasible, encrypts effectively, and offers high security.
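
For intuition only, a toy sketch of an additively homomorphic EC-ElGamal construction over a tiny curve (the curve, keys, and encoding are illustrative and insecure; this shows the general technique, not necessarily the paper's exact scheme): adding two ciphertexts yields an encryption of the sum.

    # Toy additively homomorphic EC-ElGamal; nothing here is production crypto.
    import random

    p, a, b = 97, 2, 3          # curve y^2 = x^3 + 2x + 3 mod 97 (toy)
    G = (3, 6)                  # a point on the curve

    def ec_add(P, Q):
        if P is None: return Q
        if Q is None: return P
        (x1, y1), (x2, y2) = P, Q
        if x1 == x2 and (y1 + y2) % p == 0:
            return None                                   # P + (-P) = identity
        if P == Q:
            s = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
        else:
            s = (y2 - y1) * pow(x2 - x1, -1, p) % p
        x3 = (s * s - x1 - x2) % p
        return (x3, (s * (x1 - x3) - y1) % p)

    def ec_mul(k, P):                                     # double-and-add
        R = None
        while k:
            if k & 1: R = ec_add(R, P)
            P = ec_add(P, P); k >>= 1
        return R

    d = 13; Q = ec_mul(d, G)                 # private key d, public key Q

    def enc(m):                              # message encoded as m*G
        r = random.randrange(1, 50)
        return (ec_mul(r, G), ec_add(ec_mul(m, G), ec_mul(r, Q)))

    def dec(c):                              # recover m*G, brute-force small m
        neg = (c[0][0], (-c[0][1]) % p)
        M = ec_add(c[1], ec_mul(d, neg))
        k, P = 0, None
        while P != M: k, P = k + 1, ec_add(P, G)
        return k

    c1, c2 = enc(3), enc(4)
    csum = (ec_add(c1[0], c2[0]), ec_add(c1[1], c2[1]))   # add ciphertexts
    print(dec(csum))                                      # -> 7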

2018-05-16
Y. Jiang, Y. Yang, H. Liu, H. Kong, M. Gu, J. Sun, L. Sha.  2016.  From Stateflow Simulation to Verified Implementation: A Verification Approach and A Real-Time Train Controller Design. 2016 IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS). :1-11.
2018-01-10
Hamamreh, J. M., Yusuf, M., Baykas, T., Arslan, H..  2016.  Cross MAC/PHY layer security design using ARQ with MRC and adaptive modulation. 2016 IEEE Wireless Communications and Networking Conference. :1–7.

In this work, Automatic-Repeat-Request (ARQ) and Maximal Ratio Combination (MRC) have been jointly exploited to enhance the confidentiality of wireless services requested by a legitimate user (Bob) against an eavesdropper (Eve). The obtained security performance is analyzed using Packet Error Rate (PER), where the exact PER gap between Bob and Eve is determined. PER is proposed as a new practical security metric in cross-layer (Physical/MAC) security design since it reflects the influence of upper-layer mechanisms, and it can be linked with Quality of Service (QoS) requirements for various digital services such as voice and video. Exact PER formulas for both Eve and Bob in i.i.d. Rayleigh fading channels are derived. The simulation and theoretical results show that the employment of the ARQ mechanism and MRC on a signal level basis before demodulation can significantly enhance data security for certain services at specific SNRs. However, to increase and ensure the security of a specific service at any SNR, adaptive modulation is proposed to be used along with the aforementioned scheme. Analytical and simulation studies demonstrate orders of magnitude difference in PER performance between eavesdroppers and intended receivers.
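
To see the combining gain that drives the PER gap, a small Monte-Carlo sketch (BPSK over i.i.d. Rayleigh fading; the SNR and the mapping of receivers to Bob and Eve are illustrative): maximal-ratio combining a retransmitted copy cuts the error rate well below what a single observation allows.

    # Toy simulation of the diversity gain from MRC over retransmissions.
    import numpy as np
    rng = np.random.default_rng(0)

    snr, n = 10 ** (6 / 10), 200_000          # 6 dB average SNR, n bits
    bits = rng.integers(0, 2, n) * 2 - 1      # BPSK symbols +-1

    def rx(symbols):                          # one Rayleigh-faded observation
        h = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2)
        w = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2 * snr)
        return h, h * symbols + w

    h1, y1 = rx(bits); h2, y2 = rx(bits)
    single = np.real(np.conj(h1) * y1)                     # one copy only
    mrc    = np.real(np.conj(h1) * y1 + np.conj(h2) * y2)  # ARQ copy combined
    print("BER single:", np.mean((single > 0) != (bits > 0)))
    print("BER MRC:   ", np.mean((mrc > 0) != (bits > 0)))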

2017-03-20
Shahriar, Hossain, Haddad, Hisham.  2016.  Object Injection Vulnerability Discovery Based on Latent Semantic Indexing. Proceedings of the 31st Annual ACM Symposium on Applied Computing. :801–807.

Object Injection Vulnerability (OIV) is an emerging threat for web applications. It involves accepting external inputs during deserialization operations and using those inputs for sensitive operations such as file access, modification, and deletion. The challenge is the automation of the detection process. When the application size is large, it becomes hard to perform traditional approaches such as data flow analysis. Recent approaches fall short in narrowing down the list of source files to aid developers in discovering OIVs, and lack the flexibility to check for the presence of OIVs introduced through various known APIs. In this work, we address these limitations by exploring a concept borrowed from the information retrieval domain called Latent Semantic Indexing (LSI) to discover OIVs. The approach analyzes application source code and builds an initial term-document matrix which is then transformed systematically using singular value decomposition to reduce the search space. The approach identifies a small set of documents (source files) that are likely responsible for OIVs. We apply the LSI concept to three open source PHP applications that have been reported to contain OIVs. Our initial evaluation results suggest that the proposed LSI-based approach can identify known OIVs and discover new vulnerabilities.
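
A compact sketch of the LSI pipeline on a toy corpus (the "documents" and query terms are stand-ins, not the paper's implementation): build the term-document matrix, reduce it with truncated SVD, and rank documents by similarity to OIV-relevant query terms.

    # Toy LSI ranking of source files by relevance to OIV-related terms.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    docs = [                                     # stand-ins for PHP source files
        "unserialize input cookie object wakeup",
        "render template html echo output",
        "unserialize session data destruct file delete",
        "database query fetch row result",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(docs)                  # term-document matrix
    svd = TruncatedSVD(n_components=2, random_state=0)
    Z = svd.fit_transform(X)                     # documents in latent space

    q = svd.transform(vec.transform(["unserialize object file"]))
    sims = cosine_similarity(q, Z)[0]
    print("files ranked by OIV likelihood:", np.argsort(-sims))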

2017-07-24
Hibshi, Hanan.  2016.  Systematic Analysis of Qualitative Data in Security. Proceedings of the Symposium and Bootcamp on the Science of Security. :52–52.

This tutorial will introduce participants to Grounded Theory, which is a qualitative framework to discover new theory from an empirical analysis of data. This form of analysis is particularly useful when analyzing text, audio or video artifacts that lack structure, but contain rich descriptions. We will frame Grounded Theory in the context of qualitative methods and case studies, which complement quantitative methods, such as controlled experiments and simulations. We will contrast the approaches developed by Glaser and Strauss, and introduce coding theory - the most prominent qualitative method for performing analysis to discover Grounded Theory. Topics include coding frames, first- and second-cycle coding, and saturation. We will use examples from security interview scripts to teach participants: developing a coding frame, coding a source document to discover relationships in the data, developing heuristics to resolve ambiguities between codes, and performing second-cycle coding to discover relationships within categories. Then, participants will learn how to discover theory from coded data. Participants will further learn about inter-rater reliability statistics, including Cohen's and Fleiss' Kappa, Krippendorf's Alpha, and Vanbelle's Index. Finally, we will review how to present Grounded Theory results in publications, including how to describe the methodology, report observations, and describe threats to validity.
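
As a taste of the inter-rater statistics the tutorial covers, a minimal computation of Cohen's kappa for two raters (the codes below are illustrative):

    # Cohen's kappa: chance-corrected agreement between two raters.
    from collections import Counter

    rater1 = ["threat", "mitig", "threat", "vuln", "threat", "vuln"]
    rater2 = ["threat", "mitig", "vuln",   "vuln", "threat", "mitig"]

    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[k] * c2[k] for k in c1) / n ** 2           # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    print(round(kappa, 3))                                  # -> 0.5 here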

2017-10-10
Yuan, Ganzhao, Yang, Yin, Zhang, Zhenjie, Hao, Zhifeng.  2016.  Convex Optimization for Linear Query Processing Under Approximate Differential Privacy. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :2005–2014.

Differential privacy enables organizations to collect accurate aggregates over sensitive data with strong, rigorous guarantees on individuals' privacy. Previous work has found that under differential privacy, computing multiple correlated aggregates as a batch, using an appropriate strategy, may yield higher accuracy than computing each of them independently. However, finding the best strategy that maximizes result accuracy is non-trivial, as it involves solving a complex constrained optimization program that appears to be non-convex. Hence, in the past much effort has been devoted to solving this non-convex optimization program. Existing approaches include various sophisticated heuristics and expensive numerical solutions. None of them, however, guarantees to find the optimal solution of this optimization problem. This paper points out that under (ε, δ)-differential privacy, the optimal solution of the above constrained optimization problem in search of a suitable strategy can be found, rather surprisingly, by solving a simple and elegant convex optimization program. Then, we propose an efficient algorithm based on Newton's method, which we prove to always converge to the optimal solution with linear global convergence rate and quadratic local convergence rate. Empirical evaluations demonstrate the accuracy and efficiency of the proposed solution.
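
The optimization machinery invoked here is standard damped Newton with backtracking; a generic sketch on a toy convex objective (the paper's actual objective over query strategies is considerably more involved):

    # Damped Newton with Armijo backtracking on a toy convex function.
    import numpy as np

    def f(x):    return np.sum(np.exp(x) + x**2)     # toy convex objective
    def grad(x): return np.exp(x) + 2 * x
    def hess(x): return np.diag(np.exp(x) + 2)

    x = np.zeros(3)
    for _ in range(20):
        step = np.linalg.solve(hess(x), grad(x))     # Newton direction
        t = 1.0
        while f(x - t * step) > f(x) - 0.25 * t * grad(x) @ step:
            t *= 0.5                                 # backtracking line search
        x -= t * step
    print(x, f(x))                                   # converges quadratically near optimum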

2017-09-19
Hyun, Yoonjin, Kim, Namgyu.  2016.  Detecting Blog Spam Hashtags Using Topic Modeling. Proceedings of the 18th Annual International Conference on Electronic Commerce: E-Commerce in Smart Connected World. :43:1–43:6.

Tremendous amounts of data are generated daily. Accordingly, unstructured text data that is distributed through news, blogs, and social media has gained much attention from many researchers as this data contains abundant information about various consumers' opinions. However, as the usefulness of text data is increasing, attempts to gain profits by distorting text data maliciously or non-maliciously are also increasing. In this sense, various types of spam detection techniques have been studied to prevent the side effects of spamming. The most representative studies include e-mail spam detection, web spam detection, and opinion spam detection. "Spam" is recognized on the basis of three characteristics and actions: (1) if a certain user is recognized as a spammer, then all content created by that user should be recognized as spam; (2) if certain content is exposed to other users (regardless of the users' intention), then content is recognized as spam; and (3) any content that contains malicious or non-malicious false information is recognized as spam. Many studies have been performed to solve type (1) and type (2) spamming by analyzing various metadata, such as user networks and spam words. In the case of type (3), however, relatively few studies have been conducted because it is difficult to determine the veracity of a certain word or information. In this study, we regard a hashtag that is irrelevant to the content of a blog post as spam and devise a methodology to detect such spam hashtags.
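
A hedged sketch of the underlying idea using scikit-learn's LDA on toy posts (not the paper's exact methodology): learn topics from post bodies, then flag a hashtag whose topic mixture diverges strongly from that of the post it decorates.

    # Toy topic-model check of hashtag/post relevance.
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    posts = [
        "recipe pasta tomato garlic dinner",
        "recipe cake sugar oven dessert",
        "phone camera battery screen review",
        "laptop keyboard battery screen review",
    ]
    vec = CountVectorizer()
    X = vec.fit_transform(posts)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    post_topics = lda.transform(vec.transform(["recipe pasta garlic dinner"]))
    tag_topics = lda.transform(vec.transform(["battery screen review"]))  # e.g. "#battery"
    # Large divergence suggests the hashtag is irrelevant to the post -> spam.
    print(np.abs(post_topics - tag_topics).sum() / 2)   # total variation distance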

2017-05-18
Lin, Jerry Chun-Wei, Liu, Qiankun, Fournier-Viger, Philippe, Hong, Tzung-Pei, Zhan, Justin, Voznak, Miroslav.  2016.  An Efficient Anonymous System for Transaction Data. Proceedings of the 3rd Multidisciplinary International Social Networks Conference on SocialInformatics 2016, Data Science 2016. :28:1–28:6.

k-anonymity is an efficient way to anonymize relational data to protect privacy against re-identification attacks. When k-anonymity is applied to transaction data, each item is considered a quasi-identifier attribute, which leads to a high-dimensionality problem and increases both the computational complexity and the information loss of anonymization. In this paper, an efficient anonymity system is designed to not only anonymize transaction data with lower information loss but also reduce the computational complexity of anonymization. An extensive experiment is carried out to compare the designed approach to state-of-the-art anonymity algorithms in terms of runtime and information loss. Experimental results indicate that the proposed anonymous system outperforms the compared algorithms in all respects.
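
To make the goal concrete, a toy sketch of one crude route to k-anonymity on transactions, globally suppressing rare items until every transaction's itemset occurs at least k times (real systems, including the one proposed here, minimize information loss far more carefully):

    # Crude global suppression until every itemset occurs at least k times.
    from collections import Counter

    k = 2
    txns = [
        {"bread", "milk"},
        {"bread", "milk", "razor"},
        {"bread", "milk"},
        {"beer", "diapers"},
        {"beer", "diapers", "wine"},
    ]

    while True:
        classes = Counter(frozenset(t) for t in txns)
        rare = [t for t in txns if classes[frozenset(t)] < k]
        if not rare:
            break
        # Suppress the globally least frequent item among offending transactions.
        item_counts = Counter(i for t in txns for i in t)
        victim = min({i for t in rare for i in t}, key=lambda i: item_counts[i])
        txns = [t - {victim} for t in txns]

    print(txns)   # every itemset now appears at least k times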

2017-03-07
Madaio, Michael, Chen, Shang-Tse, Haimson, Oliver L., Zhang, Wenwen, Cheng, Xiang, Hinds-Aldrich, Matthew, Chau, Duen Horng, Dilkina, Bistra.  2016.  Firebird: Predicting Fire Risk and Prioritizing Fire Inspections in Atlanta. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :185–194.

The Atlanta Fire Rescue Department (AFRD), like many municipal fire departments, actively works to reduce fire risk by inspecting commercial properties for potential hazards and fire code violations. However, AFRD's fire inspection practices relied on tradition and intuition, with no existing data-driven process for prioritizing fire inspections or identifying new properties requiring inspection. In collaboration with AFRD, we developed the Firebird framework to help municipal fire departments identify and prioritize commercial property fire inspections, using machine learning, geocoding, and information visualization. Firebird computes fire risk scores for over 5,000 buildings in the city, with true positive rates of up to 71% in predicting fires. It has identified 6,096 new potential commercial properties to inspect, based on AFRD's criteria for inspection. Furthermore, through an interactive map, Firebird integrates and visualizes fire incidents, property information and risk scores to help AFRD make informed decisions about fire inspections. Firebird has already begun to make positive impact at both local and national levels. It is improving AFRD's inspection processes and Atlanta residents' safety, and was highlighted by the National Fire Protection Association (NFPA) as a best practice for using data to inform fire inspections.