Visible to the public Biblio

Found 1897 results

Filters: Keyword is compositionality  [Clear All Filters]
2017-06-27
Jordan, Michael I..  2016.  On Computational Thinking, Inferential Thinking and Data Science. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :47–47.

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level-in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.

2017-05-22
Suzuki, Kenichi, Kiselyov, Oleg, Kameyama, Yukiyoshi.  2016.  Finally, Safely-extensible and Efficient Language-integrated Query. Proceedings of the 2016 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation. :37–48.

Language-integrated query is an embedding of database queries into a host language to code queries at a higher level than the all-to-common concatenation of strings of SQL fragments. The eventually produced SQL is ensured to be well-formed and well-typed, and hence free from the embarrassing (security) problems. Language-integrated query takes advantage of the host language's functional and modular abstractions to compose and reuse queries and build query libraries. Furthermore, language-integrated query systems like T-LINQ generate efficient SQL, by applying a number of program transformations to the embedded query. Alas, the set of transformation rules is not designed to be extensible. We demonstrate a new technique of integrating database queries into a typed functional programming language, so to write well-typed, composable queries and execute them efficiently on any SQL back-end as well as on an in-memory noSQL store. A distinct feature of our framework is that both the query language as well as the transformation rules needed to generate efficient SQL are safely user-extensible, to account for many variations in the SQL back-ends, as well for domain-specific knowledge. The transformation rules are guaranteed to be type-preserving and hygienic by their very construction. They can be built from separately developed and reusable parts and arbitrarily composed into optimization pipelines. With this technique we have embedded into OCaml a relational query language that supports a very large subset of SQL including grouping and aggregation. Its types cover the complete set of intricate SQL behaviors.

2017-05-19
Green, Benjamin, Krotofil, Marina, Hutchison, David.  2016.  Achieving ICS Resilience and Security Through Granular Data Flow Management. Proceedings of the 2Nd ACM Workshop on Cyber-Physical Systems Security and Privacy. :93–101.

Modern Industrial Control Systems (ICS) rely on enterprise to plant floor connectivity. Where the size, diversity, and therefore complexity of ICS increase, operational requirements, goals, and challenges defined by users across various sub-systems follow. Recent trends in Information Technology (IT) and Operational Technology (OT) convergence may cause operators to lose a comprehensive understanding of end-to-end data flow requirements. This presents a risk to system security and resilience. Sensors were once solely applied for operational process use, but now act as inputs supporting a diverse set of organisational requirements. If these are not fully understood, incomplete risk assessment, and inappropriate implementation of security controls could occur. In search of a solution, operators may turn to standards and guidelines. This paper reviews popular standards and guidelines, prior to the presentation of a case study and conceptual tool, highlighting the importance of data flows, critical data processing points, and system-to-user relationships. The proposed approach forms a basis for risk assessment and security control implementation, aiding the evolution of ICS security and resilience.

Wadhawan, Yatin, Neuman, Clifford.  2016.  Evaluating Resilience of Gas Pipeline Systems Under Cyber-Physical Attacks: A Function-Based Methodology. Proceedings of the 2Nd ACM Workshop on Cyber-Physical Systems Security and Privacy. :71–80.

In this research paper, we present a function-based methodology to evaluate the resilience of gas pipeline systems under two different cyber-physical attack scenarios. The first attack scenario is the pressure integrity attack on the natural gas high-pressure transmission pipeline. Through simulations, we have analyzed the cyber attacks that propagate from cyber to the gas pipeline physical domain, the time before which the SCADA system should respond to such attacks, and finally, an attack which prevents the response of the system. We have used the combined results of simulations of a wireless mesh network for remote terminal units and of a gas pipeline simulation to measure the shortest Time to Criticality (TTC) parameter; the time for an event to reach the failure state. The second attack scenario describes how a failure of a cyber node controlling power grid functionality propagates from cyber to power to gas pipeline systems. We formulate this problem using a graph-theoretic approach and quantify the resilience of the networks by percentage of connected nodes and the length of the shortest path between them. The results show that parameters such as TTC, power distribution capacity of the power grid nodes and percentage of the type of cyber nodes compromised, regulate the efficiency and resilience of the power and gas networks. The analysis of such attack scenarios helps the gas pipeline system administrators design attack remediation algorithms and improve the response of the system to an attack.

Thakur, Gautam S., Kuruganti, Teja, Bobrek, Miljko, Killough, Stephen, Nutaro, James, Liu, Cheng, Lu, Wei.  2016.  Real-time Urban Population Monitoring Using Pervasive Sensor Network. Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems. :57:1–57:4.

It is estimated that 50% of the global population lives in urban areas occupying just 0.4% of the Earth's surface. Understanding urban activity constitutes monitoring population density and its changes over time, in urban environments. Currently, there are limited mechanisms to non-intrusively monitor population density in real-time. The pervasive use of cellular phones in urban areas is one such mechanism that provides a unique opportunity to study population density by monitoring the mobility patterns in near real-time. Cellular carriers such as AT&T harvest such data through their cell towers; however, this data is proprietary and the carriers restrict access, due to privacy concerns. In this work, we propose a system that passively senses the population density and infers mobility patterns in an urban area by monitoring power spectral density in cellular frequency bands using periodic beacons from each cellphone without knowing who and where they are located. A wireless sensor network platform is being developed to perform spectral monitoring along with environmental measurements. Algorithms are developed to generate real-time fine-resolution population estimates.

Jansen, Kai, Tippenhauer, Nils Ole, Pöpper, Christina.  2016.  Multi-receiver GPS Spoofing Detection: Error Models and Realization. Proceedings of the 32Nd Annual Conference on Computer Security Applications. :237–250.

Spoofing is a serious threat to the widespread use of Global Navigation Satellite Systems (GNSSs) such as GPS and can be expected to play an important role in the security of many future IoT systems that rely on time, location, or navigation information. In this paper, we focus on the technique of multi-receiver GPS spoofing detection, so far only proposed theoretically. This technique promises to detect malicious spoofing signals by making use of the reported positions of several GPS receivers deployed in a fixed constellation. We scrutinize the assumptions of prior work, in particular the error models, and investigate how these models and their results can be improved due to the correlation of errors at co-located receiver positions. We show that by leveraging spatial noise correlations, the false acceptance rate of the countermeasure can be improved while preserving the sensitivity to attacks. As a result, receivers can be placed significantly closer together than previously expected, which broadens the applicability of the countermeasure. Based on theoretical and practical investigations, we build the first realization of a multi-receiver countermeasure and experimentally evaluate its performance both in authentic and in spoofing scenarios.

Zhou, Mengyu, Sui, Kaixin, Ma, Minghua, Zhao, Youjian, Pei, Dan, Moscibroda, Thomas.  2016.  MobiCamp: A Campus-wide Testbed for Studying Mobile Physical Activities. Proceedings of the 3rd International on Workshop on Physical Analytics. :1–6.

Ubiquitous WiFi infrastructure and smart phones offer a great opportunity to study physical activities. In this paper, we present MobiCamp, a large-scale testbed for studying mobility-related activities of residents on a campus. MobiCamp consists of \textasciitilde2,700 APs, \textasciitilde95,000 smart phones, and an App with \textasciitilde2,300 opt-in volunteer users. More specifically, we capture how mobile users interact with different types of buildings, with other users, and with classroom courses, etc. To achieve this goal, we first obtain a relatively complete coverage of the users' mobility traces by utilizing four types of information from SNMP and by relaxing the location granularity to roughly at the room level. Then the popular App provides user attributes (grade, gender, etc.) and fine-grained behavior information (phone usages, course timetables, etc.) of the sampled population. These detailed mobile data is then correlated with the mobility traces from the SNMP to estimate the entire campus population's physical activities. We use two applications to show the power of MobiCamp.

Ben- Adar Bessos, Mai, Birnbach, Simon, Herzberg, Amir, Martinovic, Ivan.  2016.  Exposing Transmitters in Mobile Multi-Agent Games. Proceedings of the 2Nd ACM Workshop on Cyber-Physical Systems Security and Privacy. :125–136.

We study the trade-off between the benefits obtained by communication, vs. the risks due to exposure of the location of the transmitter. To study this problem, we introduce a game between two teams of mobile agents, the P-bots team and the E-bots team. The E-bots attempt to eavesdrop and collect information, while evading the P-bots; the P-bots attempt to prevent this by performing patrol and pursuit. The game models a typical use-case of micro-robots, i.e., their use for (industrial) espionage. We evaluate strategies for both teams, using analysis and simulations.

Park, Shinjo, Shaik, Altaf, Borgaonkar, Ravishankar, Seifert, Jean-Pierre.  2016.  White Rabbit in Mobile: Effect of Unsecured Clock Source in Smartphones. Proceedings of the 6th Workshop on Security and Privacy in Smartphones and Mobile Devices. :13–21.

With its high penetration rate and relatively good clock accuracy, smartphones are replacing watches in several market segments. Modern smartphones have more than one clock source to complement each other: NITZ (Network Identity and Time Zone), NTP (Network Time Protocol), and GNSS (Global Navigation Satellite System) including GPS. NITZ information is delivered by the cellular core network, indicating the network name and clock information. NTP provides a facility to synchronize the clock with a time server. Among these clock sources, only NITZ and NTP are updated without user interaction, as location services require manual activation. In this paper, we analyze security aspects of these clock sources and their impact on security features of modern smartphones. In particular, we investigate NITZ and NTP procedures over cellular networks (2G, 3G and 4G) and Wi-Fi communication respectively. Furthermore, we analyze several European, Asian, and American cellular networks from NITZ perspective. We identify three classes of vulnerabilities: specification issues in a cellular protocol, configurational issues in cellular network deployments, and implementation issues in different mobile OS's. We demonstrate how an attacker with low cost setup can spoof NITZ and NTP messages to cause Denial of Service attacks. Finally, we propose methods for securely synchronizing the clock on smartphones.

Schäfer, Matthias, Leu, Patrick, Lenders, Vincent, Schmitt, Jens.  2016.  Secure Motion Verification Using the Doppler Effect. Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and Mobile Networks. :135–145.

Future transportation systems highly rely on the integrity of spatial information provided by their means of transportation such as vehicles and planes. In critical applications (e.g. collision avoidance), tampering with this data can result in life-threatening situations. It is therefore essential for the safety of these systems to securely verify this information. While there is a considerable body of work on the secure verification of locations, movement of nodes has only received little attention in the literature. This paper proposes a new method to securely verify spatial movement of a mobile sender in all dimensions, i.e., position, speed, and direction. Our scheme uses Doppler shift measurements from different locations to verify a prover's motion. We provide formal proof for the security of the scheme and demonstrate its applicability to air traffic communications. Our results indicate that it is possible to reliably verify the motion of aircraft in currently operational systems with an equal error rate of zero.

Khaledi, Mojgan, Khaledi, Mehrad, Kasera, Sneha Kumar, Patwari, Neal.  2016.  Preserving Location Privacy in Radio Networks Using a Stackelberg Game Framework. Proceedings of the 12th ACM Symposium on QoS and Security for Wireless and Mobile Networks. :29–37.

Radio network information is leaked well beyond the perimeter in which the radio network is deployed. We investigate attacks where person location can be inferred using the radio characteristics of wireless links (e.g., the received signal strength). An attacker can deploy a network of receivers which measure the received signal strength of the radio signals transmitted by the legitimate wireless devices inside a perimeter, allowing the attacker to learn the locations of people moving in the vicinity of the devices inside the perimeter. In this paper, we develop the first solution to this location privacy problem where neither the attacker nodes nor the tracked moving object transmit any RF signals. We first model the radio network leakage attack using a Stackelberg game. Next, we define utility and cost functions related to the defender and attacker actions. Last, using our utility and cost functions, we find the optimal strategy for the defender by applying a greedy method. We evaluate our game theoretic framework using experiments and find that our approach significantly reduces the chance of an attacker determining the location of people inside a perimeter.

Joy, Joshua, Le, Minh, Gerla, Mario.  2016.  LocationSafe: Granular Location Privacy for IoT Devices. Proceedings of the Eighth Wireless of the Students, by the Students, and for the Students Workshop. :39–41.

Today, mobile data owners lack consent and control over the release and utilization of their location data. Third party applications continuously process and access location data without data owners granular control and without knowledge of how location data is being used. The proliferation of GPS enabled IoT devices will lead to larger scale abuses of trust. In this paper we present the first design and implementation of a privacy module built into the GPSD daemon. The GPSD daemon is a low-level GPS interface that runs on GPS enabled devices. The integration of the privacy module ensures that data owners have granular control over the release of their GPS location. We describe the design of our privacy module integration into the GPSD daemon.

Alves, Thiago, Das, Rishabh, Morris, Thomas.  2016.  Virtualization of Industrial Control System Testbeds for Cybersecurity. Proceedings of the 2Nd Annual Industrial Control System Security Workshop. :10–14.

With an immense number of threats pouring in from nation states and hacktivists as well as terrorists and cybercriminals, the requirement of a globally secure infrastructure becomes a major obligation. Most critical infrastructures were primarily designed to work isolated from the normal communication network, but due to the advent of the "Smart Grid" that uses advanced and intelligent approaches to control critical infrastructure, it is necessary for these cyber-physical systems to have access to the communication system. Consequently, such critical systems have become prime targets; hence security of critical infrastructure is currently one of the most challenging research problems. Performing an extensive security analysis involving experiments with cyber-attacks on a live industrial control system (ICS) is not possible. Therefore, researchers generally resort to test beds and complex simulations to answer questions related to SCADA systems. Since all conclusions are drawn from the test bed, it is necessary to perform validation against a physical model. This paper examines the fidelity of a virtual SCADA testbed to a physical test bed and allows for the study of the effects of cyber- attacks on both of the systems.

Ahmed, Irfan, Roussev, Vassil, Johnson, William, Senthivel, Saranyan, Sudhakaran, Sneha.  2016.  A SCADA System Testbed for Cybersecurity and Forensic Research and Pedagogy. Proceedings of the 2Nd Annual Industrial Control System Security Workshop. :1–9.

This paper presents a supervisory control and data acquisition (SCADA) testbed recently built at the University of New Orleans. The testbed consists of models of three industrial physical processes: a gas pipeline, a power transmission and distribution system, and a wastewater treatment plant–these systems are fully-functional and implemented at small-scale. It utilizes real-world industrial equipment such as transformers, programmable logic controllers (PLC), aerators, etc., bringing it closer to modeling real-world SCADA systems. Sensors, actuators, and PLCs are deployed at each physical process system for local control and monitoring, and the PLCs are also connected to a computer running human-machine interface (HMI) software for monitoring the status of the physical processes. The testbed is a useful resource for cybersecurity research, forensic research, and education on different aspects of SCADA systems such as PLC programming, protocol analysis, and demonstration of cyber attacks.

2017-05-18
Huang, Waylon.  2016.  Discovering Additional Violations of Java API Invariants. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :1145–1147.

In the absence of formal specifications or test oracles, automating testing is made possible by the fact that a program must satisfy certain requirements set down by the programming language. This work describes Randoop, an automatic unit test generator which checks for invariants specified by the Java API. Randoop is able to detect violations to invariants as specified by the Java API and create error tests that reveal related bugs. Randoop is also able to produce regression tests, meant to be added to regression test suites, that capture expected behavior. We discuss additional extensions that we have made to Randoop which expands its capability for the detection of violation of specified invariants. We also examine an optimization and a heuristic for making the invariant checking process more efficient.

Nadi, Sarah, Krüger, Stefan, Mezini, Mira, Bodden, Eric.  2016.  Jumping Through Hoops: Why Do Java Developers Struggle with Cryptography APIs? Proceedings of the 38th International Conference on Software Engineering. :935–946.

To protect sensitive data processed by current applications, developers, whether security experts or not, have to rely on cryptography. While cryptography algorithms have become increasingly advanced, many data breaches occur because developers do not correctly use the corresponding APIs. To guide future research into practical solutions to this problem, we perform an empirical investigation into the obstacles developers face while using the Java cryptography APIs, the tasks they use the APIs for, and the kind of (tool) support they desire. We triangulate data from four separate studies that include the analysis of 100 StackOverflow posts, 100 GitHub repositories, and survey input from 48 developers. We find that while developers find it difficult to use certain cryptographic algorithms correctly, they feel surprisingly confident in selecting the right cryptography concepts (e.g., encryption vs. signatures). We also find that the APIs are generally perceived to be too low-level and that developers prefer more task-based solutions.

Amani, Sven, Nadi, Sarah, Nguyen, Hoan A., Nguyen, Tien N., Mezini, Mira.  2016.  MUBench: A Benchmark for API-misuse Detectors. Proceedings of the 13th International Conference on Mining Software Repositories. :464–467.

Over the last few years, researchers proposed a multitude of automated bug-detection approaches that mine a class of bugs that we call API misuses. Evaluations on a variety of software products show both the omnipresence of such misuses and the ability of the approaches to detect them. This work presents MuBench, a dataset of 89 API misuses that we collected from 33 real-world projects and a survey. With the dataset we empirically analyze the prevalence of API misuses compared to other types of bugs, finding that they are rare, but almost always cause crashes. Furthermore, we discuss how to use it to benchmark and compare API-misuse detectors.

Nguyen, Trong Duc, Nguyen, Anh Tuan, Nguyen, Tien N..  2016.  Mapping API Elements for Code Migration with Vector Representations. Proceedings of the 38th International Conference on Software Engineering Companion. :756–758.

Problem. Code migration between languages is challenging partly because different languages require developers to use different software libraries and frameworks. For example, in Java, Java Development Kit library (JDK) is a popular toolkit while .NET is the main framework used in C\# software development. Code migration requires not only the mappings between the language constructs (e.g., statements, expressions) but also the mappings among the APIs of the libraries/frameworks used in two languages. For example, in Java, to write to a file, one can use FileWriter.write of FileWriter, and in C\#, one can achieve the same function with StreamWriter.Write of StreamWriter. Such mapping is called API mapping.

Indela, Soumya, Kulkarni, Mukul, Nayak, Kartik, Dumitras, Tudor.  2016.  Helping Johnny Encrypt: Toward Semantic Interfaces for Cryptographic Frameworks. Proceedings of the 2016 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software. :180–196.

Several mature cryptographic frameworks are available, and they have been utilized for building complex applications. However, developers often use these frameworks incorrectly and introduce security vulnerabilities. This is because current cryptographic frameworks erode abstraction boundaries, as they do not encapsulate all the framework-specific knowledge and expect developers to understand security attacks and defenses. Starting from the documented misuse cases of cryptographic APIs, we infer five developer needs and we show that a good API design would address these needs only partially. Building on this observation, we propose APIs that are semantically meaningful for developers, we show how these interfaces can be implemented consistently on top of existing frameworks using novel and known design patterns, and we propose build management hooks for isolating security workarounds needed during the development and test phases. Through two case studies, we show that our APIs can be utilized to implement non-trivial client-server protocols and that they provide a better separation of concerns than existing frameworks. We also discuss the challenges and potential approaches for evaluating our solution. Our semantic interfaces represent a first step toward preventing misuses of cryptographic APIs.

Nguyen, Anh Tuan, Hilton, Michael, Codoban, Mihai, Nguyen, Hoan Anh, Mast, Lily, Rademacher, Eli, Nguyen, Tien N., Dig, Danny.  2016.  API Code Recommendation Using Statistical Learning from Fine-grained Changes. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :511–522.

Learning and remembering how to use APIs is difficult. While code-completion tools can recommend API methods, browsing a long list of API method names and their documentation is tedious. Moreover, users can easily be overwhelmed with too much information. We present a novel API recommendation approach that taps into the predictive power of repetitive code changes to provide relevant API recommendations for developers. Our approach and tool, APIREC, is based on statistical learning from fine-grained code changes and from the context in which those changes were made. Our empirical evaluation shows that APIREC correctly recommends an API call in the first position 59% of the time, and it recommends the correct API call in the top five positions 77% of the time. This is a significant improvement over the state-of-the-art approaches by 30-160% for top-1 accuracy, and 10-30% for top-5 accuracy, respectively. Our result shows that APIREC performs well even with a one-time, minimal training dataset of 50 publicly available projects.

Tan, Antoine Tran, Kaiser, Hartmut.  2016.  Extending C++ with Co-array Semantics. Proceedings of the 3rd ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming. :63–68.

The current trend of large scientific computing problems is to align as much as possible to a Single Programming Multiple Data (or SPMD) scheme when the application algorithms are conducive to parallelization and vectorization. This reduces the complexity of code because the processors or (computational nodes) perform the same instructions which allows for better performance as algorithms work on local data sets instead of continuously transferring data from one locality to another. However, certain applications, such as stencil problems, demonstrate the need to move data to or from remote localities. This involves an additional degree of complexity, as one must know with which localities to exchange data. In order to solve this issue, Fortran has extended its scalar element indexing approach to distributed structures of elements. In this extension, a structure of scalar elements is attributed a ”co-index” and lives in a specific locality. A co-index provides the application with enough information to retrieve the corresponding data reference. In C++, containers present themselves as a ”smarter” alternative of Fortran arrays but there are still no corresponding standardized features similar to the Fortran co-indexing approach. In this paper, we present an implementation of such features in HPX, a general purpose C++ runtime system for applications of any scale. We describe how the combination of the HPX features and the actual C++ Standard makes it easy to define a high performance API similar to Co-Array Fortran.

Hasan, Samir, King, Zachary, Hafiz, Munawar, Sayagh, Mohammed, Adams, Bram, Hindle, Abram.  2016.  Energy Profiles of Java Collections Classes. Proceedings of the 38th International Conference on Software Engineering. :225–236.

We created detailed profiles of the energy consumed by common operations done on Java List, Map, and Set abstractions. The results show that the alternative data types for these abstractions differ significantly in terms of energy consumption depending on the operations. For example, an ArrayList consumes less energy than a LinkedList if items are inserted at the middle or at the end, but consumes more energy than a LinkedList if items are inserted at the start of the list. To explain the results, we explored the memory usage and the bytecode executed during an operation. Expensive computation tasks in the analyzed bytecode traces appeared to have an energy impact, but memory usage did not contribute. We evaluated our profiles by using them to selectively replace Collections types used in six applications and libraries. We found that choosing the wrong Collections type, as indicated by our profiles, can cost even 300% more energy than the most efficient choice. Our work shows that the usage context of a data structure and our measured energy profiles can be used to decide between alternative Collections implementations.

Lin, Ziyi, Zhong, Hao, Chen, Yuting, Zhao, Jianjun.  2016.  LockPeeker: Detecting Latent Locks in Java APIs. Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering. :368–378.

Detecting lock-related defects has long been a hot research topic in software engineering. Many efforts have been spent on detecting such deadlocks in concurrent software systems. However, latent locks may be hidden in application programming interface (API) methods whose source code may not be accessible to developers. Many APIs have latent locks. For example, our study has shown that J2SE alone can have 2,000+ latent locks. As latent locks are less known by developers, they can cause deadlocks that are hard to perceive or diagnose. Meanwhile, the state-of-the-art tools mostly handle API methods as black boxes, and cannot detect deadlocks that involve such latent locks. In this paper, we propose a novel black-box testing approach, called LockPeeker, that reveals latent locks in Java APIs. The essential idea of LockPeeker is that latent locks of a given API method can be revealed by testing the method and summarizing the locking effects during testing execution. We have evaluated LockPeeker on ten real-world Java projects. Our evaluation results show that (1) LockPeeker detects 74.9% of latent locks in API methods, and (2) it enables state-of-the-art tools to detect deadlocks that otherwise cannot be detected.

Gyori, Alex, Lambeth, Ben, Shi, August, Legunsen, Owolabi, Marinov, Darko.  2016.  NonDex: A Tool for Detecting and Debugging Wrong Assumptions on Java API Specifications. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :993–997.

We present NonDex, a tool for detecting and debugging wrong assumptions on Java APIs. Some APIs have underdetermined specifications to allow implementations to achieve different goals, e.g., to optimize performance. When clients of such APIs assume stronger-than-specified guarantees, the resulting client code can fail. For example, HashSet’s iteration order is underdetermined, and code assuming some implementation-specific iteration order can fail. NonDex helps to proactively detect and debug such wrong assumptions. NonDex performs detection by randomly exploring different behaviors of underdetermined APIs during test execution. When a test fails during exploration, NonDex searches for the invocation instance of the API that caused the failure. NonDex is open source, well-integrated with Maven, and also runs from the command line. During our experiments with the NonDex Maven plugin, we detected 21 new bugs in eight Java projects from GitHub, and, using the debugging feature of NonDex, we identified the underlying wrong assumptions for these 21 new bugs and 54 previously detected bugs. We opened 13 pull requests; developers already accepted 12, and one project changed the continuous-integration configuration to run NonDex on every push. The demo video is at: https://youtu.be/h3a9ONkC59c

Fowkes, Jaroslav, Sutton, Charles.  2016.  Parameter-free Probabilistic API Mining Across GitHub. Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. :254–265.

Existing API mining algorithms can be difficult to use as they require expensive parameter tuning and the returned set of API calls can be large, highly redundant and difficult to understand. To address this, we present PAM (Probabilistic API Miner), a near parameter-free probabilistic algorithm for mining the most interesting API call patterns. We show that PAM significantly outperforms both MAPO and UPMiner, achieving 69% test-set precision, at retrieving relevant API call sequences from GitHub. Moreover, we focus on libraries for which the developers have explicitly provided code examples, yielding over 300,000 LOC of hand-written API example code from the 967 client projects in the data set. This evaluation suggests that the hand-written examples actually have limited coverage of real API usages.