Biblio | CPS-VO

Islam, Mafijul Md., Lautenbach, Aljoscha, Sandberg, Christian, Olovsson, Tomas. 2016. A Risk Assessment Framework for Automotive Embedded Systems. Proceedings of the 2Nd ACM International Workshop on Cyber-Physical System Security. :3–14.

The automotive industry is experiencing a paradigm shift towards autonomous and connected vehicles. Coupled with the increasing usage and complexity of electrical and/or electronic systems, this introduces new safety and security risks. Encouragingly, the automotive industry has relatively well-known and standardised safety risk management practices, but security risk management is still in its infancy. In order to facilitate the derivation of security requirements and security measures for automotive embedded systems, we propose a specifically tailored risk assessment framework, and we demonstrate its viability with an industry use-case. Some of the key features are alignment with existing processes for functional safety, and usability for non-security specialists. The framework begins with a threat analysis to identify the assets, and threats to those assets. The following risk assessment process consists of an estimation of the threat level and of the impact level. This step utilises several existing standards and methodologies, with changes where necessary. Finally, a security level is estimated which is used to formulate high-level security requirements. The strong alignment with existing standards and processes should make this framework well-suited for the needs in the automotive industry.

Kim, Woobin, Jin, Jungha, Kim, Keecheon. 2016. A Routing Protocol Method That Sets Up Multi-hops in the Ad-hoc Network. Proceedings of the 10th International Conference on Ubiquitous Information Management and Communication. :70:1–70:6.

In infrastructure wireless network technology, communication between users is provided within a certain area supported by access points (APs) or base station communication networks, but in ad-hoc networks, communication between users is provided only through direct connections between nodes. Ad-hoc network technology supports mobility directly through routing algorithms. However, when a connected node is lost owing to the node's movement, the routing protocol transfers this traffic to another node. The routing table in the node that is receiving the traffic detects any changes that occur and manages them. This paper proposes a routing protocol method that sets up multi-hops in the ad-hoc network and verifies the performance, which provides more effective connection persistence than existing methods.

Toulouse, Michel, Le, Hai, Phung, Cao Vien, Hock, Denis. 2016. Robust Consensus-based Network Intrusion Detection in Presence of Byzantine Attacks. Proceedings of the Seventh Symposium on Information and Communication Technology. :278–285.

Consensus algorithms provide strategies to solve problems in a distributed system with the added constraint that data can only be shared between adjacent computing nodes. We find these algorithms in applications for wireless and sensor networks, spectrum sensing for cognitive radio, even for some IoT services. However, consensus-based applications are not resilient to compromised nodes sending falsified data to their neighbors, i.e. they can be the target of Byzantine attacks. Several solutions have been proposed in the literature inspired from reputation based systems, outlier detection or model-based fault detection techniques in process control. We have reviewed some of these solutions, and propose two mitigation techniques to protect the consensus-based Network Intrusion Detection System in [1]. We analyze several implementation issues such as computational overhead, fine tuning of the solution parameters, impacts on the convergence of the consensus phase, accuracy of the intrusion detection system.

Kumar, Vimal, Kumar, Satish, Gupta, Avadhesh Kumar. 2016. Real-time Detection of Botnet Behavior in Cloud Using Domain Generation Algorithm. Proceedings of the International Conference on Advances in Information Communication Technology & Computing. :69:1–69:3.

In the last few years, the high acceptability of service computing delivered over the internet has exponentially created immense security challenges for the services providers. Cyber criminals are using advanced malware such as polymorphic botnets for participating in our everyday online activities and trying to access the desired information in terms of personal details, credit card numbers and banking credentials. Polymorphic botnet attack is one of the biggest attacks in the history of cybercrime and currently, millions of computers are infected by the botnet clients over the world. Botnet attack is an intelligent and highly coordinated distributed attack which consists of a large number of bots that generates big volumes of spamming e-mails and launching distributed denial of service (DDoS) attacks on the victim machines in a heterogeneous network environment. Therefore, it is necessary to detect the malicious bots and prevent their planned attacks in the cloud environment. A number of techniques have been developed for detecting the malicious bots in a network in the past literature. This paper recognize the ineffectiveness exhibited by the singnature based detection technique and networktraffic based detection such as NetFlow or traffic flow detection and Anomaly based detection. We proposed a real time malware detection methodology based on Domain Generation Algorithm. It increasesthe throughput in terms of early detection of malicious bots and high accuracy of identifying the suspicious behavior.

Rahbarinia, Babak, Balduzzi, Marco, Perdisci, Roberto. 2016. Real-Time Detection of Malware Downloads via Large-Scale URL-≫File-≫Machine Graph Mining. Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. :783–794.

In this paper we propose Mastino, a novel defense system to detect malware download events. A download event is a 3-tuple that identifies the action of downloading a file from a URL that was triggered by a client (machine). Mastino utilizes global situation awareness and continuously monitors various network- and system-level events of the clients' machines across the Internet and provides real time classification of both files and URLs to the clients upon submission of a new, unknown file or URL to the system. To enable detection of the download events, Mastino builds a large download graph that captures the subtle relationships among the entities of download events, i.e. files, URLs, and machines. We implemented a prototype version of Mastino and evaluated it in a large-scale real-world deployment. Our experimental evaluation shows that Mastino can accurately classify malware download events with an average of 95.5% true positive (TP), while incurring less than 0.5% false positives (FP). In addition, we show the Mastino can classify a new download event as either benign or malware in just a fraction of a second, and is therefore suitable as a real time defense system.

Bo, Li, Jinzhen, Wang, Ping, Zhao, Zhongjiang, Yan, Mao, Yang. 2016. Research of Recognition System of Web Intrusion Detection Based on Storm. Proceedings of the Fifth International Conference on Network, Communication and Computing. :98–102.

Based on Storm, a distributed, reliable, fault-tolerant real-time data stream processing system, we propose a recognition system of web intrusion detection. The system is based on machine learning, feature selection algorithm by TF-IDF(Term Frequency–Inverse Document Frequency) and the optimised cosine similarity algorithm, at low false positive rate and a higher detection rate of attacks and malicious behavior in real-time to protect the security of user data. From comparative analysis of experiments we find that the system for intrusion recognition rate and false positive rate has improved to some extent, it can be better to complete the intrusion detection work.

Yingying, Xu, Chao, Liu, Tao, Tang. 2016. Research on Risk Assessment of CTCS Based on Fuzzy Reasoning and Analytic Hierarchy Process. Proceedings of the 2016 International Conference on Intelligent Information Processing. :31:1–31:7.

In this paper, we describe the formatting guidelines for ACM SIG Proceedings. In order to assure safety of Chinese Train Control System (CTCS), it is necessary to ensure the operational risk is acceptable throughout its life-cycle, which requires a pragmatic risk assessment required for effective risk control. Many risk assessment techniques currently used in railway domain are qualitative, and rely on the experience of experts, which unavoidably brings in subjective judgements. This paper presents a method that combines fuzzy reasoning and analytic hierarchy process approach to quantify the experiences of experts to get the scores of risk parameters. Fuzzy reasoning is used to obtain the risk of system hazard, analytic hierarchy process approach is used to determine the risk level (RL) and its membership of the system. This method helps safety analyst to calculate overall collective risk level of system. A case study of risk assessment of CTCS system is used to demonstrate this method can give quantitative result of collective risks without much information from experts, but can support the risk assessment with risk level and its membership, which are more valuable to guide the further risk management.

Hall, Chris, Puder, Doron, Sawin, William F.. 2016. Ramanujan Coverings of Graphs. Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing. :533–541.

Let G be a finite connected graph, and let ρ be the spectral radius of its universal cover. For example, if G is k-regular then ρ=2√k−1. We show that for every r, there is an r-covering (a.k.a. an r-lift) of G where all the new eigenvalues are bounded from above by ρ. It follows that a bipartite Ramanujan graph has a Ramanujan r-covering for every r. This generalizes the r=2 case due to Marcus, Spielman and Srivastava (2013). Every r-covering of G corresponds to a labeling of the edges of G by elements of the symmetric group Sr. We generalize this notion to labeling the edges by elements of various groups and present a broader scenario where Ramanujan coverings are guaranteed to exist. In particular, this shows the existence of richer families of bipartite Ramanujan graphs than was known before. Inspired by Marcus-Spielman-Srivastava, a crucial component of our proof is the existence of interlacing families of polynomials for complex reflection groups. The core argument of this component is taken from Marcus-Spielman-Srivastava (2015). Another important ingredient of our proof is a new generalization of the matching polynomial of a graph. We define the r-th matching polynomial of G to be the average matching polynomial of all r-coverings of G. We show this polynomial shares many properties with the original matching polynomial. For example, it is real rooted with all its roots inside [−ρ,ρ].

Cheng, Wei, Zhang, Kai, Chen, Haifeng, Jiang, Guofei, Chen, Zhengzhang, Wang, Wei. 2016. Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :805–814.

Modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, invariant network has shown to be a powerful way in characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariance network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detect causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible casual components, which have several limitations: 1) fault propagation in the network is ignored; 2) the root casual anomalies may not always be the nodes with a high-percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural, and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.

Crampton, Jason, Gutin, Gregory, Watrigant, Rémi. 2016. Resiliency Policies in Access Control Revisited. Proceedings of the 21st ACM on Symposium on Access Control Models and Technologies. :101–111.

Resiliency is a relatively new topic in the context of access control. Informally, it refers to the extent to which a multi-user computer system, subject to an authorization policy, is able to continue functioning if a number of authorized users are unavailable. Several interesting problems connected to resiliency were introduced by Li, Wang and Tripunitara [13], many of which were found to be intractable. In this paper, we show that these resiliency problems have unexpected connections with the workflow satisfiability problem (WSP). In particular, we show that an instance of the resiliency checking problem (RCP) may be reduced to an instance of WSP. We then demonstrate that recent advances in our understanding of WSP enable us to develop fixed-parameter tractable algorithms for RCP. Moreover, these algorithms are likely to be useful in practice, given recent experimental work demonstrating the advantages of bespoke algorithms to solve WSP. We also generalize RCP in several different ways, showing in each case how to adapt the reduction to WSP. Li et al also showed that the coexistence of resiliency policies and static separation-of-duty policies gives rise to further interesting questions. We show how our reduction of RCP to WSP may be extended to solve these problems as well and establish that they are also fixed-parameter tractable.

C. Theisen, K. Herzig, B. Murphy, L. Williams. 2017. Risk-based attack surface approximation: how much data is enough? 2017 IEEE/ACM 39th International Conference on Software Engineering: Software Engineering in Practice Track (ICSE-SEIP). :273-282.

Proactive security reviews and test efforts are a necessary component of the software development lifecycle. Resource limitations often preclude reviewing the entire code base. Making informed decisions on what code to review can improve a team's ability to find and remove vulnerabilities. Risk-based attack surface approximation (RASA) is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. The goal of this research is to help software development teams prioritize security efforts by the efficient development of a risk-based attack surface approximation. We explore the use of RASA using Mozilla Firefox and Microsoft Windows stack traces from crash dumps. We create RASA at the file level for Firefox, in which the 15.8% of the files that were part of the approximation contained 73.6% of the vulnerabilities seen for the product. We also explore the effect of random sampling of crashes on the approximation, as it may be impractical for organizations to store and process every crash received. We find that 10-fold random sampling of crashes at a rate of 10% resulted in 3% less vulnerabilities identified than using the entire set of stack traces for Mozilla Firefox. Sampling crashes in Windows 8.1 at a rate of 40% resulted in insignificant differences in vulnerability and file coverage as compared to a rate of 100%.

Theisen, Christopher. 2016. Reusing Stack Traces: Automated Attack Surface Approximation. Proceedings of the 38th International Conference on Software Engineering Companion. :859–862.

Security requirements around software systems have become more stringent as society becomes more interconnected via the Internet. New ways of prioritizing security efforts are needed so security professionals can use their time effectively to find security vulnerabilities or prevent them from occurring in the first place. The goal of this work is to help software development teams prioritize security efforts by approximating the attack surface of a software system via stack trace analysis. Automated attack surface approximation is a technique that uses crash dump stack traces to predict what code may contain exploitable vulnerabilities. If a code entity (a binary, file or function) appears on stack traces, then Attack Surface Approximation (ASA) considers that code entity is on the attack surface of the software system. We also explore whether number of appearances of code on stack traces correlates with where security vulnerabilities are found. To date, feasibility studies of ASA have been performed on Windows 8 and 8.1, and Mozilla Firefox. The results from these studies indicate that ASA may be useful for practitioners trying to secure their software systems. We are now working towards establishing the ground truth of what the attack surface of software systems is, along with looking at how ASA could change over time, among other metrics.

Lampert, Ben, Wahby, Riad S., Leonard, Shane, Levis, Philip. 2016. Robust, Low-cost, Auditable Random Number Generation for Embedded System Security. Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM. :16–27.

This paper presents an architecture for a discrete, high-entropy hardware random number generator. Because it is constructed out of simple hardware components, its operation is transparent and auditable. Using avalanche noise, a non-deterministic physical phenomenon, the circuit is inherently probabilistic and resists adversarial control. Furthermore, because it compares the outputs from two matched noise sources, it rejects environmental disturbances like RF energy and power supply ripple. The resulting hardware produces more than 0.98 bits of entropy per sample, is inexpensive, has a small footprint, and can be disabled to conserve power when not in use.

Toulouse, Michel, Le, Hai, Phung, Cao Vien, Hock, Denis. 2016. Robust Consensus-based Network Intrusion Detection in Presence of Byzantine Attacks. Proceedings of the Seventh Symposium on Information and Communication Technology. :278–285.

Consensus algorithms provide strategies to solve problems in a distributed system with the added constraint that data can only be shared between adjacent computing nodes. We find these algorithms in applications for wireless and sensor networks, spectrum sensing for cognitive radio, even for some IoT services. However, consensus-based applications are not resilient to compromised nodes sending falsified data to their neighbors, i.e. they can be the target of Byzantine attacks. Several solutions have been proposed in the literature inspired from reputation based systems, outlier detection or model-based fault detection techniques in process control. We have reviewed some of these solutions, and propose two mitigation techniques to protect the consensus-based Network Intrusion Detection System in [1]. We analyze several implementation issues such as computational overhead, fine tuning of the solution parameters, impacts on the convergence of the consensus phase, accuracy of the intrusion detection system.

Cheng, Wei, Zhang, Kai, Chen, Haifeng, Jiang, Guofei, Chen, Zhengzhang, Wang, Wei. 2016. Ranking Causal Anomalies via Temporal and Dynamical Analysis on Vanishing Correlations. Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :805–814.

Modern world has witnessed a dramatic increase in our ability to collect, transmit and distribute real-time monitoring and surveillance data from large-scale information systems and cyber-physical systems. Detecting system anomalies thus attracts significant amount of interest in many fields such as security, fault management, and industrial optimization. Recently, invariant network has shown to be a powerful way in characterizing complex system behaviours. In the invariant network, a node represents a system component and an edge indicates a stable, significant interaction between two components. Structures and evolutions of the invariance network, in particular the vanishing correlations, can shed important light on locating causal anomalies and performing diagnosis. However, existing approaches to detect causal anomalies with the invariant network often use the percentage of vanishing correlations to rank possible casual components, which have several limitations: 1) fault propagation in the network is ignored; 2) the root casual anomalies may not always be the nodes with a high-percentage of vanishing correlations; 3) temporal patterns of vanishing correlations are not exploited for robust detection. To address these limitations, in this paper we propose a network diffusion based framework to identify significant causal anomalies and rank them. Our approach can effectively model fault propagation over the entire invariant network, and can perform joint inference on both the structural, and the time-evolving broken invariance patterns. As a result, it can locate high-confidence anomalies that are truly responsible for the vanishing correlations, and can compensate for unstructured measurement noise in the system. Extensive experiments on synthetic datasets, bank information system datasets, and coal plant cyber-physical system datasets demonstrate the effectiveness of our approach.

Zhang, Lihua, Shang, Yue, Qin, Qi, Chen, Shaowei, Zhao, Shuai. 2016. Research on Fault Feature Extraction for Analog Circuits. Proceedings of the 8th International Conference on Signal Processing Systems. :173–177.

In order to realize the accurate positioning and recognition effectively of the analog circuit, the feature extraction of fault information is an extremely important port. This arrival based on the experimental circuit which is designed as a failure mode to pick-up the fault sample set. We have chosen two methods, one is the combination of wavelet transform and principal component analysis, the other is the factorial analysis for the fault data's feature extraction, and we also use the extreme learning machine to train and diagnose the data, to compare the performance of these two methods through the accuracy of the diagnosis. The results of the experiment shows that the data which we get from the experimental circuit, after dealing with these two methods can quickly get the fault location.

Tuba, Eva, Tuba, Milan, Simian, Dana. 2016. Range Based Wireless Sensor Node Localization Using Bat Algorithm. Proceedings of the 13th ACM Symposium on Performance Evaluation of Wireless Ad Hoc, Sensor, & Ubiquitous Networks. :41–44.

For most wireless sensor networks applications it is necessary to know the locations of all sensor nodes. Since sensor nodes are usually cheap, it is impossible to equip them all with GPS devices, hence the localization process depends on few static or mobile anchor nodes with GPS devices. Range based localization methods use estimated distance between sensor and anchor nodes where the quality of estimation usually depends on the distance and angle of arrival. Localization based on such noisy data represents a hard optimization problem for which swarm intelligence algorithms have been successfully used. In this paper we propose a range based localization algorithm that uses recently developed bat algorithm. The two stage localization algorithm uses four semi-mobile anchors that are at first located at the corners of the area where sensors are deployed and after that the anchors move to their optimal positions with minimal distances to sensor nodes, but with maximal viewing angles. Our proposed algorithm is even at the first stage superior to other approaches from literature in minimizing the error between real and estimated sensor node positions and it is additionally improved at the second stage.

Usman, Aminu Bello, Gutierrez, Jairo. 2016. A Reliability-Based Trust Model for Efficient Collaborative Routing in Wireless Networks. Proceedings of the 11th International Conference on Queueing Theory and Network Applications. :15:1–15:7.

Different wireless Peer-to-Peer (P2P) routing protocols rely on cooperative protocols of interaction among peers, yet, most of the surveyed provide little detail on how the peers can take into consideration the peers' reliability for improving routing efficiency in collaborative networks. Previous research has shown that in most of the trust and reputation evaluation schemes, the peers' rating behaviour can be improved to include the peers' attributes for understanding peers' reliability. This paper proposes a reliability based trust model for dynamic trust evaluation between the peers in P2P networks for collaborative routing. Since the peers' routing attributes vary dynamically, our proposed model must also accommodate the dynamic changes of peers' attributes and behaviour. We introduce peers' buffers as a scaling factor for peers' trust evaluation in the trust and reputation routing protocols. The comparison between reliability and non-reliability based trust models using simulation shows the improved performance of our proposed model in terms of delivery ratio and average message latency.

Shubham Goel, University of Illinois at Urbana-Champaign, Masooda Bashir, University of Illinois at Urbana-Champaign. 2017. Ransomware: Recommendations against the Extortion.

Poster presented at the 2017 Science of Security UIUC Lablet Summer Internship Poster Session held on July 27, 2017 in Urbana, IL.

Matsuki, Tatsuma, Matsuoka, Naoki. 2016. A Resource Contention Analysis Framework for Diagnosis of Application Performance Anomalies in Consolidated Cloud Environments. Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering. :173–184.

Cloud services have made large contributions to the agile developments and rapid revisions of various applications. However, the performance of these applications is still one of the largest concerns for developers. Although it has created many performance analysis frameworks, most of them have not been efficient for the rapid application revisions because they have required performance models, which may have had to be remodeled whenever application revisions occurred. We propose an analysis framework for diagnosis of application performance anomalies. We designed our framework so that it did not require any performance models to be efficient in rapid application revisions. That investigates the Pearson correlation and association rules between system metrics and application performance. The association rules are widely used in data-mining areas to find relations between variables in databases. We demonstrated through an experiment and testing on a real data set that our framework could select causal metrics even when the metrics were temporally correlated, which reduced the false negatives obtained from cause diagnosis. We evaluated our framework from the perspective of the expected remaining diagnostic costs of framework users. The results indicated that it was expected to reduce the diagnostic costs by 84.8\textbackslash% at most, compared with a method that only used the Pearson correlation.

Liu, Mingmou, Pan, Xiaoyin, Yin, Yitong. 2016. Randomized Approximate Nearest Neighbor Search with Limited Adaptivity. Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures. :23–33.

We study the problem of approximate nearest neighbor search in \$d\$-dimensional Hamming space \0,1\d. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes k rounds of parallel accesses to the data structure for a given k. For any k ≥ 1, we give a simple randomized algorithm solving the approximate nearest neighbor search using k rounds of parallel memory accesses, with O(k(log d)1/k) accesses in total. We also give a more sophisticated randomized algorithm using O(k+(1/k log d)O(1/k)) memory accesses in k rounds for large enough k. Both algorithms use data structures of size polynomial in n, the number of points in the database. We prove an Ω(1/k(log d)1/k) lower bound for the total number of memory accesses required by any randomized algorithm solving the approximate nearest neighbor search within k ≤ (log log d)/(2 log log log d) rounds of parallel memory accesses on any data structures of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant round k. And our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in a sense that the lower bound of memory accesses for any k1 rounds can be matched by the algorithm within k2=O(k1) rounds. In the extreme, for some large enough k=Θ((log log d)/(log log log d)), our second algorithm matches the Θ((log log d)/(log log log d)) tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev.

Basilico, Justin, Raimond, Yves. 2016. Recommending for the World. Proceedings of the 10th ACM Conference on Recommender Systems. :375–375.

The Netflix experience is driven by a number of recommendation algorithms: personalized ranking, page generation, similarity, ratings, search, etc. On the January 6th, 2016 we simultaneously launched Netflix in 130 new countries around the world, which brought the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this talk, we will highlight the four most interesting challenges we encountered in making our algorithms operate globally and how this improved our ability to connect members worldwide with stories they'll love. In particular, we will dive into the problems of uneven availability across catalogs, balancing personal and cultural tastes, handling language, and tracking quality of recommendations. Uneven catalog availability is a challenge because many recommendation algorithms assume that people could interact with any item and then use the absence of interaction implicitly or explicitly as negative information in the model. However, this assumption does not hold globally and across time where item availability differs. Running algorithms globally means needing a notion of location so that we can handle local variations in taste while also providing a good basis for personalization. Language is another challenge in recommending video content because people can typically only enjoy content that has assets (audio, subtitles) in languages they understand. The preferences for how people enjoy such content also vary between people and depend on their familiarity with a language. Also, while would like our recommendations to work well for every one of our members, tracking quality becomes difficult because with so many members in so many countries speaking so many languages, it can be hard to determine when an algorithm or system is performing sub-optimally for some subset of them. Thus, to support this global launch, we examined each and every algorithm that is part of our service and began to address these challenges.

Chen, Jing, McCauley, Samuel, Singh, Shikha. 2016. Rational Proofs with Multiple Provers. Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science. :237–248.

Interactive proofs model a world where a verifier delegates computation to an untrustworthy prover, verifying the prover's claims before accepting them. These proofs have applications to delegation of computation, probabilistically checkable proofs, crowdsourcing, and more. In some of these applications, the verifier may pay the prover based on the quality of his work. Rational proofs, introduced by Azar and Micali (2012), are an interactive proof model in which the prover is rational rather than untrustworthy–-he may lie, but only to increase his payment. This allows the verifier to leverage the greed of the prover to obtain better protocols: while rational proofs are no more powerful than interactive proofs, the protocols are simpler and more efficient. Azar and Micali posed as an open problem whether multiple provers are more powerful than one for rational proofs. We provide a model that extends rational proofs to allow multiple provers. In this model, a verifier can cross-check the answers received by asking several provers. The verifier can pay the provers according to the quality of their work, incentivizing them to provide correct information. We analyze rational proofs with multiple provers from a complexity-theoretic point of view. We fully characterize this model by giving tight upper and lower bounds on its power. On the way, we resolve Azar and Micali's open problem in the affirmative, showing that multiple rational provers are strictly more powerful than one (under standard complexity-theoretic assumptions). We further show that the full power of rational proofs with multiple provers can be achieved using only two provers and five rounds of interaction. Finally, we consider more demanding models where the verifier wants the provers' payment to decrease significantly when they are lying, and fully characterize the power of the model when the payment gap must be noticeable (i.e., at least 1/p where p is a polynomial).

Chakrabarti, Aniket, Marwah, Manish, Arlitt, Martin. 2016. Robust Anomaly Detection for Large-Scale Sensor Data. Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments. :31–40.

Large scale sensor networks are ubiquitous nowadays. An important objective of deploying sensors is to detect anomalies in the monitored system or infrastructure, which allows remedial measures to be taken to prevent failures, inefficiencies, and security breaches. Most existing sensor anomaly detection methods are local, i.e., they do not capture the global dependency structure of the sensors, nor do they perform well in the presence of missing or erroneous data. In this paper, we propose an anomaly detection technique for large scale sensor data that leverages relationships between sensors to improve robustness even when data is missing or erroneous. We develop a probabilistic graphical model-based global outlier detection technique that represents a sensor network as a pairwise Markov Random Field and uses graphical model inference to detect anomalies. We show our model is more robust than local models, and detects anomalies with 90% accuracy even when 50% of sensors are erroneous. We also build a synthetic graphical model generator that preserves statistical properties of a real data set to test our outlier detection technique at scale.

Abdulla, Parosh Aziz, Aiswarya, C., Atig, Mohamed Faouzi, Montali, Marco, Rezine, Othmane. 2016. Recency-Bounded Verification of Dynamic Database-Driven Systems. Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems. :195–210.

We propose a formalism to model database-driven systems, called database manipulating systems (DMS). The actions of a (DMS) modify the current instance of a relational database by adding new elements into the database, deleting tuples from the relations and adding tuples to the relations. The elements which are modified by an action are chosen by (full) first-order queries. (DMS) is a highly expressive model and can be thought of as a succinct representation of an infinite state relational transition system, in line with similar models proposed in the literature. We propose monadic second order logic (MSO-FO) to reason about sequences of database instances appearing along a run. Unsurprisingly, the linear-time model checking problem of (DMS) against (MSO-FO) is undecidable. Towards decidability, we propose under-approximate model checking of (DMS), where the under-approximation parameter is the "bound on recency". In a k-recency-bounded run, only the most recent k elements in the current active domain may be modified by an action. More runs can be verified by increasing the bound on recency. Our main result shows that recency-bounded model checking of (DMS) against (MSO-FO) is decidable, by a reduction to the satisfiability problem of MSO over nested words.