Mittal, Gaurav, Yagnik, Kaushal B., Garg, Mohit, Krishnan, Narayanan C..  2016.  SpotGarbage: Smartphone App to Detect Garbage Using Deep Learning. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. :940–945.

Maintaining a clean and hygienic civic environment is an indispensable yet formidable task, especially in developing countries. With the aim of engaging citizens to track and report on their neighborhoods, this paper presents a novel smartphone app, called SpotGarbage, which detects and coarsely segments garbage regions in a user-clicked geo-tagged image. The app utilizes the proposed deep architecture of fully convolutional networks for detecting garbage in images. The model has been trained on a newly introduced Garbage In Images (GINI) dataset, achieving a mean accuracy of 87.69%. The paper also proposes optimizations in the network architecture resulting in a reduction of 87.9% in memory usage and 96.8% in prediction time with no loss in accuracy, facilitating its usage in resource constrained smartphones.
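
As a rough illustration of the kind of fully convolutional classifier the abstract describes (this is not the GarbNet architecture from the paper; layer counts, channel widths, and input size are illustrative assumptions), a minimal PyTorch sketch:

```python
# Minimal sketch of a fully convolutional binary "garbage / no garbage" classifier.
# NOT the model from the paper; layer counts and channel widths are assumptions.
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # A 1x1 convolution produces a per-location "garbage" score map,
        # which also yields the coarse segmentation the abstract mentions.
        self.score = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        score_map = self.score(self.features(x))      # (N, 1, H/4, W/4)
        prob_map = torch.sigmoid(score_map)           # coarse garbage regions
        image_prob = prob_map.mean(dim=(2, 3))        # image-level prediction
        return image_prob, prob_map

model = TinyFCN()
probs, regions = model(torch.randn(1, 3, 128, 128))   # dummy geo-tagged photo
```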

Queiroz, Rodrigo, Berger, Thorsten, Czarnecki, Krzysztof.  2016.  Towards Predicting Feature Defects in Software Product Lines. Proceedings of the 7th International Workshop on Feature-Oriented Software Development. :58–62.

Defect-prediction techniques can enhance the quality assurance activities for software systems. For instance, they can be used to predict bugs in source files or functions. In the context of a software product line, such techniques could ideally be used for predicting defects in features or combinations of features, which would allow developers to focus quality assurance on the error-prone ones. In this preliminary case study, we investigate how defect prediction models can be used to identify defective features using machine-learning techniques. We adapt process metrics and evaluate and compare three classifiers using an open-source product line. Our results show that the technique can be effective. Our best scenario achieves an accuracy of 73% for predicting features as defective or clean using a Naive Bayes classifier. Based on the results, we discuss directions for future work.
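
A minimal sketch of the kind of classifier the study evaluates: a Naive Bayes model predicting whether a feature is defective from process metrics. The metric names and toy values below are illustrative assumptions, not the paper's dataset:

```python
# Sketch: Naive Bayes feature-defect prediction from process metrics.
# Feature names and toy values are illustrative, not the paper's data.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# Each row: process metrics for one feature, e.g.
# [commits touching the feature, distinct authors, lines changed]
X = np.array([
    [12, 3, 450],
    [ 2, 1,  30],
    [25, 6, 900],
    [ 4, 2,  80],
    [18, 5, 600],
    [ 1, 1,  10],
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = defective feature, 0 = clean

clf = GaussianNB()
print(cross_val_score(clf, X, y, cv=3).mean())  # rough accuracy estimate
```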

Mittelbach, Frank.  2016.  A General Framework for Globally Optimized Pagination. Proceedings of the 2016 ACM Symposium on Document Engineering. :11–20.

Pagination problems deal with questions around transforming a source text stream into a formatted document by dividing it up into individual columns and pages, including adding auxiliary elements that have some relationship to the source stream data but may allow a certain amount of variation in placement (such as figures or footnotes). Traditionally the pagination problem has been approached by separating it into one of micro-typography (e.g., breaking text into paragraphs, also known as h&j) and one of macro-typography (e.g., taking a galley of already formatted paragraphs and breaking them into columns and pages) without much interaction between the two. While early solutions for both problem spaces used simple greedy algorithms, Knuth and Plass introduced in the '80s a global-fit algorithm for line breaking that optimizes the breaks across the whole paragraph [1]. This algorithm was implemented in TeX'82 [2] and has since kept its crown as the best available solution for this space. However, for macro-typography there has been no (successful) attempt to provide globally optimized page layout: all systems to date (including TeX) use greedy algorithms for pagination. Various problems in this area have been researched (e.g., [3,4,5,6]) and the literature documents some prototype development. But none of these prototypes have been made widely available to the research community or ever made it into a generally usable and publicly available system. This paper presents a framework for a global-fit algorithm for page breaking based on the ideas of Knuth/Plass. It is implemented in such a way that it is directly usable without additional executables with any modern TeX installation. It therefore can serve as a test bed for future experiments and extensions in this space. At the same time a cleaned-up version of the current prototype has the potential to become a production tool for the huge number of TeX users world-wide. The paper also discusses two already implemented extensions that increase the flexibility of the pagination process: the ability to automatically consider existing flexibility in paragraph length (by considering paragraph variations with different numbers of lines [7]) and the concept of running the columns on a double spread a line long or short. It concludes with a discussion of the overall approach, its inherent limitations and directions for future research.

[1] D. E. Knuth and M. F. Plass. Breaking Paragraphs into Lines. Software-Practice and Experience, 11(11):1119-1184, Nov. 1981.
[2] D. E. Knuth. TeX: The Program, volume B of Computers and Typesetting. Addison-Wesley, Reading, MA, USA, 1986.
[3] A. Brüggemann-Klein, R. Klein, and S. Wohlfeil. Computer science in perspective. Chapter On the Pagination of Complex Documents, pages 49-68. Springer-Verlag New York, Inc., New York, NY, USA, 2003.
[4] C. Jacobs, W. Li, and D. H. Salesin. Adaptive document layout via manifold content. In Second International Workshop on Web Document Analysis (wda2003), Liverpool, UK, 2003.
[5] A. Holkner. Global multiple objective line breaking. Master's thesis, School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria, Australia, 2006.
[6] P. Ciancarini, A. Di Iorio, L. Furini, and F. Vitali. High-quality pagination for publishing. Software-Practice and Experience, 42(6):733-751, June 2012.
[7] T. Hassan and A. Hunter. Knuth-Plass revisited: Flexible line-breaking for automatic document layout. In Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng '15, pages 17-20, New York, NY, USA, 2015.
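
To make the global-fit idea concrete, a deliberately simplified dynamic-programming sketch in the Knuth/Plass spirit: choose break points that minimize total demerits over the whole sequence instead of greedily filling each page. Real TeX-style breaking uses a much richer box/glue/penalty model; the cost function and inputs here are assumptions:

```python
# Greatly simplified global-fit breaking: minimize total "looseness" demerits
# over all pages rather than filling each page greedily.
def optimal_breaks(item_heights, page_height):
    n = len(item_heights)
    best = [float("inf")] * (n + 1)   # best[i] = min demerits for items[:i]
    prev = [0] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        used = 0.0
        for j in range(i - 1, -1, -1):           # try a page holding items j..i-1
            used += item_heights[j]
            if used > page_height:
                break
            badness = (page_height - used) ** 2   # penalize loose pages
            if best[j] + badness < best[i]:
                best[i], prev[i] = best[j] + badness, j
    breaks, i = [], n
    while i > 0:
        breaks.append(prev[i])
        i = prev[i]
    return list(reversed(breaks))                 # start index of each page

print(optimal_breaks([3, 4, 2, 5, 1, 4, 2], page_height=8))   # -> [0, 2, 4]
```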

Hsu, Justin, Morgenstern, Jamie, Rogers, Ryan, Roth, Aaron, Vohra, Rakesh.  2016.  Do Prices Coordinate Markets? Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing. :440–453.

Walrasian equilibrium prices have a remarkable property: they allow each buyer to purchase a bundle of goods that she finds the most desirable, while guaranteeing that the induced allocation over all buyers will globally maximize social welfare. However, this clean story has two caveats.

* First, the prices may induce indifferences. In fact, the minimal equilibrium prices necessarily induce indifferences. Accordingly, buyers may need to coordinate with one another to arrive at a socially optimal outcome—the prices alone are not sufficient to coordinate the market.
* Second, although natural procedures converge to Walrasian equilibrium prices on a fixed population, in practice buyers typically observe prices without participating in a price computation process. These prices cannot be perfect Walrasian equilibrium prices, but instead somehow reflect distributional information about the market.

To better understand the performance of Walrasian prices when facing these two problems, we give two results. First, we propose a mild genericity condition on valuations under which the minimal Walrasian equilibrium prices induce allocations which result in low over-demand, no matter how the buyers break ties. In fact, under genericity the over-demand of any good can be bounded by 1, which is the best possible at the minimal prices. We demonstrate our results for unit demand valuations and give an extension to matroid based valuations (MBV), conjectured to be equivalent to gross substitute valuations (GS). Second, we use techniques from learning theory to argue that the over-demand and welfare induced by a price vector converge to their expectations uniformly over the class of all price vectors, with respective sample complexity linear and quadratic in the number of goods in the market. These results make no assumption on the form of the valuation functions. These two results imply that under a mild genericity condition, the exact Walrasian equilibrium prices computed in a market are guaranteed to induce both low over-demand and high welfare when used in a new market where agents are sampled independently from the same distribution, whenever the number of agents is larger than the number of commodities in the market.
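
A hedged rendering of the over-demand quantity the abstract bounds (the notation below is introduced here for illustration and is not necessarily the paper's):

```latex
% Hedged notation: with n buyers, prices p, supplies s_j, and a tie-breaking
% choice x_i from each buyer's demand correspondence D_i(p), the over-demand of
% good j is the excess of chosen demand over supply:
\[
  \mathrm{OD}_j(p, x) \;=\; \sum_{i=1}^{n} x_{ij} \;-\; s_j .
\]
% The abstract's genericity result: at the minimal Walrasian prices,
% \max_j \mathrm{OD}_j(p, x) \le 1 for every way the buyers break ties.
```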

Gustafson, Stephen, Arumugam, Hemagiri, Kanyuk, Paul, Lorenzen, Michael.  2016.  MURE: Fast Agent Based Crowd Simulation for VFX and Animation. ACM SIGGRAPH 2016 Talks. :56:1–56:2.

Crowd simulation in visual effects and animation is a field where creativity is often bound by the scalability of its tools. High-end animation systems like Autodesk Maya [Autodesk] are tailored for scenes with at most tens of characters, whereas more scalable VFX packages like SideFX's Houdini [SideFX] can lack the directability required by character animation. We present a suite of technologies built around Houdini that vastly improves both its scalability and directability for agent-based crowd simulation. Dubbed MURE (Japanese for "crowd"), this system employs a new VEX context with lock-free, multithreaded KD-Tree construction/look-up, a procedural finite state machine for massive animation libraries, a suite of VEX nodes for fuzzy logic, and a fast GPU drawing plugin built upon the open source USD (Universal Scene Description) library [Pixar Animation Studios]. MURE has proven its success on two feature films, The Good Dinosaur and Finding Dory, with crowd spectacles including flocks of birds, swarms of fireflies, automobile traffic, and schools of fish. Pixar has a history with agent-based crowd simulation using a custom Massive [Massive Software] based pipeline, first developed on Ratatouille [Ryu and Kanyuk 2007] and subsequently used on Wall-E, Up, and Cars 2. A re-write of the studio's proprietary animation software, Presto, deprecated this crowd pipeline. The crowds team on Brave and Monsters University replaced it with a new system for "non-simulated" crowds that sequenced geometry caches [Kanyuk et al. 2012] via finite state machines and sketch-based tools [Arumugam et al. 2013]. However, the story reels for The Good Dinosaur called for large crowds with such complex inter-agent and environment interaction that simulated crowds were necessary. This creative need afforded Pixar's crowd team the opportunity to evaluate the pros and cons of our former agent-based simulation pipeline and weigh which features would be part of its successor. Fuzzy logic brains and customizable navigation were indispensable, but our practice of approximating hero-quality rigs with simulatable equivalents was fraught with problems. Creating the mappings was labor intensive and lossy, and even when mostly correct, animators found the synthesized animation splines so foreign that many would start from scratch rather than build upon a crowd simulation. To avoid this pitfall, we instead opted to build our new pipeline around pre-cached clips of animation and thus always be able to deliver crowd animators clean splines. This reliance on caches also affords tremendous opportunities for interactivity at massive scales. Thus, rather than focusing on rigging/posing, the goals of our new system, MURE, became interactivity and directability.

Riofrío-Calderón, Gioconda, Ramírez-Montoya, María-Soledad, Rodríguez-Conde, María-José.  2016.  Mediation Practices for Learning in MOOC Courses to Promote Open Innovation. Proceedings of the Fourth International Conference on Technological Ecosystems for Enhancing Multiculturality. :1167–1170.

With the increase of training offered in online environments over the last decade, mediation has become one of the key elements in these training events, as an enabler of course completion and learning achievement. The role of mediation is especially important when there is no teacher or tutor figure in contact with the participants. The aim of this research, which is derived from a doctoral program, is to analyze the practices of mediation in massive open online courses (MOOCs) in the area of energy sustainability, and to propose a mediation model that considers open innovation as a key element. In this way, it answers the research question regarding the relationship between teaching and learning practices and technological mediation in MOOC courses. The research is the result of cooperation between government institutions in Mexico, such as CONACYT and SENER, and the Technological Institute of Monterrey, the latter offering 10 massive courses on the subject of energy sustainability under the Spanish title "Generación de energías limpias y energías convencionales" [Generation of clean energy and conventional energy]. The participants of these courses constitute the research population, who will be consulted through various qualitative and quantitative instruments regarding mediation and its relationship with learning. The results of this study thus make it possible to answer the research question, in addition to generating proposals in the field of mediation that promote open innovation. The results are expected to contribute to determining the relationship between mediation and learning outcomes in MOOC courses.

Hsu, Kai-Cheng, Lin, Kate Ching-Ju, Wei, Hung-Yu.  2016.  Full-duplex Delay-and-forward Relaying. Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing. :221–230.

A full-duplex radio can transmit and receive simultaneously and hence is a natural fit for realizing an in-band relay system. Most existing full-duplex relay designs, however, simply forward an amplified version of the received signal without decoding it, and thereby also amplify the noise at the relay, offsetting the throughput gains of full-duplex relaying. To overcome this issue, we explore an alternative: demodulate-and-forward. This paper presents the design and implementation of DelayForward (DF), a practical system that fully extracts the relay gains of the full-duplex demodulate-and-forward mechanism. DF allows a relay to remove its noise from the signal it receives via demodulation and forward the clean signal to the destination with a small delay. While such a delay-and-forward mechanism avoids forwarding the noise at the relay, the half-duplex destination now receives the combination of the direct signal from the source and the delayed signal from the relay. Unlike previous theoretical work, which mainly focuses on deriving the capacity of demodulate-and-forward relaying, we observe that such combined signals have a structure similar to a convolutional code, and hence propose a novel Viterbi-type decoder to recover data from those combined signals in practice. Another challenge is that the performance of a full-duplex relay is inherently bounded by the minimum of the relay's SNR and the destination's SNR. To break this limitation, we further develop a power allocation scheme to optimize the capacity of DF. We have built a prototype of DF using USRP software radios. Experimental results show that our power-adaptive DF delivers a throughput gain of 1.25×, on average, over the state-of-the-art full-duplex relay design. The gain is as high as 2.03× for the more challenged clients.
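
A hedged sketch of the "convolutional-code-like" structure the abstract describes: the destination hears the direct symbol plus the relay's one-symbol-delayed copy, y[n] = h_d·x[n] + h_r·x[n-1] + noise, and a Viterbi-style search over the previous symbol recovers the data. BPSK, known channel gains, and Gaussian noise are simplifying assumptions here, not the paper's exact model or decoder:

```python
# Viterbi-style MLSE over a 2-tap "direct + delayed relay" channel (sketch).
import numpy as np

def viterbi_decode(y, h_d, h_r):
    states = [-1.0, +1.0]                        # hypothesis for the previous BPSK symbol
    cost = {s: 0.0 for s in states}
    paths = {s: [] for s in states}
    for obs in y:
        new_cost, new_paths = {}, {}
        for cur in states:                       # current symbol hypothesis
            best_prev = min(states,
                            key=lambda p: cost[p] + (obs - h_d*cur - h_r*p)**2)
            new_cost[cur] = cost[best_prev] + (obs - h_d*cur - h_r*best_prev)**2
            new_paths[cur] = paths[best_prev] + [cur]
        cost, paths = new_cost, new_paths
    return paths[min(states, key=lambda s: cost[s])]

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, 20)
x = 2.0 * bits - 1.0
h_d, h_r = 1.0, 0.8                              # assumed direct / relayed gains
y = h_d*x + h_r*np.concatenate(([0.0], x[:-1])) + 0.1*rng.standard_normal(20)
decoded = np.array(viterbi_decode(y, h_d, h_r)) > 0
print((decoded == bits.astype(bool)).mean())     # fraction of bits recovered
```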

Nanda, Mangala Gowri, Arun-Kumar, S..  2016.  Decompiling Boolean Expressions from Java™ Bytecode. Proceedings of the 9th India Software Engineering Conference. :59–69.

Java bytecode obfuscates the original structure of a Java expression in the source code. A simple expression such as (c1 || c2) or (c1 && c2) may be captured in the bytecode in 4 different ways (as shown in the paper), and correspondingly, when we reconvert the bytecode back into Java source code, there are four different ways this may happen. Further, although gotos are not permitted in Java source code, the bytecode is full of gotos. If you were to blindly convert the bytecode into Java source code, you would replace a goto by a labeled break. A labeled break has the advantage that it only allows you to break out of a block structure and (unlike a setjump) does not permit you to jump arbitrarily into a block structure. So while the data structures used in the regenerated Java source code are still relatively "clean", arbitrary usage of labeled breaks makes for unreadable code (as we show in the paper). This can be a point of concern, since decompilation is generally related to debugging code. Instead of dumping arbitrary labeled breaks, we try to reconstruct the original expression in terms of && and || clauses as well as the ternary operator "?:" (c0 ? c1 : c2). Thus our goal is quite simply to regenerate, without using gotos or labeled breaks, expressions as close to the original as possible (it is not possible to guarantee an exact match). In this paper we explain the state of the art in Java decompilers for decoding complex expressions. We then present our solution. We have implemented the algorithms described in this paper and report our experience with them.

Pevny, Tomas, Somol, Petr.  2016.  Discriminative Models for Multi-instance Problems with Tree Structure. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :83–91.

Modelling network traffic is gaining importance as a counter to modern security threats of ever-increasing sophistication. It is, however, surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with a sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on all of a computer's traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with a special structure reflecting two stacked multi-instance problems. The main advantages of the proposed configuration include not only improved accuracy and the ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.
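
A hedged sketch of the "two stacked multi-instance" structure: individual flows are embedded, pooled into a per-server representation, and per-server representations are pooled into one vector per computer, which a final classifier labels infected vs. clean. Layer widths, pooling operators, and the flow-feature dimension are assumptions, not the paper's architecture:

```python
# Two-level multi-instance pooling network (sketch): flows -> servers -> computer.
import torch
import torch.nn as nn

class TwoLevelMIL(nn.Module):
    def __init__(self, flow_dim=8, hidden=16):
        super().__init__()
        self.flow_net = nn.Sequential(nn.Linear(flow_dim, hidden), nn.ReLU())
        self.server_net = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.classifier = nn.Linear(hidden, 1)

    def forward(self, servers):
        # servers: list of tensors, one per contacted server,
        # each of shape (num_flows_to_that_server, flow_dim)
        server_vecs = [self.server_net(self.flow_net(f).mean(dim=0))
                       for f in servers]                          # pool flows per server
        computer_vec = torch.stack(server_vecs).max(dim=0).values  # pool servers
        return torch.sigmoid(self.classifier(computer_vec))        # P(infected)

model = TwoLevelMIL()
window = [torch.randn(5, 8), torch.randn(2, 8), torch.randn(9, 8)]  # 3 servers contacted
print(model(window))
```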

Liang, Jiongqian, Parthasarathy, Srinivasan.  2016.  Robust Contextual Outlier Detection: Where Context Meets Sparsity. Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. :2167–2172.

Outlier detection is a fundamental data science task with applications ranging from data cleaning to network security. Recently, a new class of outlier detection algorithms has emerged, called contextual outlier detection, and has shown improved performance when studying anomalous behavior in a specific context. However, as we point out in this article, such approaches have limited applicability in situations where the context is sparse (i.e., lacking a suitable frame of reference). Moreover, approaches developed to date do not scale to large datasets. To address these problems, here we propose a novel and robust alternative to the state of the art, called RObust Contextual Outlier Detection (ROCOD). We utilize a local and global behavioral model based on the relevant contexts, which is then integrated in a natural and robust fashion. We run ROCOD on both synthetic and real-world datasets and demonstrate that it outperforms other competitive baselines on the axes of efficacy and efficiency. We also drill down and perform a fine-grained analysis to shed light on the rationale for the performance gains of ROCOD and reveal its effectiveness when handling objects with sparse contexts.
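
A hedged sketch of combining a local and a global expectation of an object's behavioral attribute given its contextual attributes, in the spirit described above. The exact models and blending used by ROCOD are not given in the abstract; the simple average below is an assumption:

```python
# Contextual outlier scoring by blending a local (neighbor-based) and a global
# (regression-based) expectation of behavior.  The 50/50 blend is an assumption.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import LinearRegression

def contextual_outlier_scores(context, behavior, k=3):
    # Local expectation: mean behavior of the k most similar contexts
    # (unreliable when the context is sparse).
    nn = NearestNeighbors(n_neighbors=k).fit(context)
    _, idx = nn.kneighbors(context)
    local = behavior[idx].mean(axis=1)
    # Global expectation: one regression from context to behavior over all data.
    global_ = LinearRegression().fit(context, behavior).predict(context)
    expected = 0.5 * local + 0.5 * global_      # naive blend (assumption)
    return np.abs(behavior - expected)          # larger = more anomalous

rng = np.random.default_rng(1)
ctx = rng.normal(size=(200, 2))                 # contextual attributes
beh = ctx @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=200)
beh[0] += 5.0                                   # inject a contextual outlier
scores = contextual_outlier_scores(ctx, beh)
print(scores.argmax())                          # index 0 should rank highest
```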

Pinsenschaum, Richard, Neff, Flaithri.  2016.  Evaluating Gesture Characteristics When Using a Bluetooth Handheld Music Controller. Proceedings of the Audio Mostly 2016. :209–214.

This paper describes a study that investigates tilt-gesture depth on a Bluetooth handheld music controller for activating and deactivating music loops. Making use of a Wii Remote's 3-axis ADXL330 accelerometer, a Max patch was programmed to receive, handle, and store incoming accelerometer data. Each loop corresponded to the front, back, left and right tilt-gesture direction, with each gesture motion triggering a loop 'On' or 'Off' depending on its playback status. The study comprised 40 undergraduate students interacting with the prototype controller for a duration of 5 minutes per person. Each participant performed three full cycles beginning with the front gesture direction and moving clockwise. This corresponded to a total of 24 trigger motions per participant. Raw data associated with tilt-gesture motion depth was scaled, analyzed and graphed. Results show significant differences between each gesture direction in terms of tilt-gesture depth, as well as issues with noise for left/right gesture motion due to dependency on Roll and Yaw values. Front and Left tilt-gesture depths displayed significantly higher threshold levels compared to the Back and Right axes. Front and Left tilt-gesture thresholds therefore allow the device to easily differentiate between intentional sample triggering and general device handling, while this is more difficult for Back and Left directions. Future work will include finding an alternative method for evaluating intentional tilt-gesture triggering on the Back and Left axes, as well as utilizing two 2-axis accelerometers to garner clean data from the Left and Right axes.
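
A hedged sketch of threshold-based tilt-gesture triggering from 3-axis accelerometer samples. The paper's Max patch is not reproduced here; the threshold values and the simple pitch/roll mapping are illustrative assumptions:

```python
# Toggle a loop when the tilt depth in a direction exceeds a per-direction threshold.
import math

# Per-direction tilt-depth thresholds in degrees (assumed values).
THRESHOLDS = {"front": 40.0, "back": 25.0, "left": 40.0, "right": 25.0}
loop_on = {d: False for d in THRESHOLDS}

def detect_gesture(ax, ay, az):
    """Map one accelerometer sample (in g) to a gesture direction or None."""
    pitch = math.degrees(math.atan2(ay, math.sqrt(ax*ax + az*az)))  # front/back
    roll = math.degrees(math.atan2(ax, math.sqrt(ay*ay + az*az)))   # left/right
    if pitch > THRESHOLDS["front"]:
        return "front"
    if pitch < -THRESHOLDS["back"]:
        return "back"
    if roll > THRESHOLDS["right"]:
        return "right"
    if roll < -THRESHOLDS["left"]:
        return "left"
    return None

for sample in [(0.0, 0.9, 0.4), (0.0, 0.1, 1.0), (-0.9, 0.0, 0.4)]:
    gesture = detect_gesture(*sample)
    if gesture:
        loop_on[gesture] = not loop_on[gesture]   # toggle the mapped loop
print(loop_on)
```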

Huang, Muhuan, Wu, Di, Yu, Cody Hao, Fang, Zhenman, Interlandi, Matteo, Condie, Tyson, Cong, Jason.  2016.  Programming and Runtime Support to Blaze FPGA Accelerator Deployment at Datacenter Scale. Proceedings of the Seventh ACM Symposium on Cloud Computing. :456–469.

With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance, and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustaining future datacenter growth. However, it is quite challenging for existing big data computing systems—like Apache Spark and Hadoop—to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.

Backes, Michael, Bugiel, Sven, Huang, Jie, Schranz, Oliver.  2016.  POSTER: The ART of App Compartmentalization. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1811–1813.

On Android, advertising libraries are commonly integrated with their host apps. Since the host and advertising components share the application's sandbox, advertisement code inherits all permissions and can access host resources with no further approval needed. Motivated by the privacy risks of advertisement libraries as already shown in the literature, this poster introduces an Android Runtime (ART) based app compartmentalization mechanism to achieve separation between trusted app code and untrusted library code without system modification and application rewriting. With our approach, advertising libraries will be isolated from the host app and the original app will be partitioned into two sub-apps that run independently, with the host app's resources and permissions being protected by Android's app sandboxing mechanism. ARTist [1], a compiler-based Android app instrumentation framework, is utilized here to recreate the communication channels between host and advertisement library. The result is a robust toolchain on device which provides a clean separation of developer-written app code and third-party advertisement code, allowing for finer-grained access control policies and information flow control without OS customization and application rebuilding.

Subramanya, Supreeth, Mustafa, Zain, Irwin, David, Shenoy, Prashant.  2016.  Beyond Energy-Efficiency: Evaluating Green Datacenter Applications for Energy-Agility. Proceedings of the 7th ACM/SPEC on International Conference on Performance Engineering. :185–196.

Computing researchers have long focused on improving energy-efficiency under the implicit assumption that all energy is created equal. Yet, this assumption is actually incorrect: energy's cost and carbon footprint vary substantially over time. As a result, consuming energy inefficiently when it is cheap and clean may sometimes be preferable to consuming it efficiently when it is expensive and dirty. Green datacenters adapt their energy usage to optimize for such variations, as reflected in changing electricity prices or renewable energy output. Thus, we introduce energy-agility as a new metric to evaluate green datacenter applications. To illustrate fundamental tradeoffs in energy-agile design, we develop GreenSort, a distributed sorting system optimized for energy-agility. GreenSort is representative of the long-running, massively-parallel, data-intensive tasks that are common in datacenters and amenable to delays from power variations. Our results demonstrate the importance of energy-agile design when considering the benefits of using variable power. For example, we show that GreenSort requires 31% more time and energy to complete when power varies based on real-time electricity prices versus when it is constant. Thus, in this case, real-time prices should be at least 31% lower than fixed prices to warrant using them.

Parshakova, Tetiana, Cho, Minjoo, Cassinelli, Alvaro, Saakes, Daniel.  2016.  Ratchair: Furniture Learns to Move Itself with Vibration. ACM SIGGRAPH 2016 Emerging Technologies. :19:1–19:2.

An Egyptian statue on display at the Manchester Museum mysteriously spins on its axis every day; it was eventually discovered that this is due to anisotropic friction forces, and that the motile power comes from imperceptible mechanical waves caused by visitors' footsteps and nearby traffic. This phenomenon involves microscopic ratchets and is pervasive in the microscopic world - this is basically how muscles contract. It was the source of inspiration to think about everyday objects that move by harvesting external vibration rather than using mechanical traction and steering wheels. We propose here a strategy for displacing objects by attaching relatively small vibration sources. After learning how several random bursts of vibration affect its pose, an optimization algorithm discovers the optimal sequence of vibration patterns required to (slowly but surely) move the object to a very different specified position. We describe and demonstrate two application scenarios, namely assisted transportation of heavy objects with little effort on the part of the human, and self-arranging furniture, useful for instance to clean classrooms or restaurants during vacant hours.

Wang, Ju, Jiang, Hongbo, Xiong, Jie, Jamieson, Kyle, Chen, Xiaojiang, Fang, Dingyi, Xie, Binbin.  2016.  LiFS: Low Human-effort, Device-free Localization with Fine-grained Subcarrier Information. Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking. :243–256.

Device-free localization of people and objects indoors not equipped with radios is playing a critical role in many emerging applications. This paper presents an accurate model-based device-free localization system, LiFS, implemented on cheap commercial off-the-shelf (COTS) Wi-Fi devices. Unlike previous COTS device-based work, LiFS is able to localize a target accurately without offline training. The basic idea is simple: channel state information (CSI) is sensitive to a target's location, and by modelling the CSI measurements of multiple wireless links as a set of power fading based equations, the target location can be determined. However, due to rich multipath propagation indoors, the received signal strength (RSS) or even the fine-grained CSI cannot be easily modelled. We observe that even in a rich multipath environment, not all subcarriers are affected equally by multipath reflections. Our pre-processing scheme tries to identify the subcarriers not affected by multipath. Thus, CSIs on the "clean" subcarriers can be utilized for accurate localization. We design, implement and evaluate LiFS with extensive experiments in three different environments. Without knowing the majority of transceivers' locations, LiFS achieves a median accuracy of 0.5 m and 1.1 m in line-of-sight (LoS) and non-line-of-sight (NLoS) scenarios respectively, outperforming the state-of-the-art systems. Besides single target localization, LiFS is able to differentiate two sparsely-located targets and localize each of them at a high accuracy.
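
A hedged sketch of the two ideas above: (1) keep only "clean" subcarriers whose CSI amplitude is stable across packets (multipath-affected subcarriers fluctuate more), and (2) relate received power on those subcarriers to distance with a standard log-distance fading model. The variance test and the model constants are assumptions, not LiFS's exact method:

```python
# Subcarrier selection by amplitude stability + log-distance power-fading model.
import numpy as np

def clean_subcarriers(csi_amplitude, keep_fraction=0.5):
    """csi_amplitude: (num_packets, num_subcarriers) array of |CSI|."""
    stability = csi_amplitude.var(axis=0)             # low variance = "clean"
    order = np.argsort(stability)
    return order[: int(keep_fraction * csi_amplitude.shape[1])]

def estimate_distance(rx_power_dbm, p0_dbm=-30.0, path_loss_exp=2.5, d0=1.0):
    """Invert P_r = P_0 - 10 * n * log10(d / d0) for the distance d (meters)."""
    return d0 * 10 ** ((p0_dbm - rx_power_dbm) / (10 * path_loss_exp))

rng = np.random.default_rng(2)
csi = np.abs(rng.normal(1.0, 0.05, size=(100, 30)))   # synthetic CSI magnitudes
csi[:, :10] += rng.normal(0, 0.5, size=(100, 10))     # 10 multipath-hit subcarriers
good = clean_subcarriers(csi)
print(len(good), estimate_distance(rx_power_dbm=-55.0))
```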

Thibodeau, David, Cave, Andrew, Pientka, Brigitte.  2016.  Indexed Codata Types. Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming. :351–363.

Indexed data types allow us to specify and verify many interesting invariants about finite data in a general purpose programming language. In this paper we investigate the dual idea: indexed codata types, which allow us to describe data-dependencies about infinite data structures. Unlike finite data which is defined by constructors, we define infinite data by observations. Dual to pattern matching on indexed data which may refine the type indices, we define copattern matching on indexed codata where type indices guard observations we can make. Our key technical contributions are three-fold: first, we extend Levy's call-by-push-value language with support for indexed (co)data and deep (co)pattern matching; second, we provide a clean foundation for dependent (co)pattern matching using equality constraints; third, we describe a small-step semantics using a continuation-based abstract machine, define coverage for indexed (co)patterns, and prove type safety. This is an important step towards building a foundation where (co)data type definitions and dependent types can coexist.

Muthusamy, Vinod, Slominski, Aleksander, Ishakian, Vatche, Khalaf, Rania, Reason, Johnathan, Rozsnyai, Szabolcs.  2016.  Lessons Learned Using a Process Mining Approach to Analyze Events from Distributed Applications. Proceedings of the 10th ACM International Conference on Distributed and Event-based Systems. :199–204.

The execution of distributed applications is captured by the events generated by the individual components. However, understanding the behavior of these applications from their event logs can be a complex and error-prone task, compounded by the fact that applications continuously change, rendering any knowledge obsolete. We describe our experiences applying a suite of process-aware analytic tools to a number of real world scenarios, and distill our lessons learned. For example, we have seen that these tools are used iteratively, where insights gained at one stage inform the configuration decisions made at an earlier stage. As well, we have observed that data onboarding, where the raw data is cleaned and transformed, is the most critical stage in the pipeline and requires the most manual effort and domain knowledge. In particular, missing, inconsistent, and low-resolution event time stamps are recurring problems that require better solutions. The experiences and insights presented here will assist practitioners applying process analytic tools to real scenarios, and reveal to researchers some of the more pressing challenges in this space.

Lofgren, Peter, Banerjee, Siddhartha, Goel, Ashish.  2016.  Personalized PageRank Estimation and Search: A Bidirectional Approach. Proceedings of the Ninth ACM International Conference on Web Search and Data Mining. :163–172.

We present new algorithms for Personalized PageRank estimation and Personalized PageRank search. First, for the problem of estimating Personalized PageRank (PPR) from a source distribution to a target node, we present a new bidirectional estimator with simple yet strong guarantees on correctness and performance, and 3x to 8x speedup over existing estimators in experiments on a diverse set of networks. Moreover, it has a clean algebraic structure which enables it to be used as a primitive for the Personalized PageRank Search problem: Given a network like Facebook, a query like "people named John," and a searching user, return the top nodes in the network ranked by PPR from the perspective of the searching user. Previous solutions either score all nodes or score candidate nodes one at a time, which is prohibitively slow for large candidate sets. We develop a new algorithm based on our bidirectional PPR estimator which identifies the most relevant results by sampling candidates based on their PPR; this is the first solution to PPR search that can find the best results without iterating through the set of all candidate results. Finally, by combining PPR sampling with sequential PPR estimation and Monte Carlo, we develop practical algorithms for PPR search, and we show via experiments that our algorithms are efficient on networks with billions of edges.
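
A hedged sketch of the Monte Carlo ingredient named above: run short random walks from the source with restart probability alpha and use the fraction of walks terminating at the target as the PPR estimate. The paper's bidirectional estimator additionally combines this with a reverse phase from the target, which is omitted here; the graph and parameters below are illustrative:

```python
# Monte Carlo Personalized PageRank: PPR(s, t) is estimated as the probability
# that an alpha-terminated random walk from s ends at t.
import random

def monte_carlo_ppr(graph, source, target, alpha=0.15, num_walks=20000):
    hits = 0
    for _ in range(num_walks):
        node = source
        while random.random() > alpha:             # keep walking w.p. 1 - alpha
            neighbors = graph.get(node)
            if not neighbors:
                node = source                      # dangling node: restart (a convention)
                continue
            node = random.choice(neighbors)
        hits += (node == target)                   # walk terminated here
    return hits / num_walks

toy_graph = {0: [1, 2], 1: [2], 2: [0], 3: [0]}    # adjacency lists
print(monte_carlo_ppr(toy_graph, source=0, target=2))
```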

Ruan, Wenjie, Sheng, Quan Z., Yang, Lei, Gu, Tao, Xu, Peipei, Shangguan, Longfei.  2016.  AudioGest: Enabling Fine-grained Hand Gesture Detection by Decoding Echo Signal. Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. :474–485.

Hand gesture is becoming an increasingly popular means of interacting with consumer electronic devices, such as mobile phones, tablets and laptops. In this paper, we present AudioGest, a device-free gesture recognition system that can accurately sense the hand in-air movement around a user's devices. Compared to the state-of-the-art, AudioGest is superior in using only one pair of built-in speaker and microphone, without any extra hardware or infrastructure support and with no training, to achieve fine-grained hand detection. Our system is able to accurately recognize various hand gestures, estimate the hand in-air time, as well as the average moving speed and waving range. We achieve this by transforming the device into an active sonar system that transmits an inaudible audio signal and decodes the echoes of the hand at its microphone. We address various challenges including cleaning the noisy reflected sound signal, interpreting the echo spectrogram into hand gestures, decoding the Doppler frequency shifts into the hand waving speed and range, as well as being robust to environmental motion and signal drifting. We implement the proof-of-concept prototype in three different electronic devices and extensively evaluate the system in four real-world scenarios using 3,900 hand gestures collected by five users over more than two weeks. Our results show that AudioGest can detect six hand gestures with an accuracy of up to 96%, and by distinguishing the gesture attributions, it can provide up to 162 control commands for various applications.
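
A hedged note on the physics behind decoding hand speed from the echo's Doppler shift: a reflector moving at speed v shifts an emitted tone of frequency f0 by roughly Δf = 2·v·f0/c, so v = Δf·c/(2·f0). The tone frequency below is an assumption, not necessarily AudioGest's:

```python
# Doppler shift -> hand speed (sketch).
SPEED_OF_SOUND = 343.0   # m/s in air at roughly 20 degrees C

def hand_speed(delta_f_hz, tone_hz=19000.0):
    """Estimate hand speed (m/s) from the measured Doppler shift of the echo."""
    return delta_f_hz * SPEED_OF_SOUND / (2.0 * tone_hz)

print(hand_speed(50.0))   # a 50 Hz shift of a 19 kHz tone ~ 0.45 m/s
```
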
Urbanowicz, Ryan J., Olson, Randal S., Moore, Jason H..  2016.  Pareto Inspired Multi-objective Rule Fitness for Adaptive Rule-based Machine Learning. Proceedings of the 2016 on Genetic and Evolutionary Computation Conference Companion. :1403–1403.

Learning classifier systems (LCSs) are rule-based evolutionary algorithms uniquely suited to classification and data mining in complex, multi-factorial, and heterogeneous problems. LCS rule fitness is commonly based on accuracy, but this metric alone is not ideal for assessing global rule `value' in noisy problem domains, and thus impedes effective knowledge extraction. Multi-objective fitness functions are promising but rely on knowledge of how to weigh objective importance. Such prior knowledge would be unavailable in most real-world problems. The Pareto-front concept offers a strategy for multi-objective machine learning that is agnostic to objective importance. We propose a Pareto-inspired multi-objective rule fitness (PIMORF) for LCS, and combine it with a complementary rule-compaction approach (SRC). We implemented these strategies in ExSTraCS, a successful supervised LCS, and evaluated performance over an array of complex simulated noisy and clean problems (i.e. genetic and multiplexer) that each concurrently model pure interaction effects and heterogeneity. While evaluation over multiple performance metrics yielded mixed results, this work represents an important first step towards efficiently learning complex problem spaces without the advantage of prior problem knowledge. Overall the results suggest that PIMORF paired with SRC improved rule set interpretability, particularly with regard to heterogeneous patterns.
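
A hedged sketch of the Pareto-front idea behind a weight-agnostic multi-objective rule fitness: keep the rules that are not dominated on any objective. Accuracy is the objective named in the abstract; using coverage as the second objective is an assumption for illustration:

```python
# Pareto dominance and Pareto-front selection over rules scored on two objectives.
def dominates(a, b):
    """a dominates b if it is >= on every objective and > on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(rules):
    """rules: dict of name -> (accuracy, coverage). Returns non-dominated names."""
    return [name for name, score in rules.items()
            if not any(dominates(other, score)
                       for other_name, other in rules.items() if other_name != name)]

rules = {
    "r1": (0.95, 0.10),   # very accurate, rarely matches
    "r2": (0.70, 0.60),   # broad but noisier
    "r3": (0.65, 0.30),   # dominated by r2
    "r4": (0.80, 0.40),
}
print(pareto_front(rules))   # ['r1', 'r2', 'r4']
```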

Purohit, Suchit S., Bothale, Vinod M., Gandhi, Savita R..  2016.  Towards M-gov in Solid Waste Management Sector Using RFID, Integrated Technologies. Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies. :61:1–61:4.

Due to the explosive increase in teledensity and the penetration of mobile networks in urban as well as rural areas, m-governance in India is growing from infancy into a more mature shape. Various steps are being taken by the Indian government to offer citizen services through the mobile platform, hence offering a smooth transition from web-based e-gov services to more pervasive mobile-based services. Municipalities and municipal corporations in India already provide m-gov services such as property and professional tax transactions, birth and death registration, marriage registration, dues of taxes and charges, etc., through SMS alerts or via call centers. To the best of our knowledge, no municipality offers mobile-based services in the solid waste management sector. This paper proposes an m-gov service implemented as an Android mobile application for the SWM department, AMC, Ahmadabad. The application operates on real-time data collected from a fully automated solid waste collection process integrated using RFID, GPS, GIS and GPRS, proposed in the preceding work by the authors. The mobile application allows citizens to interactively view the status of the cleaning process of their area, file complaints in the case of failure, and follow up on the status of their complaints, which can be handled by SWM officials using the same application. The application also allows SWM officials to observe and analyze the real-time status of the collection process and generate reports.

Bell, Eamonn, Pugin, Laurent.  2016.  Approaches to Handwritten Conductor Annotation Extraction in Musical Scores. Proceedings of the 3rd International Workshop on Digital Libraries for Musicology. :33–36.

Conductor copies of musical scores are typically rich in handwritten annotations. Ongoing archival efforts to digitize orchestral conductors' scores have made scanned copies of hundreds of these annotated scores available in digital formats. The extraction of handwritten annotations from digitized printed documents is a difficult task for computer vision, with most approaches focusing on the extraction of handwritten text. However, conductors' annotation practices provide us with at least two affordances, which make the task more tractable in the musical domain. First, many conductors opt to mark their scores using colored pencils, which contrast with the black and white print of sheet music. Consequently, we show promising results when using color separation techniques alone to recover handwritten annotations from conductors' scores. We also compare annotated scores to unannotated copies and use a printed sheet music comparison tool to recover handwritten annotations as additions to the clean copy. We then investigate the use of both of these techniques in a combined method, which improves the results of the color separation technique. These techniques are demonstrated using a sample of orchestral scores annotated by professional conductors of the New York Philharmonic. Handwritten annotation extraction in musical scores has applications to the systematic investigation of score annotation practices by performers, annotator attribution, and to the interactive presentation of annotated scores, which we briefly discuss.
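
A hedged sketch of the color-separation idea: printed music is essentially grayscale (R ≈ G ≈ B), while colored-pencil annotations have noticeably saturated pixels, so a simple per-pixel saturation threshold recovers much of the handwritten layer. The threshold value is an assumption; the paper's pipeline is more involved:

```python
# Saturation-threshold color separation for colored-pencil annotations.
import numpy as np

def annotation_mask(rgb, saturation_threshold=0.25):
    """rgb: (H, W, 3) float array in [0, 1]. Returns a boolean annotation mask."""
    maxc = rgb.max(axis=2)
    minc = rgb.min(axis=2)
    saturation = np.where(maxc > 0, (maxc - minc) / np.maximum(maxc, 1e-6), 0.0)
    return saturation > saturation_threshold

page = np.ones((4, 4, 3))                 # white page
page[1, 1] = [0.1, 0.1, 0.1]              # black printed note head (grayscale)
page[2, 2] = [0.9, 0.2, 0.2]              # red pencil mark (saturated)
print(annotation_mask(page).nonzero())    # only the red pixel is flagged
```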

Schear, Nabil, Cable, II, Patrick T., Moyer, Thomas M., Richard, Bryan, Rudd, Robert.  2016.  Bootstrapping and Maintaining Trust in the Cloud. Proceedings of the 32nd Annual Conference on Computer Security Applications. :65–77.

Today's infrastructure as a service (IaaS) cloud environments rely upon full trust in the provider to secure applications and data. Cloud providers do not offer the ability to create hardware-rooted cryptographic identities for IaaS cloud resources or sufficient information to verify the integrity of systems. Trusted computing protocols and hardware like the TPM have long promised a solution to this problem. However, these technologies have not seen broad adoption because of their complexity of implementation, low performance, and lack of compatibility with virtualized environments. In this paper we introduce keylime, a scalable trusted cloud key management system. keylime provides an end-to-end solution for both bootstrapping hardware rooted cryptographic identities for IaaS nodes and for system integrity monitoring of those nodes via periodic attestation. We support these functions in both bare-metal and virtualized IaaS environments using a virtual TPM. keylime provides a clean interface that allows higher level security services like disk encryption or configuration management to leverage trusted computing without being trusted computing aware. We show that our bootstrapping protocol can derive a key in less than two seconds, we can detect system integrity violations in as little as 110ms, and that keylime can scale to thousands of IaaS cloud nodes.

Saccá, Gary, Moreno-Llorena, Jaime.  2016.  Towards a Strategy for Recognition of Collective Emotions on Social Networking Sites. Proceedings of the XVII International Conference on Human Computer Interaction. :25:1–25:8.

Since the emergence of emotional theories and models, which explain individuals' feelings and their emotional processes, diverse research areas have shown interest in studying these ideas in order to obtain relevant information about the behavior, habits and preferences of people. However, there are some limitations on emotion recognition that have forced specialists to search for ways to achieve it in particular cases. This article addresses the recognition of collective emotions on social networking sites by applying a particular strategy, as follows. First, the state of the art was investigated regarding models for representing emotions in individuals and collectives, as well as possible solutions from computing fields to collective emotion problems. Second, a collective emotion strategy was designed: a collection of data was retrieved from Twitter and cleaning and processing steps were applied in order to keep the expressions as pure as possible. Afterwards, in the collective emotion tagging step, based on a consensus-theory approach, the majority-tagged feelings were grouped and recognized as collective emotions. Finally, the prediction step consisted of modeling the collective data, with one part supplied to the machine learning algorithm during training and the other used to test its accuracy. Third, an evaluation was carried out to check the fit of the collective recognition strategy; the results obtained place the proposed work on the right path, as the minor differences observed indicate high precision according to the distance measures used during the development of the study.
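
A hedged sketch of the consensus/majority tagging step described above: individual feeling tags attached to posts in a time window are grouped, and the majority tag is taken as the collective emotion for that window. The tag vocabulary and the window grouping are illustrative assumptions:

```python
# Majority-vote consensus tagging of collective emotions per time window (sketch).
from collections import Counter

def collective_emotion(tagged_posts):
    """tagged_posts: list of (window_id, feeling_tag). Returns window -> majority tag."""
    by_window = {}
    for window, tag in tagged_posts:
        by_window.setdefault(window, []).append(tag)
    return {w: Counter(tags).most_common(1)[0][0] for w, tags in by_window.items()}

posts = [("w1", "joy"), ("w1", "joy"), ("w1", "anger"),
         ("w2", "sadness"), ("w2", "sadness"), ("w2", "joy")]
print(collective_emotion(posts))   # {'w1': 'joy', 'w2': 'sadness'}
```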