Biblio
Abstract—The past ten years has seen increasing calls to make security research more “scientific”. On the surface, most agree that this is desirable, given universal recognition of “science” as a positive force. However, we find that there is little clarity on what “scientific” means in the context of computer security research, or consensus on what a “Science of Security” should look like. We selectively review work in the history and philosophy of science and more recent work under the label “Science of Security”. We explore what has been done under the theme of relating science and security, put this in context with historical science, and offer observations and insights we hope may motivate further exploration and guidance. Among our findings are that practices on which the rest of science has reached consensus appear little used or recognized in security, and a pattern of methodological errors continues unaddressed.
I think this may well prove to be the most significant paper on science of security published this year. - Carl Landwehr, 2012 National Cyber Security Hall of Fame Inductee
The safety-critical aspects of cyber-physical systems motivate the need for rigorous analysis of these systems. In the literature this work is often done using idealized models of systems where the analysis can be carried out using high-level reasoning techniques such as Lyapunov functions and model checking. In this paper we present VERIDRONE, a foundational framework for reasoning about cyber-physical systems at all levels from high-level models to C code that implements the system. VERIDRONE is a library within the Coq proof assistant enabling us to build on its foundational implementation, its interactive development environments, and its wealth of libraries capturing interesting theories ranging from real numbers and differential equations to verified compilers and floating point numbers. These features make proof assistants in general, and Coq in particular, a powerful platform for unifying foundational results about safety-critical systems and ensuring interesting properties at all levels of the stack.
The Google Identity Platform is a system that allows a user to sign in to applications and other services by using a Google account. Google Sign-In is one such method for providing one’s identity to the Google Identity Platform. Google Sign-In is available for Android applications and iOS applications, as well as for websites and other devices. Users of Google Sign-In find that it integrates well with the Android platform, but iOS users (iPhone, iPad, etc.) do not have the same experience. The user experience when logging in to a Google account on an iOS application can not only be more tedious than the Android experience, but it also conditions users to engage in behaviors that put the information in their Google accounts at risk.
The safety-critical aspects of cyber-physical systems motivate the need for rigorous analysis of these systems. In the literature this work is often done using idealized models of systems where the analysis can be carried out using high-level reasoning techniques such as Lyapunov functions and model checking. In this paper we present VERIDRONE, a foundational framework for reasoning about cyber-physical systems at all levels from high-level models to C code that implements the system. VERIDRONE is a library within the Coq proof assistant enabling us to build on its foundational implementation, its interactive development environments, and its wealth of libraries capturing interesting theories ranging from real numbers and differential equations to verified compilers and floating point numbers. These features make proof assistants in general, and Coq in particular, a powerful platform for unifying foundational results about safety-critical systems and ensuring interesting properties at all levels of the stack.
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through the analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g. case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information like the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
To help establish a more scientific basis for security science, which will enable the development of fundamental theories and move the field from being primarily reactive to primarily proactive, it is important for research results to be reported in a scientifically rigorous manner. Such reporting will allow for the standard pillars of science, namely replication, meta-analysis, and theory building. In this paper we aim to establish a baseline of the state of scientific work in security through the analysis of indicators of scientific research as reported in the papers from the 2015 IEEE Symposium on Security and Privacy. To conduct this analysis, we developed a series of rubrics to determine the completeness of the papers relative to the type of evaluation used (e.g. case study, experiment, proof). Our findings showed that while papers are generally easy to read, they often do not explicitly document some key information like the research objectives, the process for choosing the cases to include in the studies, and the threats to validity. We hope that this initial analysis will serve as a baseline against which we can measure the advancement of the science of security.
Information security management is time-consuming and error-prone. Apart from day-to-day operations, organizations need to comply with industrial regulations or government directives. Thus, organizations are looking for security tools to automate security management tasks and daily operations. Security Content Automation Protocol (SCAP) is a suite of specifications that help to automate security management tasks such as vulnerability measurement and policy compliance evaluation. SCAP benchmark provides detailed guidance on setting the security configuration of network devices, operating systems, and applications. Organizations can use SCAP benchmark to perform automated configuration compliance assessment on network devices, operating systems, and applications. This paper discusses SCAP benchmark components and the development of a SCAP benchmark for automating Cisco router security configuration compliance.
We summarize the optimization and performance evaluation of the Nonhydrostatic ICosahedral Atmospheric Model (NICAM) on two different types of supercomputers: the K computer and TSUBAME2.5. First, we evaluated and improved several kernels extracted from the model code on the K computer. We did not significantly change the loop and data ordering for sufficient usage of the features of the K computer, such as the hardware-aided thread barrier mechanism and the relatively high bandwidth of the memory, i.e., a 0.5 Byte/FLOP ratio. Loop optimizations and code cleaning for a reduction in memory transfer contributed to a speed-up of the model execution time. The sustained performance ratio of the main loop of the NICAM reached 0.87 PFLOPS with 81,920 nodes on the K computer. For GPU-based calculations, we applied OpenACC to the dynamical core of NICAM. The performance and scalability were evaluated using the TSUBAME2.5 supercomputer. We achieved good performance results, which showed efficient use of the memory throughput performance of the GPU as well as good weak scalability. A dry dynamical core experiment was carried out using 2560 GPUs, which achieved 60 TFLOPS of sustained performance.
This paper focuses on a practically very important problem of matching a real-world product photo to exactly the same item(s) in online shopping sites. The task is extremely challenging because the user photos (i.e., the queries in this scenario) are often captured in uncontrolled environments, while the product images in online shops are mostly taken by professionals with clean backgrounds and perfect lighting conditions. To tackle the problem, we study deep network architectures and training schemes, with the goal of learning a robust deep feature representation that is able to bridge the domain gap between the user photos and the online product images. Our contributions are two-fold. First, we propose an alternative of the popular contrastive loss used in siamese deep networks, namely robust contrastive loss, where we "relax" the penalty on positive pairs to alleviate over-fitting. Second, a multi-task fine-tuning approach is introduced to learn a better feature representation, which not only incorporates knowledge from the provided training photo pairs, but also explores additional information from the large ImageNet dataset to regularize the fine-tuning procedure. Experiments on two challenging real-world datasets demonstrate that both the robust contrastive loss and the multi-task fine-tuning approach are effective, leading to very promising results with a time cost suitable for real-time retrieval.
Compared to conventional general-purpose processors, accelerator-rich architectures (ARAs) can provide orders-of-magnitude performance and energy gains. In this paper we design and implement the ARAPrototyper to enable rapid design space explorations for ARAs in real silicons and reduce the tedious prototyping efforts. First, ARAPrototyper provides a reusable baseline prototype with a highly customizable memory system, including interconnect between accelerators and buffers, interconnect between buffers and last-level cache (LLC) or DRAM, coherency choice at LLC or DRAM, and address translation support. To provide more insights into performance analysis, ARAPrototyper adds several performance counters on the accelerator side and leverages existing performance counters on the CPU side. Second, ARAPrototyper provides a clean interface to quickly integrate a user?s own accelerators written in high-level synthesis (HLS) code. Then, an ARA prototype can be automatically generated and mapped to a Xilinx Zynq SoC. To quickly develop applications that run seamlessly on the ARA prototype, ARAPrototyper provides a system software stack and abstracts the accelerators as software libraries for application developers. Our results demonstrate that ARAPrototyper enables a wide range of design space explorations for ARAs at manageable prototyping efforts and 4,000 to 10,000X faster evaluation time than full-system simulations. We believe that ARAPrototyper can be an attractive alternative for ARA design and evaluation.
Automatic detection of TV advertisements is of paramount importance for various media monitoring agencies. Existing works in this domain have mostly focused on news channels using news specific features. Most commercial products use near copy detection algorithms instead of generic advertisement classification. A generic detector needs to handle inter-class and intra-class imbalances present in data due to variability in content aired across channels and frequent repetition of advertisements. Imbalances present in data make classifiers biased towards one of the classes and thus require special treatment. We propose to use tree of perceptrons to solve this problem. The training data available for each perceptron node is balanced using cluster based over-sampling and TOMEK link cleaning as we traverse the tree downwards. The trained perceptron node then passes the original unbalanced data to its children. This process is repeated recursively till we reach the leaf nodes. We call this new algorithm as "Progressively Balanced Perceptron Tree". We have also contributed a TV advertisements dataset consisting of 250 hours of videos recorded from five non-news TV channels of different genres. Experimentations on this dataset have shown that the proposed approach has comparatively superior and balanced performance with respect to six baseline methods. Our proposal generalizes well across channels, with varying training data sizes and achieved a top F1-score of 97% in detecting advertisements.
Smartphones nowadays are customized to help users with their daily tasks such as storing important data or making transactions through the internet. With the sensitivity of the data involved, authentication mechanism such as fixed-text password, PIN, or unlock patterns are used to safeguard these data against intruders. However, these mechanisms have the risk from security threats such as cracking or shoulder surfing. To enhance mobile and/or information security, this study aimed to develop a free-form handwriting gesture user authentication for smartphones. It also tried to discover the static and dynamic handwriting features that significantly influence the recognition of a legitimate user. The experiment was then conducted by asking thirty (30) individuals to draw or swipe using their fingertip their desired free-form security pattern ten (10) times. These patterns were then cleaned and processed, and extracted seven (7) static and eleven (11) dynamic handwriting features. By means of Neural Network classifier of the RapidMiner data mining tool, these features were used to develop, validate, and test a model for user authentication. The model showed a very promising recognition rate of 96.67%. The model is further tested through a prototype, and it still gave a very satisfactory result.
Staging is a program generation paradigm with a clean, well-investigated semantics which statically ensures that the generated code is always well-typed and well-scoped. Staging is often used for specializing programs to the known properties or parts of data to improve efficiency, but so far it has been limited to generating terms. This short paper describes our ongoing work on extending staging, with its strong safety guarantees, to generation of non-terms, focusing on ML-style modules. The purpose is to map out the promises and challenges, then to pose a question to solicit the community's expertise in evaluating how essential our extensions are for the purpose of applying staging beyond the realm of terms. We demonstrate our extensions' use in specializing functor applications to eliminate its (currently large) overhead in OCaml. We explain the challenges that those extensions bring in and identify a promising line of attack. Unexpectedly, however, it turns out that we can avoid module generation altogether by representing modules, possibly containing abstract types, as polymorphic records. With the help of first-class modules, module specialization reduces to ordinary term specialization, which can be done with conventional staging. The extent to which this hack generalizes is unclear. Thus we have a question to the community: is there a compelling use case for module generation? With these insights and questions, we offer a starting point for a long-term program in the next stage of staging research.
In Pixar's Finding Dory, we are introduced to a new character: Hank the Octopus. This is a very different character than Pixar has been asked to animate before. Our directors demanded both precise control and graceful, clean silhouettes. The reference artwork we were given showed complex curves between arms and body without any disjointed shapes or breaks in form. Video of Octopus in motion reveals an infinitely malleable creature capable of an enormous shape language. This art direction required a small group of TDs to create a control scheme that was sensible, flexible and with a new level of control in order for animators to bring Hank to life. We had to think deeply from the tips of the fingers all the way through how the tentacles connect to the mouth corners, and eye sockets. Each of this issues raised concerns around design, deformation and finally how the end user can manipulate such complexity effectively.
In this paper, we propose a novel Social Networking Service (SNS) for a regional community. The purpose of the SNS is to support and encourage people by making them aware beneficial social relations in the real world. The conventional SNSs can hardly deal with beneficial social relations, because they are implicit and dynamic. The proposed SNS is designed to provide positive information for two types of people: people who does community voluntary works, such as cleaning, as contributors, and people who receives benefit from them as beneficiary. This paper introduces the basic scheme based on the SNS for beneficial social relations, and evaluates the effectiveness of our scheme based on the result of the experimental studies. The experimental result shows the users of our SNS tend to consider the information about the voluntary works valuable if they have been performed in their living area, and it suggests that our proposed SNS system would work well in a regional community.
Record linkage refers to the task of finding same entity across different databases. We propose a machine learning based record linkage algorithm for financial entity databases. Record linkage on financial databases are essential for information integration on certain financial entity, since those databases do not have common unified identifier. Our algorithm works in two steps to determine if a pair of record is same entity or not. First we check with proposed rules if the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce and scale the computation, this process is done only for candidate pairs within a proposed heuristic. Initial evaluation for precision, recall and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge show promising results.
Despite the growing promotion of the “open data” movement, the collection, cleaning, management, interpretation, and dissemination of open data is laborious and cost intensive, particularly for non-profits with limited resources. In this paper, we describe how non-profit organizations (NPOs) use open data, building on prior literature that focuses on understanding challenges that NPOs face. Based on 15 interviews of staff from 10 NPOs, our results suggest that NPOs use data to develop narratives to build a case for support from grantors and other stakeholders. We then present empirical results based on the usage of a data portal we created, which suggests that technologies should be designed to not only make data accessible, but also to facilitate communication and support relationships between expert data analysts and NPOs.
It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete and (2) what is the impact of any unknown (i.e., unobserved) data on query results? In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., unknown unknowns) on simple aggregate queries. The key idea is that the overlap between different data sources enables us to estimate the number and values of the missing data items. Our main techniques are parameter-free and do not assume prior knowledge about the distribution. Through a series of experiments, we show that estimating the impact of unknown unknowns is invaluable to better assess the results of aggregate queries over integrated data sources.
Open data is publicly available data that can be universally and readily accessed, used, and redistributed. Open data holds particular potential in the health and social sectors but, presently, health and social data are often published in a 'closed' format. There are different tools that allow to 'open' data, clean, structure and process them in order to elaborate them and build advanced services but, unfortunately, there is no single tool that can be used to perform all different tasks. We believe that the availability of Open Data in the health and social fields should be greatly increased and a way for creating new health and social services should be provided. In this paper, we present a framework that allows to create health and social Open Data starting from whatever is available on the web and to easily build advanced services based on those data.
Maintaining a clean and hygienic civic environment is an indispensable yet formidable task, especially in developing countries. With the aim of engaging citizens to track and report on their neighborhoods, this paper presents a novel smartphone app, called SpotGarbage, which detects and coarsely segments garbage regions in a user-clicked geo-tagged image. The app utilizes the proposed deep architecture of fully convolutional networks for detecting garbage in images. The model has been trained on a newly introduced Garbage In Images (GINI) dataset, achieving a mean accuracy of 87.69%. The paper also proposes optimizations in the network architecture resulting in a reduction of 87.9% in memory usage and 96.8% in prediction time with no loss in accuracy, facilitating its usage in resource constrained smartphones.
Defect-prediction techniques can enhance the quality assurance activities for software systems. For instance, they can be used to predict bugs in source files or functions. In the context of a software product line, such techniques could ideally be used for predicting defects in features or combinations of features, which would allow developers to focus quality assurance on the error-prone ones. In this preliminary case study, we investigate how defect prediction models can be used to identify defective features using machine-learning techniques. We adapt process metrics and evaluate and compare three classifiers using an open-source product line. Our results show that the technique can be effective. Our best scenario achieves an accuracy of 73 % for accurately predicting features as defective or clean using a Naive Bayes classifier. Based on the results we discuss directions for future work.
Pagination problems deal with questions around transforming a source text stream into a formatted document by dividing it up into individual columns and pages, including adding auxiliary elements that have some relationship to the source stream data but may allow a certain amount of variation in placement (such as figures or footnotes). Traditionally the pagination problem has been approached by separating it into one of micro-typography (e.g., breaking text into paragraphs, also known as h&j) and one of macro-typography (e.g., taking a galley of already formatted paragraphs and breaking them into columns and pages) without much interaction between the two. While early solutions for both problem spaces used simple greedy algorithms, Knuth and Plass introduced in the '80s a global-fit algorithm for line breaking that optimizes the breaks across the whole paragraph [1]. This algorithm was implemented in TeX'82 [2] and has since kept its crown as the best available solution for this space. However, for macro-typography there has been no (successful) attempt to provide globally optimized page layout: all systems to date (including TeX) use greedy algorithms for pagination. Various problems in this area have been researched (e.g., [3,4,5,6]) and the literature documents some prototype development. But none of these prototypes have been made widely available to the research community or ever made it into a generally usable and publicly available system. This paper presents a framework for a global-fit algorithm for page breaking based on the ideas of Knuth/Plass. It is implemented in such a way that it is directly usable without additional executables with any modern TeX installation. It therefore can serve as a test bed for future experiments and extensions in this space. At the same time a cleaned-up version of the current prototype has the potential to become a production tool for the huge number of TeX users world-wide. The paper also discusses two already implemented extensions that increase the flexibility of the pagination process: the ability to automatically consider existing flexibility in paragraph length (by considering paragraph variations with different numbers of lines [7]) and the concept of running the columns on a double spread a line long or short. It concludes with a discussion of the overall approach, its inherent limitations and directions for future research. [1] D. E. Knuth and M. F. Plass. Breaking Paragraphs into Lines. Software-Practice and Experience, 11(11):1119-1184, Nov. 1981. [2] D. E. Knuth. TeX: The Program, volume B of Computers and Typesetting. Addison-Wesley, Reading, MA, USA, 1986. [3] A. Brüggemann-Klein, R. Klein, and S. Wohlfeil. Computer science in perspective. Chapter On the Pagination of Complex Documents, pages 49-68. Springer-Verlag New York, Inc., New York, NY, USA, 2003. [4] C. Jacobs, W. Li, and D. H. Salesin. Adaptive document layout via manifold content. In Second International Workshop on Web Document Analysis (wda2003), Liverpool, UK, 2003, 2003. [5] A. Holkner. Global multiple objective line breaking. Master's thesis, School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria, Australia, 2006. [6] P. Ciancarini, A. Di Iorio, L. Furini, and F. Vitali. High-quality pagination for publishing. Software-Practice and Experience, 42(6):733-751, June 2012. [7] T. Hassan and A. Hunter. Knuth-Plass revisited: Flexible line-breaking for automatic document layout. In Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng '15, pages 17-20, New York, NY, USA, 2015.
This paper describes a study that investigates tilt-gesture depth on a Bluetooth handheld music controller for activating and deactivating music loops. Making use of a Wii Remote's 3-axis ADXL330 accelerometer, a Max patch was programmed to receive, handle, and store incoming accelerometer data. Each loop corresponded to the front, back, left and right tilt-gesture direction, with each gesture motion triggering a loop 'On' or 'Off' depending on its playback status. The study comprised 40 undergraduate students interacting with the prototype controller for a duration of 5 minutes per person. Each participant performed three full cycles beginning with the front gesture direction and moving clockwise. This corresponded to a total of 24 trigger motions per participant. Raw data associated with tilt-gesture motion depth was scaled, analyzed and graphed. Results show significant differences between each gesture direction in terms of tilt-gesture depth, as well as issues with noise for left/right gesture motion due to dependency on Roll and Yaw values. Front and Left tilt-gesture depths displayed significantly higher threshold levels compared to the Back and Right axes. Front and Left tilt-gesture thresholds therefore allow the device to easily differentiate between intentional sample triggering and general device handling, while this is more difficult for Back and Left directions. Future work will include finding an alternative method for evaluating intentional tilt-gesture triggering on the Back and Left axes, as well as utilizing two 2-axis accelerometers to garner clean data from the Left and Right axes.
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's 16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems—like Apache Spark and Hadoop—to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
Internet has been being becoming the most famous and biggest communication networks as social, industrial, and public infrastructure since Internet was invented at late 1960s. In a historical retrospect of Internet's evolution, the Internet architecture continues evolution repeatedly by going through various technical challenges, for instance, in early 1990s, Internet had encountered danger of scalability, after a short while it had been overcome and successfully evolved by applying emerging techniques such as CIDR, NAT, and IPv6. Especially this paper emphasizes scalability issues as technical challenges with forecasting that Internet of things era has come. Firstly, we describe the Identifier and locator separation scheme that can achieve dramatically architectural evolution in historical perspective. Additionally, it reviews various kinds of Identifier and locator separation scheme because recently the scheme can be the major design pillar towards future of Internet architecture such as both various clean-slated future Internet architectures and evolving Internet architectures. Lastly we show a result of analysis by analysis table for future of internet of everything where number of Internet connected devices will growth to more than 20 billion by 2020.