Biblio
Smartphones nowadays are customized to help users with daily tasks such as storing important data or making transactions over the internet. Given the sensitivity of the data involved, authentication mechanisms such as fixed-text passwords, PINs, or unlock patterns are used to safeguard these data against intruders. However, these mechanisms are exposed to security threats such as cracking or shoulder surfing. To enhance mobile and information security, this study aimed to develop free-form handwriting gesture user authentication for smartphones. It also sought to discover the static and dynamic handwriting features that significantly influence the recognition of a legitimate user. In the experiment, thirty (30) individuals were asked to draw or swipe their desired free-form security pattern with a fingertip ten (10) times. These patterns were then cleaned and processed, and seven (7) static and eleven (11) dynamic handwriting features were extracted. Using the Neural Network classifier of the RapidMiner data mining tool, these features were used to develop, validate, and test a model for user authentication. The model showed a very promising recognition rate of 96.67%. The model was further tested through a prototype, and it still gave a very satisfactory result.
Staging is a program generation paradigm with a clean, well-investigated semantics which statically ensures that the generated code is always well-typed and well-scoped. Staging is often used for specializing programs to the known properties or parts of data to improve efficiency, but so far it has been limited to generating terms. This short paper describes our ongoing work on extending staging, with its strong safety guarantees, to generation of non-terms, focusing on ML-style modules. The purpose is to map out the promises and challenges, then to pose a question to solicit the community's expertise in evaluating how essential our extensions are for the purpose of applying staging beyond the realm of terms. We demonstrate our extensions' use in specializing functor applications to eliminate its (currently large) overhead in OCaml. We explain the challenges that those extensions bring in and identify a promising line of attack. Unexpectedly, however, it turns out that we can avoid module generation altogether by representing modules, possibly containing abstract types, as polymorphic records. With the help of first-class modules, module specialization reduces to ordinary term specialization, which can be done with conventional staging. The extent to which this hack generalizes is unclear. Thus we have a question to the community: is there a compelling use case for module generation? With these insights and questions, we offer a starting point for a long-term program in the next stage of staging research.
In Pixar's Finding Dory, we are introduced to a new character: Hank the Octopus. This is a very different character from any Pixar has been asked to animate before. Our directors demanded both precise control and graceful, clean silhouettes. The reference artwork we were given showed complex curves between arms and body without any disjointed shapes or breaks in form. Video of an octopus in motion reveals an infinitely malleable creature capable of an enormous shape language. This art direction required a small group of TDs to create a control scheme that was sensible and flexible and offered a new level of control, so that animators could bring Hank to life. We had to think deeply from the tips of the fingers all the way through to how the tentacles connect to the mouth corners and eye sockets. Each of these issues raised concerns around design, deformation, and finally how the end user can manipulate such complexity effectively.
In this paper, we propose a novel Social Networking Service (SNS) for a regional community. The purpose of the SNS is to support and encourage people by making them aware of beneficial social relations in the real world. Conventional SNSs can hardly deal with beneficial social relations, because these relations are implicit and dynamic. The proposed SNS is designed to provide positive information for two types of people: contributors, who do voluntary community work such as cleaning, and beneficiaries, who receive the benefit of that work. This paper introduces the basic scheme of the SNS for beneficial social relations and evaluates its effectiveness based on the results of experimental studies. The experimental results show that users of our SNS tend to consider information about voluntary work valuable if the work was performed in their living area, which suggests that our proposed SNS would work well in a regional community.
Record linkage refers to the task of finding the same entity across different databases. We propose a machine learning based record linkage algorithm for financial entity databases. Record linkage on financial databases is essential for integrating information about a given financial entity, since those databases do not share a common unified identifier. Our algorithm works in two steps to determine whether a pair of records refers to the same entity. First, we check with proposed rules whether the record pair can be exactly matched after cleaning the entity name and address. Second, inspired by earlier work on author name disambiguation, we train a binary Random Forest classifier to decide the linkage. To reduce and scale the computation, this process is done only for candidate pairs selected by a proposed heuristic. Initial evaluation of precision, recall, and F1 measures on two different linking tasks in the Financial Entity Identification and Information Integration (FEIII) Challenge shows promising results.
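The two-step procedure in the record linkage abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the suffix list, the ZIP-code blocking heuristic, the record layout, and the two string-similarity features are hypothetical stand-ins, since the abstract does not give the actual cleaning rules, heuristic, or feature set. The Random Forest itself is omitted; the sketch stops at producing the features such a classifier would consume.

```python
import re
from difflib import SequenceMatcher
from itertools import product

# Hypothetical suffix list; the paper's actual cleaning rules are not given.
SUFFIXES = {"inc", "corp", "co", "ltd", "llc"}


def clean(text):
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def link(records_a, records_b):
    """Return (exact_matches, candidates) for two lists of dicts
    with 'id', 'name', 'address', and 'zip' keys."""
    exact, candidates = [], []
    for a, b in product(records_a, records_b):
        if a["zip"] != b["zip"]:  # blocking heuristic (an assumption)
            continue
        name_a, name_b = clean(a["name"]), clean(b["name"])
        addr_a, addr_b = clean(a["address"]), clean(b["address"])
        if name_a == name_b and addr_a == addr_b:
            exact.append((a["id"], b["id"]))  # step 1: rule-based exact match
        else:
            # Step 2 input: similarity features for a binary classifier.
            feats = (SequenceMatcher(None, name_a, name_b).ratio(),
                     SequenceMatcher(None, addr_a, addr_b).ratio())
            candidates.append(((a["id"], b["id"]), feats))
    return exact, candidates
```

Pairs in different blocks are never compared, which is what keeps the quadratic pair enumeration manageable on larger databases.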
Despite the growing promotion of the “open data” movement, the collection, cleaning, management, interpretation, and dissemination of open data is laborious and cost intensive, particularly for non-profits with limited resources. In this paper, we describe how non-profit organizations (NPOs) use open data, building on prior literature that focuses on understanding challenges that NPOs face. Based on 15 interviews of staff from 10 NPOs, our results suggest that NPOs use data to develop narratives to build a case for support from grantors and other stakeholders. We then present empirical results based on the usage of a data portal we created, which suggests that technologies should be designed to not only make data accessible, but also to facilitate communication and support relationships between expert data analysts and NPOs.
It is common practice for data scientists to acquire and integrate disparate data sources to achieve higher quality results. But even with a perfectly cleaned and merged data set, two fundamental questions remain: (1) is the integrated data set complete and (2) what is the impact of any unknown (i.e., unobserved) data on query results? In this work, we develop and analyze techniques to estimate the impact of the unknown data (a.k.a., unknown unknowns) on simple aggregate queries. The key idea is that the overlap between different data sources enables us to estimate the number and values of the missing data items. Our main techniques are parameter-free and do not assume prior knowledge about the distribution. Through a series of experiments, we show that estimating the impact of unknown unknowns is invaluable to better assess the results of aggregate queries over integrated data sources.
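To make the overlap idea in the abstract above concrete, here is a minimal sketch that uses the classical bias-corrected Chao1 species estimator. This is a swapped-in textbook technique, not the estimators developed in the paper. The function name `estimate_sum` is hypothetical, and the sketch naively assumes that unseen items have the same mean value as observed ones.

```python
from collections import Counter


def estimate_sum(sources):
    """sources: list of dicts mapping item id -> numeric value.

    Returns (estimated item count, estimated SUM over all items),
    including items that no source has observed.
    """
    counts = Counter(item for src in sources for item in src)
    if not counts:
        return 0.0, 0.0
    values = {item: v for src in sources for item, v in src.items()}
    c = len(counts)                                   # distinct observed items
    f1 = sum(1 for n in counts.values() if n == 1)    # seen in exactly one source
    f2 = sum(1 for n in counts.values() if n == 2)    # seen in exactly two sources
    missing = f1 * (f1 - 1) / (2 * (f2 + 1))          # bias-corrected Chao1 term
    observed_sum = sum(values.values())
    mean = observed_sum / c                           # assumption: unseen items look like seen ones
    return c + missing, observed_sum + missing * mean
```

Intuitively, many items seen only once relative to items seen twice suggests that the sources are still far from covering the full population.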
Open data is publicly available data that can be universally and readily accessed, used, and redistributed. Open data holds particular potential in the health and social sectors but, presently, health and social data are often published in a 'closed' format. There are different tools that make it possible to 'open' data and to clean, structure, and process them in order to elaborate them and build advanced services but, unfortunately, no single tool can perform all of these tasks. We believe that the availability of Open Data in the health and social fields should be greatly increased and that a way of creating new health and social services should be provided. In this paper, we present a framework that makes it possible to create health and social Open Data starting from whatever is available on the web and to easily build advanced services based on those data.
Maintaining a clean and hygienic civic environment is an indispensable yet formidable task, especially in developing countries. With the aim of engaging citizens to track and report on their neighborhoods, this paper presents a novel smartphone app, called SpotGarbage, which detects and coarsely segments garbage regions in a user-clicked geo-tagged image. The app utilizes the proposed deep architecture of fully convolutional networks for detecting garbage in images. The model has been trained on a newly introduced Garbage In Images (GINI) dataset, achieving a mean accuracy of 87.69%. The paper also proposes optimizations in the network architecture resulting in a reduction of 87.9% in memory usage and 96.8% in prediction time with no loss in accuracy, facilitating its usage in resource constrained smartphones.
Defect-prediction techniques can enhance the quality assurance activities for software systems. For instance, they can be used to predict bugs in source files or functions. In the context of a software product line, such techniques could ideally be used for predicting defects in features or combinations of features, which would allow developers to focus quality assurance on the error-prone ones. In this preliminary case study, we investigate how defect prediction models can be used to identify defective features using machine-learning techniques. We adapt process metrics and evaluate and compare three classifiers using an open-source product line. Our results show that the technique can be effective. Our best scenario achieves an accuracy of 73 % for accurately predicting features as defective or clean using a Naive Bayes classifier. Based on the results we discuss directions for future work.
Pagination problems deal with questions around transforming a source text stream into a formatted document by dividing it up into individual columns and pages, including adding auxiliary elements that have some relationship to the source stream data but may allow a certain amount of variation in placement (such as figures or footnotes). Traditionally the pagination problem has been approached by separating it into one of micro-typography (e.g., breaking text into paragraphs, also known as h&j) and one of macro-typography (e.g., taking a galley of already formatted paragraphs and breaking them into columns and pages) without much interaction between the two. While early solutions for both problem spaces used simple greedy algorithms, Knuth and Plass introduced in the '80s a global-fit algorithm for line breaking that optimizes the breaks across the whole paragraph [1]. This algorithm was implemented in TeX'82 [2] and has since kept its crown as the best available solution for this space. However, for macro-typography there has been no (successful) attempt to provide globally optimized page layout: all systems to date (including TeX) use greedy algorithms for pagination. Various problems in this area have been researched (e.g., [3,4,5,6]) and the literature documents some prototype development. But none of these prototypes have been made widely available to the research community or ever made it into a generally usable and publicly available system. This paper presents a framework for a global-fit algorithm for page breaking based on the ideas of Knuth/Plass. It is implemented in such a way that it is directly usable without additional executables with any modern TeX installation. It therefore can serve as a test bed for future experiments and extensions in this space. At the same time a cleaned-up version of the current prototype has the potential to become a production tool for the huge number of TeX users world-wide. 
The paper also discusses two already implemented extensions that increase the flexibility of the pagination process: the ability to automatically consider existing flexibility in paragraph length (by considering paragraph variations with different numbers of lines [7]) and the concept of running the columns on a double spread a line long or short. It concludes with a discussion of the overall approach, its inherent limitations and directions for future research.
[1] D. E. Knuth and M. F. Plass. Breaking Paragraphs into Lines. Software: Practice and Experience, 11(11):1119-1184, Nov. 1981.
[2] D. E. Knuth. TeX: The Program, volume B of Computers and Typesetting. Addison-Wesley, Reading, MA, USA, 1986.
[3] A. Brüggemann-Klein, R. Klein, and S. Wohlfeil. On the Pagination of Complex Documents. In Computer Science in Perspective, pages 49-68. Springer-Verlag New York, Inc., New York, NY, USA, 2003.
[4] C. Jacobs, W. Li, and D. H. Salesin. Adaptive document layout via manifold content. In Second International Workshop on Web Document Analysis (wda2003), Liverpool, UK, 2003.
[5] A. Holkner. Global multiple objective line breaking. Master's thesis, School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria, Australia, 2006.
[6] P. Ciancarini, A. Di Iorio, L. Furini, and F. Vitali. High-quality pagination for publishing. Software: Practice and Experience, 42(6):733-751, June 2012.
[7] T. Hassan and A. Hunter. Knuth-Plass revisited: Flexible line-breaking for automatic document layout. In Proceedings of the 2015 ACM Symposium on Document Engineering, DocEng '15, pages 17-20, New York, NY, USA, 2015.
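The difference between greedy and global-fit page breaking described in the pagination abstract above can be illustrated with a small dynamic program. This is a heavily simplified sketch, not the paper's framework: it assumes unbreakable blocks with fixed heights, a single column, and a hypothetical cost of squared unused space on every page except the last. Floats, footnotes, and stretchable glue are ignored.

```python
def paginate(heights, page_height):
    """Split blocks into pages, minimizing the total squared unused space
    on all pages except the last. Returns (cost, pages)."""
    n = len(heights)
    inf = float("inf")
    best = [0] + [inf] * n   # best[i]: minimal cost of laying out heights[:i]
    prev = [0] * (n + 1)     # prev[i]: start index of the page that ends at i
    for i in range(1, n + 1):
        used = 0
        for j in range(i - 1, -1, -1):   # candidate page = heights[j:i]
            used += heights[j]
            if used > page_height:
                break
            cost = 0 if i == n else (page_height - used) ** 2
            if best[j] + cost < best[i]:
                best[i], prev[i] = best[j] + cost, j
    if best[n] == inf:
        raise ValueError("a block is taller than the page")
    pages, i = [], n
    while i > 0:                          # walk the chosen breaks backwards
        pages.append(heights[prev[i]:i])
        i = prev[i]
    return best[n], pages[::-1]
```

For block heights `[3, 2, 2, 5]` on a page of height 6, a greedy fill produces `[3, 2] | [2] | [5]` with cost 1 + 16 = 17 under this cost function, while the global optimum leaves the first page shorter and yields `[3] | [2, 2] | [5]` with cost 9 + 4 = 13.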
This paper describes a study that investigates tilt-gesture depth on a Bluetooth handheld music controller for activating and deactivating music loops. Making use of a Wii Remote's 3-axis ADXL330 accelerometer, a Max patch was programmed to receive, handle, and store incoming accelerometer data. Each loop corresponded to the front, back, left, or right tilt-gesture direction, with each gesture motion triggering a loop 'On' or 'Off' depending on its playback status. The study comprised 40 undergraduate students interacting with the prototype controller for a duration of 5 minutes per person. Each participant performed three full cycles beginning with the front gesture direction and moving clockwise. This corresponded to a total of 24 trigger motions per participant. Raw data associated with tilt-gesture motion depth was scaled, analyzed, and graphed. Results show significant differences between the gesture directions in terms of tilt-gesture depth, as well as issues with noise for left/right gesture motion due to dependency on Roll and Yaw values. Front and Left tilt-gesture depths displayed significantly higher threshold levels compared to the Back and Right axes. Front and Left tilt-gesture thresholds therefore allow the device to easily differentiate between intentional sample triggering and general device handling, while this is more difficult for the Back and Right directions. Future work will include finding an alternative method for evaluating intentional tilt-gesture triggering on the Back and Right axes, as well as utilizing two 2-axis accelerometers to garner clean data from the Left and Right axes.
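The toggle behaviour described in the abstract above (each tilt gesture switches its loop 'On' or 'Off' once the tilt is deep enough) could be modelled as below. This is a hypothetical sketch in Python rather than the study's Max patch. The class name, the per-direction threshold values, and the re-arming rule (a loop can toggle again only after the tilt drops back below its threshold) are all assumptions.

```python
class TiltLoopController:
    """Toggle one loop per tilt direction when its depth crosses a threshold."""

    def __init__(self, thresholds):
        # thresholds: direction -> minimum scaled depth treated as intentional
        self.thresholds = dict(thresholds)
        self.playing = {d: False for d in thresholds}
        self._armed = {d: True for d in thresholds}

    def update(self, direction, depth):
        """Feed one scaled depth reading; return True if the loop toggled."""
        if depth >= self.thresholds[direction]:
            if self._armed[direction]:
                # Disarm so a held tilt does not toggle repeatedly.
                self._armed[direction] = False
                self.playing[direction] = not self.playing[direction]
                return True
        else:
            self._armed[direction] = True  # tilt released: allow the next trigger
        return False
```

A direction with a low threshold sits closer to ordinary handling motion, which is the difficulty the abstract reports for some axes.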
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their low power, high performance, and energy efficiency. As evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, such as Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming effort needed to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
The Internet has become the best-known and largest communication network, serving as social, industrial, and public infrastructure since its invention in the late 1960s. Viewed in historical retrospect, the Internet architecture has evolved repeatedly by working through various technical challenges. For instance, in the early 1990s the Internet faced a scalability crisis, which it soon overcame by applying emerging techniques such as CIDR, NAT, and IPv6. This paper emphasizes scalability issues as technical challenges, anticipating that the Internet of Things era has arrived. First, we describe the identifier and locator separation scheme, which can enable dramatic architectural evolution, from a historical perspective. We then review the various kinds of identifier and locator separation schemes, because such schemes have recently become a major design pillar for future Internet architecture, in both clean-slate future Internet architectures and evolving Internet architectures. Lastly, we present an analysis table for the future Internet of Everything, in which the number of Internet-connected devices will grow to more than 20 billion by 2020.
Internet of Things (IoT) is the next big boom in the networking field. The vision of IoT is to connect everyday objects (which have the ability to sense and actuate) to the Internet. This may or may not involve humans. The IoT field is still maturing and has many open issues. We build on the security issues. As the devices have low computational power and little memory, the existing security mechanisms (which are a necessity) should be optimized accordingly, or a clean-slate approach needs to be followed. This survey paper focuses on the security aspects of IoT. We also discuss the open challenges in this field.
As demand for wireless mobile connectivity continues to explode, cellular network infrastructure capacity requirements continue to grow. While 5G tries to address capacity requirements at the radio layer, the load on the cellular core network (called the Evolved Packet Core (EPC)) stresses the network infrastructure. Our work examines the architecture and protocols of current cellular infrastructures and the workload on the EPC. We study the challenges in dimensioning capacity and review the design alternatives to support the significant scale-up desired, even for the near future. We break down the workload on the network infrastructure into its components: signaling event transactions, database or lookup transactions, and packet processing. We quantitatively show the control plane and data plane load on the various components of the EPC and estimate how future 5G cellular network workloads will scale. This analysis helps us to understand the scalability challenges for future 5G EPC network components. Other efforts to scale the 5G cellular network take a system view where the control plane is separated from the data path and is terminated on a centralized SDN controller. The SDN controller configures the data path on a widely distributed switching infrastructure. Our analysis of the workload informs us about the feasibility of various design alternatives and motivates our efforts to develop our clean-slate approach, called CleanG.
The data processing capabilities of MapReduce systems, paired with the on-demand scalability of cloud computing, have enabled the Big Data revolution. However, data controllers and owners worry about the privacy and accountability impact of storing their data in cloud infrastructures, as existing cloud computing solutions provide very limited control over the underlying systems. The intuitive approach - encrypting data before uploading it to the cloud - is not applicable to MapReduce computation, because data analytics tasks are defined ad hoc in the MapReduce environment using general programming languages (e.g., Java), and homomorphic encryption methods that can scale to big data do not exist. In this paper, we address the challenges of determining and detecting unauthorized access to data stored in MapReduce based cloud environments. To this end, we introduce alarm-raising honeypots distributed over the data that are not accessed by authorized MapReduce jobs, but only by attackers and/or unauthorized users. Our analysis shows that unauthorized data accesses can be detected with reasonable performance in MapReduce based cloud environments.
Information technology experts cite security and privacy concerns as the major challenges in the adoption of cloud computing. On Platform-as-a-Service (PaaS) clouds, customers face the challenges of selecting service providers and evaluating security implementations based on their security needs and requirements. This study aims to enable cloud customers to quantify their security requirements in order to identify critical areas in PaaS cloud architectures where security provisions offered by cloud service providers (CSPs) could be assessed. Using an adaptive security mapping matrix, the study takes a quantitative approach and presents numeric findings that show the critical architectures within the PaaS environment where security can be evaluated and security controls assessed to meet these security requirements. The matrix can be adapted across different types of PaaS cloud models based on the individual security requirements and service level objectives identified by PaaS cloud customers.
Language vector space models (VSMs) have recently proven to be effective across a variety of tasks. In VSMs, each word in a corpus is represented as a real-valued vector. These vectors can be used as features in many applications in machine learning and natural language processing. In this paper, we study the effect of vector space representations in cyber security. In particular, we consider a passive traffic analysis attack (Website Fingerprinting) that threatens users' navigation privacy on the web. By using anonymous communication, Internet users (such as online activists) may wish to hide the destination of web pages they access for different reasons such as avoiding tyrant governments. Traditional website fingerprinting studies collect packets from the users' network and extract features that are used by machine learning techniques to reveal the destination of certain web pages. In this work, we propose the packet to vector (P2V) approach where we model website fingerprinting attack using word vector representations. We show how the suggested model outperforms previous website fingerprinting works.
Searchable encryption is an emerging information security technique that enables users to search over encrypted data by keyword without first decrypting it. In the last decade, many researchers have worked in the field of searchable encryption and have proposed a series of efficient search schemes over encrypted cloud data. It is time to survey this field and derive a comprehensive framework by analyzing individual contributions. This paper focuses on searchable encryption schemes in the cloud. We first summarize the general model and threat model of searchable encryption schemes, and then present the privacy-preserving issues in these schemes. In addition, we compare the efficiency and security of semantic search and preferred search in detail. Finally, some open issues and future research challenges are proposed.
The security and privacy of any cryptographic mechanism that uses random numbers require that the generated random numbers have two important properties: (1) uniform distribution and (2) independence. With the growth of the Internet, many devices that host sensors are connected to it. One proposed idea is to use sensor data as the seed for a random number generator (RNG), since sensors measure physical phenomena that exhibit randomness over time. The random numbers generated from sensor data can be used by cryptographic algorithms in Internet activities. Sensor data also has a weakness: sensors may be under adversarial control, which may lead to a predictable random sequence that breaks security and privacy. This paper proposes a wash-rinse-spin approach to processing the raw sensor data that increases the randomness of the seed value. The sequences generated from two sensors are combined by the decimation method to improve unpredictability. This makes sensor data more secure for generating random numbers, preventing attackers from learning the random sequence through adversarial control.
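The abstract above does not spell out its decimation rule. One classical form of decimation is the shrinking generator, in which one bit stream decides which bits of a second stream are kept. The sketch below shows that rule purely as an illustration and is not a reconstruction of the paper's wash-rinse-spin pipeline. The helper `to_bits`, which takes the least significant bit of each integer sensor reading, is likewise a hypothetical choice.

```python
def to_bits(readings):
    """Take the least significant bit of each integer sensor reading (assumed)."""
    return [r & 1 for r in readings]


def decimate(control_bits, data_bits):
    """Shrinking-generator style decimation: keep a data bit only where
    the corresponding control bit is 1."""
    return [d for c, d in zip(control_bits, data_bits) if c == 1]
```

Because the output depends on both streams, an adversary who controls only one of the two sensors does not fully determine the combined sequence, although the output is shorter than either input.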
Privacy analysis is essential in society. Data privacy preservation for access control and guaranteed service in wireless sensor networks are important parts of it. In program verification, we consider not only safety and liveness properties but also security policies such as noninterference and observational determinism, which have been proposed as hyperproperties. Fairness is widely applied in the verification of concurrent systems, wireless sensor networks, and embedded systems. This paper studies verification and analysis for proving security-relevant properties and hyperproperties by proposing deductive proof rules under fairness requirements (constraints).
Steganography is the art of hiding data in such a way that detection of the hidden information is prevented. As the need for security and privacy increases, so does the need to hide secret data. In this paper, we propose an enhanced detection of 1-2-4 LSB steganography and RSA cryptography in grayscale and color images. For color images, we apply 1-2-4 LSB to the components of RGB, then encrypt the information using RSA. For grayscale images, we use LSB, then encrypt the information and also detect the edges of the image. In the experimental results, we calculate PSNR and MSE; the peak signal-to-noise ratio is used to assess quality and brightness. This method ensures that the information has been encrypted before it is hidden in an input image. If the ciphertext is somehow revealed from the input image, a middle person other than the receiver cannot access the information because it is in encrypted form.
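As a rough illustration of the embedding and quality measures mentioned in the abstract above, the sketch below hides 7 bits per pixel: 1 bit in the red channel, 2 in green, and 4 in blue. That channel assignment is an assumption about what "1-2-4 LSB" means here. The RSA encryption step is omitted, so `bits` stands for an already encrypted payload. Images are plain lists of `(r, g, b)` tuples rather than a real image format.

```python
import math

WIDTHS = (1, 2, 4)  # payload bits hidden in R, G, B (assumed assignment)


def embed(pixels, bits):
    """Hide bits (a list of 0/1, length a multiple of 7) in the low bits."""
    if len(bits) % 7 or len(bits) > 7 * len(pixels):
        raise ValueError("payload must be a multiple of 7 bits and fit the image")
    out = list(pixels)
    for i in range(len(bits) // 7):
        chunk, pos, new = bits[7 * i:7 * i + 7], 0, []
        for value, w in zip(pixels[i], WIDTHS):
            payload = int("".join(map(str, chunk[pos:pos + w])), 2)
            new.append((value & ~((1 << w) - 1)) | payload)  # overwrite low w bits
            pos += w
        out[i] = tuple(new)
    return out


def extract(pixels, n_bits):
    """Read back the first n_bits hidden by embed()."""
    bits = []
    for px in pixels[:n_bits // 7]:
        for value, w in zip(px, WIDTHS):
            bits.extend(int(b) for b in format(value & ((1 << w) - 1), f"0{w}b"))
    return bits


def psnr(original, stego):
    """Peak signal-to-noise ratio in dB for 8-bit channels."""
    se = sum((a - b) ** 2 for p, q in zip(original, stego) for a, b in zip(p, q))
    mse = se / (3 * len(original))
    return math.inf if mse == 0 else 10 * math.log10(255 ** 2 / mse)
```

A higher PSNR means the stego image is closer to the cover image. Overwriting four bits in one channel can change its value by up to 15, which lowers PSNR compared with plain 1-bit LSB embedding.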
Today's systems produce a rapidly exploding amount of data, and the data further derives more data, forming a complex data propagation network that we call the data's lineage. There are many reasons that users want systems to forget certain data including its lineage. From a privacy perspective, users who become concerned with new privacy risks of a system often want the system to forget their data and lineage. From a security perspective, if an attacker pollutes an anomaly detector by injecting manually crafted data into the training data set, the detector must forget the injected data to regain security. From a usability perspective, a user can remove noise and incorrect entries so that a recommendation engine gives useful recommendations. Therefore, we envision forgetting systems, capable of forgetting certain data and their lineages, completely and quickly. This paper focuses on making learning systems forget, the process of which we call machine unlearning, or simply unlearning. We present a general, efficient unlearning approach by transforming learning algorithms used by a system into a summation form. To forget a training data sample, our approach simply updates a small number of summations – asymptotically faster than retraining from scratch. Our approach is general, because the summation form is from the statistical query learning in which many machine learning algorithms can be implemented. Our approach also applies to all stages of machine learning, including feature selection and modeling. Our evaluation, on four diverse learning systems and real-world workloads, shows that our approach is general, effective, fast, and easy to use.
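The summation-form idea in the abstract above can be illustrated with a toy model whose entire state is a handful of sums. The nearest-centroid classifier below is a hypothetical example chosen for brevity and is not one of the systems evaluated in the paper. Forgetting a sample only subtracts it from the per-class sums, which leaves the model in the same state as retraining without that sample.

```python
class CentroidModel:
    """Nearest-centroid classifier stored as per-class sums and counts."""

    def __init__(self):
        self.sums = {}    # label -> per-feature sum
        self.counts = {}  # label -> number of samples

    def learn(self, x, label):
        s = self.sums.setdefault(label, [0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def unlearn(self, x, label):
        """Forget a previously learned sample by updating the summations."""
        s = self.sums[label]
        for i, v in enumerate(x):
            s[i] -= v
        self.counts[label] -= 1
        if self.counts[label] == 0:
            del self.sums[label], self.counts[label]

    def predict(self, x):
        def dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=dist)
```

The cost of `unlearn` depends only on the number of features, not on the size of the training set, which is the source of the speed-up over retraining from scratch.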
Hash based biometric template protection schemes (BTPS), such as fuzzy commitment, fuzzy vault, and secure sketch, address the privacy leakage concern of storing plain biometric templates in a database by using cryptographic hash calculation for template verification. However, cryptographic hashes offer only computational security, and cracking them leaks the biometric feature in these BTPS. Furthermore, existing BTPS are rarely able to detect during a verification process whether a probe template has been leaked from the database or not (i.e., whether it is being used by an imposter or a genuine user). In this paper we tailor the "honeywords" idea, which was proposed to detect the cracking of hashed passwords, to enable the detection of biometric template database leakage. However, unlike passwords, biometric features encoded in a template cannot be renewed after being cracked and thus cannot be straightforwardly protected by the honeyword idea. To enable the honeyword idea on biometrics, diversifiability (and thus renewability) is required of the biometric features. We propose to use BTPS for this purpose in this paper and present a machine learning based protected template generation protocol to ensure the best anonymity of the generated sugar template (from a user's genuine biometric feature) among other honey ones (from synthesized biometric features).