Bibliography
Text analytics systems often rely heavily on detecting entity mentions in documents and linking them to knowledge bases for downstream applications such as sentiment analysis, question answering, and recommender systems. A major challenge for this task is accurately detecting entities in new languages with limited labeled resources. In this paper we present an accurate and lightweight multilingual named entity recognition (NER) and linking (NEL) system. The contributions of this paper are three-fold: 1) Lightweight named entity recognition with competitive accuracy; 2) Candidate entity retrieval that uses search click-log data and entity embeddings to achieve high precision with a low memory footprint; and 3) Efficient entity disambiguation. Our system achieves state-of-the-art performance on TAC KBP 2013 multilingual data and on English AIDA CoNLL data.
Provenance describes detailed information about the history of a piece of data, capturing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of the data. Provenance is key to supporting many data management functionalities that are increasingly important in operations, such as identifying data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding details about how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirement of providing the provenance service in situ. The need to remain lightweight and always on often conflicts with the need to be transparent and offer an accurate catalog of details regarding the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrumentation mechanism to achieve transparency and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
Smart grid cybersecurity is one of the key ingredients for the successful and wide-scale adoption of the smart grid by utilities and governments around the world. The implementation of the smart grid relies mainly on the highly distributed sensing and communication functionalities of its components, such as Wireless Sensor Networks (WSNs), Phasor Measurement Units (PMUs), and other protection devices. This distributed nature and the high number of connected devices are the main challenges for implementing cybersecurity in the smart grid. As an example, the North American Electric Reliability Corporation (NERC) issued the Critical Infrastructure Protection (CIP) standards (CIP-002 through CIP-009) to define cybersecurity requirements for critical power grid infrastructure. However, the NERC CIP standards do not specify cybersecurity for different communication technologies such as WSNs, fiber networks, and other network types. Implementing security mechanisms in WSNs is a challenging task due to the limited resources of the sensor devices. WSN security mechanisms should not only focus on reducing the power consumption of the sensor devices, but should also maintain the high reliability and throughput needed by smart grid applications. In this paper, we present a WSN cybersecurity mechanism suitable for smart grid monitoring applications. Our mechanism can detect and isolate various attacks in a smart grid environment, such as denial-of-sleep, forgery, and replay attacks, in an energy-efficient way. Simulation results show that our mechanism outperforms existing techniques while meeting the NERC CIP requirements.
This paper introduces a new efficient algorithm for computing Gröbner bases, named M4GB. Like Faugère's algorithm F4, it is an extension of Buchberger's algorithm that describes how to store already computed (tail-)reduced multiples of basis polynomials to prevent redundant work in the reduction step, and how to exploit efficient linear algebra for the reduction step. In comparison to F4, it removes further redundant work in the processing of reducible monomials. Furthermore, instead of translating the reduction of many critical pairs into the row reduction of some large matrix, our algorithm is described more natively and is efficient while processing critical pairs one by one. This feature implies that M4GB typically has to process fewer critical pairs than F4, and reduces the time and data complexity 'staircase' related to the increasing degree of regularity that one observes for F4 on a sequence of problems. To achieve high efficiency, M4GB has been designed specifically to operate only on tail-reduced polynomials, i.e., polynomials of which all terms except the leading term are non-reducible. This allows it to perform full reduction directly in the computation of a term-polynomial multiplication, where all computations are done over coefficient vectors over the non-reducible monomials. We have implemented a version of our new algorithm tailored for dense overdefined polynomial systems as a proof of concept and made our source code publicly available. We compared our implementation against the implementations of the FGb library, Magma, and OpenF4 on various dense Fukuoka MQ challenge problems that we were able to compute in reasonable time and memory. We observed that M4GB uses the least total CPU time and the least memory of all these implementations for those MQ problems, often by a significant factor. In the Fukuoka MQ challenges, the starting challenges of Type V and Type VI have 16 equations, a starting point chosen based on an extrapolated computational runtime of more than a month using Magma. M4GB allowed us to set new records for these Fukuoka MQ challenges, breaking Type V (over F_{2^8}) up to 18 equations and Type VI (over F_{31}) up to 19 equations, each computed within 11 days on our dual-Xeon system.
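For readers unfamiliar with the object being computed, the following minimal sketch shows what a Gröbner basis computation produces, using sympy's built-in routine rather than M4GB itself; the example polynomials are arbitrary, and grevlex is the order typically used for solving overdefined systems.

```python
from sympy import symbols, groebner

# Compute a Groebner basis of a toy two-polynomial system in the
# graded reverse lexicographic (grevlex) order.
x, y = symbols('x y')
G = groebner([x**2 + y**2 - 1, x*y - 1], x, y, order='grevlex')
print(G.exprs)  # the basis polynomials
```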
There are currently few methods that can be applied to malware classification problems without requiring domain knowledge. In this work, we develop our new SHWeL feature vector representation by extending the recently proposed Lempel-Ziv Jaccard Distance (LZJD). These SHWeL vectors improve upon LZJD's accuracy, outperform byte n-grams, and allow us to build efficient algorithms for both training (a weakness of byte n-grams) and inference (a weakness of LZJD). Furthermore, our new SHWeL method also allows us to directly tackle the class imbalance problem, which is common for malware-related tasks. Compared to existing methods like SMOTE, SHWeL provides significantly improved accuracy while reducing algorithmic complexity to O(N). Because our approach is developed without the use of domain knowledge, it can be easily re-applied to any new domain where there is a need to classify byte sequences.
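For context, the core idea of LZJD (which SHWeL extends) fits in a few lines: build the set of distinct substrings produced by a Lempel-Ziv parse of each byte sequence, then take the Jaccard distance between the sets. The sketch below is a simplified illustration, not the optimized, hashed version used in practice.

```python
def lz_set(data: bytes) -> set:
    # Lempel-Ziv parsing: greedily split the stream into the
    # sequence of shortest not-yet-seen substrings.
    seen, start = set(), 0
    for end in range(1, len(data) + 1):
        chunk = data[start:end]
        if chunk not in seen:
            seen.add(chunk)
            start = end
    return seen

def lzjd(a: bytes, b: bytes) -> float:
    # Jaccard distance between the two "compression dictionaries".
    sa, sb = lz_set(a), lz_set(b)
    union = sa | sb
    return 1.0 - len(sa & sb) / len(union) if union else 0.0

print(lzjd(b"abcabcabc", b"abcabcxyz"))  # small distance: shared prefix
```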
Open Source Software developer communities are susceptible to challenges related to volatility, distributed coordination, and the interplay between commercial and ideological interests. Here, community managers play a vital role in growing, shepherding, and coordinating the developers' work. This study investigates the varied tasks that community managers perform to ensure the health and vitality of their communities. We describe the challenges managers face while directing the community and seeking support for their work from the analysis tools provided by state-of-the-art software platforms. Our results describe seven roles that community managers may play, highlighting the versatile and people-centric nature of the community manager's work. Managers find it hard to connect their goals and questions to the metrics that define a community's health and to the effects of their actions. Our results voice common concerns among community managers, can be used to help them structure their management activity, and offer a theoretical frame for further research on how the health of developer communities could be understood.
Network analysts have long used two-dimensional security visualizations to make sense of overwhelming amounts of network data. As networks grow larger and more complex, two-dimensional displays can become convoluted, compromising the user's cyber-threat perspective. Using augmented reality to display data with cyber-physical context creates a naturally intuitive interface that helps restore the perspective and comprehension sacrificed by complicated two-dimensional visualizations. We introduce Mobile Augmented Reality for Cybersecurity, or MARCS, as a platform to visualize a diverse array of data in real time and space to improve user perspective and threat response. Early work centers around CovARVT and ConnectAR, two proof-of-concept prototype applications designed to visualize intrusion detection and wireless association data, respectively.
The increasing complexity and ubiquity of user connectivity, computing environments, information content, and software, mobile, and web applications transfer the responsibility of privacy management to individuals. This makes it extremely difficult for users to maintain the intelligent and targeted level of privacy protection that they need and desire while simultaneously maintaining their ability to function optimally. Thus, there is a critical need to develop intelligent, automated, and adaptable privacy management systems that can assist users in managing and protecting their sensitive data in the increasingly complex situations and environments in which they find themselves. This work is a first step in exploring the development of such a system, specifically how user personality traits and other characteristics can be used to help automate the determination of user sharing preferences for a variety of user data and situations. The Big-Five personality traits of openness, conscientiousness, extroversion, agreeableness, and neuroticism are examined and used as inputs into several popular machine learning algorithms in order to assess their ability to elicit and predict user privacy preferences. Our results show that the Big-Five personality traits can be used to significantly improve the prediction of user privacy preferences in a number of contexts and situations, and so using machine learning approaches to automate the setting of user privacy preferences has the potential to greatly reduce the burden on users while simultaneously improving the accuracy of their privacy preferences and security.
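As a rough illustration of the kind of pipeline the paper evaluates, the sketch below feeds Big-Five scores into an off-the-shelf classifier and cross-validates it; the data here are randomly generated placeholders, not the study's dataset, and the model choice is ours.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Hypothetical stand-in data: one row per user with scores for
# openness, conscientiousness, extroversion, agreeableness, neuroticism.
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = rng.integers(0, 2, 200)  # share / don't share in one context

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # mean accuracy over 5 folds
```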
Microdata are collected by companies in order to enhance their quality of service as well as the accuracy of their recommendation systems. These data often become publicly available after they have been sanitized. Recent re-identification attacks on publicly available, sanitized datasets illustrate the privacy risks involved in microdata collections. Currently, users have to trust the provider that their data will be safe in case the data are published or a privacy breach occurs. In this work, we empower users by developing a novel, user-centric tool for privacy measurement and a new lightweight privacy metric. The goal of our tool is to estimate users' privacy levels prior to sharing their data with a provider. Hence, users can consciously decide whether to contribute their data. Our tool estimates an individual's privacy level based on published popularity statistics regarding the items in the provider's database and the user's microdata. In this work, we describe the architecture of our tool as well as a novel privacy metric, which is necessary for our setting, where we do not have access to the provider's database. Our tool is user-friendly, relying on intuitive visual results that raise privacy awareness. We evaluate our tool using three real-world datasets collected from major providers. We demonstrate strong correlations between the average anonymity set per user and the privacy score obtained by our metric. Our results illustrate that our tool, which uses minimal information from the provider, estimates users' privacy levels almost as well as if it had access to the actual database.
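The paper's metric itself is not reproduced here, but the underlying idea of estimating an anonymity set from published popularity statistics alone can be sketched as follows; the independence assumption and the function name are ours, not the paper's.

```python
def estimated_anonymity_set(user_items, popularity, n_users):
    # popularity[item]: published fraction of users holding the item.
    # Assuming items are independent (a simplification), the expected
    # number of users matching the whole profile is n_users times the
    # product of the item frequencies, floored at 1 (the user herself).
    p = 1.0
    for item in user_items:
        p *= popularity.get(item, 1.0 / n_users)
    return max(1.0, p * n_users)

print(estimated_anonymity_set({"item_a", "item_b"},
                              {"item_a": 0.30, "item_b": 0.05}, 10_000))
```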
Provenance forgery and packet-drop attacks are considered serious threats in large-scale wireless sensor networks, which are deployed in diverse application domains. The variety of data sources makes it necessary to ensure the trustworthiness of information, so that only truthful information is considered in the decision process. Details about the sensor nodes play a major role in computing the trust value of sensor nodes. In this paper, a novel lightweight secure provenance method is introduced to improve the security of provenance data transmission. The proposed system comprises provenance verification and reconstruction at the base station by means of Merkle-Hellman knapsack-based secure provenance encoding in a Bloom filter framework. Side Channel Monitoring (SCM) is exploited to detect the presence of selfish nodes and packet-drop behavior. This lightweight secure provenance method decreases energy and bandwidth utilization with well-organized storage and secure data transmission. The experimental results establish the efficacy and efficiency of the secure provenance system by effectively detecting provenance forgery and packet-drop attacks, as can be seen from the assessment in terms of provenance verification failure rate, collection error, packet drop rate, space complexity, energy consumption, true positive rate, false positive rate, and packet-drop attack detection.
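To make the encoding substrate concrete, here is a minimal Bloom filter of the kind such provenance schemes build on; the parameters and the node-ID encoding are illustrative, not the paper's construction.

```python
import hashlib

class BloomFilter:
    def __init__(self, m: int = 256, k: int = 3):
        self.m, self.k, self.bits = m, k, 0   # m-bit array, k hash functions

    def _positions(self, item: bytes):
        # Derive k positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], 'big') % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def contains(self, item: bytes) -> bool:
        # May return false positives, never false negatives.
        return all(self.bits >> pos & 1 for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"node-17")  # embed a forwarding node's ID in the provenance encoding
print(bf.contains(b"node-17"), bf.contains(b"node-99"))  # True False
```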
Denial-of-Service attacks have rapidly increased in frequency and intensity, steadily becoming one of the biggest threats to Internet stability and reliability. However, a rigorous comprehensive characterization of this phenomenon, and of countermeasures to mitigate the associated risks, faces many infrastructure and analytic challenges. We make progress toward this goal by introducing and applying a new framework to enable a macroscopic characterization of attacks, attack targets, and DDoS Protection Services (DPSs). Our analysis leverages data from four independent global Internet measurement infrastructures over the last two years: backscatter traffic to a large network telescope; logs from amplification honeypots; a DNS measurement platform covering 60% of the current namespace; and a DNS-based data set focusing on DPS adoption. Our results reveal the massive scale of the DoS problem, including the eye-opening statistic that one-third of all /24 networks recently estimated to be active on the Internet have suffered at least one DoS attack over the last two years. We also discovered that targets are often simultaneously hit by different types of attacks. In our data, Web servers were the most prominent attack target; on average, 3% of the Web sites in .com, .net, and .org were involved in attacks daily. Finally, we shed light on factors influencing migration to a DPS.
In this paper, we focus on the security issues and challenges in the smart grid. Smart grid security must address not only expected deliberate attacks, but also inadvertent compromises of the information infrastructure due to user errors, equipment failures, and natural disasters. An important component of the smart grid is the advanced metering infrastructure (AMI), which is critical for supporting two-way communication of real-time information for better electricity generation, distribution, and consumption. These reasons make security a prominent concern for the AMI. In recent times, attacks on the smart grid have been modelled using attack trees. The attack tree has been extensively used as an efficient and effective tool to model security threats and vulnerabilities in systems where the ultimate goal of an attacker can be divided into a set of multiple concrete or atomic sub-goals. The sub-goals are related to each other as either AND-siblings or OR-siblings, which essentially depicts whether some or all of the sub-goals must be attained for the attacker to reach the goal. On the other hand, a security professional needs to find the most effective way to address the security issues in the system under consideration. It is reasonable to assume that each attack prevention strategy incurs some cost, and the utility company will always look to minimize this cost. We present a cost-effective mechanism to identify the minimum number of potential atomic attacks in an attack tree.
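The AND/OR semantics translate directly into a small recursive computation. The sketch below is our own minimal formulation with unit cost per atomic attack, not the paper's mechanism: to block an OR goal the defender must block every alternative, while blocking an AND goal requires breaking only its cheapest conjunct.

```python
def min_preventions(node: dict) -> int:
    # Leaf: one atomic attack that must be addressed.
    if not node.get('children'):
        return 1
    costs = [min_preventions(c) for c in node['children']]
    # AND goal: blocking any one sub-goal breaks the conjunction.
    # OR goal: every alternative path must be blocked.
    return min(costs) if node['type'] == 'AND' else sum(costs)

tree = {'type': 'OR', 'children': [
    {'type': 'AND', 'children': [{}, {}]},  # needs both -> block 1 of them
    {},                                     # standalone leaf -> block it
]}
print(min_preventions(tree))  # 2
```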
Bitcoin, one major virtual currency, has attracted users' attention in recent years with its novel model. With the blockchain as its basic technique, Bitcoin possesses strong security features that anonymize users' identities to protect their private information. However, some criminals utilize Bitcoin for illegal activities, posing a great security threat to society. Therefore, it is necessary to understand the current trends of Bitcoin and make an effort to de-anonymize it. In this paper, we propose and implement a system to analyze Bitcoin from two aspects: blockchain data and network traffic data. We parse the blockchain data to analyze Bitcoin from the perspective of Bitcoin addresses, and simulate the Bitcoin P2P protocol to evaluate Bitcoin from the perspective of IP addresses. Finally, with our system, we analyze current trends and trace transactions by computing statistics on Bitcoin transactions and addresses, tracing transaction flows, and de-anonymizing some Bitcoin addresses to IPs.
Like any other software engineering activity, assessing the security of a software system entails prioritizing the resources and minimizing the risks. Techniques ranging from manual inspection to automated static and dynamic analyses are commonly employed to identify security vulnerabilities prior to the release of the software. However, none of these techniques is perfect: static analysis is prone to producing numerous false positives and false negatives, while dynamic analysis and manual inspection are unwieldy, both in terms of required time and cost. This research aims to improve these techniques by mining relevant information from vulnerabilities found in the app markets. The approach relies on the fact that many modern software systems, in particular mobile software, are developed using rich application development frameworks (ADFs), allowing us to raise the level of abstraction for detecting vulnerabilities and thereby making it possible to classify the types of vulnerabilities that are encountered in a given category of application. By coupling this type of information with the severity of the vulnerabilities, we are able to improve the efficiency of static and dynamic analyses, and target the manual effort on the riskiest vulnerabilities.
Let us consider a scenario in which a data holder (e.g., a hospital) encrypts data (e.g., a medical record) associated with a keyword (e.g., a disease name), and sends its ciphertext to a server. We suppose here that not only the data but also the keyword should be kept private. A receiver sends a query to the server (e.g., the average body weight of cancer patients). The server then performs the homomorphic operation on the ciphertexts of the corresponding medical records, and returns the resultant ciphertext. In this scenario, the server should NOT be allowed to perform the homomorphic operation on ciphertexts associated with different keywords. If such a mis-operation happens, medical records of different diseases are unexpectedly mixed. However, in conventional homomorphic encryption, there is no way to prevent such an unexpected homomorphic operation; this fact may only become visible after decrypting a ciphertext, or, in the most serious case, it might never be detected. To circumvent this problem, in this paper we propose mis-operation resistant homomorphic encryption, where even if one performs the homomorphic operations on ciphertexts associated with keywords ω and ω′, where ω ≠ ω′, the evaluation algorithm detects this fact. Moreover, even if one (intentionally or accidentally) performs the homomorphic operations on such ciphertexts, a ciphertext associated with a random keyword is generated, and the decryption algorithm rejects it. Thus, the receiver can recognize that such a mis-operation happened in the evaluation phase. In addition to mis-operation resistance, we also adopt secure search functionality for keywords, since this is desirable when one would like to delegate homomorphic operations to a third party. We therefore call the proposed primitive mis-operation resistant searchable homomorphic encryption (MR-SHE). We also give implementation results for inner products of encrypted vectors. When both vectors are encrypted, the running time of the receiver is on the order of milliseconds for relatively low-dimensional (e.g., 2^6) vectors. When one vector is encrypted, the running time of the receiver is approximately 5 ms even for relatively high-dimensional (e.g., 2^13) vectors.
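The control flow of the evaluation step can be caricatured as follows. This toy uses keyword hashes and plaintext integers instead of real ciphertexts, purely to illustrate the keyword check and the randomize-on-mismatch behaviour; it claims no security and is not the paper's construction.

```python
import hashlib, secrets

def kw_tag(keyword: str) -> bytes:
    return hashlib.sha256(keyword.encode()).digest()

def eval_add(ct1, ct2):
    # ct = (keyword_tag, payload); the payload stands in for a real
    # homomorphic ciphertext, so this is structure only, not crypto.
    (t1, v1), (t2, v2) = ct1, ct2
    if t1 != t2:
        # Mis-operation: emit a result under a fresh random keyword,
        # so decryption later rejects it (mirroring MR-SHE's behaviour).
        return (secrets.token_bytes(32), v1 + v2)
    return (t1, v1 + v2)

ct_a = (kw_tag("cancer"), 70)
ct_b = (kw_tag("cancer"), 82)
ct_c = (kw_tag("diabetes"), 65)
print(eval_add(ct_a, ct_b)[0] == kw_tag("cancer"))  # True: keywords match
print(eval_add(ct_a, ct_c)[0] == kw_tag("cancer"))  # False: mis-operation
```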
Smartphones have become the pervasive personal computing platform. Recent years have thus witnessed exponential growth in research and development of secure and usable authentication schemes for smartphones. Several explicit (e.g., PIN-based) and/or implicit (e.g., biometrics-based) authentication methods have been designed and published in the literature. In fact, some of them have been embedded in commercial mobile products as well. However, the published studies report only the brighter side of the proposed scheme(s), e.g., the higher accuracy attained by the proposed mechanism, while other associated operational issues, such as computational overhead, robustness to different environmental conditions or attacks, and usability, are intentionally or unintentionally ignored. More specifically, most publicly available frameworks do not discuss or explore any evaluation criteria, usability measures, or environment-related measures beyond accuracy under a zero-effort threat model. Thus, their baseline operations usually give a false sense of progress. This paper, therefore, presents guidelines to researchers for designing, implementing, and evaluating smartphone user authentication methods so as to have a positive impact on future technological developments.
Insider threats can cause immense damage to organizations of different types, including government, corporate, and non-profit organizations. Being an insider, however, does not necessarily equate to being a threat. Effectively identifying valid threats, and assessing the type of threat an insider presents, remain difficult challenges. In this work, we propose a novel breakdown of eight insider threat types, identified using three insider traits: predictability, susceptibility, and awareness. In addition to presenting this framework for insider threat types, we implement a computational model to demonstrate the viability of our framework with synthetic scenarios devised after reviewing real-world insider threat case studies. The results yield useful insights into how further investigation might proceed to reveal how best to gauge predictability, susceptibility, and awareness, and precisely how they relate to the eight insider types.
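Since the eight types are simply the 2^3 combinations of three binary traits, they can be enumerated mechanically; the trait names come from the abstract, while the numbering below is arbitrary.

```python
from itertools import product

traits = ("predictable", "susceptible", "aware")
for i, combo in enumerate(product((False, True), repeat=3), start=1):
    profile = ", ".join(f"{t}={'yes' if v else 'no'}"
                        for t, v in zip(traits, combo))
    print(f"insider type {i}: {profile}")  # prints all 8 combinations
```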
During the life span of large software projects, developers often apply the same code changes to different code locations in slight variations. Since applying these changes to all locations is time-consuming and error-prone, tools exist that learn change patterns from input examples, search for possible pattern applications, and generate corresponding recommendations. In many cases, the generated recommendations are syntactically or semantically wrong due to code movements in the input examples. Thus, they are of low accuracy, and developers cannot directly copy them into their projects without adjustments. We present the Accurate REcommendation System (ARES), which achieves a higher accuracy than other tools because its algorithms take care of code movements when creating patterns and recommendations. On average, the recommendations by ARES have an accuracy of 96% with respect to code changes that developers have manually performed in commits of source code archives. At the same time, ARES achieves precision and recall values that are on par with those of other tools.
Cloud computing is a remarkable model for permitting on-demand network access to an elastic collection of configurable, adaptive resources and features, including storage, software, infrastructure, and platform. However, there are major concerns about security-related issues. A very critical security function is user authentication using passwords. Although many flaws have been discovered in password-based authentication, it remains the most convenient approach and people continue to utilize it. Several schemes have been proposed to strengthen its effectiveness, such as salted hashes, one-time passwords (OTP), single sign-on (SSO), and multi-factor authentication (MFA). This study proposes a new authentication mechanism that combines the user's password with modified characters of a CAPTCHA to generate a passkey. The modification of the CAPTCHA depends on a secret agreed upon between the cloud provider and the user, which substitutes different characters for some characters in the CAPTCHA. This scheme prevents various attacks, including short-password attacks, dictionary attacks, keyloggers, phishing, and social engineering. Moreover, it can resolve the issue of password guessing and of using a single password for different cloud providers.
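A minimal sketch of the passkey derivation, under our reading of the scheme: the hash choice and the substitution-map format below are assumptions for illustration, not the paper's specification.

```python
import hashlib

def derive_passkey(password: str, captcha: str, substitutions: dict) -> str:
    # substitutions: the per-user secret agreed with the provider,
    # mapping selected CAPTCHA characters to replacements (hypothetical).
    modified = "".join(substitutions.get(c, c) for c in captcha)
    return hashlib.sha256((password + modified).encode()).hexdigest()

# A keylogger capturing only "hunter2" cannot reproduce the passkey,
# since it lacks both the session CAPTCHA and the secret mapping.
print(derive_passkey("hunter2", "xa7e9", {"a": "4", "e": "3"}))
```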
The Internet of Things (IoT) is an emerging trend that is changing the way devices connect and communicate. The integration of cloud computing with IoT, i.e., the Cloud of Things (CoT), provides scalability, virtualized control, and access to the services provided by IoT. Security issues are a major obstacle to the widespread deployment and application of CoT. Among these issues, the authentication and identification of users are crucial. In this survey paper, various authentication schemes are reviewed. The aim of this paper is to study in detail a multi-factor authentication system that uses secret splitting. The system uses exclusive-or (XOR) operations, encryption algorithms, and the Diffie-Hellman key exchange algorithm to share keys over the network. Security analysis shows the resistance of the system against different types of attacks.
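XOR-based secret splitting, one building block the surveyed system relies on, fits in a few lines. This is the generic textbook construction, not the specific protocol under study: any n-1 shares reveal nothing, and all n shares XOR back to the secret.

```python
import secrets
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split(secret: bytes, n: int) -> list:
    # n-1 uniformly random shares, plus one share chosen so that
    # the XOR of all n shares equals the secret.
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    shares.append(reduce(xor, shares, secret))
    return shares

def combine(shares: list) -> bytes:
    return reduce(xor, shares)

key = b"session-key-material"
parts = split(key, 3)
assert combine(parts) == key  # all three shares reconstruct the key
```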
Embedded and mobile devices forming part of the Internet of Things (IoT) need new authentication technologies and techniques. This requirement is due to the increased effort and time attackers will invest in compromising a device, often remotely, given the possibility of a significant monetary return. This paper proposes exploiting a device's built-in accelerometer functionality to implement multi-factor authentication. An experimental embedded system designed to emulate a typical mobile device is used to implement the ideas and investigate them as a proof of concept.
An electro-hydraulic servo actuation system is a complex system combining mechanical, electrical, and hydraulic components. If it cannot be repaired for a long time, the possibility of multiple simultaneous faults must be considered. Addressing this possibility, this paper presents an extended Kalman filter (EKF) based method for multiple-fault diagnosis. By analysing the failure modes and mechanisms of the electro-hydraulic servo actuation system and modelling selected typical failure modes, the relationship between the key parameters of the system and the faults is obtained. The extended Kalman filter, a commonly used algorithm for parameter estimation, is then applied to online diagnosis of potential faults. The simulation results show that the EKF-based multi-fault diagnosis method is effective for the electro-hydraulic servo actuation system.
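As a toy version of EKF-based parameter estimation, the sketch below tracks a single "health" parameter of a first-order actuator model from noisy state measurements; the model, noise levels, and fault signature are illustrative assumptions of ours, not the paper's actuation model.

```python
import numpy as np

# Scalar EKF estimating theta in the discretized model
# x[k] = x[k-1] + dt*(-theta*x[k-1] + u[k-1]); a drift of the estimate
# away from its nominal value flags a degraded (faulty) actuator.
def ekf_theta(xs, us, dt, q=1e-6, r=1e-4):
    theta, P = 1.0, 1.0                 # initial estimate and covariance
    history = []
    for k in range(1, len(xs)):
        P += q                          # predict: theta modeled as constant
        h = xs[k-1] + dt * (-theta * xs[k-1] + us[k-1])  # predicted x[k]
        H = -dt * xs[k-1]               # Jacobian dh/dtheta
        K = P * H / (H * P * H + r)     # Kalman gain
        theta += K * (xs[k] - h)        # measurement update
        P *= 1.0 - K * H
        history.append(theta)
    return history

# Simulate data with a true theta of 0.5 (a degraded actuator).
rng = np.random.default_rng(0)
dt, n, true_theta = 0.01, 500, 0.5
xs, us = [1.0], rng.uniform(-1, 1, n)
for k in range(n):
    xs.append(xs[-1] + dt * (-true_theta * xs[-1] + us[k])
              + rng.normal(0, 0.01))
print(ekf_theta(xs, us, dt)[-1])  # estimate should approach 0.5
```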
5G, the fifth generation of mobile communication networks, is considered one of the main IoT enablers. Connecting billions of things, 5G/IoT will be dealing with trillions of gigabytes of data. Securing such large amounts of data is a very challenging task. Collected data vary from simple temperature measurements to more critical transaction data. Thus, applying uniform security measures is a waste of resources (processing, memory, and network bandwidth). Alternatively, a multi-level security model needs to be applied according to the varying requirements. In this paper, we present the Bell-LaPadula (BLP) multi-level security model, originally applied in the information security domain. We review its application in the network domain and propose a modified version of BLP for the 5G/IoT case. The proposed model is proven to be secure and compliant with the model rules.
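The two classic BLP rules are easy to state in code; the security labels below are illustrative stand-ins for 5G/IoT data classes, and this is the textbook model, not the paper's modified version.

```python
LEVELS = {"public": 0, "internal": 1, "critical": 2}

def can_read(subject_level: str, object_level: str) -> bool:
    # Simple security property: no read up.
    return LEVELS[subject_level] >= LEVELS[object_level]

def can_write(subject_level: str, object_level: str) -> bool:
    # *-property: no write down (prevents leaking downward).
    return LEVELS[subject_level] <= LEVELS[object_level]

assert can_read("critical", "public")        # high clearance may read low
assert not can_write("critical", "public")   # ...but may not write low
```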
Users search for multimedia content with different underlying motivations or intentions. The study of user search intentions is an emerging topic in information retrieval, since understanding why a user is searching for content is crucial for satisfying the user's need. In this paper, we aim at automatically recognizing a user's intent for image search in the early stage of a search session. We designed seven different search scenarios under the intent conditions of finding items, re-finding items, and entertainment. We collected facial expressions, physiological responses, eye gaze, and implicit user interactions from 51 participants who performed seven different search tasks on a custom-built image retrieval platform. We analyzed the users' spontaneous and explicit reactions under different intent conditions. Finally, we trained machine learning models to predict users' search intentions from the visual content of the visited images, the user interactions, and the spontaneous responses. After fusing the visual and user interaction features, our system achieved an F1 score of 0.722 for classifying the three classes in a user-independent cross-validation. We found that eye gaze and implicit user interactions, including mouse movements and keystrokes, are the most informative features. Given that the most promising results are obtained from modalities that can be captured unobtrusively and online, the results demonstrate the feasibility of deploying such methods to improve multimedia retrieval platforms.
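A schematic of feature-level fusion with user-independent evaluation might look like the following; all arrays are random placeholders shaped after the paper's setup (51 participants, 7 tasks), and the feature dimensions and classifier are our assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X_visual = rng.random((357, 128))      # hypothetical image embeddings
X_interact = rng.random((357, 16))     # hypothetical mouse/keystroke stats
X = np.hstack([X_visual, X_interact])  # feature-level fusion
y = rng.integers(0, 3, 357)            # find / re-find / entertainment
groups = np.repeat(np.arange(51), 7)   # 7 sessions per participant

# GroupKFold keeps each user's sessions in a single fold, so the
# evaluation is user-independent as in the paper.
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=GroupKFold(n_splits=5),
                         groups=groups, scoring="f1_macro")
print(scores.mean())
```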
A method for multiple-fault diagnosis in linear analog circuits is presented in this paper. The proposed approach is based upon the indirect compensation theorem. This theorem reduces the fault-diagnosis procedure in an analog circuit to a symbolic analysis process. An extension of the indirect compensation theorem to linear subcircuits is proposed. Indirect compensation equivalently replaces an n-port subcircuit by n norators and n fixators of voltages and currents. The proposed multiple-fault diagnosis technique can be used to evaluate any kind of terminal characteristic of a two-port network. For the calculation of circuit determinant expressions, the Generalized Parameter Extraction Method is implemented. The main advantage of this analysis method is that it is cancellation-free: it requires neither a matrix nor an ordinary graph description of the circuit. The process of symbolic circuit analysis is automated by the freeware computer program Cirsym, which can be used online. Experimental results are presented to show the efficiency and reliability of the proposed technique.