Bibliography
JavaScript (JS) is frequently used to add functionality and enhance the usability of web applications. Despite its many advantages, however, many recent cyberattacks, such as drive-by-download attacks, exploit vulnerabilities in JS code. Malicious JS code is generally hard to detect because it stealthily exploits vulnerabilities in browsers and plugin software and attacks visitors to a website without their knowledge. To protect users from such threats, an accurate detection system for malicious JS is urgently needed. Conventional approaches often employ signature- and heuristic-based methods, which are prone to zero-day attacks and therefore cause many false negatives and/or false positives. To address this problem, this paper adopts Doc2Vec, a machine-learning approach to feature learning based on a neural network model that can learn contextual information from texts. The extracted features are given to a classifier model (e.g., an SVM or a neural network), which judges the maliciousness of a JS code sample. In the performance evaluation, we use the D3M (Drive-by-Download Data by Marionette) dataset for malicious JS code and JSUPACK for benign code, for both training and testing. We then compare the performance with other feature-learning methods. Our experimental results show that the proposed Doc2Vec features provide better accuracy and faster classification in malicious JS code detection than conventional approaches.
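As a rough illustration of this pipeline, the sketch below pairs gensim's Doc2Vec with a scikit-learn SVM. The tokenized JS snippets and labels are invented placeholders; the paper's tokenizer and the D3M/JSUPACK data are not reproduced here.

```python
# Minimal sketch: Doc2Vec features feeding an SVM classifier, assuming gensim
# and scikit-learn; the token lists and labels are synthetic placeholders.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.svm import SVC

# Hypothetical pre-tokenized JS samples (the paper's tokenizer is not specified).
samples = [(["var", "x", "=", "eval", "(", "s", ")"], 1),        # 1 = malicious
           (["function", "add", "(", "a", ",", "b", ")"], 0)]    # 0 = benign

corpus = [TaggedDocument(words=toks, tags=[i]) for i, (toks, _) in enumerate(samples)]
model = Doc2Vec(vector_size=100, window=5, min_count=1, epochs=40)
model.build_vocab(corpus)
model.train(corpus, total_examples=model.corpus_count, epochs=model.epochs)

# Infer a fixed-length feature vector per script and train the classifier.
X = [model.infer_vector(toks) for toks, _ in samples]
y = [label for _, label in samples]
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict([model.infer_vector(["eval", "(", "unescape", "(", "p", ")"])]))
```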
5G, the fifth generation of mobile communication networks, is considered one of the main IoT enablers. Connecting billions of things, 5G/IoT will deal with trillions of gigabytes of data. Securing such large amounts of data is a very challenging task. Collected data varies from simple temperature measurements to more critical transaction data, so applying uniform security measures wastes resources (processing, memory, and network bandwidth). Instead, a multi-level security model should be applied according to the varying requirements. In this paper, we present the Bell-LaPadula (BLP) multi-level security model, originally applied in the information security domain. We review its application in the network domain and propose a modified version of BLP for the 5G/IoT case. The proposed model is proven to be secure and compliant with the model rules.
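The two classic BLP rules the modified model must remain compliant with can be stated compactly. The sketch below encodes the no-read-up and no-write-down checks over an assumed three-level hierarchy; the level names and IoT examples are illustrative, not the paper's.

```python
# Minimal sketch of the classic BLP rules over ordered sensitivity levels;
# the level names and the IoT examples below are assumptions.
LEVELS = {"public": 0, "internal": 1, "critical": 2}

def can_read(subject_level: str, object_level: str) -> bool:
    # Simple-security property (no read up): read only at or below one's level.
    return LEVELS[subject_level] >= LEVELS[object_level]

def can_write(subject_level: str, object_level: str) -> bool:
    # *-property (no write down): write only at or above one's level.
    return LEVELS[subject_level] <= LEVELS[object_level]

# A temperature sensor cleared "public" must not read "critical" transaction
# data, but may write upward into an aggregation store classified "internal".
assert not can_read("public", "critical")
assert can_write("public", "internal")
```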
Research and practice show increasing interest in cyber-physical systems, which integrate computational and physical capabilities and can interact with humans. Since these systems can be classified as safety- and security-critical, the need for safety and security assurance and certification will grow. Moreover, these systems are typically characterized by fragmentation, interconnectedness, heterogeneity, short release cycles, a cross-organizational nature, and high interference between safety and security requirements. These properties, combined with the assurance of compliance with multiple standards, the effort of certification and re-certification, and the lack of an approach to model, document, and integrate safety and security requirements, represent a major challenge. To address this gap, we developed a domain-agnostic approach to model security and safety requirements in an integrated view, supporting certification processes during the design and run-time phases of cyber-physical systems.
Word embedding is one of the basic word representation methods in natural language processing; it maps a word into a dense real-valued vector space based on the hypothesis that words with similar contexts have similar meanings. Models such as NNLM, C&W, CBOW, and Skip-gram have been designed for learning word embeddings and are widely used in many NLP tasks. However, these models assume that a word has only one meaning, which is contrary to real language use. In this paper we propose a new word unit with multiple meanings and an algorithm to distinguish them by their context. This new unit can be embedded in most language models to obtain a series of efficient representations by learning variable embeddings. We evaluate a new model, MCBOW, which integrates CBOW with our word unit, on a word similarity evaluation task and several downstream experiments. The results indicate that our new model can learn different meanings of a word and achieves better results on other tasks.
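For reference, a plain CBOW baseline (the model MCBOW extends) can be trained with gensim as sketched below; the toy corpus is invented, and the paper's multi-sense word unit itself is not reproduced.

```python
# Minimal CBOW baseline sketch with gensim (sg=0 selects CBOW); the corpus is a
# toy example chosen to show why one vector per word is a limitation.
from gensim.models import Word2Vec

sentences = [["bank", "approved", "the", "loan"],
             ["river", "bank", "was", "flooded"]]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0, epochs=100)

# A single vector per word conflates both senses of "bank" -- exactly the
# limitation the paper's multi-sense word unit is designed to remove.
print(model.wv.most_similar("bank", topn=3))
```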
Provenance describes detailed information about the history of a piece of data, capturing the relationships among elements such as users, processes, jobs, and workflows that contribute to the existence of the data. Provenance is key to supporting many increasingly important data management functionalities, such as identifying the data sources, parameters, or assumptions behind a given result; auditing data usage; or understanding how inputs are transformed into outputs. Despite its importance, however, provenance support is largely underdeveloped in highly parallel architectures and systems. One major challenge is the demanding requirement of providing the provenance service in situ: the need to remain lightweight and always on often conflicts with the need to be transparent and to offer an accurate catalog of details about the applications and systems. To tackle this challenge, we introduce a lightweight provenance service, called LPS, for high-performance computing (HPC) systems. LPS leverages a kernel instrumentation mechanism to achieve transparency, and introduces representative execution and flexible granularity to capture comprehensive provenance with controllable overhead. Extensive evaluations and use cases have confirmed its efficiency and usability. We believe that LPS can be integrated into current and future HPC systems to support a variety of data management needs.
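A provenance catalog is essentially a lineage graph over files and processes. The sketch below shows one minimal, assumed record structure and a backward lineage walk from an output to its source files; LPS's actual kernel-level capture mechanism is not shown.

```python
# Minimal sketch of a provenance lineage structure; the record layout is an
# assumption, not LPS's internal format.
from dataclasses import dataclass, field

@dataclass
class ProvRecord:
    outputs: dict = field(default_factory=dict)   # file -> producing process
    inputs: dict = field(default_factory=dict)    # process -> files read

    def record(self, process: str, reads: list, writes: list):
        self.inputs.setdefault(process, []).extend(reads)
        for f in writes:
            self.outputs[f] = process

    def lineage(self, f: str) -> list:
        # Walk backwards from an output file to its ultimate input files.
        proc = self.outputs.get(f)
        if proc is None:
            return [f]
        return [src for r in self.inputs.get(proc, []) for src in self.lineage(r)]

prov = ProvRecord()
prov.record("preprocess", reads=["raw.dat"], writes=["clean.dat"])
prov.record("simulate", reads=["clean.dat"], writes=["result.out"])
print(prov.lineage("result.out"))  # ['raw.dat']
```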
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem poses a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with the metadata of category information within cQA pages for question retrieval, using two novel category-powered models: a basic category-powered model called MB-NET, and an enhanced category-powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments, whose results show that promising and significant performance improvements can be achieved.
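The Fisher kernel step can be illustrated with a standard Fisher-vector encoding: a variable-size set of word vectors becomes a fixed-length gradient vector under a diagonal-covariance GMM. The sketch below uses scikit-learn and synthetic vectors; the paper's exact formulation and normalization may differ.

```python
# Minimal Fisher-vector sketch, assuming numpy and scikit-learn; the word
# vectors here are random stand-ins for trained embeddings.
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(X, gmm):
    T = X.shape[0]
    g = gmm.predict_proba(X)                      # posteriors, shape (T, K)
    mu, sig, w = gmm.means_, np.sqrt(gmm.covariances_), gmm.weights_
    d = (X[:, None, :] - mu) / sig                # normalized deviations (T, K, D)
    # Gradients w.r.t. means and standard deviations, one block per component.
    g_mu = (g[:, :, None] * d).sum(0) / (T * np.sqrt(w)[:, None])
    g_sig = (g[:, :, None] * (d**2 - 1)).sum(0) / (T * np.sqrt(2 * w)[:, None])
    return np.hstack([g_mu.ravel(), g_sig.ravel()])

rng = np.random.default_rng(0)
word_vecs = rng.normal(size=(500, 20))            # stand-in for trained embeddings
gmm = GaussianMixture(n_components=4, covariance_type="diag").fit(word_vecs)
question = rng.normal(size=(7, 20))               # 7 words -> fixed-length vector
print(fisher_vector(question, gmm).shape)         # (160,) regardless of length
```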
At its core, security is a highly contextual and dynamic challenge. However, current security policy approaches are usually static and slow to adapt to ever-changing requirements, let alone to catch up with reality. A 2012 Sophos survey stated that a unique piece of malware is created every half second. This gives a glimpse of the unsustainable nature of a global problem; any improvement in closing the "time window to adapt" would be a significant step forward. To exacerbate the situation, a simple change in the threat and attack vector, or even the adoption of the so-called "bring-your-own-device" paradigm, greatly changes how frequently security requirements, and the solutions required for each new context, must change. Current security policies also typically overlook the direct and indirect costs of policy implementation; as a result, technical teams often cannot justify the budget to management from a business-risk viewpoint. This paper considers both the adaptive and cost-benefit aspects of security and introduces a novel context-aware technique for designing and implementing adaptive, optimized security policies. Our approach leverages stochastic programming models to optimize security policy planning, and our preliminary results demonstrate a promising step toward proactive, context-aware security policies.
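As a sketch of what stochastic programming buys here, the deterministic equivalent of a small two-stage policy-planning program is shown below using PuLP; the controls, threat scenarios, probabilities, and costs are all invented for illustration.

```python
# Minimal two-stage stochastic policy-planning sketch (deterministic equivalent),
# assuming PuLP; every number below is an invented example.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

controls = {"mfa": 30, "dlp": 50, "edr": 40}          # implementation cost
scenarios = {                                          # probability, breach cost
    "phishing": (0.5, 200, ["mfa"]),                   # if unmitigated, and the
    "byod_leak": (0.3, 150, ["dlp"]),                  # controls that mitigate it
    "malware": (0.2, 300, ["edr", "dlp"]),
}

x = {c: LpVariable(f"deploy_{c}", cat=LpBinary) for c in controls}
y = {s: LpVariable(f"breach_{s}", lowBound=0) for s in scenarios}

prob = LpProblem("policy_plan", LpMinimize)
# Objective: deployment cost plus expected breach cost across scenarios.
prob += lpSum(controls[c] * x[c] for c in controls) + \
        lpSum(p * cost * y[s] for s, (p, cost, _) in scenarios.items())
for s, (_, _, mitigators) in scenarios.items():
    # Breach impact remains unless at least one mitigating control is deployed.
    prob += y[s] >= 1 - lpSum(x[c] for c in mitigators)
prob.solve()
print({c: int(x[c].value()) for c in controls})
```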
This paper presents a contextual anomaly detection method and its use in discovering malicious voltage control actions in the low-voltage distribution grid. The model-based anomaly detection uses an artificial neural network to model a distributed energy resource's behaviour under control. An intrusion detection system observes the distributed energy resource's behaviour, the control actions, and their power system impact, and is tested against an ongoing voltage control attack in a co-simulation set-up. Simulation results obtained with data from a real rooftop photovoltaic power plant show that contextual anomaly detection performs on average 55% better at control detection, and over 56% better at malicious control detection, than point anomaly detection.
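The model-based idea can be sketched as follows: a neural network learns the resource's expected behaviour from its context, and an observation is flagged when its residual exceeds a threshold. The data below is synthetic, not the paper's photovoltaic measurements.

```python
# Minimal model-based contextual anomaly detection sketch, assuming scikit-learn;
# the synthetic data stands in for real DER measurements.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
irradiance = rng.uniform(0, 1000, 2000)
setpoint = rng.uniform(0.5, 1.0, 2000)               # voltage-control curtailment
power = 0.005 * irradiance * setpoint + rng.normal(0, 0.1, 2000)

X = np.column_stack([irradiance, setpoint])
model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000).fit(X, power)

threshold = 3 * np.std(power - model.predict(X))     # residual-based threshold
# A suspicious control action: full setpoint reported, yet output is curtailed.
obs_context, obs_power = np.array([[800.0, 1.0]]), 1.0
residual = abs(obs_power - model.predict(obs_context)[0])
print("anomaly" if residual > threshold else "normal")
```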
Novice programmers exhibit a repertoire of affective states over time when learning computer programming. Modeling frustration is important because it signals the need for pedagogical intervention before the student loses confidence and interest in learning. In this paper, contextual and keystroke features of students within a Java tutoring system are used to detect frustration during a programming exercise session. Compared with the psychological sensors used in other studies, contextual and keystroke logs are less obtrusive, and the equipment involved (a keyboard) is ubiquitous in most learning environments. Logistic regression with lasso regularization is used for the modeling to prevent over-fitting. The results show that a model using only contextual and keystroke features achieved a prediction accuracy of 0.67 and a recall of 0.833. We therefore conclude that it is possible to detect a student's frustration with adequate accuracy by distilling the contextual and keystroke logs within the tutoring system.
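A minimal sketch of the modeling step, assuming scikit-learn: the l1 penalty is the lasso term that drives uninformative feature weights to zero. The feature names and data are invented, not the paper's tutoring logs.

```python
# Minimal lasso-regularized logistic regression sketch; synthetic data stands in
# for the paper's contextual and keystroke features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Hypothetical per-session features: compile errors, backspace rate, idle time.
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, 200) > 0).astype(int)

# penalty="l1" is the lasso term; it shrinks irrelevant weights to exactly zero,
# which guards against over-fitting on small behavioral datasets.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print(clf.coef_)              # sparse coefficients: weak features drop out
print(clf.predict_proba(X[:1]))
```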
Keystroke dynamics analysis has been applied successfully to the verification of passwords and other fixed short texts as a means of reducing their inherent security limitations, because their length and the fact that they are typed often make their characteristic timings fairly stable. Free text analysis, on the other hand, was neglected until recent years due to the inherent difficulty of dealing with short-term behavioral noise and long-term effects on typing rhythm. In this paper we examine finite context modeling of keystroke dynamics in free text and report promising results for user verification over an extensive data set, collected from a real-world environment outside the laboratory, that we make publicly available.
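One simple form of finite context modeling keeps per-user timing statistics for each key digraph (a context of length one) and scores a new sample by its deviation. The sketch below illustrates this idea with invented timings and does not reproduce the paper's exact model.

```python
# Minimal digraph-timing sketch of finite-context keystroke verification;
# the timings are invented, and the scoring rule is an assumption.
import statistics
from collections import defaultdict

def build_profile(events):
    """events: list of (key, press_time_ms) from enrollment free text."""
    latencies = defaultdict(list)
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        latencies[(k1, k2)].append(t2 - t1)
    return {dg: (statistics.mean(v), statistics.stdev(v))
            for dg, v in latencies.items() if len(v) > 1}

def score(profile, events):
    zs = []
    for (k1, t1), (k2, t2) in zip(events, events[1:]):
        if (k1, k2) in profile:
            mu, sd = profile[(k1, k2)]
            zs.append(abs((t2 - t1) - mu) / sd if sd else 0.0)
    return statistics.mean(zs) if zs else float("inf")  # lower = more user-like

enroll = [("t", 0), ("h", 90), ("e", 160), ("t", 400), ("h", 495), ("e", 570)]
probe = [("t", 0), ("h", 92), ("e", 165)]
print(score(build_profile(enroll), probe))
```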
Despite all the current controversies, email remains a successful service, and the ease of use of its various features has contributed to its widespread adoption. In general, the email system provides the same set of features for all its users, controlled by a single monolithic policy. Such solutions are efficient but limited, because they leave no place for the concept of usage, which denotes a user's intention of communication: private, professional, administrative, official, military. The ability to send emails efficiently from mobile devices creates interesting new opportunities. We argue that the context of the email sender (location, time, device, operating system, access network...) is a new dimension that must be taken into account to complete the picture. Context is clearly orthogonal to usage, because the same usage may require different features depending on the context. Clearly, no global policy can meet the requirements of all possible usages and contexts. To address this problem, we propose a correspondence model which, for a given usage and context, derives a correspondence type encapsulating the exact set of required features. With this model, it becomes possible to define an advanced email system that can cope with multiple policies instead of a single monolithic one. By allowing a user to select the exact policy matching her needs, we argue that our approach reduces the risk of the email system sliding from a trusted one to a merely confident one.
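The core of the correspondence model can be sketched as a mapping from (usage, context) pairs to feature sets; the usages, contexts, and features below are invented examples, not the paper's actual policy catalog.

```python
# Minimal sketch of the correspondence idea: (usage, context) -> feature set.
# All names below are illustrative assumptions.
POLICY = {
    ("professional", "corporate_device"): {"signing", "archiving", "attachments"},
    ("professional", "personal_mobile"):  {"signing", "archiving"},
    ("private", "personal_mobile"):       {"attachments"},
}

def correspondence_type(usage: str, context: str) -> set:
    # Fall back to an empty (most restrictive) feature set for unknown pairs.
    return POLICY.get((usage, context), set())

# The same usage yields different features depending on the sender's context.
print(correspondence_type("professional", "corporate_device"))
print(correspondence_type("professional", "personal_mobile"))
```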
Mobile social networks (MSNs) facilitate connections between mobile users, allowing them to find other potential users with similar interests through mobile devices, communicate with them, and benefit from their information. Because MSNs are distributed public virtual social spaces, the available information may not be trustworthy to all. Mobile users are therefore often at risk, since they may have no prior knowledge about others to whom they are socially connected. To address this problem, trust inference plays a critical role in establishing social links between mobile users in MSNs. Given the non-semantic representation of trust between users in existing trust models for social networks, this paper proposes a new fuzzy inference mechanism, MobiFuzzyTrust, for inferring trust semantically from one mobile user to another who may not be directly connected in the trust graph of an MSN. First, a mobile context is constructed, comprising an intersection of user prestige, location, time, and social context. Second, a mobile context-aware trust model is devised to evaluate the trust value between two mobile users efficiently. Finally, a fuzzy linguistic technique is used to express the trust between two mobile users and enhance human understanding of trust. A real-world mobile dataset is adopted to evaluate the performance of the MobiFuzzyTrust inference mechanism. The experimental results demonstrate that MobiFuzzyTrust can efficiently infer trust with high precision.
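The fuzzy linguistic step can be sketched with triangular membership functions that map a numeric trust value onto linguistic terms; the membership parameters below are assumptions, and MobiFuzzyTrust's full inference chain is not reproduced.

```python
# Minimal fuzzy-linguistic trust sketch; the term set and triangle parameters
# are assumptions, not the paper's calibrated values.
def triangular(x, a, b, c):
    # Membership rises from a to peak b, then falls back to zero at c.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

TERMS = {"low": (-0.01, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.01)}

def linguistic_trust(value: float) -> dict:
    return {term: round(triangular(value, *abc), 2) for term, abc in TERMS.items()}

# A computed trust of 0.72 reads as mostly "high" with some "medium" membership.
print(linguistic_trust(0.72))  # {'low': 0.0, 'medium': 0.56, 'high': 0.44}
```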