Biblio
The exponential growth of IoT-type systems has led to a reconsideration of the field of database management systems in terms of storing and handling high-volume data. Recently, many real-time Database Management Systems(DBMS) have been developed to address issues such as security, managing concurrent access to stored data, and optimizing data query performance. This paper studies methods that allow to reduce the temporal validity range for common DBMS. The primary purpose of IoT edge devices is to generate data and make it available for machine learning or statistical algorithms. This is achieved inside the Knowledge Discovery in Databases process. In order to visualize and obtain critical Data Mining results, all the device-generated data must be made available as fast as possible for selection, preprocessing and data transformation. In this research we investigate if IoT edge devices can be used with common DBMS proper configured in order to access data fast instead of working with Real Time DBMS. We will study what kind of transactions are needed in large IoT ecosystems and we will analyze the techniques of controlling concurrent access to common resources (stored data). For this purpose, we built a series of applications that are able to simulate concurrent writing operations to a common DBMS in order to investigate the performance of concurrent access to database resources. Another important procedure that will be tested with the developed applications will be to increase the availability of data for users and data mining applications. This will be achieved by using field indexing.
Security patterns are proven solutions to recurring problems in software development. The growing importance of secure software development has introduced diverse research efforts on security patterns that mostly focused on classification schemes, evolution and evaluation of the patterns. Despite a huge mature history of research and popularity among researchers, security patterns have not fully penetrated software development practices. Besides, software security education has not been benefited by these patterns though a commonly stated motivation is the dissemination of expert knowledge and experience. This is because the patterns lack a simple embodiment to help students learn about vulnerable code, and to guide new developers on secure coding. In order to address this problem, we propose to conduct intelligent data mining in the context of software engineering to discover learner-friendly software security patterns. Our proposed model entails knowledge discovery from large scale published real-world vulnerability histories in software applications. We harness association rule mining for frequent pattern discovery to mine easily comprehensible and explainable learner-friendly rules, mainly of the type "flaw implies fix" and "attack type implies flaw", so as to enhance training in secure coding which in turn would augment secure software development. We propose to build a learner-friendly intelligent tutoring system (ITS) based on the newly discovered security patterns and rules explored. We present our proposed model based on association rule mining in secure software development with the goal of building this ITS. Our proposed model and prototype experiments are discussed in this paper along with challenges and ongoing work.
The answer selection task is one of the most important issues within the automatic question answering system, and it aims to automatically find accurate answers to questions. Traditional methods for this task use manually generated features based on tf-idf and n-gram models to represent texts, and then select the right answers according to the similarity between the representations of questions and the candidate answers. Nowadays, many question answering systems adopt deep neural networks such as convolutional neural network (CNN) to generate the text features automatically, and obtained better performance than traditional methods. CNN can extract consecutive n-gram features with fixed length by sliding fixed-length convolutional kernels over the whole word sequence. However, due to the complex semantic compositionality of the natural language, there are many phrases with variable lengths and be composed of non-consecutive words in natural language, such as these phrases whose constituents are separated by other words within the same sentences. But the traditional CNN is unable to extract the variable length n-gram features and non-consecutive n-gram features. In this paper, we propose a multi-scale deformable convolutional neural network to capture the non-consecutive n-gram features by adding offset to the convolutional kernel, and also propose to stack multiple deformable convolutional layers to mine multi-scale n-gram features by the means of generating longer n-gram in higher layer. Furthermore, we apply the proposed model into the task of answer selection. Experimental results on public dataset demonstrate the effectiveness of our proposed model in answer selection.
Corpora used to learn open-domain Question-Answering (QA) models are typically collected from a wide variety of topics or domains. Since QA requires understanding natural language, open-domain QA models generally need very large training corpora. A simple way to alleviate data demand is to restrict the domain covered by the QA model, leading thus to domain-specific QA models. While learning improved QA models for a specific domain is still challenging due to the lack of sufficient training data in the topic of interest, additional training data can be obtained from related topic domains. Thus, instead of learning a single open-domain QA model, we investigate domain adaptation approaches in order to create multiple improved domain-specific QA models. We demonstrate that this can be achieved by stratifying the source dataset, without the need of searching for complementary data unlike many other domain adaptation approaches. We propose a deep architecture that jointly exploits convolutional and recurrent networks for learning domain-specific features while transferring domain-shared features. That is, we use transferable features to enable model adaptation from multiple source domains. We consider different transference approaches designed to learn span-level and sentence-level QA models. We found that domain-adaptation greatly improves sentence-level QA performance, and span-level QA benefits from sentence information. Finally, we also show that a simple clustering algorithm may be employed when the topic domains are unknown and the resulting loss in accuracy is negligible.
To enhance privacy protection and improve data availability, a differential privacy data protection method ICMD-DP is proposed. Based on insensitive clustering algorithm, ICMD-DP performs differential privacy on the results of ICMD (insensitive clustering method for mixed data). The combination of clustering and differential privacy realizes the differentiation of query sensitivity from single record to group record. At the meanwhile, it reduces the risk of information loss and information disclosure. In addition, to satisfy the requirement of maintaining differential privacy for mixed data, ICMD-DP uses different methods to calculate the distance and centroid of categorical and numerical attributes. Finally, experiments are given to illustrate the availability of the method.
In cyberspace, unknown zero-day attacks can bring safety hazards. Traditional defense methods based on signatures are ineffective. Based on the Cyberspace Mimic Defense (CMD) architecture, the paper proposes a framework to detect the attacks and respond to them. Inputs are assigned to all online redundant heterogeneous functionally equivalent modules. Their independent outputs are compared and the outputs in the majority will be the final response. The abnormal outputs can be detected and so can the attack. The damaged executive modules with abnormal outputs will be replaced with new ones from the diverse executive module pool. By analyzing the abnormal outputs, the correspondence between inputs and abnormal outputs can be built and inputs leading to recurrent abnormal outputs will be written into the zero-day attack related database and their reuses cannot work any longer, as the suspicious malicious inputs can be detected and processed. Further responses include IP blacklisting and patching, etc. The framework also uses honeypot like executive module to confuse the attacker. The proposed method can prevent the recurrent attack based on the same exploit.
With the development of Software Defined Networking, its software programmability and openness brings new idea for network security. Therefore, many Software Defined Security Architectures emerged at the right moment. Software Defined Security decouples security control plane and security data plane. In Software Defined Security Architectures, underlying security devices are abstracted as security resources in resource pool, intellectualized and automated security business management and orchestration can be realized through software programming in security control plane. However, network management has been becoming extremely complicated due to expansible network scale, varying network devices, lack of abstraction and heterogeneity of network especially. Therefore, new-type open security devices are needed in SDS Architecture for unified management so that they can be conveniently abstracted as security resources in resource pool. This paper firstly analyses why open security devices are needed in SDS architecture and proposes a method of opening security devices. Considering this new architecture requires a new security scheduling mechanism, this paper proposes a security resource scheduling algorithm which is used for managing and scheduling security resources in resource pool according to user s security demand. The security resource scheduling algorithm aims to allocate a security protection task to a suitable security resource in resource pool so that improving security protection efficiency. In the algorithm, we use BP neural network to predict the execution time of security tasks to improve the performance of the algorithm. The simulation result shows that the algorithm has ideal performance. Finally, a usage scenario is given to illustrate the role of security resource scheduling in software defined security architecture.
With the development of modern logistics industry railway freight enterprises as the main traditional logistics enterprises, the service mode is facing many problems. In the era of big data, for railway freight enterprises, coordinated development and sharing of information resources have become the requirements of the times, while how to protect the privacy of citizens has become one of the focus issues of the public. To prevent the disclosure or abuse of the citizens' privacy information, the citizens' privacy needs to be preserved in the process of information opening and sharing. However, most of the existing privacy preserving models cannot to be used to resist attacks with continuously growing background knowledge. This paper presents the method of applying differential privacy to protect associated data, which can be shared in railway freight service association information. First, the original service data need to slice by optimal shard length, then differential method and apriori algorithm is used to add Laplace noise in the Candidate sets. Thus the citizen's privacy information can be protected even if the attacker gets strong background knowledge. Last, sharing associated data to railway information resource partners. The steps and usefulness of the discussed privacy preservation method is illustrated by an example.
Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other one is an enhanced category powered model called ME-NET which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the framework of fisher kernel to transform them into the fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further conduct our approaches on large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.
The identification of relevant information in large text databases is a challenging task. One of the reasons is human beings' limitations in handling large volumes of data. A common solution for scavenging data from texts are word clouds. A word cloud illustrates word usage in a document by resizing individual words in documents proportionally to how frequently they appear. Even though word clouds are easy to understand, they are not particularly efficient, because they are static. In addition, the presented information lacks context, i.e., words are not explained and they may lead to radically erroneous interpretations. To tackle these problems we developed VCloud, a tool that allows the user to interact with word clouds, therefore allowing informative and interactive data exploration. Furthermore, our tool also allows one to compare two data sets presented as word clouds. We evaluated VCloud using real data about the evolution of gastritis research through the years. The papers indexed by Pubmed related to this medical context were selected for visualization and data analysis using VCloud. A domain expert explored these visualizations, being able to extract useful information from it. This illustrates how can VCloud be a valuable tool for visual text analytics.
In this paper, an algorithm is proposed to automatically produce hierarchical graph-based representations of maritime shipping lanes extrapolated from historical vessel positioning data. Each shipping lane is generated based on the detection of the vessel behavioural changes and represented in a compact synthetic route composed of the network nodes and route segments. The outcome of the knowledge discovery process is a geographical maritime network that can be used in Maritime Situational Awareness (MSA) applications such as track reconstruction from missing information, situation/destination prediction, and detection of anomalous behaviour. Experimental results are presented, testing the algorithm in a specific scenario of interest, the Dover Strait.
Intrusion response is a new generation of technology basing on active defence idea, which has very prominent significance on the protection of network security. However, the existing automatic intrusion response systems are difficult to judge the real "danger" of invasion or attack. In this study, an immune-inspired adaptive automated intrusion response system model, named as AIAIM, was given. With the descriptions of self, non-self, memory detector, mature detector and immature detector of the network transactions, the real-time network danger evaluation equations of host and network are built up. Then, the automated response polices are taken or adjusted according to the real-time danger and attack intensity, which not only solve the problem that the current automated response system models could not detect the true intrusions or attack actions, but also greatly reduce the response times and response costs. Theory analysis and experimental results prove that AIAIM provides a positive and active network security method, which will help to overcome the limitations of traditional passive network security system.