Biblio
In today's information age, with the rapid development and wide application of communication and network technologies, more and more information is transmitted over networks, making information security and protection increasingly important; cryptographic theory and technology have thus become an important research field in information science and technology. In recent years, many researchers have found a close relationship between chaos and cryptography. Chaotic systems are extremely sensitive to initial conditions and can produce large numbers of sequences with good cryptographic properties, such as pseudo-randomness, low correlation, high complexity, and a wide spectrum, providing a new and effective means for data encryption. However, chaotic cryptography, as a new interdisciplinary field, is still at an early stage of development: although many chaotic encryption schemes have been proposed, the methods of chaotic cryptography are not yet fully mature. Against this background, this research discusses key problems such as the chaotic maps used in chaotic cipher systems, chaotic sequence ciphers, and chaotic random number generators used for key generation. Because one-dimensional chaotic encryption algorithms suffer from a small key space and low security, this paper selects a two-dimensional hyperchaotic system generated by coupled logistic maps as the research object. The research focuses on the application of hyperchaotic sequences to data encryption, makes some beneficial attempts in chaotic data encryption algorithms, and at the same time explores further applications of chaos in data encryption.
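To make the construction concrete, here is a minimal Python sketch of a stream cipher driven by two symmetrically coupled logistic maps; the coupling constant eps, the parameter r, the seeds, and the byte-extraction rule are illustrative assumptions, not the scheme proposed in the paper.

```python
def coupled_logistic_keystream(x0, y0, n, r=3.99, eps=0.1):
    """Generate n keystream bytes from two symmetrically coupled
    logistic maps (an illustrative 2-D hyperchaotic-style system)."""
    x, y = x0, y0
    out = bytearray()
    for _ in range(n):
        fx, fy = r * x * (1 - x), r * y * (1 - y)
        # symmetric linear coupling of the two maps
        x = (1 - eps) * fx + eps * fy
        y = (1 - eps) * fy + eps * fx
        out.append(int(x * 256 ** 2) % 256)  # extract one byte per step
    return bytes(out)

def xor_encrypt(data: bytes, key=(0.123456, 0.654321)) -> bytes:
    ks = coupled_logistic_keystream(key[0], key[1], len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

msg = b"chaotic stream cipher demo"
ct = xor_encrypt(msg)
assert xor_encrypt(ct) == msg  # an XOR keystream cipher is its own inverse
```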
Online controlled experiments (e.g., A/B tests) are now regularly used to guide product development and accelerate innovation in software. Product ideas are evaluated as scientific hypotheses and tested on websites, mobile applications, desktop applications, services, and operating systems. One of the key challenges for organizations that run controlled experiments is to come up with the right set of metrics [1][2][3]. Having good metrics, however, is not enough. In our experience of running thousands of experiments with many teams across Microsoft, we observed again and again how incorrect interpretations of metric movements can lead to wrong conclusions about an experiment's outcome, which, if deployed, could hurt the business by millions of dollars. Inspired by Steven Goodman's twelve p-value misconceptions [4], in this paper we share twelve common metric interpretation pitfalls that we observed repeatedly in our experiments. We illustrate each pitfall with a puzzling example from a real experiment, and describe processes, metric design principles, and guidelines that can be used to detect and avoid the pitfall. With this paper, we aim to increase experimenters' awareness of metric interpretation issues, leading to improved quality and trustworthiness of experiment results and better data-driven decisions.
Fuzzy density is an important component of the fuzzy integral; it describes the reliability of each classifier in the fusion process. Most fuzzy density assignment methods are based on prior knowledge from classifier training and ignore differences among the test samples themselves. To better describe the real-time reliability of a classifier during fusion, the dispersion of the classifier is calculated from the decision information output by the classifier, and the divisibility of the classifier is then obtained from the information entropy of this dispersion. Finally, the divisibility and the prior knowledge are combined to obtain a fuzzy density that can be adjusted dynamically. Experiments on the JAFFE and CK databases show that, compared with traditional fuzzy integral methods, the proposed method effectively improves the decision performance of the fuzzy integral and reduces the interference of unreliable output information in decision making, making it an effective multi-classifier fusion method.
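A minimal sketch of the entropy idea, assuming the classifier emits a score vector per test sample; the blending rule and the weight alpha are illustrative assumptions rather than the paper's exact formula.

```python
import numpy as np

def dynamic_fuzzy_density(posterior, prior_accuracy, alpha=0.5):
    """Illustrative dynamic fuzzy density: blend a classifier's static
    (training-set) reliability with a per-sample confidence derived from
    the Shannon entropy of its output distribution."""
    p = np.asarray(posterior, dtype=float)
    p = p / p.sum()                          # normalise the decision output
    h = -np.sum(p * np.log(p + 1e-12))       # entropy of the dispersion
    divisibility = 1.0 - h / np.log(len(p))  # 1 = confident, 0 = uniform
    return alpha * prior_accuracy + (1 - alpha) * divisibility

# Confident output keeps the density near (or above) the prior;
# a near-uniform output pulls it down.
print(dynamic_fuzzy_density([0.9, 0.05, 0.05], prior_accuracy=0.8))
print(dynamic_fuzzy_density([0.4, 0.3, 0.3], prior_accuracy=0.8))
```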
Recently, emotion recognition has gained increasing attention in applications related to Social Signal Processing (SSP) and human affect. Existing research mainly focuses on six basic emotions (happiness, sadness, fear, disgust, anger, and surprise). However, humans express many kinds of emotions, including mixed emotions, which have not been explored due to their complexity. We model the recognition of 12 types of mixed emotions from facial expressions in image sequences using two-stage learning that combines Support Vector Machines (SVM) and Conditional Random Fields (CRF) as sequence classifiers. The SVM classifies each image frame and produces an emotion label, which then becomes the input to the CRF, which yields the mixed-emotion label of the corresponding observation sequence. We evaluate our proposed model on modified image frames of the Cohn-Kanade+ dataset and on our own mixed-emotion dataset. We also compare our model with the original CRF model, and our model shows superior performance.
Zero dynamics attack is lethal to cyber-physical systems in the sense that it is stealthy and there is no way to detect it. Fortunately, if the given continuous-time physical system is of minimum phase, the effect of the attack is negligible even if it is not detected. However, the situation becomes unfavorable again if one uses digital control by sampling the sensor measurement and using the zero-order hold for actuation, because of the `sampling zeros.' When the continuous-time system has relative degree greater than two and the sampling period is small, the sampled-data system must have unstable zeros (even if the continuous-time system is of minimum phase), so that the cyber-physical system becomes vulnerable to the `sampling zero dynamics attack.' In this paper, we begin by demonstrating this attack with a few examples. Then, we present an idea to protect the system by allocating those discrete-time zeros into stable ones. This idea is realized by employing the so-called `generalized hold,' which replaces the zero-order hold.
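The sampling-zero phenomenon is easy to check numerically. The following sketch, assuming the python-control package, discretises a minimum-phase plant of relative degree three with a zero-order hold and inspects the resulting discrete-time zeros; the plant and sampling periods are illustrative.

```python
import numpy as np
import control as ct

# Minimum-phase continuous-time plant of relative degree 3
# (it has no finite zeros at all): G(s) = 1 / s^3.
G = ct.tf([1.0], [1.0, 0.0, 0.0, 0.0])

for Ts in (0.1, 0.01):
    Gd = ct.c2d(G, Ts, method='zoh')   # zero-order-hold sampling
    print(Ts, np.roots(Gd.num[0][0]))  # discrete-time (sampling) zeros
# For this integrator chain the sampling zeros are the roots of the
# Euler-Frobenius polynomial z^2 + 4z + 1, i.e. -2 +/- sqrt(3)
# ~ {-3.732, -0.268}; the zero at -3.732 lies outside the unit
# circle, hence the sampled-data system is non-minimum phase.
```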
Network coding is a promising method that numerous investigators have advanced due to its significant advantages in enhancing the efficiency of data communication. In this work, we use simulations to assess the performance of various network topologies employing network coding. By contrasting results with and without network coding, we show that network coding can improve throughput, end-to-end delay, Packet Delivery Rate (PDR), and consistency. This paper presents a comparative performance analysis of network coding techniques, namely XOR, LNC, and RLNC. The results demonstrate that the XOR technique has attractive outcomes and can substantially improve real-time performance metrics, i.e., throughput, end-to-end delay, and PDR. The analysis has been carried out based on packet size as well as the number of packets to be transmitted. The results also illustrate that network coding facilitates independence between networks.
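For reference, the XOR technique compared here reduces, in its simplest form, to the classic butterfly-network trick of broadcasting the XOR of two packets instead of forwarding both. A self-contained sketch (of the coding primitive, not the simulated topologies):

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Two sources send p1 and p2; the relay broadcasts the single coded
# packet p1 XOR p2 instead of forwarding both, saving one transmission.
p1 = b"packet-from-node-A"
p2 = b"packet-from-node-B"
coded = xor_bytes(p1, p2)

# A receiver that already holds p1 decodes p2, and vice versa.
assert xor_bytes(coded, p1) == p2
assert xor_bytes(coded, p2) == p1
```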
Bitcoin, a peer-to-peer payment system and digital currency, is often involved in illicit activities such as scamming, ransomware attacks, illegal goods trading, and thievery. At the time of writing, the Bitcoin ecosystem has not yet been mapped, and as such there is no estimate of the share of illicit activities. This paper provides the first estimation of the portion of cyber-criminal entities in the Bitcoin ecosystem. Our dataset consists of 854 observations categorised into 12 classes (of which 5 are cybercrime-related) and a total of 100,000 uncategorised observations. The dataset was obtained from a data provider who applied three types of clustering of Bitcoin transactions to categorise entities: co-spend, intelligence-based, and behaviour-based. Thirteen supervised learning classifiers were then tested, of which four prevailed with cross-validation accuracies of 77.38%, 76.47%, 78.46%, and 80.76%, respectively. From the top four classifiers, the Bagging and Gradient Boosting classifiers were selected based on their weighted average and per-class precision on the cybercrime-related categories. Both models were used to classify the 100,000 uncategorised entities, showing that the share of cybercrime-related entities is 29.81% according to Bagging and 10.95% according to Gradient Boosting, with the number of entities as the metric. With regard to the number of addresses and current coins held by these entities, the results are 5.79% and 10.02% according to Bagging, and 3.16% and 1.45% according to Gradient Boosting.
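A hedged sketch of the classification step using scikit-learn; the feature matrix below is synthetic stand-in data, since the provider's dataset is not public, and the hyperparameters are library defaults rather than those tuned in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for the labelled Bitcoin-entity feature matrix:
# 854 observations across 12 entity classes.
X, y = make_classification(n_samples=854, n_features=20, n_informative=12,
                           n_classes=12, n_clusters_per_class=1,
                           random_state=0)

for clf in (BaggingClassifier(random_state=0),
            GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(clf, X, y, cv=5)  # cross-validation accuracy
    print(type(clf).__name__, round(scores.mean(), 4))
```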
Security at the virtualization level has always been a major issue in cloud computing environments. The large number of virtual machines hosted on a single server by various customers/clients may face serious security threats due to internal or external network attacks. In this work, we have examined and evaluated these threats and their impact on an OpenStack private cloud. We have also discussed the most popular DoS (Denial-of-Service) attack on the DHCP server on this private cloud platform and evaluated the vulnerabilities in the OpenStack networking component, Neutron, through which this attack can be performed via a rogue DHCP server. Finally, a solution, a game-theory-based cloud architecture that helps to detect and prevent DoS attacks in OpenStack, is proposed.
In the age of Big Data, we are witnessing a huge proliferation of digital data capturing our lives and our surroundings. Data privacy is a critical barrier to data analytics, and privacy-preserving data disclosure becomes a key aspect of leveraging large-scale data analytics due to serious privacy risks. Traditional privacy-preserving data publishing solutions have focused on protecting individuals' private information while considering all aggregate information about individuals as safe for disclosure. This paper presents a new privacy-aware data disclosure scheme that considers group privacy requirements of individuals in bipartite association graph datasets (e.g., graphs that represent associations between entities such as customers and products bought from a pharmacy store), where even aggregate information about groups of individuals may be sensitive and need protection. We propose the notion of $\epsilon_g$-Group Differential Privacy, which protects sensitive information of groups of individuals at various defined group protection levels, enabling data users to obtain the level of information entitled to them. Based on this notion of group privacy, we develop a suite of differentially private mechanisms that protect group privacy in bipartite association graphs at different group privacy levels based on specialization hierarchies. We evaluate our proposed techniques through extensive experiments on three real-world association graph datasets, and our results demonstrate that the proposed techniques are effective, efficient, and provide the required guarantees on group privacy.
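At the core of any such mechanism is Laplace noise calibrated to the query sensitivity and a per-level privacy budget. A minimal sketch, where the group total, sensitivity, and epsilon values are illustrative assumptions, not the paper's mechanisms:

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float) -> float:
    """Release a noisy aggregate satisfying epsilon-differential privacy."""
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

# Aggregate over a *group* (e.g., total purchases of one product group);
# a stricter group protection level corresponds to a smaller epsilon.
group_total = 1283.0
print(laplace_mechanism(group_total, sensitivity=1.0, epsilon=0.1))  # strong
print(laplace_mechanism(group_total, sensitivity=1.0, epsilon=1.0))  # weaker
```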
Location privacy has become a significant challenge in big data. In particular, given the availability of big-data handling tools, huge amounts of location data can easily be managed and processed by an adversary to obtain private user information from Location-Based Services (LBS). So far, many methods have been proposed to preserve user location privacy for these services. Among them, dummy-based methods have various advantages in terms of implementation and low computation costs. However, they suffer from spatiotemporal correlation issues when users submit consecutive requests. To solve this problem, a practical hybrid location privacy protection scheme is presented in this paper. The proposed method filters out correlated fake location data (dummies) before submission, so that the adversary cannot identify the user's real location. Evaluations and experiments show that our proposed filtering technique significantly improves the performance of existing dummy-based methods and enables them to effectively protect the user's location privacy in a big data environment.
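One plausible form of such a filter, sketched under the assumption that correlation is judged by physical reachability between consecutive reports; the velocity bound vmax is an illustrative parameter, not the paper's criterion.

```python
import math

def plausible(prev, cand, dt, vmax=30.0):
    """Keep a dummy only if it is reachable from the previously reported
    position within time dt at a plausible maximum speed vmax (m/s)."""
    dist = math.hypot(cand[0] - prev[0], cand[1] - prev[1])
    return dist <= vmax * dt

def filter_dummies(prev_report, candidates, dt):
    return [c for c in candidates if plausible(prev_report, c, dt)]

prev = (0.0, 0.0)  # location sent with the previous LBS request
dummies = [(50.0, 40.0), (900.0, 10.0), (20.0, 25.0)]
print(filter_dummies(prev, dummies, dt=5.0))  # drops the unreachable dummy
```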
The increase in complexity and interconnectivity of Supervisory Control and Data Acquisition (SCADA) systems in recent years has exposed SCADA networks to numerous potential vulnerabilities. Several studies have shown that anomaly-based Intrusion Detection Systems (IDS) achieve improved performance in identifying unknown or zero-day attacks. In this paper, we propose a hybrid model for anomaly-based intrusion detection in SCADA networks using a machine learning approach. We first present a robust hybrid model for anomaly-based intrusion detection in SCADA networks. We then present a feature selection model for anomaly-based intrusion detection in SCADA networks that removes redundant and irrelevant features, since irrelevant features in the dataset can affect modeling power and reduce predictive accuracy. These models were evaluated using an industrial control system dataset developed at the Distributed Analytics and Security Institute, Mississippi State University, Starkville, MS, USA. The experimental results show that our proposed model markedly reduces time and computational complexity while achieving improved accuracy and detection rate. The accuracy of our proposed model was measured as 99.5% on the specific-attack-labeled dataset.
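As an illustration of the feature selection step, the sketch below uses a mutual-information filter from scikit-learn on synthetic stand-in data; the paper's own selection criterion and the SCADA dataset are not reproduced here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Stand-in for the SCADA dataset: 40 features, many redundant/irrelevant.
X, y = make_classification(n_samples=2000, n_features=40, n_informative=8,
                           n_redundant=10, random_state=0)

# The selector drops low-information features before the detector trains.
model = make_pipeline(SelectKBest(mutual_info_classif, k=8),
                      RandomForestClassifier(random_state=0))
print(cross_val_score(model, X, y, cv=5).mean())
```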
Collaborative Filtering (CF) is a successful technique that has been implemented in recommender systems, and Privacy-Preserving Collaborative Filtering (PPCF) has aroused increasing societal concern. Current solutions mainly focus on cryptographic, obfuscation, perturbation, and differential privacy methods, but these methods have shortcomings such as unnecessary computational cost, lower data quality, and difficulty in calibrating the magnitude of noise. This paper proposes a (k, p, l)-anonymity method that improves on the existing k-anonymity method in PPCF. The method works as follows. First, it applies a Latent Factor Model (LFM) to reduce matrix sparsity. Then it improves the Maximum Distance to Average Vector (MDAV) microaggregation algorithm with importance partitioning to increase homogeneity among the records in each group, which retains better data quality, and applies a (p, l)-diversity model, where p is the attacker's prior knowledge about users' ratings and l is the diversity among users in each group, to improve the level of privacy preservation. Theoretical and experimental analyses show that our approach ensures a higher level of privacy preservation with lower information loss.
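For reference, plain MDAV (without the importance-partitioning improvement proposed here) can be sketched in a few lines; the toy rating vectors are illustrative.

```python
import numpy as np

def mdav(records, k):
    """Plain MDAV microaggregation: partition records into groups of at
    least k similar records; each group is later replaced by its centroid
    so any record is indistinguishable within its group."""
    X = np.asarray(records, dtype=float)
    R = list(range(len(records)))
    groups = []
    while len(R) >= 2 * k:
        centroid = X[R].mean(axis=0)
        r = max(R, key=lambda i: np.linalg.norm(X[i] - centroid))  # furthest
        near_r = sorted(R, key=lambda i: np.linalg.norm(X[i] - X[r]))[:k]
        groups.append(near_r)
        R = [i for i in R if i not in near_r]
        s = max(R, key=lambda i: np.linalg.norm(X[i] - X[r]))  # furthest from r
        near_s = sorted(R, key=lambda i: np.linalg.norm(X[i] - X[s]))[:k]
        groups.append(near_s)
        R = [i for i in R if i not in near_s]
    if R:
        groups.append(R)  # leftover records (standard MDAV merges them
                          # into the nearest group if fewer than k remain)
    return groups

ratings = [[5, 1], [4, 1], [5, 2], [1, 5], [2, 4],
           [1, 4], [3, 3], [4, 2], [2, 5]]
print(mdav(ratings, k=3))
```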
In this paper, we address the problem of peer-grouping employees in an organization to identify security risks. Our motivation for studying peer grouping is its importance for a clear understanding of user and entity behavior analytics (UEBA), the primary tool for identifying insider threats through detecting anomalies in network traffic. We show that, using the Louvain method of community detection, it is possible to automate peer group creation with feature-based weight assignments. Depending on the number of employees and their features, we show that it is also possible to give each group a meaningful description. We present three new algorithms: one that allows the addition of new employees to already generated peer groups, another that allows for incorporating user feedback, and lastly one that provides the user with recommended nodes to be reassigned. We use Niara's data to validate our claims. The novelty of our method lies in its robustness, simplicity, scalability, and ease of deployment in a production environment.
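A minimal sketch of the core grouping step, assuming networkx 2.8 or later and a toy similarity-weighted employee graph; the names and edge weights are illustrative, not Niara's data.

```python
import networkx as nx

# Toy employee graph: edge weights encode feature-based similarity
# (e.g., shared servers, common working hours, same department).
G = nx.Graph()
G.add_weighted_edges_from([
    ("alice", "bob", 0.9), ("bob", "carol", 0.8), ("alice", "carol", 0.7),
    ("dave", "erin", 0.9), ("erin", "frank", 0.8), ("dave", "frank", 0.6),
    ("carol", "dave", 0.1),  # weak cross-team link
])

peer_groups = nx.community.louvain_communities(G, weight="weight", seed=0)
print(peer_groups)  # e.g., [{'alice','bob','carol'}, {'dave','erin','frank'}]
```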
Internet of Things (IoT) devices are resource-constrained in terms of power, memory, bandwidth, and processing. At the same time, multicast communication is more efficient than unicast in group-oriented applications, since transmission uses fewer resources. That is why many IoT applications rely on multicast for their transmissions. This multicast traffic needs to be secured, especially for critical applications involving actuator control. Securing multicast traffic is in itself cumbersome, as it requires an efficient and scalable Group Key Management (GKM) protocol; in the case of IoT, the situation is even more difficult because of the dynamic nature of IoT scenarios. This paper introduces a solution based on a context-aware security server accompanied by a group of key servers to efficiently distribute group encryption keys to IoT devices in order to secure multicast sessions. The proposed solution is evaluated against the Logical Key Hierarchy (LKH) protocol. The comparison shows that the proposed scheme efficiently reduces the load on the key servers; moreover, the key storage cost on both members and key servers is reduced.
Interval uncertainty can cause uncontrollable variations in objective and constraint values, which can seriously degrade performance or even change the feasibility of optimal solutions. Robust optimization seeks solutions that are optimal and minimally sensitive to uncertainty. In this paper, a sequential multi-objective robust optimization (MORO) approach based on support vector machines (SVM) is proposed. First, a sequential optimization structure is adopted to ease the computational burden. Second, an SVM is used to construct a classification model that classifies design alternatives as feasible or infeasible. The proposed approach is tested on a numerical example and an engineering case. The results illustrate that the proposed approach reasonably approximates the solutions obtained by the existing sequential MORO approach (SMORO) while significantly reducing computational cost.
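A minimal sketch of the classification step, with a cheap synthetic constraint standing in for the expensive feasibility analysis; the kernel choice and sampling scheme are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))            # sampled design alternatives
feasible = X[:, 0] ** 2 + X[:, 1] ** 2 <= 2.0    # stand-in constraint check

clf = SVC(kernel="rbf").fit(X, feasible)

# Inside the MORO loop, new candidates are screened by the classifier
# instead of re-running the expensive constraint analysis.
candidates = rng.uniform(-2, 2, size=(5, 2))
print(clf.predict(candidates))
```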
Enterprises usually provide strong controls to prevent cyberattacks and inadvertent leakage of data to external entities. However, in the case where employees and data scientists have legitimate access to analyze and derive insights from the data, there are insufficient controls and employees are usually permitted access to all information about the customers of the enterprise including sensitive and private information. Though it is important to be able to identify useful patterns of one's customers for better customization and service, customers' privacy must not be sacrificed to do so. We propose an alternative — a framework that will allow privacy preserving data analytics over big data. In this paper, we present an efficient and scalable framework for Apache Spark, a cluster computing framework, that provides strong privacy guarantees for users even in the presence of an informed adversary, while still providing high utility for analysts. The framework, titled Shade, includes two mechanisms — SparkLAP, which provides Laplacian perturbation based on a user's query and SparkSAM, which uses the contents of the database itself in order to calculate the perturbation. We show that the performance of Shade is substantially better than earlier differential privacy systems without loss of accuracy, particularly when run on datasets small enough to fit in memory, and find that SparkSAM can even exceed performance of an identical nonprivate Spark query.
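The Laplacian-perturbation idea behind SparkLAP can be illustrated with a generic differentially private count in PySpark; this sketch is not the Shade implementation, and the query, epsilon, and sensitivity values are assumed for illustration.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dp-count-sketch").getOrCreate()
df = spark.createDataFrame([(i, i % 7) for i in range(1000)], ["id", "grp"])

def dp_count(frame, epsilon=0.5, sensitivity=1.0):
    """Counting query with Laplacian perturbation: one user changes the
    count by at most 1, so noise ~ Laplace(sensitivity / epsilon)."""
    return frame.count() + np.random.laplace(scale=sensitivity / epsilon)

print(dp_count(df.filter(df.grp == 3)))  # noisy count for one group
spark.stop()
```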
This paper addresses the problem of state estimation for a linear time-invariant system when some of the sensors and/or actuators are under adversarial attack. In our setup, the adversarial agent attacks a sensor (actuator) by manipulating its measurement (input), and we impose no constraint on how the measurements (inputs) are corrupted. We introduce the notion of ``sparse strong observability'' to characterize systems for which state estimation is possible given bounds on the number of attacked sensors and actuators. Furthermore, we develop a secure state estimator based on Satisfiability Modulo Theories (SMT) solvers.
Code smells may be introduced into software due to market rivalry, deadline pressure, improper functioning, or the limited skills or inexperience of software developers. Code smells indicate problems in design or code that make software hard to change and maintain. Detecting code smells can reduce developer effort, resources, and the cost of the software. Many researchers have proposed techniques such as DETEX for detecting code smells, but these have limited precision and recall. To overcome these limitations, a new technique named SVMCSD is proposed for the detection of code smells, based on the support vector machine learning technique. Four code smells are targeted, namely God Class, Feature Envy, Data Class, and Long Method, and the proposed technique is validated on two open-source systems, ArgoUML and Xerces. The accuracy of SVMCSD is found to be better than that of DETEX in terms of two metrics, precision and recall, when applied to a subset of a system; when considering the entire system, SVMCSD detects more occurrences of code smells than DETEX.
Cyber-security threats are a growing concern in networked environments. The development of Intrusion Detection Systems (IDSs) is fundamental to providing an extra level of security. We have developed an unsupervised anomaly-based IDS that uses statistical techniques to conduct the detection process. Despite providing many advantages, anomaly-based IDSs tend to generate a high number of false alarms. Machine Learning (ML) techniques have gained wide interest in intrusion detection tasks. In this work, the Support Vector Machine (SVM) is considered an ML technique that could complement the performance of our IDS, providing a second line of detection to reduce the number of false alarms, or serving as an alternative detection technique. We assess the performance of our IDS against one-class and two-class SVMs, using linear and non-linear forms. The results show that the linear two-class SVM generates highly accurate results, and that the accuracy of the linear one-class SVM is very comparable while not requiring training datasets associated with malicious data. Similarly, the results show that our IDS could benefit from ML techniques to increase its accuracy when analysing datasets comprising non-homogeneous features.
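A minimal sketch of the one-class versus two-class comparison with scikit-learn, on synthetic stand-in features; note that the one-class variant is fitted on benign samples only, which is the property highlighted above.

```python
import numpy as np
from sklearn.svm import SVC, OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 5))  # benign traffic features
attack = rng.normal(4.0, 1.0, size=(50, 5))   # malicious traffic features

# Two-class (supervised): needs labelled malicious samples for training.
X = np.vstack([normal, attack])
y = np.array([0] * 500 + [1] * 50)
two_class = SVC(kernel="linear").fit(X, y)

# One-class (semi-supervised): trained on benign traffic only.
one_class = OneClassSVM(kernel="linear", nu=0.05).fit(normal)

test = rng.normal(4.0, 1.0, size=(10, 5))  # unseen attacks
print(two_class.predict(test))             # 1 = attack
print(one_class.predict(test))             # -1 = outlier
```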
Named Data Networking (NDN) is a content-oriented future Internet architecture that well suits the increasingly mobile and information-intensive applications that dominate today's Internet. NDN relies on in-network caching to facilitate content delivery. This makes it challenging to enforce access control, since the content is cached in routers and the content producer has lost control over it. Due to its salient advantages in content delivery, network coding has been introduced into NDN to improve content delivery effectiveness. In this paper, we design ACNC, the first Access Control solution specifically for Network Coding-based NDN. By combining a novel linear AONT (All Or Nothing Transform) with encryption, we can ensure that only a legitimate user who possesses the authorization key can successfully recover the encoding matrix for network coding, and hence recover the content being transmitted. In addition, our design has two salient merits: 1) the linear AONT well suits the linear nature of network coding; 2) only one vector of the encoding matrix needs to be encrypted/decrypted, which incurs only a small computational overhead. Security analysis and experimental evaluation in ndnSIM show that our design can successfully enforce access control on network coding-based NDN with acceptable overhead.
While the growth of cloud-based technologies has benefited the society tremendously, it has also increased the surface area for cyber attacks. Given that cloud services are prevalent today, it is critical to devise systems that detect intrusions. One form of security breach in the cloud is when cyber-criminals compromise Virtual Machines (VMs) of unwitting users and, then, utilize user resources to run time-consuming, malicious, or illegal applications for their own benefit. This work proposes a method to detect unusual resource usage trends and alert the user and the administrator in real time. We experiment with three categories of methods: simple statistical techniques, unsupervised classification, and regression. So far, our approach successfully detects anomalous resource usage when experimenting with typical trends synthesized from published real-world web server logs and cluster traces. We observe the best results with unsupervised classification, which gives an average F1-score of 0.83 for web server logs and 0.95 for the cluster traces.
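A sketch of the unsupervised-classification category using an Isolation Forest (our choice for illustration; the abstract does not commit to a specific algorithm), run over synthetic per-minute CPU usage with an injected abusive burst.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
cpu = np.clip(rng.normal(20, 5, size=1440), 0, 100)  # one day of CPU %, per minute
cpu[600:700] = rng.normal(95, 2, size=100)           # injected mining-style burst

# One feature vector per minute: (usage, short-horizon rolling mean).
roll = np.convolve(cpu, np.ones(15) / 15, mode="same")
X = np.column_stack([cpu, roll])

det = IsolationForest(contamination=0.08, random_state=0).fit(X)
flags = det.predict(X)                 # -1 marks anomalous minutes
print(np.where(flags == -1)[0][:10])   # indices to alert on, in real time
```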
Vulnerability, a buzzword of modern times, is among the most important concepts relating to software and operating systems. Because loopholes and incompleteness creep in during the development phase of software, there always remains a latent vulnerability that can surface at any time. Detecting a vulnerability is one thing; predicting its occurrence over time is another. If we can anticipate the vulnerabilities of software over time, this acts as an early alarm for developers to build sounder, improved software the next time. This proposal implements the idea using an artificial neural network, where different datasets are given as input for further analysis toward successful results. While there are existing models for studying vulnerabilities in software and networks, this proposal, in addition to the current work, throws light on the predictability of vulnerabilities over time.
Malware classification is a critical part of cyber-security. Traditional methodologies for malware classification typically use static and dynamic analysis to identify malware. In this paper, a malware classification methodology based on binary images and extracted local binary pattern (LBP) features is proposed. First, malware images are reorganized into 3-by-3 grids, which are mainly used to extract LBP features. Second, LBP is applied to the malware images to extract features, as it is useful for pattern and texture classification. Finally, TensorFlow, a machine learning library, is applied to classify the malware images using the LBP features. Performance comparisons among classifiers using different image descriptors, such as GIST (a spatial envelope) and LBP, demonstrate that our proposed approach outperforms the others.
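A minimal sketch of grid-wise LBP feature extraction, assuming scikit-image; the grid size, LBP parameters, and the random stand-in image are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_grid_features(img, grid=3, P=8, R=1.0):
    """Split a malware byte-plot image into a grid x grid layout and
    concatenate per-cell uniform-LBP histograms into one feature vector."""
    lbp = local_binary_pattern(img, P, R, method="uniform")
    h, w = lbp.shape
    feats = []
    for i in range(grid):
        for j in range(grid):
            cell = lbp[i * h // grid:(i + 1) * h // grid,
                       j * w // grid:(j + 1) * w // grid]
            hist, _ = np.histogram(cell, bins=P + 2, range=(0, P + 2),
                                   density=True)
            feats.append(hist)
    return np.concatenate(feats)

# Stand-in for a binary reshaped into a grayscale image.
img = np.random.default_rng(0).integers(0, 256, size=(96, 96)).astype(np.uint8)
print(lbp_grid_features(img).shape)  # (grid*grid*(P+2),) = (90,)
```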
Visual object tracking is challenging when the object's appearance undergoes significant changes, such as scale change, background clutter, and occlusion. In this paper, we crop multiscale templates of different sizes around the object and feed them into the network, pretraining it to adapt to scale changes of the tracked object. Unlike previous tracking methods based on deep convolutional neural networks (CNNs), we exploit a deep Residual Network (ResNet) to offline-train a multiscale object appearance model on ImageNet and then transfer the features from the pretrained network to the tracking task. Meanwhile, the proposed method combines multilayer convolutional features, making it robust to disturbance, scale change, and occlusion. In addition, we integrate a multiscale search strategy into three kernelized correlation filters, which strengthens the tracker's ability to adapt to scale changes of the object. Unlike previous methods, we directly learn object appearance changes by integrating multiscale templates into the ResNet. We compared our method with other CNN-based and correlation filter-based tracking methods; the experimental results show that our tracking method is superior to existing state-of-the-art trackers on the Object Tracking Benchmark (OTB-2015) and the Visual Object Tracking Benchmark (VOT-2015).