Biblio
Resource scheduling in a computing system addresses the problem of packing tasks with multi-dimensional resource requirements and non-functional constraints. The exhibited heterogeneity of workload and server characteristics in Cloud-scale or Internet-scale systems is adding further complexity and new challenges to the problem. Compared with,,,, existing solutions based on ad-hoc heuristics, Machine Learning (ML) has the potential to improve further the efficiency of resource management in large-scale systems. In this paper we,,,, will describe and discuss how ML could be used to understand automatically both workloads and environments, and to help to cope with scheduling-related challenges such as consolidating co-located workloads, handling resource requests, guaranteeing application's QoSs, and mitigating tailed stragglers. We will introduce a generalized ML-based solution to large-scale resource scheduling and demonstrate its effectiveness through a case study that deals with performance-centric node classification and straggler mitigation. We believe that an MLbased method will help to achieve architectural optimization and efficiency improvement.
In this study, we have used the Image Similarity technique to detect the unknown or new type of malware using CNN ap- proach. CNN was investigated and tested with three types of datasets i.e. one from Vision Research Lab, which contains 9458 gray-scale images that have been extracted from the same number of malware samples that come from 25 differ- ent malware families, and second was benign dataset which contained 3000 different kinds of benign software. Benign dataset and dataset vision research lab were initially exe- cutable files which were converted in to binary code and then converted in to image files. We obtained a testing ac- curacy of 98% on Vision Research dataset.
Smart buildings are controlled by multiple cyber-physical systems that provide critical services such as heating, ventilation, lighting and access control. These building systems are becoming increasingly vulnerable to both cyber and physical attacks. We introduce a multi-model methodology for assessing the security of these systems, which utilises INTO-CPS, a suite of modelling, simulation, and analysis tools for designing cyber-physical systems. Using a fan coil unit case study we show how its security can be systematically assessed when subjected to Man-in-the-Middle attacks on the data connections between system components. We suggest our methodology would enable building managers and security engineers to design attack countermeasures and refine their effectiveness.
SQL Injection continues to be one of the most damaging security exploits in terms of personal information exposure as well as monetary loss. Injection attacks are the number one vulnerability in the most recent OWASP Top 10 report, and the number of these attacks continues to increase. Traditional defense strategies often involve static, signature-based IDS (Intrusion Detection System) rules which are mostly effective only against previously observed attacks but not unknown, or zero-day, attacks. Much current research involves the use of machine learning techniques, which are able to detect unknown attacks, but depending on the algorithm can be costly in terms of performance. In addition, most current intrusion detection strategies involve collection of traffic coming into the web application either from a network device or from the web application host, while other strategies collect data from the database server logs. In this project, we are collecting traffic from two points: at the web application host, and at a Datiphy appliance node located between the webapp host and the associated MySQL database server. In our analysis of these two datasets, and another dataset that is correlated between the two, we have been able to demonstrate that accuracy obtained with the correlated dataset using algorithms such as rule-based and decision tree are nearly the same as those with a neural network algorithm, but with greatly improved performance.
Traditional security practices focus on negative incentives that attempt to force compliance through constraints, monitoring, and punishment. This paper describes a missing dimension of most organizations' insider threat defense-one that explicitly considers positive incentives for attracting individuals to act in the interests of the organization. Positive incentives focus on properties of the organizational context of workforce management practices - including those relating to organizational supportiveness, coworker connectedness, and job engagement. Without due attention to the organizational context in which insider threats occur, insider misbehaviors may simply reoccur as a natural response to counterproductive or dysfunctional management practices. A balanced combination of positive and negative incentives can improve employees' relationships with the organization and provide a means for employees to better cope with personal and professional stressors. An insider threat program that balances organizational incentives can become an advocate for the workforce and a means for improving employee work life - a welcome message to employees who feel threatened by programs focused on discovering insider wrongdoing.
In big data era, machine learning is one of fundamental techniques in intrusion detection systems (IDSs). Poisoning attack, which is one of the most recognized security threats towards machine learning- based IDSs, injects some adversarial samples into the training phase, inducing data drifting of training data and a significant performance decrease of target IDSs over testing data. In this paper, we adopt the Edge Pattern Detection (EPD) algorithm to design a novel poisoning method that attack against several machine learning algorithms used in IDSs. Specifically, we propose a boundary pattern detection algorithm to efficiently generate the points that are near to abnormal data but considered to be normal ones by current classifiers. Then, we introduce a Batch-EPD Boundary Pattern (BEBP) detection algorithm to overcome the limitation of the number of edge pattern points generated by EPD and to obtain more useful adversarial samples. Based on BEBP, we further present a moderate but effective poisoning method called chronic poisoning attack. Extensive experiments on synthetic and three real network data sets demonstrate the performance of the proposed poisoning method against several well-known machine learning algorithms and a practical intrusion detection method named FMIFS-LSSVM-IDS.
Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has remained an academic exercise mainly due to i) the lack of a transparent and streamlined deployment model that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. In this work we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.
Code churn has been successfully used to identify defect inducing changes in software development. Our recent analysis of the cross-release code churn showed that several design metrics exhibit moderate correlation with the number of defects in complex systems. The goal of this paper is to explore whether cross-release code churn can be used to identify critical design change and contribute to prediction of defects for software in evolution. In our case study, we used two types of data from consecutive releases of open-source projects, with and without cross-release code churn, to build standard prediction models. The prediction models were trained on earlier releases and tested on the following ones, evaluating the performance in terms of AUC, GM and effort aware measure Pop. The comparison of their performance was used to answer our research question. The obtained results showed that the prediction model performs better when cross-release code churn is included. Practical implication of this research is to use cross-release code churn to aid in safe planning of next release in software development.
The plethora of mobile apps introduce critical challenges to digital forensics practitioners, due to the diversity and the large number (millions) of mobile apps available to download from Google play, Apple store, as well as hundreds of other online app stores. Law enforcement investigators often find themselves in a situation that on the seized mobile phone devices, there are many popular and less-popular apps with interface of different languages and functionalities. Investigators would not be able to have sufficient expert-knowledge about every single app, sometimes nor even a very basic understanding about what possible evidentiary data could be discoverable from these mobile devices being investigated. Existing literature in digital forensic field showed that most such investigations still rely on the investigator's manual analysis using mobile forensic toolkits like Cellebrite and Encase. The problem with such manual approaches is that there is no guarantee on the completeness of such evidence discovery. Our goal is to develop an automated mobile app analysis tool to analyze an app and discover what types of and where forensic evidentiary data that app generate and store locally on the mobile device or remotely on external 3rd-party server(s). With the app analysis tool, we will build a database of mobile apps, and for each app, we will create a list of app-generated evidence in terms of data types, locations (and/or sequence of locations) and data format/syntax. The outcome from this research will help digital forensic practitioners to reduce the complexity of their case investigations and provide a better completeness guarantee of evidence discovery, thereby deliver timely and more complete investigative results, and eventually reduce backlogs at crime labs. In this paper, we will present the main technical approaches for us to implement a dynamic Taint analysis tool for Android apps forensics. With the tool, we have analyzed 2,100 real-world Android apps. For each app, our tool produces the list of evidentiary data (e.g., GPS locations, device ID, contacts, browsing history, and some user inputs) that the app could have collected and stored on the devices' local storage in the forms of file or SQLite database. We have evaluated our tool using both benchmark apps and real-world apps. Our results demonstrated that the initial success of our tool in accurately discovering the evidentiary data.
In the last few decades, the relative simplicity of the logistic map made it a widely accepted point in the consideration of chaos, which is having the good properties of unpredictability, sensitiveness in the key values and ergodicity. Further, the system parameters fit the requirements of a cipher widely used in the field of cryptography, asymmetric and symmetric key chaos based cryptography, and for pseudorandom sequence generation. Also, the hardware-based embedded system is configured on FPGA devices for high performance. In this paper, a novel stream cipher using chaotic logistic map is proposed. The two chaotic logistic maps are coded using Verilog HDL and implemented on commercially available FPGA hardware using Xilinx device: XC3S250E for the part: FT256 and operated at frequency of 62.20 MHz to generate the non-recursive key which is used in key scheduling of pseudorandom number generation (PRNG) to produce the key stream. The realization of proposed cryptosystem in this FPGA device accomplishes the improved efficiency equal to 0.1186 Mbps/slice. Further, the generated binary sequence from the experiment is analyzed for X-power, thermal analysis, and randomness tests are performed using NIST statistical.
Multicast distribution employs the model of many-to-many so that it is a more efficient way of data delivery compared to traditional one-to-one unicast distribution, which can benefit many applications such as media streaming. However, the lack of security features in its nature makes multicast technology much less popular in an open environment such as the Internet. Internet Service Providers (ISPs) take advantage of IP multicast technology's high efficiency of data delivery to provide Internet Protocol Television (IPTV) to their users. But without the full control on their networks, ISPs cannot collect revenue for the services they provide. Secure Internet Group Management Protocol (SIGMP), an extension of Internet Group Management Protocol (IGMP), and Group Security Association Management Protocol (GSAM), have been proposed to enforce receiver access control at the network level of IP multicast. In this paper, we analyze operational details and issues of both SIGMP and GSAM. An examination of the performance of both protocols is also conducted.
In rural/remote areas, resource constrained smart micro-grid (RCSMG) architectures can offer a cost-effective power management and supply alternative to national power grid connections. RCSMG architectures handle communications over distributed lossy networks to minimize operation costs. However, the unreliable nature of lossy networks makes privacy an important consideration. Existing anonymisation works on data perturbation work mainly by distortion with additive noise. Apply these solutions to RCSMGs is problematic, because deliberate noise additions must be distinguishable both from system and adversarial generated noise. In this paper, we present a brief survey of privacy risks in RCSMGs centered on inference, and propose a method of mitigating these risks. The lesson here is that while RCSMGs give users more control over power management and distribution, good anonymisation is essential to protecting personal information on RCSMGs.
SQL injection is well known a method of executing SQL queries and retrieving sensitive information from a website connected database. This process poses a threat to those applications which are poorly coded in the today's world. SQL is considered as one of the top 10 vulnerabilities even in 2018. To keep a track of the vulnerabilities that each of the websites are facing, we employ a tool called Acunetix which allows us to find the vulnerabilities of a specific website. This tool also suggests measures on how to ensure preventive measures. Using this implementation, we discover vulnerabilities in an actual website. Such a real-world implementation would be useful for instructional use in a foundational cybersecurity course.
Fuzzing is a simple yet effective approach to discover software bugs utilizing randomly generated inputs. However, it is limited by coverage and cannot find bugs hidden in deep execution paths of the program because the randomly generated inputs fail complex sanity checks, e.g., checks on magic values, checksums, or hashes. To improve coverage, existing approaches rely on imprecise heuristics or complex input mutation techniques (e.g., symbolic execution or taint analysis) to bypass sanity checks. Our novel method tackles coverage from a different angle: by removing sanity checks in the target program. T-Fuzz leverages a coverage-guided fuzzer to generate inputs. Whenever the fuzzer can no longer trigger new code paths, a light-weight, dynamic tracing based technique detects the input checks that the fuzzer-generated inputs fail. These checks are then removed from the target program. Fuzzing then continues on the transformed program, allowing the code protected by the removed checks to be triggered and potential bugs discovered. Fuzzing transformed programs to find bugs poses two challenges: (1) removal of checks leads to over-approximation and false positives, and (2) even for true bugs, the crashing input on the transformed program may not trigger the bug in the original program. As an auxiliary post-processing step, T-Fuzz leverages a symbolic execution-based approach to filter out false positives and reproduce true bugs in the original program. By transforming the program as well as mutating the input, T-Fuzz covers more code and finds more true bugs than any existing technique. We have evaluated T-Fuzz on the DARPA Cyber Grand Challenge dataset, LAVA-M dataset and 4 real-world programs (pngfix, tiffinfo, magick and pdftohtml). For the CGC dataset, T-Fuzz finds bugs in 166 binaries, Driller in 121, and AFL in 105. In addition, found 3 new bugs in previously-fuzzed programs and libraries.
Assertions are helpful in program analysis, such as software testing and verification. The most challenging part of automatically recommending assertions is to design the assertion patterns and to insert assertions in proper locations. In this paper, we develop Weak-Assert, a weakness-oriented assertion recommendation toolkit for program analysis of C code. A weakness-oriented assertion is an assertion which can help to find potential program weaknesses. Weak-Assert uses well-designed patterns to match the abstract syntax trees of source code automatically. It collects significant messages from trees and inserts assertions into proper locations of programs. These assertions can be checked by using program analysis techniques. The experiments are set up on Juliet test suite and several actual projects in Github. Experimental results show that Weak-Assert helps to find 125 program weaknesses in 26 actual projects. These weaknesses are confirmed manually to be triggered by some test cases.
We explore methods of producing adversarial examples on deep generative models such as the variational autoencoder (VAE) and the VAE-GAN. Deep learning architectures are known to be vulnerable to adversarial examples, but previous work has focused on the application of adversarial examples to classification tasks. Deep generative models have recently become popular due to their ability to model input data distributions and generate realistic examples from those distributions. We present three classes of attacks on the VAE and VAE-GAN architectures and demonstrate them against networks trained on MNIST, SVHN and CelebA. Our first attack leverages classification-based adversaries by attaching a classifier to the trained encoder of the target generative model, which can then be used to indirectly manipulate the latent representation. Our second attack directly uses the VAE loss function to generate a target reconstruction image from the adversarial example. Our third attack moves beyond relying on classification or the standard loss for the gradient and directly optimizes against differences in source and target latent representations. We also motivate why an attacker might be interested in deploying such techniques against a target generative network.