Biblio
AI technologies have demonstrated great value to missions as diverse as space-based imagery analysis, cyberattack warning, supply chain logistics and analysis of microbiologic systems. At the same time, the failure modes of AI technologies are poorly understood. DARPA is working to address this shortfall, with focused R&D, both analytic and empirical. DARPA’s success is essential for the Department to deploy AI technologies, particularly to the tactical edge, where reliable performance is required.
In this paper, we propose a graph-based algorithmic technique for malware detection, utilizing the System-call Dependency Graphs (ScDG) obtained through taint analysis traces. We leverage the grouping of system-calls into system-call groups with respect to their functionality to merge disjoint vertices of ScDG graphs, transforming them to Group Relation Graphs (GrG); note that, the GrG graphs represent malware's behavior being hence more resilient to probable mutations of its structure. More precisely, we extend the use of GrG graphs by mapping their vertices on the plane utilizing the degrees and the vertex-weights of a specific underlying graph of the GrG graph as to compute domination relations. Furthermore, we investigate how the activity of each system-call group could be utilized in order to distinguish graph-representations of malware and benign software. The domination relations among the vertices of GrG graphs result to a new graph representation that we call Coverage Graph of the GrG graph. Finally, we evaluate the potentials of our detection model using graph similarity between Coverage Graphs of known malicious and benign software samples of various types.
Detecting fake accounts (sybils) in online social networks (OSNs) is vital to protect OSN operators and their users from various malicious activities. Typical graph-based sybil detection (a mainstream methodology) assumes that sybils can make friends with only a limited (or small) number of honest users. However, recent evidences showed that this assumption does not hold in real-world OSNs, leading to low detection accuracy. To address this challenge, we explore users' activities to assist sybil detection. The intuition is that honest users are much more selective in choosing who to interact with than to befriend with. We first develop the social and activity network (SAN), a two-layer hyper-graph that unifies users' friendships and their activities, to fully utilize users' activities. We also propose a more practical sybil attack model, where sybils can launch both friendship attacks and activity attacks. We then design Sybil SAN to detect sybils via coupling three random walk-based algorithms on the SAN, and prove the convergence of Sybil SAN. We develop an efficient iterative algorithm to compute the detection metric for Sybil SAN, and derive the number of rounds needed to guarantee the convergence. We use "matrix perturbation theory" to bound the detection error when sybils launch many friendship attacks and activity attacks. Extensive experiments on both synthetic and real-world datasets show that Sybil SAN is highly robust against sybil attacks, and can detect sybils accurately under practical scenarios, where current state-of-art sybil defenses have low accuracy.
Short Message Service is now-days the most used way of communication in the electronic world. While many researches exist on the email spam detection, we haven't had the insight knowledge about the spam done within the SMS's. This might be because the frequency of spam in these short messages is quite low than the emails. This paper presents different ways of analyzing spam for SMS and a new pre-processing way to get the actual dataset of spam messages. This dataset was then used on different algorithm techniques to find the best working algorithm in terms of both accuracy and recall. Random Forest algorithm was then implemented in a real world application library written in C\# for cross platform .Net development. This library is capable of using a prebuild model for classifying a new dataset for spam and ham.
Caching query results is an efficient technique for Web search engines. A state-of-the-art approach named Static-Dynamic Cache (SDC) is widely used in practice. Replacement policy is the key factor on the performance of cache system, and has been widely studied such as LIRS, ARC, CLOCK, SKLRU and RANDOM in different research areas. In this paper, we discussed replacement policies for static-dynamic cache and conducted the experiments on real large scale query logs from two famous commercial Web search engine companies. The experimental results show that ARC replacement policy could work well with static-dynamic cache, especially for large scale query results cache.
Never Alone (2016) is a generative large-scale urban screen video-sound installation, which presents the idea of generative choreographies amongst multiple video agents, or "digital performers". This generative installation questions how we navigate in urban spaces and the ubiquity and disruptive nature of encounters within the cities' landscapes. The video agents explore precarious movement paths along the façade inhabiting landscapes that are both architectural and emotional.
In this research paper, we describe an algorithm that could be implemented on an intrusion response system (IRS) designed specifically for mobile ad hoc networks (MANET). Designed to supplement a MANET's hierarchical intrusion detection system (IDS), this IRS and its associated algorithm would be implemented on the root node operating in such an IRS, and would rely on the optimized link state routing protocol (OLSR) to determine facts about the topology of the network, and use that determination to facilitate responding to network intrusions and attacks. The algorithm operates in a query-response mode, where the IRS function of the IDS root node queries the implemented algorithm, and the algorithm returns its response, formatted as an unordered list of nodes satisfying the query.
In this research paper, we describe an algorithm that could be implemented on an intrusion response system (IRS) designed specifically for mobile ad hoc networks (MANET). Designed to supplement a MANET's hierarchical intrusion detection system (IDS), this IRS and its associated algorithm would be implemented on the root node operating in such an IRS, and would rely on the optimized link state routing protocol (OLSR) to determine facts about the topology of the network, and use that determination to facilitate responding to network intrusions and attacks. The algorithm operates in a query-response mode, where the IRS function of the IDS root node queries the implemented algorithm, and the algorithm returns its response, formatted as an unordered list of nodes satisfying the query.
Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement of the compression rate of graph and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.
Dynamic taint analysis can be used as a defense against low-integrity data in applications with untrusted user interfaces. An important example is defense against XSS and injection attacks in programs with web interfaces. Data sanitization is commonly used in this context, and can be treated as a precondition for endorsement in a dynamic integrity taint analysis. However, sanitization is often incomplete in practice. We develop a model of dynamic integrity taint analysis for Java that addresses imperfect sanitization with an in-depth approach. To avoid false positives, results of sanitization are endorsed for access control (aka prospective security), but are tracked and logged for auditing and accountability (aka retrospective security). We show how this heterogeneous prospective/retrospective mechanism can be specified as a uniform policy, separate from code. We then use this policy to establish correctness conditions for a program rewriting algorithm that instruments code for the analysis. The rewriting itself is a model of existing, efficient Java taint analysis tools.
The American National Standards Institute (ANSI) has standardized an access control approach, Next Generation Access Control (NGAC), that enables simultaneous instantiation of multiple access control policies. For large complex enterprises this is critical to limiting the authorized access of insiders. However, the specifications describe the required access control capabilities but not the related algorithms. While appropriate, this leave open the important question as to whether or not NGAC is scalable. Existing cubic reference implementations indicate that it does not. For example, the primary NGAC reference implementation took several minutes to simply display the set of files accessible to a user on a moderately sized system. To solve this problem we provide an efficient access control decision algorithm, reducing the overall complexity from cubic to linear. Our other major contribution is to provide a novel mechanism for administrators and users to review allowed access rights. We provide an interface that appears to be a simple file directory hierarchy but in reality is an automatically generated structure abstracted from the underlying access control graph that works with any set of simultaneously instantiated access control policies. Our work thus provides the first efficient implementation of NGAC while enabling user privilege review through a novel visualization approach. These capabilities help limit insider access to information (and thereby limit information leakage) by enabling the efficient simultaneous instantiation of multiple access control policies.
Metaheuristics include a wide range of optimization algorithms. Some of them are very well known and with proven value, as they solve successfully many examples of combinatorial NP-hard problems. Some examples of Metaheuristics are Genetic Algorithms (GA), Simulated Annealing (SA) or Ant Colony Optimization (ACO). Our company is devoted to making steel and is the biggest steelmaker in the world. Combining several industrial processes to produce 84.6 million tones (public official data of 2015) involves huge effort. Metaheuristics are applied to different scenarios inside our operations to optimize different areas: logistics, production scheduling or resource assignment, saving costs and helping to reach operational excellence, critical for our survival in a globalized world. Rather than obtaining the global optimal solution, the main interest of an industrial company is to have "good solutions", close to the optimal, but within a very short response time, and this latter requirement is the main difference with respect to the traditional research approach from the academic world. Production is continuous and it cannot be stopped or wait for calculations, in addition, reducing production speed implies decreasing productivity and making the facilities less competitive. Disruptions are common events, making rescheduling imperative while foremen wait for new instructions to operate. This position paper explains the problem of the time response in our industrial environment, the solutions we have investigated and some results already achieved.
Machine learning is being used in a wide range of application domains to discover patterns in large datasets. Increasingly, the results of machine learning drive critical decisions in applications related to healthcare and biomedicine. Such health-related applications are often sensitive, and thus, any security breach would be catastrophic. Naturally, the integrity of the results computed by machine learning is of great importance. Recent research has shown that some machine-learning algorithms can be compromised by augmenting their training datasets with malicious data, leading to a new class of attacks called poisoning attacks. Hindrance of a diagnosis may have life-threatening consequences and could cause distrust. On the other hand, not only may a false diagnosis prompt users to distrust the machine-learning algorithm and even abandon the entire system but also such a false positive classification may cause patient distress. In this paper, we present a systematic, algorithm-independent approach for mounting poisoning attacks across a wide range of machine-learning algorithms and healthcare datasets. The proposed attack procedure generates input data, which, when added to the training set, can either cause the results of machine learning to have targeted errors (e.g., increase the likelihood of classification into a specific class), or simply introduce arbitrary errors (incorrect classification). These attacks may be applied to both fixed and evolving datasets. They can be applied even when only statistics of the training dataset are available or, in some cases, even without access to the training dataset, although at a lower efficacy. We establish the effectiveness of the proposed attacks using a suite of six machine-learning algorithms and five healthcare datasets. Finally, we present countermeasures against the proposed generic attacks that are based on tracking and detecting deviations in various accuracy metrics, and benchmark their effectiveness.
One of the main issues in the design of modern integrated circuits is power reduction. Mainly in digital circuits, the power consumption was defined by the dynamic power consumption, during decades. But in the new NanoCMOs technologies, the static power due to the leakage current is becoming the main issue in power consumption. As the leakage power is related to the amount of components, it is becoming mandatory to reduce the amount of transistors in any type of design, to reduce power consumption. So, it is important to obtain new EDA algorithms and tools to optimize the amount of components (transistors). It is also needed tools for the layout design automation that are able to design any network of components that is provided by an optimization tool that is able to reduce the size of the network of components. It is presented an example of a layout design automation tool that can do the layout of any network of transistors using transistors of any size. Another issue for power optimization is the use of tools and algorithms for gate sizing. The designer can manage the sizing of transistors to reduce power consumption, without compromising the clock frequency. There are two types of gate sizing, discrete gate sizing and continuous gate sizing. The discrete gate sizing tools are used when it is being used a cell library that has only few available sizes for each cell. The continuous gate sizing considers that the EDA tool can define any transistor sizing. In this case, the designer needs to have a layout design tool able to do the layout of transistors with any size. It will be presented the winner tools of the ISPD Contest 2012 and 2013. Also, it will be discussed the inclusion of our gate sizing algorithms in an industrial flow used to design state-of-the-art microprocessors. Another type of EDA tool that is becoming more and more useful is the visualization tools that provide an animated visual output of the running of EDA tools. This kind of tools is very usef- l to show to the tool developers how the tool is running. So, the EDA developers can use this information to improve the algorithms used in an EDA Tool.
Self-protecting software systems are a class of autonomic systems capable of detecting and mitigating security threats at runtime. They are growing in importance, as the stovepipe static methods of securing software systems have been shown to be inadequate for the challenges posed by modern software systems. Self-protection, like other self-* properties, allows the system to adapt to the changing environment through autonomic means without much human intervention, and can thereby be responsive, agile, and cost effective. While existing research has made significant progress towards autonomic and adaptive security, gaps and challenges remain. This article presents a significant extension of our preliminary study in this area. In particular, unlike our preliminary study, here we have followed a systematic literature review process, which has broadened the scope of our study and strengthened the validity of our conclusions. By proposing and applying a comprehensive taxonomy to classify and characterize the state-of-the-art research in this area, we have identified key patterns, trends and challenges in the existing approaches, which reveals a number of opportunities that will shape the focus of future research efforts.