Bibliography
Machine learning is enabling myriad innovations, including new algorithms for cancer diagnosis and for self-driving cars. The broad use of machine learning makes it important to understand the extent to which machine-learning algorithms are subject to attack, particularly when used in applications where physical security or safety is at risk. In this paper, we focus on facial biometric systems, which are widely used in surveillance and access control. We define and investigate a novel class of attacks: attacks that are physically realizable and inconspicuous, and that allow an attacker to evade recognition or impersonate another individual. We develop a systematic method to automatically generate such attacks, which are realized through printing a pair of eyeglass frames. When worn by an attacker whose image is supplied to a state-of-the-art face-recognition algorithm, the eyeglasses allow her to evade being recognized or to impersonate another individual. Our investigation focuses on white-box face-recognition systems, but we also demonstrate how similar techniques can be used in black-box scenarios, as well as to avoid face detection.
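A minimal sketch of the attack idea, assuming a differentiable classifier: gradient descent on a perturbation confined to an eyeglass-frame mask. The stand-in linear model, image size, mask coordinates, and target label are all illustrative, not the authors' setup.

```python
import torch
import torch.nn.functional as F

# Stand-in differentiable face classifier (the paper attacks a real
# face-recognition DNN; this linear net just makes the sketch runnable).
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 10))

image = torch.rand(1, 3, 64, 64)        # attacker's face image
mask = torch.zeros_like(image)
mask[:, :, 20:28, 8:56] = 1.0           # hypothetical eyeglass-frame region
target = torch.tensor([3])              # identity to impersonate

delta = torch.zeros_like(image, requires_grad=True)
opt = torch.optim.SGD([delta], lr=0.1)
for _ in range(100):
    adv = (image + delta * mask).clamp(0.0, 1.0)   # perturb frames only
    loss = F.cross_entropy(model(adv), target)     # minimized for impersonation
    opt.zero_grad()
    loss.backward()
    opt.step()
```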
Over the past three years, the Emotion Recognition in the Wild (EmotiW) Grand Challenge has drawn increasing attention owing to its broad range of potential applications. For the fourth challenge, aimed at the task of video-based emotion recognition, we propose a multi-clue emotion fusion (MCEF) framework that models human emotion from three mutually complementary sources: facial appearance texture, facial action, and audio. To extract high-level emotion features from sequential face images, we employ a CNN-RNN architecture, where the face image from each frame is first fed into the fine-tuned VGG-Face network to extract face features, and the features of all frames are then traversed sequentially by a bidirectional RNN so as to capture the dynamic changes of facial textures. To attain more accurate facial actions, a facial landmark trajectory model is proposed to explicitly learn emotion variations of facial components. Further, audio signals are also modeled in a CNN framework by extracting low-level energy features from segmented audio clips and stacking them into an image-like map. Finally, we fuse the results generated from the three clues to boost the performance of emotion recognition. Our proposed MCEF achieves an overall accuracy of 56.66%, an improvement of 16.19% over the baseline.
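A sketch of the CNN-RNN stage under stated assumptions: per-frame features (e.g., 4096-d activations from a fine-tuned VGG-Face network) are assumed precomputed, and a bidirectional GRU stands in for the paper's RNN; the dimensions, pooling choice, and 7-class output are illustrative.

```python
import torch
import torch.nn as nn

# Assumed input: 16 per-frame feature vectors, e.g., fc-layer activations
# from a fine-tuned VGG-Face network (feature extraction not shown here).
frames = torch.randn(1, 16, 4096)                # (batch, frames, feature dim)
rnn = nn.GRU(4096, 256, bidirectional=True, batch_first=True)
out, _ = rnn(frames)                             # (1, 16, 512)
logits = nn.Linear(512, 7)(out[:, -1])           # last step -> 7 emotion classes
```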
Data mining is the process of extracting knowledge and interesting patterns from huge amounts of data. With the rapid growth of data storage and of cloud and service-based computing, the risk of data misuse has become a major concern, and protecting sensitive information present in the data is critical. Data perturbation plays an important role in privacy-preserving data mining, where the major challenge is balancing privacy guarantees against data utility. We propose a data perturbation method that perturbs the data using fuzzy logic and random rotation, and we show that the perturbed data retain a level of quality comparable to the original data. The comparisons are illustrated on several multivariate datasets. An experimental study shows that the model achieves both a strong privacy guarantee and good data utility.
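The rotation component can be illustrated in isolation (the fuzzy-logic component is omitted): multiplying the data by a random orthogonal matrix hides attribute values while preserving pairwise distances, which is what keeps clustering-style utility intact. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                  # original multivariate records
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # random orthogonal (rotation/
                                               # reflection) matrix
X_perturbed = X @ Q                            # attribute values are hidden

# Pairwise Euclidean distances are preserved exactly.
d_orig = np.linalg.norm(X[0] - X[1])
d_pert = np.linalg.norm(X_perturbed[0] - X_perturbed[1])
assert np.isclose(d_orig, d_pert)
```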
Within a few years, Cloud computing has emerged as the most promising IT business model. Thanks to its various technical and financial advantages, Cloud computing continues to attract new users from scientific and industrial sectors every day. To satisfy these users' diverse requirements, Cloud providers must maximize the performance of their IT resources to ensure the best service at the lowest cost. Performance optimization in the Cloud can be pursued at different levels and from different aspects. In the present paper, we introduce a fuzzy logic process into the scheduling strategy for public Clouds in order to improve response time, processing time and total cost. Indeed, fuzzy logic has proven its ability to solve optimization problems in several fields, such as data mining, image processing and networking.
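As a flavor of how such fuzzy scheduling logic might look (an illustrative rule, not the paper's rule base), here is a triangular membership function and one Mamdani-style rule combining load and task length:

```python
# Rule sketch: IF load is high AND task is short THEN priority is high.
def tri(x, a, b, c):
    """Triangular membership peaking at b, zero outside [a, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def priority(load, length):
    high_load = tri(load, 0.5, 1.0, 1.5)      # load normalized to [0, 1]
    short_task = tri(length, -0.5, 0.0, 0.5)  # length normalized to [0, 1]
    return min(high_load, short_task)         # fuzzy AND = min

print(priority(load=0.9, length=0.2))         # 0.6
```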
Over the last few decades, accessibility scenarios have undergone a drastic change. The way people access information and resources today is quite different from the pre-Internet era. The evolution of the Internet has brought remarkable, epoch-making changes, and the Internet has become the backbone of the smart city. The vision of the smart city revolves around seamless connectivity, which can provide uninterrupted services to users such as e-governance, e-banking, e-marketing, e-shopping, e-payment and communication through social media. Providing uninterrupted service for such applications is our prime concern, so this paper presents a smart handoff framework for next-generation heterogeneous networks in smart cities, offering connectivity to anyone, anyhow and anywhere, at any time. To achieve this, three strategies are proposed for the handoff initialization phase: mobile-controlled, user-controlled and network-controlled handoff initialization, each considering a different set of parameters. Results show that combining RSSI with additional parameters, an adaptive threshold and hysteresis solves the ping-pong and corner-effect problems in the smart city.
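A minimal sketch of a handoff-initiation check in this spirit, combining an adaptive RSSI threshold with a hysteresis margin (the function and the values are hypothetical, not taken from the paper):

```python
def should_handoff(rssi_serving, rssi_candidate, adaptive_threshold, hysteresis):
    # Trigger only when the serving signal drops below the adaptive
    # threshold AND the candidate beats it by the hysteresis margin;
    # the margin is the classical guard against ping-pong handoffs.
    return (rssi_serving < adaptive_threshold and
            rssi_candidate > rssi_serving + hysteresis)

print(should_handoff(rssi_serving=-85, rssi_candidate=-70,
                     adaptive_threshold=-80, hysteresis=5))   # True
```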
The performance of clustering is a crucial challenge, especially for pattern recognition. Model aggregation has a positive impact on the efficiency of data clustering: by aggregating the resulting clustering models, more robust decision boundaries can be obtained. In this paper, we study an aggregation scheme that improves the stability and accuracy of clustering and yields a reliable and robust clustering model. We demonstrate the advantages of our aggregation method by running Fuzzy C-Means (FCM) clustering on the Reuters-21578 corpus. Experimental studies show that our scheme optimizes the bias-variance trade-off of the selected model and achieves enhanced clustering of unstructured textual resources.
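For concreteness, a minimal NumPy implementation of the standard FCM updates (with fuzzifier m = 2 by default); this is the textbook base algorithm being aggregated, not the authors' aggregation scheme itself:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=100, seed=0):
    """Minimal Fuzzy C-Means: returns centers and membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, len(X)))
    U /= U.sum(axis=0)                                  # columns sum to 1
    for _ in range(iters):
        Um = U ** m
        centers = Um @ X / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None] - centers[:, None], axis=2) + 1e-9
        p = 2.0 / (m - 1.0)
        U = 1.0 / (d ** p * (d ** -p).sum(axis=0))      # standard FCM update
    return centers, U

centers, U = fcm(np.random.default_rng(1).normal(size=(60, 4)), c=3)
```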
In order to assure the safety of the Chinese Train Control System (CTCS), it is necessary to ensure that the operational risk is acceptable throughout its life-cycle, which requires a pragmatic risk assessment for effective risk control. Many risk assessment techniques currently used in the railway domain are qualitative and rely on the experience of experts, which unavoidably brings in subjective judgements. This paper presents a method that combines fuzzy reasoning with the analytic hierarchy process (AHP) to quantify the experience of experts and obtain scores for the risk parameters. Fuzzy reasoning is used to obtain the risk of each system hazard, while the AHP is used to determine the risk level (RL) of the system and its membership. The method helps safety analysts calculate the overall collective risk level of a system. A case study of risk assessment for the CTCS demonstrates that the method can give quantitative results for collective risks without requiring much information from experts, and that it supports risk assessment with a risk level and its membership, which are more valuable for guiding further risk management.
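The AHP step can be sketched as follows: weights for the risk parameters are derived from an expert pairwise-comparison matrix via its principal eigenvector. This is the standard AHP computation; the matrix entries and parameter names are illustrative.

```python
import numpy as np

# Expert pairwise judgments over three risk parameters (e.g., severity,
# frequency, detectability; example names, not the paper's choice).
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
vals, vecs = np.linalg.eig(A)
w = np.real(vecs[:, np.argmax(np.real(vals))])  # principal eigenvector
weights = w / w.sum()                           # normalized parameter weights
print(weights)
```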
Cloud and its transactions have emerged as a major challenge. This paper aims to come up with an efficient and best possible way to transfer data in cloud computing environment. This goal is achieved with the help of Soft Computing Techniques. Of the various techniques such as fuzzy logic, genetic algorithm or neural network, the paper proposes an effective method of intrusion detection using genetic algorithm. The selection of the optimized path for the data transmission proved to be effective method in cloud computing environment. Network path optimization increases data transmission speed making intrusion in network nearly impossible. Intruders are forced to act quickly for intruding the network which is quite a tough task for them in such high speed data transmission network.
In this paper, we extend the Maximum Satisfiability (MaxSAT) problem to Łukasiewicz logic. The MaxSAT problem for a set of formulae Φ is the problem of finding an assignment to the variables in Φ that satisfies the maximum number of formulae. Three possible solutions (encodings) are proposed for the new problem: (1) Disjunctive Linear Relations (DLRs), (2) Mixed Integer Linear Programming (MILP) and (3) Weighted Constraint Satisfaction Problem (WCSP). Like its Boolean counterpart, the extended fuzzy MaxSAT will have numerous applications in optimization problems that involve vagueness.
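For reference, the standard Łukasiewicz semantics over [0,1] that any such encoding must capture (a well-known definition, not an excerpt from the paper):

```latex
\neg x = 1 - x, \qquad
x \oplus y = \min(1,\ x + y), \qquad
x \otimes y = \max(0,\ x + y - 1), \qquad
x \to y = \min(1,\ 1 - x + y).
```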
Availability is one of the most important requirements in production systems. Maintaining high availability in Infrastructure-as-a-Service (IaaS) cloud computing is a challenging task because of the complexity of service provisioning. By definition, availability can be maintained by using fault tolerance approaches. Recently, many fault tolerance methods have been developed, but few of them focus on the fault detection aspect. In this paper, after a rigorous analysis of the nature of failures, we introduce a technique to identify the failures occurring in IaaS systems. By using a fuzzy logic algorithm, the proposed technique provides better performance in terms of accuracy and detection speed, which is critical for cloud systems.
Wireless sensor networks (WSNs) are useful in many practical applications, including agriculture, military and health care systems. However, the nodes in a sensor network are constrained by energy, and hence the lifespan of such sensor nodes is limited. Temporal logics provide a facility to predict the lifetime of sensor nodes in a WSN using past and present traffic and environmental conditions, while fuzzy logic helps to perform inference under uncertainty; when fuzzy logic is combined with temporal constraints, it increases the accuracy of decision making with qualitative information. Hence, a new data collection and cluster-based energy-efficient routing algorithm is proposed in this paper by extending the existing LEACH protocol with fuzzy temporal rules for making data collection and routing decisions. The proposed work also uses fuzzy temporal logic for forming clusters and for performing cluster-based routing. The main difference between other cluster-based routing protocols and the proposed protocol is that two types of cluster heads are used here, one for data collection and the other for routing. In our experiments, we observed that the proposed fuzzy cluster-based routing algorithm with temporal constraints extends the network lifetime, reduces energy consumption, and enhances quality of service by increasing the packet delivery ratio and reducing the delay.
Cloud computing has gained wide acceptance across the globe. Despite this wide acceptance and adoption, certain apprehensions related to the safety and security of data still exist. The service provider needs to demonstrate to the client the confidentiality of data on the cloud. This broadly translates to issues related to the process of identifying, developing, maintaining and optimizing trust with clients regarding the services provided. Continuous demonstration, maintenance and optimization of trust in the agreed-upon services affect the relationship with a client. The paper proposes a framework for integrating trust at the IaaS level in the cloud, along with a novel method of generating a trust index factor that uses fuzzy logic to consider both the performance and the agility of the feedback received.
In this paper, we compare the effectiveness of Hidden Markov Models (HMMs) with that of Profile Hidden Markov Models (PHMMs), where both are trained on sequences of API calls. We compare our results to static analysis using HMMs trained on sequences of opcodes, and show that dynamic analysis achieves significantly stronger results in many cases. Furthermore, in comparing our two dynamic analysis approaches, we find that using PHMMs consistently outperforms our technique based on HMMs.
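To make the scoring step concrete, here is a self-contained forward-algorithm computation of the log-likelihood of an encoded API-call sequence under an HMM; the parameters below are toy values, whereas in practice they are learned from malware traces:

```python
import numpy as np

pi = np.array([0.6, 0.4])                  # initial state distribution
A = np.array([[0.7, 0.3],                  # state transition matrix
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],             # emission probabilities over a
              [0.1, 0.3, 0.6]])            # 3-symbol API-call alphabet
obs = [0, 2, 1, 2]                         # an encoded API-call sequence

alpha = pi * B[:, obs[0]]                  # forward algorithm
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print(np.log(alpha.sum()))                 # sequence log-likelihood score
```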
This paper calls for attention to the investigation of real-world malware at large scale by examining the largest real malware repository, VirusTotal. As a first step, we analyzed two fundamental characteristics of Windows executable malware from VirusTotal, designing both offline and online tools for this analysis. Our results show that malware appears in bursts and that malware distributions are highly skewed.
This paper presents a detection and containment mechanism for fast self-propagating network worm malware. The detection part of the mechanism uses two categories of network host activities to identify worm behaviour in a network. Upon identifying worm activity, a data-link containment system isolates the internal source of infection, and a network-level containment system blocks inbound worm datagrams. The mechanism has been demonstrated using a software prototype, and a number of worm experiments have been conducted to evaluate it. The empirical results show the effectiveness of the developed mechanism in containing fast network worm malware at an early stage with almost no false positives.
We present an overview of the field of malware analysis with emphasis on issues related to document engineering. We will introduce the field with a discussion of the types of malware, including executable binaries, malicious PDFs, polymorphic malware, ransomware, and exploit kits. We will conclude with our view of important research questions in the field. This is an updated version of last year's tutorial, with more information about web-based malware and malware targeting the Android market.
In an attempt to extract useful information about the behavior of new malware families, threat analysts commonly force newly collected malicious software samples to run within a sandboxed environment. The main goal is to gather intelligence that can later be leveraged to detect and enumerate new malware infections within a network. Currently, most analysis environments "blindly" execute each newly collected malware sample for a predetermined amount of time (e.g., four to five minutes). However, a large majority of malware samples that are forced through sandbox execution are simply repackaged versions of previously seen (and already analyzed) malware. Consequently, a significant amount of time may be wasted in analyzing samples that do not generate new intelligence. In this paper, we propose MAXS, a novel probabilistic multi-hypothesis testing framework for scaling execution in malware analysis environments, including bare-metal execution environments. Our main goal is to automatically recognize whether a malware sample that is undergoing dynamic analysis has likely been seen before (e.g., in a "differently packed" form), and determine whether we could therefore stop its execution early while avoiding loss of valuable malware intelligence (e.g., without missing DNS queries to never-before-seen malware command-and-control domains). We have tested our prototype implementation of MAXS over two large collections of malware execution traces obtained from two distinct production-level analysis environments. Our experimental results show that using MAXS we are able to reduce malware execution time by up to 50% on average, with less than 0.3% information loss. This roughly translates into the ability to double the capacity of malware sandbox environments, thus significantly optimizing the resources dedicated to malware execution and analysis. Our results are particularly important for bare-metal execution environments, in which it is not easy to leverage the economies of scale that characterize virtual-machine or emulation-based malware sandboxes. For example, MAXS could be used to significantly cut the cost of bare-metal analysis environments by reducing the hardware resources needed to analyze a predetermined daily number of new malware samples.
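One plausible reading of early stopping under multi-hypothesis testing is a Wald-style sequential probability ratio test over per-event evidence; the sketch below, with invented likelihoods, is our illustration rather than the MAXS algorithm:

```python
import math

def early_stop(match_stream, p_seen=0.9, p_new=0.5, alpha=0.01, beta=0.01):
    """Walk a stream of per-event booleans (does the event match a known
    trace?) and decide, SPRT-style, between 'previously seen' (H1)
    and 'new malware' (H0)."""
    upper = math.log((1 - beta) / alpha)
    lower = math.log(beta / (1 - alpha))
    llr = 0.0
    for matched in match_stream:
        p1 = p_seen if matched else 1 - p_seen
        p0 = p_new if matched else 1 - p_new
        llr += math.log(p1 / p0)
        if llr >= upper:
            return "halt early: likely a repackaged, known sample"
        if llr <= lower:
            return "keep running: likely new behavior"
    return "undecided: run to the time budget"

print(early_stop([True] * 12))
```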
Android malware has been growing dramatically, as have the diversity and complexity of its development techniques. Machine learning techniques have been applied to detect malware by modeling patterns in the static features and dynamic behaviors of malware, and the accuracy of such classifiers depends on the quality of the features. We increase the quality of the features by relating an app's features to the features required to deliver the functionality of its category. Android app stores such as Google Play organize apps into categories, and each category has its distinct functionality, which means the apps under a specific category are similar in their static and dynamic features: benign apps under a certain category tend to share a common set of features, whereas malicious apps tend to have abnormal features that are uncommon for the category they belong to. To establish benign reference points, the features of the top-rated apps in a specific category are used to train a malware detection classifier for that category. This paper thus proposes category-based machine learning classifiers to enhance the performance of classification models at detecting malicious apps within a given category. Extensive machine learning experiments show that category-based classifiers achieve a remarkably higher average performance than non-category-based ones.
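A minimal sketch of the category-based setup, assuming a simple (category, features, label) data layout and using a random forest as a stand-in classifier:

```python
from sklearn.ensemble import RandomForestClassifier

def train_per_category(samples):
    """samples: iterable of (category, feature_vector, is_malware)."""
    by_cat = {}
    for cat, x, y in samples:
        X, Y = by_cat.setdefault(cat, ([], []))
        X.append(x)
        Y.append(y)
    # One classifier per store category, trained on that category's
    # benign reference apps and known malware.
    return {cat: RandomForestClassifier().fit(X, Y)
            for cat, (X, Y) in by_cat.items()}

def predict(models, cat, x):
    # Route the app to its own category's classifier.
    return models[cat].predict([x])[0]
```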
Mobile malware has recently become an acute problem. Existing solutions either base static reasoning on syntactic properties, such as exception handlers or configuration fields, or compute data-flow reachability over the program, which leads to scalability challenges. We explore a new and complementary category of features that strikes a middle ground between the above two categories. This new category focuses on security-relevant operations (communication, lifecycle, etc.), and in particular their multiplicity and happens-before order, as a means to distinguish between malicious and benign applications. Computing these features requires semantic, yet lightweight, modeling of the program's behavior. We have created a malware detection system for Android, MassDroid, that collects traces of security-relevant operations from the call graph via a scalable form of data-flow analysis. These traces are reduced to happens-before and multiplicity features, then fed into a supervised learning engine to obtain a malicious/benign classification. MassDroid also embodies a novel reporting interface, containing pointers into the code that serve as evidence supporting the determination. We have applied MassDroid to 35,000 Android apps from the wild. The results are highly encouraging, with an F-score of 95% in standard testing and over 90% when applied to previously unseen malware signatures. MassDroid is also efficient, requiring about two minutes per app, and is publicly available as a cloud service for malware detection.
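An illustrative reduction of an operation trace to multiplicity and happens-before features (the operation names are hypothetical; MassDroid derives these from the call graph rather than from a concrete trace as shown here):

```python
from collections import Counter
from itertools import combinations

def trace_features(trace):
    """Multiplicity: how often each security-relevant op occurs.
    Happens-before: which ordered pairs (a before b) were observed."""
    multiplicity = Counter(trace)
    happens_before = {(trace[i], trace[j])
                      for i, j in combinations(range(len(trace)), 2)}
    return multiplicity, happens_before

m, hb = trace_features(["onCreate", "openSocket", "sendSMS", "openSocket"])
print(m["openSocket"], ("onCreate", "sendSMS") in hb)   # 2 True
```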
Modern malware is designed with mutation characteristics, namely polymorphism and metamorphism, which cause an enormous growth in the number of variants of malware samples. Categorizing malware samples on the basis of their behaviors is essential for the computer security community, which receives a huge number of malware samples every day, and the signature extraction process is usually based on the malicious parts that characterize malware families. Microsoft released a malware classification challenge in 2015 with a huge dataset of nearly 0.5 terabytes, containing more than 20K malware samples. The analysis of this dataset inspired the development of a novel paradigm that is effective in categorizing malware variants into their actual family groups. This paradigm is presented and discussed in the present paper, with emphasis on the phases related to the extraction and selection of a set of novel features for the effective representation of malware samples. Features are grouped according to different characteristics of malware behavior, and their fusion is performed according to a per-class weighting paradigm. The proposed method achieved a very high accuracy (≈ 0.998) on the Microsoft Malware Challenge dataset.
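A toy sketch of per-class weighted fusion: each feature group produces a per-family score vector, and the fused score weights each group differently per class. The scores and weights below are invented; the paper derives such weights from the data:

```python
import numpy as np

# Scores from two feature groups over three candidate families, plus
# hypothetical per-class weights for each group.
scores = {"opcodes":  np.array([0.2, 0.5, 0.3]),
          "sections": np.array([0.1, 0.7, 0.2])}
weights = {"opcodes":  np.array([0.6, 0.3, 0.5]),
           "sections": np.array([0.4, 0.7, 0.5])}

fused = sum(weights[g] * scores[g] for g in scores)
print(int(fused.argmax()))    # index of the predicted malware family
```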
Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve the compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel, theoretically sound reordering algorithm based on recursive graph bisection. Our experiments show a significant improvement in the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which we demonstrate on graphs with billions of vertices and hundreds of billions of edges.
Tree structures such as breadth-first search (BFS) trees and minimum spanning trees (MST) are among the most fundamental graph structures in distributed network algorithms. However, by definition, these structures are not robust against failures: even a single edge's removal can disrupt their functionality. A well-studied concept which attempts to circumvent this issue is fault-tolerant tree structures, where the tree is augmented with additional edges from the network so that the functionality of the structure is maintained even when an edge fails. These structures, and other equivalent formulations, have been studied extensively from a centralized viewpoint. However, despite the fact that the main motivations come from distributed networks, their distributed construction has not been addressed before. In this paper, we present distributed algorithms for constructing fault-tolerant BFS and MST structures. The time complexities of our algorithms are nearly optimal in the following strong sense: they almost match even the lower bounds for constructing (basic) BFS and MST trees.
Let G be a finite connected graph, and let ρ be the spectral radius of its universal cover. For example, if G is k-regular then ρ = 2√(k−1). We show that for every r, there is an r-covering (a.k.a. an r-lift) of G where all the new eigenvalues are bounded from above by ρ. It follows that a bipartite Ramanujan graph has a Ramanujan r-covering for every r. This generalizes the r=2 case due to Marcus, Spielman and Srivastava (2013). Every r-covering of G corresponds to a labeling of the edges of G by elements of the symmetric group Sr. We generalize this notion to labeling the edges by elements of various groups and present a broader scenario where Ramanujan coverings are guaranteed to exist. In particular, this shows the existence of richer families of bipartite Ramanujan graphs than was known before. Inspired by Marcus-Spielman-Srivastava, a crucial component of our proof is the existence of interlacing families of polynomials for complex reflection groups; the core argument of this component is taken from Marcus-Spielman-Srivastava (2015). Another important ingredient of our proof is a new generalization of the matching polynomial of a graph. We define the r-th matching polynomial of G to be the average matching polynomial of all r-coverings of G. We show this polynomial shares many properties with the original matching polynomial; for example, it is real-rooted with all its roots inside [−ρ, ρ].
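In symbols, the abstract's verbal definition of the r-th matching polynomial reads (our rendering):

```latex
\mu_r(G, x) \;=\; \mathbb{E}_{H}\bigl[\mu(H, x)\bigr],
\qquad H \text{ a uniformly random } r\text{-covering of } G,
```

where μ(H, x) denotes the classical matching polynomial of H.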
In this paper, we propose a new risk analysis framework that enables the supervision of risks in complex and distributed systems. Our contribution is twofold. First, we provide Risk Assessment Graphs (RAGs) as a model for risk analysis. This graph-based model is adaptable to changes in the system over time. We also introduce the potentiality and accessibility functions, which, during each time slot, evaluate respectively the chance of exploiting a RAG node and the connection time between nodes. In addition, we provide a worst-case risk evaluation approach, based on the assumption that intruders usually aim to maximize their benefit by inflicting the maximum damage on the target system (i.e., by choosing the most likely paths in the RAG). We then introduce three security metrics: the propagated risk, the node risk and the global risk. We illustrate the use of our framework through the simple example of an enterprise email service. Our framework achieves both flexibility and generality: it can be used to assess external threats as well as insider ones, and it applies to a wide set of applications.
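A minimal sketch of a worst-case propagated-risk computation in this spirit: risk flows along the most likely (maximum-product) exploitation path from an entry node. The graph, node names, and potentiality values below are hypothetical:

```python
# Hypothetical risk graph for an email service: edge values play the
# role of the potentiality function (chance a node is exploited next).
graph = {"internet": {"mail_gw": 0.8},
         "mail_gw":  {"mail_srv": 0.6, "ldap": 0.3},
         "mail_srv": {},
         "ldap":     {}}

def worst_case_risk(graph, entry):
    """Propagate risk along the most likely (max-product) paths."""
    risk = {entry: 1.0}
    frontier = [entry]
    while frontier:
        u = frontier.pop()
        for v, potentiality in graph[u].items():
            r = risk[u] * potentiality
            if r > risk.get(v, 0.0):        # keep the worst-case path only
                risk[v] = r
                frontier.append(v)
    return risk

print(worst_case_risk(graph, "internet"))
```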
In this work, we introduce a new Markov operator associated with a digraph, which we refer to as a nonlinear Laplacian. Unlike previous Laplacians for digraphs, the nonlinear Laplacian does not rely on the stationary distribution of the random walk process and is well defined on digraphs that are not strongly connected. We show that the nonlinear Laplacian has nontrivial eigenvalues and give a Cheeger-like inequality, which relates the conductance of a digraph and the smallest non-zero eigenvalue of its nonlinear Laplacian. Finally, we apply the nonlinear Laplacian to the analysis of real-world networks and obtain encouraging results.
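For comparison, the classical Cheeger inequality for undirected graphs, with λ₂ the second-smallest eigenvalue of the normalized Laplacian and φ(G) the conductance; the paper establishes an analogue that relates a digraph's conductance to the smallest non-zero eigenvalue of its nonlinear Laplacian:

```latex
\frac{\lambda_2}{2} \;\le\; \phi(G) \;\le\; \sqrt{2\lambda_2}.
```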