Biblio
Everyday objects interconnected via public or private networks, often referred to as the Internet of Things (IoT) or Cyber-Physical Systems (CPS), are gradually becoming a reality in modern life. One stand-out example is systems based on Unmanned Aerial Vehicles (UAVs). Fleets of such vehicles (drones) are prophesied to assume multiple roles, from mundane to highly sensitive applications, such as prompt pizza or shopping deliveries to the home, or deployment on battlefields for combat missions. Drones, which we refer to as UAVs in this paper, can operate either individually (solo missions) or as part of a fleet (group missions), with or without a constant connection to a base station. The base station acts as the command centre to manage the drones' activities; however, independent, localised and effective fleet control, potentially based on swarm intelligence, is necessary for several reasons: 1) an increase in the number of drone fleets; 2) fleet sizes that might reach tens of UAVs; 3) the need for such fleets to make time-critical decisions in the wild; 4) potential communication congestion and latency; and 5) in some cases, operation in challenging terrain that hinders or mandates limited communication with a control centre, e.g. operations spanning long periods of time or military use of fleets in enemy territory. Such a self-aware, mission-focused and independent fleet of drones may utilise swarm intelligence for a) air-traffic or flight control management, b) obstacle avoidance, c) self-preservation (while maintaining the mission criteria), d) autonomous collaboration with other fleets in the wild, and e) assuring the security, privacy and safety of physical (the drones themselves) and virtual (data, software) assets. In this paper, we investigate the challenges faced by fleets of drones and propose a potential course of action to overcome them.
Detecting malicious code with exact match on collected datasets is becoming a large-scale identification problem due to the existence of new malware variants. Being able to promptly and accurately identify new attacks enables security experts to respond effectively. My proposal is to develop an automated framework for identification of unknown vulnerabilities by leveraging current neural network techniques. This has a significant and immediate value for the security field, as current anti-virus software is typically able to recognize the malware type only after its infection, and preventive measures are limited. Artificial Intelligence plays a major role in automatic malware classification: numerous machine-learning methods, both supervised and unsupervised, have been researched to try classifying malware into families based on features acquired by static and dynamic analysis. The value of automated identification is clear, as feature engineering is both a time-consuming and time-sensitive task, with new malware studied while being observed in the wild.
In content-based security, encrypted content as well as wrapped access keys are made freely available by an Information Centric Network: only those clients which are able to unwrap the encryption key can access the protected content. In this paper we extend this model to computation chains where derived data (e.g. produced by a Named Function Network) also has to comply with the content-based security approach. A central problem to solve is the synchronized on-demand publishing of encrypted results and wrapped keys as well as defining the set of consumers which are authorized to access the derived data. In this paper we introduce "content-attendant policies" and report on a running prototype that demonstrates how to enforce data owner-defined access control policies despite fully decentralized and arbitrarily long computation chains.
Steganography enables users to hide confidential data in any digital medium such that its existence cannot be detected by a third party. Considerable research is being conducted to improve the efficiency of steganography algorithms. Recent trends in computing technology use steganography as an important tool for hiding confidential data. This paper summarizes some of the research work conducted in the field of image steganography in the spatial domain, along with its advantages and disadvantages. Future research work and experimental results of some techniques are also discussed. The key goal is to show the powerful impact of steganography in the information hiding and image processing domains.
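As an illustration of what spatial-domain image steganography means in practice, the following is a minimal sketch of least-significant-bit (LSB) embedding. LSB substitution is assumed here as a representative technique; the abstract does not say which methods the survey covers. The function names `embed_lsb` and `extract_lsb` and the synthetic grayscale `cover` list are hypothetical and chosen only for this example.

```python
def embed_lsb(pixels, message):
    """Hide message bytes in the least significant bit of successive pixel values."""
    # Expand the message into bits, most significant bit of each byte first.
    bits = [(byte >> i) & 1 for byte in message for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for message")
    stego = list(pixels)
    for idx, bit in enumerate(bits):
        # Clear the lowest bit of the pixel, then set it to the message bit.
        stego[idx] = (stego[idx] & ~1) | bit
    return stego


def extract_lsb(pixels, n_bytes):
    """Read n_bytes back from the least significant bits of the pixels."""
    out = bytearray()
    for b in range(n_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (pixels[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)


# Synthetic 64-pixel grayscale "image" (values 0..255), assumed for illustration.
cover = [(37 * i + 11) % 256 for i in range(64)]
secret = b"hi"
stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Each pixel changes by at most one intensity level, which is why this kind of embedding is hard to see by eye, although statistical steganalysis can still detect it.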
In the area of the Internet of Things, cloud-based camera surveillance systems are ubiquitously available for industrial and private environments. However, the sensitive nature of the surveillance use case imposes high requirements on privacy/confidentiality, authenticity, and availability of such systems. In this work, we investigate how currently available mass-market camera systems comply with these requirements. Considering two attacker models, we test the cameras for weaknesses and analyze their implications. We reverse-engineered the security implementation and discovered several vulnerabilities in every tested system. These weaknesses impair the users' privacy and, as a consequence, may also damage the camera system manufacturer's reputation. We demonstrate how an attacker can exploit these vulnerabilities to blackmail users and companies through denial-of-service attacks, by injecting forged video streams, and by eavesdropping on private video data, even without physical access to the device. Our analysis shows that current systems lack, in practice, the necessary care when implementing security for IoT devices.
Android malware has been growing dramatically, as have the diversity and complexity of its development techniques. Machine learning techniques have been applied to detect malware by modeling patterns of static features and dynamic behaviors of malware. The accuracy rates of the machine learning classifiers differ depending on the quality of the features. We increase the quality of the features by relating an app's features to the features that are required to deliver its category's functionality. To measure the benign app references, the features of the top-rated apps in a specific category are utilized to train a malware detection classifier for that given category. Android app stores such as Google Play organize apps into different categories. Each category has its distinct functionalities, which means the apps under a specific category are similar in their static and dynamic features. In other words, benign apps under a certain category tend to share a common set of features. On the contrary, malicious apps tend to have abnormal features, which are uncommon for the category that they belong to. This paper proposes category-based machine learning classifiers to enhance the performance of classification models at detecting malicious apps under a certain category. Extensive machine learning experiments showed that category-based classifiers report a remarkably higher average performance compared to non-category-based ones.
We demonstrate the infrastructure used in the TREC 2015 Total Recall track to facilitate controlled simulation of "assessor in the loop" high-recall retrieval experimentation. The implementation and corresponding design decisions are presented for this platform. This includes the necessary considerations to ensure that experiments are privacy-preserving when using test collections that cannot be distributed. Furthermore, we describe the use of virtual machines as a means of system submission in order to promote replicable experiments while also ensuring the security of system developers and data providers.
Cloud service providers typically adopt the multi-tenancy model to optimize resource usage and achieve the promised cost-effectiveness. Sharing resources between different tenants and the underlying complex technology increase the necessity of transparency and accountability. In this regard, auditing security compliance of the provider's infrastructure against standards, regulations and customers' policies takes on an increasing importance in the cloud to boost the trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from the structural point of view while focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most widely used cloud infrastructure management systems. To show the scalability and validity of our framework, we present our experimental results on assessing several properties related to auditing inter-layer consistency, virtual machine co-residence, and virtual resource isolation.
In international military coalitions, situation awareness is achieved by gathering critical intel from different authorities. Authorities want to retain control over their data, as they are sensitive by nature, and, thus, usually employ their own authorization solutions to regulate access to them. In this paper, we highlight that harmonizing authorization solutions at the coalition level raises many challenges. We demonstrate how we address authorization challenges in the context of a scenario defined by military experts using a prototype implementation of SAFAX, an XACML-based architectural framework tailored to the development of authorization services for distributed systems.
Unlike most social media, where automatic archiving of data is the default, Snapchat defaults to ephemerality: deleting content shortly after it is viewed by a receiver. Interviews with 25 Snapchat users show that ephemerality plays a key role in shaping their practices. Along with friend-adding features that facilitate a network of mostly close relations, default deletion affords everyday, mundane talk and reduces self-consciousness while encouraging playful interaction. Further, although receivers can save content through screenshots, senders are notified; this selective saving with notification supports complex information norms that preserve the feel of ephemeral communication while supporting the capture of meaningful content. This dance of giving and taking, sharing and showing, and agency for both senders and receivers provides the basis for a rich design space of mechanisms, levels, and domains for ephemerality.
In this demo, we will display a smartphone authentication system that can automatically validate every touch interaction made on a smartphone using a smart watch worn by the phone's owner. The IMU sensors on the smart watch monitor the motion of the hand for specific signal characteristics, which are relayed to the phone. If the signal features match certain criteria, then the touch is authenticated and the phone responds appropriately. If not, the phone's screen remains locked/unresponsive to the touch action. The challenge here is to be able to validate every touch gesture within acceptable limits of human perception.
Cyber-physical systems (CPS) are often network integrated to enable remote management, monitoring, and reporting. Such integration has made them vulnerable to cyber attacks originating from an untrusted network (e.g., the internet). Once an attacker breaches the network security, he could corrupt operations of the system in question, which may in turn lead to catastrophes. Hence there is a critical need to detect intrusions into mission-critical CPS. Signature-based detection may not work well for CPS, whose complexity may preclude the succinct signatures that would be needed. Specification-based detection requires accurate definitions of system behaviour that similarly can be hard to obtain, due to the CPS's complexity and dynamics, as well as inaccuracies and incompleteness of design documents or operation manuals. Formal models, to be tractable, are often oversimplified, in which case they will not support effective detection. In this paper, we study a behaviour-based machine learning (ML) approach for intrusion detection. Whereas prior unsupervised ML methods have suffered from high missed detection or false-positive rates, we use a high-fidelity CPS testbed, which replicates all main physical and control components of a modern water treatment facility, to generate systematic training data for a supervised method. The method not only detects the occurrence of a cyber attack at the physical process layer, but also identifies the specific type of the attack. Its detection is fast and robust to noise. Furthermore, its adaptive system model can learn quickly to match the dynamics of the CPS and its operating environment. It exhibits a low false positive (FP) rate, yet high precision and recall.
Explicit non-linear transformations of existing steganalysis features are shown to boost their ability to detect steganography in combination with existing simple classifiers, such as the FLD-ensemble. The non-linear transformations are learned from a small number of cover features using Nyström approximation on pilot vectors obtained with kernelized PCA. The best performance is achieved with the exponential form of the Hellinger kernel, which improves the detection accuracy by up to 2-3% for spatial-domain content-adaptive steganography. Since the non-linear map depends only on the cover source and its learning has a low computational complexity, the proposed approach is a practical and low-cost method for boosting the accuracy of existing detectors built as binary classifiers. The map can also be used to significantly reduce the feature dimensionality (by up to a factor of ten) without performance loss with respect to the non-transformed features.
The ability to identify mobile apps in network traffic has significant implications in many domains, including traffic management, malware detection, and maintaining user privacy. App identification methods in the literature typically use deep packet inspection (DPI) and analyze HTTP headers to extract app fingerprints. However, these methods cannot be used if HTTP traffic is encrypted. We investigate whether Android apps can be identified from their launch-time network traffic using only TCP/IP headers. We first capture network traffic of 86,109 app launches by repeatedly running 1,595 apps on 4 distinct Android devices. We then use supervised learning methods previously used in the web page identification literature to identify the apps that generated the traffic. We find that: (i) popular Android apps can be identified with 88% accuracy by using the packet sizes of the first 64 packets they generate, when the learning methods are trained and tested on data collected from the same device; (ii) when the data from an unseen device (but similar operating system/vendor) is used for testing, the apps can be identified with 67% accuracy; (iii) the app identification accuracy does not drop significantly even if the training data are stale by several days; and (iv) the accuracy does drop quite significantly if the operating system/vendor is very different. We discuss the implications of our findings as well as open issues.
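To make the "packet sizes of the first 64 packets" feature concrete, here is a minimal sketch that turns a launch trace into a fixed-length vector and classifies it with a 1-nearest-neighbour rule. The nearest-neighbour classifier is an assumption standing in for the supervised methods the abstract mentions but does not name, and the app labels, traces, and helper names (`to_feature`, `distance`, `predict`) are hypothetical.

```python
N_PACKETS = 64  # number of leading packets used as features, as in the abstract


def to_feature(packet_sizes, n=N_PACKETS):
    """Truncate or zero-pad a launch trace to exactly n packet sizes."""
    head = list(packet_sizes[:n])
    return head + [0] * (n - len(head))


def distance(a, b):
    """L1 (Manhattan) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def predict(train, trace):
    """Return the label of the training trace closest to the given trace."""
    feat = to_feature(trace)
    label, _ = min(
        ((lbl, distance(f, feat)) for lbl, f in train),
        key=lambda pair: pair[1],
    )
    return label


# Toy labelled launch traces (packet sizes in bytes), invented for illustration.
train = [
    ("app_a", to_feature([60, 1500, 1500, 52, 310])),
    ("app_b", to_feature([60, 60, 420, 52, 52, 980])),
]
guess = predict(train, [60, 1480, 1500, 52, 300])
```

Only header-level information (packet sizes) enters the feature vector, so the same pipeline applies whether or not the payload is encrypted.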
This article deals with color image steganalysis based on machine learning. The proposed approach enriches the features from the Color Rich Model by adding new features obtained by applying steerable Gaussian filters and then computing the co-occurrence of pixel pairs. Adding these new features to those obtained from the Color Rich Model allows us to increase the detectability of hidden messages in color images. The Gaussian filters are angled in different directions to precisely compute the tangent of the gradient vector. Then, the gradient magnitude and the derivative of this tangent direction are estimated. This refined method of estimation enables us to unearth the minor changes that have occurred in the image when a message is embedded. The efficiency of the proposed framework is demonstrated on three steganographic algorithms designed to hide messages in images: S-UNIWARD, WOW, and Synch-HILL. Each algorithm is tested using different payload sizes. The proposed approach is compared to three color image steganalysis methods based on computation features and Ensemble Classifier classification: the Spatial Color Rich Model, the CFA-aware Rich Model and the RGB Geometric Color Rich Model.
Graph reordering is a powerful technique to increase the locality of the representations of graphs, which can be helpful in several applications. We study how the technique can be used to improve compression of graphs and inverted indexes. We extend the recent theoretical model of Chierichetti et al. (KDD 2009) for graph compression, and show how it can be employed for compression-friendly reordering of social networks and web graphs and for assigning document identifiers in inverted indexes. We design and implement a novel, theoretically sound reordering algorithm that is based on recursive graph bisection. Our experiments show a significant improvement in the compression rate of graphs and indexes over existing heuristics. The new method is relatively simple and allows efficient parallel and distributed implementations, which is demonstrated on graphs with billions of vertices and hundreds of billions of edges.
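The link between vertex order and compressed size can be sketched with a simple proxy: if each adjacency list is stored as gaps between consecutive neighbour identifiers, an order that places neighbours close together yields smaller gaps and therefore fewer bits. The cost function below (bit length of each gap) is an assumed stand-in for illustration, not the exact objective of the paper, and it does not implement the recursive bisection algorithm itself; the toy graph and the name `log_gap_cost` are hypothetical.

```python
def log_gap_cost(adjacency, order):
    """Approximate bits needed to gap-encode every adjacency list under an order."""
    pos = {v: i for i, v in enumerate(order)}  # vertex -> new integer identifier
    total = 0
    for neighbours in adjacency.values():
        ids = sorted(pos[v] for v in neighbours)
        # Charge each gap between consecutive neighbour ids its binary length.
        for prev, cur in zip(ids, ids[1:]):
            total += (cur - prev).bit_length()
    return total


# Toy graph with two triangles, {a, b, c} and {x, y, z}.
adjacency = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}

interleaved = log_gap_cost(adjacency, ["a", "x", "b", "y", "c", "z"])
clustered = log_gap_cost(adjacency, ["a", "b", "c", "x", "y", "z"])
```

On this toy graph the interleaved order costs 14 bits and the clustered order 8, which is the kind of gain a reordering algorithm tries to find automatically on large graphs.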
Learning to rank (L2R) algorithms use a labeled training set to generate a ranking model that can be later used to rank new query results. These training sets are very costly and laborious to produce, requiring human annotators to assess the relevance or order of the documents in relation to a query. Active learning (AL) algorithms are able to reduce the labeling effort by actively sampling an unlabeled set and choosing data instances that maximize the effectiveness of a learning function. But AL methods require constant supervision, as documents have to be labeled at each round of the process. In this paper, we propose that certain characteristics of unlabeled L2R datasets allow for an unsupervised, compression-based selection process to be used to create small and yet highly informative and effective initial sets that can later be labeled and used to bootstrap an L2R system. We implement our ideas through a novel unsupervised selective sampling method, which we call Cover, that has several advantages over AL methods tailored to L2R. First, it does not need an initial labeled seed set and can select documents from scratch. Second, selected documents do not need to be labeled as the iterations of the method progress since it is unsupervised (i.e., no learning model needs to be updated). Thus, an arbitrarily sized training set can be selected without human intervention depending on the available budget. Third, the method is efficient and can be run on unlabeled collections containing millions of query-document instances. We run various experiments with two important L2R benchmarking collections to show that the proposed method allows for the creation of small, yet very effective training sets. It achieves full training-like performance with less than 10% of the original sets selected, outperforming the baselines in both effectiveness and scalability.
Compressive sensing is a new technique by which sparse signals are sampled and recovered from a few measurements. To address the disadvantages of traditional space image compression methods, a completely new compression scheme under the compressive sensing framework was developed in this paper. First, in the coding stage, a simple binary measurement matrix was constructed to obtain signal measurements. Second, the input image was divided into small blocks, which were then used as training sets to obtain a dictionary basis for sparse representation with a learning algorithm. Finally, a sparse reconstruction algorithm was used to recover the original input image. Experimental results show that both the compression rate and the image recovery quality of the proposed method are high. Moreover, as the computation cost is very low in the sampling stage, the method is suitable for on-board applications in astronomy.
The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.