Biblio
Recognizing the authenticity of facial expressions is difficult for humans. It is therefore an interesting topic for the computer vision community, as algorithms for estimating facial expression authenticity may serve as indicators of deception. This paper discusses the state-of-the-art methods developed for smile veracity estimation and proposes a plan for developing and validating a novel approach to automated discrimination between genuine and posed facial expressions. The proposed fully automated technique is based on extending high-dimensional Local Binary Patterns (LBP) to the spatio-temporal domain and combining them with the dynamics of facial landmark movements. The technique will be validated on several existing smile databases and on a novel database created with a high-speed camera. Finally, the developed framework will be applied to the detection of deception in real-life scenarios.
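As a rough illustration of the feature side of such a pipeline, the sketch below (Python with scikit-image; the function names, the three-orthogonal-planes choice, and all parameters are illustrative assumptions, not the authors' implementation) pools LBP histograms over the XY, XT and YT planes of a face clip:

```python
# Illustrative sketch only: spatio-temporal LBP (LBP-TOP style) features for a
# grayscale face clip, using scikit-image. Not the paper's implementation.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_top_histogram(clip, points=8, radius=1):
    """clip: (T, H, W) uint8 grayscale face crops -> concatenated histogram."""
    t, h, w = clip.shape
    planes = [clip[t // 2],          # XY plane (appearance)
              clip[:, h // 2, :],    # XT plane (horizontal motion)
              clip[:, :, w // 2]]    # YT plane (vertical motion)
    hists = []
    for plane in planes:
        codes = local_binary_pattern(plane, points, radius, method="uniform")
        hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2),
                               density=True)
        hists.append(hist)
    return np.concatenate(hists)

# Landmark dynamics (e.g. per-frame lip-corner displacements) could be appended
# to this descriptor before classification.
```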
Abstractions make building complex systems possible. Many facilities provided by a modern programming language are designed specifically to build a certain style of abstraction. Abstractions also aim to enhance code reusability, thus enhancing programmer productivity and effectiveness. Real-world software systems can grow to have a complicated hierarchy of abstractions. Often, the hierarchy grows unnecessarily deep, because the programmers have envisioned the most generic use cases for a piece of code in order to make it reusable. Sometimes the abstractions used in the program are not the appropriate ones, and it would be simpler for the higher-level client to circumvent them. Another problem is the impedance mismatch between pieces of code or libraries coming from different projects that were not designed to work together. Interoperability between such libraries is often hindered by abstractions, by design, in the name of hiding implementation details and encapsulation. These problems call for forms of abstraction that are easy to manipulate when needed. In this paper, we describe a powerful mechanism for creating white-box abstractions that encourages flatter abstraction hierarchies and eases manipulation and customization when necessary: program refinement. In doing so, we rely on the basic principle that writing directly in the host programming language is the least restrictive option in terms of expressiveness, and we allow the programmer to reuse and customize existing code snippets to address their specific needs.
Malware detection has been widely studied by analysing either file dropping relationships or characteristics of the file distribution network. This paper, for the first time, studies a global heterogeneous malware delivery graph that fuses file dropping relationships with the topology of the file distribution network. The integration offers a unique ability to structure the end-to-end distribution relationship. However, it requires analysing large heterogeneous graphs. In our study, an average daily generated graph has more than 4 million edges and 2.7 million nodes of different types, such as IPs, URLs, and files. We propose a novel Bayesian label propagation model to unify the multi-source information, including content-agnostic features of the different node types and topological information of the heterogeneous network. Our approach neither examines source code nor inspects the dynamic behaviour of a binary. Instead, it estimates the maliciousness of a given file through a semi-supervised label propagation procedure, which has linear time complexity w.r.t. the number of nodes and edges. An evaluation on 567 million real-world download events validates that the proposed approach detects malware efficiently and with high accuracy.
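For intuition, the following toy sketch shows a generic semi-supervised label propagation loop over a sparse adjacency matrix of the heterogeneous graph; the damping factor, priors and clamping rule are illustrative assumptions rather than the paper's Bayesian model, but the per-iteration cost is linear in the number of edges, matching the complexity claim above:

```python
# Toy sketch of semi-supervised label propagation over a file/URL/IP graph.
# Generic propagation loop, not the paper's Bayesian model.
import numpy as np
import scipy.sparse as sp

def propagate(adj, prior, labeled_mask, labels, alpha=0.85, iters=30):
    """adj: (n, n) sparse adjacency; prior/labels in [0, 1]; returns scores."""
    deg = np.asarray(adj.sum(axis=1)).ravel()
    deg[deg == 0] = 1.0
    trans = sp.diags(1.0 / deg) @ adj              # row-normalised transitions
    score = prior.copy()
    for _ in range(iters):                         # O(iters * |E|) overall
        score = alpha * (trans @ score) + (1 - alpha) * prior
        score[labeled_mask] = labels[labeled_mask] # clamp known benign/malware
    return score
```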
Recent technology shifts such as cloud computing, the Internet of Things, and big data lead to a significant transfer of sensitive data out of trusted edge networks. To counter the resulting privacy concerns, we must ensure that this sensitive data is not inadvertently forwarded to third parties, used for unintended purposes, or handled and stored in violation of legal requirements. Related work proposes to solve this challenge by annotating data with privacy policies before the data leaves the control sphere of its owner. However, we find that existing privacy policy languages are either not flexible enough or require excessive processing, storage, or bandwidth resources, which prevents their widespread deployment. To fill this gap, we propose CPPL, a Compact Privacy Policy Language that compresses privacy policies by taking advantage of flexibly specifiable domain knowledge. Our evaluation shows that CPPL reduces policy sizes by two orders of magnitude compared to related work and can check several thousand policies per second. This allows for individual per-data-item policies in the context of cloud computing, the Internet of Things, and big data.
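The core idea of compressing against shared domain knowledge can be illustrated with a toy encoder: if both sides agree on the vocabulary of purposes and locations in advance, a policy shrinks to a few bits. The sketch below is purely illustrative and is not the actual CPPL encoding:

```python
# Toy illustration: both sides share the vocabulary (domain knowledge), so the
# policy itself is only a small bitfield. Not the real CPPL format.
PURPOSES  = ["billing", "analytics", "research"]   # shared domain knowledge
LOCATIONS = ["EU", "US", "ASIA"]

def encode(allowed_purposes, allowed_locations):
    bits = [p in allowed_purposes for p in PURPOSES] + \
           [l in allowed_locations for l in LOCATIONS]
    value = sum(b << i for i, b in enumerate(bits))
    return value.to_bytes((len(bits) + 7) // 8, "little")   # one byte here

def check(encoded, purpose, location):
    value = int.from_bytes(encoded, "little")
    return bool(value >> PURPOSES.index(purpose) & 1) and \
           bool(value >> (len(PURPOSES) + LOCATIONS.index(location)) & 1)

policy = encode({"billing"}, {"EU"})
assert check(policy, "billing", "EU") and not check(policy, "analytics", "US")
```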
In the last couple of years, organizations have demonstrated an increased willingness to participate in threat intelligence sharing platforms. The open exchange of information and knowledge regarding threats, vulnerabilities, incidents and mitigation strategies results from organizations' growing need to protect against today's sophisticated cyber attacks. To investigate data quality challenges that might arise in threat intelligence sharing, we conducted focus group discussions with ten expert stakeholders from security operations centers of various globally operating organizations. The study addresses several factors affecting shared threat intelligence data quality at multiple levels, including collecting, processing, sharing and storing data. As expected, the study finds that the main factors affecting shared threat intelligence data quality stem from the limitations and complexities associated with integrating and consolidating shared threat intelligence from different sources while ensuring the data's usefulness for an inhomogeneous group of participants. Data quality is extremely important for shared threat intelligence. As our study has shown, there are no fundamentally new data quality issues in threat intelligence sharing. However, as threat intelligence sharing is an emerging domain and a large number of threat intelligence sharing tools are currently being rushed to market, several data quality issues, particularly those related to scalability and data source integration, deserve particular attention.
In this paper, we present Deola, an online system for author entity linking with DBLP. Unlike most existing entity linking systems, which focus on linking entities to Wikipedia and depend largely on Wikipedia-specific features (e.g., Wikipedia articles), Deola links author names appearing in computer science web documents to their corresponding entities in the DBLP network. This task is helpful for enriching the DBLP network and for understanding domain-specific documents, and it is challenging due to name ambiguity and the limited knowledge available in DBLP. Given a fragment of a web document from the computer science domain, Deola returns the corresponding DBLP entity for each author name appearing in the input document.
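A minimal sketch of what such a linking step could look like, assuming a pre-built name index over DBLP and a simple overlap score between a candidate's co-authors/venues and the document text; both the index and the scoring heuristic are assumptions for illustration, not Deola's actual algorithm:

```python
# Toy linking step: retrieve DBLP candidates sharing the surface name, then
# score each candidate by contextual overlap with the input document.
def link_author(mention, document_tokens, dblp_index):
    """dblp_index: name -> list of {id, coauthors, venues} candidate entities."""
    candidates = dblp_index.get(mention, [])
    context = {t.lower() for t in document_tokens}

    def score(candidate):
        evidence = {x.lower() for x in candidate["coauthors"]} | \
                   {x.lower() for x in candidate["venues"]}
        return len(context & evidence)              # shared co-authors/venues

    return max(candidates, key=score, default=None) # None if name is unknown
```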
In settings where data instances are generated sequentially or in a streaming fashion, online learning algorithms can learn predictors using incremental training algorithms such as stochastic gradient descent. In some security applications, such as training anomaly detectors, the data streams may consist of private information or transactions, and the output of the learning algorithms may reveal information about the training data. Differential privacy is a framework for quantifying the privacy risk in such settings. This paper proposes two differentially private strategies to mitigate privacy risk when training a classifier for anomaly detection in an online setting. The first is to use a randomized active learning heuristic to screen out uninformative data points in the stream. The second is to use mini-batching to improve classifier performance. Experimental results show how these two strategies trade off privacy, label complexity, and generalization performance.
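As a hedged illustration of the mini-batching strategy, the sketch below adds Gaussian noise to a clipped mini-batch gradient of a simple logistic model; the model, clipping rule and noise calibration are assumptions chosen for brevity rather than the paper's exact mechanism:

```python
# Illustrative differentially private online update with mini-batching:
# per-example gradients are norm-clipped, averaged, and perturbed with noise.
import numpy as np

def dp_minibatch_update(w, batch_x, batch_y, lr=0.1, clip=1.0, sigma=1.0,
                        rng=np.random.default_rng(0)):
    grads = []
    for x, y in zip(batch_x, batch_y):
        p = 1.0 / (1.0 + np.exp(-x @ w))                     # logistic prediction
        g = (p - y) * x                                      # per-example gradient
        g *= min(1.0, clip / (np.linalg.norm(g) + 1e-12))    # clip gradient norm
        grads.append(g)
    noisy = np.mean(grads, axis=0) + rng.normal(
        0.0, sigma * clip / len(grads), size=w.shape)        # calibrated noise
    return w - lr * noisy
```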
Browser fingerprinting is a widely used technique to uniquely identify web users and to track their online behavior. Until now, different tools have been proposed to protect users against browser fingerprinting. However, these tools have usability restrictions, as they deactivate browser features and plug-ins (like Flash) or the HTML5 canvas element. In addition, all of them provide only limited protection, as they randomize browser settings with unrealistic parameters or have methodical flaws, making them detectable by trackers. In this work, we demonstrate the first anti-fingerprinting strategy that protects against Flash fingerprinting without deactivating it, provides robust and undetectable anti-canvas fingerprinting, and uses a large set of real-world data to hide the actual system and browser properties without losing usability. We discuss the methods and weaknesses of existing anti-fingerprinting tools in detail and compare them to our enhanced strategies. Our evaluation against real-world fingerprinting tools shows successful fingerprinting protection in over 99% of 70,000 browser sessions.
We present the largest kinship recognition dataset to date, Families in the Wild (FIW). Motivated by the lack of a single, unified dataset for kinship recognition, we aim to provide a dataset that captures the interest of the research community. With only a small team, we were able to collect, organize, and label over 10,000 family photos of 1,000 families with our annotation tool, designed to mark complex hierarchical relationships and local label information in a quick and efficient manner. We include several benchmarks for two image-based tasks, kinship verification and family recognition, incorporating several visual features and metric learning methods as baselines. We also demonstrate that a pre-trained Convolutional Neural Network (CNN) used as an off-the-shelf feature extractor outperforms the other feature types. Results were further boosted by fine-tuning two deep CNNs on FIW data: (1) for kinship verification, a triplet loss was learned on top of the pre-trained network; (2) for family recognition, a family-specific softmax classifier was added to the network.
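The kinship-verification fine-tuning step can be sketched as a triplet loss on top of a pre-trained backbone's embeddings, pulling images of related people closer together than unrelated ones. The PyTorch snippet below uses a stand-in embedding head; the architecture, margin and learning rate are illustrative assumptions, not the paper's exact setup:

```python
# Sketch of triplet-loss fine-tuning for kinship verification.
import torch
import torch.nn as nn

embed = nn.Sequential(nn.Flatten(), nn.Linear(512, 128))  # stand-in for a CNN head
loss_fn = nn.TripletMarginLoss(margin=0.5)
opt = torch.optim.SGD(embed.parameters(), lr=1e-3)

def train_step(anchor, positive, negative):
    """anchor/positive come from the same family; negative does not. Shapes: [B, 512]."""
    opt.zero_grad()
    loss = loss_fn(embed(anchor), embed(positive), embed(negative))
    loss.backward()
    opt.step()
    return loss.item()
```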
Recently, proactive systems such as Google Now and Microsoft Cortana have become increasingly popular in reforming the way users access information on mobile devices. In these systems, relevant content is presented to users based on their context without a query in the form of information cards that do not require a click to satisfy the users. As a result, prior approaches based on clicks cannot provide reliable measurements of user satisfaction with such systems. It is also unclear how much of the previous findings regarding good abandonment with reactive Web searches can be applied to these proactive systems due to the intrinsic difference in user intent, the greater variety of content types and their presentations. In this paper, we present the first large-scale analysis of viewing behavior based on the viewport (the visible fraction of a Web page) of the mobile devices, towards measuring user satisfaction with the information cards of the mobile proactive systems. In particular, we identified and analyzed a variety of factors that may influence the viewing behavior, including biases from ranking positions, the types and attributes of the information cards, and the touch interactions with the mobile devices. We show that by modeling the various factors we can better measure user satisfaction with the mobile proactive systems, enabling stronger statistical power in large-scale online A/B testing.
Twitter provides a public streaming API that is strictly limited, making it difficult to simultaneously achieve good coverage and relevance when monitoring tweets for a specific topic of interest. In this paper, we address the tweet acquisition challenge to enhance monitoring of tweets based on client/application needs in an online, adaptive manner, such that the quality and quantity of the results improve over time. We propose a Tweet Acquisition System (TAS) that iteratively selects phrases to track based on an explore-exploit strategy. Our experimental studies show that TAS significantly improves the recall of relevant tweets and that performance improves when the topics are more specific.
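A toy version of an explore-exploit phrase selector in the spirit of TAS is sketched below; the epsilon-greedy rule and the relevance statistics are assumptions for illustration, and the paper's actual strategy may differ:

```python
# Toy explore-exploit phrase selection: mostly track phrases with the best
# observed relevance, but keep sampling new or under-observed phrases.
import random

def select_phrases(stats, candidates, k=5, epsilon=0.2, rng=random.Random(42)):
    """stats: phrase -> (relevant_count, total_count); returns k phrases to track."""
    def score(phrase):
        rel, tot = stats.get(phrase, (0, 0))
        return rel / tot if tot else float("inf")   # never-tried phrases first
    chosen = sorted(candidates, key=score, reverse=True)[:k]
    for i in range(k):                              # occasionally explore at random
        if rng.random() < epsilon:
            chosen[i] = rng.choice(candidates)
    return chosen
```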
Physical Unclonable Functions (PUFs) measure manufacturing variations inside integrated circuits to derive internal secrets at run-time and avoid storing secrets permanently in non-volatile memory. PUF responses are noisy, so they require error correction to generate reliable cryptographic keys. To date, a single key is reproduced in the field whenever it is needed and is always used, regardless of its reliability. In this work, we compute online reliability information for a reproduced key and perform multiple PUF readout and error correction steps in case of an unreliable result. This permits choosing the most reliable key among multiple derived key candidates with different corrected error patterns. With this approach, we achieve the same average key error probability from fewer PUF response bits. Our proof-of-concept design for a popular reference scenario uses Differential Sequence Coding (DSC) and a Viterbi decoder with reliability output information. It requires 39% fewer PUF response bits and 16% fewer helper data bits than the regular approach without the option of multiple readouts.
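The multiple-readout idea can be sketched as a loop that re-derives the key until a candidate's reliability estimate clears a threshold, keeping the best candidate seen so far. The threshold-based stopping rule below is an assumption for illustration; the paper obtains its reliability information from a DSC/Viterbi decoder with soft output:

```python
# Illustrative multiple-readout key derivation: pick the most reliable among
# several PUF readout + error correction attempts.
def derive_reliable_key(readout_and_decode, max_tries=4, threshold=0.999):
    """readout_and_decode() -> (key_bits, reliability estimate in [0, 1])."""
    best_key, best_rel = None, -1.0
    for _ in range(max_tries):
        key, rel = readout_and_decode()   # fresh PUF readout + error correction
        if rel > best_rel:
            best_key, best_rel = key, rel
        if best_rel >= threshold:         # stop early once reliable enough
            break
    return best_key, best_rel
```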
This paper describes our proposed method targeting the MSR Image Recognition Challenge MS-Celeb-1M. The challenge is to recognize one million celebrities from their face images captured in the real world. The challenge provides a large-scale dataset crawled from the Web, which contains a large number of celebrities with many images for each subject. Given a new testing image, the challenge requires an identity for the image along with a corresponding confidence score. To complete the challenge, we propose a two-stage approach consisting of data cleaning and multi-view deep representation learning. The data cleaning effectively reduces the noise level of the training data and thus improves the performance of deep learning based face recognition models. The multi-view representation learning makes the learned face representations more specific and discriminative, which substantially alleviates the difficulty of recognizing faces among a huge number of subjects. Our proposed method achieves a coverage of 46.1% at 95% precision on the random set and a coverage of 33.0% at 95% precision on the hard set of this challenge.
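For reference, the reported metric (coverage at a fixed precision) can be computed from per-image confidences and correctness flags as follows; this is the standard definition rather than code from the paper:

```python
# Coverage at a target precision: the largest fraction of test images that can
# be answered (ranked by confidence) while keeping precision above the target.
import numpy as np

def coverage_at_precision(confidences, correct, target_precision=0.95):
    order = np.argsort(-np.asarray(confidences))          # most confident first
    correct = np.asarray(correct, dtype=float)[order]
    precision = np.cumsum(correct) / np.arange(1, len(correct) + 1)
    covered = np.nonzero(precision >= target_precision)[0]
    return (covered[-1] + 1) / len(correct) if covered.size else 0.0
```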
Cryptocurrencies, such as Bitcoin and 250 similar alt-coins, embody at their core a blockchain protocol: a mechanism for a distributed network of computational nodes to periodically agree on a set of new transactions. Designing a secure blockchain protocol relies on an open challenge in security, that of designing a highly scalable agreement protocol that is resilient to manipulation by byzantine or arbitrarily malicious nodes. Bitcoin's blockchain agreement protocol is secure but does not scale: it processes 3–7 transactions per second at present, irrespective of the available computation capacity at hand. In this paper, we propose a new distributed agreement protocol for permissionless blockchains called ELASTICO. ELASTICO scales transaction rates almost linearly with the computation available for mining: the more computation power in the network, the higher the number of transaction blocks selected per unit time. ELASTICO is efficient in its network messages and tolerates byzantine adversaries controlling up to one-fourth of the total computational power. Technically, ELASTICO uniformly and securely partitions (or parallelizes) the mining network into smaller committees, each of which processes a disjoint set of transactions (or "shards"). While sharding is common in non-byzantine settings, ELASTICO is the first candidate for a secure sharding protocol in the presence of byzantine adversaries. Our scalability experiments on Amazon EC2 with up to 1,600 nodes confirm ELASTICO's theoretical scaling properties.
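The committee-formation idea behind such sharding can be sketched as follows: each node's proof-of-work solution hash (seeded by per-epoch randomness) determines which of the 2^s committees it joins, and committee i is responsible for shard i of the transactions. Epoch randomness handling and the PoW target are simplified assumptions here, not ELASTICO's full protocol:

```python
# Toy sketch of hash-based committee assignment and transaction sharding.
import hashlib

def committee_id(epoch_randomness: bytes, node_pubkey: bytes, nonce: int, s: int = 4):
    digest = hashlib.sha256(epoch_randomness + node_pubkey +
                            nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (2 ** s)   # last s bits pick the committee

def shard_of(tx_hash: bytes, s: int = 4):
    return int.from_bytes(tx_hash, "big") % (2 ** s)  # committee i handles shard i
```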
The IoT will host a large number of co-existing cyber-physical applications. Continuous change, application interference, environment dynamics and uncertainty lead to complex effects which must be controlled to give performance and application guarantees. Application and platform self-configuration and self-awareness are one paradigm for approaching this challenge. They can leverage context knowledge to control platform and application functions and their interaction, and they could play a dominant role in large-scale cyber-physical systems and systems-of-systems, simply because no person can oversee the whole system functionality and dynamics. The IoT adds a new dimension because Internet-based services will increasingly be used in such system functions. Autonomous vehicles accessing cloud services for efficiency and comfort, as well as to reach the required level of safety and security, are an example. Such vehicle platforms will communicate with a service infrastructure that must be reliable and highly responsive. Automated continuous self-configuration of data storage might be a good basis for such services, up to the point where the different self-x strategies might affect each other, in a positive or negative way. This paper contains three contributions from different domains representing the current status of self-aware systems as they will meet in the Internet of Things, and it closes with a short discussion of upcoming challenges.
Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared generally using averaged acoustic distance metrics. They are then applied to automatic recognition of spontaneous and natural emotions under simulated smartphone-recorded noisy conditions. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are found to generally outperform standard noise reduction algorithms.
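A minimal sketch of the recognition back-end described above, using scikit-learn support vector regression for one continuous affect dimension; the random features stand in for the (denoised) acoustic descriptors and the database split, which are assumptions here:

```python
# Support vector regression for continuous arousal/valence prediction.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def train_dimension(features, targets):
    """features: (n, d) acoustic descriptors; targets: arousal or valence values."""
    model = make_pipeline(StandardScaler(), SVR(kernel="linear", C=1.0))
    return model.fit(features, targets)

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 20)), rng.uniform(-1, 1, size=100)
arousal_model = train_dimension(X, y)   # one SVR is trained per affect dimension
```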
Recent incidents have once again brought the topic of encryption to public discourse, while researchers continue to demonstrate attacks that highlight the difficulty of implementing encryption even without the presence of "backdoors". However, apart from the threat of implementation flaws in encryption libraries, another significant threat arises when web services fail to enforce ubiquitous encryption. A recent study explored this phenomenon in popular services, and demonstrated how users are exposed to cookie hijacking attacks with severe privacy implications. Many security mechanisms purport to eliminate this problem, ranging from server-controlled options such as HSTS to user-controlled options such as HTTPS Everywhere and other browser extensions. In this paper, we create a taxonomy of available mechanisms and evaluate how they perform in practice. We design an automated testing framework for these mechanisms, and evaluate them using a dataset of 30 days of HTTP requests collected from the public wireless network of our university's campus. We find that all mechanisms suffer from implementation flaws or deployment issues and argue that, as long as servers continue to not support ubiquitous encryption across their entire domain (including all subdomains), no mechanism can effectively protect users from cookie hijacking and information leakage.
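As an example of the kind of server-side check such a testing framework performs, the sketch below (using the requests library; not the authors' framework) probes whether a domain sends an HSTS header and whether the policy covers subdomains:

```python
# Check a domain's HSTS deployment from its HTTPS response headers.
import requests

def hsts_policy(domain: str, timeout: float = 5.0):
    resp = requests.get(f"https://{domain}/", timeout=timeout)
    header = resp.headers.get("Strict-Transport-Security")
    if header is None:
        return {"hsts": False}
    directives = [d.strip().lower() for d in header.split(";")]
    return {"hsts": True,
            "include_subdomains": "includesubdomains" in directives,
            "preload": "preload" in directives}

# Example: hsts_policy("example.com") -> {"hsts": False} if no header is sent.
```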
This paper builds on prior psychological studies that identify signals of trustworthiness between two human negotiators. Unlike prior work, the current work tracks such signals automatically and fuses them into computational models that predict trustworthiness. To achieve this goal, we apply automatic trackers to recordings of human dyads negotiating in a multi-issue bargaining task. We identify behavioral indicators in different modalities (facial expressions, gestures, gaze, and conversational features) that are predictive of trustworthiness. We predict both objective trustworthiness (i.e., are they honest) and perceived trustworthiness (i.e., do they seem honest to their interaction partner). Our experiments show that people are poor judges of objective trustworthiness (i.e., objective and perceived trustworthiness are predicted by different indicators), and that multimodal approaches better predict objective trustworthiness, whereas people overly rely on facial expressions when judging the honesty of their partner. Moreover, domain knowledge (from the literature and prior analysis of behaviors) facilitates the model development process.
Process-based isolation, suggested by several research prototypes, is a cornerstone of modern browser security architectures. Google Chrome is the first commercial browser to adopt this architecture. Unlike several research prototypes, Chrome's process-based design does not isolate different web origins; it primarily promises to protect "the local system" from "the web". However, as billions of users now use web-based cloud services (e.g., Dropbox and Google Drive) that are integrated into the local system, the premise that browsers can effectively isolate the web from the local system has become questionable. In this paper, we argue that, if process-based isolation disregards the same-origin policy as one of its goals, then its promise of maintaining the "web/local system" separation is doubtful. Specifically, we show that existing memory vulnerabilities in Chrome's renderer can be used as a stepping stone to drop executables/scripts in the local file system, install unwanted applications, and misuse system sensors. These attacks are purely data-oriented and do not alter any control flow or import foreign code. Thus, they bypass binary-level protection mechanisms, including ASLR and in-memory partitioning. Finally, we discuss various full defenses and present a possible way to mitigate the attacks.
With their high penetration rate and relatively good clock accuracy, smartphones are replacing watches in several market segments. Modern smartphones have more than one clock source to complement each other: NITZ (Network Identity and Time Zone), NTP (Network Time Protocol), and GNSS (Global Navigation Satellite System), including GPS. NITZ information is delivered by the cellular core network and indicates the network name and clock information. NTP provides a facility to synchronize the clock with a time server. Among these clock sources, only NITZ and NTP are updated without user interaction, as location services require manual activation. In this paper, we analyze security aspects of these clock sources and their impact on the security features of modern smartphones. In particular, we investigate NITZ and NTP procedures over cellular networks (2G, 3G and 4G) and over Wi-Fi, respectively. Furthermore, we analyze several European, Asian, and American cellular networks from the NITZ perspective. We identify three classes of vulnerabilities: specification issues in a cellular protocol, configuration issues in cellular network deployments, and implementation issues in different mobile OSs. We demonstrate how an attacker with a low-cost setup can spoof NITZ and NTP messages to cause Denial of Service attacks. Finally, we propose methods for securely synchronizing the clock on smartphones.
Utility networks are part of every nation's critical infrastructure, and their protection is now seen as a high-priority objective. In this paper, we propose a threat awareness architecture for critical infrastructures, which we believe will raise security awareness and increase resilience in utility networks. We first describe an investigation of trends and threats that may impose security risks on utility networks, performed on the basis of a viewpoint approach that is capable of identifying technical and non-technical issues (e.g., human behaviour). The result of our analysis indicates that utility networks are strongly affected by technological trends, but that humans constitute an important threat to them. This provides evidence, and confirms, that the protection of utility networks is a multi-variable problem that requires the examination of information stemming from various viewpoints of a network. To accomplish our objective, we propose a systematic threat awareness architecture in the context of a resilience strategy, which ultimately aims at providing and maintaining an acceptable level of security and safety in critical infrastructures. As a proof of concept, we partially demonstrate the application of the proposed threat awareness architecture via a case study, in which we examine the potential impact of social engineering attacks in a European utility company.