Biblio
The Netflix experience is driven by a number of recommendation algorithms: personalized ranking, page generation, similarity, ratings, search, etc. On January 6, 2016 we simultaneously launched Netflix in 130 new countries around the world, which brought the total to over 190 countries. Preparing for such a rapid expansion while ensuring each algorithm was ready to work seamlessly created new challenges for our recommendation and search teams. In this talk, we will highlight the four most interesting challenges we encountered in making our algorithms operate globally and how addressing them improved our ability to connect members worldwide with stories they'll love. In particular, we will dive into the problems of uneven availability across catalogs, balancing personal and cultural tastes, handling language, and tracking quality of recommendations. Uneven catalog availability is a challenge because many recommendation algorithms assume that people could interact with any item and then use the absence of interaction, implicitly or explicitly, as negative information in the model. However, this assumption does not hold globally and across time where item availability differs. Running algorithms globally means needing a notion of location so that we can handle local variations in taste while also providing a good basis for personalization. Language is another challenge in recommending video content because people can typically only enjoy content that has assets (audio, subtitles) in languages they understand. The preferences for how people enjoy such content also vary between people and depend on their familiarity with a language. Also, while we would like our recommendations to work well for every one of our members, tracking quality becomes difficult: with so many members in so many countries speaking so many languages, it can be hard to determine when an algorithm or system is performing sub-optimally for some subset of them. Thus, to support this global launch, we examined each and every algorithm that is part of our service and began to address these challenges.
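As a minimal illustration of the catalog-availability issue described above (a hypothetical sketch, not Netflix's production code; the item names and availability table are invented), a recommender sampling implicit negatives can restrict candidates to items that were actually available to the member at interaction time:

```python
import random

# Hypothetical availability table: (item, country) -> set of days the item
# was in the local catalog. Absence of interaction only counts as a negative
# if the member could actually have played the item.
availability = {
    ("stranger_things", "BR"): set(range(100, 365)),  # added on day 100
    ("stranger_things", "JP"): set(range(0, 365)),
    ("local_drama", "JP"): set(range(0, 365)),
}

catalog = ["stranger_things", "local_drama"]

def sample_negative(member_country, interaction_day, positives, rng=random):
    """Draw a negative item for training, skipping items that were not
    available in the member's country on the day of the interaction."""
    candidates = [
        item for item in catalog
        if item not in positives
        and interaction_day in availability.get((item, member_country), set())
    ]
    return rng.choice(candidates) if candidates else None

# A member in Brazil on day 50: nothing was in the local catalog yet, so no
# item is wrongly treated as an ignored (negative) recommendation.
print(sample_negative("BR", 50, positives=set()))    # None
print(sample_negative("BR", 150, positives=set()))   # 'stranger_things'
```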
An Industrial Control System (ICS) consists of a large number of electronic devices connected to field devices that execute physical processes. The communication network of an ICS supports a wide range of packet-based applications. Growing network security issues and their impact on ICS have highlighted some fundamental risks to critical infrastructure. Addressing network security issues for ICS requires a clear understanding of security-specific defensive countermeasures. Reconnaissance of an ICS network by deep packet inspection (DPI) consists of analyzing the contents of captured packets in order to obtain accurate measures of the underlying process, which specific countermeasures can use to create an aggregated security posture. In this paper we present a novel technique that operates on captured network traffic. The technique is capable of identifying the protocols in use and extracting features for classifying traffic based on network protocol, header information, and payload, in order to understand the architecture of the whole complex system. We also categorize the possible types of attacks on ICS.
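To make the idea concrete, here is a small self-contained sketch (an illustration under assumed packet structures, not the paper's implementation) of identifying an ICS protocol and extracting header/payload features from a captured packet; a real system would parse pcap captures rather than hand-built dicts:

```python
# Well-known ICS ports used as a first-pass protocol identifier.
KNOWN_PORTS = {502: "Modbus/TCP", 20000: "DNP3", 44818: "EtherNet/IP"}

def identify_protocol(pkt):
    """Guess the application protocol from the TCP port, falling back to a
    simple payload signature (Modbus/TCP carries protocol id 0x0000 at
    bytes 2-3 of its header)."""
    proto = KNOWN_PORTS.get(pkt["dst_port"])
    if proto is None and len(pkt["payload"]) >= 4 and pkt["payload"][2:4] == b"\x00\x00":
        proto = "Modbus/TCP (signature)"
    return proto or "unknown"

def extract_features(pkt):
    """Header + payload features usable by a downstream traffic classifier."""
    return {
        "protocol": identify_protocol(pkt),
        "src": pkt["src_ip"], "dst": pkt["dst_ip"],
        "payload_len": len(pkt["payload"]),
        # Modbus function code sits at byte offset 7 of the payload.
        "function_code": pkt["payload"][7] if len(pkt["payload"]) > 7 else None,
    }

# A pre-parsed Modbus/TCP "read holding registers" request (illustrative).
pkt = {"src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 502,
       "payload": bytes([0, 1, 0, 0, 0, 6, 17, 3, 0, 107, 0, 3])}
print(extract_features(pkt))
```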
Mutex locks have traditionally been the most common mechanism for protecting shared data structures in parallel programs. However, the robustness of such locks against process failures has not been studied thoroughly. Most (user-level) mutex algorithms are designed around the assumption that processes are reliable, meaning that a process may not fail while executing the lock acquisition and release code, or while inside the critical section. If such a failure does occur, then the liveness properties of a conventional mutex lock may cease to hold until the application or operating system intervenes by cleaning up the internal structure of the lock. For example, a process that is attempting to acquire an otherwise starvation-free mutex may be blocked forever waiting for a failed process to release the critical section. Adding to the difficulty, if the failed process recovers and attempts to acquire the same mutex again without appropriate cleanup, then the mutex may become corrupted to the point where it loses safety, notably the mutual exclusion property. We address this challenge by formalizing the problem of recoverable mutual exclusion, and proposing several solutions that vary both in their assumptions regarding hardware support for synchronization, and in their time complexity. Compared to known solutions, our algorithms are more robust as they do not restrict where or when a process may crash, and provide stricter guarantees in terms of time complexity, which we define in terms of remote memory references.
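The core idea can be illustrated with a toy sketch (a didactic illustration, not one of the paper's algorithms, which rely on hardware atomic operations and bound remote memory references): the lock word records its owner, so a process that crashed inside its critical section can recognize this on restart and safely re-enter for cleanup instead of deadlocking or corrupting the lock:

```python
import threading

class RecoverableLock:
    """Toy illustration of recoverable mutual exclusion: the lock word
    persistently records the owner, so a recovering process can detect that
    it crashed while holding the lock and re-enter its critical section."""

    def __init__(self):
        self._owner = None              # persistent lock word: survives a crash
        self._guard = threading.Lock()  # stands in for hardware atomics

    def acquire(self, pid):
        while True:
            with self._guard:
                if self._owner is None:   # free: normal acquisition
                    self._owner = pid
                    return "acquired"
                if self._owner == pid:    # we crashed while holding it:
                    return "recovered"    # re-enter to clean up, no deadlock
            # otherwise retry (a real algorithm bounds remote memory references)

    def release(self, pid):
        with self._guard:
            assert self._owner == pid
            self._owner = None

lock = RecoverableLock()
print(lock.acquire(pid=7))   # acquired
# ... process 7 "crashes" here without releasing, then restarts ...
print(lock.acquire(pid=7))   # recovered: detects its own unreleased hold
lock.release(pid=7)
```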
Software applications run on a variety of platforms (filesystems, virtual slices, mobile hardware, etc.) that do not provide 100% uptime. As such, these applications may crash at any unfortunate moment losing volatile data and, when re-launched, they must be able to correctly recover from potentially inconsistent states left on persistent storage. From a verification perspective, crash recovery bugs can be particularly frustrating because, even when it has been formally proved for a program that it satisfies a property, the proof is foiled by these external events that crash and restart the program. In this paper we first provide a hierarchical formal model of what it means for a program to be crash recoverable. Our model captures the recoverability of many real world programs, including those in our evaluation which use sophisticated recovery algorithms such as shadow paging and write-ahead logging. Next, we introduce a novel technique capable of automatically proving that a program correctly recovers from a crash via a reduction to reachability. Our technique takes an input control-flow automaton and transforms it into an encoding that blends the capture of snapshots of pre-crash states into a symbolic search for a proof that recovery terminates and every recovered execution simulates some crash-free execution. Our encoding is designed to enable one to apply existing abstraction techniques in order to do the work that is necessary to prove recoverability. We have implemented our technique in a tool called Eleven82, capable of analyzing C programs to detect recoverability bugs or prove their absence. We have applied our tool to benchmark examples drawn from industrial file systems and databases, including GDBM, LevelDB, LMDB, PostgreSQL, SQLite, VMware and ZooKeeper. Within minutes, our tool is able to discover bugs or prove that these fragments are crash recoverable.
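To illustrate the kind of recovery protocol such a tool reasons about, here is a minimal write-ahead-logging sketch (a simplified, assumed example, not code from the paper or from the systems evaluated): the update is made durable in a log before the data file is modified, so replaying the log on restart makes every recovered execution match some crash-free execution:

```python
import json, os

LOG, DATA = "wal.log", "data.json"

def load():
    return json.load(open(DATA)) if os.path.exists(DATA) else {}

def commit(key, value):
    with open(LOG, "w") as f:             # 1. log the intent...
        json.dump({"key": key, "value": value}, f)
        f.flush(); os.fsync(f.fileno())   # ...durably (a crash may hit anywhere)
    db = load(); db[key] = value
    with open(DATA + ".tmp", "w") as f:   # 2. apply via atomic rename
        json.dump(db, f); f.flush(); os.fsync(f.fileno())
    os.replace(DATA + ".tmp", DATA)
    os.remove(LOG)                        # 3. update applied; retire the log

def recover():
    """Run on restart: redo any logged-but-unretired update (idempotent)."""
    if not os.path.exists(LOG):
        return
    try:
        rec = json.load(open(LOG))
    except ValueError:                    # torn write: the update never
        os.remove(LOG); return            # committed, so discard it
    db = load(); db[rec["key"]] = rec["value"]
    with open(DATA + ".tmp", "w") as f:
        json.dump(db, f); f.flush(); os.fsync(f.fileno())
    os.replace(DATA + ".tmp", DATA)
    os.remove(LOG)

recover()                 # always recover first
commit("balance", 42)
print(load())             # {'balance': 42} regardless of crashes between steps
```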
Provenance for transactional updates is critical for many applications such as auditing and debugging of transactions. Recently, we introduced MV-semirings, an extension of the semiring provenance model that supports updates and transactions. Furthermore, we proposed reenactment, a declarative form of replay with provenance capture, as an efficient and non-invasive method for computing this type of provenance. However, this approach is limited to the snapshot isolation (SI) concurrency control protocol, while many real-world applications apply the read committed version of snapshot isolation (RC-SI) to improve performance at the cost of consistency. We present non-trivial extensions of the model and the reenactment approach that compute the provenance of RC-SI transactions efficiently. In addition, we develop techniques for applying reenactment across multiple RC-SI transactions. Our experiments demonstrate that our implementation in the GProM system supports efficient reconstruction and querying of provenance.
This paper describes the ASPIRE reference architecture, designed to tackle one major problem in the software protection domain: the lack of a clear process and an open software architecture for the composition and deployment of multiple software protections on software applications.
The isomorphism of polynomials with two secrets (IP2S) problem is a candidate computational assumption for post-quantum cryptography. The only identification scheme based on IP2S was introduced in 1996 by Patarin. However, the security of the scheme has never been formally proven, and we discover that the originally proposed parameters are no longer secure in light of the most recent research. In this paper, we present the first formal security proof of the identification scheme based on IP2S against impersonation under passive attack, sequential active attack, and concurrent active attack. We propose new secure parameters and methods to reduce the implementation cost. Using the proposed methods, we are able to cut the storage cost and average communication cost so drastically that the scheme is implementable even on the lightweight devices in the current market.
Modern Systems-on-Chip (SoCs) frequently power off individual voltage domains to save leakage power across a variety of applications, from large-scale heterogeneous computing to ultra-low-power systems in IoT applications. However, the considerable energy stored within the capacitance of the powered-off domain is lost through leakage. In this paper, we present an approach that leverages existing voltage regulators to recover this energy from the disabled voltage domain back into the supply using a low-overhead, all-digital runtime control system. Simulation experiments conducted in an industrial 65nm CMOS process indicate that over 90% of the stored energy can be recovered across a range of system operating voltages from 0.4V to 1V.
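For scale, the recoverable energy follows directly from the capacitor energy formula; the domain capacitance below is an assumed illustrative value, not a figure from the paper:

```latex
% Illustrative only: C = 10 nF is an assumed domain capacitance.
E_{\text{stored}} = \tfrac{1}{2} C V^2 = \tfrac{1}{2}(10\,\text{nF})(1\,\text{V})^2 = 5\,\text{nJ},
\qquad
E_{\text{recovered}} \geq 0.9 \times 5\,\text{nJ} = 4.5\,\text{nJ per power-off event.}
```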
By reflecting the degree of proximity or remoteness of documents, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent from each other, and document similarity is computed based on lexical overlap. This assumption is also made in the bag-of-concepts representation of documents, where the space is formed by concepts. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
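One plausible instantiation of such a measure (a minimal sketch, not necessarily one of the paper's five measures) replaces the identity matrix implicit in cosine similarity with a concept-concept relatedness matrix, so related but non-identical concepts contribute to similarity:

```python
import numpy as np

# Concepts: [football, soccer, piano]. S holds pairwise relatedness, e.g.
# derived from Wikipedia; the values here are illustrative assumptions.
S = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

def soft_cosine(a, b, S):
    """Cosine similarity in a non-orthogonal space: a.T @ S @ b replaces the
    plain dot product, so related (non-identical) concepts still match."""
    return (a @ S @ b) / np.sqrt((a @ S @ a) * (b @ S @ b))

doc1 = np.array([1.0, 0.0, 0.0])   # mentions only 'football'
doc2 = np.array([0.0, 1.0, 0.0])   # mentions only 'soccer'

print(np.dot(doc1, doc2))          # 0.0 -- orthogonality: no lexical overlap
print(soft_cosine(doc1, doc2, S))  # 0.9 -- relatedness recovers the similarity
```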
Different wireless Peer-to-Peer (P2P) routing protocols rely on cooperative protocols of interaction among peers, yet most of the surveyed protocols provide little detail on how peers can take other peers' reliability into consideration to improve routing efficiency in collaborative networks. Previous research has shown that in most trust and reputation evaluation schemes, the peers' rating behaviour can be improved to include the peers' attributes for understanding peers' reliability. This paper proposes a reliability-based trust model for dynamic trust evaluation between peers in P2P networks for collaborative routing. Since the peers' routing attributes vary dynamically, our proposed model must also accommodate the dynamic changes of peers' attributes and behaviour. We introduce peers' buffers as a scaling factor for peers' trust evaluation in trust and reputation routing protocols. A simulation-based comparison between the reliability-based and non-reliability-based trust models shows the improved performance of our proposed model in terms of delivery ratio and average message latency.
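A minimal sketch of how a buffer can scale trust (the blending weights and formula are illustrative assumptions, not the paper's exact model): a peer's trust blends direct delivery experience with reputation, scaled by its free buffer capacity, since a congested peer is an unreliable next hop even when honest:

```python
def update_trust(direct, reputation, buffer_free_ratio, alpha=0.7):
    """All inputs in [0, 1]; the buffer acts as a scaling factor."""
    base = alpha * direct + (1 - alpha) * reputation
    return base * buffer_free_ratio

class Peer:
    def __init__(self, pid, buffer_size):
        self.pid, self.buffer_size, self.queued = pid, buffer_size, 0
        self.delivered = self.forwarded = 0

    def direct_trust(self):
        # Observed delivery success; neutral 0.5 before any interaction.
        return self.delivered / self.forwarded if self.forwarded else 0.5

    def trust(self, reputation):
        free = 1 - self.queued / self.buffer_size
        return update_trust(self.direct_trust(), reputation, free)

p = Peer("n42", buffer_size=100)
p.forwarded, p.delivered, p.queued = 20, 18, 60   # 90% delivery, 40% buffer free
print(round(p.trust(reputation=0.8), 3))          # 0.348: congestion lowers trust
```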
Security of embedded devices is a timely and important issue, due to the proliferation of these devices into numerous and diverse settings, as well as their growing popularity as attack targets, especially via remote malware infestations. One important defense mechanism is remote attestation, whereby a trusted, and possibly remote, party (verifier) checks the internal state of an untrusted, and potentially compromised, device (prover). Despite much prior work, remote attestation remains a vibrant research topic. However, most attestation schemes naturally focus on the scenario where the verifier is trusted and the prover is not. The opposite setting, where the prover is benign and the verifier is malicious, has been side-stepped. To this end, this paper considers the issue of prover security, including: verifier impersonation, denial-of-service (DoS) and replay attacks, all of which result in unauthorized invocation of attestation functionality on the prover. We argue that protection of the prover from these attacks must be treated as an important component of any remote attestation method. We formulate a new roaming adversary model for this scenario and present the trade-offs involved in countering this threat. We also identify new features and methods needed to protect the prover with minimal additional requirements.
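The following sketch illustrates the flavor of prover-side protections discussed (an assumed example, not the paper's protocol): before invoking attestation, the prover authenticates the request, checks freshness against a monotonic counter, and rate-limits requests:

```python
import hmac, hashlib, time

KEY = b"shared-verifier-key"     # provisioned secret (assumption)
MIN_INTERVAL = 1.0               # seconds between attestations (assumption)

class Prover:
    def __init__(self):
        self.last_counter = 0
        self.last_time = 0.0

    def handle_request(self, counter, tag):
        # (1) authenticity: request must be MAC'd with the verifier key
        expected = hmac.new(KEY, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return "reject: bad MAC (verifier impersonation)"
        # (2) freshness: a monotonic counter defeats replayed requests
        if counter <= self.last_counter:
            return "reject: stale counter (replay)"
        # (3) rate limit: bounds attestation-based denial-of-service
        now = time.monotonic()
        if now - self.last_time < MIN_INTERVAL:
            return "reject: rate limit (DoS)"
        self.last_counter, self.last_time = counter, now
        return "attest: run measurement and respond"

p = Prover()
tag = hmac.new(KEY, (1).to_bytes(8, "big"), hashlib.sha256).digest()
print(p.handle_request(1, tag))   # accepted
print(p.handle_request(1, tag))   # rejected as replay
```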
We present ReproZip, the recommended packaging tool for the SIGMOD Reproducibility Review. ReproZip was designed to simplify the process of making an existing computational experiment reproducible across platforms, even when the experiment was put together without reproducibility in mind. The tool creates a self-contained package for an experiment by automatically tracking and identifying all its required dependencies. The researcher can share the package with others, who can then use ReproZip to unpack the experiment, reproduce the findings on their favorite operating system, as well as modify the original experiment for reuse in new research, all with little effort. The demo will consist of examples of non-trivial experiments, showing how these can be packed in a Linux machine and reproduced on different machines and operating systems. Demo visitors will also be able to pack and reproduce their own experiments.
Based on Storm, a distributed, reliable, fault-tolerant, real-time data stream processing system, we propose a web intrusion detection and recognition system. The system is based on machine learning, using TF-IDF (Term Frequency-Inverse Document Frequency) for feature selection and an optimised cosine similarity algorithm, to detect attacks and malicious behavior in real time at a low false positive rate and a high detection rate, protecting the security of user data. Comparative experimental analysis shows that the system improves the intrusion recognition rate and reduces the false positive rate to some extent, and can thus better carry out intrusion detection.
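The detection core can be sketched on a single machine as follows (the Storm topology, real training data, and threshold are assumed or omitted; this is an illustration, not the paper's system): requests are vectorized with TF-IDF over character n-grams and compared to known attack payloads by cosine similarity:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative library of known attack payloads.
attacks = [
    "GET /index.php?id=1 UNION SELECT username,password FROM users",
    "GET /search?q=<script>alert(document.cookie)</script>",
    "GET /../../../../etc/passwd",
]
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))
attack_vecs = vectorizer.fit_transform(attacks)

def is_intrusion(request, threshold=0.35):
    """Flag a request whose similarity to any known attack exceeds threshold
    (the threshold is an assumption; a deployment would tune it)."""
    sims = cosine_similarity(vectorizer.transform([request]), attack_vecs)
    return sims.max() > threshold, float(sims.max())

print(is_intrusion("GET /items?id=1 UNION SELECT pass FROM users"))  # near SQLi payload
print(is_intrusion("GET /home/welcome.html"))                        # shares little with any attack
```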
Expressing and matching each participant's security policy accurately is a precondition for constructing a secure service composition. Most existing schemes use syntactic approaches to represent and match security policies during the service composition process, which is prone to false negatives because of the lack of semantics. In this paper, a novel semantics-based approach is proposed to express and match security policies in service composition. By constructing a general security ontology, we present the definition method and matching algorithm of the semantic security policy for service composition, translating the policy matching problem into a subsumption reasoning problem over semantic concepts. Both theoretical analysis and experimental evaluation show that the proposed approach captures the necessary semantic information in the representation of policies and effectively improves the accuracy of matching results, thereby overcoming the deficiency of syntactic approaches. It also simplifies the definition and management of policies, providing a more effective solution for building secure service compositions based on security policies.
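A toy sketch of the matching idea (the miniature ontology below is an invented illustration; the paper uses a full security ontology and description-logic reasoning): a required concept is satisfied by any offered concept it subsumes, a match that pure syntactic string comparison would miss:

```python
IS_A = {                       # child -> parent in a tiny security ontology
    "AES256": "SymmetricEncryption",
    "SymmetricEncryption": "Encryption",
    "RSA": "AsymmetricEncryption",
    "AsymmetricEncryption": "Encryption",
}

def subsumed_by(concept, ancestor):
    """True if `ancestor` subsumes `concept` via the is-a hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False

def policies_match(required, offered):
    """Each required concept must subsume some offered concept."""
    return all(any(subsumed_by(o, r) for o in offered) for r in required)

# Syntactic matching rejects this pair (no identical strings), but
# semantically AES256 satisfies a policy that merely requires Encryption.
print(policies_match(required={"Encryption"}, offered={"AES256"}))   # True
print(policies_match(required={"AES256"}, offered={"RSA"}))          # False
```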
To realize accurate fault positioning and recognition in analog circuits, feature extraction from fault information is an extremely important step. Our approach uses an experimental circuit with designed failure modes to collect a fault sample set. We have chosen two methods for feature extraction from the fault data: one combines the wavelet transform with principal component analysis, and the other uses factor analysis. We then use an extreme learning machine to train on and diagnose the data, comparing the performance of the two methods through diagnostic accuracy. The experimental results show that, after processing with either of these two methods, the data obtained from the experimental circuit can quickly yield the fault location.
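The first method can be sketched as follows (synthetic signals and illustrative parameter choices; not the paper's circuit data or settings): wavelet sub-band energies form a compact fault signature that PCA then compresses before classification:

```python
import numpy as np
import pywt                                  # PyWavelets
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

def fault_signal(freq, n=256):
    """Synthetic stand-in for a sampled circuit response."""
    t = np.linspace(0, 1, n)
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(n)

def wavelet_energy_features(signal, wavelet="db4", level=3):
    """Energy of each wavelet sub-band: a compact fault signature."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

# 20 samples each of a 'nominal' and a 'faulty' response mode.
X = np.array([wavelet_energy_features(fault_signal(f))
              for f in [5.0] * 20 + [12.0] * 20])

pca = PCA(n_components=2)        # reduce sub-band energies to 2 features
features = pca.fit_transform(X)  # these would feed the extreme learning machine
print(features.shape)            # (40, 2)
```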
In order to assure the safety of the Chinese Train Control System (CTCS), it is necessary to ensure that the operational risk is acceptable throughout its life-cycle, which requires a pragmatic risk assessment for effective risk control. Many risk assessment techniques currently used in the railway domain are qualitative and rely on the experience of experts, which unavoidably brings in subjective judgements. This paper presents a method that combines fuzzy reasoning and the analytic hierarchy process to quantify the experience of experts and obtain scores for the risk parameters. Fuzzy reasoning is used to obtain the risk of each system hazard, while the analytic hierarchy process is used to determine the risk level (RL) of the system and its membership degree. This method helps safety analysts calculate the overall collective risk level of a system. A case study of risk assessment for the CTCS demonstrates that this method can give quantitative results for collective risk without requiring much information from experts, and can support risk assessment with a risk level and its membership, which are more valuable for guiding further risk management.
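The two ingredients can be sketched as follows (all comparison values, scores, and membership functions are illustrative assumptions, not the paper's calibrated values):

```python
import numpy as np

# 1. AHP: derive risk-parameter weights (e.g. severity, frequency,
#    detectability) from an expert pairwise-comparison matrix via its
#    principal eigenvector.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
eigvals, eigvecs = np.linalg.eig(A)
w = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
weights = w / w.sum()                       # normalized parameter weights

# 2. Fuzzy reasoning: triangular membership functions turn a crisp weighted
#    score into degrees of membership in linguistic risk levels.
def tri(x, a, b, c):
    return max(min((x - a) / (b - a), (c - x) / (c - b)), 0.0)

scores = np.array([7.0, 4.0, 6.0])          # expert scores per parameter (0-10)
risk = float(weights @ scores)
memberships = {lvl: tri(risk, *abc) for lvl, abc in
               {"tolerable": (0, 2.5, 5), "undesirable": (2.5, 5, 7.5),
                "intolerable": (5, 7.5, 10)}.items()}
print(round(risk, 2), memberships)          # risk level plus its membership
```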
Implementation attacks, and more specifically Power Analysis (PA) (the dominant type of side-channel attack) and fault injection (FA) attacks, constitute a pragmatic hazard for scalar multiplication, the main operation behind Elliptic Curve Cryptography. A wide variety of countermeasures attempt to thwart such attacks; however, few of them explore the potential of alternative number systems like the Residue Number System (RNS). In this paper, we explore the potential of RNS as a PA-FA countermeasure, propose a PA-FA-resistant scalar multiplication algorithm, and provide an extensive security analysis against the most effective PA-FA techniques. We argue through this security analysis that combining traditional PA-FA countermeasures with lightweight RNS countermeasures can provide strong PA-FA resistance.
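To make the RNS machinery concrete, here is a toy sketch (tiny moduli for readability; the randomized channel ordering is one generic flavor of RNS countermeasure, not the paper's specific algorithm): an integer is split into residues over pairwise-coprime moduli, arithmetic proceeds channel-wise, and the result is reconstructed via the Chinese Remainder Theorem:

```python
import random
from math import prod

MODULI = (13, 17, 19, 23)               # pairwise coprime, M = 96577

def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)

def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem reconstruction."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)    # pow(Mi, -1, m): modular inverse
    return x % M

def rns_mul(a, b, moduli=MODULI):
    order = list(range(len(moduli)))
    random.shuffle(order)               # randomized channel order per operation
    out = [0] * len(moduli)
    for i in order:                     # independent, decorrelated channels
        out[i] = (a[i] * b[i]) % moduli[i]
    return tuple(out)

a, b = to_rns(1234), to_rns(56)
print(from_rns(rns_mul(a, b)))          # 69104 == 1234 * 56
```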
At a very high level of abstraction, the Internet of Things (IoT) can be modeled as a hyper-scale, hyper-complex cyber-physical system. Studying the resilience of IoT systems is the first step towards engineering the future IoT eco-systems. Exploration of this domain is a highly promising avenue for many aspiring Ph.D. and M.Sc. students.