Biblio
Software development is shifting from large single-server systems running on powerful machines to multi-server, distributed systems. This change introduces a new challenge for software quality measurement, since current analysis methods for single-server software cannot observe and assess the correlation among components residing on different nodes. In this paper, a new dynamic cohesion approach is proposed for distributed systems. We extend the Calling Network model to distributed systems by differentiating the methods of components deployed on different nodes. By extending the cohesion metric of single-server software systems, we propose two new cohesion metrics that describe correlation at the component level. Experiments conducted on a distributed system, Netflix RSS Reader, show how to trace system functions carried out across three nodes, how to abstract dynamic behaviors among nodes using our model, and how to evaluate software cohesion in a distributed system.
Cloud computing is rapidly reshaping the server administration landscape. The widespread use of virtualization and the increasingly high server consolidation ratios, in particular, have introduced unprecedented security challenges for users, increasing the exposure to intrusions and opening up new opportunities for attacks. Deploying security mechanisms in the hypervisor to detect and stop intrusion attempts is a promising strategy to address this problem. Existing hypervisor-based solutions, however, are typically limited to very specific classes of attacks and introduce exceedingly high performance overhead for production use. In this paper, we present Slick (Storage-Level Intrusion ChecKer), an intrusion detection system (IDS) for virtualized storage devices. Slick detects intrusion attempts by efficiently and transparently monitoring write accesses to critical regions on storage devices. The low-overhead monitoring component operates entirely inside the hypervisor, with no introspection or modifications required in the guest VMs. Using Slick, users can deploy generic IDS rules to detect a broad range of real-world intrusions in a flexible and practical way. Experimental results confirm that Slick is effective at enhancing the security of virtualized servers, while imposing less than 5% overhead in production.
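For intuition, a minimal C++ sketch of the kind of storage-level rule check such a monitor might apply (the structure and names are our illustration, not Slick's actual rule format): each rule marks a critical byte range on the virtual disk, and the hypervisor-side monitor flags any guest write that overlaps one.

    #include <cstdint>
    #include <vector>

    // An IDS rule protecting a critical byte range on the storage device,
    // e.g. the boot sector or the blocks of a system binary.
    struct Rule {
        uint64_t start;  // offset of the protected region
        uint64_t len;    // length of the protected region
    };

    // Called from the hypervisor's write path: returns true if the guest
    // write [off, off + len) overlaps any protected region.
    bool write_is_suspicious(const std::vector<Rule>& rules,
                             uint64_t off, uint64_t len) {
        for (const auto& r : rules)
            if (off < r.start + r.len && r.start < off + len)  // intervals overlap
                return true;
        return false;
    }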
Embedded virtualization has emerged as a valuable way to reduce costs, improve software quality, and decrease design time. In addition, virtualization can strengthen the overall system's security from several perspectives. One is security through separation, where the hypervisor ensures that one domain cannot compromise the execution of other domains. At the same time, advances in the development of IoT applications have opened discussions about the security flaws introduced by IoT devices. In a few years, billions of these devices will be connected to the cloud, exchanging information. This is an opportunity for attackers to exploit their vulnerabilities, endangering applications connected to such devices. At this point, it is natural to consider virtualization as a possible approach to IoT security. In this paper we discuss how embedded virtualization could be deployed on IoT devices as a sound solution for security.
Cloud service providers typically adopt the multi-tenancy model to optimize resource usage and achieve the promised cost-effectiveness. Sharing resources between different tenants, together with the complexity of the underlying technology, increases the need for transparency and accountability. In this regard, auditing the security compliance of the provider's infrastructure against standards, regulations, and customers' policies takes on increasing importance in the cloud as a way to strengthen trust between the stakeholders. However, virtualization and scalability make compliance verification challenging. In this work, we propose an automated framework that allows auditing the cloud infrastructure from a structural point of view, focusing on virtualization-related security properties and consistency between multiple control layers. Furthermore, to show the feasibility of our approach, we integrate our auditing system into OpenStack, one of the most widely used cloud infrastructure management systems. To show the scalability and validity of our framework, we present experimental results on assessing several properties related to inter-layer consistency, virtual machine co-residence, and virtual resource isolation.
This paper presents SplitBox, an efficient system for privacy-preserving processing of network functions that are outsourced as software processes to the cloud. Specifically, the cloud providers processing the network functions do not learn the network policies that instruct how the functions are to be processed. First, we propose an abstract model of a generic network function based on match-action pairs. We assume that this function is processed in a distributed manner by multiple honest-but-curious cloud service providers. Then, we introduce our SplitBox system for private network function virtualization and present a proof-of-concept implementation on FastClick, an extension of the Click modular router, using a firewall as a use case. In our experiments, SplitBox achieves a throughput of over 2 Gbps with 1 kB-sized packets on average, traversing up to 60 firewall rules.
Network monitoring is vital to the administration and operation of networks, but it requires privileged access that only highly trusted parties are granted. This severely limits the opportunity for external parties, such as service or equipment providers, auditors, or even clients, to measure the health or operation of a network in which they are stakeholders, but do not have access to its internal structure. In this position paper we propose the use of middleboxes to open up network monitoring to external parties using privacy-preserving technology. This will allow distrusted parties to make more inferences about the network state than currently possible, without learning any precise information about the network or the data that crosses it. Thus the state of the network will be more transparent to external stakeholders, who will be empowered to verify claims made by network operators. Network operators will be able to provide more information about their network without compromising security or privacy.
Screen locks are vulnerable to shoulder surfing: passwords, personal identification numbers (PINs), and unlock patterns can be observed when smartphones are used in public, even though these devices store sensitive information and are frequently used in public spaces. In this paper, we propose a new method that combines passwords with a form of biometric authentication that cannot be observed by shoulder surfing and is difficult to defeat with brute-force attacks. In our method, sensors measure the motion of a finger while the user operates a mobile device, and this motion, which carries characteristics unique to the user, is registered. The registered characteristics are classified by learning with self-organizing maps, and users are identified by consulting the self-organizing maps when they enter passwords on their devices.
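The abstract does not detail the training procedure; for reference, the standard self-organizing map weight update, which such a scheme would presumably build on, is

    w_j(t+1) \;=\; w_j(t) \;+\; \eta(t)\, h_{c(x),\,j}(t)\, \bigl( x - w_j(t) \bigr)

where $x$ is the input motion feature vector, $c(x)$ the best-matching unit, $\eta(t)$ the learning rate, and $h_{c(x),j}(t)$ the neighborhood function centered on $c(x)$.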
Fixing a non-deadlock concurrency bug is a difficult job that sometimes introduces additional bugs and requires a long time. To overcome this difficulty and efficiently perform fixing jobs, engineers should have broad knowledge of various fix patterns, and the ability to select the most proper one among those patterns based on quantitative data gathered from real-world bug databases. In this paper, we provide a real-world characteristic study on the fixes of non-deadlock concurrency bugs to help engineers responsible for program maintenance. In particular, we examine various fix patterns and the factors that influence the selection of those patterns with respect to the preexistence of locks and failure types. Our results will provide useful information for engineers who write bug patches, and researchers who study efficient testing and fixing techniques.
Formal specification is a vital ingredient to scalable verification of software systems. In the case of efficient implementations of concurrent objects like atomic registers, queues, and locks, symbolic formal representations of their abstract data types (ADTs) enable efficient modular reasoning, decoupling clients from implementations. Writing adequate formal specifications, however, is a complex task requiring rare expertise. In practice, programmers write reference implementations as informal specifications. In this work we demonstrate that effective symbolic ADT representations can be automatically generated from the executions of reference implementations. Our approach exploits two key features of naturally-occurring ADTs: violations can be decomposed into a small set of representative patterns, and these patterns manifest in executions with few operations. By identifying certain algebraic properties of naturally-occurring ADTs, and exhaustively sampling executions up to a small number of operations, we generate concise symbolic ADT representations which are complete in practice, enabling the application of efficient symbolic verification algorithms without the burden of manual specification. Furthermore, the concise ADT violation patterns we generate are human-readable, and can serve as useful, formal documentation.
NoSQL solutions are emerging for large-scale, high-performance, schema-flexible applications. WiredTiger is a cost-effective, non-locking, no-overwrite storage engine used as the default storage engine in MongoDB. Understanding the I/O characteristics of a storage engine is important not only for choosing a solution suited to an application, but also for opening up opportunities to optimize a working system, and especially for building more flash-aware NoSQL DBMSs. This paper explores the background of MongoDB internals and then analyzes the I/O characteristics of the WiredTiger storage engine in detail. We also exploit the space management mechanism in WiredTiger by using the TRIM command.
We introduce OPTIK, a new practical design pattern for designing and implementing fast and scalable concurrent data structures. OPTIK relies on the commonly-used technique of version numbers for detecting conflicting concurrent operations. We show how to implement the OPTIK pattern using the novel concept of OPTIK locks. These locks enable the use of version numbers for implementing very efficient optimistic concurrent data structures. Existing state-of-the-art lock-based data structures acquire the lock and then check for conflicts. In contrast, with OPTIK locks, we merge the lock acquisition with the detection of conflicting concurrency in a single atomic step, similarly to lock-free algorithms. We illustrate the power of our OPTIK pattern and its implementation by introducing four new algorithms and by optimizing four state-of-the-art algorithms for linked lists, skip lists, hash tables, and queues. Our results show that concurrent data structures built using OPTIK are more scalable than the state of the art.
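To make the pattern concrete, the following is a minimal C++ sketch of an OPTIK-style lock as the abstract describes it (the interface names are ours, not the paper's): the lock word doubles as a version number, so a single compare-and-swap merges validation of the version observed during an optimistic traversal with lock acquisition.

    #include <atomic>
    #include <cstdint>

    // Even value = unlocked (a plain version), odd value = locked.
    struct OptikLock {
        std::atomic<uint64_t> version{0};

        // Read the version before starting an optimistic traversal.
        uint64_t get_version() const {
            return version.load(std::memory_order_acquire);
        }

        // Validate and acquire in one atomic step: the CAS succeeds only if
        // the version is unchanged since `seen`, i.e. no conflicting update
        // slipped in between the traversal and the acquisition.
        bool trylock_at(uint64_t seen) {
            uint64_t expected = seen;
            return (seen & 1) == 0 &&
                   version.compare_exchange_strong(expected, seen + 1,
                                                   std::memory_order_acquire);
        }

        // Release the lock and publish a new version in a single increment.
        void unlock() { version.fetch_add(1, std::memory_order_release); }
    };

A caller whose trylock_at fails simply restarts its traversal, mirroring the retry loop of a lock-free algorithm.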
The rapid progress of multi-/many-core architectures means that data-intensive parallel applications are often not yet tuned to extract maximum performance. The advent of parallel programming frameworks offering structured patterns has alleviated the burden on developers of adapting such applications to parallel platforms. For example, the use of synchronization mechanisms in multithreaded applications is essential on shared-cache multi-core architectures. However, ensuring an appropriate use of their interfaces can be challenging, since different memory models plus instruction reordering at the compiler/processor level influence the occurrence of data races. Race detectors are invaluable in this regard; nevertheless, if lock-free data structures with no high-level atomics are used, they may emit false positives. In this paper, we extend the ThreadSanitizer race detection tool to support the semantics of the general single-producer/single-consumer (SPSC) lock-free parallel queue and to detect benign data races where the queue is correctly used. For our analysis, we leverage the FastFlow SPSC bounded lock-free queue implementation to test our extensions over a set of μ-benchmarks and real applications on a dual-socket Intel Xeon CPU E5-2695 platform. We demonstrate that this approach reduces the number of data race warning messages by 30% on average.
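For orientation, here is a minimal C++ sketch of a bounded SPSC ring buffer in the spirit of the FastFlow queue (a simplified illustration of ours, not FastFlow's code; the real implementation uses plain loads and stores with explicit barriers rather than high-level atomics, which is exactly what makes naive race detectors report false positives):

    #include <atomic>
    #include <cstddef>

    // Single-producer/single-consumer bounded queue. Only the producer
    // writes tail_ and only the consumer writes head_, so acquire/release
    // ordering on those two indices is all the synchronization needed.
    template <typename T, size_t N>
    class SpscQueue {
        T buf_[N];
        std::atomic<size_t> head_{0};  // advanced by the consumer
        std::atomic<size_t> tail_{0};  // advanced by the producer
    public:
        bool push(const T& v) {  // called by the producer thread only
            size_t t = tail_.load(std::memory_order_relaxed);
            size_t next = (t + 1) % N;
            if (next == head_.load(std::memory_order_acquire))
                return false;  // queue full
            buf_[t] = v;
            tail_.store(next, std::memory_order_release);
            return true;
        }
        bool pop(T& out) {  // called by the consumer thread only
            size_t h = head_.load(std::memory_order_relaxed);
            if (h == tail_.load(std::memory_order_acquire))
                return false;  // queue empty
            out = buf_[h];
            head_.store((h + 1) % N, std::memory_order_release);
            return true;
        }
    };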
To prevent unauthorized parties from accessing data stored on their smartphones, users have the option of enabling a "lock screen" that requires a secret code (e.g., PIN, drawing a pattern, or biometric) to gain access to their devices. We present a detailed analysis of the smartphone locking mechanisms currently available to billions of smartphone users worldwide. Through a month-long field study, we logged events from a panel of users with instrumented smartphones (N=134). We are able to show how existing lock screen mechanisms provide users with distinct tradeoffs between usability (unlocking speed vs. unlocking frequency) and security. We find that PIN users take longer to enter their codes, but commit fewer errors than pattern users, who unlock more frequently and are very prone to errors. Overall, PIN and pattern users spent the same amount of time unlocking their devices on average. Additionally, unlock performance seemed unaffected for users enabling the stealth mode for patterns. Based on our results, we identify areas where device locking mechanisms can be improved to result in fewer human errors – increasing usability – while also maintaining security.
This paper presents a new type of online password guessing attack called "WiPING" (Wi-Fi signal-based PIN Guessing attack), which guesses a victim's PIN (Personal Identification Number) within a small number of unlock attempts. WiPING uses wireless signal patterns identified by observing the sequential finger movements involved in typing a PIN to unlock a mobile device. A list of candidate PINs is generated from the wireless signal patterns and is used to improve the performance of PIN guessing attacks. We implemented a proof-of-concept attack to demonstrate the feasibility of WiPING. Our results show that WiPING can be practically effective: while pure guessing attacks failed to guess any of the 20 test PINs, WiPING successfully guessed two.
Multi-core processors are widely used in mobile devices due to their high performance and good energy efficiency. To maintain cache coherency among cores, mobile multi-core systems integrate dedicated hardware, the ARM Cache Coherent Interconnect (CCI). In this study, we focus on the security aspect of mobile multi-core. We monitor the cache coherency operations that occur during inter-core communication among processes related to the pattern screen lock (PSL). With simple analysis, we can extract Android PSL information, and preliminary results show that we can efficiently identify PSL patterns. This is a significant security violation in terms of confidentiality. Moreover, since mobile multi-cores are already prevalent, the attack is practical and could spread easily.
Wake locks are widely used in Android apps to protect critical computations from being disrupted by device sleeping. Inappropriate use of wake locks often seriously impacts user experience. However, little is known about how wake locks are used in real-world Android apps and the impact of their misuse. To bridge this gap, we conducted a large-scale empirical study on 44,736 commercial and 31 open-source Android apps. By automated program analysis and manual investigation, we observed (1) common program points where wake locks are acquired and released, (2) 13 types of critical computational tasks that are often protected by wake locks, and (3) eight patterns of wake lock misuse that commonly cause functional and non-functional issues, only three of which had been studied by existing work. Based on our findings, we designed a static analysis technique, Elite, to detect the two most common patterns of wake lock misuse. Our experiments on real-world subjects showed that Elite is effective and can outperform two state-of-the-art techniques.
In this paper we show how word embeddings can be used to increase the effectiveness of a state-of-the-art Locality Sensitive Hashing (LSH) based first story detection (FSD) system over a standard tweet corpus. Vocabulary mismatch, in which related tweets use different words, is a serious hindrance to the effectiveness of a modern FSD system. In this case, a tweet could be flagged as a first story even if a related tweet, which uses different but synonymous words, was already returned as a first story. In this work, we propose a novel approach to mitigate this problem of lexical variation, based on tweet expansion. In particular, we propose to expand tweets with semantically related paraphrases identified via automatically mined word embeddings over a background tweet corpus. Through experimentation on a large data stream comprising 50 million tweets, we show that FSD effectiveness can be improved by 9.5% over a state-of-the-art FSD system.
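A hedged C++ sketch of the expansion idea (the function names and the in-memory embedding table are our assumptions for illustration; the paper's pipeline details may differ): each term of a tweet is augmented with its top-k most cosine-similar vocabulary terms before hashing.

    #include <algorithm>
    #include <cmath>
    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <utility>
    #include <vector>

    using Vec = std::vector<float>;

    // Cosine similarity between two embedding vectors.
    float cosine(const Vec& a, const Vec& b) {
        float dot = 0, na = 0, nb = 0;
        for (size_t i = 0; i < a.size(); ++i) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (std::sqrt(na) * std::sqrt(nb) + 1e-9f);
    }

    // Expand a tokenized tweet: for every term with an embedding, append
    // its k nearest vocabulary terms by cosine similarity.
    std::vector<std::string> expand_tweet(
            const std::vector<std::string>& terms,
            const std::unordered_map<std::string, Vec>& emb, size_t k) {
        std::vector<std::string> expanded = terms;
        for (const auto& t : terms) {
            auto it = emb.find(t);
            if (it == emb.end()) continue;  // out-of-vocabulary term
            std::vector<std::pair<float, std::string>> sims;
            for (const auto& [word, vec] : emb)
                if (word != t) sims.emplace_back(cosine(it->second, vec), word);
            size_t m = std::min(k, sims.size());
            std::partial_sort(sims.begin(), sims.begin() + m, sims.end(),
                              std::greater<>());
            for (size_t i = 0; i < m; ++i) expanded.push_back(sims[i].second);
        }
        return expanded;
    }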
We present RENE, a novel encoding scheme for short ranges on ternary content-addressable memory (TCAM) which, unlike previous solutions, does not impose row expansion and uses a number of bits proportional to the maximal range length. We provide a theoretical analysis showing that our encoding is the closest to the lower bound on the number of bits used. In addition, we show several applications of our technique in the field of packet classification, and also how the same technique can be used to efficiently solve other hard problems, such as the nearest-neighbor search problem and its variants. We show that, using TCAM, one can solve such problems at much higher rates than previously suggested solutions and outperform known lower bounds in traditional memory models. Our experiments show that the translation process of RENE on switch hardware induces only a negligible 2.5% latency overhead. Our nearest-neighbor implementation on a TCAM device provides search rates up to four orders of magnitude higher than the best prior solutions.
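For context, the baseline that RENE avoids is classical prefix expansion, in which a single range rule can blow up into many TCAM rows (up to 2w - 2 for w-bit keys). A C++ sketch of that baseline, for illustration only:

    #include <cstdint>
    #include <string>
    #include <vector>

    // Classical prefix expansion of an integer range [lo, hi] over w-bit
    // keys (w <= 32): each aligned power-of-two block becomes one ternary
    // row. This per-range row expansion is what RENE is designed to avoid.
    std::vector<std::string> range_to_prefixes(uint32_t lo, uint32_t hi, int w) {
        std::vector<std::string> rows;
        uint64_t cur = lo;
        while (cur <= hi) {
            int k = 0;  // number of trailing don't-care bits in this block
            // Grow the block while it stays aligned and inside [cur, hi].
            while (k + 1 <= w &&
                   (cur & ((1ull << (k + 1)) - 1)) == 0 &&
                   cur + (1ull << (k + 1)) - 1 <= hi)
                ++k;
            std::string row;
            for (int i = w - 1; i >= k; --i)
                row += ((cur >> i) & 1) ? '1' : '0';
            row.append(k, '*');  // '*' marks a ternary don't-care cell
            rows.push_back(row);
            cur += (1ull << k);
        }
        return rows;
    }
    // Example: range_to_prefixes(1, 6, 3) yields {"001", "01*", "10*", "110"},
    // i.e. four TCAM rows for a single range.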
Providing recommendations on social systems has been in the spotlight of both academia and industry for some time. Social network giants like Facebook, LinkedIn, and Myspace are eager to find the silver bullet of recommendation. These applications let users form social networks through their daily social interactions. Meanwhile, today's online experience depends increasingly on social interaction. One of the main concerns in a social network is establishing a successful business plan that makes the network profitable. Doing business on any platform requires a sound business plan with solutions such as advertising the products or services of other companies, a form of marketing for those external businesses. In this study, we present a comprehensive framework for an advertisement recommender system for social networks. The framework uses semantic logic to produce product recommendations, a capability that differentiates its recommender component from classical recommender methods. In brief, the proposed framework is designed to generate advertisement recommendations in a simple and effective way for social network users.
With the massive amounts of data available today, it is common to store and process data using multiple machines. Parallel programming platforms such as MapReduce and its variants are popular frameworks for handling such large data. We present the first provably efficient algorithms to compute, store, and query data structures for range queries and approximate nearest neighbor queries in a popular parallel computing abstraction that captures the salient features of MapReduce and other massively parallel communication (MPC) models. In particular, we describe algorithms for $kd$-trees, range trees, and BBD-trees that only require $O(1)$ rounds of communication for both preprocessing and querying while staying competitive in terms of running time and workload to their classical counterparts. Our algorithms are randomized, but they can be made deterministic at some increase in their running time and workload while keeping the number of rounds of communication to be constant.
We study the problem of approximate nearest neighbor search in $d$-dimensional Hamming space $\{0,1\}^d$. We study the complexity of the problem in the famous cell-probe model, a classic model for data structures. We consider algorithms in the cell-probe model with limited adaptivity, where the algorithm makes $k$ rounds of parallel accesses to the data structure for a given $k$. For any $k \ge 1$, we give a simple randomized algorithm solving the approximate nearest neighbor search using $k$ rounds of parallel memory accesses, with $O(k(\log d)^{1/k})$ accesses in total. We also give a more sophisticated randomized algorithm using $O(k + (\frac{1}{k}\log d)^{O(1/k)})$ memory accesses in $k$ rounds for large enough $k$. Both algorithms use data structures of size polynomial in $n$, the number of points in the database. We prove an $\Omega(\frac{1}{k}(\log d)^{1/k})$ lower bound on the total number of memory accesses required by any randomized algorithm solving the approximate nearest neighbor search within $k \le \frac{\log\log d}{2\log\log\log d}$ rounds of parallel memory accesses on any data structure of polynomial size. This lower bound shows that our first algorithm is asymptotically optimal for any constant number of rounds $k$. Moreover, our second algorithm approaches the asymptotically optimal tradeoff between rounds and memory accesses, in the sense that the lower bound on memory accesses for any $k_1$ rounds can be matched by the algorithm within $k_2 = O(k_1)$ rounds. In the extreme, for some large enough $k = \Theta(\frac{\log\log d}{\log\log\log d})$, our second algorithm matches the $\Theta(\frac{\log\log d}{\log\log\log d})$ tight bound for fully adaptive algorithms for approximate nearest neighbor search due to Chakrabarti and Regev.
Due to the "curse of dimensionality," it is very expensive to process the nearest neighbor (NN) query in high-dimensional spaces; hence, approximate approaches such as Locality-Sensitive Hashing (LSH) are widely used for their theoretical guarantees and empirical performance. Current LSH-based approaches target the $L_1$ and $L_2$ spaces, while, as shown in previous work, the fractional distance metrics ($L_p$ metrics with $0 < p < 1$) can provide more insightful results than the usual $L_1$ and $L_2$ metrics for data mining and multimedia applications. However, none of the existing work can support multiple fractional distance metrics using one index. In this paper, we propose LazyLSH, which answers approximate nearest neighbor queries for multiple $L_p$ metrics with theoretical guarantees. Different from previous LSH approaches, which need to build one dedicated index for every query space, LazyLSH uses a single base index to support the computations in multiple $L_p$ spaces, significantly reducing the maintenance overhead. Extensive experiments show that LazyLSH provides more accurate results for approximate kNN search under fractional distance metrics.
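For reference, the fractional distance metrics in question are the $L_p$ distances with a fractional exponent:

    d_p(x, y) \;=\; \Bigl( \sum_{i=1}^{d} \lvert x_i - y_i \rvert^{p} \Bigr)^{1/p}, \qquad 0 < p < 1,

which, unlike $L_1$ and $L_2$, violates the triangle inequality, yet often separates near and far neighbors better in high dimensions.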
With its advantages in compact representation and efficient comparison, binary hashing has been extensively investigated for approximate nearest neighbor search. In this paper, we propose a novel and general hashing framework that simultaneously considers a new linear pair-wise distance-preserving objective and a point-wise constraint. The direct distance-preserving objective aims to keep a linear relationship between the Euclidean distance and the Hamming distance of data points. Based on different point-wise constraints, we propose two methods to instantiate this framework. The first is a pseudo-supervised hashing method, which uses existing unsupervised hashing methods to generate binary codes as pseudo-supervised information. The second is an unsupervised hashing method, in which quantization loss is considered. We validate our framework on two large-scale datasets. The experiments demonstrate that our pseudo-supervised method achieves consistent improvement over state-of-the-art unsupervised hashing methods, while our unsupervised method outperforms the state-of-the-art methods.
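The abstract does not give the objective explicitly; one plausible least-squares form of such a linear pair-wise distance-preserving objective (our notation, not the paper's) fits the Hamming distances of the codes linearly to the Euclidean distances of the points:

    \min_{B,\,\alpha,\,\beta} \;\; \sum_{i<j} \bigl( \alpha \, d_H(b_i, b_j) + \beta - d_E(x_i, x_j) \bigr)^2

where $b_i$ is the binary code of point $x_i$, $d_H$ the Hamming distance, $d_E$ the Euclidean distance, and $\alpha, \beta$ the parameters of the linear fit.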
Today's applications call for finding the spatial objects nearest to a specified location that simultaneously satisfy keyword constraints. The best existing answer to such queries relies on the IR2-tree, which has some deficiencies that seriously affect system efficiency. To overcome these deficiencies, a new access method called the Spatial Inverted Index (SI) is developed, which extends the conventional inverted file to cope with multidimensional data and comes with algorithms that can answer nearest neighbor queries with keywords in real time. The SI method thus broadens the capabilities of the conventional inverted file to multidimensional data, together with query algorithms that use SI to answer range queries and solve the problem in real time.
This paper describes an approach in which group testing helps enforce security and privacy in identification. We detail a particular scheme based on embedding and group testing. We add a second layer of defense, group vectors, where each group vector represents a set of dataset vectors. Whereas the selected embedding poorly protects the data when used alone, the group testing approach makes it much harder to reconstruct the data when combined with the embedding. Even when a curious server and user collude to disclose the secret parameters, they cannot accurately recover the data. Another byproduct of our approach is that it reduces the complexity of the search and the required storage space. We demonstrate the value of our approach on a benchmark biometrics dataset, where we verify our theoretical analysis with real data.