Visible to the public Biblio

Found 100 results

Filters: Keyword is Memory management  [Clear All Filters]
2020-03-23
Qin, Peng, Tan, Cheng, Zhao, Lei, Cheng, Yueqiang.  2019.  Defending against ROP Attacks with Nearly Zero Overhead. 2019 IEEE Global Communications Conference (GLOBECOM). :1–6.
Return-Oriented Programming (ROP) is a sophisticated exploitation technique that is able to drive target applications to perform arbitrary unintended operations by constructing a gadget chain reusing existing small code sequences (gadgets) collected across the entire code space. In this paper, we propose to address ROP attacks from a different angle-shrinking available code space at runtime. We present ROPStarvation , a generic and transparent ROP countermeasure that defend against all types of ROP attacks with almost zero run-time overhead. ROPStarvation does not aim to completely stop ROP attacks, instead it attempts to significantly increase the bar by decreasing the possibility of launching a successful ROP exploit in reality. Moreover, shrinking available code space at runtime is lightweight that makes ROPStarvation practical for being deployed with high performance requirement. Results show that ROPStarvation successfully reduces the code space of target applications by 85%. With the reduced code segments, ROPStarvation decreases the probability of building a valid ROP gadget chain by 100% and 83% respectively, with the assumptions that whether the adversary knows the vulnerable applications are protected by ROPStarvation . Evaluations on the SPEC CPU2006 benchmark show that ROPStarvation introduces nearly zero (0.2% on average) run-time performance overhead.
2020-03-02
Serpanos, Dimitrios, Stachoulis, Dimitrios.  2019.  Secure Memory for Embedded Tamper-Proof Systems. 2019 14th International Conference on Design Technology of Integrated Systems In Nanoscale Era (DTIS). :1–4.

Data leakage and disclosure to attackers is a significant problem in embedded systems, considering the ability of attackers to get physical access to the systems. We present methods to protect memory data leakage in tamper-proof embedded systems. We present methods that exploit memory supply voltage manipulation to change the memory contents, leading to an operational and reusable memory or to destroy memory cell circuitry. For the case of memory data change, we present scenaria for data change to a known state and to a random state. The data change scenaria are effective against attackers who cannot detect the existence of the protection circuitry; furthermore, original data can be calculated in the case of data change to a known state, if the attacker identifies the protection circuitry and its operation. The methods that change memory contents to a random state or destroy memory cell circuitry lead to irreversible loss of the original data. However, since the known state can be used to calculate the original data.

2020-02-24
Srivastava, Ankush, Ghosh, Prokash.  2019.  An Efficient Memory Zeroization Technique Under Side-Channel Attacks. 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID). :76–81.
Protection of secured data content in volatile memories (processor caches, embedded RAMs etc) is essential in networking, wireless, automotive and other embedded secure applications. It is utmost important to protect secret data, like authentication credentials, cryptographic keys etc., stored over volatile memories which can be hacked during normal device operations. Several security attacks like cold boot, disclosure attack, data remanence, physical attack, cache attack etc. can extract the cryptographic keys or secure data from volatile memories of the system. The content protection of memory is typically done by assuring data deletion in minimum possible time to minimize data remanence effects. In today's state-of-the-art SoCs, dedicated hardwares are used to functionally erase the private memory contents in case of security violations. This paper, in general, proposes a novel approach of using existing memory built-in-self-test (MBIST) hardware to zeroize (initialize memory to all zeros) on-chip memory contents before it is being hacked either through different side channels or secuirty attacks. Our results show that the proposed MBIST based content zeroization approach is substantially faster than conventional techniques. By adopting the proposed approach, functional hardware requirement for memory zeroization can be waived.
2019-12-30
Sharma, Mukesh Kumar, Somwanshi, Devendra.  2018.  Improvement in Homomorphic Encryption Algorithm with Elliptic Curve Cryptography and OTP Technique. 2018 3rd International Conference and Workshops on Recent Advances and Innovations in Engineering (ICRAIE). :1–6.
Cloud computing is a technology is where client require not to stress over the expense of equipment establishment and their support cost. Distributed computing is presently turned out to be most prominent innovation on account of its accessibility, ease and some different elements. Yet, there is a few issues in distributed computing, the principle one is security in light of the fact that each client store their valuable information on the system so they need their information ought to be shielded from any unapproved get to, any progressions that isn't done for client's benefit. To take care of the issue of Key administration, Key Sharing different plans have been proposed. The outsider examiner is the plan for key administration and key sharing. The primary preferred standpoint of this is the cloud supplier can encourage the administration which was accessible by the customary outsider evaluator and make it trustful. The outsider examining plan will be fizzled, if the outsider's security is endangered or of the outsider will be malignant. To take care of the issue, there is another modular for key sharing and key administration in completely Homomorphic Encryption conspire is outlined. In this paper we utilized the symmetric key understanding calculation named Diffie Hellman to make session key between two gatherings who need to impart and elliptic curve cryptography to create encryption keys rather than RSA and have utilized One Time Password (OTP) for confirming the clients.
2019-10-15
Vyakaranal, S., Kengond, S..  2018.  Performance Analysis of Symmetric Key Cryptographic Algorithms. 2018 International Conference on Communication and Signal Processing (ICCSP). :0411–0415.
Data's security being important aspect of the today's internet is gaining more importance day by day. With the increase in online data exchange, transactions and payments; secure payment and secure data transfers have become an area of concern. Cryptography makes the data transmission over the internet secure by various methods, algorithms. Cryptography helps in avoiding the unauthorized people accessing the data by authentication, confidentiality, integrity and non-repudiation. In order to securely transmit the data many cryptographic algorithms are present, but the algorithm to be used should be robust, efficient, cost effective, high performance and easily deployable. Choosing an algorithm which suits the customer's requirement is an utmost important task. The proposed work discusses different symmetric key cryptographic algorithms like DES, 3DES, AES and Blowfish by considering encryption time, decryption time, entropy, memory usage, throughput, avalanche effect and energy consumption by practical implementation using java. Practical implementation of algorithms has been highlighted in proposed work considering tradeoff performance in terms of cost of various parameters rather than mere theoretical concepts. Battery consumption and avalanche effect of algorithms has been discussed. It reveals that AES performs very well in overall performance analysis among considered algorithms.
2019-09-26
Pfeffer, T., Herber, P., Druschke, L., Glesner, S..  2018.  Efficient and Safe Control Flow Recovery Using a Restricted Intermediate Language. 2018 IEEE 27th International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE). :235-240.

Approaches for the automatic analysis of security policies on source code level cannot trivially be applied to binaries. This is due to the lacking high-level semantics of low-level object code, and the fundamental problem that control-flow recovery from binaries is difficult. We present a novel approach to recover the control-flow of binaries that is both safe and efficient. The key idea of our approach is to use the information contained in security mechanisms to approximate the targets of computed branches. To achieve this, we first define a restricted control transition intermediate language (RCTIL), which restricts the number of possible targets for each branch to a finite number of given targets. Based on this intermediate language, we demonstrate how a safe model of the control flow can be recovered without data-flow analyses. Our evaluation shows that that makes our solution more efficient than existing solutions.

2019-02-13
Ahmed, N., Talib, M. A., Nasir, Q..  2018.  Program-flow attestation of IoT systems software. 2018 15th Learning and Technology Conference (L T). :67–73.
Remote attestation is the process of measuring the integrity of a device over the network, by detecting modification of software or hardware from the original configuration. Several remote software-based attestation mechanisms have been introduced, that rely on strict time constraints and other impractical constraints that make them inconvenient for IoT systems. Although some research is done to address these issues, they integrated trusted hardware devices to the attested devices to accomplish their aim, which is costly and not convenient for many use cases. In this paper, we propose “Dual Attestation” that includes two stages: static and dynamic. The static attestation phase checks the memory of the attested device. The dynamic attestation technique checks the execution correctness of the application code and can detect the runtime attacks. The objectives are to minimize the overhead and detect these attacks, by developing an optimized dynamic technique that checks the application program flow. The optimization will be done in the prover and the verifier sides.
2019-01-31
Larisch, James, Mickens, James, Kohler, Eddie.  2018.  Alto: Lightweight VMs Using Virtualization-Aware Managed Runtimes. Proceedings of the 15th International Conference on Managed Languages & Runtimes. :8:1–8:7.

Virtualization enables datacenter operators to safely run computations that belong to untrusted tenants. An ideal virtual machine has three properties: a small memory footprint; strong isolation from other VMs and the host OS; and the ability to maintain in-memory state across client requests. Unfortunately, modern virtualization technologies cannot provide all three properties at once. In this paper, we explain why, and propose a new virtualization approach, called Alto, that virtualizes at the layer of a managed runtime interface. Through careful design of (1) the application-facing managed interface and (2) the internal runtime architecture, Alto provides VMs that are small, secure, and stateful. Conveniently, Alto also simplifies VM operations like suspension, migration, and resumption. We provide several details about the proposed design, and discuss the remaining challenges that must be solved to fully realize the Alto vision.

2018-12-03
Chakrabarti, Somnath, Leslie-Hurd, Rebekah, Vij, Mona, McKeen, Frank, Rozas, Carlos, Caspi, Dror, Alexandrovich, Ilya, Anati, Ittai.  2017.  Intel® Software Guard Extensions (Intel® SGX) Architecture for Oversubscription of Secure Memory in a Virtualized Environment. Proceedings of the Hardware and Architectural Support for Security and Privacy. :7:1–7:8.

As workloads and data move to the cloud, it is essential that software writers are able to protect their applications from untrusted hardware, systems software, and co-tenants. Intel® Software Guard Extensions (SGX) enables a new mode of execution that is protected from attacks in such an environment with strong confidentiality, integrity, and replay protection guarantees. Though SGX supports memory oversubscription via paging, virtualizing the protected memory presents a significant challenge to Virtual Machine Monitor (VMM) writers and comes with a high performance overhead. This paper introduces SGX Oversubscription Extensions that add additional instructions and virtualization support to the SGX architecture so that cloud service providers can oversubscribe secure memory in a less complex and more performant manner.

2018-06-11
Zhang, X., Li, R., Zhao, H..  2017.  Neighbor-aware based forwarding strategy in NDN-MANET. 2017 11th IEEE International Conference on Anti-counterfeiting, Security, and Identification (ASID). :125–129.

Named Data Networking (NDN) is a future Internet architecture, NDN forwarding strategy is a hot research topic in MANET. At present, there are two categories of forwarding strategies in NDN. One is the blind forwarding(BF), the other is the aware forwarding(AF). Data packet return by the way that one came forwarding strategy(DRF) as one of the BF strategy may fail for the interruptions of the path that are caused by the mobility of nodes. Consumer need to wait until the interest packet times out to request the data packet again. To solve the insufficient of DRF, in this paper a Forwarding Strategy, called FN based on Neighbor-aware is proposed for NDN MANET. The node maintains the neighbor information and the request information of neighbor nodes. In the phase of data packet response, in order to improve request satisfaction rate, node specifies the next hop node; Meanwhile, in order to reduce packet loss rate, node assists the last hop node to forward packet to the specific node. The simulation results show that compared with DRF and greedy forwarding(GF) strategy, FN can improve request satisfaction rate when node density is high.

Moons, B., Goetschalckx, K., Berckelaer, N. Van, Verhelst, M..  2017.  Minimum energy quantized neural networks. 2017 51st Asilomar Conference on Signals, Systems, and Computers. :1921–1925.
This work targets the automated minimum-energy optimization of Quantized Neural Networks (QNNs) - networks using low precision weights and activations. These networks are trained from scratch at an arbitrary fixed point precision. At iso-accuracy, QNNs using fewer bits require deeper and wider network architectures than networks using higher precision operators, while they require less complex arithmetic and less bits per weights. This fundamental trade-off is analyzed and quantified to find the minimum energy QNN for any benchmark and hence optimize energy-efficiency. To this end, the energy consumption of inference is modeled for a generic hardware platform. This allows drawing several conclusions across different benchmarks. First, energy consumption varies orders of magnitude at iso-accuracy depending on the number of bits used in the QNN. Second, in a typical system, BinaryNets or int4 implementations lead to the minimum energy solution, outperforming int8 networks up to 2-10× at iso-accuracy. All code used for QNN training is available from https://github.com/BertMoons/.
2018-06-07
Marques, J., Andrade, J., Falcao, G..  2017.  Unreliable memory operation on a convolutional neural network processor. 2017 IEEE International Workshop on Signal Processing Systems (SiPS). :1–6.

The evolution of convolutional neural networks (CNNs) into more complex forms of organization, with additional layers, larger convolutions and increasing connections, established the state-of-the-art in terms of accuracy errors for detection and classification challenges in images. Moreover, as they evolved to a point where Gigabytes of memory are required for their operation, we have reached a stage where it becomes fundamental to understand how their inference capabilities can be impaired if data elements somehow become corrupted in memory. This paper introduces fault-injection in these systems by simulating failing bit-cells in hardware memories brought on by relaxing the 100% reliable operation assumption. We analyze the behavior of these networks calculating inference under severe fault-injection rates and apply fault mitigation strategies to improve on the CNNs resilience. For the MNIST dataset, we show that 8x less memory is required for the feature maps memory space, and that in sub-100% reliable operation, fault-injection rates up to 10-1 (with most significant bit protection) can withstand only a 1% error probability degradation. Furthermore, considering the offload of the feature maps memory to an embedded dynamic RAM (eDRAM) system, using technology nodes from 65 down to 28 nm, up to 73 80% improved power efficiency can be obtained.

2018-02-21
Muñoz, C., Wang, L., Solana, E., Crowcroft, J..  2017.  I(FIB)F: Iterated bloom filters for routing in named data networks. 2017 International Conference on Networked Systems (NetSys). :1–8.

Named Data Networks provide a clean-slate redesign of the Future Internet for efficient content distribution. Because Internet of Things are expected to compose a significant part of Future Internet, most content will be managed by constrained devices. Such devices are often equipped with limited CPU, memory, bandwidth, and energy supply. However, the current Named Data Networks design neglects the specific requirements of Internet of Things scenarios and many data structures need to be further optimized. The purpose of this research is to provide an efficient strategy to route in Named Data Networks by constructing a Forwarding Information Base using Iterated Bloom Filters defined as I(FIB)F. We propose the use of content names based on iterative hashes. This strategy leads to reduce the overhead of packets. Moreover, the memory and the complexity required in the forwarding strategy are lower than in current solutions. We compare our proposal with solutions based on hierarchical names and Standard Bloom Filters. We show how to further optimize I(FIB)F by exploiting the structure information contained in hierarchical content names. Finally, two strategies may be followed to reduce: (i) the overall memory for routing or (ii) the probability of false positives.

2018-02-06
Detken, K. O., Jahnke, M., Rix, T., Rein, A..  2017.  Software-Design for Internal Security Checks with Dynamic Integrity Measurement (DIM). 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). 1:367–373.

Most security software tools try to detect malicious components by cryptographic hashes, signatures or based on their behavior. The former, is a widely adopted approach based on Integrity Measurement Architecture (IMA) enabling appraisal and attestation of system components. The latter, however, may induce a very long time until misbehavior of a component leads to a successful detection. Another approach is a Dynamic Runtime Attestation (DRA) based on the comparison of binary code loaded in the memory and well-known references. Since DRA is a complex approach, involving multiple related components and often complex attestation strategies, a flexible and extensible architecture is needed. In a cooperation project an architecture was designed and a Proof of Concept (PoC) successfully developed and evaluated. To achieve needed flexibility and extensibility, the implementation facilitates central components providing attestation strategies (guidelines). These guidelines define and implement the necessary steps for all relevant attestation operations, i.e. measurement, reference generation and verification.

2018-02-02
Sprabery, R., Estrada, Z. J., Kalbarczyk, Z., Iyer, R., Bobba, R. B., Campbell, R..  2017.  Trustworthy Services Built on Event-Based Probing for Layered Defense. 2017 IEEE International Conference on Cloud Engineering (IC2E). :215–225.

Numerous event-based probing methods exist for cloud computing environments allowing a hypervisor to gain insight into guest activities. Such event-based probing has been shown to be useful for detecting attacks, system hangs through watchdogs, and for inserting exploit detectors before a system can be patched, among others. Here, we illustrate how to use such probing for trustworthy logging and highlight some of the challenges that existing event-based probing mechanisms do not address. Challenges include ensuring a probe inserted at given address is trustworthy despite the lack of attestation available for probes that have been inserted dynamically. We show how probes can be inserted to ensure proper logging of every invocation of a probed instruction. When combined with attested boot of the hypervisor and guest machines, we can ensure the output stream of monitored events is trustworthy. Using these techniques we build a trustworthy log of certain guest-system-call events. The log powers a cloud-tuned Intrusion Detection System (IDS). New event types are identified that must be added to existing probing systems to ensure attempts to circumvent probes within the guest appear in the log. We highlight the overhead penalties paid by guests to increase guarantees of log completeness when faced with attacks on the guest kernel. Promising results (less that 10% for guests) are shown when a guest relaxes the trade-off between log completeness and overhead. Our demonstrative IDS detects common attack scenarios with simple policies built using our guest behavior recording system.

2018-01-16
Vavala, B., Neves, N., Steenkiste, P..  2017.  Secure Tera-scale Data Crunching with a Small TCB. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :169–180.

Outsourcing services to third-party providers comes with a high security cost-to fully trust the providers. Using trusted hardware can help, but current trusted execution environments do not adequately support services that process very large scale datasets. We present LASTGT, a system that bridges this gap by supporting the execution of self-contained services over a large state, with a small and generic trusted computing base (TCB). LASTGT uses widely deployed trusted hardware to guarantee integrity and verifiability of the execution on a remote platform, and it securely supplies data to the service through simple techniques based on virtual memory. As a result, LASTGT is general and applicable to many scenarios such as computational genomics and databases, as we show in our experimental evaluation based on an implementation of LAST-GT on a secure hypervisor. We also describe a possible implementation on Intel SGX.

2017-12-20
Rubin, S. H., Grefe, W. K., Bouabana-Tebibel, T., Chen, S. C., Shyu, M. L., Simonsen, K. S..  2017.  Cyber-Secure UAV Communications Using Heuristically Inferred Stochastic Grammars and Hard Real-Time Adaptive Waveform Synthesis and Evolution. 2017 IEEE International Conference on Information Reuse and Integration (IRI). :9–15.
Summary form only given. Strong light-matter coupling has been recently successfully explored in the GHz and THz [1] range with on-chip platforms. New and intriguing quantum optical phenomena have been predicted in the ultrastrong coupling regime [2], when the coupling strength Ω becomes comparable to the unperturbed frequency of the system ω. We recently proposed a new experimental platform where we couple the inter-Landau level transition of an high-mobility 2DEG to the highly subwavelength photonic mode of an LC meta-atom [3] showing very large Ω/ωc = 0.87. Our system benefits from the collective enhancement of the light-matter coupling which comes from the scaling of the coupling Ω ∝ √n, were n is the number of optically active electrons. In our previous experiments [3] and in literature [4] this number varies from 104-103 electrons per meta-atom. We now engineer a new cavity, resonant at 290 GHz, with an extremely reduced effective mode surface Seff = 4 × 10-14 m2 (FE simulations, CST), yielding large field enhancements above 1500 and allowing to enter the few (\textbackslashtextless;100) electron regime. It consist of a complementary metasurface with two very sharp metallic tips separated by a 60 nm gap (Fig.1(a, b)) on top of a single triangular quantum well. THz-TDS transmission experiments as a function of the applied magnetic field reveal strong anticrossing of the cavity mode with linear cyclotron dispersion. Measurements for arrays of only 12 cavities are reported in Fig.1(c). On the top horizontal axis we report the number of electrons occupying the topmost Landau level as a function of the magnetic field. At the anticrossing field of B=0.73 T we measure approximately 60 electrons ultra strongly coupled (Ω/ω- \textbackslashtextbar\textbackslashtextbar
2017-11-20
Haq, M. S. Ul, Lejian, L., Lerong, M..  2016.  Transitioning Native Application into Virtual Machine by Using Hardware Virtualization Extensions. 2016 International Symposium on Computer, Consumer and Control (IS3C). :397–403.

In presence of known and unknown vulnerabilities in code and flow control of programs, virtual machine alike isolation and sandboxing to confine maliciousness of process, by monitoring and controlling the behaviour of untrusted application, is an effective strategy. A confined malicious application cannot effect system resources and other applications running on same operating system. But present techniques used for sandboxing have some drawbacks ranging from scope to methodology. Some of proposed techniques restrict specific aspect of execution e.g. system calls and file system access. In the same way techniques that truly isolate the application by providing separate execution environment either require modification in kernel or full blown operating system. Moreover these do not provide isolation from top to bottom but only virtualize operating system services. In this paper, we propose a design to confine native Linux process in virtual machine equivalent isolation by using hardware virtualization extensions with nominal initialization and acceptable execution overheads. We implemented our prototype called Process Virtual Machine that transition a native process into virtual machine, provides minimal possible execution environment, intercept and virtualize system calls to execute it on host kernel. Experimental results show effectiveness of our proposed technique.

2017-04-20
Nikolenko, S. I., Kogan, K., Rétvári, G., Bérczi-Kovács, E. R., Shalimov, A..  2016.  How to represent IPv6 forwarding tables on IPv4 or MPLS dataplanes. 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :521–526.

The Internet routing ecosystem is facing substantial scalability challenges on the data plane. Various “clean slate” architectures for representing forwarding tables (FIBs), such as IPv6, introduce additional constraints on efficient implementations from both lookup time and memory footprint perspectives due to significant classification width. In this work, we propose an abstraction layer able to represent IPv6 FIBs on existing IP and even MPLS infrastructure. Feasibility of the proposed representations is confirmed by an extensive simulation study on real IPv6 forwarding tables, including low-level experimental performance evaluation.

Agarwal, N., Paul, K..  2016.  XEBRA: XEn Based Remote Attestation. 2016 IEEE Region 10 Conference (TENCON). :2383–2386.

Modern computing environments are increasingly getting distributed with one machine executing programs on the other remotely. Often, multiple machines work together to complete a task. Its important for collaborating machines to trust each other in order to perform properly. Such scenarios have brought up a key security issue of trustably and securely executing critical code on remote machines. We present a purely software based remote attestation technique XEBRA(XEn Based Remote Attestation) that guarantees the execution of correct code on a remote host, termed as remote attestation. XEBRA can be used to establish dynamic root of trust in a remote computing device using virtualization. We also show our approach to be feasible on embedded platforms by implementing it on an Intel Galileo board.

2017-02-21
M. Moradi, F. Qian, Q. Xu, Z. M. Mao, D. Bethea, M. K. Reiter.  2015.  "Caesar: high-speed and memory-efficient forwarding engine for future internet architecture". 2015 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS). :171-182.

In response to the critical challenges of the current Internet architecture and its protocols, a set of so-called clean slate designs has been proposed. Common among them is an addressing scheme that separates location and identity with self-certifying, flat and non-aggregatable address components. Each component is long, reaching a few kilobits, and would consume an amount of fast memory in data plane devices (e.g., routers) that is far beyond existing capacities. To address this challenge, we present Caesar, a high-speed and length-agnostic forwarding engine for future border routers, performing most of the lookups within three fast memory accesses. To compress forwarding states, Caesar constructs scalable and reliable Bloom filters in Ternary Content Addressable Memory (TCAM). To guarantee correctness, Caesar detects false positives at high speed and develops a blacklisting approach to handling them. In addition, we optimize our design by introducing a hashing scheme that reduces the number of hash computations from k to log(k) per lookup based on hash coding theory. We handle routing updates while keeping filters highly utilized in address removals. We perform extensive analysis and simulations using real traffic and routing traces to demonstrate the benefits of our design. Our evaluation shows that Caesar is more energy-efficient and less expensive (in terms of total cost) compared to optimized IPv6 TCAM-based solutions by up to 67% and 43% respectively. In addition, the total cost of our design is approximately the same for various address lengths.

2015-05-06
Kanizo, Y., Hay, D., Keslassy, I..  2015.  Maximizing the Throughput of Hash Tables in Network Devices with Combined SRAM/DRAM Memory. Parallel and Distributed Systems, IEEE Transactions on. 26:796-809.

Hash tables form a core component of many algorithms as well as network devices. Because of their large size, they often require a combined memory model, in which some of the elements are stored in a fast memory (for example, cache or on-chip SRAM) while others are stored in much slower memory (namely, the main memory or off-chip DRAM). This makes the implementation of real-life hash tables particularly delicate, as a suboptimal choice of the hashing scheme parameters may result in a higher average query time, and therefore in a lower throughput. In this paper, we focus on multiple-choice hash tables. Given the number of choices, we study the tradeoff between the load of a hash table and its average lookup time. The problem is solved by analyzing an equivalent problem: the expected maximum matching size of a random bipartite graph with a fixed left-side vertex degree. Given two choices, we provide exact results for any finite system, and also deduce asymptotic results as the fast memory size increases. In addition, we further consider other variants of this problem and model the impact of several parameters. Finally, we evaluate the performance of our models on Internet backbone traces, and illustrate the impact of the memories speed difference on the choice of parameters. In particular, we show that the common intuition of entirely avoiding slow memory accesses by using highly efficient schemes (namely, with many fast-memory choices) is not always optimal.
 

Yang Xu, Zhaobo Liu, Zhuoyuan Zhang, Chao, H.J..  2014.  High-Throughput and Memory-Efficient Multimatch Packet Classification Based on Distributed and Pipelined Hash Tables. Networking, IEEE/ACM Transactions on. 22:982-995.

The emergence of new network applications, such as the network intrusion detection system and packet-level accounting, requires packet classification to report all matched rules instead of only the best matched rule. Although several schemes have been proposed recently to address the multimatch packet classification problem, most of them require either huge memory or expensive ternary content addressable memory (TCAM) to store the intermediate data structure, or they suffer from steep performance degradation under certain types of classifiers. In this paper, we decompose the operation of multimatch packet classification from the complicated multidimensional search to several single-dimensional searches, and present an asynchronous pipeline architecture based on a signature tree structure to combine the intermediate results returned from single-dimensional searches. By spreading edges of the signature tree across multiple hash tables at different stages, the pipeline can achieve a high throughput via the interstage parallel access to hash tables. To exploit further intrastage parallelism, two edge-grouping algorithms are designed to evenly divide the edges associated with each stage into multiple work-conserving hash tables. To avoid collisions involved in hash table lookup, a hybrid perfect hash table construction scheme is proposed. Extensive simulation using realistic classifiers and traffic traces shows that the proposed pipeline architecture outperforms HyperCuts and B2PC schemes in classification speed by at least one order of magnitude, while having a similar storage requirement. Particularly, with different types of classifiers of 4K rules, the proposed pipeline architecture is able to achieve a throughput between 26.8 and 93.1 Gb/s using perfect hash tables.

Hyesook Lim, Kyuhee Lim, Nara Lee, Kyong-Hye Park.  2014.  On Adding Bloom Filters to Longest Prefix Matching Algorithms. Computers, IEEE Transactions on. 63:411-423.

High-speed IP address lookup is essential to achieve wire-speed packet forwarding in Internet routers. Ternary content addressable memory (TCAM) technology has been adopted to solve the IP address lookup problem because of its ability to perform fast parallel matching. However, the applicability of TCAMs presents difficulties due to cost and power dissipation issues. Various algorithms and hardware architectures have been proposed to perform the IP address lookup using ordinary memories such as SRAMs or DRAMs without using TCAMs. Among the algorithms, we focus on two efficient algorithms providing high-speed IP address lookup: parallel multiple-hashing (PMH) algorithm and binary search on level algorithm. This paper shows how effectively an on-chip Bloom filter can improve those algorithms. A performance evaluation using actual backbone routing data with 15,000-220,000 prefixes shows that by adding a Bloom filter, the complicated hardware for parallel access is removed without search performance penalty in parallel-multiple hashing algorithm. Search speed has been improved by 30-40 percent by adding a Bloom filter in binary search on level algorithm.
 

2015-05-05
Elwell, J., Riley, R., Abu-Ghazaleh, N., Ponomarev, D..  2014.  A Non-Inclusive Memory Permissions architecture for protection against cross-layer attacks. High Performance Computer Architecture (HPCA), 2014 IEEE 20th International Symposium on. :201-212.

Protecting modern computer systems and complex software stacks against the growing range of possible attacks is becoming increasingly difficult. The architecture of modern commodity systems allows attackers to subvert privileged system software often using a single exploit. Once the system is compromised, inclusive permissions used by current architectures and operating systems easily allow a compromised high-privileged software layer to perform arbitrary malicious activities, even on behalf of other software layers. This paper presents a hardware-supported page permission scheme for the physical pages that is based on the concept of non-inclusive sets of memory permissions for different layers of system software such as hypervisors, operating systems, and user-level applications. Instead of viewing privilege levels as an ordered hierarchy with each successive level being more privileged, we view them as distinct levels each with its own set of permissions. Such a permission mechanism, implemented as part of a processor architecture, provides a common framework for defending against a range of recent attacks. We demonstrate that such a protection can be achieved with negligible performance overhead, low hardware complexity and minimal changes to the commodity OS and hypervisor code.