Visible to the public Secure-by-construction Composable Componentry for Network Processing

TitleSecure-by-construction Composable Componentry for Network Processing
Publication TypeConference Paper
Year of Publication2014
AuthorsDurbeck, Lisa J. K., Athanas, Peter M., Macias, Nicholas J.
Conference NameProceedings of the 2014 Symposium and Bootcamp on the Science of Security
PublisherACM
Conference LocationRaleigh, NC, USA
ISBN Number978-1-4503-2907-1
Keywords100 Gbps, ACM CCS, Block and Stream Ciphers, CPS Technologies, cryptography, Data Anonymization and Sanitization, Database and Storage Security, embedded hardware, Embedded Software, Embedded Systems Security, Foundations, hardware-software co-design, line-speed processor, network processor, science of security, secure-by-construction, Security in Hardware, stream processing, Symmetric Cryptography and Hash Functions, Systems Engineering, Tamper-Proof/Resistant Designs
Abstract

Techniques commonly used for analyzing streaming video, audio, SIGINT, and network transmissions, at less-than-streaming rates, such as data decimation and ad-hoc sampling, can miss underlying structure, trends and specific events held in the data[3]. This work presents a secure-by-construction approach [7] for the upper-end data streams with rates from 10- to 100 Gigabits per second. The secure-by-construction approach strives to produce system security through the composition of individually secure hardware and software components. The proposed network processor can be used not only at data centers but also within networks and onboard embedded systems at the network periphery for a wide range of tasks, including preprocessing and data cleansing, signal encoding and compression, complex event processing, flow analysis, and other tasks related to collecting and analyzing streaming data. Our design employs a four-layer scalable hardware/software stack that can lead to inherently secure, easily constructed specialized high-speed stream processing. This work addresses the following contemporary problems: (1) There is a lack of hardware/software systems providing stream processing and data stream analysis operating at the target data rates; for high-rate streams the implementation options are limited: all-software solutions can't attain the target rates[1]. GPUs and GPGPUs are also infeasible: they were not designed for I/O at 10-100Gbps; they also have asymmetric resources for input and output and thus cannot be pipelined[4, 2], whereas custom chip-based solutions are costly and inflexible to changes, and FPGA-based solutions are historically hard to program[6]; (2) There is a distinct advantage to utilizing high-bandwidth or line-speed analytics to reduce time-to-discovery of information, particularly ones that can be pipelined together to conduct a series of processing tasks or data tests without impeding data rates; (3) There is potentially significant network infrastructure cost savings possible from compact and power-efficient analytic support deployed at the network periphery on the data source or one hop away; (4) There is a need for agile deployment in response to changing objectives; (5) There is an opportunity to constrain designs to use only secure components to achieve their specific objectives. We address these five problems in our stream processor design to provide secure, easily specified processing for low-latency, low-power 10-100Gbps in-line processing on top of a commodity high-end FPGA-based hardware accelerator network processor. With a standard interface a user can snap together various filter blocks, like Legos(tm), to form a custom processing chain. The overall design is a four-layer solution in which the structurally lowest layer provides the vast computational power to process line-speed streaming packets, and the uppermost layer provides the agility to easily shape the system to the properties of a given application. Current work has focused on design of the two lowest layers, highlighted in the design detail in Figure 1. The two layers shown in Figure 1 are the embeddable portion of the design; these layers, operating at up to 100Gbps, capture both the low- and high frequency components of a signal or stream, analyze them directly, and pass the lower frequency components, residues to the all-software upper layers, Layers 3 and 4; they also optionally supply the data-reduced output up to Layers 3 and 4 for additional processing. Layer 1 is analogous to a systolic array of processors on which simple low-level functions or actions are chained in series[5]. Examples of tasks accomplished at the lowest layer are: (a) check to see if Field 3 of the packet is greater than 5, or (b) count the number of X.75 packets, or (c) select individual fields from data packets. Layer 1 provides the lowest latency, highest throughput processing, analysis and data reduction, formulating raw facts from the stream; Layer 2, also accelerated in hardware and running at full network line rate, combines selected facts from Layer 1, forming a first level of information kernels. Layer 2 is comprised of a number of combiners intended to integrate facts extracted from Layer 1 for presentation to Layer 3. Still resident in FPGA hardware and hardware-accelerated, a Layer 2 combiner is comprised of state logic and soft-core microprocessors. Layer 3 runs in software on a host machine, and is essentially the bridge to the embeddable hardware; this layer exposes an API for the consumption of information kernels to create events and manage state. The generated events and state are also made available to an additional software Layer 4, supplying an interface to traditional software-based systems. As shown in the design detail, network data transitions systolically through Layer 1, through a series of light-weight processing filters that extract and/or modify packet contents. All filters have a similar interface: streams enter from the left, exit the right, and relevant facts are passed upward to Layer 2. The output of the end of the chain in Layer 1 shown in the Figure 1 can be (a) left unconnected (for purely monitoring activities), (b) redirected into the network (for bent pipe operations), or (c) passed to another identical processor, for extended processing on a given stream (scalability).

URLhttp://doi.acm.org/10.1145/2600176.2600203
DOI10.1145/2600176.2600203
Citation KeyDurbeck:2014:SCC:2600176.2600203