Biblio
Data analytics and telemetry have become paramount to monitoring and maintaining quality-of-service in addition to business analytics. Stream processing-a model where a network of operators receives and processes continuously arriving discrete elements-is well-suited for these needs. Current and previous studies and frameworks have focused on continuity of operations and aggregate performance metrics. However, real-time performance and tail latency are also important. Timing errors caused by either performance or failed communication faults also affect real-time performance more drastically than aggregate metrics. In this paper, we introduce redundancy in the stream data to improve the real-time performance and resiliency to timing errors caused by either performance or failed communication faults. We also address limitations in previous solutions using a fine-grained acknowledgment tracking scheme to both increase the effectiveness for resiliency to performance faults and enable effectiveness for failed communication faults. Our results show that fine-grained acknowledgment schemes can improve the tail and mean latencies by approximately 30%. We also show that these schemes can improve resiliency to performance faults compared to existing work. Our improvements result in 47.4% to 92.9% fewer missed deadlines compared to 17.3% to 50.6% for comparable topologies and redundancy levels in the state of the art. Finally, we show that redundancies of 25% to 100% can reduce the number of data elements that miss their deadline constraints by 0.76% to 14.04% for applications with high fan-out and by 7.45% up to 50% for applications with no fan-out.
Techniques commonly used for analyzing streaming video, audio, SIGINT, and network transmissions, at less-than-streaming rates, such as data decimation and ad-hoc sampling, can miss underlying structure, trends and specific events held in the data[3]. This work presents a secure-by-construction approach [7] for the upper-end data streams with rates from 10- to 100 Gigabits per second. The secure-by-construction approach strives to produce system security through the composition of individually secure hardware and software components. The proposed network processor can be used not only at data centers but also within networks and onboard embedded systems at the network periphery for a wide range of tasks, including preprocessing and data cleansing, signal encoding and compression, complex event processing, flow analysis, and other tasks related to collecting and analyzing streaming data. Our design employs a four-layer scalable hardware/software stack that can lead to inherently secure, easily constructed specialized high-speed stream processing. This work addresses the following contemporary problems: (1) There is a lack of hardware/software systems providing stream processing and data stream analysis operating at the target data rates; for high-rate streams the implementation options are limited: all-software solutions can't attain the target rates[1]. GPUs and GPGPUs are also infeasible: they were not designed for I/O at 10-100Gbps; they also have asymmetric resources for input and output and thus cannot be pipelined[4, 2], whereas custom chip-based solutions are costly and inflexible to changes, and FPGA-based solutions are historically hard to program[6]; (2) There is a distinct advantage to utilizing high-bandwidth or line-speed analytics to reduce time-to-discovery of information, particularly ones that can be pipelined together to conduct a series of processing tasks or data tests without impeding data rates; (3) There is potentially significant network infrastructure cost savings possible from compact and power-efficient analytic support deployed at the network periphery on the data source or one hop away; (4) There is a need for agile deployment in response to changing objectives; (5) There is an opportunity to constrain designs to use only secure components to achieve their specific objectives. We address these five problems in our stream processor design to provide secure, easily specified processing for low-latency, low-power 10-100Gbps in-line processing on top of a commodity high-end FPGA-based hardware accelerator network processor. With a standard interface a user can snap together various filter blocks, like Legos™, to form a custom processing chain. The overall design is a four-layer solution in which the structurally lowest layer provides the vast computational power to process line-speed streaming packets, and the uppermost layer provides the agility to easily shape the system to the properties of a given application. Current work has focused on design of the two lowest layers, highlighted in the design detail in Figure 1. The two layers shown in Figure 1 are the embeddable portion of the design; these layers, operating at up to 100Gbps, capture both the low- and high frequency components of a signal or stream, analyze them directly, and pass the lower frequency components, residues to the all-software upper layers, Layers 3 and 4; they also optionally supply the data-reduced output up to Layers 3 and 4 for additional processing. Layer 1 is analogous to a systolic array of processors on which simple low-level functions or actions are chained in series[5]. Examples of tasks accomplished at the lowest layer are: (a) check to see if Field 3 of the packet is greater than 5, or (b) count the number of X.75 packets, or (c) select individual fields from data packets. Layer 1 provides the lowest latency, highest throughput processing, analysis and data reduction, formulating raw facts from the stream; Layer 2, also accelerated in hardware and running at full network line rate, combines selected facts from Layer 1, forming a first level of information kernels. Layer 2 is comprised of a number of combiners intended to integrate facts extracted from Layer 1 for presentation to Layer 3. Still resident in FPGA hardware and hardware-accelerated, a Layer 2 combiner is comprised of state logic and soft-core microprocessors. Layer 3 runs in software on a host machine, and is essentially the bridge to the embeddable hardware; this layer exposes an API for the consumption of information kernels to create events and manage state. The generated events and state are also made available to an additional software Layer 4, supplying an interface to traditional software-based systems. As shown in the design detail, network data transitions systolically through Layer 1, through a series of light-weight processing filters that extract and/or modify packet contents. All filters have a similar interface: streams enter from the left, exit the right, and relevant facts are passed upward to Layer 2. The output of the end of the chain in Layer 1 shown in the Figure 1 can be (a) left unconnected (for purely monitoring activities), (b) redirected into the network (for bent pipe operations), or (c) passed to another identical processor, for extended processing on a given stream (scalability).