Visible to the public Biblio

Filters: Keyword is multicore  [Clear All Filters]
2022-02-22
Ordouie, Navid, Soundararajan, Nirmala, Karne, Ramesh, Wijesinha, Alexander L..  2021.  Developing Computer Applications without any OS or Kernel in a Multi-core Architecture. 2021 International Symposium on Networks, Computers and Communications (ISNCC). :1—8.
Over the years, operating systems (OSs) have grown significantly in complexity and size providing attackers with more avenues to compromise their security. By eliminating the OS, it becomes possible to develop general-purpose non-embedded applications that are free of typical OS-related vulnerabilities. Such applications are simpler and smaller in size, making it easier secure the application code. Bare machine computing (BMC) applications run on ordinary desktops and laptops without the support of any operating system or centralized kernel. Many BMC applications have been developed previously for single-core systems. We show how to build BMC applications for multicore systems by presenting the design and implementation of a novel UDP-based bare machine prototype Web server for a multicore architecture. We also include preliminary experimental results from running the server on the Internet. This work provides a foundation for building secure computer applications that run on multicore systems without the need for intermediary software.
Philomina, Josna.  2021.  A Study on the Effect of Hardware Trojans in the Performance of Network on Chip Architectures. 2021 8th International Conference on Smart Computing and Communications (ICSCC). :314—318.
Network on chip (NoC) is the communication infrastructure used in multicores which has been subject to a surfeit of security threats like degrading the system performance, changing the system functionality or leaking sensitive information. Because of the globalization of the advanced semiconductor industry, many third-party venders take part in the hardware design of system. As a result, a malicious circuit, called Hardware Trojans (HT) can be added anywhere into the NoC design and thus making the hardware untrusted. In this paper, a detailed study on the taxonomy of hardware trojans, its detection and prevention mechanisms are presented. Two case studies on HT-assisted Denial of service attacks and its analysis in the performance of network on Chip architecture is also presented in this paper.
2021-09-30
Gava, Jonas, Reis, Ricardo, Ost, Luciano.  2020.  RAT: A Lightweight System-Level Soft Error Mitigation Technique. 2020 IFIP/IEEE 28th International Conference on Very Large Scale Integration (VLSI-SOC). :165–170.
To achieve a substantial reliability and safety level, it is imperative to provide electronic computing systems with appropriate mechanisms to tackle soft errors. This paper proposes a low-cost system-level soft error mitigation technique, which allocates the critical application function to a pool of specific general-purpose processor registers. Both the critical function and the register pool are automatically selected by a developed profiling tool. The proposed technique was validated through more than 320K fault injections considering a Linux kernel, different benchmarks and two multicore ARM processors. Results show that our technique significantly reduces the code size and performance overheads while providing reliability improvement, w.r.t. the Triple Modular Redundancy (TMR) technique.
Gautam, Savita, Umar, M. Sarosh, Samad, Abdus.  2020.  Multi-Fold Scheduling Algorithm for Multi-Core Multi-Processor Systems. 2020 5th International Conference on Computing, Communication and Security (ICCCS). :1–5.
Adapting parallel scheduling function in the design of multi-scheduling algorithm results significant impact in the operation of high performance parallel systems. The various methods of parallelizing scheduling functions are widely applied in traditional multiprocessor systems. In this paper a novel algorithm is introduced which works not only for parallel execution of jobs but also focuses the parallelization of scheduling function. It gives attention on reducing the execution time, minimizing the load balance performance by selecting the volume of tasks for migration in terms of packets. Jobs are grouped into packets consisting of 2n jobs which are scheduled in parallel. Thus, an enhancement in the scheduling mechanism by packet formation is made to carry out high utilization of underlying architecture with increased throughput. The proposed method is assessed on a desktop computer equipped with multi-core processors in cube based multiprocessor systems. The algorithm is implemented with different configuration of multi-core systems. The simulation results indicate that the proposed technique reduces the overall makespan of execution with an improved performance of the system.
2020-11-16
Huyck, P..  2019.  Safe and Secure Data Fusion — Use of MILS Multicore Architecture to Reduce Cyber Threats. 2019 IEEE/AIAA 38th Digital Avionics Systems Conference (DASC). :1–9.
Data fusion, as a means to improve aircraft and air traffic safety, is a recent focus of some researchers and system developers. Increases in data volume and processing needs necessitate more powerful hardware and more flexible software architectures to satisfy these needs. Such improvements in processed data also mean the overall system becomes more complex and correspondingly, resulting in a potentially significantly larger cyber-attack space. Today's multicore processors are one means of satisfying the increased computational needs of data fusion-based systems. When coupled with a real-time operating system (RTOS) capable of flexible core and application scheduling, large cabinets of (power hungry) single-core processors may be avoided. The functional and assurance capabilities of such an RTOS can be critical elements in providing application isolation, constrained data flows, and restricted hardware access (including covert channel prevention) necessary to reduce the overall cyber-attack space. This paper examines fundamental considerations of a multiple independent levels of security (MILS) architecture when supported by a multicore-based real-time operating system. The paper draws upon assurance activities and functional properties associated with a previous Common Criteria evaluation assurance level (EAL) 6+ / High-Robustness Separation Kernel certification effort and contrast those with activities performed as part of a MILS multicore related project. The paper discusses key characteristics and functional capabilities necessary to achieve overall system security and safety. The paper defines architectural considerations essential for scheduling applications on a multicore processor to reduce security risks. For civil aircraft systems, the paper discusses the applicability of the security assurance and architecture configurations to system providers looking to increase their resilience to cyber threats.
2018-09-28
Qayum, Mohammad A., Badawy, Abdel-Hameed A., Cook, Jeanine.  2017.  DyAdHyTM: A Low Overhead Dynamically Adaptive Hybrid Transactional Memory with Application to Large Graphs. Proceedings of the International Symposium on Memory Systems. :327–336.
Big data is a buzzword used to describe massive volumes of data that provides opportunities of exploring new insights through data analytics. However, big data is mostly structured but can be semi-structured or unstructured. It is normally so large that it is not only difficult but also slow to process using traditional computing systems. One of the solutions is to format the data as graph data structures and process them on shared memory architecture to use fast and novel policies such as transactional memory. In most graph applications in big data type problems such as bioinformatics, social networks, and cybersecurity, graphs are sparse in nature. Due to this sparsity, we have the opportunity to use Transactional Memory (TM) as the synchronization policy for critical sections to speedup applications. At low conflict probability TM performs better than most synchronization policies due to its inherent non-blocking characteristics. TM can be implemented in Software, Hardware or a combination of both. However, hardware TM implementations are fast but limited by scarce hardware resources while software implementations have high overheads which can degrade performance. In this paper, we develop a low overhead, yet simple, dynamically adaptive (i.e., at runtime) hybrid (i.e., combines hardware and software) TM (DyAd-HyTM) scheme that combines the best features of both Hardware TM (HTM) and Software TM (STM) while adapting to application's requirements. It performs better than coarse-grain lock by up to 8.12x, a low overhead STM by up to 2.68x, a couple of implementations of HTMs (by up to 2.59x), and other HyTMs (by up to 1.55x) for SSCA-2 graph benchmark running on a multicore machine with a large shared memory.
2017-05-18
Park, Jungho, Jung, Wookeun, Jo, Gangwon, Lee, Ilkoo, Lee, Jaejin.  2016.  PIPSEA: A Practical IPsec Gateway on Embedded APUs. Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. :1255–1267.

Accelerated Processing Unit (APU) is a heterogeneous multicore processor that contains general-purpose CPU cores and a GPU in a single chip. It also supports Heterogeneous System Architecture (HSA) that provides coherent physically-shared memory between the CPU and the GPU. In this paper, we present the design and implementation of a high-performance IPsec gateway using a low-cost commodity embedded APU. The HSA supported by the APUs eliminates the data copy overhead between the CPU and the GPU, which is unavoidable in the previous discrete GPU approaches. The gateway is implemented in OpenCL to exploit the GPU and uses zero-copy packet I/O APIs in DPDK. The IPsec gateway handles the real-world network traffic where each packet has a different workload. The proposed packet scheduling algorithm significantly improves GPU utilization for such traffic. It works not only for APUs but also for discrete GPUs. With three CPU cores and one GPU in the APU, the IPsec gateway achieves a throughput of 10.36 Gbps with an average latency of 2.79 ms to perform AES-CBC+HMAC-SHA1 for incoming packets of 1024 bytes.

Brookes, Scott, Taylor, Stephen.  2016.  Rethinking Operating System Design: Asymmetric Multiprocessing for Security and Performance. Proceedings of the 2016 New Security Paradigms Workshop. :68–79.

Developers and academics are constantly seeking to increase the speed and security of operating systems. Unfortunately, an increase in either one often comes at the cost of the other. In this paper, we present an operating system design that challenges a long-held tenet of multicore operating systems in order to produce an alternative architecture that has the potential to deliver both increased security and faster performance. In particular, we propose decoupling the operating system kernel from user processes by running each on completely separate processor cores instead of at different privilege levels within shared cores. Without using the hardware's privilege modes, virtualization and virtual memory contexts enforce the security policies necessary to maintain process isolation and protection. Our new kernel design paradigm offers the opportunity to simultaneously increase both performance and security; utilizing the hardware facilities for inter-core communication in place of those for privilege mode switching offers the opportunity for increased system call performance, while the hard separation between user processes and the kernel provides several strong security properties.

Chachmon, Nadav, Richins, Daniel, Cohn, Robert, Christensson, Magnus, Cui, Wenzhi, Reddi, Vijay Janapa.  2016.  Simulation and Analysis Engine for Scale-Out Workloads. Proceedings of the 2016 International Conference on Supercomputing. :22:1–22:13.

We introduce a system-level Simulation and Analysis Engine (SAE) framework based on dynamic binary instrumentation for fine-grained and customizable instruction-level introspection of everything that executes on the processor. SAE can instrument the BIOS, kernel, drivers, and user processes. It can also instrument multiple systems simultaneously using a single instrumentation interface, which is essential for studying scale-out applications. SAE is an x86 instruction set simulator designed specifically to enable rapid prototyping, evaluation, and validation of architectural extensions and program analysis tools using its flexible APIs. It is fast enough to execute full platform workloads–-a modern operating system can boot in a few minutes–-thus enabling research, evaluation, and validation of complex functionalities related to multicore configurations, virtualization, security, and more. To reach high speeds, SAE couples tightly with a virtual platform and employs both a just-in-time (JIT) compiler that helps simulate simple instructions efficiently and a fast interpreter for simulating new or complex instructions. We describe SAE's architecture and instrumentation engine design and show the framework's usefulness for single- and multi-system architectural and program analysis studies.

Bartolini, Davide B., Miedl, Philipp, Thiele, Lothar.  2016.  On the Capacity of Thermal Covert Channels in Multicores. Proceedings of the Eleventh European Conference on Computer Systems. :24:1–24:16.

Modern multicore processors feature easily accessible temperature sensors that provide useful information for dynamic thermal management. These sensors were recently shown to be a potential security threat, since otherwise isolated applications can exploit them to establish a thermal covert channel and leak restricted information. Previous research showed experiments that document the feasibility of (low-rate) communication over this channel, but did not further analyze its fundamental characteristics. For this reason, the important questions of quantifying the channel capacity and achievable rates remain unanswered. To address these questions, we devise and exploit a new methodology that leverages both theoretical results from information theory and experimental data to study these thermal covert channels on modern multicores. We use spectral techniques to analyze data from two representative platforms and estimate the capacity of the channels from a source application to temperature sensors on the same or different cores. We estimate the capacity to be in the order of 300 bits per second (bps) for the same-core channel, i.e., when reading the temperature on the same core where the source application runs, and in the order of 50 bps for the 1-hop channel, i.e., when reading the temperature of the core physically next to the one where the source application runs. Moreover, we show a communication scheme that achieves rates of more than 45 bps on the same-core channel and more than 5 bps on the 1-hop channel, with less than 1% error probability. The highest rate shown in previous work was 1.33 bps on the 1-hop channel with 11% error probability.

Wang, Xiao, Sabne, Amit, Kisner, Sherman, Raghunathan, Anand, Bouman, Charles, Midkiff, Samuel.  2016.  High Performance Model Based Image Reconstruction. Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. :2:1–2:12.

Computed Tomography (CT) Image Reconstruction is an important technique used in a wide range of applications, ranging from explosive detection, medical imaging to scientific imaging. Among available reconstruction methods, Model Based Iterative Reconstruction (MBIR) produces higher quality images and allows for the use of more general CT scanner geometries than is possible with more commonly used methods. The high computational cost of MBIR, however, often makes it impractical in applications for which it would otherwise be ideal. This paper describes a new MBIR implementation that significantly reduces the computational cost of MBIR while retaining its benefits. It describes a novel organization of the scanner data into super-voxels (SV) that, combined with a super-voxel buffer (SVB), dramatically increase locality and prefetching, enable parallelism across SVs and lead to an average speedup of 187 on 20 cores.