Biblio

Filters: Keyword is multithreading
2023-05-12
Zhang, Tong, Cui, Xiangjie, Wang, Yichuan, Du, Yanning, Gao, Wen.  2022.  TCS Security Analysis in Intel SGX Enclave MultiThreading. 2022 International Conference on Networking and Network Applications (NaNA). :276–281.

With the rapid development of Internet technology in recent years, the demand for security support for complex applications has grown ever stronger. Intel Software Guard Extensions (Intel SGX) is an extension to Intel processors designed to enhance software security. Intel SGX allows application developers to create so-called enclaves, which encapsulate sensitive application code and data in a Trusted Execution Environment (TEE). The TEE is completely isolated from other applications, the operating system, and administrative programs. The enclave is the core structure of Intel SGX technology and supports multithreading. A Thread Control Structure (TCS) stores the information needed to restore an enclave thread when entering or exiting the enclave, and each execution thread in the enclave is associated with a TCS. This paper analyzes and verifies the possible security risks of enclaves under concurrent conditions. It finds that, under multithreaded concurrency, a single enclave cannot resist flooding attacks, and the affected threads also throw TCS exception codes.
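As a rough illustration of the failure mode described above (not the authors' experiment), the C sketch below launches more untrusted host threads than the enclave has TCS slots and checks for the SGX SDK's TCS-exhaustion error code; the enclave file name, the ECALL ecall_do_work, and the generated header enclave_u.h are hypothetical placeholders.

    /* Illustrative sketch only: concurrent untrusted threads issue ECALLs into
     * one enclave; the SGX SDK reports TCS exhaustion when no slot is free.
     * "ecall_do_work" and "enclave.signed.so" are hypothetical; the real ECALL
     * proxy is generated from the project's EDL file. */
    #include <pthread.h>
    #include <stdio.h>
    #include "sgx_urts.h"
    #include "enclave_u.h"      /* edger8r-generated untrusted proxy (assumed) */

    static sgx_enclave_id_t eid;

    static void *worker(void *arg) {
        (void)arg;
        sgx_status_t st = ecall_do_work(eid);   /* hypothetical ECALL */
        if (st == SGX_ERROR_OUT_OF_TCS)
            fprintf(stderr, "no free TCS slot: too many threads in the enclave\n");
        return NULL;
    }

    int main(void) {
        sgx_launch_token_t token = {0};
        int updated = 0;
        if (sgx_create_enclave("enclave.signed.so", 1 /* debug */, &token,
                               &updated, &eid, NULL) != SGX_SUCCESS)
            return 1;

        /* Launch more host threads than the enclave has TCS entries
         * (TCSNum in the enclave configuration) to provoke contention. */
        pthread_t t[16];
        for (int i = 0; i < 16; i++) pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < 16; i++) pthread_join(t[i], NULL);

        sgx_destroy_enclave(eid);
        return 0;
    }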

2022-11-08
Shomron, Gil, Weiser, Uri.  2020.  Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). :256–269.
Deep neural networks (DNNs) are known for their inability to fully utilize underlying hardware resources, due to hardware susceptibility to sparse activations and weights. Even at finer granularities, many of the non-zero values hold a portion of zero-valued bits that may cause inefficiencies when executed on hardware. Inspired by conventional CPU simultaneous multithreading (SMT), which increases computer resource utilization by sharing resources across several threads, we propose non-blocking SMT (NB-SMT) designated for DNN accelerators. Like conventional SMT, NB-SMT shares hardware resources among several execution flows. Yet, unlike SMT, NB-SMT is non-blocking, as it handles structural hazards by exploiting the algorithmic resiliency of DNNs. Instead of opportunistically dispatching instructions while they wait in a reservation station for available hardware, NB-SMT temporarily reduces the computation precision to accommodate all threads at once, enabling non-blocking operation. We demonstrate NB-SMT applicability using SySMT, an NB-SMT-enabled output-stationary systolic array (OS-SA). Compared with a conventional OS-SA, a 2-threaded SySMT consumes 1.4× the area and delivers 2× speedup with 33% energy savings and less than 1% accuracy degradation of state-of-the-art CNNs on ImageNet. A 4-threaded SySMT consumes 2.5× the area and delivers, for example, 3.4× speedup and 39% energy savings with 1% accuracy degradation of a 40%-pruned ResNet-18.
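The core NB-SMT idea, reducing precision only when two execution flows collide on the shared multiply-accumulate hardware, can be sketched in a few lines of C. This is a toy model of the behaviour described in the abstract, not the SySMT datapath; the 4-bit truncation is an assumption chosen for illustration.

    /* Toy model: two "threads" share one 8-bit MAC unit per cycle. If both
     * present non-zero operand pairs (a structural hazard), each pair is
     * temporarily reduced to its 4 most significant bits so both products fit
     * at once; if one thread's operand is zero (sparsity), the other keeps
     * full precision. */
    #include <stdint.h>
    #include <stdio.h>

    static int32_t mac_full(uint8_t a, uint8_t w) { return (int32_t)a * (int32_t)w; }

    /* Reduced-precision product: keep the top 4 bits of each operand,
     * then rescale the result back to the original magnitude. */
    static int32_t mac_reduced(uint8_t a, uint8_t w) {
        return ((int32_t)(a >> 4) * (int32_t)(w >> 4)) << 8;
    }

    static void shared_mac(uint8_t a0, uint8_t w0, uint8_t a1, uint8_t w1,
                           int32_t *acc0, int32_t *acc1) {
        int t0_busy = (a0 != 0 && w0 != 0);
        int t1_busy = (a1 != 0 && w1 != 0);
        if (t0_busy && t1_busy) {          /* hazard: both threads share precision */
            *acc0 += mac_reduced(a0, w0);
            *acc1 += mac_reduced(a1, w1);
        } else {                           /* at most one real product: full precision */
            if (t0_busy) *acc0 += mac_full(a0, w0);
            if (t1_busy) *acc1 += mac_full(a1, w1);
        }
    }

    int main(void) {
        int32_t acc0 = 0, acc1 = 0;
        shared_mac(200, 100, 0, 37, &acc0, &acc1);   /* thread 1 idle: full precision */
        shared_mac(200, 100, 90, 60, &acc0, &acc1);  /* both busy: reduced precision  */
        printf("acc0=%d acc1=%d\n", acc0, acc1);
        return 0;
    }
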
2020-03-16
Kholidy, Hisham A..  2019.  Towards A Scalable Symmetric Key Cryptographic Scheme: Performance Evaluation and Security Analysis. 2019 2nd International Conference on Computer Applications Information Security (ICCAIS). :1–6.
In most applications, security attributes are quite difficult to meet, and this becomes an even bigger challenge in grid computing. To secure data passing through grid systems, we need a scheme that does not degrade the overall performance of the grid system. Therefore, we previously developed a new security scheme, "ULTRA GRIDSEC", which accelerates the performance of symmetric key encryption algorithms for both stream and block ciphers. The scheme is used to accelerate the protection of data passed between elements of our newly developed pure peer-to-peer desktop grid framework, "HIMAN". It also enhances the security of the data encrypted by the scheme and avoids the weak-key problem of the encryption algorithms. This paper covers the analysis and evaluation of the scheme, showing the different factors affecting its performance, and examines its efficiency from the security perspective. Experimental results are highlighted for two types of encryption algorithms: TDES as an example of block cipher algorithms, and RC4 as an example of stream cipher algorithms. The scheme speeds up the former algorithm by 202.12% and the latter by 439.7%. These accelerations also depend on the running machine's capabilities.
2020-06-08
Tang, Deyou, Zhang, Yazhuo, Zeng, Qingmiao.  2019.  Optimization of Hardware-oblivious and Hardware-conscious Hash-join Algorithms on KNL. 2019 4th International Conference on Cloud Computing and Internet of Things (CCIOT). :24–28.
Investigation of hash join algorithms on multi-core and many-core platforms has shown that carefully tuned hash join implementations can outperform simple hash joins on most multi-core servers. However, hardware-oblivious hash joins have shown competitive performance on many-core platforms. Knights Landing (KNL) has received attention in the field of parallel computing for its massively data-parallel nature and high memory bandwidth, but neither hardware-oblivious nor hardware-conscious hash join algorithms have been systematically discussed and evaluated with respect to KNL's characteristics (high bandwidth, cluster modes, etc.). In this paper, we present the design and implementation of state-of-the-art hardware-oblivious and hardware-conscious hash joins that are tuned to exploit various KNL hardware characteristics. Through a thorough evaluation, we show that: 1) memory allocation strategies based on KNL's architecture are effective for both hardware-oblivious and hardware-conscious hash join algorithms; 2) hardware architecture features remain non-negligible factors in improving the efficiency of hash join algorithms.
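For reference, a hardware-oblivious (no-partitioning) hash join reduces to building a shared hash table over one relation and probing it with the other. The minimal single-threaded C sketch below illustrates that baseline only; thread parallelism, KNL-specific memory placement (e.g., MCDRAM), and the radix partitioning used by hardware-conscious variants are omitted.

    /* Minimal no-partitioning hash join sketch: build a chained hash table
     * over relation R, then probe it with relation S and count matches. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct tuple { int key; int payload; } tuple_t;
    typedef struct node  { tuple_t t; struct node *next; } node_t;

    #define NBUCKETS 1024
    static node_t *buckets[NBUCKETS];

    static unsigned hash_key(int k) { return ((unsigned)k * 2654435761u) % NBUCKETS; }

    static void build(const tuple_t *R, size_t n) {
        for (size_t i = 0; i < n; i++) {
            node_t *nd = malloc(sizeof *nd);
            nd->t = R[i];
            unsigned b = hash_key(R[i].key);
            nd->next = buckets[b];          /* chain into the bucket */
            buckets[b] = nd;
        }
    }

    static size_t probe(const tuple_t *S, size_t m) {
        size_t matches = 0;
        for (size_t i = 0; i < m; i++)
            for (node_t *nd = buckets[hash_key(S[i].key)]; nd; nd = nd->next)
                if (nd->t.key == S[i].key) matches++;
        return matches;
    }

    int main(void) {
        tuple_t R[] = {{1, 10}, {2, 20}, {3, 30}};
        tuple_t S[] = {{2, 200}, {3, 300}, {4, 400}};
        build(R, 3);
        printf("matches: %zu\n", probe(S, 3));
        return 0;
    }
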
2015-01-12
Yu, Tingting, Srisa-an, Witawas, Rothermel, Gregg.  2014.  SimRT: An Automated Framework to Support Regression Testing for Data Races. International Conference on Software Engineering (ICSE) 2014.

Concurrent programs are prone to various classes of difficult-to-detect faults, of which data races are particularly prevalent. Prior work has attempted to increase the cost-effectiveness of approaches for testing for data races by employing race detection techniques, but to date, no work has considered cost-effective approaches for re-testing for races as programs evolve. In this paper we present SimRT, an automated regression testing framework for use in detecting races introduced by code modifications. SimRT employs a regression test selection technique, focused on sets of program elements related to race detection, to reduce the number of test cases that must be run on a changed program to detect races that occur due to code modifications, and it employs a test case prioritization technique to improve the rate at which such races are detected. Our empirical study of SimRT reveals that it is more efficient and effective for revealing races than other approaches, and that its constituent test selection and prioritization components each contribute to its performance.
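The class of fault SimRT targets can be illustrated with a small C program in which a hypothetical code modification removed the locking around a shared update, introducing a data race that race-directed regression tests would aim to expose. The example is purely illustrative and is not drawn from the paper's study.

    /* Illustrative only: a change removed the lock around "balance += 1",
     * so two depositing threads now race on the shared counter. The changed
     * program elements (balance, deposit) would drive test selection. */
    #include <pthread.h>
    #include <stdio.h>

    static long balance = 0;
    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *deposit(void *arg) {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            /* pthread_mutex_lock(&lock);   <- removed by the (hypothetical) change */
            balance += 1;                    /* data race between the two threads   */
            /* pthread_mutex_unlock(&lock); */
        }
        return NULL;
    }

    int main(void) {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, deposit, NULL);
        pthread_create(&t2, NULL, deposit, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("balance = %ld (expected 200000)\n", balance);
        return 0;
    }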

2015-05-04
Gimenez, A., Gamblin, T., Rountree, B., Bhatele, A., Jusufi, I., Bremer, P.-T., Hamann, B..  2014.  Dissecting On-Node Memory Access Performance: A Semantic Approach. SC14: International Conference for High Performance Computing, Networking, Storage and Analysis. :166–176.

Optimizing memory access is critical for performance and power efficiency. CPU manufacturers have developed sampling-based performance measurement units (PMUs) that report precise costs of memory accesses at specific addresses. However, this data is too low-level to be meaningfully interpreted and contains an excessive amount of irrelevant or uninteresting information. We have developed a method to gather fine-grained memory access performance data for specific data objects and regions of code with low overhead and attribute semantic information to the sampled memory accesses. This information provides the context necessary to more effectively interpret the data. We have developed a tool that performs this sampling and attribution and used the tool to discover and diagnose performance problems in real-world applications. Our techniques provide useful insight into the memory behaviour of applications and allow programmers to understand the performance ramifications of key design decisions: domain decomposition, multi-threading, and data motion within distributed memory systems.
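The semantic-attribution step described above can be approximated by mapping each sampled address to a registered application data object. The C sketch below is an illustration under that assumption: the object-registration helpers and the hard-coded sample are hypothetical, and real samples would come from the PMU rather than being synthesized here.

    /* Sketch of semantic attribution: resolve a sampled memory address to the
     * named data object whose [base, base+size) range contains it. */
    #include <stdio.h>
    #include <stdint.h>

    typedef struct { const char *name; uintptr_t base; size_t size; } data_object_t;

    #define MAX_OBJECTS 32
    static data_object_t objects[MAX_OBJECTS];
    static int n_objects = 0;

    static void register_object(const char *name, const void *base, size_t size) {
        objects[n_objects++] = (data_object_t){ name, (uintptr_t)base, size };
    }

    static const char *attribute(uintptr_t addr) {
        for (int i = 0; i < n_objects; i++)
            if (addr >= objects[i].base && addr < objects[i].base + objects[i].size)
                return objects[i].name;
        return "<unknown>";
    }

    int main(void) {
        static double pressure[1024], velocity[1024];
        register_object("pressure", pressure, sizeof pressure);
        register_object("velocity", velocity, sizeof velocity);

        uintptr_t sampled = (uintptr_t)&velocity[17];  /* stand-in for a PMU sample */
        printf("sample %#lx -> %s\n", (unsigned long)sampled, attribute(sampled));
        return 0;
    }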