Biblio

Filters: Keyword is parallel processing  [Clear All Filters]
2020-05-22
Sheth, Utsav, Dutta, Sanghamitra, Chaudhari, Malhar, Jeong, Haewon, Yang, Yaoqing, Kohonen, Jukka, Roos, Teemu, Grover, Pulkit.  2018.  An Application of Storage-Optimal MatDot Codes for Coded Matrix Multiplication: Fast k-Nearest Neighbors Estimation. 2018 IEEE International Conference on Big Data (Big Data). :1113—1120.
We propose a novel application of coded computing to the problem of the nearest neighbor estimation using MatDot Codes (Fahim et al., Allerton'17) that are known to be optimal for matrix multiplication in terms of recovery threshold under storage constraints. In approximate nearest neighbor algorithms, it is common to construct efficient in-memory indexes to improve query response time. One such strategy is Multiple Random Projection Trees (MRPT), which reduces the set of candidate points over which Euclidean distance calculations are performed. However, this may result in a high memory footprint and possibly paging penalties for large or high-dimensional data. Here we propose two techniques to parallelize MRPT that exploit data and model parallelism respectively by dividing both the data storage and the computation efforts among different nodes in a distributed computing cluster. This is especially critical when a single compute node cannot hold the complete dataset in memory. We also propose a novel coded computation strategy based on MatDot codes for the model-parallel architecture that, in a straggler-prone environment, achieves the storage-optimal recovery threshold, i.e., the number of nodes that are required to serve a query. We experimentally demonstrate that, in the absence of straggling, our distributed approaches require less query time than execution on a single processing node, providing near-linear speedups with respect to the number of worker nodes. Our experiments on real systems with simulated straggling, we also show that in a straggler-prone environment, our strategy achieves a faster query execution than the uncoded strategy.
2020-12-01
Kalyanaraman, A., Halappanavar, M..  2018.  Guest Editorial: Advances in Parallel Graph Processing: Algorithms, Architectures, and Application Frameworks. IEEE Transactions on Multi-Scale Computing Systems. 4:188—189.

The papers in this special section explore recent advancements in parallel graph processing. In the sphere of modern data science and data-driven applications, graph algorithms have achieved a pivotal place in advancing the state of scientific discovery and knowledge. Nearly three centuries of ideas have made graph theory and its applications a mature area in computational sciences. Yet, today we find ourselves at a crossroads between theory and application. Spurred by the digital revolution, data from a diverse range of high throughput channels and devices, from across internet-scale applications, are starting to mark a new era in data-driven computing and discovery. Building robust graph models and implementing scalable graph application frameworks in the context of this new era are proving to be significant challenges. Concomitant to the digital revolution, we have also experienced an explosion in computing architectures, with a broad range of multicores, manycores, heterogeneous platforms, and hardware accelerators (CPUs, GPUs) being actively developed and deployed within servers and multinode clusters. Recent advances have started to show that in more than one way, these two fields—graph theory and architectures–are capable of benefiting and in fact spurring new research directions in one another. This special section is aimed at introducing some of the new avenues of cutting-edge research happening at the intersection of graph algorithm design and their implementation on advanced parallel architectures.

2017-12-20
Li, S., Wang, B..  2017.  A Method for Hybrid Bayesian Network Structure Learning from Massive Data Using MapReduce. 2017 ieee 3rd international conference on big data security on cloud (bigdatasecurity), ieee international conference on high performance and smart computing (hpsc), and ieee international conference on intelligent data and security (ids). :272–276.
Bayesian Network is the popular and important data mining model for representing uncertain knowledge. For large scale data it is often too costly to learn the accurate structure. To resolve this problem, much work has been done on migrating the structure learning algorithms to the MapReduce framework. In this paper, we introduce a distributed hybrid structure learning algorithm by combining the advantages of constraint-based and score-and-search-based algorithms. By reusing the intermediate results of MapReduce, the algorithm greatly simplified the computing work and got good results in both efficiency and accuracy.
2018-02-06
Badii, A., Faulkner, R., Raval, R., Glackin, C., Chollet, G..  2017.  Accelerated Encryption Algorithms for Secure Storage and Processing in the Cloud. 2017 International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). :1–6.

The objective of this paper is to outline the design specification, implementation and evaluation of a proposed accelerated encryption framework which deploys both homomorphic and symmetric-key encryptions to serve the privacy preserving processing; in particular, as a sub-system within the Privacy Preserving Speech Processing framework architecture as part of the PPSP-in-Cloud Platform. Following a preliminary study of GPU efficiency gains optimisations benchmarked for AES implementation we have addressed and resolved the Big Integer processing challenges in parallel implementation of bilinear pairing thus enabling the creation of partially homomorphic encryption schemes which facilitates applications such as speech processing in the encrypted domain on the cloud. This novel implementation has been validated in laboratory tests using a standard speech corpus and can be used for other application domains to support secure computation and privacy preserving big data storage/processing in the cloud.

Kebande, V. R., Karie, N. M., Venter, H. S..  2017.  Cloud-Centric Framework for Isolating Big Data as Forensic Evidence from IoT Infrastructures. 2017 1st International Conference on Next Generation Computing Applications (NextComp). :54–60.

Cloud computing paradigm continues to revolutionize the way business processes are being conducted through the provision of massive resources, reliability across networks and ability to offer parallel processing. However, miniaturization, proliferation and nanotechnology within devices has enabled digitization of almost every object which eventually has seen the rise of a new technological marvel dubbed Internet of Things (IoT). IoT enables self-configurable/smart devices to connect intelligently through Radio Frequency Identification (RFID), WI-FI, LAN, GPRS and other methods by further enabling timeously processing of information. Based on these developments, the integration of the cloud and IoT infrastructures has led to an explosion of the amount of data being exchanged between devices which have in turn enabled malicious actors to use this as a platform to launch various cybercrime activities. Consequently, digital forensics provides a significant approach that can be used to provide an effective post-event response mechanism to these malicious attacks in cloud-based IoT infrastructures. Therefore, the problem being addressed is that, at the time of writing this paper, there still exist no accepted standards or frameworks for conducting digital forensic investigation on cloud-based IoT infrastructures. As a result, the authors have proposed a cloud-centric framework that is able to isolate Big data as forensic evidence from IoT (CFIBD-IoT) infrastructures for proper analysis and examination. It is the authors' opinion that if the CFIBD-IoT framework is implemented fully it will support cloud-based IoT tool creation as well as support future investigative techniques in the cloud with a degree of certainty.

2017-12-04
Johnston, B., Lee, B., Angove, L., Rendell, A..  2017.  Embedded Accelerators for Scientific High-Performance Computing: An Energy Study of OpenCL Gaussian Elimination Workloads. 2017 46th International Conference on Parallel Processing Workshops (ICPPW). :59–68.

Energy efficient High-Performance Computing (HPC) is becoming increasingly important. Recent ventures into this space have introduced an unlikely candidate to achieve exascale scientific computing hardware with a small energy footprint. ARM processors and embedded GPU accelerators originally developed for energy efficiency in mobile devices, where battery life is critical, are being repurposed and deployed in the next generation of supercomputers. Unfortunately, the performance of executing scientific workloads on many of these devices is largely unknown, yet the bulk of computation required in high-performance supercomputers is scientific. We present an analysis of one such scientific code, in the form of Gaussian Elimination, and evaluate both execution time and energy used on a range of embedded accelerator SoCs. These include three ARM CPUs and two mobile GPUs. Understanding how these low power devices perform on scientific workloads will be critical in the selection of appropriate hardware for these supercomputers, for how can we estimate the performance of tens of thousands of these chips if the performance of one is largely unknown?

2018-02-21
Zhou, G., Feng, Y., Bo, R., Chien, L., Zhang, X., Lang, Y., Jia, Y., Chen, Z..  2017.  GPU-Accelerated Batch-ACPF Solution for N-1 Static Security Analysis. IEEE Transactions on Smart Grid. 8:1406–1416.

Graphics processing unit (GPU) has been applied successfully in many scientific computing realms due to its superior performances on float-pointing calculation and memory bandwidth, and has great potential in power system applications. The N-1 static security analysis (SSA) appears to be a candidate application in which massive alternating current power flow (ACPF) problems need to be solved. However, when applying existing GPU-accelerated algorithms to solve N-1 SSA problem, the degree of parallelism is limited because existing researches have been devoted to accelerating the solution of a single ACPF. This paper therefore proposes a GPU-accelerated solution that creates an additional layer of parallelism among batch ACPFs and consequently achieves a much higher level of overall parallelism. First, this paper establishes two basic principles for determining well-designed GPU algorithms, through which the limitation of GPU-accelerated sequential-ACPF solution is demonstrated. Next, being the first of its kind, this paper proposes a novel GPU-accelerated batch-QR solver, which packages massive number of QR tasks to formulate a new larger-scale problem and then achieves higher level of parallelism and better coalesced memory accesses. To further improve the efficiency of solving SSA, a GPU-accelerated batch-Jacobian-Matrix generating and contingency screening is developed and carefully optimized. Lastly, the complete process of the proposed GPU-accelerated batch-ACPF solution for SSA is presented. Case studies on an 8503-bus system show dramatic computation time reduction is achieved compared with all reported existing GPU-accelerated methods. In comparison to UMFPACK-library-based single-CPU counterpart using Intel Xeon E5-2620, the proposed GPU-accelerated SSA framework using NVIDIA K20C achieves up to 57.6 times speedup. It can even achieve four times speedup when compared to one of the fastest multi-core CPU parallel computing solution using KLU library. The prop- sed batch-solving method is practically very promising and lays a critical foundation for many other power system applications that need to deal with massive subtasks, such as Monte-Carlo simulation and probabilistic power flow.

Pak, W., Choi, Y. J..  2017.  High Performance and High Scalable Packet Classification Algorithm for Network Security Systems. IEEE Transactions on Dependable and Secure Computing. 14:37–49.

Packet classification is a core function in network and security systems; hence, hardware-based solutions, such as packet classification accelerator chips or Ternary Content Addressable Memory (T-CAM), have been widely adopted for high-performance systems. With the rapid improvement of general hardware architectures and growing popularity of multi-core multi-threaded processors, software-based packet classification algorithms are attracting considerable attention, owing to their high flexibility in satisfying various industrial requirements for security and network systems. For high classification speed, these algorithms internally use large tables, whose size increases exponentially with the ruleset size; consequently, they cannot be used with a large rulesets. To overcome this problem, we propose a new software-based packet classification algorithm that simultaneously supports high scalability and fast classification performance by merging partition decision trees in a search table. While most partitioning-based packet classification algorithms show good scalability at the cost of low classification speed, our algorithm shows very high classification speed, irrespective of the number of rules, with small tables and short table building time. Our test results confirm that the proposed algorithm enables network and security systems to support heavy traffic in the most effective manner.

2017-12-04
Hwang, T..  2017.  NSF GENI cloud enabled architecture for distributed scientific computing. 2017 IEEE Aerospace Conference. :1–8.

GENI (Global Environment for Network Innovations) is a National Science Foundation (NSF) funded program which provides a virtual laboratory for networking and distributed systems research and education. It is well suited for exploring networks at a scale, thereby promoting innovations in network science, security, services and applications. GENI allows researchers obtain compute resources from locations around the United States, connect compute resources using 100G Internet2 L2 service, install custom software or even custom operating systems on these compute resources, control how network switches in their experiment handle traffic flows, and run their own L3 and above protocols. GENI architecture incorporates cloud federation. With the federation, cloud resources can be federated and/or community of clouds can be formed. The heart of federation is user identity and an ability to “advertise” cloud resources into community including compute, storage, and networking. GENI administrators can carve out what resources are available to the community and hence a portion of GENI resources are reserved for internal consumption. GENI architecture also provides “stitching” of compute and storage resources researchers request. This provides L2 network domain over Internet2's 100G network. And researchers can run their Software Defined Networking (SDN) controllers on the provisioned L2 network domain for a complete control of networking traffic. This capability is useful for large science data transfer (bypassing security devices for high throughput). Renaissance Computing Institute (RENCI), a research institute in the state of North Carolina, has developed ORCA (Open Resource Control Architecture), a GENI control framework. ORCA is a distributed resource orchestration system to serve science experiments. ORCA provides compute resources as virtual machines and as well as baremetals. ORCA based GENI ra- k was designed to serve both High Throughput Computing (HTC) and High Performance Computing (HPC) type of computes. Although, GENI is primarily used in various universities and research entities today, GENI architecture can be leveraged in the commercial, aerospace and government settings. This paper will go over the architecture of GENI and discuss the GENI architecture for scientific computing experiments.

2018-09-12
Domínguez, A., Carballo, P. P., Núñez, A..  2017.  Programmable SoC platform for deep packet inspection using enhanced Boyer-Moore algorithm. 2017 12th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC). :1–8.

This paper describes the work done to design a SoC platform for real-time on-line pattern search in TCP packets for Deep Packet Inspection (DPI) applications. The platform is based on a Xilinx Zynq programmable SoC and includes an accelerator that implements a pattern search engine that extends the original Boyer-Moore algorithm with timing and logical rules, that produces a very complex set of rules. Also, the platform implements different modes of operation, including SIMD and MISD parallelism, which can be configured on-line. The platform is scalable depending of the analysis requirement up to 8 Gbps. High-Level synthesis and platform based design methodologies have been used to reduce the time to market of the completed system.

2018-06-07
Alazzawe, A., Kant, K..  2017.  Slice Swarms for HPC Application Resilience. 2017 Fifth International Symposium on Computing and Networking (CANDAR). :1–10.

Resilience in High Performance Computing (HPC) is a constraining factor for bringing applications to the upcoming exascale systems. Resilience techniques must be able to scale to handle the increasing number of expected errors in an energy efficient manner. Since the purpose of running applications on HPC systems is to perform large scale computations as quick as possible, resilience methods should not add a large delay to the time to completion of the application. In this paper we introduce a novel technique to detect and recover from transient errors in HPC applications. One of the features of our technique is that the energy budget allocated to resilience can be adjusted depending on the operator's resilience needs. For example, on synthetic data, the technique can detect about 50% of transient errors while only using 20% of the dynamic energy required for running the application. For a 60% energy budget, an application that uses 10k cores and takes 128 hours to run, will only require 10% longer to complete.

2017-12-28
Panetta, J., Filho, P. R. P. S., Laranjeira, L. A. F., Teixeira, C. A..  2017.  Scalability of CPU and GPU Solutions of the Prime Elliptic Curve Discrete Logarithm Problem. 2017 29th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD). :33–40.

Elliptic curve asymmetric cryptography has achieved increased popularity due to its capability of providing comparable levels of security as other existing cryptographic systems while requiring less computational work. Pollard Rho and Parallel Collision Search, the fastest known sequential and parallel algorithms for breaking this cryptographic system, have been successfully applied over time to break ever-increasing bit-length system instances using implementations heavily optimized for the available hardware. This work presents portable, general implementations of a Parallel Collision Search based solution for prime elliptic curve asymmetric cryptographic systems that use publicly available big integer libraries and make no assumption on prime curve properties. It investigates which bit-length keys can be broken in reasonable time by a user that has access to a state of the art, public HPC equipment with CPUs and GPUs. The final implementation breaks a 79-bit system in about two hours using 80 GPUs and 94-bits system in about 15 hours using 256 GPUs. Extensive experimentation investigates scalability of CPU, GPU and CPU+GPU runs. The discussed results indicate that speed-up is not a good metric for parallel scalability. This paper proposes and evaluates a new metric that is better suited for this task.

2018-02-21
Grgić, K., Kovačevic, Z., Čik, V. K..  2017.  Performance analysis of symmetric block cryptosystems on Android platform. 2017 International Conference on Smart Systems and Technologies (SST). :155–159.

The symmetric block ciphers, which represent a core element for building cryptographic communications systems and protocols, are used in providing message confidentiality, authentication and integrity. Various limitations in hardware and software resources, especially in terminal devices used in mobile communications, affect the selection of appropriate cryptosystem and its parameters. In this paper, an implementation of three symmetric ciphers (DES, 3DES, AES) used in different operating modes are analyzed on Android platform. The cryptosystems' performance is analyzed in different scenarios using several variable parameters: cipher, key size, plaintext size and number of threads. Also, the influence of parallelization supported by multi-core CPUs on cryptosystem performance is analyzed. Finally, some conclusions about the parameter selection for optimal efficiency are given.

2017-12-28
Rolinger, T. B., Simon, T. A., Krieger, C. D..  2017.  Performance challenges for heterogeneous distributed tensor decompositions. 2017 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.

Tensor decompositions, which are factorizations of multi-dimensional arrays, are becoming increasingly important in large-scale data analytics. A popular tensor decomposition algorithm is Canonical Decomposition/Parallel Factorization using alternating least squares fitting (CP-ALS). Tensors that model real-world applications are often very large and sparse, driving the need for high performance implementations of decomposition algorithms, such as CP-ALS, that can take advantage of many types of compute resources. In this work we present ReFacTo, a heterogeneous distributed tensor decomposition implementation based on DeFacTo, an existing distributed memory approach to CP-ALS. DFacTo reduces the critical routine of CP-ALS to a series of sparse matrix-vector multiplications (SpMVs). ReFacTo leverages GPUs within a cluster via MPI to perform these SpMVs and uses OpenMP threads to parallelize other routines. We evaluate the performance of ReFacTo when using NVIDIA's GPU-based cuSPARSE library and compare it to an alternative implementation that uses Intel's CPU-based Math Kernel Library (MKL) for the SpMV. Furthermore, we provide a discussion of the performance challenges of heterogeneous distributed tensor decompositions based on the results we observed. We find that on up to 32 nodes, the SpMV of ReFacTo when using MKL is up to 6.8× faster than ReFacTo when using cuSPARSE.

Luo, S., Wang, Y., Huang, W., Yu, H..  2016.  Backup and Disaster Recovery System for HDFS. 2016 International Conference on Information Science and Security (ICISS). :1–4.

HDFS has been widely used for storing massive scale data which is vulnerable to site disaster. The file system backup is an important strategy for data retention. In this paper, we present an efficient, easy- to-use Backup and Disaster Recovery System for HDFS. The system includes a client based on HDFS with additional feature of remote backup, and a remote server with a HDFS cluster to keep the backup data. It supports full backup and regularly incremental backup to the server with very low cost and high throughout. In our experiment, the average speed of backup and recovery is up to 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.

2017-11-27
Yi, Su-Wen, Li, Wei, Dai, Zi-Bin, Liu, Jun-Wei.  2016.  A compact and efficient architecture for elliptic curve cryptographic processor. 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). :1276–1280.

In this paper, a dual-field elliptic curve cryptographic processor is proposed to support arbitrary curves within 576-bit in dual field. Besides, two heterogeneous function units are coupled with the processor for the parallel operations in finite field based on the analysis of the characteristics of elliptic curve cryptographic algorithms. To simplify the hardware complexity, the clustering technology is adopted in the processor. At last, a fast Montgomery modular division algorithm and its implementation is proposed based on the Kaliski's Montgomery modular inversion. Using UMC 90-nm CMOS 1P9M technology, the proposed processor occupied 0.86-mm2 can perform the scalar multiplication in 0.34ms in GF(p160) and 0.22ms in GF(2160), respectively. Compared to other elliptic curve cryptographic processors, our design is advantageous in hardware efficiency and speed moderation.

2017-04-20
Nikolenko, S. I., Kogan, K., Rétvári, G., Bérczi-Kovács, E. R., Shalimov, A..  2016.  How to represent IPv6 forwarding tables on IPv4 or MPLS dataplanes. 2016 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :521–526.

The Internet routing ecosystem is facing substantial scalability challenges on the data plane. Various “clean slate” architectures for representing forwarding tables (FIBs), such as IPv6, introduce additional constraints on efficient implementations from both lookup time and memory footprint perspectives due to significant classification width. In this work, we propose an abstraction layer able to represent IPv6 FIBs on existing IP and even MPLS infrastructure. Feasibility of the proposed representations is confirmed by an extensive simulation study on real IPv6 forwarding tables, including low-level experimental performance evaluation.

2016-05-04
Xianqing Yu, P. Ning, M. A. Vouk.  2015.  Enhancing security of Hadoop in a public cloud. Information and Communication Systems (ICICS), 2015 6th International Conference on. :38-43.

Hadoop has become increasingly popular as it rapidly processes data in parallel. Cloud computing gives reliability, flexibility, scalability, elasticity and cost saving to cloud users. Deploying Hadoop in cloud can benefit Hadoop users. Our evaluation exhibits that various internal cloud attacks can bypass current Hadoop security mechanisms, and compromised Hadoop components can be used to threaten overall Hadoop. It is urgent to improve compromise resilience, Hadoop can maintain a relative high security level when parts of Hadoop are compromised. Hadoop has two vulnerabilities that can dramatically impact its compromise resilience. The vulnerabilities are the overloaded authentication key, and the lack of fine-grained access control at the data access level. We developed a security enhancement for a public cloud-based Hadoop, named SEHadoop, to improve the compromise resilience through enhancing isolation among Hadoop components and enforcing least access privilege for Hadoop processes. We have implemented the SEHadoop model, and demonstrated that SEHadoop fixes the above vulnerabilities with minimal or no run-time overhead, and effectively resists related attacks.

2017-03-07
Kim, J., Moon, I., Lee, K., Suh, S. C., Kim, I..  2015.  Scalable Security Event Aggregation for Situation Analysis. 2015 IEEE First International Conference on Big Data Computing Service and Applications. :14–23.

Cyber-attacks have been evolved in a way to be more sophisticated by employing combinations of attack methodologies with greater impacts. For instance, Advanced Persistent Threats (APTs) employ a set of stealthy hacking processes running over a long period of time, making it much hard to detect. With this trend, the importance of big-data security analytics has taken greater attention since identifying such latest attacks requires large-scale data processing and analysis. In this paper, we present SEAS-MR (Security Event Aggregation System over MapReduce) that facilitates scalable security event aggregation for comprehensive situation analysis. The introduced system provides the following three core functions: (i) periodic aggregation, (ii) on-demand aggregation, and (iii) query support for effective analysis. We describe our design and implementation of the system over MapReduce and high-level query languages, and report our experimental results collected through extensive settings on a Hadoop cluster for performance evaluation and design impacts.

2017-02-14
A. Motamedi, M. Najafi, N. Erami.  2015.  "Parallel secure turbo code for security enhancement in physical layer". 2015 Signal Processing and Intelligent Systems Conference (SPIS). :179-184.

Turbo code has been one of the important subjects in coding theory since 1993. This code has low Bit Error Rate (BER) but decoding complexity and delay are big challenges. On the other hand, considering the complexity and delay of separate blocks for coding and encryption, if these processes are combined, the security and reliability of communication system are guaranteed. In this paper a secure decoding algorithm in parallel on General-Purpose Graphics Processing Units (GPGPU) is proposed. This is the first prototype of a fast and parallel Joint Channel-Security Coding (JCSC) system. Despite of encryption process, this algorithm maintains desired BER and increases decoding speed. We considered several techniques for parallelism: (1) distribute decoding load of a code word between multiple cores, (2) simultaneous decoding of several code words, (3) using protection techniques to prevent performance degradation. We also propose two kinds of optimizations to increase the decoding speed: (1) memory access improvement, (2) the use of new GPU properties such as concurrent kernel execution and advanced atomics to compensate buffering latency.

J. Kim, I. Moon, K. Lee, S. C. Suh, I. Kim.  2015.  "Scalable Security Event Aggregation for Situation Analysis". 2015 IEEE First International Conference on Big Data Computing Service and Applications. :14-23.

Cyber-attacks have been evolved in a way to be more sophisticated by employing combinations of attack methodologies with greater impacts. For instance, Advanced Persistent Threats (APTs) employ a set of stealthy hacking processes running over a long period of time, making it much hard to detect. With this trend, the importance of big-data security analytics has taken greater attention since identifying such latest attacks requires large-scale data processing and analysis. In this paper, we present SEAS-MR (Security Event Aggregation System over MapReduce) that facilitates scalable security event aggregation for comprehensive situation analysis. The introduced system provides the following three core functions: (i) periodic aggregation, (ii) on-demand aggregation, and (iii) query support for effective analysis. We describe our design and implementation of the system over MapReduce and high-level query languages, and report our experimental results collected through extensive settings on a Hadoop cluster for performance evaluation and design impacts.

2017-02-23
H. Ulusoy, M. Kantarcioglu, B. Thuraisingham, L. Khan.  2015.  "Honeypot based unauthorized data access detection in MapReduce systems". 2015 IEEE International Conference on Intelligence and Security Informatics (ISI). :126-131.

The data processing capabilities of MapReduce systems pioneered with the on-demand scalability of cloud computing have enabled the Big Data revolution. However, the data controllers/owners worried about the privacy and accountability impact of storing their data in the cloud infrastructures as the existing cloud computing solutions provide very limited control on the underlying systems. The intuitive approach - encrypting data before uploading to the cloud - is not applicable to MapReduce computation as the data analytics tasks are ad-hoc defined in the MapReduce environment using general programming languages (e.g, Java) and homomorphic encryption methods that can scale to big data do not exist. In this paper, we address the challenges of determining and detecting unauthorized access to data stored in MapReduce based cloud environments. To this end, we introduce alarm raising honeypots distributed over the data that are not accessed by the authorized MapReduce jobs, but only by the attackers and/or unauthorized users. Our analysis shows that unauthorized data accesses can be detected with reasonable performance in MapReduce based cloud environments.

2015-05-06
Bayat-sarmadi, S., Mozaffari-Kermani, M., Reyhani-Masoleh, A..  2014.  Efficient and Concurrent Reliable Realization of the Secure Cryptographic SHA-3 Algorithm. Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on. 33:1105-1109.

The secure hash algorithm (SHA)-3 has been selected in 2012 and will be used to provide security to any application which requires hashing, pseudo-random number generation, and integrity checking. This algorithm has been selected based on various benchmarks such as security, performance, and complexity. In this paper, in order to provide reliable architectures for this algorithm, an efficient concurrent error detection scheme for the selected SHA-3 algorithm, i.e., Keccak, is proposed. To the best of our knowledge, effective countermeasures for potential reliability issues in the hardware implementations of this algorithm have not been presented to date. In proposing the error detection approach, our aim is to have acceptable complexity and performance overheads while maintaining high error coverage. In this regard, we present a low-complexity recomputing with rotated operands-based scheme which is a step-forward toward reducing the hardware overhead of the proposed error detection approach. Moreover, we perform injection-based fault simulations and show that the error coverage of close to 100% is derived. Furthermore, we have designed the proposed scheme and through ASIC analysis, it is shown that acceptable complexity and performance overheads are reached. By utilizing the proposed high-performance concurrent error detection scheme, more reliable and robust hardware implementations for the newly-standardized SHA-3 are realized.
 

2015-05-05
Campbell, S..  2014.  Open science, open security. High Performance Computing Simulation (HPCS), 2014 International Conference on. :584-587.

We propose that to address the growing problems with complexity and data volumes in HPC security wee need to refactor how we look at data by creating tools that not only select data, but analyze and represent it in a manner well suited for intuitive analysis. We propose a set of rules describing what this means, and provide a number of production quality tools that represent our current best effort in implementing these ideas.
 

2015-05-06
Voskuilen, G., Vijaykumar, T.N..  2014.  Fractal++: Closing the performance gap between fractal and conventional coherence. Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on. :409-420.

Cache coherence protocol bugs can cause multicores to fail. Existing coherence verification approaches incur state explosion at small scales or require considerable human effort. As protocols' complexity and multicores' core counts increase, verification continues to be a challenge. Recently, researchers proposed fractal coherence which achieves scalable verification by enforcing observational equivalence between sub-systems in the coherence protocol. A larger sub-system is verified implicitly if a smaller sub-system has been verified. Unfortunately, fractal protocols suffer from two fundamental limitations: (1) indirect-communication: sub-systems cannot directly communicate and (2) partially-serial-invalidations: cores must be invalidated in a specific, serial order. These limitations disallow common performance optimizations used by conventional directory protocols: reply-forwarding where caches communicate directly and parallel invalidations. Therefore, fractal protocols lack performance scalability while directory protocols lack verification scalability. To enable both performance and verification scalability, we propose Fractal++ which employs a new class of protocol optimizations for verification-constrained architectures: decoupled-replies, contention-hints, and fully-parallel-fractal-invalidations. The first two optimizations allow reply-forwarding-like performance while the third optimization enables parallel invalidations in fractal protocols. Unlike conventional protocols, Fractal++ preserves observational equivalence and hence is scalably verifiable. In 32-core simulations of single- and four-socket systems, Fractal++ performs nearly as well as a directory protocol while providing scalable verifiability whereas the best-performing previous fractal protocol performs 8% on average and up to 26% worse with a single-socket and 12% on average and up to 34% worse with a longer-latency multi-socket system.