Biblio

Filters: Keyword is graphics processing units
2023-07-14
Lisičić, Marko, Mišić, Marko.  2022.  Software Tool for Parallel Generation of Cryptographic Keys Based on Elliptic Curves. 2022 30th Telecommunications Forum (TELFOR). :1–4.

Public key cryptography plays an important role in secure communication over insecure channels. Elliptic curve cryptography, a variant of public key cryptography, has been used extensively for such purposes in recent decades. In this paper, we present a software tool for parallel generation of cryptographic keys based on elliptic curves. The binary method for point multiplication and C++ threads were used in the parallel implementation, while the secp256k1 elliptic curve was used for testing. The obtained results show a speedup of 30% over the sequential solution for 8 threads. The results are briefly discussed in the paper.
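
As a rough illustration of the two ingredients named above, the binary (double-and-add) method for point multiplication and one worker thread per batch of keys, the following hypothetical C++ sketch uses a toy curve over GF(97) in place of secp256k1; the curve constants, base point, and thread/key counts are assumptions, and real 256-bit field arithmetic is omitted.

// Hypothetical sketch: parallel EC key-pair generation with the binary
// (double-and-add) method and C++ threads. A toy curve y^2 = x^3 + 2x + 3 over
// GF(97) stands in for secp256k1; production code needs multi-precision arithmetic.
#include <cstdint>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

struct Point { int64_t x, y; bool inf; };            // affine point; inf = point at infinity
static const int64_t P = 97, A = 2;                  // toy curve parameters (assumption)
static const Point G = {3, 6, false};                // base point on the toy curve

static int64_t mod(int64_t v) { v %= P; return v < 0 ? v + P : v; }
static int64_t inv(int64_t v) {                      // modular inverse via Fermat (P is prime)
    int64_t r = 1, b = mod(v), e = P - 2;
    while (e) { if (e & 1) r = r * b % P; b = b * b % P; e >>= 1; }
    return r;
}
static Point add(Point p, Point q) {                 // affine point addition / doubling
    if (p.inf) return q;
    if (q.inf) return p;
    if (p.x == q.x && mod(p.y + q.y) == 0) return {0, 0, true};
    int64_t l = (p.x == q.x && p.y == q.y)
        ? mod((3 * p.x % P * p.x + A) * inv(2 * p.y))
        : mod((q.y - p.y) * inv(q.x - p.x));
    int64_t x3 = mod(l * l - p.x - q.x);
    return {x3, mod(l * (p.x - x3) - p.y), false};
}
static Point scalar_mul(int64_t k, Point p) {        // binary (double-and-add) method
    Point r = {0, 0, true};
    while (k) { if (k & 1) r = add(r, p); p = add(p, p); k >>= 1; }
    return r;
}

int main() {
    const int threads = 8, keys_per_thread = 4;      // 8 threads, as in the reported experiment
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([=] {
            std::mt19937_64 rng(t + 1);
            for (int i = 0; i < keys_per_thread; ++i) {
                int64_t d = rng() % (P - 1) + 1;     // private key (toy range)
                Point Q = scalar_mul(d, G);          // public key Q = dG
                std::printf("thread %d: d=%lld -> Q=(%lld,%lld)\n", t,
                            (long long)d, (long long)Q.x, (long long)Q.y);
            }
        });
    for (auto &th : pool) th.join();
    return 0;
}
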

Nguyen, Tuy Tan, Lee, Hanho.  2022.  Toward A Real-Time Elliptic Curve Cryptography-Based Facial Security System. 2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS). :364–367.
This paper presents a novel approach to a facial security system using elliptic curve cryptography. Face images extracted from the input video are encrypted before being sent to a remote server. The input face images are completely encrypted by mapping each pixel value of the detected face from the input video frame to a point on an elliptic curve. The original image can be recovered when needed using the elliptic curve cryptography decryption function. Specifically, we modify point multiplication designed for projective coordinates and apply the modified approach in affine coordinates to speed up the scalar point multiplication operation. Image encryption and decryption operations are also facilitated using our existing scheme. Simulation results in Visual Studio demonstrate that the proposed systems help accelerate encryption and decryption operations while maintaining information confidentiality.
2022-09-20
Chandramouli, Athreya, Jana, Sayantan, Kothapalli, Kishore.  2021.  Efficient Parallel Algorithms for Computing Percolation Centrality. 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC). :111–120.
Centrality measures on graphs have found applications in a large number of domains, including modeling the spread of an infection/disease, social network analysis, and transportation networks. As a result, parallel algorithms for computing various centrality metrics on graphs have gained significant research attention in recent years. In this paper, we study parallel algorithms for the percolation centrality measure, which extends the betweenness-centrality measure by incorporating a time-dependent state variable with each node. We present parallel algorithms that compute the source-based and source-destination variants of the percolation centrality values of nodes in a network. Our algorithms extend the algorithm of Brandes, introduce optimizations aimed at exploiting the structural properties of graphs, and extend the algorithmic techniques introduced by Sariyuce et al. [26] in the context of centrality computation. Experimental studies of our algorithms on an Intel Xeon(R) Silver 4116 CPU and an Nvidia Tesla V100 GPU on a collection of 12 real-world graphs indicate that our algorithmic techniques offer a significant speedup.
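
The Brandes-style accumulation the abstract builds on can be sketched serially. The following hypothetical C++ example runs a BFS-based Brandes dependency accumulation and weights each source's contribution by a per-source percolation state x[s], as a simplified stand-in for the source-based variant; the paper's exact percolation normalization, GPU kernels, and structural optimizations are not reproduced.

// Hypothetical serial sketch: Brandes-style accumulation with a per-source weight
// x[s] on an unweighted graph (a simplified stand-in for percolation centrality).
#include <cstdio>
#include <queue>
#include <stack>
#include <vector>

std::vector<double> percolation_like(const std::vector<std::vector<int>>& adj,
                                     const std::vector<double>& x) {
    int n = (int)adj.size();
    std::vector<double> pc(n, 0.0);
    for (int s = 0; s < n; ++s) {
        std::vector<long long> sigma(n, 0);         // shortest-path counts
        std::vector<int> dist(n, -1);
        std::vector<std::vector<int>> pred(n);
        std::vector<double> delta(n, 0.0);          // dependency accumulation
        std::stack<int> order;
        std::queue<int> q;
        sigma[s] = 1; dist[s] = 0; q.push(s);
        while (!q.empty()) {                        // BFS from source s
            int v = q.front(); q.pop(); order.push(v);
            for (int w : adj[v]) {
                if (dist[w] < 0) { dist[w] = dist[v] + 1; q.push(w); }
                if (dist[w] == dist[v] + 1) { sigma[w] += sigma[v]; pred[w].push_back(v); }
            }
        }
        while (!order.empty()) {                    // back-propagate dependencies
            int w = order.top(); order.pop();
            for (int v : pred[w])
                delta[v] += (double)sigma[v] / sigma[w] * (1.0 + delta[w]);
            if (w != s) pc[w] += x[s] * delta[w];   // weight the contribution by the source state
        }
    }
    return pc;
}

int main() {
    std::vector<std::vector<int>> adj = {{1}, {0, 2}, {1, 3}, {2}};  // path 0-1-2-3
    std::vector<double> x = {1.0, 0.2, 0.2, 0.8};                    // percolation states (made up)
    auto pc = percolation_like(adj, x);
    for (size_t v = 0; v < pc.size(); ++v) std::printf("node %zu: %.3f\n", v, pc[v]);
    return 0;
}
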
2022-09-09
Tan, Mingtian, Wan, Junpeng, Zhou, Zhe, Li, Zhou.  2021.  Invisible Probe: Timing Attacks with PCIe Congestion Side-channel. 2021 IEEE Symposium on Security and Privacy (SP). :322–338.
PCIe (Peripheral Component Interconnect express) is the de facto protocol for bridging the CPU and peripheral devices such as GPUs, NICs, and SSD drives. There is an increasing demand to install more peripheral devices on a single machine, but the PCIe interfaces offered by Intel CPUs are fixed. To resolve such contention, a PCIe switch, PCH (Platform Controller Hub), or virtualization card is installed on the machine to allow multiple devices to share a PCIe interface. Congestion happens when the collective PCIe traffic from the devices overwhelms the PCIe link capacity, and transmission delay is then introduced. In this work, we found that PCIe delay not only harms device performance but also leaks sensitive information about a user of the machine. In particular, as a user's activities might trigger data movement over PCIe (e.g., between CPU and GPU), by measuring PCIe congestion, an adversary accessing another device can infer the victim's secret indirectly. Therefore, the delay resulting from I/O congestion can be exploited as a side channel. We demonstrate the threat from PCIe congestion through 2 attack scenarios and 4 victim settings. Specifically, an attacker can learn the workload of a GPU in a remote server by probing an RDMA NIC that shares the same PCIe switch and measuring the delays. Based on the measurements, the attacker is able to learn the keystroke timings of the victim, which webpage is rendered on the GPU, and which machine-learning model is running on the GPU. Besides, when the victim is using a low-speed device, e.g., an Ethernet NIC, an attacker controlling an NVMe SSD can launch a similar attack when they share a PCH or virtualization card. The evaluation results show our attack can achieve high accuracy (e.g., 96.31% accuracy in inferring the webpage visited by a victim).
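
The probing primitive itself, repeatedly timing a small transfer over the shared PCIe link and watching for latency spikes, can be illustrated with a hypothetical host-side CUDA sketch. The buffer size, sample count, and spike threshold below are assumptions; the paper's RDMA-NIC probe and the inference models built on top of it are not reproduced.

// Hypothetical sketch of a congestion probe: time small host-to-device copies over
// PCIe and flag samples whose latency is well above the mean.
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

int main() {
    const size_t bytes = 1 << 20;                        // 1 MiB probe buffer (assumption)
    std::vector<char> host(bytes, 0);
    char* dev = nullptr;
    cudaMalloc((void**)&dev, bytes);

    std::vector<double> samples;
    for (int i = 0; i < 1000; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);  // crosses the PCIe link
        cudaDeviceSynchronize();
        auto t1 = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double, std::micro>(t1 - t0).count());
    }

    double mean = 0;
    for (double s : samples) mean += s;
    mean /= samples.size();
    for (size_t i = 0; i < samples.size(); ++i)
        if (samples[i] > 1.5 * mean)                     // crude spike detector (assumption)
            std::printf("probe %zu: %.1f us (possible congestion)\n", i, samples[i]);

    cudaFree(dev);
    return 0;
}
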
2022-06-14
Su, Liyilei, Fu, Xianjun, Hu, Qingmao.  2021.  A convolutional generative adversarial framework for data augmentation based on a robust optimal transport metric. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys). :1155–1162.
Enhancing the vanilla generative adversarial network (GAN) to preserve data variability in the presence of real-world noise is of paramount significance in deep learning. In this study, we proposed a new cosine-distance metric within the framework of optimal transport (OT), and presented and validated a convolutional neural network (CNN) based GAN framework. In comparison with state-of-the-art methods based on Graphics Processing Units (GPU), the proposed framework best maintained data diversity and quality in terms of inception score (IS), Fréchet inception distance (FID), and enhancement of a bone-age classification network, and is robust to noise degradation. The proposed framework is hardware-independent and thus could also be extended to more advanced hardware such as specialized Tensor Processing Units (TPU), and could be a potential built-in component of general deep learning networks for applications such as image classification, segmentation, registration, and object detection.
2022-06-09
Aleksandrov, Mykyta.  2021.  Confirmation of Mutual Synchronization of the TPMs Using Hash Functions. 2021 IEEE 3rd International Conference on Advanced Trends in Information Theory (ATIT). :80–83.
This paper presents experimental results evaluating the effect of network delay on the synchronization time of tree parity machines. The possibility of using a hash function to confirm the synchronization of tree parity machines has been investigated. Tree parity machines have been proposed as a modification of symmetric encryption algorithms. One advantage of the method is the possibility of using the phenomenon of mutual synchronization of neural networks to generate an identical encryption key for users without the need to transfer it. As a result, the degree of influence of network delay and of the type of hash function used on the synchronization time of the neural networks was determined experimentally. The hash function sha512 showed the best results. Tasks for further research have been defined.
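
The confirmation step described above, comparing digests of the two parties' weight vectors instead of the weights themselves, can be sketched as follows. This is a hypothetical C++ illustration: FNV-1a stands in for sha512 to keep the example dependency-free, the toy weight vectors are made up, and the mutual-learning protocol that produces the weights is not shown.

// Hypothetical sketch: confirm TPM synchronization by exchanging hashes of the
// weight vectors, so the weights themselves never leave each party.
#include <cstdint>
#include <cstdio>
#include <vector>

uint64_t fnv1a(const std::vector<int>& w) {               // stand-in for sha512(w)
    uint64_t h = 1469598103934665603ULL;
    for (int v : w) {
        uint32_t u = (uint32_t)v;
        for (int b = 0; b < 4; ++b) {
            h ^= (uint8_t)(u >> (8 * b));
            h *= 1099511628211ULL;
        }
    }
    return h;
}

bool confirm_synchronized(const std::vector<int>& weights_a,
                          const std::vector<int>& weights_b) {
    // In a real deployment only the digests travel over the network.
    return fnv1a(weights_a) == fnv1a(weights_b);
}

int main() {
    std::vector<int> alice = {1, -2, 3, 1, -1, 2};          // toy TPM weights (assumption)
    std::vector<int> bob   = {1, -2, 3, 1, -1, 2};
    std::printf("synchronized: %s\n", confirm_synchronized(alice, bob) ? "yes" : "no");
    return 0;
}
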
2022-05-19
Zhang, Feng, Pan, Zaifeng, Zhou, Yanliang, Zhai, Jidong, Shen, Xipeng, Mutlu, Onur, Du, Xiaoyong.  2021.  G-TADOC: Enabling Efficient GPU-Based Text Analytics without Decompression. 2021 IEEE 37th International Conference on Data Engineering (ICDE). :1679–1690.
Text analytics directly on compression (TADOC) has proven to be a promising technology for big data analytics. GPUs are extremely popular accelerators for data analytics systems. Unfortunately, no work so far shows how to utilize GPUs to accelerate TADOC. We describe G-TADOC, the first framework that provides GPU-based text analytics directly on compression, effectively enabling efficient text analytics on GPUs without decompressing the input data. G-TADOC solves three major challenges. First, TADOC involves a large number of dependencies, which makes it difficult to exploit massive parallelism on a GPU. We develop a novel fine-grained thread-level workload scheduling strategy for GPU threads, which partitions heavily dependent loads adaptively in a fine-grained manner. Second, in developing G-TADOC, thousands of GPU threads writing to the same result buffer leads to inconsistency, while directly using locks and atomic operations leads to large synchronization overheads. We develop a memory pool with thread-safe data structures on GPUs to handle such difficulties. Third, maintaining the sequence information among words is essential for lossless compression. We design a sequence-support strategy, which maintains high GPU parallelism while ensuring sequence information. Our experimental evaluations show that G-TADOC provides a 31.1× average speedup compared to state-of-the-art TADOC.
2022-03-15
Hu, Yanbu, Shao, Cuiping, Li, Huiyun.  2021.  Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster. 2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID). :10–15.
In recent years, with the rapid development of DNNs, algorithm complexity in fields such as computer vision and natural language processing has been increasing rapidly. FPGA-based DNN accelerators have demonstrated superior flexibility and performance, with higher energy efficiency than high-performance devices such as GPUs. However, the computing resources of a single FPGA are limited, and it is difficult to flexibly meet the requirements of high throughput and high energy efficiency at different computing scales. Therefore, this paper proposes a DNN implementation method based on a scalable heterogeneous FPGA cluster to adapt to different tasks and achieve high throughput and energy efficiency. Firstly, the method divides a single large task into multiple modules and runs each module on a different FPGA, forming a pipeline across multiple boards. Secondly, a dichotomy-based task deployment method is proposed to balance the task execution times of the different pipeline stages and thereby improve throughput and energy efficiency. Thirdly, the DNN computing modules are optimized according to the relationship between computing power and bandwidth, improving energy efficiency by reducing wasted resources and increasing resource utilization. The experimental results on AlexNet and VGG-16 demonstrate that a Zynq 7035 cluster achieves at most 25.23× the energy efficiency of an optimized AMD AIO processor. Compared with previous works on a single FPGA and on FPGA clusters, the energy efficiency is improved by 59.5% and 18.8%, respectively.
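
The dichotomy-based deployment idea, binary-searching the smallest feasible per-stage time and then splitting the layers into contiguous pipeline stages, can be sketched in a few lines. The following hypothetical C++ example uses made-up per-layer costs and board count; the paper's joint compute/bandwidth optimization is not reproduced.

// Hypothetical sketch: balance pipeline stages across FPGAs by binary search
// (dichotomy) on the maximum allowed per-stage execution time.
#include <cstdio>
#include <vector>

// Can the layers be split into at most `boards` contiguous stages, each <= cap?
static bool feasible(const std::vector<double>& cost, int boards, double cap) {
    int used = 1; double cur = 0;
    for (double c : cost) {
        if (c > cap) return false;
        if (cur + c > cap) { ++used; cur = c; } else { cur += c; }
    }
    return used <= boards;
}

static double balance(const std::vector<double>& cost, int boards) {
    double lo = 0, hi = 0;
    for (double c : cost) { hi += c; if (c > lo) lo = c; }
    for (int it = 0; it < 50; ++it) {                 // dichotomy on the stage-time bound
        double mid = 0.5 * (lo + hi);
        if (feasible(cost, boards, mid)) hi = mid; else lo = mid;
    }
    return hi;                                        // smallest feasible max stage time
}

int main() {
    std::vector<double> layer_ms = {4.0, 7.5, 2.0, 6.0, 3.5, 5.0};  // per-layer cost (assumption)
    int fpgas = 3;
    std::printf("balanced max stage time: %.2f ms\n", balance(layer_ms, fpgas));
    return 0;
}
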
2022-03-09
Park, Byung H., Chattopadhyay, Somrita, Burgin, John.  2021.  Haze Mitigation in High-Resolution Satellite Imagery Using Enhanced Style-Transfer Neural Network and Normalization Across Multiple GPUs. 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS. :2827–2830.
Despite recent advances in deep learning approaches, haze mitigation in large satellite images is still a challenging problem. Due to the amorphous nature of haze, object detection or image segmentation approaches are not applicable. It is also practically infeasible to obtain ground truths for training. The bounded memory capacity of GPUs is another constraint that limits the size of the image to be processed. In this paper, we propose a style-transfer-based neural network approach to mitigate haze in large overhead imagery. The network is trained without paired ground truths; further, a perception loss is added to restore vivid colors, enhance contrast, and minimize artifacts. The paper also illustrates our use of multiple GPUs in a collective way to produce a single coherent clear image, where each GPU dehazes different portions of a large hazy image.
2022-02-22
Olivier, Stephen L., Ellingwood, Nathan D., Berry, Jonathan, Dunlavy, Daniel M..  2021.  Performance Portability of an SpMV Kernel Across Scientific Computing and Data Science Applications. 2021 IEEE High Performance Extreme Computing Conference (HPEC). :1–8.
Both the data science and scientific computing communities are embracing GPU acceleration for their most demanding workloads. For scientific computing applications, the massive volume of code and diversity of hardware platforms at supercomputing centers have motivated a strong effort toward performance portability. This property of a program, denoting its ability to perform well on multiple architectures and varied datasets, is heavily dependent on the choice of parallel programming model and which features of the programming model are used. In this paper, we evaluate performance portability in the context of a data science workload in contrast to a scientific computing workload, evaluating the same sparse matrix kernel on both. Among our implementations of the kernel in different performance-portable programming models, we find that many struggle to consistently achieve performance improvements using the GPU compared to simple one-line OpenMP parallelization on high-end multicore CPUs. We show one that does, and its performance approaches and sometimes even matches that of vendor-provided GPU math libraries.
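
For reference, the kernel under discussion is a sparse matrix-vector multiply. A hypothetical CUDA sketch of the common CSR formulation, one thread per row, is shown below; the CPU baseline mentioned in the abstract is essentially the same row loop under a single "#pragma omp parallel for". The matrix contents are made up, and the paper's performance-portable implementations are not shown.

// Hypothetical CUDA sketch: CSR sparse matrix-vector multiply, one thread per row.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void spmv_csr(int rows, const int* rowptr, const int* col,
                         const double* val, const double* x, double* y) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r < rows) {
        double sum = 0.0;
        for (int j = rowptr[r]; j < rowptr[r + 1]; ++j)
            sum += val[j] * x[col[j]];
        y[r] = sum;
    }
}

int main() {
    // 3x3 example: [[4,0,1],[0,3,0],[2,0,5]] in CSR form.
    std::vector<int>    rowptr = {0, 2, 3, 5};
    std::vector<int>    col    = {0, 2, 1, 0, 2};
    std::vector<double> val    = {4, 1, 3, 2, 5};
    std::vector<double> x      = {1, 2, 3}, y(3);

    int *d_rowptr, *d_col; double *d_val, *d_x, *d_y;
    cudaMalloc((void**)&d_rowptr, rowptr.size() * sizeof(int));
    cudaMalloc((void**)&d_col, col.size() * sizeof(int));
    cudaMalloc((void**)&d_val, val.size() * sizeof(double));
    cudaMalloc((void**)&d_x, x.size() * sizeof(double));
    cudaMalloc((void**)&d_y, y.size() * sizeof(double));
    cudaMemcpy(d_rowptr, rowptr.data(), rowptr.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_col, col.data(), col.size() * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_val, val.data(), val.size() * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), x.size() * sizeof(double), cudaMemcpyHostToDevice);

    spmv_csr<<<1, 32>>>(3, d_rowptr, d_col, d_val, d_x, d_y);
    cudaMemcpy(y.data(), d_y, y.size() * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("y = [%.1f, %.1f, %.1f]\n", y[0], y[1], y[2]);  // expected [7.0, 6.0, 17.0]
    return 0;
}
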
2021-05-20
Usher, Will, Pascucci, Valerio.  2020.  Interactive Visualization of Terascale Data in the Browser: Fact or Fiction? 2020 IEEE 10th Symposium on Large Data Analysis and Visualization (LDAV). :27–36.

Information visualization applications have become ubiquitous, in no small part thanks to the ease of wide distribution and deployment to users enabled by the web browser. Scientific visualization applications, relying on native code libraries and parallel processing, have been less suited to such widespread distribution, as browsers do not provide the required libraries or compute capabilities. In this paper, we revisit this gap in visualization technologies and explore how new web technologies, WebAssembly and WebGPU, can be used to deploy powerful visualization solutions for large-scale scientific data in the browser. In particular, we evaluate the programming effort required to bring scientific visualization applications to the browser through these technologies and assess their competitiveness against classic native solutions. As a main example, we present a new GPU-driven isosurface extraction method for block-compressed data sets that is suitable for interactive isosurface computation on large volumes in resource-constrained environments, such as the browser. We conclude that web browsers are on the verge of becoming a competitive platform for even the most demanding scientific visualization tasks, such as interactive visualization of isosurfaces from a 1TB DNS simulation. We call on researchers and developers to consider investing in a community software stack to ease the use of these upcoming browser features to bring accessible scientific visualization to the browser.

2021-05-13
Jain, Harsh, Vikram, Aditya, Mohana, Kashyap, Ankit, Jain, Ayush.  2020.  Weapon Detection using Artificial Intelligence and Deep Learning for Security Applications. 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC). :193–198.
Security is always a main concern in every domain, due to rising crime rates in crowded events and in suspicious, isolated areas. Anomaly detection and monitoring are major applications of computer vision for tackling such problems. Owing to the growing demand for the protection of safety, security, and personal property, video surveillance systems that can recognize and interpret scenes and anomalous events play a vital role in intelligent monitoring. This paper implements automatic gun (or weapon) detection using convolutional neural network (CNN) based SSD and Faster RCNN algorithms. The proposed implementation uses two types of datasets: one with pre-labelled images and another with manually labelled images. Results are tabulated; both algorithms achieve good accuracy, but their application in real situations depends on the trade-off between speed and accuracy.
2021-02-01
Wang, H., Li, Y., Wang, Y., Hu, H., Yang, M.-H..  2020.  Collaborative Distillation for Ultra-Resolution Universal Style Transfer. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :1857–1866.
Universal style transfer methods typically leverage rich representations from deep Convolutional Neural Network (CNN) models (e.g., VGG-19) pre-trained on large collections of images. Despite their effectiveness, their application is heavily constrained by the large model size required to handle ultra-resolution images given limited memory. In this work, we present a new knowledge distillation method (named Collaborative Distillation) for encoder-decoder based neural style transfer to reduce the number of convolutional filters. The main idea is underpinned by the finding that the encoder-decoder pairs construct an exclusive collaborative relationship, which is regarded as a new kind of knowledge for style transfer models. Moreover, to overcome the feature size mismatch when applying collaborative distillation, a linear embedding loss is introduced to drive the student network to learn a linear embedding of the teacher's features. Extensive experiments show the effectiveness of our method when applied to different universal style transfer approaches (WCT and AdaIN), even when the model size is reduced by 15.5 times. In particular, on WCT with the compressed models, we achieve ultra-resolution (over 40 megapixels) universal style transfer on a 12GB GPU for the first time. Further experiments on an optimization-based stylization scheme show the generality of our algorithm across different stylization paradigms. Our code and trained models are available at https://github.com/mingsun-tse/collaborative-distillation.
2021-01-11
Awad, M. A., Ashkiani, S., Porumbescu, S. D., Owens, J. D..  2020.  Dynamic Graphs on the GPU. 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). :739–748.
We present a fast dynamic graph data structure for the GPU. Our dynamic graph structure uses one hash table per vertex to store adjacency lists and achieves 3.4-14.8x faster insertion rates over the state of the art across a diverse set of large datasets, as well as deletion speedups up to 7.8x. The data structure supports queries and dynamic updates through both edge and vertex insertion and deletion. In addition, we define a comprehensive evaluation strategy based on operations, workloads, and applications that we believe better characterize and evaluate dynamic graph data structures.
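
The core idea of the data structure, one hash table per vertex holding that vertex's adjacency list, can be illustrated with a hypothetical CPU analogue in C++. The GPU-resident hash tables, memory allocation, and batched update kernels of the actual system are not reproduced.

// Hypothetical CPU analogue: a dynamic graph with one hash table per vertex,
// supporting edge insertion, deletion, and adjacency queries.
#include <cstdio>
#include <unordered_set>
#include <vector>

class DynamicGraph {
    std::vector<std::unordered_set<int>> adj;   // adj[v] = hash table of v's neighbors
public:
    explicit DynamicGraph(int n) : adj(n) {}
    void insert_vertex() { adj.emplace_back(); }
    void insert_edge(int u, int v) { adj[u].insert(v); adj[v].insert(u); }
    void delete_edge(int u, int v) { adj[u].erase(v); adj[v].erase(u); }
    bool has_edge(int u, int v) const { return adj[u].count(v) > 0; }
    size_t degree(int v) const { return adj[v].size(); }
};

int main() {
    DynamicGraph g(4);
    g.insert_edge(0, 1);
    g.insert_edge(1, 2);
    g.delete_edge(0, 1);
    std::printf("edge(0,1)=%d edge(1,2)=%d deg(1)=%zu\n",
                g.has_edge(0, 1), g.has_edge(1, 2), g.degree(1));
    return 0;
}
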
2020-12-01
Garbo, A., Quer, S..  2018.  A Fast MPEG's CDVS Implementation for GPU Featured in Mobile Devices. IEEE Access. 6:52027–52046.
The Moving Picture Experts Group's Compact Descriptors for Visual Search (MPEG's CDVS) intends to standardize technologies in order to enable an interoperable, efficient, and cross-platform solution for internet-scale visual search applications and services. Among the key technologies within CDVS, we recall the format of visual descriptors, the descriptor extraction process, and the algorithms for indexing and matching. Unfortunately, these steps require precision and computational accuracy. Moreover, they are very time-consuming, as they need running times on the order of seconds when implemented on the central processing unit (CPU) of modern mobile devices. In this paper, to reduce computation times while maintaining precision and accuracy, we re-design, for many-core embedded graphics processing units (GPUs), all the main local descriptor extraction pipeline phases of the MPEG's CDVS standard. To reach this goal, we introduce new techniques to adapt the standard algorithm to parallel processing. Furthermore, to reduce memory accesses and efficiently distribute the kernel workload, we use new approaches to store and retrieve CDVS information in proper GPU data structures. We present a complete experimental analysis on a large and standard test set. Our experiments show that our GPU-based approach is remarkably faster than the CPU-based reference implementation of the standard, and it maintains comparable precision in terms of true and false positive rates.
Karatas, G., Demir, O., Sahingoz, O. K..  2019.  A Deep Learning Based Intrusion Detection System on GPUs. 2019 11th International Conference on Electronics, Computers and Artificial Intelligence (ECAI). :1–6.

In recent years, almost all real-world operations have been transferred to the cyber world, and these computers connect with each other via the Internet. As a result, there is an increasing number of security breaches of networks whose administrators cannot protect them from all types of attacks. Although most of these attacks can be prevented with the use of firewalls, encryption mechanisms, access controls, and password protection mechanisms, the emergence of new types of attacks means that a dynamic intrusion detection mechanism is always needed in the information security market. To keep an Intrusion Detection System (IDS) dynamic, it should be updated using a modern learning mechanism. The neural network approach is one of the most preferred algorithms for training such a system. However, with the increasing power of parallel computing and the use of big data for training, deep learning, as a newer concept, has been applied to many modern real-world problems. Therefore, in this paper, we propose an IDS that uses GPU-powered deep learning algorithms. The experimental results are collected on the widely used KDD99 dataset and show that the use of a GPU speeds up training by up to 6.48 times, depending on the number of hidden layers and the nodes in them. Additionally, we compare different optimizers to help researchers select the best one for their ongoing or future research.

2020-07-20
Stroup, Ronald L., Niewoehner, Kevin R..  2019.  Application of Artificial Intelligence in the National Airspace System – A Primer. 2019 Integrated Communications, Navigation and Surveillance Conference (ICNS). :1–14.

The National Airspace System (NAS), as a portion of the US transportation system, has not yet begun to model or adopt integration of Artificial Intelligence (AI) technology. However, users of the NAS, e.g., air transport operators, UAS operators, etc., are beginning to use this technology throughout their operations. At issue within the broader aviation marketplace is the continued search for a solution set to the persistent daily delays and schedule perturbations that occur within the NAS. Despite billions invested through the NAS Modernization Program, the delays persist in the face of reduced demand for commercial routings. Every delay represents an economic loss to commercial transport operators, passengers, freighters, and any business depending on transportation performance. Therefore, the FAA needs to begin to address, from an advanced-concepts perspective, what this wave of new technology will affect as it is brought to bear on various operational performance parameters, including safety, security, efficiency, and resiliency solution sets. This paper is the first in a series of papers we are developing to explore the application of AI in the National Airspace System (NAS). This first paper is meant to get everyone in the aviation community on the same page, a primer if you will, to start the technical discussions. This paper will define AI; the capabilities associated with AI; current use cases within the aviation ecosystem; and how to prepare for the insertion of AI in the NAS. The next series of papers will look at NAS operations theory utilizing AI capabilities, eventually leading to a future intelligent NAS (iNAS) environment.

2020-06-26
M, Raviraja Holla, D, Suma.  2019.  Memory Efficient High-Performance Rotational Image Encryption. 2019 International Conference on Communication and Electronics Systems (ICCES). :60–64.

Image encryption is an essential part of visual cryptography. Existing traditional sequential encryption techniques are infeasible for real-time applications. High-performance reformulations of such methods have grown considerably over the last decade and have demonstrated better performance than their sequential counterparts. A rotational encryption scheme encrypts images in such a way that decryption is possible with the rotated encrypted images. An existing parallel rotational encryption technique makes use of a high-performance device but under-exploits the optimizations such devices offer. We propose a rotational image encryption technique that makes use of the memory coalescing provided by the Compute Unified Device Architecture (CUDA). The proposed scheme achieves improved global memory utilization and increased efficiency.
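
The memory-coalescing aspect emphasized above, consecutive threads touching consecutive global-memory addresses, can be illustrated with a hypothetical CUDA sketch. The keyed transform and circular shift below are simplified stand-ins, not the paper's rotational encryption scheme, and the image size and key are assumptions.

// Hypothetical sketch of the access pattern: thread i reads pixel i (coalesced read)
// and writes a keyed, circularly shifted ciphertext (mostly coalesced write).
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void encrypt(const unsigned char* in, unsigned char* out,
                        int n, int shift, unsigned char key) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;    // thread i handles pixel i
    if (i < n)
        out[(i + shift) % n] = in[i] ^ key;
}

int main() {
    const int n = 1 << 20;                            // 1 Mpixel grayscale image (assumption)
    std::vector<unsigned char> img(n, 128), enc(n);
    unsigned char *d_in, *d_out;
    cudaMalloc((void**)&d_in, n);
    cudaMalloc((void**)&d_out, n);
    cudaMemcpy(d_in, img.data(), n, cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    encrypt<<<blocks, threads>>>(d_in, d_out, n, 12345, 0x5A);
    cudaMemcpy(enc.data(), d_out, n, cudaMemcpyDeviceToHost);

    std::printf("first ciphertext byte: 0x%02X\n", enc[0]);
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
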

2020-05-22
Despotovski, Filip, Gusev, Marjan, Zdraveski, Vladimir.  2018.  Parallel Implementation of K-Nearest-Neighbors for Face Recognition. 2018 26th Telecommunications Forum (TELFOR). :1–4.
Face recognition is a fast-expanding field of research. Countless classification algorithms have found use in face recognition, with more still being developed, searching for better performance and accuracy. For high-dimensional data such as images, the K-Nearest-Neighbours classifier is a tempting choice. However, it is very computationally intensive, as it has to perform calculations on all items in the stored dataset for each classification it makes. Fortunately, there is a way to speed up the process by performing some of the calculations in parallel. We propose a parallel CUDA implementation of the KNN classifier and then compare it to a serial implementation to demonstrate its performance superiority.
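
The parallel step described above, computing the distance from the query to every stored item concurrently, can be sketched in hypothetical CUDA as follows. One thread handles one stored sample, and the k smallest distances are then selected on the host; the dataset sizes, embedding dimension, and k are assumptions, and the full face-recognition pipeline is not shown.

// Hypothetical CUDA sketch: one thread per stored sample computes the squared
// Euclidean distance to the query; the host picks the k nearest.
#include <algorithm>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void all_distances(const float* train, const float* query,
                              float* dist, int n, int dim) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float d = 0.0f;
        for (int j = 0; j < dim; ++j) {
            float diff = train[i * dim + j] - query[j];
            d += diff * diff;
        }
        dist[i] = d;
    }
}

int main() {
    const int n = 1024, dim = 128, k = 3;             // assumptions for the sketch
    std::vector<float> train(n * dim), query(dim, 0.5f), dist(n);
    for (size_t i = 0; i < train.size(); ++i) train[i] = (i % 97) / 97.0f;

    float *d_train, *d_query, *d_dist;
    cudaMalloc((void**)&d_train, train.size() * sizeof(float));
    cudaMalloc((void**)&d_query, query.size() * sizeof(float));
    cudaMalloc((void**)&d_dist, dist.size() * sizeof(float));
    cudaMemcpy(d_train, train.data(), train.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_query, query.data(), query.size() * sizeof(float), cudaMemcpyHostToDevice);

    all_distances<<<(n + 255) / 256, 256>>>(d_train, d_query, d_dist, n, dim);
    cudaMemcpy(dist.data(), d_dist, dist.size() * sizeof(float), cudaMemcpyDeviceToHost);

    std::vector<int> idx(n);
    for (int i = 0; i < n; ++i) idx[i] = i;
    std::partial_sort(idx.begin(), idx.begin() + k, idx.end(),
                      [&](int a, int b) { return dist[a] < dist[b]; });
    for (int i = 0; i < k; ++i)
        std::printf("neighbor %d: sample %d (d2=%.4f)\n", i, idx[i], dist[idx[i]]);
    return 0;
}
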
2020-04-17
Liu, Sihang, Wei, Yizhou, Chi, Jianfeng, Shezan, Faysal Hossain, Tian, Yuan.  2019.  Side Channel Attacks in Computation Offloading Systems with GPU Virtualization. 2019 IEEE Security and Privacy Workshops (SPW). :156–161.

The Internet of Things (IoT) and mobile systems nowadays are required to perform more intensive computation, such as facial detection, image recognition, and even remote gaming. Due to limited computation performance and power budgets, it is sometimes impossible to perform these workloads locally. As high-performance GPUs become more common in the cloud, offloading the computation to the cloud becomes a possible choice. However, because offloaded workloads from different devices (belonging to different users) are computed in the same cloud, security concerns arise. Side channel attacks on GPU systems have been widely studied, where the threat model assumes the attacker and the victim are running on the same operating system. Recently, major GPU vendors have provided hardware and library support to virtualize GPUs for better isolation among users. This work studies side channel attacks from one virtual machine to another where both share the same physical GPU. We show that it is possible to infer another user's activities in this setup and further steal that user's deep learning model.

2020-03-30
Kim, Sejin, Oh, Jisun, Kim, Yoonhee.  2019.  Data Provenance for Experiment Management of Scientific Applications on GPU. 2019 20th Asia-Pacific Network Operations and Management Symposium (APNOMS). :1–4.
Graphics Processing Units (GPUs) are increasingly being utilized for multi-purpose applications in order to exploit highly parallel computation. As memory virtualization methods in GPU nodes are not provided efficiently enough to deal with the diverse memory usage patterns of these applications, the success of their execution depends on exclusive and limited use of physical memory in GPU environments. Therefore, it is important to predict changes in the pattern of GPU memory usage during the runtime execution of an application. Data provenance, extracted from application characteristics, GPU runtime environments, inputs, and execution patterns obtained from runtime monitoring, is defined to support application management: setting runtime configurations, predicting experimental results, and utilizing resources together with co-located applications. In this paper, we define data provenance of an application on GPUs and manage the data by profiling the execution of CUDA scientific applications. Data provenance management helps to predict the execution patterns of other, similar experiments and to plan efficient resource configurations.
2019-12-30
Razaque, Abdul, Jinrui, Wang, Zancheng, Wang, Hani, Qassim Bani, Khaskheli, Murad Ali, Bhutto, Waseem Ahmed.  2018.  Integration of CPU and GPU to Accelerate RSA Modular Exponentiation Operation. 2018 IEEE Long Island Systems, Applications and Technology Conference (LISAT). :1–6.

Nowadays, data security is becoming more and more important, as people store a great deal of personal information on their phones. Stored information requires security and privacy. Encryption algorithms have become the main means of maintaining data security. Thus, algorithm complexity and encryption efficiency have become the main measures of whether an encryption algorithm is safe or not. With the development of hardware, we now have many tools to improve such algorithms. Because modular exponentiation in the RSA algorithm can be divided into several parts mathematically, in this paper we introduce an approach that divides the encryption process and moves part of the work onto the graphics processing unit (GPU). By using the GPU's capacity for parallel computing, the core of RSA can be accelerated using the central processing unit (CPU) and GPU together. Compute Unified Device Architecture (CUDA) is a platform that combines the CPU and GPU to realize GPU parallel programming, and it is the tool we use to carry out the experiments on accelerating the RSA algorithm. This paper also builds a mathematical model to help understand the mechanism of the RSA encryption algorithm.
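
The mathematical splitting the abstract relies on can be shown concretely: with e = e_hi * 2^k + e_lo, m^e mod n = ((m^(2^k))^e_hi * m^e_lo) mod n, so the two partial exponentiations can be computed by different workers (for example CPU and GPU) and combined. The following hypothetical C++ sketch uses textbook toy RSA values and a made-up split point; real RSA needs multi-precision arithmetic.

// Hypothetical sketch: split a modular exponentiation into two independent parts
// and combine them, mirroring the CPU+GPU division of work described above.
#include <cstdint>
#include <cstdio>

static uint64_t modpow(uint64_t base, uint64_t exp, uint64_t n) {   // square-and-multiply
    uint64_t result = 1 % n;
    base %= n;
    while (exp) {
        if (exp & 1) result = result * base % n;
        base = base * base % n;
        exp >>= 1;
    }
    return result;
}

int main() {
    const uint64_t n = 3233, e = 17, m = 65;          // textbook toy RSA values (p=61, q=53)
    const unsigned k = 3;                             // split point (assumption)
    uint64_t e_lo = e & ((1ULL << k) - 1);            // low k bits of the exponent
    uint64_t e_hi = e >> k;                           // remaining high bits

    uint64_t part_lo = modpow(m, e_lo, n);            // could run on the CPU
    uint64_t base_hi = modpow(m, 1ULL << k, n);       // m^(2^k) mod n
    uint64_t part_hi = modpow(base_hi, e_hi, n);      // could run on the GPU
    uint64_t combined = part_hi * part_lo % n;

    std::printf("split result %llu, direct result %llu\n",
                (unsigned long long)combined, (unsigned long long)modpow(m, e, n));
    return 0;
}
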

2018-09-12
Khazankin, G. R., Komarov, S., Kovalev, D., Barsegyan, A., Likhachev, A..  2017.  System architecture for deep packet inspection in high-speed networks. 2017 Siberian Symposium on Data Science and Engineering (SSDSE). :27–32.

To solve the problems associated with real-time processing of large data volumes, heterogeneous systems using various computing devices are increasingly used. A characteristic of this class of problems is that there are two directions for improving real-time data analysis methods: the first is the development of algorithms and approaches to analysis, and the second is the development of hardware and software. This article reviews the main approaches to the architecture of a hardware-software solution for traffic capture and deep packet inspection (DPI) in data transmission networks with a bandwidth of 80 Gbit/s and higher. At the moment there are software and hardware tools that allow designing the architecture of a capture and deep packet inspection system: 1) using only the central processing unit (CPU); 2) using only the graphics processing unit (GPU); 3) using the central processing unit and graphics processing unit simultaneously (CPU + GPU). In this paper, we consider these key approaches. Attention is also paid to both the hardware and software requirements for the architecture of such solutions. Pain points and remedies are described.

2018-06-20
Searles, R., Xu, L., Killian, W., Vanderbruggen, T., Forren, T., Howe, J., Pearson, Z., Shannon, C., Simmons, J., Cavazos, J..  2017.  Parallelization of Machine Learning Applied to Call Graphs of Binaries for Malware Detection. 2017 25th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP). :69–77.

Malicious applications have become increasingly numerous. This demands adaptive, learning-based techniques for constructing malware detection engines, instead of the traditional manual-based strategies. Prior work in learning-based malware detection engines primarily focuses on dynamic trace analysis and byte-level n-grams. Our approach in this paper differs in that we use compiler intermediate representations, i.e., the callgraph representation of binaries. Using graph-based program representations for learning provides structure of the program, which can be used to learn more advanced patterns. We use the Shortest Path Graph Kernel (SPGK) to identify similarities between call graphs extracted from binaries. The output similarity matrix is fed into a Support Vector Machine (SVM) algorithm to construct highly-accurate models to predict whether a binary is malicious or not. However, SPGK is computationally expensive due to the size of the input graphs. Therefore, we evaluate different parallelization methods for CPUs and GPUs to speed up this kernel, allowing us to continuously construct up-to-date models in a timely manner. Our hybrid implementation, which leverages both CPU and GPU, yields the best performance, achieving up to a 14.2x improvement over our already optimized OpenMP version. We compared our generated graph-based models to previously state-of-the-art feature vector 2-gram and 3-gram models on a dataset consisting of over 22,000 binaries. We show that our classification accuracy using graphs is over 19% higher than either n-gram model and gives a false positive rate (FPR) of less than 0.1%. We are also able to consider large call graphs and dataset sizes because of the reduced execution time of our parallelized SPGK implementation.

2018-03-26
Zahilah, R., Tahir, F., Zainal, A., Abdullah, A. H., Ismail, A. S..  2017.  Unified Approach for Operating System Comparisons with Windows OS Case Study. 2017 IEEE Conference on Application, Information and Network Security (AINS). :91–96.

The advancement of technology has changed how people work and what software and hardware people use. From conventional personal computers to GPUs, hardware technology and capability have dramatically improved, and so have the operating systems that come along with them. Unfortunately, current industry practice is to compare operating systems from a single perspective: either benchmarking hardware-level performance or performing penetration testing to check the security features of an OS. This rigid method of benchmarking does not really reflect the true performance of an OS, as the performance analysis is neither comprehensive nor conclusive. To illustrate this deficiency, the study performed hardware-level and operational-level benchmarking on Windows XP, Windows 7, and Windows 8, and the results indicate that there are instances where Windows XP excels over its newer counterparts. Overall, the research shows Windows 8 is a superior OS in comparison to its predecessors running on the same hardware. Furthermore, the findings also show that automated benchmarking tools are less effective at benchmarking systems that run on Windows XP and older OSs, as these do not support DirectX 11 and other advanced features that the hardware supports. There lies the need for a unified benchmarking approach that compares other aspects of an OS, such as user-oriented tasks and security parameters, to provide a complete comparison. Therefore, this paper proposes a unified approach for Operating System (OS) comparisons with the help of a Windows OS case study. This unified approach includes comparison of an OS from three aspects: hardware-level performance, operational-level performance, and security tests.