Visible to the public Biblio

Filters: Keyword is Cluster computing  [Clear All Filters]
2023-05-12
Li, Shushan, Wang, Meng, Zhang, Hong.  2022.  Deadlock Detection for MPI Programs Based on Refined Match-sets. 2022 IEEE International Conference on Cluster Computing (CLUSTER). :82–93.

Deadlock is one of the critical problems in the message passing interface. At present, most techniques for detecting the MPI deadlock issue rely on exhausting all execution paths of a program, which is extremely inefficient. In addition, with the increasing number of wildcards that receive events and processes, the number of execution paths raises exponentially, further worsening the situation. To alleviate the problem, we propose a deadlock detection approach called SAMPI based on match-sets to avoid exploring execution paths. In this approach, a match detection rule is employed to form the rough match-sets based on Lazy Lamport Clocks Protocol. Then we design three refining algorithms based on the non-overtaking rule and MPI communication mechanism to refine the match-sets. Finally, deadlocks are detected by analyzing the refined match-sets. We performed the experimental evaluation on 15 various programs, and the experimental results show that SAMPI is really efficient in detecting deadlocks in MPI programs, especially in handling programs with many interleavings.

ISSN: 2168-9253

2023-02-17
Mahmood, Riyadh, Pennington, Jay, Tsang, Danny, Tran, Tan, Bogle, Andrea.  2022.  A Framework for Automated API Fuzzing at Enterprise Scale. 2022 IEEE Conference on Software Testing, Verification and Validation (ICST). :377–388.
Web-based Application Programming Interfaces (APIs) are often described using SOAP, OpenAPI, and GraphQL specifications. These specifications provide a consistent way to define web services and enable automated fuzz testing. As such, many fuzzers take advantage of these specifications. However, in an enterprise setting, the tools are usually installed and scaled by individual teams, leading to duplication of efforts. There is a need for an enterprise-wide fuzz testing solution to provide shared, cost efficient, off-nominal testing at scale where fuzzers can be plugged-in as needed. Internet cloud-based fuzz testing-as-a-service solutions mitigate scalability concerns but are not always feasible as they require artifacts to be uploaded to external infrastructure. Typically, corporate policies prevent sharing artifacts with third parties due to cost, intellectual property, and security concerns. We utilize API specifications and combine them with cluster computing elasticity to build an automated, scalable framework that can fuzz multiple apps at once and retain the trust boundary of the enterprise.
ISSN: 2159-4848
2022-07-15
Lagraa, Sofiane, State, Radu.  2021.  What database do you choose for heterogeneous security log events analysis? 2021 IFIP/IEEE International Symposium on Integrated Network Management (IM). :812—817.
The heterogeneous massive logs incoming from multiple sources pose major challenges to professionals responsible for IT security and system administrator. One of the challenges is to develop a scalable heterogeneous logs database for storage and further analysis. In fact, it is difficult to decide which database is suitable for the needs, the best of a use case, execution time and storage performances. In this paper, we explore, study, and compare the performance of SQL and NoSQL databases on large heterogeneous event logs. We implement the relational database using MySQL, the column-oriented database using Impala on the top of Hadoop, and the graph database using Neo4j. We experiment the databases on a large heterogeneous logs and provide advice, the pros and cons of each SQL and NoSQL database. Our findings that Impala outperforms MySQL and Neo4j databases in terms of loading logs, execution time of simple queries, and storage of logs. However, Neo4j outperforms Impala and MySQL in the execution time of complex queries.
2022-04-13
Kousar, Heena, Mulla, Mohammed Moin, Shettar, Pooja, D. G., Narayan.  2021.  DDoS Attack Detection System using Apache Spark. 2021 International Conference on Computer Communication and Informatics (ICCCI). :1—5.
Distributed Denial of Service Attacks (DDoS) are most widely used cyber-attacks. Thus, design of DDoS detection mechanisms has attracted attention of researchers. Design of these mechanisms involves building statistical and machine learning models. Most of the work in design of mechanisms is focussed on improving the accuracy of the model. However, due to large volume of network traffic, scalability and performance of these techniques is an important research issue. In this work, we use Apache Spark framework for detection of DDoS attacks. We use NSL-KDD Cup as a benchmark dataset for experimental analysis. The results reveal that random forest performs better than decision trees and distributed processing improves the performance in terms of pre-processing and training time.
2020-08-28
Hasanin, Tawfiq, Khoshgoftaar, Taghi M., Leevy, Joffrey L..  2019.  A Comparison of Performance Metrics with Severely Imbalanced Network Security Big Data. 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI). :83—88.

Severe class imbalance between the majority and minority classes in large datasets can prejudice Machine Learning classifiers toward the majority class. Our work uniquely consolidates two case studies, each utilizing three learners implemented within an Apache Spark framework, six sampling methods, and five sampling distribution ratios to analyze the effect of severe class imbalance on big data analytics. We use three performance metrics to evaluate this study: Area Under the Receiver Operating Characteristic Curve, Area Under the Precision-Recall Curve, and Geometric Mean. In the first case study, models were trained on one dataset (POST) and tested on another (SlowlorisBig). In the second case study, the training and testing dataset roles were switched. Our comparison of performance metrics shows that Area Under the Precision-Recall Curve and Geometric Mean are sensitive to changes in the sampling distribution ratio, whereas Area Under the Receiver Operating Characteristic Curve is relatively unaffected. In addition, we demonstrate that when comparing sampling methods, borderline-SMOTE2 outperforms the other methods in the first case study, and Random Undersampling is the top performer in the second case study.