Xie, Bing, Tan, Zilong, Carns, Philip, Chase, Jeff, Harms, Kevin, Lofstead, Jay, Oral, Sarp, Vazhkudai, Sudharshan S., Wang, Feiyi.
2021.
Interpreting Write Performance of Supercomputer I/O Systems with Regression Models. 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). :557—566.
This work seeks to advance the state of the art in HPC I/O performance analysis and interpretation. In particular, we demonstrate effective techniques to: (1) model output performance in the presence of I/O interference from production loads; (2) build features from write patterns and key parameters of the system architecture and configurations; (3) employ suitable machine learning algorithms to improve model accuracy. We train models with five popular regression algorithms and conduct experiments on two distinct production HPC platforms. We find that the lasso and random forest models predict output performance with high accuracy on both of the target systems. We also explore use of the models to guide adaptation in I/O middleware systems, and show potential for improvements of at least 15% from model-guided adaptation on 70% of samples, and improvements up to 10 x on some samples for both of the target systems.
Aichernig, Bernhard K., Muškardin, Edi, Pferscher, Andrea.
2021.
Learning-Based Fuzzing of IoT Message Brokers. 2021 14th IEEE Conference on Software Testing, Verification and Validation (ICST). :47—58.
The number of devices in the Internet of Things (IoT) immensely grew in recent years. A frequent challenge in the assurance of the dependability of IoT systems is that components of the system appear as a black box. This paper presents a semi-automatic testing methodology for black-box systems that combines automata learning and fuzz testing. Our testing technique uses stateful fuzzing based on a model that is automatically inferred by automata learning. Applying this technique, we can simultaneously test multiple implementations for unexpected behavior and possible security vulnerabilities.We show the effectiveness of our learning-based fuzzing technique in a case study on the MQTT protocol. MQTT is a widely used publish/subscribe protocol in the IoT. Our case study reveals several inconsistencies between five different MQTT brokers. The found inconsistencies expose possible security vulnerabilities and violations of the MQTT specification.
Abutaha, Mohammed, Ababneh, Mohammad, Mahmoud, Khaled, Baddar, Sherenaz Al-Haj.
2021.
URL Phishing Detection using Machine Learning Techniques based on URLs Lexical Analysis. 2021 12th International Conference on Information and Communication Systems (ICICS). :147—152.
Phishing URLs mainly target individuals and/or organizations through social engineering attacks by exploiting the humans' weaknesses in information security awareness. These URLs lure online users to access fake websites, and harvest their confidential information, such as debit/credit card numbers and other sensitive information. In this work, we introduce a phishing detection technique based on URL lexical analysis and machine learning classifiers. The experiments were carried out on a dataset that originally contained 1056937 labeled URLs (phishing and legitimate). This dataset was processed to generate 22 different features that were reduced further to a smaller set using different features reduction techniques. Random Forest, Gradient Boosting, Neural Network and Support Vector Machine (SVM) classifiers were all evaluated, and results show the superiority of SVMs, which achieved the highest accuracy in detecting the analyzed URLs with a rate of 99.89%. Our approach can be incorporated within add-on/middleware features in Internet browsers for alerting online users whenever they try to access a phishing website using only its URL.
Liu, Xusheng, Deng, Zhidong, Lv, Jingxian, Zhang, Xiaohui, Xu, Yin.
2021.
Intelligent Notification System for Large User Groups. 2021 IEEE Asia-Pacific Conference on Image Processing, Electronics and Computers (IPEC). :1213—1216.
With the development of communication technology, the disadvantages of traditional notification methods such as low efficiency gradually appear. With the introduction of WAP with WTLS security and its development and maintenance, more and more notification systems are using this technology. Through the analysis, design and implementation of notification system for large user groups, this paper studies how to collect and notify data without affecting the business system, and proposes a scheme of real-time data acquisition and filtering based on trigger. The middleware and application server implementation transaction management and database operation to separate CICS middleware technology based on research using UNIXC, Socket programming, SQL statements, SYBASE database technology, from the system requirements, business process, function structure, database and data structure, the input and output of the system, system testing the aspects such as design of practical significance to intelligent notification system for large user groups. Finally, the paper describes the test effect of the system in detail. 10 users send 1, 5, 10 and 20 strokes at the same time, and the completion time is 0.28, 1.09, 1.58 and 2.20 seconds, which proves that the system has practical significance.
Bolbol, Noor, Barhoom, Tawfiq.
2021.
Mitigating Web Scrapers using Markup Randomization. 2021 Palestinian International Conference on Information and Communication Technology (PICICT). :157—162.
Web Scraping is the technique of extracting desired data in an automated way by scanning the internal links and content of a website, this activity usually performed by systematically programmed bots. This paper explains our proposed solution to protect the blog content from theft and from being copied to other destinations by mitigating the scraping bots. To achieve our purpose we applied two steps in two levels, the first one, on the main blog page level, mitigated the work of crawler bots by adding extra empty articles anchors among real articles, and the next step, on the article page level, we add a random number of empty and hidden spans with randomly generated text among the article's body. To assess this solution we apply it to a local project developed using PHP language in Laravel framework, and put four criteria that measure the effectiveness. The results show that the changes in the file size before and after the application do not affect it, also, the processing time increased by few milliseconds which still in the acceptable range. And by using the HTML-similarity tool we get very good results that show the symmetric over style, with a few bit changes over the structure. Finally, to assess the effects on the bots, scraper bot reused and get the expected results from the programmed middleware. These results show that the solution is feasible to be adopted and use to protect blogs content.