Biblio

Filters: Keyword is Pipelines
2023-07-19
Voulgaris, Konstantinos, Kiourtis, Athanasios, Karamolegkos, Panagiotis, Karabetian, Andreas, Poulakis, Yannis, Mavrogiorgou, Argyro, Kyriazis, Dimosthenis.  2022.  Data Processing Tools for Graph Data Modelling Big Data Analytics. 2022 13th International Congress on Advanced Applied Informatics Winter (IIAI-AAI-Winter). :208–212.
Any Big Data scenario eventually runs into scalability concerns, often related to storage or computing power. Modern solutions have proven effective in multiple domains and have automated many aspects of the Big Data pipeline. In this paper, we present a solution for deploying event-based automated data processing tools for low-code environments that minimizes the need for user input and can effectively handle common data processing jobs, as an alternative to distributed solutions that require language-specific libraries and code. Our architecture combines a network-exposed service with a cluster of “Data Workers” that handle data processing jobs effectively without requiring manual input from the user. This system proves to be effective at handling most data processing scenarios and allows for easy expandability by following simple patterns when declaring any additional jobs.
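
A minimal Python sketch of the “Data Workers” idea described above, assuming a simple shared job queue and purely illustrative job names and handlers (the actual system is event-based and network-exposed):

    # A pool of "data workers" pulls declared jobs from a queue and applies a registered handler.
    import multiprocessing as mp

    JOB_HANDLERS = {
        "deduplicate": lambda rows: list(dict.fromkeys(rows)),   # order-preserving dedupe
        "uppercase":   lambda rows: [r.upper() for r in rows],
    }

    def worker(job_queue, result_queue):
        for job_name, payload in iter(job_queue.get, None):      # None is the stop signal
            result_queue.put((job_name, JOB_HANDLERS[job_name](payload)))

    if __name__ == "__main__":
        jobs, results = mp.Queue(), mp.Queue()
        pool = [mp.Process(target=worker, args=(jobs, results)) for _ in range(2)]
        for p in pool:
            p.start()
        jobs.put(("deduplicate", ["a", "a", "b"]))
        jobs.put(("uppercase", ["x", "y"]))
        for _ in range(2):
            print(results.get())
        for _ in pool:
            jobs.put(None)
        for p in pool:
            p.join()

Declaring an additional job amounts to registering one more handler, which mirrors the “simple patterns when declaring any additional jobs” noted in the abstract.
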
2023-06-29
Rasyid, Ihsan Faishal, Zagi, Luqman Muhammad, Suhardi.  2022.  Digital Forensic Readiness Information System For EJBCA Digital Signature Web Server. 2022 International Conference on Information Technology Systems and Innovation (ICITSI). :177–182.
Being web-based, an EJBCA digital signature server may have vulnerabilities; the common web vulnerabilities are listed in OWASP's Top 10 2021. Anticipating attacks with an effective and efficient forensics application is therefore necessary. The concept of digital forensic readiness can be applied as a pre-incident plan, with a digital forensic lifecycle pipeline establishing an efficient forensic process. Managing digital evidence in the pre-incident plan includes data collection, examination, analysis, and a findings report. Based on this concept, we designed an information system that carries out the entire flow and provides attack evidence collection, visualization of attack statistics in an executive summary, mitigation recommendations, and forensic report generation in physical form when needed. This research offers an information system that can support the digital forensic process and maintain the integrity of the EJBCA digital signature web server.
2023-06-23
Vogel, Michael, Schuster, Franka, Kopp, Fabian Malte, König, Hartmut.  2022.  Data Volume Reduction for Deep Packet Inspection by Multi-layer Application Determination. 2022 IEEE International Conference on Cyber Security and Resilience (CSR). :44–49.
Attack detection in enterprise networks increasingly faces large data volumes, at times with high data bursts, and heavily fluctuating data flows, which often cause arbitrary discarding of data packets in overload situations that attackers can exploit to hide attack activities. Attack detection systems usually configure a comprehensive set of signatures for known vulnerabilities in different operating systems, protocols, and applications. Many of these signatures, however, are not relevant in every context, since certain vulnerabilities have already been eliminated or the vulnerable applications or operating system versions are not installed on the involved systems. In this paper, we present an approach for clustering data flows to assign them to dedicated analysis units that contain only the signature sets relevant for the analysis of these flows. We discuss the performance of this clustering and show how it can be used in practice to improve the efficiency of an analysis pipeline.
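
A minimal Python sketch of the flow-to-analysis-unit assignment described above; the application-inference rule and the signature sets are simplified placeholder assumptions, not the paper's multi-layer determination logic:

    from collections import defaultdict

    SIGNATURES_BY_APP = {
        "http":    ["sig-http-001", "sig-http-002"],
        "modbus":  ["sig-modbus-010"],
        "unknown": ["sig-generic-001"],
    }

    def infer_app(flow):
        # Placeholder: a real multi-layer determination inspects several protocol layers.
        return {80: "http", 502: "modbus"}.get(flow["dst_port"], "unknown")

    def assign_analysis_units(flows):
        units = defaultdict(list)
        for flow in flows:
            units[infer_app(flow)].append(flow)
        # Each analysis unit checks its flows only against the relevant signature subset.
        return {app: (SIGNATURES_BY_APP[app], members) for app, members in units.items()}

    flows = [{"dst_port": 80, "bytes": 1200}, {"dst_port": 502, "bytes": 90}]
    for app, (sigs, members) in assign_analysis_units(flows).items():
        print(app, len(members), "flow(s) checked against", sigs)
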
2023-05-11
Li, Hongwei, Chasaki, Danai.  2022.  Network-Based Machine Learning Detection of Covert Channel Attacks on Cyber-Physical Systems. 2022 IEEE 20th International Conference on Industrial Informatics (INDIN). :195–201.
Most of the recent high-profile attacks targeting cyber-physical systems (CPS) started with lengthy reconnaissance periods that enabled attackers to gain an in-depth understanding of the victim’s environment. To simulate these stealthy attacks, several covert channel tools have been published and proven effective in their ability to blend into existing CPS communication streams, with the capability for data exfiltration and command injection. In this paper, we report a novel machine learning feature engineering and data processing pipeline for the detection of covert channel attacks on CPS with real-time detection throughput. The system also operates at the network layer without requiring physical-system domain-specific state modeling, such as voltage levels in a power generation system. We not only demonstrate the effectiveness of using TCP payload entropy as engineered features and the technique of grouping information into network flows, but also pitch the proposed detector against scenarios employing advanced evasion tactics, and still achieve above 99% detection performance.
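
A small Python sketch of the two ideas highlighted in the abstract, payload-entropy features and grouping by network flow; the packet record format is an illustrative assumption and the paper's full feature set is richer:

    import math
    from collections import Counter, defaultdict

    def payload_entropy(payload: bytes) -> float:
        # Shannon entropy (bits per byte) of a TCP payload.
        if not payload:
            return 0.0
        counts, n = Counter(payload), len(payload)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def flow_features(packets):
        # Group packets into flows keyed by the 5-tuple, then aggregate entropy per flow.
        flows = defaultdict(list)
        for pkt in packets:
            key = (pkt["src"], pkt["sport"], pkt["dst"], pkt["dport"], "tcp")
            flows[key].append(payload_entropy(pkt["payload"]))
        return {k: {"mean_entropy": sum(v) / len(v), "max_entropy": max(v)} for k, v in flows.items()}

    packets = [
        {"src": "10.0.0.2", "sport": 44321, "dst": "10.0.0.9", "dport": 502, "payload": b"\x00\x01\x02\x03"},
        {"src": "10.0.0.2", "sport": 44321, "dst": "10.0.0.9", "dport": 502, "payload": bytes(range(256))},
    ]
    print(flow_features(packets))

High-entropy payloads on channels that normally carry low-entropy control traffic are the kind of signal such engineered features can surface.
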
2023-03-31
Gao, Ruijun, Guo, Qing, Juefei-Xu, Felix, Yu, Hongkai, Fu, Huazhu, Feng, Wei, Liu, Yang, Wang, Song.  2022.  Can You Spot the Chameleon? Adversarially Camouflaging Images from Co-Salient Object Detection. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). :2140–2149.
Co-salient object detection (CoSOD) has recently achieved significant progress and played a key role in retrieval-related tasks. However, it inevitably poses an entirely new safety and security issue, i.e., highly personal and sensitive content can potentially be extracted by powerful CoSOD methods. In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack. Specifically, given an image selected from a group of images containing some common and salient objects, we aim to generate an adversarial version that can mislead CoSOD methods into predicting incorrect co-salient regions. Note that, compared with general white-box adversarial attacks for classification, this new task faces two additional challenges: (1) low success rate due to the diverse appearance of images in the group; (2) low transferability across CoSOD methods due to the considerable difference between CoSOD pipelines. To address these challenges, we propose the very first black-box joint adversarial exposure and noise attack (Jadena), where we jointly and locally tune the exposure and additive perturbations of the image according to a newly designed high-feature-level contrast-sensitive loss function. Our method, without any information on the state-of-the-art CoSOD methods, leads to significant performance degradation on various co-saliency detection datasets and makes the co-salient objects undetectable. This can have strong practical benefits in properly securing the large number of personal photos currently shared on the Internet. Moreover, our method can potentially be utilized as a metric for evaluating the robustness of CoSOD methods.
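
A heavily simplified, black-box Python sketch of jointly searching over exposure and additive noise, in the spirit of (but not equivalent to) the Jadena attack described above; the cosod_score function, the global exposure factor, and the random-search loop are illustrative assumptions:

    import numpy as np

    def joint_exposure_noise_attack(image, cosod_score, iters=200, eps=8 / 255, seed=0):
        # Keep whichever (exposure, noise) proposal degrades the black-box co-saliency score most.
        rng = np.random.default_rng(seed)
        best, best_score = image, cosod_score(image)
        for _ in range(iters):
            exposure = rng.uniform(0.7, 1.3)                   # exposure tuning (global here, local in the paper)
            noise = rng.uniform(-eps, eps, size=image.shape)   # bounded additive perturbation
            candidate = np.clip(image * exposure + noise, 0.0, 1.0)
            score = cosod_score(candidate)
            if score < best_score:                             # lower score = weaker co-saliency prediction
                best, best_score = candidate, score
        return best

    # Usage with a dummy scorer on a random "image":
    img = np.random.rand(64, 64, 3)
    adv = joint_exposure_noise_attack(img, cosod_score=lambda x: float(x.mean()))
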
2023-03-03
Zhou, Ziyi, Han, Xing, Chen, Zeyuan, Nan, Yuhong, Li, Juanru, Gu, Dawu.  2022.  SIMulation: Demystifying (Insecure) Cellular Network based One-Tap Authentication Services. 2022 52nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :534–546.
A recently emerged cellular-network-based One-Tap Authentication (OTAuth) scheme allows app users to quickly sign up or log in to their accounts conveniently: Mobile Network Operator (MNO) provided tokens instead of user passwords are used as identity credentials. After conducting a first in-depth security analysis, however, we have revealed several fundamental design flaws among popular OTAuth services, which allow an adversary to easily (1) perform unauthorized login and register new accounts as the victim, (2) illegally obtain identities of victims, and (3) interfere with the OTAuth services of legitimate apps. To further evaluate the impact of our identified issues, we propose a pipeline that integrates both static and dynamic analysis. We examined 1,025/894 Android/iOS apps, each app holding more than 100 million installations. We confirmed 396/398 Android/iOS apps are affected. Our research systematically reveals the threats against OTAuth services. Finally, we provide suggestions on how to mitigate these threats accordingly.
ISSN: 2158-3927
Hong, Geng, Yang, Zhemin, Yang, Sen, Liao, Xiaojing, Du, Xiaolin, Yang, Min, Duan, Haixin.  2022.  Analyzing Ground-Truth Data of Mobile Gambling Scams. 2022 IEEE Symposium on Security and Privacy (SP). :2176–2193.
With the growth of mobile computing techniques, mobile gambling scams have seen a rampant increase in the recent past. In mobile gambling scams, miscreants deliver scamming messages via mobile instant messaging, host scam gambling platforms on mobile apps, and adopt mobile payment channels. To date, there is little quantitative knowledge about how this trending cybercrime operates, despite causing daily fraud losses estimated at more than $522,262 USD. This paper presents the first empirical study based on ground-truth data of mobile gambling scams, associated with 1,461 scam incident reports and 1,487 gambling scam apps, spanning from January 1, 2020 to December 31, 2020. The qualitative and quantitative analysis of this ground-truth data allows us to characterize the operational pipeline and full fraud kill chain of mobile gambling scams. In particular, we study the social engineering tricks used by scammers and reveal their effectiveness. Our work provides a systematic analysis of 1,068 confirmed Android and 419 iOS scam apps, including their development frameworks, declared permissions, compatibility, and backend network infrastructure. Perhaps surprisingly, our study unveils that public online app generators have been abused to develop gambling scam apps. Our analysis reveals several payment channels (ab)used by gambling scam apps and uncovers a new type of money mule-based payment channel with an average daily gambling deposit of $400,000 USD. Our findings enable a better understanding of the mobile gambling scam ecosystem, and suggest potential avenues to disrupt these scam activities.
ISSN: 2375-1207
2023-02-17
Sikder, Md Nazmul Kabir, Batarseh, Feras A., Wang, Pei, Gorentala, Nitish.  2022.  Model-Agnostic Scoring Methods for Artificial Intelligence Assurance. 2022 IEEE 29th Annual Software Technology Conference (STC). :9–18.
State-of-the-art Artificial Intelligence Assurance (AIA) methods validate AI systems based on predefined goals and standards, are applied within a given domain, and are designed for a specific AI algorithm. Existing works do not provide information on assuring subjective AI goals such as fairness and trustworthiness. Other assurance goals are frequently required in an intelligent deployment, including explainability, safety, and security. Accordingly, issues such as value loading, generalization, context, and scalability arise; however, achieving multiple assurance goals without major trade-offs is generally deemed an unattainable task. In this manuscript, we present two AIA pipelines that are model-agnostic, independent of the domain (such as healthcare, energy, or banking), and provide scores for AIA goals including explainability, safety, and security. The two pipelines, the Adversarial Logging Scoring Pipeline (ALSP) and the Requirements Feedback Scoring Pipeline (RFSP), are scalable and tested with multiple use cases, such as a water distribution network and a telecommunications network, to illustrate their benefits. ALSP optimizes models using a game-theory approach; it also logs and scores the actions of an AI model to detect adversarial inputs, and assures the datasets used for training. RFSP identifies the best hyper-parameters using a Bayesian approach and provides assurance scores for subjective goals such as ethical AI using user inputs and statistical assurance measures. Each pipeline has three algorithms that enforce the final assurance scores and other outcomes. Unlike ALSP (which is a parallel process), RFSP is user-driven and its actions are sequential. Data are collected for experimentation; the results of both pipelines are presented and contrasted.
2023-02-03
Kumar, Abhinav, Tourani, Reza, Vij, Mona, Srikanteswara, Srikathyayani.  2022.  SCLERA: A Framework for Privacy-Preserving MLaaS at the Pervasive Edge. 2022 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops). :175–180.
The increasing data generation rate and the proliferation of deep learning applications have led to the development of machine learning-as-a-service (MLaaS) platforms by major Cloud providers. The existing MLaaS platforms, however, fall short in protecting the clients’ private data. Recent distributed MLaaS architectures such as federated learning have also shown to be vulnerable against a range of privacy attacks. Such vulnerabilities motivated the development of privacy-preserving MLaaS techniques, which often use complex cryptographic primitives. Such approaches, however, demand abundant computing resources, which undermine the low-latency nature of evolving applications such as autonomous driving. To address these challenges, we propose SCLERA, an efficient MLaaS framework that utilizes trusted execution environment for secure execution of clients’ workloads. SCLERA features a set of optimization techniques to reduce the computational complexity of the offloaded services and achieve low-latency inference. We assessed SCLERA’s efficacy using image/video analytic use cases such as scene detection. Our results show that SCLERA achieves up to 23× speed-up when compared to the baseline secure model execution.
2023-01-06
Ham, MyungJoo, Woo, Sangjung, Jung, Jaeyun, Song, Wook, Jang, Gichan, Ahn, Yongjoo, Ahn, Hyoungjoo.  2022.  Toward Among-Device AI from On-Device AI with Stream Pipelines. 2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP). :285–294.
Modern consumer electronic devices often provide intelligence services with deep neural networks. We have started migrating the computing locations of intelligence services from cloud servers (traditional AI systems) to the corresponding devices (on-device AI systems). On-device AI systems generally have the advantages of preserving privacy, removing network latency, and saving cloud costs. With the emergence of on-device AI systems having relatively low computing power, the inconsistent and varying hardware resources and capabilities pose difficulties. The authors' affiliation has started applying a stream pipeline framework, NNStreamer, for on-device AI systems, saving development costs and hardware resources and improving performance. We want to expand the types of devices and applications with on-device AI services to products of both the affiliation and second/third parties. We also want to make each AI service atomic, re-deployable, and shared among connected devices of arbitrary vendors; thus yet another requirement is introduced, as always. The new requirement of “among-device AI” includes connectivity between AI pipelines so that they may share computing resources and hardware capabilities across a wide range of devices regardless of vendors and manufacturers. We propose extensions of the stream pipeline framework, NNStreamer, for on-device AI so that NNStreamer may provide among-device AI capability. This work is a Linux Foundation (LF AI & Data) open source project accepting contributions from the general public.
2022-12-20
Song, Suhwan, Hur, Jaewon, Kim, Sunwoo, Rogers, Philip, Lee, Byoungyoung.  2022.  R2Z2: Detecting Rendering Regressions in Web Browsers through Differential Fuzz Testing. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE). :1818–1829.
A rendering regression is a bug introduced by a web browser where a web page no longer functions as users expect. Such rendering bugs critically harm the usability of web browsers as well as web applications. The unique aspect of rendering bugs is that they affect the presented visual appearance of web pages, but those web pages have no pre-defined correct appearance. Therefore, it is challenging to automatically detect errors in their appearance. In practice, web browser vendors rely on non-trivial and time-prohibitive manual analysis to detect and handle rendering regressions. This paper proposes R2Z2, an automated tool to find rendering regressions. R2Z2 uses the differential fuzz testing approach, which repeatedly compares the rendering results of two different versions of a browser while providing the same HTML as input. If the rendering results are different, R2Z2 further performs cross browser compatibility testing to check if the rendering difference is indeed a rendering regression. After identifying a rendering regression, R2Z2 will perform an in-depth analysis to aid in fixing the regression. Specifically, R2Z2 performs a delta-debugging-like analysis to pinpoint the exact browser source code commit causing the regression, as well as inspecting the rendering pipeline stages to pinpoint which pipeline stage is responsible. We implemented a prototype of R2Z2 particularly targeting the Chrome browser. So far, R2Z2 found 11 previously undiscovered rendering regressions in Chrome, all of which were confirmed by the Chrome developers. Importantly, in each case, R2Z2 correctly reported the culprit commit. Moreover, R2Z2 correctly pinpointed the culprit rendering pipeline stage in all but one case.
ISSN: 1558-1225
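
A minimal Python sketch of the differential step in the R2Z2 entry above: render the same HTML in two browser builds and flag any pixel-level difference for follow-up triage. The binary paths are placeholders, and the real tool additionally performs cross-browser checks, commit bisection, and pipeline-stage inspection:

    import io
    from PIL import Image, ImageChops
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    def render(chrome_binary, url):
        opts = Options()
        opts.binary_location = chrome_binary
        opts.add_argument("--headless")
        driver = webdriver.Chrome(options=opts)
        try:
            driver.get(url)
            return Image.open(io.BytesIO(driver.get_screenshot_as_png()))
        finally:
            driver.quit()

    def rendering_differs(chrome_old, chrome_new, url):
        a, b = render(chrome_old, url), render(chrome_new, url)
        return ImageChops.difference(a.convert("RGB"), b.convert("RGB")).getbbox() is not None

    # rendering_differs("/opt/chrome-105/chrome", "/opt/chrome-106/chrome", "file:///tmp/case.html")
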
2022-12-01
Yeo, Guo Feng Anders, Hudson, Irene, Akman, David, Chan, Jeffrey.  2022.  A Simple Framework for XAI Comparisons with a Case Study. 2022 5th International Conference on Artificial Intelligence and Big Data (ICAIBD). :501–508.
The number of publications related to Explainable Artificial Intelligence (XAI) has increased rapidly over the last decade. However, the subjective nature of explainability has led to a lack of consensus on commonly used definitions of explainability, and the differing problem statements falling under the XAI label have resulted in a lack of comparisons. This paper proposes, in broad terms, a simple comparison framework for XAI methods based on the output and what we call the practical attributes. The aim of the framework is to ensure that everything that can be held constant for the purpose of comparison is held constant, and to ignore many of the subjective elements present in the area of XAI. An example comparison along the lines of the proposed framework is performed on local, post-hoc, model-agnostic XAI algorithms that are designed to measure the feature importance/contribution for a queried instance. These algorithms are assessed on two criteria using synthetic datasets across a range of classifiers. The first is based on selecting features that contribute to the underlying data structure, and the second is how accurately the algorithms select the features used in a decision tree path. The results from the first comparison showed that when the classifier was able to pick up the underlying pattern in the model, the LIME algorithm was the most accurate at selecting the underlying ground-truth features. The second test returned mixed results: in some instances the XAI algorithms were able to accurately return the features used to produce predictions, but this result was not consistent.
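
A tiny Python sketch of the kind of comparison the framework above enables on synthetic data: hold the data and classifier fixed, take each XAI method's top-k features for a queried instance, and score them against the known ground-truth features. The explainer outputs here are hard-coded placeholders, not results from actual XAI libraries:

    def precision_at_k(ranked_features, ground_truth, k):
        top_k = list(ranked_features)[:k]
        return sum(f in ground_truth for f in top_k) / k

    ground_truth = {"x1", "x3"}                      # features that actually drive the synthetic label
    explainers = {
        "method_a": ["x1", "x3", "x7"],              # ranked features returned by some XAI method
        "method_b": ["x5", "x1", "x2"],
    }
    for name, ranking in explainers.items():
        print(name, precision_at_k(ranking, ground_truth, k=2))
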
2022-11-18
Sun, Xiaohan, Cheng, Yunchang, Qu, Xiaojie, Li, Hang.  2021.  Design and Implementation of Security Test Pipeline based on DevSecOps. 2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC). 4:532–535.
In recent years, a variety of information security incidents of different types have emerged endlessly. Security vulnerabilities are an important factor leading to the security risk of an information system and are the most common and urgent security risks in information systems. The research goal of this paper is to seamlessly integrate the security testing process with the integration process of software construction, deployment, operation, and maintenance. Through a management platform, the security testing results are uniformly managed and displayed in reports, and a project management system is introduced to develop, regress, and manage security vulnerabilities in a closed loop. Before security vulnerabilities cause irreparable damage to the information system, they are found and fully analyzed, and solutions are formed to minimize the threat they pose to the information system.
2022-09-09
Weaver, Gabriel A.  2021.  A Data Processing Pipeline For Cyber-Physical Risk Assessments Of Municipal Supply Chains. 2021 Winter Simulation Conference (WSC). :1–12.
Smart city technologies promise reduced congestion by optimizing transportation movements. Increased connectivity, however, may increase the attack surface of a municipality's critical functions. Increased supply chain attacks (up nearly 80% in 2019) and municipal ransomware attacks (up 60% in 2019) motivate the need for holistic approaches to risk assessment. Therefore, we present a methodology to quantify the degree to which supply-chain movements may be observed or disrupted via compromised smart-city devices. Our data-processing pipeline uses publicly available datasets to model intermodal commodity flows within and surrounding a municipality. Using a hierarchy tree to adaptively sample spatial networks within geographic regions of interest, we bridge the gap between grid- and network-based risk assessment frameworks. Results based on fieldwork for the Jack Voltaic exercises sponsored by the Army Cyber Institute demonstrate our approach on intermodal movements through Charleston, SC and San Diego, CA.
2022-08-26
Rangnau, Thorsten, Buijtenen, Remco v., Fransen, Frank, Turkmen, Fatih.  2020.  Continuous Security Testing: A Case Study on Integrating Dynamic Security Testing Tools in CI/CD Pipelines. 2020 IEEE 24th International Enterprise Distributed Object Computing Conference (EDOC). :145–154.
Continuous Integration (CI) and Continuous Delivery (CD) have become a well-known practice in DevOps to ensure fast delivery of new features. This is achieved by automatically testing and releasing new software versions, e.g., multiple times per day. However, classical security management techniques cannot keep up with this quick Software Development Life Cycle (SDLC). Nonetheless, guaranteeing high security quality of software systems has become increasingly important. The new trend of DevSecOps aims to integrate security techniques into existing DevOps practices; in particular, the automation of security testing is an important area of research in this trend. Although plenty of literature discusses security testing and CI/CD practices, only a few works deal with both topics together. Additionally, most of the existing works cover only static code analysis and neglect dynamic testing methods. In this paper, we present an approach to integrate three automated dynamic testing techniques into a CI/CD pipeline and provide an empirical analysis of the introduced overhead. We then go on to identify unique research/technology challenges the DevSecOps communities will face and propose preliminary solutions to these challenges. Our findings will enable informed decisions when employing DevSecOps practices in agile enterprise applications engineering processes and enterprise security.
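
A hedged Python sketch of wrapping one dynamic security test as a CI/CD pipeline stage and recording the overhead it introduces; the scanner command line is a hypothetical placeholder, not the CLI of any specific tool used in the paper:

    import subprocess
    import sys
    import time

    def run_dynamic_scan(target_url):
        start = time.monotonic()
        result = subprocess.run(
            ["dast-scanner", "--target", target_url, "--report", "scan-report.json"],  # placeholder CLI
            capture_output=True, text=True,
        )
        elapsed = time.monotonic() - start
        print(f"dynamic scan finished in {elapsed:.1f}s with exit code {result.returncode}")
        return result.returncode

    if __name__ == "__main__":
        # A non-zero exit code from the scanner fails this pipeline stage.
        sys.exit(run_dynamic_scan("http://staging.example.internal"))
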
2022-08-12
Aguinaldo, Roberto Daniel, Solano, Geoffrey, Pontiveros, Marc Jermaine, Balolong, Marilen Parungao.  2021.  NAMData: A Web-application for the Network Analysis of Microbiome Data. TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON). :341–346.
Recent projects exploring the functions of microbiomes within communities have brought about a plethora of new data. That specific field of study is called metagenomics, and one of its more advanced approaches is the application of network analysis. This paper introduces NAMData, a web-application tool for the network analysis of microbiome data. The system handles the compositional and sparse nature of microbiome data by applying taxa filtration, normalization, and zero treatment. Furthermore, compositionally aware correlation estimators are used to compute the correlation between taxa, and the system divides the network into positive and negative correlation networks. NAMData aims to capitalize on unique network features, namely network visualization, centrality scores, and community detection. The system enables researchers to include network analysis in their analysis pipelines even without any knowledge of programming. Biological concepts can be integrated with the network findings gathered from the system to either support existing facts or form new insights.
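
A small Python sketch of the network-analysis step described above, assuming a toy taxa correlation matrix rather than the output of a compositionally aware estimator:

    import networkx as nx

    taxa_correlations = {("t1", "t2"): 0.8, ("t1", "t3"): -0.6, ("t2", "t4"): 0.5, ("t3", "t4"): -0.4}

    # Split edges into positive and negative correlation networks.
    pos, neg = nx.Graph(), nx.Graph()
    for (a, b), r in taxa_correlations.items():
        (pos if r > 0 else neg).add_edge(a, b, weight=abs(r))

    print("degree centrality (positive network):", nx.degree_centrality(pos))
    communities = nx.algorithms.community.greedy_modularity_communities(pos)
    print("communities:", [sorted(c) for c in communities])
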
2022-07-28
[Anonymous].  2021.  An Automated Pipeline for Privacy Leak Analysis of Android Applications. 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). :1048–1050.
We propose an automated pipeline for analyzing privacy leaks in Android applications. By using a combination of dynamic and static analysis, we validate the results against each other to improve accuracy. Compared to state-of-the-art approaches, we not only capture the network traffic for analysis, but also look into the data flows inside the application. We particularly focus on the privacy leakage caused by third-party services and high-risk permissions. The proposed automated approach will combine taint analysis, permission analysis, network traffic analysis, and dynamic function tracing during run-time to identify private information leaks. We further implement an automatic validation and complementation process to reduce false positives. A small-scale experiment has been conducted on 30 Android applications, and a large-scale experiment on more than 10,000 Android applications is in progress.
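
A minimal Python sketch of the cross-validation idea in the entry above: only leaks reported by both the static and the dynamic side are treated as confirmed, which trims false positives. The (permission, destination) tuples are illustrative assumptions:

    static_flows = {("READ_CONTACTS", "tracker.example.com"), ("ACCESS_FINE_LOCATION", "ads.example.net")}
    dynamic_flows = {("READ_CONTACTS", "tracker.example.com"), ("READ_SMS", "collect.example.org")}

    confirmed = static_flows & dynamic_flows        # corroborated by both analyses
    needs_review = static_flows ^ dynamic_flows     # seen by only one analysis
    print("confirmed leaks:", confirmed)
    print("needs manual review:", needs_review)
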
2022-07-15
Figueiredo, Cainã, Lopes, João Gabriel, Azevedo, Rodrigo, Zaverucha, Gerson, Menasché, Daniel Sadoc, Pfleger de Aguiar, Leandro.  2021.  Software Vulnerabilities, Products and Exploits: A Statistical Relational Learning Approach. 2021 IEEE International Conference on Cyber Security and Resilience (CSR). :41–46.
Data on software vulnerabilities, products and exploits is typically collected from multiple non-structured sources. Valuable information, e.g., on which products are affected by which exploits, is conveyed by matching data from those sources, i.e., through their relations. In this paper, we leverage this simple albeit unexplored observation to introduce a statistical relational learning (SRL) approach for the analysis of vulnerabilities, products and exploits. In particular, we focus on the problem of determining the existence of an exploit for a given product, given information about the relations between products and vulnerabilities, and vulnerabilities and exploits, focusing on Industrial Control Systems (ICS), the National Vulnerability Database and ExploitDB. Using RDN-Boost, we were able to reach an AUC ROC of 0.83 and an AUC PR of 0.69 for the problem at hand. To reach that performance, we indicate that it is instrumental to include textual features, e.g., extracted from the description of vulnerabilities, as well as structured information, e.g., about product categories. In addition, using interpretable relational regression trees we report simple rules that shed insight on factors impacting the weaponization of ICS products.
2022-07-01
Matri, Pierre, Ross, Robert.  2021.  Neon: Low-Latency Streaming Pipelines for HPC. 2021 IEEE 14th International Conference on Cloud Computing (CLOUD). :698–707.
Real-time data analysis in the context of, e.g., real-time monitoring or computational steering is an important tool in many fields of science, allowing scientists to make the best use of limited resources such as sensors and HPC platforms. These tools typically rely on large amounts of continuously collected data that need to be processed in near-real time to avoid wasting compute, storage, and networking resources. Streaming pipelines are a natural fit for this use case but are inconvenient to use on high-performance computing (HPC) systems because their system software environment diverges from that of big data platforms, increasing both the cost and the complexity of the solution. In this paper we propose Neon, a clean-slate design of a streaming data processing framework for HPC systems that enables users to create arbitrarily large streaming pipelines. The experimental results on the Bebop supercomputer show significant performance improvements compared with Apache Storm, with up to 2x increased throughput and reduced latency.
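
A toy Python sketch of a composable streaming pipeline built from chained stages, far simpler than a framework like Neon but illustrating the record-at-a-time flow; the stage functions and record fields are illustrative assumptions:

    def source(n):
        for i in range(n):
            yield {"sensor": i % 3, "value": float(i)}

    def filter_stage(records, min_value):
        return (r for r in records if r["value"] >= min_value)

    def window_mean(records, size):
        buf = []
        for r in records:
            buf.append(r["value"])
            if len(buf) == size:
                yield sum(buf) / size
                buf.clear()

    pipeline = window_mean(filter_stage(source(100), min_value=10.0), size=5)
    for mean in pipeline:
        print(mean)
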
2022-06-10
Yang, Jing, Vega-Oliveros, Didier, Seibt, Tais, Rocha, Anderson.  2021.  Scalable Fact-checking with Human-in-the-Loop. 2021 IEEE International Workshop on Information Forensics and Security (WIFS). :1–6.
Researchers have been investigating automated solutions for fact-checking on various fronts. However, current approaches often overlook the fact that the amount of information released every day is escalating, and much of it overlaps. Intending to accelerate fact-checking, we bridge this gap by proposing a new pipeline: grouping similar messages and summarizing them into aggregated claims. Specifically, we first clean a set of social media posts (e.g., tweets) and build a graph of all posts based on their semantics; then, we perform two clustering methods to group the messages for further claim summarization. We evaluate the summaries both quantitatively with ROUGE scores and qualitatively with human evaluation. We also generate a graph of summaries to verify that there is no significant overlap among them. The results reduced 28,818 original messages to 700 summary claims, showing the potential to speed up the fact-checking process by organizing and selecting representative claims from massive disorganized and redundant messages.
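
A minimal Python sketch of the grouping step described above: embed posts with TF-IDF, connect pairs above a cosine-similarity threshold, and treat connected components as candidate claim clusters. The threshold and toy posts are illustrative, and the paper additionally summarizes each cluster into an aggregated claim:

    import networkx as nx
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    posts = [
        "claim that the new vaccine causes magnetism",
        "the new vaccine causes magnetism say viral posts",
        "city council approves new park budget",
    ]
    X = TfidfVectorizer().fit_transform(posts)
    sim = cosine_similarity(X)

    G = nx.Graph()
    G.add_nodes_from(range(len(posts)))
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            if sim[i, j] > 0.3:
                G.add_edge(i, j)

    clusters = [sorted(c) for c in nx.connected_components(G)]
    print(clusters)   # the two vaccine posts land in one cluster, the budget post in another
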
2022-06-09
Gupta, Ragini, Nahrstedt, Klara, Suri, Niranjan, Smith, Jeffrey.  2021.  SVAD: End-to-End Sensory Data Analysis for IoBT-Driven Platforms. 2021 IEEE 7th World Forum on Internet of Things (WF-IoT). :903–908.
The rapid advancement of IoT technologies has led to their flexible adoption in battlefield networks, known as Internet of Battlefield Things (IoBT) networks. One important application of IoBT networks is the weather sensory network, characterized by a variety of weather, land, and environmental sensors. This data contains hidden trends and correlations needed to provide situational awareness to soldiers and commanders. To interpret the incoming data in real time, machine learning algorithms are required to automate strategic decision-making. Existing solutions are not well equipped to provide fine-grained feedback to military personnel and cannot facilitate a scalable, end-to-end platform for fast unlabeled data collection, cleaning, querying, analysis, and threat identification. In this work, we present a scalable, end-to-end, IoBT data-driven platform for SVAD (Storage, Visualization, Anomaly Detection) analysis of heterogeneous weather sensor data. Our SVAD platform includes extensive data cleaning techniques to efficiently denoise data and to differentiate genuine data from anomalies and noisy instances. We perform comparative analysis of unsupervised machine learning algorithms for multivariate data analysis and experimental evaluation of different data ingestion pipelines to show the ability of the SVAD platform for (near) real-time processing. Our results indicate impending turbulent weather conditions that can be detected by early anomaly identification and detection techniques.
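
A small Python sketch of the unsupervised anomaly-detection step on multivariate weather readings, using an Isolation Forest as one plausible choice; the synthetic readings and model parameters are illustrative assumptions, and the platform described above also covers ingestion, cleaning, storage, and visualization around this step:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(1)
    normal = rng.normal(loc=[20.0, 1013.0, 55.0], scale=[2.0, 5.0, 8.0], size=(500, 3))  # temp, pressure, humidity
    anomalies = np.array([[45.0, 950.0, 5.0], [-10.0, 1080.0, 99.0]])
    readings = np.vstack([normal, anomalies])

    model = IsolationForest(contamination=0.01, random_state=0).fit(readings)
    labels = model.predict(readings)                # -1 = anomaly, 1 = normal
    print("flagged rows:", np.where(labels == -1)[0])
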
2022-05-20
Sion, Laurens, Van Landuyt, Dimitri, Yskout, Koen, Verreydt, Stef, Joosen, Wouter.  2021.  Automated Threat Analysis and Management in a Continuous Integration Pipeline. 2021 IEEE Secure Development Conference (SecDev). :30–37.
Security and privacy threat modeling is commonly applied to systematically identify and address design-level security and privacy concerns in the early stages of architecture and design. Identifying and resolving these threats should remain a continuous concern during the development lifecycle. Especially with contemporary agile development practices, a single-shot upfront analysis becomes quickly outdated. Despite it being explicitly recommended by experts, existing threat modeling approaches focus largely on early development phases and provide limited support during later implementation phases. In this paper, we present an integrated threat analysis toolchain to support automated, continuous threat elicitation, assessment, and mitigation as part of a continuous integration pipeline in the GitLab DevOps platform. This type of automation allows for continuous attention to security and privacy threats during development at the level of individual commits, supports monitoring and managing the progress in addressing security and privacy threats over time, and enables more advanced and fine-grained analyses such as assessing the impact of proposed changes in different code branches or merge/pull requests by analyzing the changes to the threat model.
2022-05-19
Fareed, Samsad Beagum Sheik.  2021.  API Pipeline for Visualising Text Analytics Features of Twitter Texts. 2021 International Conference of Women in Data Science at Taif University (WiDSTaif). :1–6.
Twitter text analysis is quite useful in analysing emotions, sentiments and feedback from consumers on products and services. This helps service providers and manufacturers improve their products and services, address serious issues before they lead to a crisis, and improve business acumen. Twitter texts also form a data source for various research studies: they are used in topic analysis, sentiment analysis, content analysis and thematic analysis. In this paper, we present a pipeline for searching, analysing and visualising the text analytics features of Twitter texts using web APIs. It allows researchers and other interested users to build a simple yet powerful Twitter text analytics tool.
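
A brief Python sketch of one text-analytics stage such a pipeline could expose, scoring tweet sentiment with NLTK's VADER analyzer; the inline tweets stand in for texts fetched through the search API, which is an assumption here:

    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    tweets = [
        "Absolutely love the new update, support was super helpful!",
        "Worst delivery experience ever, still waiting after two weeks.",
    ]
    sia = SentimentIntensityAnalyzer()
    for text in tweets:
        print(sia.polarity_scores(text)["compound"], text)
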
Kuilboer, Jean-Pierre, Stull, Tristan.  2021.  Text Analytics and Big Data in the Financial domain. 2021 16th Iberian Conference on Information Systems and Technologies (CISTI). :1–4.
This research attempts to provide some insights into the application of text mining and Natural Language Processing (NLP). The application domain is consumer complaints about financial institutions in the USA. As an advanced analytics discipline embedded within the Big Data paradigm, the practice of text analytics contains elements of emergent knowledge processes. Since our experiment should be able to scale up, we make use of a pipeline based on Spark-NLP. The usage scenario is adapting the model to a specific industrial context and using the dataset offered by the "Consumer Financial Protection Bureau" to illustrate the application.
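
A hedged Python sketch of a scalable text pipeline over complaint narratives, using plain Spark ML stages as a simplified stand-in for the Spark-NLP pipeline used in the paper; the column names and inline rows are illustrative assumptions:

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import Tokenizer, StopWordsRemover, HashingTF, IDF

    spark = SparkSession.builder.appName("complaints-text").getOrCreate()
    df = spark.createDataFrame(
        [("Bank charged an unexpected overdraft fee",),
         ("Credit report contains an account I never opened",)],
        ["complaint"],
    )
    pipeline = Pipeline(stages=[
        Tokenizer(inputCol="complaint", outputCol="tokens"),
        StopWordsRemover(inputCol="tokens", outputCol="filtered"),
        HashingTF(inputCol="filtered", outputCol="tf", numFeatures=1 << 12),
        IDF(inputCol="tf", outputCol="tfidf"),
    ])
    features = pipeline.fit(df).transform(df)
    features.select("complaint", "tfidf").show(truncate=60)
    spark.stop()
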
2022-05-05
Singh, Praneet, P, Jishnu Jaykumar, Pankaj, Akhil, Mitra, Reshmi.  2021.  Edge-Detect: Edge-Centric Network Intrusion Detection using Deep Neural Network. 2021 IEEE 18th Annual Consumer Communications Networking Conference (CCNC). :1–6.
Edge nodes are crucial for detection against multitudes of cyber attacks on Internet-of-Things endpoints and are set to become part of a multi-billion-dollar industry. The resource constraints in this novel network infrastructure tier constrict the deployment of existing Network Intrusion Detection Systems with Deep Learning models (DLM). We address this issue by developing a novel light, fast and accurate `Edge-Detect' model, which detects Distributed Denial of Service attacks on edge nodes using DLM techniques. Our model can work within resource restrictions, i.e., low power, memory and processing capabilities, to produce accurate results at a meaningful pace. It is built by creating layers of Long Short-Term Memory or Gated Recurrent Unit based cells, which are known for their excellent representation of sequential data. We designed a practical data science pipeline with Recurrent Neural Networks to learn from the network packet behavior in order to identify whether it is normal or attack-oriented. The model evaluation is from deployment on an actual edge node represented by a Raspberry Pi using a current cybersecurity dataset (UNSW2015). Our results demonstrate that in comparison to conventional DLM techniques, our model maintains a high testing accuracy of 99% even with lower resource utilization in terms of CPU and memory. In addition, it is nearly 3 times smaller in size than the state-of-the-art model and yet requires a much lower testing time.
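
A compact Python sketch of the model shape described above, a small recurrent classifier over per-flow packet sequences sized for a constrained edge node; the feature dimensions, hyper-parameters, and dummy data are illustrative assumptions rather than the Edge-Detect configuration:

    import numpy as np
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.layers import LSTM, Dense

    TIMESTEPS, FEATURES = 20, 8            # packets per window, features per packet
    model = Sequential([
        LSTM(32, input_shape=(TIMESTEPS, FEATURES)),
        Dense(1, activation="sigmoid"),    # 1 = attack, 0 = benign
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Dummy training windows standing in for features derived from UNSW-NB15 traffic.
    X = np.random.rand(256, TIMESTEPS, FEATURES).astype("float32")
    y = np.random.randint(0, 2, size=(256, 1))
    model.fit(X, y, epochs=2, batch_size=32, verbose=0)
    print(model.count_params(), "parameters")
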