Biblio

Filters: Keyword is distributed file system
2023-07-21
Xin, Wu, Shen, Qingni, Feng, Ke, Xia, Yutang, Wu, Zhonghai, Lin, Zhenghao.  2022.  Personalized User Profiles-based Insider Threat Detection for Distributed File System. 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1441–1446.
In recent years, data security incidents caused by insider threats in distributed file systems have attracted the attention of academia and industry. The most common way to detect insider threats is based on user profiles. Through analysis, we find that detection based on existing user profiles is not efficient enough and produces many false positives before a stable user profile has formed. In this work, we propose personalized user profiles and design an insider threat detection framework that can intelligently detect insider threats in real time to secure distributed file systems. To generate personalized user profiles, we develop a time window-based clustering algorithm and a weighted kernel density estimation algorithm. Compared with non-personalized user profiles, personalized user profiles improve both the Recall and the Precision of insider threat detection, raising their harmonic mean, F1, to 96.52%. Meanwhile, to reduce false positives, we propose operation recommendations based on user similarity, which predict the new operations users will perform in the future and thereby reduce the false positive rate (FPR). The FPR is reduced to 1.54%, and the false positive identification rate (FPIR) reaches 92.62%. Furthermore, to mitigate the risks caused by inaccurate user authorization, we present user tags based on operation content and permissions. The experimental results show that the proposed framework detects insider threats more effectively and precisely, with a lower FPR and a high FPIR.
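To make the idea of a weighted kernel density profile concrete, here is a minimal sketch, not the authors' algorithm. It assumes the profile is built from a single feature (the hour of day at which a user touches files), that recent observations should count more than old ones (the exponential half-life weighting is a hypothetical choice), and that an operation is flagged when its estimated density falls below a hypothetical threshold. The names `recency_weights`, `weighted_kde` and `is_suspicious` are illustrative only.

```python
import math
from typing import List, Sequence


def recency_weights(ages_days: Sequence[float], half_life: float) -> List[float]:
    # Hypothetical weighting: an observation loses half its weight every `half_life` days,
    # so the profile follows the user's recent behaviour more closely than old habits.
    return [0.5 ** (age / half_life) for age in ages_days]


def weighted_kde(x: float, samples: Sequence[float], weights: Sequence[float],
                 bandwidth: float) -> float:
    # Gaussian kernel density estimate at x; weights are normalized to sum to 1
    # so the result is still a probability density.
    total = sum(weights)
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(
        (w / total) * norm * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
        for s, w in zip(samples, weights)
    )


def is_suspicious(hour: float, history_hours: Sequence[float], ages_days: Sequence[float],
                  bandwidth: float = 1.0, half_life: float = 7.0,
                  threshold: float = 0.01) -> bool:
    # Flag an operation whose hour of day is improbable under the user's own profile.
    # Simplification: hours are treated as a line, ignoring the wrap-around at midnight.
    weights = recency_weights(ages_days, half_life)
    return weighted_kde(hour, history_hours, weights, bandwidth) < threshold


# Usage: a user who has only accessed files during office hours over the past week.
history = [9, 10, 11, 14, 15, 16, 17]
ages = [1, 2, 3, 4, 5, 6, 7]
print(is_suspicious(3, history, ages))   # True  (03:00 access)
print(is_suspicious(10, history, ages))  # False (10:00 access)
```

Because each user gets a density fitted to their own history, a 03:00 access is anomalous for this user even if it would be routine for a night-shift colleague; that per-user view is what "personalized" profiles buy over a single shared profile.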
2021-03-29
Ouiazzane, S., Addou, M., Barramou, F.  2020.  Toward a Network Intrusion Detection System for Geographic Data. 2020 IEEE International conference of Moroccan Geomatics (Morgeo). :1–7.

The objective of this paper is to propose a model of a distributed intrusion detection system based on the multi-agent paradigm and the Hadoop Distributed File System (HDFS). Multi-agent systems (MAS) are well suited to intrusion detection because they can address geographic data security in terms of autonomy, distribution, and performance. The proposed system is based on a set of autonomous agents that cooperate and collaborate to effectively detect intrusions and suspicious activities that may affect geographic information systems. Unlike traditional intrusion detection systems, which rely on knowledge bases to detect known attacks, our system detects both known and unknown computer attacks without any intervention from human security experts. The proposed model enables real-time detection of known and unknown attacks within large networks hosting geographic data.
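A minimal sketch of how autonomous agents might cooperate to catch attacks without a signature database, under stated assumptions; it is not the paper's model. Each hypothetical `HostAgent` learns a baseline of per-minute connection counts for one host and flags counts far above that baseline (a simple anomaly rule, so it can react to attacks it has never seen). A hypothetical `Coordinator` raises an alert only when at least `quorum` agents agree. Storage of shared data on HDFS, agent messaging, and real traffic features are all omitted; everything here stays in memory.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class HostAgent:
    """Hypothetical autonomous agent watching one host's per-minute connection counts."""
    host: str
    k: float = 3.0  # standard deviations above the learned baseline that count as anomalous
    baseline: List[int] = field(default_factory=list)

    def train(self, counts: Sequence[int]) -> None:
        # Learn "normal" from observed traffic rather than from a knowledge base of attacks.
        self.baseline.extend(counts)

    def observe(self, count: int) -> bool:
        mean = statistics.mean(self.baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline does not flag every change.
        std = max(statistics.pstdev(self.baseline), 1.0)
        return count > mean + self.k * std


class Coordinator:
    """Hypothetical coordinator: alerts only when at least `quorum` agents agree."""

    def __init__(self, agents: Sequence[HostAgent], quorum: int = 2) -> None:
        self.agents: Dict[str, HostAgent] = {a.host: a for a in agents}
        self.quorum = quorum

    def evaluate(self, window: Dict[str, int]) -> List[str]:
        # Each agent judges its own host independently; the coordinator correlates verdicts.
        flagged = [h for h, c in window.items() if self.agents[h].observe(c)]
        return flagged if len(flagged) >= self.quorum else []


agents = [HostAgent(h) for h in ("gis-db", "gis-web", "gis-tiles")]
for a in agents:
    a.train([10, 12, 11, 9, 10, 13])
coordinator = Coordinator(agents, quorum=2)
print(coordinator.evaluate({"gis-db": 50, "gis-web": 60, "gis-tiles": 11}))  # ['gis-db', 'gis-web']
print(coordinator.evaluate({"gis-db": 50, "gis-web": 11, "gis-tiles": 10}))  # []
```

Requiring agreement from several agents trades some sensitivity for fewer false alarms from a single noisy host; a real deployment would tune `k` and `quorum` per network.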

2020-03-30
Scherzinger, Stefanie, Seifert, Christin, Wiese, Lena.  2019.  The Best of Both Worlds: Challenges in Linking Provenance and Explainability in Distributed Machine Learning. 2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS). :1620–1629.
Machine learning experts prefer to think of their input as a single, homogeneous, and consistent data set. However, when analyzing large volumes of data, the entire data set may be too large to manage on a single server and must instead be stored on a distributed file system. Moreover, with the pressing demand to deliver explainable models, experts can no longer focus on machine learning algorithms in isolation, but must take into account the distributed nature of the stored data, as well as the impact of any data pre-processing steps upstream in their data analysis pipeline. In this paper, we argue that even basic transformations during data preparation can affect the learned model, and that this effect is exacerbated in a distributed setting. We then sketch our vision of end-to-end explainability of the learned model that takes pre-processing into account. In particular, we point out the potential of linking research contributions on data provenance with efforts on explainability in machine learning. In doing so, we highlight pitfalls we may encounter in a distributed system on the way to generating more holistic explanations for our machine learning models.
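As a small illustration of how a basic pre-processing step can change the data a model sees once the data is partitioned, the sketch below compares z-score standardization computed per partition (what each storage node could do locally) with standardization computed over the whole data set. The two partitions and the helper names are hypothetical, and the provenance records are only a toy version of the idea of tracking which statistics each transformation used; they are not the authors' design.

```python
import math
import statistics
from typing import Dict, List, Tuple

Partition = List[float]
Provenance = List[Dict[str, float]]


def zscore(values: Partition, mean: float, std: float) -> Partition:
    return [(v - mean) / std for v in values]


def local_normalize(partitions: List[Partition]) -> Tuple[List[Partition], Provenance]:
    # Each node standardizes only the data it stores: cheap, but partition-dependent.
    out, provenance = [], []
    for i, p in enumerate(partitions):
        mu, sd = statistics.mean(p), statistics.pstdev(p)
        out.append(zscore(p, mu, sd))
        provenance.append({"partition": i, "mean": mu, "std": sd})
    return out, provenance


def global_normalize(partitions: List[Partition]) -> Tuple[List[Partition], Provenance]:
    # Statistics are computed over the whole data set, so every partition shares one scale.
    everything = [v for p in partitions for v in p]
    mu, sd = statistics.mean(everything), statistics.pstdev(everything)
    out = [zscore(p, mu, sd) for p in partitions]
    provenance = [{"partition": i, "mean": mu, "std": sd} for i in range(len(partitions))]
    return out, provenance


parts = [[1.0, 2.0, 3.0], [101.0, 102.0, 103.0]]
local, local_prov = local_normalize(parts)
glob, glob_prov = global_normalize(parts)

# Locally normalized, both partitions look identical: the 100-unit gap between them is erased.
print(all(math.isclose(a, b) for a, b in zip(local[0], local[1])))  # True
# Globally normalized, the gap survives.
print([round(v, 2) for v in glob[0]], [round(v, 2) for v in glob[1]])  # [-1.02, -1.0, -0.98] [0.98, 1.0, 1.02]
print(local_prov)
```

A model trained on `local` could never learn anything that depends on which partition a record came from, while one trained on `glob` could; without the provenance records, an explanation of the model would have no way to tell which situation produced it.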
2015-05-05
Li, Peng, Guo, Song.  2014.  Load balancing for privacy-preserving access to big data in cloud. 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). :524–528.

In the era of big data, many users and companies are moving their data to cloud storage to simplify data management and reduce data maintenance costs. However, security and privacy have become major concerns because third-party cloud service providers are not always trustworthy. Although data contents can be protected by encryption, access patterns, which contain important information, remain exposed to clouds or malicious attackers. In this paper, we apply the ORAM (Oblivious RAM) algorithm to enable privacy-preserving access to big data deployed in distributed file systems built on hundreds or thousands of servers in one or more geo-distributed cloud sites. Since the ORAM algorithm would cause a severe access load imbalance among storage servers, we study a data placement problem to achieve a load-balanced storage system with improved availability and responsiveness. Because this problem is NP-hard, we propose a low-complexity algorithm that scales to big-data problem sizes. Extensive simulations show that our proposed algorithm finds results close to the optimal solution and significantly outperforms a random data placement algorithm.
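The abstract does not spell out the authors' placement algorithm, so the sketch below substitutes a classic low-complexity heuristic for the same kind of NP-hard balancing problem: the largest-load-first greedy rule (Graham's LPT rule), which places each data block on the currently least-loaded server. It assumes each block already has a known expected access load (for example, after accounting for ORAM's extra accesses) and ignores replication, availability, and geo-distribution. A random placement is included for contrast, mirroring the baseline the abstract compares against. All names and numbers are hypothetical.

```python
import heapq
import random
from typing import Dict, List, Tuple


def greedy_placement(block_loads: Dict[str, float],
                     num_servers: int) -> Tuple[List[List[str]], List[float]]:
    # Largest-load-first greedy (LPT): sort blocks by load, then always fill the
    # least-loaded server. Runs in O(n log n + n log m) for n blocks and m servers.
    heap = [(0.0, s) for s in range(num_servers)]  # (current load, server id)
    heapq.heapify(heap)
    assignment: List[List[str]] = [[] for _ in range(num_servers)]
    loads = [0.0] * num_servers
    for block, load in sorted(block_loads.items(), key=lambda kv: kv[1], reverse=True):
        current, server = heapq.heappop(heap)
        assignment[server].append(block)
        loads[server] = current + load
        heapq.heappush(heap, (loads[server], server))
    return assignment, loads


def random_placement(block_loads: Dict[str, float], num_servers: int,
                     seed: int = 0) -> List[float]:
    # Baseline: each block goes to a uniformly random server.
    rng = random.Random(seed)
    loads = [0.0] * num_servers
    for load in block_loads.values():
        loads[rng.randrange(num_servers)] += load
    return loads


blocks = {"b1": 7, "b2": 5, "b3": 4, "b4": 3, "b5": 3, "b6": 2}
assignment, loads = greedy_placement(blocks, 3)
print(assignment)  # [['b1', 'b6'], ['b2', 'b5'], ['b3', 'b4']]
print(loads)       # [9.0, 8.0, 7.0]
print(max(random_placement(blocks, 3)))
```

In this toy instance the greedy maximum load of 9 is optimal (the block of load 7 cannot be paired to reach exactly 8). In general, LPT is guaranteed to stay within a factor of 4/3 of the optimal maximum load, which is the kind of "close to optimal at low cost" trade-off the abstract describes.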