Biblio

Filters: Keyword is HDFS
2023-07-21
Xin, Wu, Shen, Qingni, Feng, Ke, Xia, Yutang, Wu, Zhonghai, Lin, Zhenghao.  2022.  Personalized User Profiles-based Insider Threat Detection for Distributed File System. 2022 IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom). :1441–1446.
In recent years, data security incidents caused by insider threats in distributed file systems have attracted the attention of academia and industry. The most common way to detect insider threats is based on user profiles. Through analysis, we find that detection based on existing user profiles is not efficient enough and produces many false positives before a stable user profile has formed. In this work, we propose personalized user profiles and design an insider threat detection framework that can intelligently detect insider threats in real time to secure distributed file systems. To generate personalized user profiles, we develop a time window-based clustering algorithm and a weighted kernel density estimation algorithm. Compared with non-personalized user profiles, personalized user profiles improve both the Recall and the Precision of insider threat detection, raising their harmonic mean, F1, to 96.52%. Meanwhile, to reduce false positives, we propose operation recommendations based on user similarity, which predict the new operations users are likely to perform in the future and thereby lower the false positive rate (FPR). The FPR is reduced to 1.54%, and the false positive identification rate (FPIR) reaches 92.62%. Furthermore, to mitigate the risks caused by inaccurate user authorization, we present user tags based on operation content and permissions. The experimental results show that our proposed framework detects insider threats more effectively and precisely, with a lower FPR and a high FPIR.
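The weighted kernel density estimation step can be illustrated with a minimal sketch. It is not the authors' algorithm: the single feature (hour of day of a file operation), the recency weighting with a 7-day half-life, the bandwidth, and the alert threshold are all hypothetical choices. Each past operation contributes a Gaussian kernel weighted by how recent it is, and a new operation whose estimated density falls below the threshold is flagged.

```python
import math


def weighted_kde(samples, weights, bandwidth):
    """Return the density function of a 1-D Gaussian KDE with per-sample weights."""
    total = sum(weights)
    norm = 1.0 / (total * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Weighted sum of Gaussian kernels centred on the observed samples.
        return norm * sum(
            w * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            for s, w in zip(samples, weights)
        )

    return density


def recency_weights(ages_days, half_life_days=7.0):
    # Hypothetical weighting: an operation's weight halves every half_life_days.
    return [0.5 ** (age / half_life_days) for age in ages_days]


# Hypothetical profile: hour of day of one user's past file operations,
# and how many days ago each operation happened.
hours = [9.0, 9.5, 10.0, 10.5, 14.0, 14.5, 15.0, 16.0]
ages = [1, 2, 3, 5, 8, 10, 20, 30]
profile = weighted_kde(hours, recency_weights(ages), bandwidth=1.0)

THRESHOLD = 0.01  # hypothetical alert threshold


def is_anomalous(hour):
    # Flag operations at times the weighted profile considers unlikely.
    # Simplification: hour of day is treated as linear, not circular.
    return profile(hour) < THRESHOLD
```

Here `is_anomalous(3.0)` flags a 3 a.m. access for a user whose history clusters around office hours, while `is_anomalous(10.0)` does not. A real profile would combine more features and treat hour of day as circular.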
2022-04-13
Kousar, Heena, Mulla, Mohammed Moin, Shettar, Pooja, D. G., Narayan.  2021.  DDoS Attack Detection System using Apache Spark. 2021 International Conference on Computer Communication and Informatics (ICCCI). :1–5.
Distributed Denial of Service (DDoS) attacks are among the most widely used cyber-attacks, so the design of DDoS detection mechanisms has attracted the attention of researchers. Designing these mechanisms involves building statistical and machine learning models, and most work in this area focuses on improving model accuracy. However, because of the large volume of network traffic, the scalability and performance of these techniques are an important research issue. In this work, we use the Apache Spark framework to detect DDoS attacks, with the NSL-KDD dataset as the benchmark for experimental analysis. The results reveal that random forest performs better than decision trees and that distributed processing improves performance in terms of pre-processing and training time.
2021-08-17
Jaiswal, Ayshwarya, Dwivedi, Vijay Kumar, Yadav, Om Prakash.  2020.  Big Data and its Analyzing Tools : A Perspective. 2020 6th International Conference on Advanced Computing and Communication Systems (ICACCS). :560–565.
Data are generated and stored in databases at very high speed, so they need to be handled and analyzed properly. Nowadays, industries extensively use Hadoop and Spark to analyze datasets. Both frameworks are used to increase processing speed when computing over huge, complex datasets, and many researchers have compared them. The big questions now arising are: Is Spark a substitute for Hadoop? Will Hadoop be replaced by Spark in the near future? Spark is “built on top of” Hadoop and extends its model to support more types of computation, including stream processing and interactive queries. Spark's execution speed is undoubtedly much faster than Hadoop's, but in terms of fault tolerance, Hadoop is slightly more fault tolerant than Spark. In this article, various big data analytics tools are compared, and Hadoop and Spark are discussed in detail. The article further gives an overview of big data and of Spark and Hadoop issues. In this survey paper, approaches to resolving the issues of Spark and Hadoop are discussed at length.
2020-12-28
Marichamy, V. S., Natarajan, V..  2020.  A Study of Big Data Security on a Partitional Clustering Algorithm with Perturbation Technique. 2020 International Conference on Smart Electronics and Communication (ICOSEC). :482–486.

The main idea of the proposed work is to secure big data on the Hadoop Distributed File System by applying a Partitional Clustering Algorithm (PCA) together with a perturbation technique. Numerous clustering methods are available for categorizing information in big data; PCA discovers clusters based on an initial partition of the data. With this approach, data can be safeguarded by perturbing them while still allowing computation and communication on the perturbed data. Performance was analyzed on a health care database in terms of parameters such as precision, accuracy, and F-score. The results demonstrate that this method reduces the complexity of preserving privacy and achieves better accuracy than existing techniques.
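As an illustration only, the sketch below pairs one common perturbation scheme, additive zero-mean Gaussian noise, with k-means, a standard partitional clustering algorithm. The abstract does not say which perturbation or which partitional algorithm the authors use, so both are assumptions, as are the hypothetical (age, blood pressure) records.

```python
import random


def perturb(records, noise_scale, seed=0):
    """Additive perturbation (assumed scheme): zero-mean Gaussian noise on every attribute."""
    rng = random.Random(seed)
    return [[v + rng.gauss(0.0, noise_scale) for v in row] for row in records]


def kmeans(points, k, iterations=20):
    """Plain k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical health records: [age, systolic blood pressure].
records = [[25, 118], [68, 162], [22, 121], [71, 158],
           [27, 119], [73, 160], [24, 122], [69, 157]]
noisy = perturb(records, noise_scale=1.0)   # only perturbed values are shared
labels, centroids = kmeans(noisy, k=2)      # clustering runs on perturbed data
```

Because the noise is small relative to the gap between the two groups, k-means on the perturbed records recovers the same two groups it would find on the originals, while the exact original values are not released.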

2020-12-11
Kumar, S., Vasthimal, D. K..  2019.  Raw Cardinality Information Discovery for Big Datasets. 2019 IEEE 5th Intl Conference on Big Data Security on Cloud (BigDataSecurity), IEEE Intl Conference on High Performance and Smart Computing, (HPSC) and IEEE Intl Conference on Intelligent Data and Security (IDS). :200–205.
Real-time discovery of all the different types of unique attributes within unstructured data is a challenging problem when dealing with multiple petabytes of unstructured data every day. Popular discovery solutions, such as creating offline jobs to uniquely identify attributes or running aggregation queries on raw data sets, limit real-time discovery use cases and often result in poor resource utilization. Discovery must be treated as a problem parallel to that of efficiently storing raw data sets on back-end big data systems. Solving the discovery problem with a parallel discovery data store infrastructure has multiple benefits: it allows the actual search queries against the raw data set to be channeled in a much more focused manner instead of being spread across the entire data set. Such focused search queries and data separation are far more performant and require a smaller compute and memory footprint.
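A minimal sketch of the idea, using a hypothetical `DiscoveryStore` class rather than the authors' infrastructure: discovery information (which attributes exist and how many distinct values each has) is updated at ingest time, next to the raw write, so discovery questions are answered from a small side store instead of by scanning or aggregating the raw data. The sketch keeps exact sets for clarity; at petabyte scale an approximate cardinality sketch such as HyperLogLog would be a common substitute, which is an assumption rather than something the abstract states.

```python
from collections import defaultdict


class DiscoveryStore:
    """Hypothetical side store: attribute name -> set of distinct values seen."""

    def __init__(self):
        self._values = defaultdict(set)

    def observe(self, record):
        # Update discovery information for every attribute in the record.
        for attr, value in record.items():
            self._values[attr].add(value)

    def attributes(self):
        return sorted(self._values)

    def cardinality(self, attr):
        # .get avoids creating an empty entry for an unknown attribute.
        return len(self._values.get(attr, ()))


def ingest(records, raw_store, discovery):
    for record in records:
        raw_store.append(record)   # stand-in for the write to the big data back end
        discovery.observe(record)  # discovery information maintained in parallel


raw, discovery = [], DiscoveryStore()
ingest(
    [
        {"service": "checkout", "status": 200},
        {"service": "search", "status": 500},
        {"service": "checkout", "status": 200, "region": "us-west"},
    ],
    raw,
    discovery,
)
```

A query planner could consult `discovery.attributes()` and `discovery.cardinality(...)` first, and only send a search to the raw data when the attribute is known to exist.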
2017-12-28
Luo, S., Wang, Y., Huang, W., Yu, H..  2016.  Backup and Disaster Recovery System for HDFS. 2016 International Conference on Information Science and Security (ICISS). :1–4.

HDFS is widely used to store massive-scale data, but such data are vulnerable to site disasters. File system backup is an important strategy for data retention. In this paper, we present an efficient, easy-to-use Backup and Disaster Recovery System for HDFS. The system includes a client based on HDFS with an additional remote backup feature, and a remote server with an HDFS cluster that keeps the backup data. It supports full backups and regular incremental backups to the server at very low cost and with high throughput. In our experiment, the average speed of backup and recovery is up to 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.
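The difference between full and incremental backup can be sketched as a selection step: a full backup copies every file, while an incremental backup copies only files modified since the previous backup. This minimal sketch walks a local directory with `os.walk` and compares modification times. It is an assumption for illustration: the described system works against HDFS and a remote backup cluster, the abstract does not give its change-detection mechanism, and the function name `files_to_back_up` is hypothetical.

```python
import os


def files_to_back_up(root, last_backup_time=None):
    """Hypothetical selection step for a local directory tree.

    last_backup_time=None selects every file (full backup); otherwise only
    files modified after that POSIX timestamp are selected (incremental).
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.stat(path).st_mtime
            if last_backup_time is None or mtime > last_backup_time:
                selected.append(os.path.relpath(path, root))
    return sorted(selected)
```

A scheduler would record the time each backup starts and pass it as `last_backup_time` to the next run; the selected paths would then be copied to the backup cluster.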

2017-03-08
Gupta, A., Mehrotra, A., Khan, P. M..  2015.  Challenges of Cloud Computing & Big Data Analytics. 2015 2nd International Conference on Computing for Sustainable Global Development (INDIACom). :1112–1115.

Nowadays, two important IT initiatives for most organizations across the globe are Big Data Analytics and Cloud Computing. Big Data Analytics can provide valuable insights that create competitiveness and generate increased revenue. Cloud Computing can enhance productivity and efficiency, thus reducing costs. Cloud Computing offers groups of servers, storage, and various networking resources. It enables Big Data environments to process voluminous, high-velocity Big Data in varied formats.

2015-05-05
Singh, S., Sharma, S..  2014.  Improving security mechanism to access HDFS data by mobile consumers using middleware-layer framework. Computing, Communication and Networking Technologies (ICCCNT), 2014 International Conference on. :1–7.

The technological revolution has led to the development of cloud computing, which delivers on-demand, easy access to large shared pools of online stored data, software, and applications. It has changed the way IT resources are used, but at the cost of security breaches such as phishing attacks, impersonation, and lack of confidentiality and integrity. This research work therefore addresses the core problem of providing absolute security to mobile consumers of the public cloud, improving users' mobility by letting them securely access data stored on the public cloud using tokens, without depending on a third party to generate those tokens. This paper presents an approach that simplifies authenticating and authorizing mobile users by implementing a middleware-centric framework, called the MiLAMob model, with the huge online data storage system HDFS. It allows consumers to access data in HDFS via mobile devices or through social networking sites, e.g., Facebook, Gmail, and Yahoo, using the OAuth 2.0 protocol. For authentication, tokens are generated using a one-time password generation technique and then encrypted using AES. By implementing flexible user-based policies and standards, this model improves the authorization process.
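The one-time password step can be illustrated with the standard HOTP (RFC 4226) and TOTP (RFC 6238) constructions, built here from Python's standard library. The abstract does not say which OTP technique MiLAMob uses, so this choice is an assumption. The AES encryption of the token is omitted because the Python standard library has no AES implementation; a third-party package would be needed for that step.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low 4 bits pick the offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, now: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP over the number of elapsed time steps."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now // step), digits)
```

With the RFC 4226 test secret `b"12345678901234567890"`, `hotp(secret, 0)` returns `"755224"`, and `totp(secret, now=59, digits=8)` returns the RFC 6238 test value `"94287082"`.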