Biblio

Filters: Keyword is metadata
Conference Paper
Shang, Siyuan, Zhou, Aoyang, Tan, Ming, Wang, Xiaohan, Liu, Aodi.  2022.  Access Control Audit and Traceability Forensics Technology Based on Blockchain. 2022 4th International Conference on Frontiers Technology of Information and Computer (ICFTIC). :932–937.
Access control includes authorization by security administrators and access by users. To address the difficulty of storing log information and its susceptibility to tampering, which auditing and traceability forensics of authorization and access face in cross-domain scenarios, we propose an access control auditing and traceability forensics method based on blockchain. Its core is the Ethereum blockchain and the IPFS (InterPlanetary File System), and its main function is to store access control log information and support traceability forensics. Due to the technical characteristics of blockchain, such as openness, transparency, and collective maintenance, blockchain-based storage of log information metadata meets the requirements of distribution and trustworthiness, and the exit of any node will not affect the operation of the whole system. At the same time, by storing log information in the blockchain structure and using mapping, it is easy to locate the suspicious authorization or judgment that leads to permission leakage, so that security administrators can quickly grasp the causes of permission leakage. Using this distributed storage structure for security audit provides stronger resistance to attacks and risks.
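As a rough illustration of the storage-and-audit idea above, the sketch below keeps full log records in a content-addressed store (standing in for IPFS) and anchors only hash-chained digests in an append-only list (standing in for the Ethereum contract); names and structures are illustrative assumptions, not the authors' implementation.

```python
import hashlib
import json
import time

off_chain_store = {}   # stand-in for IPFS: content-addressed blobs
ledger = []            # stand-in for the on-chain log: hash-chained entries

def record_access_event(subject, obj, action, decision):
    event = {"subject": subject, "object": obj, "action": action,
             "decision": decision, "ts": time.time()}
    blob = json.dumps(event, sort_keys=True).encode()
    cid = hashlib.sha256(blob).hexdigest()       # content address of the full record
    off_chain_store[cid] = blob                  # "pin" the record off-chain
    prev = ledger[-1]["entry_hash"] if ledger else "0" * 64
    entry = {"cid": cid, "prev": prev}
    entry["entry_hash"] = hashlib.sha256((cid + prev).encode()).hexdigest()
    ledger.append(entry)                         # anchor only the digest on-chain
    return cid

def verify_ledger():
    """Recompute the hash chain; tampering with any entry breaks the chain."""
    prev = "0" * 64
    for e in ledger:
        expected = hashlib.sha256((e["cid"] + prev).encode()).hexdigest()
        if e["prev"] != prev or e["entry_hash"] != expected:
            return False
        prev = e["entry_hash"]
    return True

record_access_event("alice", "/records/42", "read", "permit")
print(verify_ledger())  # True unless an entry was altered
```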
Hasegawa, Taichi, Saito, Taiichi, Sasaki, Ryoichi.  2022.  Analyzing Metadata in PDF Files Published by Police Agencies in Japan. 2022 IEEE 22nd International Conference on Software Quality, Reliability, and Security Companion (QRS-C). :145–151.
In recent years, a new type of cyber attack called the targeted attack has been observed. It targets specific organizations or individuals, whereas conventional large-scale attacks do not focus on specific targets. Organizations have published many Word or PDF files on their websites. These files may provide the starting point for targeted attacks if they include hidden data unintentionally generated in the authoring process. Adhatarao and Lauradoux analyzed hidden data found in PDF files published by security agencies in many countries and showed that many PDF files potentially leak information such as author names and details of the information system and computer architecture. In this study, we analyze the hidden data of PDF files published on the websites of police agencies in Japan and compare the results with Adhatarao and Lauradoux's. We gathered 110,989 PDF files. 56% of the gathered PDF files contain personal names, organization names, usernames, or numbers that appear to be IDs within the organizations. 96% of the PDF files contain software names.
ISSN: 2693-9371
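The kind of metadata survey described in this entry can be approximated with a short script; the sketch below assumes the third-party pypdf library and only reads the document-information dictionary, which is where author, creator, and producer strings typically leak.

```python
from collections import Counter
from pathlib import Path
from pypdf import PdfReader

def collect_pdf_metadata(directory):
    field_counts, records = Counter(), []
    for path in Path(directory).rglob("*.pdf"):
        try:
            info = PdfReader(path).metadata or {}
        except Exception:
            continue  # skip malformed or encrypted files
        record = {k.lstrip("/"): str(v) for k, v in info.items() if v}
        records.append((path.name, record))
        field_counts.update(record.keys())
    return field_counts, records

counts, records = collect_pdf_metadata("downloaded_pdfs")
print(counts.most_common())   # e.g. how many files expose Author or Producer fields
```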
Waghmare, V., Gojre, K., Watpade, A..  2015.  Approach to Enhancing Concurrent and Self-Reliant Access to Cloud Database: A Review. 2015 International Conference on Computational Intelligence and Communication Networks (CICN). :777-781.

Nowadays, cloud computing is a power station for running multiple businesses, and it is accumulating more and more users every day. Database-as-a-service is a service model provided by cloud computing to store, manage, and process data on a cloud platform. Database-as-a-service has key characteristics such as availability, scalability, and elasticity. A customer does not have to worry about database installation and management; instead, the cloud database service provider takes responsibility for installing and maintaining the database. The real problem occurs when it comes to storing confidential or private information in the cloud database: we cannot rely on the cloud database vendor, because a curious vendor may capture and leak the secret information. For that purpose, Protected Database-as-a-service is a novel solution that provides provable and practical privacy in the face of a compromised cloud database service provider. Protected Database-as-a-service defines various encryption schemes for choosing the encryption algorithm and encryption key used to encrypt and decrypt data. It also provides a master key to users, so that the metadata storage table can be decrypted only by using the user's master key. As a result, a cloud service vendor never gets access to decrypted data, and even if all servers are compromised, the cloud service vendor will not be able to decrypt the data. The proposed Protected Database-as-a-service system allows multiple geographically distributed clients to execute concurrent and independent operations on encrypted data while preserving data confidentiality and consistency at the cloud level, and it eliminates any intermediate server between the client and the cloud database.
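A minimal sketch of the master-key idea, assuming the third-party cryptography package: the metadata table describing encrypted columns is itself encrypted under a key that never leaves the client, so the provider only ever stores ciphertext.

```python
import json
from cryptography.fernet import Fernet

master_key = Fernet.generate_key()          # stays on the client, never uploaded
client_cipher = Fernet(master_key)

# Illustrative metadata table mapping encrypted columns to scheme/key information.
metadata_table = {
    "patients.name": {"scheme": "deterministic", "data_key": "k1"},
    "patients.ssn":  {"scheme": "randomized",    "data_key": "k2"},
}

# What the cloud stores: only the ciphertext of the metadata table.
stored_blob = client_cipher.encrypt(json.dumps(metadata_table).encode())

# Only a client holding the master key can recover the table.
recovered = json.loads(client_cipher.decrypt(stored_blob))
assert recovered == metadata_table
```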

Suwansrikham, P., She, K..  2018.  Asymmetric Secure Storage Scheme for Big Data on Multiple Cloud Providers. 2018 IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). :121-125.

Recently, cloud computing has emerged as a technology alongside big data, and the two go together. Due to the enormous size of big data, it is impossible to store it in local storage; even if we wanted to store it locally, we would have to spend a great deal of money to build a big data center. One way to save money is to store big data in a cloud storage service, which provides users with space and security to store files. However, relying on a single cloud storage provider may cause trouble for the customer: the CSP may stop its service at any time, so it is too risky for a data owner to host a file with only a single CSP. Also, the CSP is a third party that the user has to trust without verification. After deploying a file to the CSP, the user does not know who accesses it. Even though the CSP provides security mechanisms to prevent outsider attacks, how can the user ensure that there is no insider attack to steal or corrupt the file? This research proposes a way to minimize that risk and to ensure data privacy as well as access control. The big data file is split into chunks and distributed to multiple cloud storage providers; even if there is an insider attack, the attacker obtains only part of the file and cannot reconstruct the whole file. After splitting the file, metadata is generated. The metadata keeps chunk information, including chunk locations, access paths, and the data owner's username and password for connecting to each CSP. An asymmetric security concept is applied in this research: the metadata is encrypted and transferred to the user who requests access to the file. File access, monitoring, and metadata transfer are functions of dew computing, an intermediate server between the users and the cloud service.
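The splitting-and-dispersal step can be sketched briefly; the provider names and the use of Fernet for metadata encryption below are assumptions for illustration, not the paper's exact scheme.

```python
import json
from cryptography.fernet import Fernet

PROVIDERS = ["csp-a", "csp-b", "csp-c"]     # illustrative provider names

def split_and_describe(data: bytes, chunk_size: int):
    chunks, metadata = [], []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        provider = PROVIDERS[(i // chunk_size) % len(PROVIDERS)]
        chunks.append((provider, chunk))                  # upload step omitted
        metadata.append({"index": i // chunk_size,
                         "provider": provider,
                         "length": len(chunk)})
    return chunks, metadata

chunks, metadata = split_and_describe(b"big data payload" * 1000, chunk_size=4096)

# The dew-computing layer would encrypt the metadata before transferring it.
key = Fernet.generate_key()
encrypted_metadata = Fernet(key).encrypt(json.dumps(metadata).encode())
```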

Sommers, Joel, Durairajan, Ramakrishnan, Barford, Paul.  2017.  Automatic Metadata Generation for Active Measurement. Proceedings of the 2017 Internet Measurement Conference. :261–267.

Empirical research in the Internet is fraught with challenges. Among these is the possibility that local environmental conditions (e.g., CPU load or network load) introduce unexpected bias or artifacts in measurements that lead to erroneous conclusions. In this paper, we describe a framework for local environment monitoring that is designed to be used during Internet measurement experiments. The goals of our work are to provide a critical, expanded perspective on measurement results and to improve the opportunity for reproducibility of results. We instantiate our framework in a tool we call SoMeta, which monitors the local environment during active probe-based measurement experiments. We evaluate the runtime costs of SoMeta and conduct a series of experiments in which we intentionally perturb different aspects of the local environment during active probe-based measurements. Our experiments show how simple local monitoring can readily expose conditions that bias active probe-based measurement results. We conclude with a discussion of how our framework can be expanded to provide metadata for a broad range of Internet measurement experiments.
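In the spirit of SoMeta, a measurement can be wrapped with environment snapshots; the sketch below assumes the third-party psutil package and records far less context than the real tool.

```python
import json
import time
import psutil

def environment_snapshot():
    net = psutil.net_io_counters()
    return {"ts": time.time(),
            "cpu_percent": psutil.cpu_percent(interval=0.1),
            "mem_percent": psutil.virtual_memory().percent,
            "bytes_sent": net.bytes_sent,
            "bytes_recv": net.bytes_recv}

def run_with_metadata(measurement_fn):
    before = environment_snapshot()
    result = measurement_fn()
    after = environment_snapshot()
    return {"result": result, "env_before": before, "env_after": after}

# A stand-in probe; a real experiment would run ping/traceroute-style measurements.
record = run_with_metadata(lambda: time.sleep(1) or "probe results here")
print(json.dumps(record, indent=2))
```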

Luo, S., Wang, Y., Huang, W., Yu, H..  2016.  Backup and Disaster Recovery System for HDFS. 2016 International Conference on Information Science and Security (ICISS). :1–4.

HDFS has been widely used for storing data at massive scale, which is vulnerable to site disaster. File system backup is an important strategy for data retention. In this paper, we present an efficient, easy-to-use backup and disaster recovery system for HDFS. The system includes a client based on HDFS with the additional feature of remote backup, and a remote server with an HDFS cluster to keep the backup data. It supports full backup and regular incremental backup to the server with very low cost and high throughput. In our experiments, the average speed of backup and recovery is up to 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.
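The incremental-backup idea can be illustrated without HDFS at all; the sketch below copies only files modified since the last checkpoint and is not the authors' implementation.

```python
import shutil
import time
from pathlib import Path

def incremental_backup(source_dir, backup_dir, last_backup_ts):
    source, backup = Path(source_dir), Path(backup_dir)
    copied = []
    for path in source.rglob("*"):
        if path.is_file() and path.stat().st_mtime > last_backup_ts:
            dest = backup / path.relative_to(source)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, dest)       # copy2 preserves timestamps
            copied.append(str(path))
    return copied, time.time()             # new checkpoint for the next incremental run

copied, checkpoint = incremental_backup("/data/hdfs-export", "/mnt/backup", 0.0)
```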

Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
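A small example of the metadata-based inference discussed above, assuming the Pillow library: EXIF timestamps and camera models are read from a directory of images and used to order them into a cheap candidate provenance chain.

```python
from pathlib import Path
from PIL import Image
from PIL.ExifTags import TAGS

def exif_record(path):
    exif = Image.open(path).getexif()
    named = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
    return {"file": path.name,
            "timestamp": named.get("DateTime"),
            "camera": named.get("Model")}

records = [exif_record(p) for p in Path("image_set").glob("*.jpg")]

# Images with timestamps are ordered oldest-first as a cheap provenance hypothesis.
chain = sorted((r for r in records if r["timestamp"]), key=lambda r: r["timestamp"])
for earlier, later in zip(chain, chain[1:]):
    print(f'{earlier["file"]} -> {later["file"]}')
```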
Chen, W., Liang, X., Li, J., Qin, H., Mu, Y., Wang, J..  2018.  Blockchain Based Provenance Sharing of Scientific Workflows. 2018 IEEE International Conference on Big Data (Big Data). :3814–3820.
In a research community, the sharing of scientific workflow provenance can enhance distributed research cooperation, verification of experiment reproducibility, and the repetition of experiments. Considering that scientists in such a community are often loosely related and geographically distributed, traditional centralized provenance sharing architectures have shown their disadvantages in poor trustworthiness, reliability, and efficiency. Additionally, they make it difficult to protect the rights and interests of data providers. All of this has largely hindered the willingness of distributed scientists to share their workflow provenance. Considering the big advantages of blockchain in decentralization, trustworthiness, and high reliability, an approach to sharing scientific workflow provenance based on blockchain in a research community is proposed. To make the approach more practical, provenance is handled on-chain and original data is delivered off-chain. A block structure supporting efficient provenance storage and retrieval is designed, and an algorithm for scientists to search workflow segments from provenance as well as an algorithm for experiment backtracking are provided to enhance experiment result sharing and to save computing resources and time by avoiding repeated experiments as far as possible. Analyses show that the approach is efficient and effective.
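The on-chain/off-chain split can be sketched with plain hashing; the block and record structures below are illustrative stand-ins, not the paper's block design.

```python
import hashlib
import json

off_chain_data = {}     # stand-in for the off-chain data store
provenance_chain = []   # stand-in for the blockchain of provenance records

def record_step(workflow_id, step_name, inputs, output_bytes):
    digest = hashlib.sha256(output_bytes).hexdigest()
    off_chain_data[digest] = output_bytes           # original data stays off-chain
    prev_hash = provenance_chain[-1]["block_hash"] if provenance_chain else "0" * 64
    record = {"workflow": workflow_id, "step": step_name,
              "inputs": inputs, "output_hash": digest, "prev": prev_hash}
    record["block_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    provenance_chain.append(record)

def find_segments(step_name):
    """Retrieve provenance records for a given step, e.g. to avoid re-running it."""
    return [r for r in provenance_chain if r["step"] == step_name]

record_step("wf-1", "align_sequences", ["reads.fastq"], b"aligned output")
print(find_segments("align_sequences"))
```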
Singi, Kapil, Kaulgud, Vikrant, Bose, R.P. Jagadeesh Chandra, Podder, Sanjay.  2019.  CAG: Compliance Adherence and Governance in Software Delivery Using Blockchain. 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). :32–39.

The software development life cycle (SDLC) starts with business and functional specifications signed with a client. In addition, the specifications capture policy / procedure / contractual / regulatory / legislative / standards compliance with respect to the given client's industry. The SDLC must adhere to service level agreements (SLAs) while remaining compliant in its development activities, processes, tools, frameworks, and reuse of open-source software components. In today's world, global software development happens across geographically distributed (autonomous) teams consuming extraordinary amounts of open-source components drawn from a variety of disparate sources. Although this helps organizations deal with technical and economic challenges, it also increases unintended risks; for example, use of non-compliantly licensed software might lead to copyright issues and litigation, and use of a library with vulnerabilities poses security risks. Mitigating such risks and taking remedial measures is a challenge due to the lack of visibility and transparency of activities across these distributed teams, as they mostly operate in silos. We believe a unified model that non-invasively monitors and analyzes the activities of distributed teams will go a long way toward building software that adheres to various compliance requirements. In this paper, we propose CAG, a decentralized Compliance Adherence and Governance framework using blockchain technologies. Our framework (i) enables the capture of required data points based on compliance specifications, (ii) analyzes events for non-conformant behavior through smart contracts, (iii) provides real-time alerts, and (iv) records and maintains an immutable audit trail of various activities.
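A toy version of the compliance-checking step (which the paper performs in smart contracts) might look like the following; rule names and event fields are invented for illustration.

```python
APPROVED_LICENSES = {"Apache-2.0", "MIT", "BSD-3-Clause"}
audit_trail = []     # stand-in for the immutable on-chain audit trail

def check_event(event):
    violations = []
    if event["type"] == "dependency_added" and \
       event.get("license") not in APPROVED_LICENSES:
        violations.append(f'non-compliant license: {event.get("license")}')
    if event["type"] == "dependency_added" and event.get("known_cves"):
        violations.append(f'vulnerable component: {event["known_cves"]}')
    audit_trail.append({"event": event, "violations": violations})
    if violations:
        print("ALERT:", event["component"], violations)   # real-time alert hook

check_event({"type": "dependency_added", "component": "libfoo-1.2",
             "license": "GPL-3.0-only", "known_cves": ["CVE-2019-0001"]})
```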

Tabiban, Azadeh, Jarraya, Yosr, Zhang, Mengyuan, Pourzandi, Makan, Wang, Lingyu, Debbabi, Mourad.  2020.  Catching Falling Dominoes: Cloud Management-Level Provenance Analysis with Application to OpenStack. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

The dynamicity and complexity of clouds highlight the importance of automated root cause analysis solutions for explaining what might have caused a security incident. Most existing works focus on either locating malfunctioning cloud components, e.g., switches, or tracing changes at lower abstraction levels, e.g., system calls. On the other hand, a management-level solution can provide a big picture of the root cause in a more scalable manner. In this paper, we propose DOMINOCATCHER, a novel provenance-based solution for explaining the root cause of security incidents in terms of management operations in clouds. Specifically, we first define our provenance model to capture the interdependencies between cloud management operations, virtual resources, and inputs. Based on this model, we design a framework to intercept cloud management operations and to extract and prune provenance metadata. We implement DOMINOCATCHER on the OpenStack platform as an attached middleware and validate its effectiveness using security incidents based on real-world attacks. We also evaluate its performance through experiments on our testbed, and the results demonstrate that DOMINOCATCHER incurs insignificant overhead and is scalable for clouds.
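The provenance model can be pictured as a small directed graph; the sketch below assumes the networkx library and treats ancestors of an affected resource as root-cause candidates, a simplification of what DOMINOCATCHER does.

```python
import networkx as nx

prov = nx.DiGraph()

def record_operation(operation, inputs, resource):
    prov.add_node(operation, kind="operation")
    prov.add_node(resource, kind="resource")
    prov.add_edge(operation, resource)            # the operation changed this resource
    for item in inputs:
        if item not in prov:
            prov.add_node(item, kind="input")
        prov.add_edge(item, operation)            # the input fed the operation

record_operation("create_port(p1)", ["tenant=alice", "net=n1"], "port:p1")
record_operation("update_port(p1)", ["port:p1", "sg=allow-all"], "port:p1:config")

# Given an incident observed on a resource, its ancestors are root-cause candidates.
print(nx.ancestors(prov, "port:p1:config"))
```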

Liang, Xiubo, Guo, Ningxiang, Hong, Chaoqun.  2022.  A Certificate Authority Scheme Based on Trust Ring for Consortium Nodes. 2022 International Conference on High Performance Big Data and Intelligent Systems (HDIS). :90–94.
The access control mechanism of most consortium blockchains is currently implemented through a traditional Certificate Authority scheme based on a trust chain and centralized key management such as PKI/CA. However, the uneven power distribution of CA nodes may cause problems such as leakage of certificate keys, illegal issuance of certificates, malicious rejection of certificate issuance, and manipulation of issuance logs and metadata, which could compromise the security and dependability of the consortium blockchain. Therefore, this paper designs and implements a Certificate Authority scheme based on a trust ring model that not only enhances the reliability of the consortium blockchain but also ensures high performance. Combined public keys, transformation matrices, and elliptic curve cryptography are applied in the scheme to generate and store keys for consortium nodes dispersedly and securely in a cluster of CA nodes, which greatly reduces the possibility of malicious behavior and key leakage. To achieve immutability of logs and metadata, the scheme also utilizes public blockchain and smart contract technology to organize the whole procedure of certificate issuance; the issuance logs and metadata for certificate validation are stored in a public blockchain. Experimental results show that the scheme surmounts the disadvantages of the traditional scheme while maintaining sufficiently good performance, including certificate issuance speed and storage efficiency.
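The combined-public-key idea can be demonstrated in a toy discrete-log group rather than the elliptic curve the paper uses; the parameters below are deliberately insecure and purely illustrative.

```python
import secrets

# Toy parameters: a Mersenne prime modulus and a small generator. Each CA node in
# the "ring" holds only a private share d_i and publishes pow(G, d_i, P); the
# combined public key corresponds to the (never-assembled) sum of the shares.
P = 2**127 - 1
G = 3

def make_share():
    d = secrets.randbelow(P - 2) + 1     # private share held by one CA node
    return d, pow(G, d, P)               # (private share, public part)

shares = [make_share() for _ in range(4)]        # four CA nodes in the ring

# Anyone can combine the public parts without seeing any private share.
combined_public = 1
for _, pub in shares:
    combined_public = (combined_public * pub) % P

# Sanity check (never done in practice, since no node should hold all shares):
combined_private = sum(d for d, _ in shares)
assert pow(G, combined_private, P) == combined_public
```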
Kimmig, A., Memory, A., Miller, R. J., Getoor, L..  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.
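A drastically simplified sketch of the selection intuition: candidate attribute mappings are scored against a small data example under a metadata (type) constraint. CMD itself solves the problem collectively with probabilistic reasoning; nothing below reflects its actual machinery.

```python
source_example = {"emp_name": ["Ada", "Grace"], "emp_salary": [100, 120]}
target_example = {"person": ["Ada", "Grace", "Alan"], "pay": [100, 120, 90]}
attribute_types = {"emp_name": str, "emp_salary": int, "person": str, "pay": int}

def score(source_attr, target_attr):
    if attribute_types[source_attr] is not attribute_types[target_attr]:
        return -1                                # metadata constraint violated
    target_values = set(target_example[target_attr])
    # how many example values the candidate mapping explains
    return sum(v in target_values for v in source_example[source_attr])

best = {s: max(target_example, key=lambda t: score(s, t)) for s in source_example}
print(best)   # {'emp_name': 'person', 'emp_salary': 'pay'}
```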

Nadgowda, S., Duri, S., Isci, C., Mann, V..  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function for ensuring that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered by querying package management tools, or by using rules that check file metadata or contents. These approaches are inadequate because not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods requiring a training phase, or require maintaining knowledge bases. Columbus uses knowledge of the software packaging practices that have evolved over time, and uses the information embedded in the file system impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system over time.
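The filesystem-impression idea can be sketched with a tiny signature table; the paths below are illustrative stand-ins for the packaging knowledge Columbus accumulates.

```python
from pathlib import Path

# Illustrative signatures: characteristic paths a package leaves behind on install.
SIGNATURES = {
    "nginx":  ["etc/nginx/nginx.conf", "usr/sbin/nginx"],
    "python": ["usr/bin/python3", "usr/lib/python3"],
    "redis":  ["usr/bin/redis-server", "etc/redis"],
}

def discover(root):
    root = Path(root)           # e.g. a mounted container image filesystem
    found = {}
    for name, paths in SIGNATURES.items():
        hits = [p for p in paths if (root / p).exists()]
        if hits:
            found[name] = hits
    return found

print(discover("/"))            # software discovered from filesystem evidence alone
```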

Ezick, James, Henretty, Tom, Baskaran, Muthu, Lethin, Richard, Feo, John, Tuan, Tai-Ching, Coley, Christopher, Leonard, Leslie, Agrawal, Rajeev, Parsons, Ben et al..  2019.  Combining Tensor Decompositions and Graph Analytics to Provide Cyber Situational Awareness at HPC Scale. 2019 IEEE High Performance Extreme Computing Conference (HPEC). :1–7.

This paper describes MADHAT (Multidimensional Anomaly Detection fusing HPC, Analytics, and Tensors), an integrated workflow that demonstrates the applicability of HPC resources to the problem of maintaining cyber situational awareness. MADHAT combines two high-performance packages: ENSIGN for large-scale sparse tensor decompositions and HAGGLE for graph analytics. Tensor decompositions isolate coherent patterns of network behavior in ways that common clustering methods based on distance metrics cannot. Parallelized graph analysis then uses directed queries on a representation that combines the elements of identified patterns with other available information (such as additional log fields, domain knowledge, network topology, whitelists and blacklists, prior feedback, and published alerts) to confirm or reject a threat hypothesis, collect context, and raise alerts. MADHAT was developed using the collaborative HPC Architecture for Cyber Situational Awareness (HACSAW) research environment and evaluated on structured network sensor logs collected from Defense Research and Engineering Network (DREN) sites using HPC resources at the U.S. Army Engineer Research and Development Center DoD Supercomputing Resource Center (ERDC DSRC). To date, MADHAT has analyzed logs with over 650 million entries.
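As a drastically reduced stand-in for the pattern-isolation step, the sketch below aggregates log entries into a (source, destination) count matrix and factors it with scikit-learn's NMF, a 2-D analogue of the sparse tensor decompositions ENSIGN performs at scale.

```python
import numpy as np
from sklearn.decomposition import NMF

logs = [("10.0.0.1", "10.0.9.9"), ("10.0.0.1", "10.0.9.9"),
        ("10.0.0.2", "10.0.9.9"), ("10.0.0.3", "10.0.7.7")]

sources = sorted({s for s, _ in logs})
dests = sorted({d for _, d in logs})
counts = np.zeros((len(sources), len(dests)))
for s, d in logs:
    counts[sources.index(s), dests.index(d)] += 1

model = NMF(n_components=2, init="random", random_state=0)
W = model.fit_transform(counts)      # how strongly each source loads on a pattern
H = model.components_                # how strongly each destination loads on it
for k in range(H.shape[0]):
    top_src = sources[int(np.argmax(W[:, k]))]
    top_dst = dests[int(np.argmax(H[k]))]
    print(f"pattern {k}: dominated by {top_src} -> {top_dst}")
```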

Koo, H., Chen, Y., Lu, L., Kemerlis, V. P., Polychronakis, M..  2018.  Compiler-Assisted Code Randomization. 2018 IEEE Symposium on Security and Privacy (SP). :461–477.

Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has remained an academic exercise mainly due to i) the lack of a transparent and streamlined deployment model that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. In this work we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.
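A toy model of transformation-assisting metadata: a "binary" is a list of basic blocks, the metadata records the applied permutation, and the same metadata reverses the randomization when code uniformity is needed. This is an illustration of the concept, not CCR's format.

```python
import random

def randomize(blocks, seed):
    order = list(range(len(blocks)))
    random.Random(seed).shuffle(order)
    randomized = [blocks[i] for i in order]
    metadata = {"original_count": len(blocks), "applied_order": order}
    return randomized, metadata

def derandomize(randomized, metadata):
    """Use the embedded metadata to recover the original block layout."""
    original = [None] * metadata["original_count"]
    for new_pos, old_pos in enumerate(metadata["applied_order"]):
        original[old_pos] = randomized[new_pos]
    return original

blocks = ["bb0: prologue", "bb1: loop body", "bb2: loop check", "bb3: epilogue"]
variant, md = randomize(blocks, seed=42)
assert derandomize(variant, md) == blocks     # metadata makes the layout reversible
```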

Perkins, J., Eikenberry, J., Coglio, A., Rinard, M..  2020.  Comprehensive Java Metadata Tracking for Attack Detection and Repair. 2020 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :39–51.

We present ClearTrack, a system that tracks metadata for each primitive value in Java programs to detect and nullify a range of vulnerabilities such as integer overflow/underflow and SQL/command injection vulnerabilities. Contributions include new techniques for eliminating false positives associated with benign integer overflows and underflows, new metadata-aware techniques for detecting and nullifying SQL/command injection attacks, and results from an independent evaluation team. These results show that 1) ClearTrack operates successfully on Java programs comprising hundreds of thousands of lines of code (including instrumented jar files and Java system libraries, the majority of the applications comprise over 3 million lines of code), 2) because of computations such as cryptography and hash table calculations, these applications perform millions of benign integer overflows and underflows, and 3) ClearTrack successfully detects and nullifies all tested integer overflow and underflow and SQL/command injection vulnerabilities in the benchmark applications.
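The per-value metadata idea can be sketched in a few lines (ClearTrack does this for Java primitives inside instrumented bytecode); the class and function names below are invented for illustration.

```python
class Tracked(str):
    """A string carrying a 'tainted' metadata bit."""
    def __new__(cls, value, tainted):
        obj = super().__new__(cls, value)
        obj.tainted = tainted
        return obj

def from_request(value):                 # anything from the network is tainted
    return Tracked(value, tainted=True)

def build_query(template, params):
    # Refuse tainted values unless the query uses placeholders (parameterization).
    for p in params:
        if isinstance(p, Tracked) and p.tainted and "?" not in template:
            raise ValueError("tainted value used without parameterization")
    return template, tuple(str(p) for p in params)

user_input = from_request("alice' OR '1'='1")
print(build_query("SELECT * FROM users WHERE name = ?", [user_input]))
```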

Paiker, N., Ding, X., Curtmola, R., Borcea, C..  2018.  Context-Aware File Discovery System for Distributed Mobile-Cloud Apps. 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). :198–203.
Recent research has proposed middleware to enable efficient distributed apps over mobile-cloud platforms. This paper presents a Context-Aware File Discovery Service (CAFDS) that allows distributed mobile-cloud applications to find and access files of interest shared by collaborating users. CAFDS enables programmers to search for files defined by context and content features, such as location, creation time, or the presence of certain object types within an image file. CAFDS provides low latency through a cloud-based metadata server, which uses a decision tree to locate the nearest files that satisfy the context and content features requested by applications. We implemented CAFDS in Android and Linux. Experimental results show that CAFDS achieves substantially lower latency than peer-to-peer solutions that cannot leverage context information.
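The metadata-server lookup can be approximated with scikit-learn's decision tree on made-up context features; the features and labels below are assumptions for illustration only.

```python
from sklearn.tree import DecisionTreeClassifier

# [location_id, creation_hour, contains_person(0/1)] -> node that stores the file
features = [[0, 9, 1], [0, 21, 0], [1, 10, 1], [1, 22, 0], [2, 11, 1]]
stored_on = ["node-a", "node-b", "node-a", "node-c", "node-a"]

tree = DecisionTreeClassifier(max_depth=3).fit(features, stored_on)

query = [[1, 9, 1]]          # "files created near location 1, morning, with people"
print(tree.predict(query))   # nearest candidate node(s) to contact first
```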
Sabek, I., Chandramouli, B., Minhas, U. F..  2019.  CRA: Enabling Data-Intensive Applications in Containerized Environments. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1762–1765.
Today, a modern data center hosts a wide variety of applications comprising batch, interactive, machine learning, and streaming applications. In this paper, we factor out the commonalities of a large majority of these applications into a generic dataflow layer called Common Runtime for Applications (CRA). In parallel, another trend, containerization technologies (e.g., Docker), has taken a serious hold on cloud-scale data centers, with direct implications for building the next generation of data center applications. Container orchestrators (e.g., Kubernetes) have made deployment a lot easier, and they solve many infrastructure-level problems, e.g., service discovery, auto-restart, and replication. For best-in-class performance, there is a need to marry the next-generation applications with containerization technologies. To that end, CRA leverages and builds upon the containerization and resource orchestration capabilities of Kubernetes/Docker, and makes it easy to build a wide range of cloud-edge applications on top. To the best of our knowledge, we are the first to present a cloud-native runtime for building data center applications. We show the efficiency of CRA through various micro-benchmarking experiments.
Patil, A., Jha, A., Mulla, M. M., Narayan, D. G., Kengond, S..  2020.  Data Provenance Assurance for Cloud Storage Using Blockchain. 2020 International Conference on Advances in Computing, Communication Materials (ICACCM). :443–448.

Cloud forensics investigates crimes committed over cloud infrastructures, such as SLA violations and storage privacy breaches. Cloud storage forensics is the process of recording the history of the creation of, and operations performed on, a cloud data object and investigating it. Secure data provenance in the cloud is crucial for data accountability, forensics, and privacy. Towards this, we present a cloud-based data provenance framework using blockchain, which traces data record operations and generates provenance data. Initially, we design a Dropbox-like application using AWS S3 storage. The application creates a cloud storage service for the students and faculty of the university, thereby making the storage and sharing of work and resources efficient. Later, we design a data provenance mechanism for users' confidential files using the Ethereum blockchain. We also evaluate the proposed system using performance parameters like query and transaction latency by varying the load and the number of nodes of the blockchain network.
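The operation-tracing idea can be sketched with a per-object, hash-linked history standing in for the Ethereum contract; field names are assumptions.

```python
import hashlib
import json
import time
from collections import defaultdict

histories = defaultdict(list)     # object key -> list of provenance records

def record_operation(object_key, user, operation):
    prev = histories[object_key][-1]["record_hash"] if histories[object_key] else ""
    record = {"object": object_key, "user": user, "operation": operation,
              "ts": time.time(), "prev": prev}
    record["record_hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()).hexdigest()
    histories[object_key].append(record)

record_operation("reports/q1.pdf", "alice", "upload")
record_operation("reports/q1.pdf", "bob", "download")
record_operation("reports/q1.pdf", "alice", "share:carol")

for r in histories["reports/q1.pdf"]:
    print(r["ts"], r["user"], r["operation"])
```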

Yazici, I. M., Karabulut, E., Aktas, M. S..  2018.  A Data Provenance Visualization Approach. 2018 14th International Conference on Semantics, Knowledge and Grids (SKG). :84–91.
In recent years, data provenance has created an emerging requirement for technologies that enable end users to access, evaluate, and act on the provenance of data. In the era of Big Data, the amount of data created by corporations around the world has grown each year; in both the social media and e-science domains, for example, data is growing at an unprecedented rate. As the data has grown rapidly, information on the origin and lifecycle of the data has also grown. In turn, this requires technologies that enable the clarification and interpretation of data through the use of data provenance. This study proposes methodologies for the visualization of provenance data compatible with the W3C PROV-O specification. The visualizations are done by summarization and comparison of the data provenance. We facilitated the testing of these methodologies by providing a prototype extending an existing open-source visualization tool. We discuss the usability of the proposed methodologies with an experimental study; our initial results show that the proposed approach is usable and its processing overhead is negligible.
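One simple form of the summarization described above, assuming the networkx library: a PROV-style graph is rolled up into counts of node types and relation kinds.

```python
from collections import Counter
import networkx as nx

prov = nx.MultiDiGraph()
prov.add_node("dataset_v1", prov_type="entity")
prov.add_node("dataset_v2", prov_type="entity")
prov.add_node("clean_step", prov_type="activity")
prov.add_node("alice", prov_type="agent")
prov.add_edge("clean_step", "dataset_v1", relation="used")
prov.add_edge("dataset_v2", "clean_step", relation="wasGeneratedBy")
prov.add_edge("clean_step", "alice", relation="wasAssociatedWith")

def summarize(graph):
    node_summary = Counter(data["prov_type"] for _, data in graph.nodes(data=True))
    edge_summary = Counter(data["relation"] for _, _, data in graph.edges(data=True))
    return node_summary, edge_summary

print(summarize(prov))   # e.g. ({'entity': 2, 'activity': 1, 'agent': 1}, ...)
```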
Bogatu, Alex, Fernandes, Alvaro A. A., Paton, Norman W., Konstantinou, Nikolaos.  2020.  Dataset Discovery in Data Lakes. 2020 IEEE 36th International Conference on Data Engineering (ICDE). :709–720.
Data analytics stands to benefit from the increasing availability of datasets that are held without their conceptual relationships being explicitly known. When collected, these datasets form a data lake from which, by processes like data wrangling, specific target datasets can be constructed that enable value-adding analytics. Given the potential vastness of such data lakes, the issue arises of how to pull out of the lake those datasets that might contribute to wrangling out a given target. We refer to this as the problem of dataset discovery in data lakes and this paper contributes an effective and efficient solution to it. Our approach uses features of the values in a dataset to construct hash-based indexes that map those features into a uniform distance space. This makes it possible to define similarity distances between features and to take those distances as measurements of relatedness w.r.t. a target table. Given the latter (and exemplar tuples), our approach returns the most related tables in the lake. We provide a detailed description of the approach and report on empirical results for two forms of relatedness (unionability and joinability) comparing them with prior work, where pertinent, and showing significant improvements in all of precision, recall, target coverage, indexing and discovery times.
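The hash-based relatedness signal can be illustrated with the standard MinHash construction; this is not the authors' exact index layout, only the underlying idea of mapping value sets into a space where overlap is cheap to estimate.

```python
import hashlib

def minhash_signature(values, num_hashes=64):
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int(hashlib.sha1(f"{seed}:{v}".encode()).hexdigest(), 16)
            for v in values))
    return signature

def estimated_jaccard(sig_a, sig_b):
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

target_column = ["red", "green", "blue", "cyan"]
lake_column = ["red", "green", "blue", "magenta"]
sig_t, sig_l = minhash_signature(target_column), minhash_signature(lake_column)
print(estimated_jaccard(sig_t, sig_l))   # high score -> candidate for unionability
```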
Castellano, Giovanna, Vessio, Gennaro.  2021.  Deep Convolutional Embedding for Digitized Painting Clustering. 2020 25th International Conference on Pattern Recognition (ICPR). :2708–2715.
Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns in accordance with domain knowledge and visual perception is extremely difficult. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, we propose to use a deep convolutional embedding model for digitized painting clustering, in which the task of mapping the raw input data to an abstract, latent space is jointly optimized with the task of finding a set of cluster centroids in this latent feature space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. The model is also capable of outperforming other state-of-the-art deep clustering approaches to the same problem. The proposed method can be useful for several art-related tasks, in particular visual link retrieval and historical knowledge discovery in painting datasets.
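A numpy-only sketch of one common way such joint clustering objectives are written (a DEC-style soft assignment with a Student's t kernel and a sharpened target distribution); the paper's exact loss may differ, and the embeddings below are random placeholders standing in for the convolutional encoder's output.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 10))                   # latent embeddings of 100 paintings
mu = z[rng.choice(100, size=5, replace=False)]   # 5 initial cluster centroids

def soft_assignments(z, mu):
    dist_sq = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    q = 1.0 / (1.0 + dist_sq)                    # Student's t kernel (1 dof)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    weight = q ** 2 / q.sum(axis=0)              # sharpen confident assignments
    return weight / weight.sum(axis=1, keepdims=True)

q = soft_assignments(z, mu)
p = target_distribution(q)
kl = np.sum(p * np.log(p / q))                   # clustering loss to be minimized
print(q.argmax(axis=1)[:10], kl)
```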
Alsadhan, A. F., Alhussein, M. A..  2018.  Deleted Data Attribution in Cloud Computing Platforms. 2018 1st International Conference on Computer Applications Information Security (ICCAIS). :1–6.
The introduction of cloud-based storage represents one of the most discussed challenges among digital forensic professionals. In a 2014 report, the National Institute of Standards and Technology (NIST) highlighted the various forensic challenges created as a consequence of sharing storage areas among cloud users. One critical issue discussed in the report is how to recognize a file's owner after the file has been deleted. When a file is deleted, the cloud system also deletes the file metadata; after the metadata has been deleted, no one can know who owned the file. This critical issue has introduced difficulties into the deleted data acquisition process. For example, if a cloud user accidentally deletes a file, it is difficult to recover the file, and it is even more difficult to identify the actual cloud user that owned it. In addition, forensic investigators encounter numerous obstacles if a deleted file is to be used as evidence against a crime suspect. Unfortunately, few studies have been conducted to solve this matter. As a result, this work presents our proposed solution to the challenge of attributing deleted files to their specific users, which we call the "user signature" approach. This approach aims to enhance the deleted data acquisition process in cloud computing environments by specifically attributing files to the corresponding user.
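One hedged way to realize a "user signature" (the paper's exact construction is not reproduced here) is to keep, outside the file system, an HMAC of each stored file's content under a per-user key, so a recovered file with no metadata can still be matched to the user whose key verifies it.

```python
import hashlib
import hmac

user_keys = {"alice": b"alice-secret-key", "bob": b"bob-secret-key"}
attribution_store = set()    # (user, signature) pairs kept outside the file system

def on_store(user, content):
    tag = hmac.new(user_keys[user], content, hashlib.sha256).hexdigest()
    attribution_store.add((user, tag))

def attribute_recovered_file(content):
    """Try each user's key; the one whose HMAC matches is the likely owner."""
    for user, key in user_keys.items():
        tag = hmac.new(key, content, hashlib.sha256).hexdigest()
        if (user, tag) in attribution_store:
            return user
    return None

on_store("alice", b"quarterly report draft")
print(attribute_recovered_file(b"quarterly report draft"))   # 'alice'
```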
Jaafar, Fehmi, Avellaneda, Florent, Alikacem, El-Hackemi.  2020.  Demystifying the Cyber Attribution: An Exploratory Study. 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech). :35–40.
Current cyber attribution approaches propose to use a variety of datasets and analytical techniques to distill information that will be useful for identifying cyber attackers. In practice, practitioners and researchers in cyber attribution face several technical and regulatory challenges. In this paper, we describe the main challenges of cyber attribution and present the state of the art of the approaches used to face these challenges. We then present an exploratory study on performing cyber attack attribution based on pattern recognition from real data. In our study, we use attack pattern discovery and identification based on real data collection and analysis.
Pan, J., Mao, X..  2017.  Detecting DOM-Sourced Cross-Site Scripting in Browser Extensions. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). :24–34.

In recent years, with the advances in JavaScript engines and the adoption of HTML5 APIs, web applications have begun to shift their functionality from the server side towards the client side, resulting in dense and complex interactions with HTML documents through the Document Object Model (DOM). As a consequence, client-side vulnerabilities have become more and more prevalent. In this paper, we focus on DOM-sourced cross-site scripting (XSS), a severe but not well-studied vulnerability appearing in browser extensions. Compared with conventional DOM-based XSS, DOM-sourced XSS introduces a new attack surface in which the DOM itself can become a vulnerable source, in addition to common sources such as URLs and form inputs. To discover such vulnerabilities, we propose a detection framework employing hybrid analysis with two phases. The first phase is a lightweight static analysis consisting of a text filter and an abstract syntax tree parser, which produces potentially vulnerable candidates. The second phase is dynamic symbolic execution with an additional component named shadow DOM, generating a document as a proof-of-concept exploit. In our large-scale real-world experiment, 58 previously unknown DOM-sourced XSS vulnerabilities were discovered in user scripts of the popular browser extension Greasemonkey.
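The first, lightweight static phase can be caricatured with a source/sink text filter; real candidate generation parses the AST, so the regexes below only illustrate the filtering idea.

```python
import re

DOM_SOURCES = [r"\.innerHTML\b", r"\.textContent\b", r"document\.title",
               r"getAttribute\s*\("]
SINKS = [r"\.innerHTML\s*=", r"\beval\s*\(", r"document\.write\s*\(",
         r"setTimeout\s*\(\s*['\"]"]

def is_candidate(script_source):
    """Flag scripts where a DOM read and a dangerous sink co-occur."""
    has_source = any(re.search(p, script_source) for p in DOM_SOURCES)
    has_sink = any(re.search(p, script_source) for p in SINKS)
    return has_source and has_sink

userscript = """
var note = document.querySelector('#note').getAttribute('data-text');
document.write('<div>' + note + '</div>');
"""
print(is_candidate(userscript))   # True -> pass to the dynamic analysis phase
```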