Biblio

Filters: Keyword is metadata
2014
Hauger, W.K., Olivier, M.S..  2014.  The role of triggers in database forensics. Information Security for South Africa (ISSA), 2014. :1-7.

An aspect of database forensics that has not received much attention in the academic research community yet is the presence of database triggers. Database triggers and their implementations have not yet been thoroughly analysed to establish what possible impact they could have on digital forensic analysis methods and processes. Conventional database triggers are defined to perform automatic actions based on changes in the database. These changes can be on the data level or the data definition level. Digital forensic investigators might thus feel that database triggers do not have an impact on their work. They are simply interrogating the data and metadata without making any changes. This paper attempts to establish if the presence of triggers in a database could potentially disrupt, manipulate or even thwart forensic investigations. The database triggers as defined in the SQL standard were studied together with a number of database trigger implementations. This was done in order to establish what aspects might have an impact on digital forensic analysis. It is demonstrated in this paper that some of the current database forensic analysis methods are impacted by the possible presence of certain types of triggers in a database. Furthermore, it finds that the forensic interpretation and attribution processes should be extended to include the handling and analysis of database triggers if they are present in a database.
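
To make the notion of an automatic, data-level trigger concrete, here is a minimal, generic illustration in Python using SQLite; it is not taken from the paper and the table names are invented. It shows the kind of side effect a trigger can produce without any explicit action by the person issuing queries.

```python
# Generic illustration (not from the paper): a data-level trigger fires
# automatically on a change and writes to another table as a side effect.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE shadow_log (id INTEGER, old_balance REAL, new_balance REAL);
CREATE TRIGGER log_balance AFTER UPDATE ON accounts
BEGIN
  INSERT INTO shadow_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")
conn.execute("INSERT INTO accounts VALUES (1, 100.0)")
conn.execute("UPDATE accounts SET balance = 50.0 WHERE id = 1")
# The shadow_log row exists even though nobody inserted it directly.
print(conn.execute("SELECT * FROM shadow_log").fetchall())
```

A read-only investigation would not fire this particular trigger, but the example shows why an analyst must account for trigger-generated artifacts when interpreting and attributing database content.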
 

Khanuja, H., Suratkar, S.S..  2014.  "Role of metadata in forensic analysis of database attacks". Advance Computing Conference (IACC), 2014 IEEE International. :457-462.

With the spectacular increase in online activities like e-transactions, security and privacy issues are of peak significance. Large numbers of database security breaches occur at a very high rate on a daily basis. There is therefore a crucial need in the field of database forensics to make several redundant copies of sensitive data found in database server artifacts, audit logs, cache, table storage, etc. for analysis purposes. A large volume of metadata is available in the database infrastructure for investigation purposes, but most of the effort lies in retrieving and analysing that information from computing systems. Thus, in this paper we focus on the significance of metadata in database forensics. We propose a system to perform forensic analysis of a database by generating its metadata file independently of the DBMS used. We also aim to generate digital evidence against criminals, suitable for presentation in a court of law, in the form of who, when, why, what, how and where the fraudulent transaction occurred. We thus present a system that detects major database attacks as well as anti-forensic attacks by developing an open source database forensics tool. Finally, we point out the challenges in the field of forensics and how these challenges can be used as opportunities to stimulate the area of database forensics.
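
As a purely illustrative sketch of the who/when/what/where idea, the following builds a DBMS-independent metadata file from audit-log rows; the log format, field names and file paths are invented for the example, and this is not the authors' tool.

```python
# Hypothetical example: assemble forensic metadata (who/when/what/where)
# from a generic audit-log CSV, independent of any particular DBMS.
import csv, json

def build_metadata_file(audit_log_csv, out_path="db_forensic_metadata.json"):
    """Assumes audit-log rows with columns: user, timestamp, operation, table, client_host."""
    evidence = []
    with open(audit_log_csv, newline="") as f:
        for row in csv.DictReader(f):
            evidence.append({
                "who":   row["user"],
                "when":  row["timestamp"],
                "what":  f'{row["operation"]} on {row["table"]}',
                "where": row["client_host"],
            })
    with open(out_path, "w") as f:
        json.dump(evidence, f, indent=2)
    return evidence
```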

2015
Waghmare, V., Gojre, K., Watpade, A..  2015.  "Approach to Enhancing Concurrent and Self-Reliant Access to Cloud Database: A Review". 2015 International Conference on Computational Intelligence and Communication Networks (CICN). :777-781.

Nowadays, cloud computing is the power station that runs multiple businesses, and it is accumulating more and more users every day. Database-as-a-service is a service model provided by cloud computing to store, manage and process data on a cloud platform, with key characteristics such as availability, scalability and elasticity. A customer does not have to worry about database installation and management; instead, the cloud database service provider takes responsibility for installing and maintaining the database. The real problem occurs when confidential or private information is stored in the cloud database: we cannot rely on the cloud database vendor, since a curious vendor may capture and leak the secret information. Protected Database-as-a-service is a novel solution to this problem that provides provable and pragmatic privacy in the face of a compromised cloud database service provider. Protected Database-as-a-service defines various encryption schemes from which the encryption algorithm and encryption key used to encrypt and decrypt data are chosen. It also provides a "master key" to users, so that the metadata storage table can be decrypted only with the user's master key. As a result, the cloud service vendor never gets access to decrypted data, and even if all servers are compromised, the vendor is still unable to decrypt the data. The proposed Protected Database-as-a-service system allows multiple geographically distributed clients to execute concurrent and independent operations on encrypted data while preserving data confidentiality and consistency at the cloud level, and it eliminates any intermediate server between the client and the cloud database.
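
The master-key idea can be illustrated with a short, hedged sketch: per-column data keys live in a metadata table that is itself encrypted under a key only the user holds, so the provider stores ciphertext it cannot open. This is not the authors' system; it assumes the Python 'cryptography' package, and the column names are invented.

```python
# Illustrative sketch: client-side encryption with a user-held master key
# protecting a metadata table of per-column data keys.
import json
from cryptography.fernet import Fernet

master_key = Fernet.generate_key()      # held only by the user, never by the cloud
master = Fernet(master_key)

# Per-column data keys, wrapped (encrypted) under the master key; the wrapped
# blob is what gets stored in the cloud metadata table.
column_keys = {"ssn": Fernet.generate_key().decode(),
               "salary": Fernet.generate_key().decode()}
encrypted_metadata = master.encrypt(json.dumps(column_keys).encode())

def encrypt_value(column, plaintext):
    keys = json.loads(master.decrypt(encrypted_metadata))
    return Fernet(keys[column].encode()).encrypt(plaintext.encode())

def decrypt_value(column, ciphertext):
    keys = json.loads(master.decrypt(encrypted_metadata))
    return Fernet(keys[column].encode()).decrypt(ciphertext).decode()

token = encrypt_value("ssn", "123-45-6789")   # only this ciphertext reaches the cloud
print(decrypt_value("ssn", token))
```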

2016
Luo, S., Wang, Y., Huang, W., Yu, H..  2016.  Backup and Disaster Recovery System for HDFS. 2016 International Conference on Information Science and Security (ICISS). :1–4.

HDFS has been widely used for storing massive-scale data, which is vulnerable to site disaster. File system backup is an important strategy for data retention. In this paper, we present an efficient, easy-to-use backup and disaster recovery system for HDFS. The system includes a client based on HDFS with an additional remote-backup feature, and a remote server with an HDFS cluster to keep the backup data. It supports full backups and regular incremental backups to the server with very low cost and high throughput. In our experiment, the average speed of backup and recovery is up to 95 MB/s, approaching the theoretical maximum speed of gigabit Ethernet.
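
The incremental-backup idea can be sketched roughly as follows: only files modified since the last recorded backup are selected. This is a hedged illustration, not the paper's system; the state-file layout and paths are assumptions.

```python
# Illustrative sketch: pick files for an incremental backup by comparing
# modification times against the timestamp of the last backup run.
import os, json, time

STATE_FILE = "last_backup.json"

def files_to_back_up(root):
    try:
        last_run = json.load(open(STATE_FILE))["timestamp"]
    except (FileNotFoundError, KeyError):
        last_run = 0.0                      # no previous backup -> full backup
    changed = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_run:
                changed.append(path)        # new or modified since the last backup
    return changed

def record_backup_completed():
    with open(STATE_FILE, "w") as f:
        json.dump({"timestamp": time.time()}, f)
```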

Navuluri, Karthik, Mukkamala, Ravi, Ahmad, Aftab.  2016.  Privacy-Aware Big Data Warehouse Architecture. 2016 IEEE International Congress on Big Data (BigData Congress). :341–344.
Along with the ever-increasing growth in data collection and its mining, there is an increasing fear of compromising individual and population privacy. Several techniques have been proposed in the literature to preserve the privacy of collected data during storage and processing. In this paper, we propose a privacy-aware architecture for storing and processing data in a Big Data warehouse. In particular, we propose a flexible, extendable, and adaptable architecture that enforces user-specified privacy requirements in the form of Embedded Privacy Agreements. The paper discusses the architecture along with some implementation details.
2017
Ukwandu, E., Buchanan, W. J., Russell, G..  2017.  Performance Evaluation of a Fragmented Secret Share System. 2017 International Conference On Cyber Situational Awareness, Data Analytics And Assessment (Cyber SA). :1–6.
There are many risks in moving data into public storage environments, along with an increasing threat of large-scale data leakage. Secret sharing has been proposed as a keyless and resilient mechanism to mitigate this, but scaling to large data infrastructures has remained the bane of using secret sharing schemes in big data storage and retrieval. This work applies secret sharing methods, as used in cryptography, in conjunction with data fragmentation to create robust and secure data storage and retrieval. It outlines two different methods of distributing data equally to storage locations, as well as recovering them in a manner that ensures consistent data availability irrespective of file size and type. Our experiments cover two different methods: data shares and key shares. Using our experimental results, we were able to validate previous work on the effect of the threshold on file recovery. The results also revealed the varying effects of writing shares to, and retrieving them from, storage locations other than computer memory. The implication is that increasing the fragment size at varying file and threshold sizes adds overhead to share creation rather than to file recovery, underscoring the importance of varying the fragment size as the file size increases.
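
The underlying primitive, threshold secret sharing, can be shown in a few lines; the sketch below is a minimal Shamir (k, n) scheme over a prime field and is offered only to make the mechanism concrete, not as the authors' fragmentation system.

```python
# Minimal Shamir secret sharing: split an integer secret into n shares,
# any k of which reconstruct it by Lagrange interpolation at x = 0.
# Requires Python 3.8+ for the modular inverse via pow(den, -1, PRIME).
import random

PRIME = 2**127 - 1  # prime field large enough for small integer secrets

def make_shares(secret, k, n):
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(k - 1)]
    poly = lambda x: sum(c * pow(x, i, PRIME) for i, c in enumerate(coeffs)) % PRIME
    return [(x, poly(x)) for x in range(1, n + 1)]

def recover(shares):
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, k=3, n=5)
assert recover(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
```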
Jemel, M., Msahli, M., Serhrouchni, A..  2017.  Towards an Efficient File Synchronization between Digital Safes. 2017 IEEE 31st International Conference on Advanced Information Networking and Applications (AINA). :136–143.
One of the main concerns of cloud storage solutions is to offer availability to the end user, so addressing mobility needs and device variety has emerged as a major challenge. First, data should be synchronized automatically and continuously when the user moves from one device to another. Second, the cloud service should offer the owner the possibility to share data with specific users. The paper's goal is to develop a secure framework that ensures file synchronization with high quality and minimal resource consumption. As a first step towards this goal, we propose the SyncDS protocol with its associated architecture. The efficiency of the synchronization protocol comes from the choice of networking protocol as well as from the strategy for detecting changes between two versions of a file system located on different devices. Our experimental results show that adopting a Hierarchical Hash Tree to detect changes between two file systems, and adopting the WebSocket protocol for data exchange, improves the efficiency of the synchronization protocol.
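
The change-detection strategy can be pictured with a small, hedged sketch of a hierarchical hash tree: each directory's hash is derived from its children's hashes, so two snapshots can be compared and unchanged subtrees skipped. This is an illustration of the general technique, not the SyncDS implementation.

```python
# Illustrative hierarchical (Merkle-style) hash tree over a directory,
# plus a diff of two snapshots to find changed paths.
import hashlib, os

def tree_hashes(root):
    """Return {relative_path: hash} for every file and directory under root."""
    hashes = {}
    def walk(p):
        h = hashlib.sha256()
        if os.path.isfile(p):
            with open(p, "rb") as f:
                h.update(f.read())
        else:  # directory: hash is built from the children's names and hashes
            for name in sorted(os.listdir(p)):
                h.update(name.encode() + walk(os.path.join(p, name)).encode())
        digest = h.hexdigest()
        hashes[os.path.relpath(p, root)] = digest
        return digest
    walk(root)
    return hashes

def changed_paths(old, new):
    """Paths whose hash differs, or that appear/disappear, between two snapshots."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```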
Sommers, Joel, Durairajan, Ramakrishnan, Barford, Paul.  2017.  Automatic Metadata Generation for Active Measurement. Proceedings of the 2017 Internet Measurement Conference. :261–267.

Empirical research in the Internet is fraught with challenges. Among these is the possibility that local environmental conditions (e.g., CPU load or network load) introduce unexpected bias or artifacts in measurements that lead to erroneous conclusions. In this paper, we describe a framework for local environment monitoring that is designed to be used during Internet measurement experiments. The goals of our work are to provide a critical, expanded perspective on measurement results and to improve the opportunity for reproducibility of results. We instantiate our framework in a tool we call SoMeta, which monitors the local environment during active probe-based measurement experiments. We evaluate the runtime costs of SoMeta and conduct a series of experiments in which we intentionally perturb different aspects of the local environment during active probe-based measurements. Our experiments show how simple local monitoring can readily expose conditions that bias active probe-based measurement results. We conclude with a discussion of how our framework can be expanded to provide metadata for a broad range of Internet measurement experiments.
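
The kind of metadata such a framework records can be approximated with a few lines of Python; the sketch below is not SoMeta, the sampled fields are examples only, and it assumes the 'psutil' package.

```python
# Illustrative sketch: snapshot local-host conditions before and after a
# measurement and store them as metadata next to the result.
import json, time
import psutil

def environment_snapshot():
    net = psutil.net_io_counters()
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=0.1),   # short sampling window
        "memory_percent": psutil.virtual_memory().percent,
        "net_bytes_sent": net.bytes_sent,
        "net_bytes_recv": net.bytes_recv,
    }

def run_with_metadata(measurement_fn, outfile="measurement_metadata.json"):
    before = environment_snapshot()
    result = measurement_fn()          # e.g., one round of active probing
    after = environment_snapshot()
    with open(outfile, "w") as f:
        json.dump({"before": before, "after": after}, f, indent=2)
    return result
```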

Kimmig, A., Memory, A., Miller, R. J., Getoor, L..  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

Pan, J., Mao, X..  2017.  Detecting DOM-Sourced Cross-Site Scripting in Browser Extensions. 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME). :24–34.

In recent years, with the advances in JavaScript engines and the adoption of HTML5 APIs, web applications have begun to shift their functionality from the server side towards the client side, resulting in dense and complex interactions with HTML documents using the Document Object Model (DOM). As a consequence, client-side vulnerabilities are becoming more and more prevalent. In this paper, we focus on DOM-sourced Cross-site Scripting (XSS), a severe but not well-studied vulnerability appearing in browser extensions. Compared with conventional DOM-based XSS, DOM-sourced XSS introduces a new attack surface in which the DOM can become a vulnerable source in addition to common sources such as URLs and form inputs. To discover such vulnerabilities, we propose a detection framework employing hybrid analysis with two phases. The first phase is a lightweight static analysis consisting of a text filter and an abstract syntax tree parser, which produces potentially vulnerable candidates. The second phase is dynamic symbolic execution with an additional component named shadow DOM, which generates a document as a proof-of-concept exploit. In our large-scale real-world experiment, 58 previously unknown DOM-sourced XSS vulnerabilities were discovered in user scripts of the popular browser extension Greasemonkey.

Schäfer, C..  2017.  Detection of compromised email accounts used for spamming in correlation with origin-destination delivery notification extracted from metadata. 2017 5th International Symposium on Digital Forensic and Security (ISDFS). :1–6.

Fifty-four percent of the global email traffic in October 2016 was spam and phishing messages. Those emails were commonly sent from compromised email accounts. Previous research has primarily focused on detecting incoming junk mail but not locally generated spam messages. State-of-the-art spam detection methods generally require the content of the email to be able to classify it as either spam or a regular message. This content is not available within encrypted messages or is prohibited due to data privacy. The object of the research presented is to detect an anomaly with the Origin-Destination Delivery Notification method, which is based on the geographical origin and destination as well as the Delivery Status Notification of the remote SMTP server without the knowledge of the email content. The proposed method detects an abused account after a few transferred emails; it is very flexible and can be adjusted for every environment and requirement.
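
A hedged sketch of the general idea (flagging an outgoing-mail account from destination geography and delivery-status notifications, without reading message content) is given below; the record fields and thresholds are invented for illustration and do not reproduce the paper's method.

```python
# Illustrative detector: flag accounts that suddenly deliver to unusually many
# destination countries or accumulate a high rate of failed delivery notifications.
from collections import defaultdict

def flag_suspicious_accounts(delivery_log, max_countries=5, max_failure_rate=0.3):
    """delivery_log: iterable of dicts like
       {"account": "alice", "dest_country": "DE", "dsn_failed": False}"""
    countries = defaultdict(set)
    sent = defaultdict(int)
    failed = defaultdict(int)
    for rec in delivery_log:
        acct = rec["account"]
        countries[acct].add(rec["dest_country"])
        sent[acct] += 1
        failed[acct] += int(rec["dsn_failed"])
    suspicious = []
    for acct in sent:
        too_many_destinations = len(countries[acct]) > max_countries
        high_failure_rate = failed[acct] / sent[acct] > max_failure_rate
        if too_many_destinations or high_failure_rate:
            suspicious.append(acct)
    return suspicious
```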

Yang, Y., Liu, X., Deng, R. H., Weng, J..  2017.  Flexible Wildcard Searchable Encryption System. IEEE Transactions on Services Computing. PP:1–1.

Searchable encryption is an important technique for public cloud storage services to provide user data confidentiality protection while allowing users to perform keyword search over their encrypted data. Previous schemes only deal with exact or fuzzy keyword search to correct some spelling errors. In this paper, we propose a new wildcard searchable encryption system to support wildcard keyword queries, which has several highly desirable features. First, our system allows multiple-keyword search in which any queried keyword may contain zero, one or two wildcards, and a wildcard may appear in any position of a keyword and represent any number of symbols. Second, it supports simultaneous search on multiple data owners' data using only one trapdoor. Third, it provides flexible user authorization and revocation to effectively manage search and decryption privileges. Fourth, it is constructed based on homomorphic encryption rather than Bloom filters and hence completely eliminates the false positive probability caused by Bloom filters. Finally, it achieves a high level of privacy protection since matching results are unknown to the cloud server in the test phase. The proposed system is thoroughly analyzed and proved secure. Extensive experimental results indicate that our system is efficient compared with other existing wildcard searchable encryption schemes in the public key setting.

Bijoy, J. M., Kavitha, V. K., Radhakrishnan, B., Suresh, L. P..  2017.  A Graphical Password Authentication for analyzing legitimate user in online social network and secure social image repository with metadata. 2017 International Conference on Circuit ,Power and Computing Technologies (ICCPCT). :1–7.

The Internet plays a crucial role in today's life, so the usage of online social networks is increasing steadily. People can share multimedia information quickly and keep in touch or communicate with friends easily through online social networks across the world. Security in authentication is a big challenge in online social networks, and authentication is a preliminary process for identifying legitimate users. Conventionally, we use alphanumeric text-based passwords for authentication. The main flaws of text-based passwords are that they are highly vulnerable to attacks and that passwords are difficult to recall at authentication time due to their irregular use. To overcome these shortcomings, we propose Graphical Password authentication, an approach that authenticates with a combination of pictures. It is less vulnerable to attacks, and humans can recall pictures more easily than text, so graphical passwords are a better alternative to text passwords. As the number of images users upload and share through online sites increases, privacy preservation has become a major problem. We therefore need a Caption-Based Metadata Stratification of images that automatically suggests a similar category already in the database; it works by comparing the caption metadata of an album with caption metadata already in the database, or by extracting synonyms of the new album's caption metadata to check its similarity with caption metadata already in the database. This stratification offers enhanced automatic privacy prediction for images uploaded to online social networks, where privacy is an inevitable factor for uploaded images and privacy violation is a major concern. We therefore propose Automatic Policy Prediction for uploaded images that are classified by caption metadata; an automatic policy prediction is a hassle-free privacy setting proposed to the user.
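
The caption-comparison step can be pictured with a small sketch: captions are tokenized, optionally expanded with synonyms, and compared by overlap to suggest a category whose privacy policy can be reused. This is illustrative only; the synonym table, threshold and data layout are invented.

```python
# Illustrative caption-metadata comparison using synonym-expanded token overlap.
SYNONYMS = {"beach": {"seaside", "shore"}, "kids": {"children"}}

def tokens(caption):
    words = set(caption.lower().split())
    for w in set(words):
        words |= SYNONYMS.get(w, set())   # expand with synonyms, if any
    return words

def caption_similarity(a, b):
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0   # Jaccard overlap

def suggest_category(new_caption, stored_albums, threshold=0.3):
    """stored_albums: dict mapping an existing caption -> its privacy category."""
    best = max(stored_albums, key=lambda c: caption_similarity(new_caption, c), default=None)
    if best and caption_similarity(new_caption, best) >= threshold:
        return stored_albums[best]
    return None   # no sufficiently similar album; ask the user
```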

Gaikwad, V. S., Gandle, K. S..  2017.  Ideal complexity cryptosystem with high privacy data service for cloud databases. 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM). :267–270.

Data storage in the cloud should come with high safety and confidentiality. It is the responsibility of the cloud service provider to guarantee the availability and security of client data. Various alternatives exist for storage services, but confidentiality and complexity solutions for database as a service are still not satisfactory. The proposed system gives an alternative solution for database as a service that integrates the benefits of different services along with advanced encryption techniques. It makes it possible to apply concurrent operations to encrypted data. This alternative supports connecting dispersed clients while eliminating the intermediate proxy, which keeps the design simple. The performance of the proposed system is evaluated on the basis of theoretical analysis.

Sürer, Özge.  2017.  Improving Similarity Measures Using Ontological Data. Proceedings of the Eleventh ACM Conference on Recommender Systems. :416–420.

The representation of structural data is important to capture patterns between features; interrelations between variables provide information beyond the standard variables. In this study, we show how ontology information may be used in recommender systems to increase the efficiency of predictions. We propose two alternative similarity measures that incorporate the structural data representation. Experiments show that our ontology-based approach delivers improved classification accuracy as the dimension increases.

Shaabani, Nuhad, Meinel, Christoph.  2017.  Incremental Discovery of Inclusion Dependencies. Proceedings of the 29th International Conference on Scientific and Statistical Database Management. :2:1–2:12.

Inclusion dependencies form one of the most fundamental classes of integrity constraints. Their importance in classical data management is reinforced by modern applications such as data profiling, data cleaning, entity resolution and schema matching. Their discovery in an unknown dataset is at the core of any data analysis effort. Therefore, several research approaches have focused on their efficient discovery in a given, static dataset. However, none of these approaches is appropriate for applications on dynamic datasets, such as transactional datasets, scientific applications, and social networks. In these cases, discovery techniques should be able to efficiently update the inclusion dependencies after an update to the dataset, without reprocessing the entire dataset. We present the first approach for incrementally updating unary inclusion dependencies. In particular, our approach is based on the concept of attribute clustering, from which the unary inclusion dependencies are efficiently derivable. We incrementally update the clusters after each update of the dataset. Updating the clusters does not need to access the dataset because of special data structures designed to efficiently support the updating process. We perform an exhaustive analysis of our approach by applying it to large datasets with several hundred attributes and more than 116,200,000 tuples. The results show that the incremental discovery significantly reduces the runtime needed by the static discovery. This reduction in runtime is up to 99.9996% for both inserts and deletes.
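
To make the problem concrete, here is the naive, static form of unary inclusion dependency discovery (value-set containment); the paper's contribution, the incremental clustering-based update, is not reproduced here, and the column data is an invented example.

```python
# Naive static discovery of unary inclusion dependencies: A ⊆ B holds when
# every value of column A also appears in column B.
def unary_inds(columns):
    """columns: dict of column_name -> iterable of values."""
    value_sets = {name: set(vals) for name, vals in columns.items()}
    return [(a, b)
            for a in value_sets for b in value_sets
            if a != b and value_sets[a] <= value_sets[b]]

columns = {
    "orders.customer_id": [1, 2, 2, 3],
    "customers.id": [1, 2, 3, 4, 5],
}
print(unary_inds(columns))   # [('orders.customer_id', 'customers.id')]
```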

That, D. H. T., Fils, G., Yuan, Z., Malik, T..  2017.  Sciunits: Reusable Research Objects. 2017 IEEE 13th International Conference on e-Science (e-Science). :374–383.

Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences.

Kollenda, B., Göktaş, E., Blazytko, T., Koppe, P., Gawlik, R., Konoth, R. K., Giuffrida, C., Bos, H., Holz, T..  2017.  Towards Automated Discovery of Crash-Resistant Primitives in Binary Executables. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :189–200.

Many modern defenses rely on address space layout randomization (ASLR) to efficiently hide security-sensitive metadata in the address space. Absent implementation flaws, an attacker can only bypass such defenses by repeatedly probing the address space for mapped (security-sensitive) regions, incurring a noisy application crash on any wrong guess. Recent work shows that modern applications contain idioms that allow the construction of crash-resistant code primitives, allowing an attacker to efficiently probe the address space without causing any visible crash. In this paper, we classify different crash-resistant primitives and show that this problem is much more prominent than previously assumed. More specifically, we show that rather than relying on labor-intensive source code inspection to find a few "hidden" application-specific primitives, an attacker can find such primitives semi-automatically, on many classes of real-world programs, at the binary level. To support our claims, we develop methods to locate such primitives in real-world binaries. We successfully identified 29 new potential primitives and constructed proof-of-concept exploits for four of them.

Diaz, J. S. B., Medeiros, C. B..  2017.  WorkflowHunt: Combining Keyword and Semantic Search in Scientific Workflow Repositories. 2017 IEEE 13th International Conference on e-Science (e-Science). :138–147.

Scientific datasets and the experiments that analyze them are growing in size and complexity, and scientists are facing difficulties in sharing such resources. Some initiatives have emerged to try to solve this problem; one of them involves the use of scientific workflows to represent and enact experiment execution. There is an increasing number of workflows that are potentially relevant for more than one scientific domain. However, it is hard to find workflows suitable for reuse given an experiment. Creating a workflow takes time and resources, and reuse helps scientists build new workflows faster and in a more reliable way. Search mechanisms in workflow repositories should provide different options for workflow discovery, but it is difficult for generic repositories to provide multiple mechanisms. This paper presents WorkflowHunt, a hybrid architecture for workflow search and discovery in generic repositories, which combines keyword and semantic search to allow finding relevant workflows using different search methods. We validated our architecture by creating a prototype that uses real workflows and metadata from myExperiment, and by comparing search results via WorkflowHunt and via myExperiment's search interface.
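
A rough sketch of the hybrid-search idea (merging keyword hits with matches obtained through semantic annotations) is shown below; it is not WorkflowHunt's architecture, and the annotation vocabulary, weights and data layout are invented for illustration.

```python
# Illustrative hybrid ranking: combine a keyword score over descriptions with a
# semantic score over ontology-expanded annotations.
def keyword_score(query, description):
    terms, words = set(query.lower().split()), set(description.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0

def semantic_score(query, annotations, ontology):
    """ontology: concept -> set of related concepts; annotations: tags on the workflow."""
    expanded = set()
    for term in query.lower().split():
        expanded.add(term)
        expanded |= ontology.get(term, set())
    return len(expanded & set(annotations)) / len(annotations) if annotations else 0.0

def hybrid_rank(query, workflows, ontology, w_kw=0.5, w_sem=0.5):
    """workflows: list of dicts with 'title', 'description' and 'annotations' keys."""
    scored = [(w_kw * keyword_score(query, wf["description"]) +
               w_sem * semantic_score(query, wf["annotations"], ontology), wf["title"])
              for wf in workflows]
    return sorted(scored, reverse=True)
```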

Nadgowda, S., Duri, S., Isci, C., Mann, V..  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered by querying package management tools, or by using rules that check for file metadata or contents. These approaches are inadequate, as not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods that require a training phase, or require maintaining knowledge bases. Columbus uses knowledge of the software packaging practices that have evolved over time, and uses the information embedded in the filesystem impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system over time.
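
The filesystem-impression idea can be sketched in a few lines: instead of asking a package manager, look for the characteristic paths a package leaves behind. This is illustrative only, not Columbus; the fingerprint table is a made-up example.

```python
# Illustrative software discovery by filesystem introspection: a package is
# reported present when all of its characteristic paths exist under the root.
import os

FINGERPRINTS = {
    "nginx":   ["etc/nginx/nginx.conf", "usr/sbin/nginx"],
    "python3": ["usr/bin/python3"],
    "node":    ["usr/bin/node", "usr/lib/node_modules"],
}

def discover_software(root="/"):
    found = []
    for package, paths in FINGERPRINTS.items():
        if all(os.path.exists(os.path.join(root, p)) for p in paths):
            found.append(package)
    return found

# e.g. discover_software("/") on a live host, or
# discover_software("/mnt/image_rootfs") on an unpacked container image.
```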

Lim, H., Ni, A., Kim, D., Ko, Y. B..  2017.  Named data networking testbed for scientific data. 2017 2nd International Conference on Computer and Communication Systems (ICCCS). :65–69.

Named Data Networking (NDN) is one of the future internet architectures, which is a clean-slate approach. NDN provides intelligent data retrieval using the principles of name-based symmetrical forwarding of Interest/Data packets and in-network caching. The continually increasing demand for rapid dissemination of large-scale scientific data is driving the use of NDN in data-intensive science experiments. In this paper, we establish an intercontinental NDN testbed. In the testbed, an NDN-based application that targets climate science as an example data-intensive science application is designed and implemented, which has differentiated features compared to those of previous studies. We verify experimental justification of using NDN for climate science in the intercontinental network, through performance comparisons between classical delivery techniques and NDN-based climate data delivery.

Saundry, A..  2017.  Institutional Repository Digital Object Metadata Enhancement and Re-Architecting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). :1–3.

We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to new information architecture, in an effort to minimize administrative object management and processing, and improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. Functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.

Zhou, G., Huang, J. X..  2017.  Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering. 29:1226–1239.

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval, using two novel category-powered models. One is a basic category-powered model called MB-NET and the other is an enhanced category-powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.

Kolosnjaji, B., Eraisha, G., Webster, G., Zarras, A., Eckert, C..  2017.  Empowering convolutional networks for malware classification and analysis. 2017 International Joint Conference on Neural Networks (IJCNN). :3838–3845.

Performing large-scale malware classification is increasingly becoming a critical step in malware analytics as the number and variety of malware samples are rapidly growing. Statistical machine learning constitutes an appealing method to cope with this increase as it can use mathematical tools to extract information out of large-scale datasets and produce interpretable models. This has motivated a surge of scientific work in developing machine learning methods for detection and classification of malicious executables. However, an optimal method for extracting the most informative features for different malware families, with the final goal of malware classification, is yet to be found. Fortunately, neural networks have evolved to the state that they can surpass the limitations of other methods in terms of hierarchical feature extraction. Consequently, neural networks can now offer superior classification accuracy in many domains such as computer vision and natural language processing. In this paper, we transfer the performance improvements achieved in the area of neural networks to model the execution sequences of disassembled malicious binaries. We implement a neural network that consists of convolutional and feedforward neural constructs. This architecture embodies a hierarchical feature extraction approach that combines convolution of n-grams of instructions with plain vectorization of features derived from the headers of the Portable Executable (PE) files. Our evaluation results demonstrate that our approach outperforms baseline methods, such as simple Feedforward Neural Networks and Support Vector Machines, as we achieve 93% on precision and recall, even in the case of obfuscations in the data.
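
The described combination of a convolutional branch over instruction sequences with plain header features can be sketched as follows in PyTorch; this is a hedged illustration of the architecture family, not the authors' implementation, and the vocabulary size, dimensions and layer sizes are invented.

```python
# Illustrative model: embed instruction ids, convolve over n-grams, pool, then
# concatenate with PE-header features and classify with a feedforward head.
import torch
import torch.nn as nn

class MalwareNet(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, header_dim=32, n_classes=10):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),
        )
        self.classifier = nn.Sequential(
            nn.Linear(128 + header_dim, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, instructions, header_features):
        # instructions: (batch, seq_len) instruction ids; header_features: (batch, header_dim)
        x = self.embed(instructions).transpose(1, 2)   # -> (batch, embed_dim, seq_len)
        x = self.conv(x).squeeze(-1)                   # -> (batch, 128)
        return self.classifier(torch.cat([x, header_features], dim=1))

model = MalwareNet()
logits = model(torch.randint(0, 1000, (4, 200)), torch.randn(4, 32))  # toy batch
```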

2018
Chen, W., Liang, X., Li, J., Qin, H., Mu, Y., Wang, J..  2018.  Blockchain Based Provenance Sharing of Scientific Workflows. 2018 IEEE International Conference on Big Data (Big Data). :3814–3820.
In a research community, the sharing of scientific workflow provenance can enhance distributed research cooperation, verification of experiment reproducibility, and the repetition of experiments. Considering that scientists in such a community are often loosely connected and geographically distributed, traditional centralized provenance sharing architectures have shown disadvantages in trustworthiness, reliability and efficiency; additionally, they make it difficult to protect the rights and interests of data providers. All of this has largely hindered the willingness of distributed scientists to share their workflow provenance. Considering the big advantages of blockchain in decentralization, trustworthiness and high reliability, an approach to sharing scientific workflow provenance based on blockchain in a research community is proposed. To make the approach more practical, provenance is handled on-chain and original data is delivered off-chain. A block structure to support efficient provenance storage and retrieval is designed, and an algorithm for scientists to search workflow segments from provenance, as well as an algorithm for experiment backtracking, are provided to enhance experiment result sharing and to save computing resources and time by avoiding repeated experiments as far as possible. Analyses show that the approach is efficient and effective.
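
The on-chain/off-chain split described above can be pictured with a minimal, hedged sketch: provenance records go into hash-chained blocks, while the original data stays off-chain and is referenced only by its digest. The field names are invented and no consensus or networking layer is shown.

```python
# Illustrative hash-chained block structure for provenance, with off-chain data
# referenced by digest only.
import hashlib, json, time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_block(prev_hash, provenance_records):
    block = {
        "timestamp": time.time(),
        "prev_hash": prev_hash,
        "provenance": provenance_records,   # workflow steps, parameters, agents
    }
    block["hash"] = sha256(json.dumps(block, sort_keys=True).encode())
    return block

chain = [make_block("0" * 64, [])]          # genesis block
raw_result = b"...large experiment output kept off-chain..."
record = {"workflow": "climate-sim-v2", "step": "preprocess",
          "data_digest": sha256(raw_result)}
chain.append(make_block(chain[-1]["hash"], [record]))
```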