Biblio

Filters: Keyword is meta data
2021-03-09
Wilkens, F., Fischer, M.  2020.  Towards Data-Driven Characterization of Brute-Force Attackers. 2020 IEEE Conference on Communications and Network Security (CNS). :1–9.

Brute-force login attempts are common for every host on the public Internet. While most of them can be discarded as low-threat attacks, targeted attack campaigns often use a dictionary-based brute-force attack to establish a foothold in the network. Therefore, it is important to characterize the attackers' behavior to prioritize defensive measures and react to new threats quickly. In this paper, we present a set of metrics that can support threat hunters in characterizing brute-force login attempts. Based on connection metadata, timing information, and the attacker's dictionary, these metrics can help to differentiate scans and to find common behavior across distinct IP addresses. We evaluated our novel metrics on a real-world data set of malicious login attempts collected by our honeypot Honeygrove. We highlight interesting metrics, show how clustering can be leveraged to reveal common behavior across IP addresses, and describe how selected metrics help to assess the threat level of attackers. Amongst others, we found strong indicators of collusion between ten otherwise unrelated IP addresses, confirming that clustering on the right metrics can help to reveal coordinated attacks.
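
The metrics themselves are not reproduced in this listing, but the clustering step the abstract describes can be sketched roughly as follows, assuming per-IP feature vectors (dictionary overlap, timing statistics, and the like) have already been computed; the feature values and DBSCAN parameters below are illustrative, not the authors'.

```python
# Sketch: cluster per-attacker feature vectors to surface coordinated scans.
# Feature values are hypothetical stand-ins for the paper's metrics.
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

features = {
    "198.51.100.7": [0.80, 120.0, 0.91],  # [dict overlap, mean inter-attempt s, ...]
    "203.0.113.42": [0.79, 118.5, 0.90],
    "192.0.2.9":    [0.10, 3.2, 0.05],
}
ips = list(features)
X = StandardScaler().fit_transform([features[ip] for ip in ips])

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(X)
for ip, label in zip(ips, labels):
    # IPs sharing a non-noise label (-1 is noise) behave alike and may collude.
    print(ip, "cluster", label)
```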

2020-12-11
Liu, F., Li, J., Wang, Y., Li, L.  2019.  Kubestorage: A Cloud Native Storage Engine for Massive Small Files. 2019 6th International Conference on Behavioral, Economic and Socio-Cultural Computing (BESC). :1–4.
Cloud Native, the emerging computing infrastructure, has become a new trend for cloud computing, especially after the development of containerization technologies such as Docker and LXD and of orchestration systems for them like Kubernetes and Swarm. With the growing popularity of Cloud Native, the following problems have been raised: (i) most Cloud Native applications are designed to make full use of the cloud platform, but their file storage has not been fully optimized to match; (ii) the traditional file system is designed as a utility for storing and retrieving files, usually built into the kernel of the operating system, but when placed in a large-scale setting, such as a network storage server shared by thousands of computing instances and storing millions of files, it becomes slow and even unstable; (iii) most storage solutions use metadata for faster tracking of files, but the metadata itself takes up considerable space and its capacity is usually limited, and if the file system stores metadata directly on disk without caching, tracking massive numbers of small files becomes much slower; (iv) traditional object storage solutions cannot provide enough features, such as caching and automatic replication, to be practical on the cloud. This paper proposes a new storage engine based on the well-known Haystack storage engine, optimized in terms of service discovery and automated fault tolerance to make it more suitable for Cloud Native infrastructure, deployment, and applications. We use the object storage model to meet large-scale, high-frequency file storage needs, offering a simple and unified set of APIs for applications to access. We also take advantage of Kubernetes' sophisticated and automated toolchains to make cloud storage easier to deploy, more flexible to scale, and more stable to run.
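
As a rough illustration of the Haystack idea the engine builds on (not Kubestorage's actual code), many small files can be packed into one large volume file with an in-memory metadata index, so a lookup costs a single seek instead of a directory traversal; the class and layout here are hypothetical.

```python
# Sketch: Haystack-style packing of many small files into one volume,
# with an in-memory index {key: (offset, size)} instead of per-file inodes.
class TinyHaystack:
    def __init__(self, path):
        self.path, self.index = path, {}
        open(path, "ab").close()  # ensure the volume file exists

    def put(self, key, data: bytes):
        with open(self.path, "ab") as vol:
            offset = vol.tell()
            vol.write(data)
        self.index[key] = (offset, len(data))  # metadata stays in memory

    def get(self, key) -> bytes:
        offset, size = self.index[key]
        with open(self.path, "rb") as vol:
            vol.seek(offset)
            return vol.read(size)

store = TinyHaystack("volume.dat")
store.put("photo-1", b"...jpeg bytes...")
assert store.get("photo-1") == b"...jpeg bytes..."
```
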
Xie, J., Zhang, M., Ma, Y.  2019.  Using Format Migration and Preservation Metadata to Support Digital Preservation of Scientific Data. 2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). :1–6.

With the development of e-Science and data-intensive scientific discovery, it is necessary to ensure that scientific data remain available for the long term, with the goal that valuable scientific data can be discovered and re-used for downstream investigations, either alone or in combination with newly generated data. As such, the preservation of scientific data ensures not only that experiments are reproducible and verifiable, but also that new questions can be raised by other scientists to promote research and innovation. In this paper, we focus on two main problems of digital preservation: format migration and preservation metadata. Format migration includes both format verification and object transformation. The system architecture for format migration and preservation metadata is presented, mapping rules for object transformation are analyzed, data fixity, integrity, and authenticity as well as digital signatures are discussed, and an example is shown in detail.
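
The paper's system is not shown here, but the fixity check it mentions usually amounts to a stored digest that is re-verified after migration; a minimal sketch, with hypothetical file and field names:

```python
# Sketch: record a fixity digest before format migration and verify it after.
import hashlib, json

def fixity(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

with open("dataset.csv", "w") as f:        # toy source object
    f.write("a,b\n1,2\n")

# Hypothetical preservation-metadata record for the source object.
record = {"object": "dataset.csv", "messageDigestAlgorithm": "SHA-256",
          "messageDigest": fixity("dataset.csv")}
with open("dataset.csv.premis.json", "w") as f:
    json.dump(record, f)

# After migration, the original's integrity can be re-verified:
assert fixity("dataset.csv") == record["messageDigest"]
```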

Zhang, W., Byna, S., Niu, C., Chen, Y.  2019.  Exploring Metadata Search Essentials for Scientific Data Management. 2019 IEEE 26th International Conference on High Performance Computing, Data, and Analytics (HiPC). :83–92.

Scientific experiments and observations store massive amounts of data in various scientific file formats. Metadata, which describes the characteristics of the data, is commonly used to sift through massive datasets in order to locate data of interest to scientists. Several indexing data structures (such as hash tables, tries, self-balancing search trees, and sparse arrays) have been developed as part of efforts to provide an efficient method for locating target data. However, the efficient choice of an indexing data structure remains unclear in the context of scientific data management, due to the lack of investigation into metadata, metadata queries, and the corresponding data structures. In this study, we perform a systematic study of metadata search essentials in the context of scientific data management. We study a real-world astronomy observation dataset and explore the characteristics of its metadata. We also study possible metadata queries based on the discovered metadata characteristics and evaluate different data structures for various types of metadata attributes. Our evaluation on a real-world dataset suggests that a trie is a suitable data structure when prefix/suffix queries are required; otherwise, a hash table should be used. We conclude our study with a summary of our findings. These findings provide guidelines and offer insights for developing metadata indexing methodologies for scientific applications.
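
To make the trie-versus-hash-table finding concrete: a hash table answers exact-match metadata lookups in constant time, while a trie also serves the prefix queries a hash table cannot answer without a full scan. A minimal sketch (not the paper's implementation; the object names are invented):

```python
# Sketch: exact lookups favor a hash table; prefix queries favor a trie.
exact = {"NGC-1300": "/data/obs/ngc1300.fits"}   # O(1) exact match
print(exact["NGC-1300"])

class Trie:
    def __init__(self):
        self.children, self.paths = {}, []

    def insert(self, key, path):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.paths.append(path)

    def prefix(self, pre):
        node = self
        for ch in pre:
            if ch not in node.children:
                return []
            node = node.children[ch]
        out, stack = [], [node]
        while stack:                     # collect every key below the prefix
            n = stack.pop()
            out += n.paths
            stack += n.children.values()
        return out

t = Trie()
t.insert("NGC-1300", "/data/obs/ngc1300.fits")
t.insert("NGC-1365", "/data/obs/ngc1365.fits")
print(t.prefix("NGC-13"))   # both files; a dict would need a full scan
```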

2020-11-30
Georgakopoulos, D.  2019.  A Global IoT Device Discovery and Integration Vision. 2019 IEEE 5th International Conference on Collaboration and Internet Computing (CIC). :214–221.
This paper presents the vision of establishing a global service for Global IoT Device Discovery and Integration (GIDDI). The establishment of a GIDDI will: (1) make IoT application development more efficient and cost-effective by enabling sharing and reuse of existing IoT devices owned and maintained by different providers, and (2) promote deployment of new IoT devices supported by a revenue generation scheme for their providers. More specifically, this paper proposes a distributed IoT blockchain ledger that is specifically designed for managing the metadata needed to describe IoT devices and the data they produce. This GIDDI Blockchain is Internet-owned (i.e., it is not controlled by any individual or organization) and Internet-scaled (i.e., it can support the discovery and reuse of billions of IoT devices). The paper also proposes a GIDDI Marketplace that provides the functionality needed for IoT device registration, query, integration, payment, and security via the proposed GIDDI Blockchain. We outline the GIDDI Blockchain and Marketplace implementation. We also discuss ongoing research on automatically mining, from the data produced, the IoT device metadata needed for device query and integration. This significantly reduces the need for IoT device providers to supply metadata descriptions of their devices and the data they produce when registering devices in the GIDDI Blockchain.
2020-04-13
Jeong, Yena, Hwang, DongYeop, Kim, Ki-Hyung.  2019.  Blockchain-Based Management of Video Surveillance Systems. 2019 International Conference on Information Networking (ICOIN). :465–468.
In this paper, we propose a video surveillance system based on a blockchain. The proposed system consists of a blockchain network with trusted internal managers. The metadata of the video is recorded on the distributed ledger of the blockchain, thereby blocking the possibility of data forgery. The proposed architecture encrypts and stores the video, creates a license within the blockchain, and exports the video. Since the decryption key for the video is managed by the private DB of the blockchain, it cannot be leaked by an internal manager without authorization. In addition, the internal administrator can manage and export videos safely by exporting the license generated in the blockchain to a DRM-applied video player.
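
Not the authors' system, but the tamper-evidence property described above can be sketched as a hash chain: each metadata record commits to its predecessor, so altering any stored entry breaks verification from that point on.

```python
# Sketch: hash-chained video metadata records (a stand-in for the ledger).
import hashlib, json, time

ledger = []

def append(meta: dict):
    prev = ledger[-1]["hash"] if ledger else "0" * 64
    body = json.dumps({"meta": meta, "prev": prev}, sort_keys=True)
    ledger.append({"meta": meta, "prev": prev,
                   "hash": hashlib.sha256(body.encode()).hexdigest()})

def verify():
    prev = "0" * 64
    for rec in ledger:
        body = json.dumps({"meta": rec["meta"], "prev": rec["prev"]}, sort_keys=True)
        if rec["prev"] != prev or hashlib.sha256(body.encode()).hexdigest() != rec["hash"]:
            return False
        prev = rec["hash"]
    return True

append({"camera": "cam-03", "ts": time.time(), "sha256_of_video": "..."})
append({"camera": "cam-03", "ts": time.time(), "sha256_of_video": "..."})
assert verify()
ledger[0]["meta"]["camera"] = "cam-99"   # any forgery...
assert not verify()                      # ...is detected
```
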
2020-04-10
Robic-Butez, Pierrick, Win, Thu Yein.  2019.  Detection of Phishing websites using Generative Adversarial Network. 2019 IEEE International Conference on Big Data (Big Data). :3216–3221.

Phishing is typically deployed as an attack vector in the initial stages of a hacking endeavour. Due to its low-risk, high-reward nature, it has seen widespread adoption, and detecting it has become a challenge in recent times. This paper proposes a novel means of detecting phishing websites using a Generative Adversarial Network. Taking into account the internal structure and external metadata of a website, the proposed approach uses a generator network which generates both legitimate and synthetic phishing features to train a discriminator network. The latter then determines whether the features belong to normal or phishing websites, before improving its detection accuracy based on the classification error. The proposed approach is evaluated using two different phishing datasets and is found to achieve a detection accuracy of up to 94%.
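
The abstract does not disclose the network architecture or feature set; as a rough sketch of the training dynamic it describes, a generator can synthesize phishing-like feature vectors to train the discriminator, assuming websites have already been reduced to fixed-length numeric features (all dimensions below are invented).

```python
# Sketch: GAN-style training on tabular website features (PyTorch).
# 16-dim noise -> 10-dim synthetic feature vectors; dimensions are hypothetical.
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
D = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

real = torch.randn(64, 10)            # stand-in for real website features
for _ in range(100):
    # Discriminator step: separate real features from generated ones.
    fake = G(torch.randn(64, 16)).detach()
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: produce features the discriminator accepts as real.
    fake = G(torch.randn(64, 16))
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```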

2020-03-30
Bharati, Aparna, Moreira, Daniel, Brogan, Joel, Hale, Patricia, Bowyer, Kevin, Flynn, Patrick, Rocha, Anderson, Scheirer, Walter.  2019.  Beyond Pixels: Image Provenance Analysis Leveraging Metadata. 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). :1692–1702.
Creative works, whether paintings or memes, follow unique journeys that result in their final form. Understanding these journeys, a process known as "provenance analysis," provides rich insights into the use, motivation, and authenticity underlying any given work. The application of this type of study to the expanse of unregulated content on the Internet is what we consider in this paper. Provenance analysis provides a snapshot of the chronology and validity of content as it is uploaded, re-uploaded, and modified over time. Although still in its infancy, automated provenance analysis for online multimedia is already being applied to different types of content. Most current works seek to build provenance graphs based on the shared content between images or videos. This can be a computationally expensive task, especially when considering the vast influx of content that the Internet sees every day. Utilizing non-content-based information, such as timestamps, geotags, and camera IDs, can help provide important insights into the path a particular image or video has traveled during its time on the Internet without large computational overhead. This paper tests the scope and applicability of metadata-based inferences for provenance graph construction in two different scenarios: digital image forensics and cultural analytics.
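
A toy version of the metadata-based inference evaluated in the paper (not its actual pipeline): images sharing a camera ID can be ordered by timestamp to propose provenance edges without comparing any pixels; the EXIF-style fields below are illustrative.

```python
# Sketch: propose provenance edges from EXIF-style metadata alone.
import networkx as nx

images = [  # hypothetical extracted metadata (no pixel comparison needed)
    {"file": "a.jpg", "camera": "X100-123", "ts": 1546300800},
    {"file": "b.jpg", "camera": "X100-123", "ts": 1546304400},
    {"file": "c.jpg", "camera": "Z9-777",   "ts": 1546301000},
]

G = nx.DiGraph()
by_camera = {}
for img in images:
    by_camera.setdefault(img["camera"], []).append(img)
for group in by_camera.values():
    group.sort(key=lambda i: i["ts"])
    for earlier, later in zip(group, group[1:]):
        # Same device, later timestamp: candidate chronological edge.
        G.add_edge(earlier["file"], later["file"])
print(list(G.edges()))   # [('a.jpg', 'b.jpg')]
```
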
Miao, Hui, Deshpande, Amol.  2019.  Understanding Data Science Lifecycle Provenance via Graph Segmentation and Summarization. 2019 IEEE 35th International Conference on Data Engineering (ICDE). :1710–1713.
Increasingly, modern data science platforms have non-intrusive and extensible provenance ingestion mechanisms to collect rich provenance and context information, handle modifications to the same file using distinguishable versions, and use graph data models (e.g., property graphs) and query languages (e.g., Cypher) to represent and manipulate the stored provenance/context information. Due to the schema-later nature of the metadata, multiple versions of the same files, and unfamiliar artifacts introduced by team members, the resulting "provenance graphs" are quite verbose and evolving; further, it is very difficult for users to compose queries and utilize this valuable information using standard graph query models alone. In this paper, we propose two high-level graph query operators to address the verboseness and evolving nature of such provenance graphs. First, we introduce a graph segmentation operator, which queries the retrospective provenance between a set of source vertices and a set of destination vertices via flexible boundary criteria to help users get insight about the derivation relationships among those vertices. We show the semantics of such a query in terms of a context-free grammar, and develop efficient algorithms that run orders of magnitude faster than the state of the art. Second, we propose a graph summarization operator that combines similar segments together to query prospective provenance of the underlying project. The operator allows tuning the summary by ignoring vertex details and characterizing local structures, and ensures the provenance meaning using path constraints. We show the optimal summary problem is PSPACE-complete and develop effective approximation algorithms. We implement the operators on top of Neo4j, evaluate our query techniques extensively, and show the effectiveness and efficiency of the proposed methods.
2020-02-10
Krause, Tim, Uetz, Rafael, Kretschmann, Tim.  2019.  Recognizing Email Spam from Meta Data Only. 2019 IEEE Conference on Communications and Network Security (CNS). :178–186.

We propose a new spam detection approach based solely on meta data features gained from email headers. The approach achieves above 99% classification accuracy on the CSDMC2010 dataset, which matches or surpasses state-of-the-art spam classifiers. We utilize a static set of engineered features, supplemented with automatically extracted features. The approach is just as effective for spam detection under end-to-end encryption, as our feature set remains unchanged for encrypted emails. In contrast to most established spam detectors, we disregard the email body completely and can therefore deliver very high classification speeds, as computationally expensive text preprocessing is not necessary.
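
A minimal sketch in the spirit of the paper's header-only features (the actual engineered feature set is not given in the abstract, so the features below are assumptions):

```python
# Sketch: derive classifier features from email headers only, never the body.
from email import message_from_string

raw = """From: promo@example.test
To: user@example.org
Subject: WIN a FREE prize!!!
Received: from relay1.example.test by mx.example.org
Received: from unknown by relay1.example.test

body is deliberately ignored
"""
msg = message_from_string(raw)

features = {
    "num_received_hops": len(msg.get_all("Received", [])),
    "subject_caps_ratio": sum(c.isupper() for c in msg["Subject"])
                          / max(len(msg["Subject"]), 1),
    "subject_exclaims": msg["Subject"].count("!"),
    "has_reply_to": int(msg["Reply-To"] is not None),
}
print(features)   # fed to any standard classifier; the body is never inspected
```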

2020-01-20
Klarin, K., Nazor, I., Celar, S.  2019.  Ontology literature review as guidelines for improving Croatian Qualification Framework. 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO). :1402–1407.

Development of information systems dealing with education and the labour market using web and grid service architectures enables their modularity, expandability, and interoperability. Applying ontologies to the web helps with collecting and selecting knowledge about a certain field in a generic way, thus enabling different applications to understand, use, reuse, and share that knowledge among themselves. A necessary step before publishing computer-interpretable data on the public web is the implementation of common standards that will ensure the exchange of information. The Croatian Qualification Framework (CROQF) is a project for the standardization of occupations for the labour market, as well as of sets of qualifications, skills, and competences and their mutual relations. This paper analyses a substantial body of research dealing with the application of ontologies to information systems in education during the last decade. The main goal is to compare achieved results according to: 1) phases of development/classifications of education-related ontologies; 2) areas of education; and 3) standards and structures of metadata for educational systems. The collected information is used to provide insight into the building blocks of CROQF, both those well supported by experience and best practices and those that are not, together with guidelines for the development of its own standards using ontological structures.

2020-01-02
Hagan, Matthew, Kang, BooJoong, McLaughlin, Kieran, Sezer, Sakir.  2018.  Peer Based Tracking Using Multi-Tuple Indexing for Network Traffic Analysis and Malware Detection. 2018 16th Annual Conference on Privacy, Security and Trust (PST). :1–5.

Traditional firewalls, Intrusion Detection Systems (IDS), and network analytics tools extensively use the 'flow' connection concept, consisting of five 'tuples' of source and destination IP, ports, and protocol type, for classification and management of network activities. By analysing flows, information can be obtained from TCP/IP fields and packet content to give an understanding of what is being transferred within a single connection. As networks have evolved to incorporate more connections and greater bandwidth, particularly from "always on" IoT devices and video and data streaming, so too have malicious network threats, whose communication methods have increased in sophistication. As a result, the concept of the 5-tuple flow in isolation is unable to detect such threats and malicious behaviours. This is due to factors such as the length of time and amount of data required to understand the network traffic behaviour, which cannot be accomplished by observing a single connection. To alleviate this issue, this paper proposes the use of additional two-tuple and single-tuple flow types to associate multiple 5-tuple communications, with generated metadata used to profile individual connection behaviour. This proposed approach enables advanced linking of different connections and behaviours, developing a clearer picture of what network activities have been taking place over a prolonged period of time. To demonstrate the capability of this approach, an expert system rule set has been developed to detect the presence of a multi-peered ZeuS botnet, which communicates by making multiple connections with multiple hosts and is thus undetectable to standard IDS systems observing 5-tuple flow types in isolation. Finally, as the solution is rule based, this implementation operates in real time and does not require the post-processing and analytics of other research solutions. This paper aims to demonstrate possible applications for next generation firewalls and methods to acquire additional information from network traffic.
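
A condensed sketch of the multi-tuple idea (not the authors' expert system): index each connection under its 5-tuple, a 2-tuple (source, destination), and a 1-tuple (source), so behaviour can be correlated across many individual flows.

```python
# Sketch: index flows at three granularities to link related connections.
from collections import defaultdict

index = {1: defaultdict(list), 2: defaultdict(list), 5: defaultdict(list)}

def observe(src, dst, sport, dport, proto):
    flow = (src, dst, sport, dport, proto)
    index[5][flow].append(flow)          # classic 5-tuple flow
    index[2][(src, dst)].append(flow)    # all traffic between a peer pair
    index[1][src].append(flow)           # everything one host did

observe("10.0.0.5", "198.51.100.1", 49152, 443, "tcp")
observe("10.0.0.5", "198.51.100.2", 49153, 443, "tcp")
observe("10.0.0.5", "198.51.100.3", 49154, 443, "tcp")

# One host contacting many peers: a pattern no single 5-tuple reveals.
peers = {f[1] for f in index[1]["10.0.0.5"]}
if len(peers) >= 3:
    print("multi-peered behaviour:", sorted(peers))
```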

2019-09-04
Paiker, N., Ding, X., Curtmola, R., Borcea, C.  2018.  Context-Aware File Discovery System for Distributed Mobile-Cloud Apps. 2018 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). :198–203.
Recent research has proposed middleware to enable efficient distributed apps over mobile-cloud platforms. This paper presents a Context-Aware File Discovery Service (CAFDS) that allows distributed mobile-cloud applications to find and access files of interest shared by collaborating users. CAFDS enables programmers to search for files defined by context and content features, such as location, creation time, or the presence of certain object types within an image file. CAFDS provides low latency through a cloud-based metadata server, which uses a decision tree to locate the nearest files that satisfy the context and content features requested by applications. We implemented CAFDS in Android and Linux. Experimental results show that CAFDS achieves substantially lower latency than peer-to-peer solutions that cannot leverage context information.
Lawson, M., Lofstead, J.  2018.  Using a Robust Metadata Management System to Accelerate Scientific Discovery at Extreme Scales. 2018 IEEE/ACM 3rd International Workshop on Parallel Data Storage Data Intensive Scalable Computing Systems (PDSW-DISCS). :13–23.
Our previous work, which can be referred to as EMPRESS 1.0, showed that rich metadata management provides a relatively low-overhead approach to facilitating insight from scale-up scientific applications. However, this system did not provide the functionality needed for a viable production system or address whether such a system could scale. Therefore, we have extended our previous work to create EMPRESS 2.0, which incorporates the features required for a useful production system. Through a discussion of EMPRESS 2.0, this paper explores how to incorporate rich query functionality, fault tolerance, and atomic operations into a scalable, storage-system-independent metadata management system that is easy to use. This paper demonstrates that such a system offers significant performance advantages over HDF5, providing metadata querying that is 150X to 650X faster, and can greatly accelerate post-processing. Finally, since the current implementation of EMPRESS 2.0 relies on an RDBMS, this paper demonstrates that an RDBMS is a viable technology for managing data-oriented metadata.
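
EMPRESS itself is not shown here; as a minimal illustration of why an RDBMS is viable for data-oriented metadata, attribute metadata can be indexed and queried without touching the underlying output files (the schema below is invented):

```python
# Sketch: attribute metadata in SQLite, queryable without scanning data files.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE var_metadata (
    run_id INT, timestep INT, variable TEXT, max_value REAL, file TEXT)""")
rows = [(1, t, "temperature", 900.0 + t, f"run1/ts{t}.bp") for t in range(100)]
db.executemany("INSERT INTO var_metadata VALUES (?,?,?,?,?)", rows)
db.execute("CREATE INDEX idx_var ON var_metadata(variable, max_value)")

# Post-processing asks "which timesteps hit a hotspot?" via the index,
# instead of opening every output file:
hot = db.execute("""SELECT timestep, file FROM var_metadata
                    WHERE variable='temperature' AND max_value > 995""").fetchall()
print(hot)
```
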
Liang, J., Jiang, L., Cao, L., Li, L., Hauptmann, A.  2018.  Focal Visual-Text Attention for Visual Question Answering. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. :6135–6143.
Recent insights into language and vision with neural networks have been successfully applied to simple single-image visual question answering. However, to tackle real-life question answering problems on multimedia collections such as personal photos, we have to look at whole collections with sequences of photos or videos. When answering questions from a large collection, a natural problem is to identify snippets to support the answer. In this paper, we describe a novel neural network called the Focal Visual-Text Attention network (FVTA) for collective reasoning in visual question answering, where both visual and text sequence information, such as images and text metadata, are presented. FVTA introduces an end-to-end approach that makes use of a hierarchical process to dynamically determine what media and what time to focus on in the sequential data to answer the question. FVTA can not only answer the questions well but also provide the justifications on which the system's answers are based. FVTA achieves state-of-the-art performance on the MemexQA dataset and competitive results on the MovieQA dataset.
2019-07-01
Rasin, A., Wagner, J., Heart, K., Grier, J.  2018.  Establishing Independent Audit Mechanisms for Database Management Systems. 2018 IEEE International Symposium on Technologies for Homeland Security (HST). :1–7.

The pervasive use of databases for the storage of critical and sensitive information in many organizations has led to an increase in the rate at which databases are exploited in computer crimes. While there are several techniques and tools available for database forensic analysis, such tools usually assume a priori database preparation, such as relying on tamper-detection software already being in place and the use of detailed logging. Further, such tools are built in and thus can be compromised or corrupted along with the database itself. In practice, investigators need forensic and security audit tools that work on poorly configured systems and make no assumptions about the extent of damage or malicious hacking in a database. In this paper, we present our database forensics methods, which are capable of examining database content from a storage (disk or RAM) image without using any log or file system metadata. We describe how these methods can be used to detect security breaches in an untrusted environment where the security threat arose from a privileged user (or someone who has obtained such privileges). Finally, we argue that a comprehensive and independent audit framework is necessary in order to detect and counteract threats in an environment where the security breach originates from an administrator (either at the database or operating system level).

2019-05-08
Ölvecký, M., Gabriška, D.  2018.  Wiping Techniques and Anti-Forensics Methods. 2018 IEEE 16th International Symposium on Intelligent Systems and Informatics (SISY). :000127–000132.

This paper presents the theoretical background of our main research activity, which focuses on the evaluation of wiping/erasure standards that are mostly implemented in specific software products developed and programmed for data wiping. The information saved on storage devices often consists of metadata and trace data. These kinds of data in particular are very important in the process of forensic analysis because they sometimes contain information about interconnections with other files. Most people save their sensitive information on their local storage devices and later want to securely erase these files, but this operation is usually problematic. Secure file destruction is one of many anti-forensics methods. The outcome of this paper is to define future research activities focused on the establishment of a suitable digital environment. This environment will be prepared for testing and evaluating selected wiping standards and appropriate eraser software.
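
The standards under evaluation are not reproduced here, but the mechanic they share can be sketched: overwrite a file's bytes in place one or more times before unlinking it. This is a simplification; wear-leveling media, journaling, and copy-on-write filesystems complicate real guarantees.

```python
# Sketch: naive multi-pass overwrite before deletion (illustrative only;
# SSDs, journaling, and copy-on-write filesystems can defeat this).
import os

def wipe(path, passes=(b"\x00", b"\xff", None)):
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pattern in passes:
            f.seek(0)
            data = os.urandom(size) if pattern is None else pattern * size
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # push each pass to the device
    os.remove(path)

with open("secret.txt", "w") as f:
    f.write("sensitive contents")
wipe("secret.txt")
```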

2019-03-06
Suwansrikham, P., She, K.  2018.  Asymmetric Secure Storage Scheme for Big Data on Multiple Cloud Providers. 2018 IEEE 4th International Conference on Big Data Security on Cloud (BigDataSecurity), IEEE International Conference on High Performance and Smart Computing, (HPSC) and IEEE International Conference on Intelligent Data and Security (IDS). :121–125.

Cloud computing is an emerging technology that goes hand in hand with big data. Due to the enormous size of big data, it is impossible to store it in local storage; alternatively, even if we wanted to store it locally, we would have to spend a great deal of money to build a data center. One way to save money is to store big data with a cloud storage service. A cloud storage service provides users space and security to store their files. However, relying on a single cloud storage provider may cause trouble for the customer: a cloud storage provider (CSP) may stop its service at any time, so it is too risky for a data owner to host files with only a single CSP. Also, the CSP is a third party that the user has to trust without verification. After deploying a file to a CSP, the user does not know who accesses it. Even if the CSP provides a security mechanism to prevent outsider attacks, how does the user ensure that there is no insider attack to steal or corrupt the file? This research proposes a way to minimize that risk and ensure data privacy as well as access control. The big data file is split into chunks and distributed to multiple cloud storage providers. Even if there is an insider attack, the attacker obtains only part of the file and cannot reconstruct the whole file. After splitting the file, metadata is generated. The metadata keeps chunk information, including chunk locations, access paths, and the data owner's username and password for connecting to each CSP. An asymmetric security concept is applied in this research: the metadata is encrypted and transferred to the user who requests access to the file. File access, monitoring, and metadata transfer are functions of dew computing, an intermediate server between the users and the cloud services.
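
Not the paper's implementation, but the scheme it outlines can be sketched: split the file into chunks destined for different providers, then protect the metadata needed to reassemble it so that only the key holder can read it. The sketch below uses a hybrid RSA-plus-Fernet construction from the `cryptography` package; the striping layout is invented.

```python
# Sketch: split a file across "providers" and asymmetrically protect the
# metadata needed to reassemble it (hybrid RSA + Fernet).
import json
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

data = b"big data payload" * 1000
chunks = [data[i::3] for i in range(3)]                   # toy 3-way striping
providers = {f"csp{i}": c for i, c in enumerate(chunks)}  # stand-in for uploads

meta = json.dumps({"order": list(providers), "scheme": "stripe-3"}).encode()
fkey = Fernet.generate_key()                # symmetric key for the metadata
enc_meta = Fernet(fkey).encrypt(meta)

user_priv = rsa.generate_private_key(public_exponent=65537, key_size=2048)
oaep = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
enc_fkey = user_priv.public_key().encrypt(fkey, oaep)   # only the user can open

# Authorized user: recover the key, then the metadata, then the file.
meta2 = json.loads(Fernet(user_priv.decrypt(enc_fkey, oaep)).decrypt(enc_meta))
rebuilt = bytearray(len(data))
for i, name in enumerate(meta2["order"]):
    rebuilt[i::3] = providers[name]
assert bytes(rebuilt) == data
```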

2019-02-22
Bakour, K., Ünver, H. M., Ghanem, R.  2018.  The Android Malware Static Analysis: Techniques, Limitations, and Open Challenges. 2018 3rd International Conference on Computer Science and Engineering (UBMK). :586–593.

This paper aims to explain static analysis techniques in detail and to highlight the weaknesses and challenges they face. To this end, more than 80 static analysis-based frameworks have been studied, and in their light, the process of detecting malicious applications has been divided into four phases that are explained in a schematic manner. The features used in static analysis are also discussed in detail by dividing them into four categories, namely manifest-based features, code-based features, semantic features, and app metadata-based features. The challenges facing methods based on static analysis are likewise discussed. Finally, a case study was conducted to test the strength of some well-known commercial antiviruses and one state-of-the-art academic static analysis framework against obfuscation techniques used by developers of malicious applications. The results showed a significant impact on the performance of most of the tested antiviruses and frameworks, reflecting the urgent need for more accurate tools.
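
As a small illustration of the manifest-based feature category the survey describes (the surveyed frameworks' own extractors are not shown), permissions can be pulled from a decoded AndroidManifest.xml with the standard library; the manifest snippet is invented, and a real pipeline would first decode the binary XML (e.g., with apktool).

```python
# Sketch: extract manifest-based features (permissions) from a decoded
# AndroidManifest.xml and turn them into a binary feature vector.
import xml.etree.ElementTree as ET

ANDROID = "{http://schemas.android.com/apk/res/android}"
manifest = """<manifest xmlns:android="http://schemas.android.com/apk/res/android">
  <uses-permission android:name="android.permission.SEND_SMS"/>
  <uses-permission android:name="android.permission.READ_CONTACTS"/>
</manifest>"""

root = ET.fromstring(manifest)
perms = {e.get(ANDROID + "name") for e in root.iter("uses-permission")}
# Binary feature vector over permissions often flagged as risky:
risky = ["android.permission.SEND_SMS", "android.permission.READ_SMS"]
features = [int(p in perms) for p in risky]
print(perms, features)   # [1, 0]
```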

2017-12-12
Bijoy, J. M., Kavitha, V. K., Radhakrishnan, B., Suresh, L. P.  2017.  A Graphical Password Authentication for analyzing legitimate user in online social network and secure social image repository with metadata. 2017 International Conference on Circuit, Power and Computing Technologies (ICCPCT). :1–7.

The Internet plays a crucial role in today's life, so the usage of online social networks is steadily increasing. People can share multimedia information quickly and keep in touch or communicate with friends easily through online social networks across the world. Security in authentication is a big challenge in online social networks, and authentication is a preliminary process for identifying legitimate users. Conventionally, we use alphanumeric text-based passwords for authentication. But the main flaws of text-based passwords are that they are highly vulnerable to attacks and difficult to recall at authentication time due to irregular use. To overcome the shortcomings of text passwords, we propose graphical password authentication. A graphical password authenticates via an amalgam of pictures. It is less vulnerable to attacks, and humans can recall pictures more easily than text, so the graphical password is a better alternative to text passwords. As image uploads shared by users through online sites increase, privacy preservation has become a major problem. We therefore need a caption-based metadata stratification of images that delivers automatic suggestions of similar categories already in the database; it works by comparing the caption metadata of an album with the caption metadata already in the database, or by extracting synonyms of the caption metadata of a new album to check its similarity with caption metadata already in the database. This stratification offers enhanced automatic privacy prediction for images uploaded to online social networks; privacy is an inevitable factor for uploaded images, and privacy violation is a major concern. We therefore propose automatic policy prediction for uploaded images classified by caption metadata. Automatic policy prediction is a hassle-free privacy setting proposed to the user.

Zhou, G., Huang, J. X.  2017.  Modeling and Learning Distributed Word Representation with Metadata for Question Retrieval. IEEE Transactions on Knowledge and Data Engineering. 29:1226–1239.

Community question answering (cQA) has become an important issue due to the popularity of cQA archives on the Web. This paper focuses on addressing the lexical gap problem in question retrieval. Question retrieval in cQA archives aims to find the existing questions that are semantically equivalent or relevant to the queried questions. However, the lexical gap problem brings a new challenge for question retrieval in cQA. In this paper, we propose to model and learn distributed word representations with metadata of category information within cQA pages for question retrieval using two novel category powered models. One is a basic category powered model called MB-NET and the other is an enhanced category powered model called ME-NET, which can better learn the distributed word representations and alleviate the lexical gap problem. To deal with the variable size of word representation vectors, we employ the Fisher kernel framework to transform them into fixed-length vectors. Experimental results on large-scale English and Chinese cQA data sets show that our proposed approaches can significantly outperform state-of-the-art retrieval models for question retrieval in cQA. Moreover, we further evaluate our approaches in large-scale automatic evaluation experiments. The evaluation results show that promising and significant performance improvements can be achieved.

Saundry, A.  2017.  Institutional Repository Digital Object Metadata Enhancement and Re-Architecting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). :1–3.

We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to a new information architecture, in an effort to minimize administrative object management and processing, and improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. Functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.

Nadgowda, S., Duri, S., Isci, C., Mann, V.  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered through querying package management tools, or using rules that check for file metadata or contents. These approaches are inadequate as not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods that require a training phase, or require maintaining knowledge bases. Columbus uses knowledge of the software packaging practices that have evolved over time and uses the information embedded in the file system impression created by a software package to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system over time.
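
Columbus's knowledge of packaging practices is not reproduced here; a toy version of discovery by filesystem impression might compare an image's observed paths against per-package path fingerprints (all paths below are illustrative):

```python
# Sketch: discover software from the filesystem "impression" a package leaves,
# by overlap between observed paths and known per-package fingerprints.
fingerprints = {  # hypothetical path sets learned from packaging conventions
    "nginx": {"/etc/nginx/nginx.conf", "/usr/sbin/nginx"},
    "redis": {"/etc/redis/redis.conf", "/usr/bin/redis-server"},
}
observed = {"/usr/sbin/nginx", "/etc/nginx/nginx.conf", "/var/log/app.log"}

for package, paths in fingerprints.items():
    score = len(paths & observed) / len(paths)   # fraction of fingerprint present
    if score >= 0.8:
        print(f"likely installed: {package} (score {score:.2f})")
```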

Kimmig, A., Memory, A., Miller, R. J., Getoor, L.  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

2017-02-23
Waghmare, V., Gojre, K., Watpade, A.  2015.  Approach to Enhancing Concurrent and Self-Reliant Access to Cloud Database: A Review. 2015 International Conference on Computational Intelligence and Communication Networks (CICN). :777–781.

Nowadays, cloud computing is a power station for running multiple businesses, and it is accumulating more and more users every day. Database-as-a-service is a service model provided by cloud computing to store, manage, and process data on a cloud platform. Database-as-a-service has key characteristics such as availability, scalability, and elasticity. A customer does not have to worry about database installation and management; instead, the cloud database service provider takes responsibility for installing and maintaining the database. The real problem occurs when it comes to storing confidential or private information in the cloud database: we cannot rely on the cloud database vendor, as a curious vendor may capture and leak the secret information. For that purpose, Protected Database-as-a-service is a novel solution that provides provable and pragmatic privacy in the face of a compromised cloud database service provider. Protected Database-as-a-service defines various encryption schemes for choosing the encryption algorithm and encryption key to encrypt and decrypt data. It also provides a "master key" to users, so that the metadata storage table can be decrypted only by using the user's master key. As a result, a cloud service vendor never gets access to decrypted data, and even if all servers are compromised, the cloud service vendor will not be able to decrypt the data. The proposed Protected Database-as-a-service system allows multiple geographically distributed clients to execute concurrent and independent operations on encrypted data, and it also preserves data confidentiality and consistency at the cloud level, eliminating any intermediate server between the client and the cloud database.
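
A minimal sketch of the master-key idea described above (not the paper's system): the metadata table is encrypted under a key derived from the user's master key, so the vendor stores only ciphertext. The sketch uses HKDF and Fernet from the `cryptography` package; names are illustrative.

```python
# Sketch: metadata table encrypted under a key derived from the user's
# master key; the cloud vendor only ever sees ciphertext.
import base64, json, os
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

master_key = os.urandom(32)            # held by the user, never by the vendor
derived = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"metadata-table").derive(master_key)
f = Fernet(base64.urlsafe_b64encode(derived))

metadata = {"salary": {"algorithm": "AES-CBC", "key_id": "k17"}}  # per-column schemes
stored_at_cloud = f.encrypt(json.dumps(metadata).encode())  # opaque to the vendor

# Only a client holding the master key can recover the table:
assert json.loads(f.decrypt(stored_at_cloud)) == metadata
```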