Biblio

Filters: Keyword is Metadata Discovery Problem
2017-12-12
Saundry, A..  2017.  Institutional Repository Digital Object Metadata Enhancement and Re-Architecting. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL). :1–3.

We present work undertaken at our institutional repository to enhance metadata and re-organize digital objects according to new information architecture, in an effort to minimize administrative object management and processing, and improve object discovery and use. This work was partly motivated by the launch of a new discovery platform at our institution, which aggregates metadata and full text from our four open access repositories into a cohesive, consistent, and enhanced searching and browsing experience. The platform provides digital object identifier (DOI) assignment, metadata access via various formats, and an open metadata and full text application program interface (API) for researchers, amongst other features. Functionality of these platform features relies heavily on accurate object representation and metadata. This work facilitates and improves the discovery and engagement of the diverse digital objects available from our institution, so they can be used and analyzed in new, flexible, and innovative ways by a myriad of communities and disciplines.

Kollenda, B., Göktaş, E., Blazytko, T., Koppe, P., Gawlik, R., Konoth, R. K., Giuffrida, C., Bos, H., Holz, T..  2017.  Towards Automated Discovery of Crash-Resistant Primitives in Binary Executables. 2017 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN). :189–200.

Many modern defenses rely on address space layout randomization (ASLR) to efficiently hide security-sensitive metadata in the address space. Absent implementation flaws, an attacker can only bypass such defenses by repeatedly probing the address space for mapped (security-sensitive) regions, incurring a noisy application crash on any wrong guess. Recent work shows that modern applications contain idioms that allow the construction of crash-resistant code primitives, allowing an attacker to efficiently probe the address space without causing any visible crash. In this paper, we classify different crash-resistant primitives and show that this problem is much more prominent than previously assumed. More specifically, we show that rather than relying on labor-intensive source code inspection to find a few "hidden" application-specific primitives, an attacker can find such primitives semi-automatically, on many classes of real-world programs, at the binary level. To support our claims, we develop methods to locate such primitives in real-world binaries. We successfully identified 29 new potential primitives and constructed proof-of-concept exploits for four of them.

Ktob, A., Li, Z..  2017.  The Arabic Knowledge Graph: Opportunities and Challenges. 2017 IEEE 11th International Conference on Semantic Computing (ICSC). :48–52.

The Semantic Web has brought forth the idea of computing with knowledge, hence attributing the ability to think to machines. Knowledge Graphs represent a major advancement in the construction of the Web of Data, where machines are context-aware when answering users' queries. The English Knowledge Graph was a milestone realized by Google in 2012. Even though it is a useful source of information for English users and applications, it does not offer much for Arabic users and applications. In this paper, we investigate the challenges and opportunities that arise across the life cycle of constructing the Arabic Knowledge Graph (AKG), while following some best practices and techniques. Additionally, this work suggests some potential solutions to these challenges. The proprietary nature of much of the data is a major obstacle to harvesting it. Moreover, when Arabic data is openly available, it is generally in an unstructured form that requires further processing. The complexity of the Arabic language itself creates a further problem for any automatic or semi-automatic extraction process. Therefore, the use of NLP techniques is a feasible solution. Some preliminary results are presented later in this paper. The AKG has very promising outcomes for the Semantic Web in general and the Arabic community in particular. The goal of the Arabic Knowledge Graph is mainly the integration of the different isolated datasets available on the Web. Later, it can be used in both the academic sector (by providing a large dataset for many different research fields and enhancing discovery) and the commercial sector (by improving search engines, providing metadata, and interlinking businesses).

Nadgowda, S., Duri, S., Isci, C., Mann, V..  2017.  Columbus: Filesystem Tree Introspection for Software Discovery. 2017 IEEE International Conference on Cloud Engineering (IC2E). :67–74.

Software discovery is a key management function to ensure that systems are free of vulnerabilities, comply with licensing requirements, and support advanced search for systems containing given software. Today, software is predominantly discovered by querying package management tools, or by using rules that check for file metadata or contents. These approaches are inadequate because not all software is installed through package managers, and agile development practices lead to frequent deployment of software. Other approaches to software discovery use machine learning methods that require a training phase, or require maintaining knowledge bases. Columbus uses knowledge of the software packaging practices that have evolved over time, together with the information embedded in the file system impression created by a software package, to discover it. Columbus is able to discover software in 92% of all official Docker images. Further, Columbus can be used in problem diagnosis and drift detection situations to compare two different systems, or to determine the evolution of a system over time.
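The abstract does not spell out how the file system impression is analyzed, so the following is only a minimal sketch of the general idea, not the Columbus algorithm: it assumes that a package tends to leave its name in several path segments, and ranks segments that recur across a file listing. The function name `candidate_software_names`, the `GENERIC_SEGMENTS` stop list, and the sample paths are all hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical stop list of generic directory names; not taken from the paper.
GENERIC_SEGMENTS = {"usr", "bin", "sbin", "lib", "etc", "var", "log",
                    "share", "opt", "local"}


def candidate_software_names(paths, min_count=2):
    """Rank path segments that recur across a file listing.

    Assumption: a segment seen in several distinct paths (for example
    "nginx" in /etc/nginx and /usr/sbin/nginx) hints at installed software.
    """
    counts = Counter()
    for path in paths:
        # Count each distinct segment once per path; drop the root and
        # strip the final extension so "nginx.conf" counts as "nginx".
        segments = {
            PurePosixPath(part).stem
            for part in PurePosixPath(path).parts
            if part != "/"
        }
        counts.update(segments - GENERIC_SEGMENTS)
    return [(name, n) for name, n in counts.most_common() if n >= min_count]


# Hypothetical file listing, as might be extracted from a container image.
paths = [
    "/etc/nginx/nginx.conf",
    "/usr/sbin/nginx",
    "/var/log/nginx/access.log",
    "/usr/bin/python3",
    "/usr/lib/python3/os.py",
]
print(candidate_software_names(paths))
```

Because the sketch needs only a list of paths, the same function could be run on two listings and the results compared, which loosely mirrors the drift detection use mentioned in the abstract.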

Kimmig, A., Memory, A., Miller, R. J., Getoor, L..  2017.  A Collective, Probabilistic Approach to Schema Mapping. 2017 IEEE 33rd International Conference on Data Engineering (ICDE). :921–932.

We propose a probabilistic approach to the problem of schema mapping. Our approach is declarative, scalable, and extensible. It builds upon recent results in both schema mapping and probabilistic reasoning and contributes novel techniques in both fields. We introduce the problem of mapping selection, that is, choosing the best mapping from a space of potential mappings, given both metadata constraints and a data example. As selection has to reason holistically about the inputs and the dependencies between the chosen mappings, we define a new schema mapping optimization problem which captures interactions between mappings. We then introduce Collective Mapping Discovery (CMD), our solution to this problem using state-of-the-art probabilistic reasoning techniques, which allows for inconsistencies and incompleteness. Using hundreds of realistic integration scenarios, we demonstrate that the accuracy of CMD is more than 33% above that of metadata-only approaches already for small data examples, and that CMD routinely finds perfect mappings even if a quarter of the data is inconsistent.

Diaz, J. S. B., Medeiros, C. B..  2017.  WorkflowHunt: Combining Keyword and Semantic Search in Scientific Workflow Repositories. 2017 IEEE 13th International Conference on e-Science (e-Science). :138–147.

Scientific datasets and the experiments that analyze them are growing in size and complexity, and scientists are having difficulty sharing such resources. Some initiatives have emerged to try to solve this problem. One of them involves the use of scientific workflows to represent and enact experiment execution. There is an increasing number of workflows that are potentially relevant for more than one scientific domain. However, it is hard to find workflows suitable for reuse in a given experiment. Creating a workflow takes time and resources, and reusing existing workflows helps scientists build new ones faster and more reliably. Search mechanisms in workflow repositories should provide different options for workflow discovery, but it is difficult for generic repositories to provide multiple mechanisms. This paper presents WorkflowHunt, a hybrid architecture for workflow search and discovery for generic repositories, which combines keyword and semantic search to allow finding relevant workflows using different search methods. We validated our architecture by creating a prototype that uses real workflows and metadata from myExperiment, and compared search results obtained via WorkflowHunt and via myExperiment's search interface.
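To make the contrast between keyword and semantic search concrete, here is a toy sketch; it is not the WorkflowHunt implementation. It assumes that each workflow carries free-text metadata and a set of ontology concept annotations, and that semantic search matches a query concept against each annotation and its ancestors. The workflow records, the `PARENT` ontology fragment, and all function names are hypothetical.

```python
# Hypothetical toy repository: free-text metadata plus ontology annotations.
WORKFLOWS = {
    "wf1": {"text": "BLAST protein sequence alignment",
            "concepts": {"sequence_alignment"}},
    "wf2": {"text": "phylogenetic tree construction",
            "concepts": {"phylogenetics"}},
    "wf3": {"text": "pairwise alignment of DNA reads",
            "concepts": {"pairwise_alignment"}},
}

# Hypothetical ontology fragment: child concept -> parent concept.
PARENT = {
    "pairwise_alignment": "sequence_alignment",
    "sequence_alignment": "sequence_analysis",
    "phylogenetics": "sequence_analysis",
}


def keyword_search(query):
    """Return workflows whose text shares at least one word with the query."""
    terms = set(query.lower().split())
    return sorted(
        wf for wf, meta in WORKFLOWS.items()
        if terms & set(meta["text"].lower().split())
    )


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    while concept in PARENT:
        concept = PARENT[concept]
        seen.add(concept)
    return seen


def semantic_search(concept):
    """Return workflows annotated with the concept or any of its descendants."""
    return sorted(
        wf for wf, meta in WORKFLOWS.items()
        if any(concept in ancestors(c) for c in meta["concepts"])
    )


print(keyword_search("phylogenetics"))        # exact word match only
print(semantic_search("sequence_analysis"))   # matches via the hierarchy
```

In this toy setup the query "phylogenetics" finds nothing by keyword because the text says "phylogenetic", while the concept-based query still reaches the workflow through its annotation. Offering both search paths side by side is the design choice the abstract describes.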