Title | ES2Vec: Earth Science Metadata Keyword Assignment using Domain-Specific Word Embeddings |
Publication Type | Conference Paper |
Year of Publication | 2020 |
Authors | Ramasubramanian, Muthukumaran; Muhammad, Hassan; Gurung, Iksha; Maskey, Manil; Ramachandran, Rahul |
Conference Name | 2020 SoutheastCon |
Keywords | classifier, compositionality, Geoscience, Keyword Classification, machine learning, metadata, Metadata Discovery Problem, natural language processing, Neural Network, pubcrawl, resilience, Resiliency, Scalability, Semantics, Task Analysis, Tools, user interfaces, Word2Vec |
Abstract | Earth science metadata keyword assignment is a challenging problem. Dataset curators select appropriate keywords from the Global Change Master Directory (GCMD) set of keywords. The keywords are an integral part of search and discovery of these datasets. Hence, selection of keywords is crucial in increasing the discoverability of datasets. Utilizing machine learning techniques, we provide users with automated keyword suggestions as an improved approach to complement manual selection. We trained a machine learning model that leverages the semantic embedding ability of Word2Vec models to process abstracts and suggest relevant keywords. A user interface tool we built to assist data curators in the assignment of such keywords is also described. |
DOI | 10.1109/SoutheastCon44009.2020.9249743 |
Citation Key | ramasubramanian_es2vec_2020 |