Automatic Entity Recognition and Typing in Massive Text Data

Title: Automatic Entity Recognition and Typing in Massive Text Data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Ren, Xiang; El-Kishky, Ahmed; Ji, Heng; Han, Jiawei
Conference Name: Proceedings of the 2016 International Conference on Management of Data
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-3531-7
Keywords: entity, entity recognition, entity typing, phrase mining, phrases, pubcrawl170201, text mining, typing
Abstract

In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, and product reviews to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is essential to understand entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label them with fine-grained types (e.g., person, product, and food) in a scalable way. Because these methods do not rely on annotated data, a predefined typing schema, or hand-crafted features, they can be quickly adapted to new domains, genres, and languages. Using real datasets that span various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. biomedical), and languages (e.g., English, Chinese, Arabic, and even low-resource languages such as Hausa and Yoruba), we demonstrate how these typed entities aid knowledge discovery and management.

URL: http://doi.acm.org/10.1145/2882903.2912567
DOI: 10.1145/2882903.2912567
Citation Key: ren_automatic_2016