Automatic Entity Recognition and Typing in Massive Text Data
Title | Automatic Entity Recognition and Typing in Massive Text Data |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Ren, Xiang; El-Kishky, Ahmed; Ji, Heng; Han, Jiawei |
Conference Name | Proceedings of the 2016 International Conference on Management of Data |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-3531-7 |
Keywords | entity, entity recognition, entity typing, phrase mining, phrases, pubcrawl170201, text mining, typing |
Abstract | In today's computerized and information-based society, individuals are constantly presented with vast amounts of text data, ranging from news articles, scientific publications, product reviews, to a wide range of textual information from social media. To extract value from these large, multi-domain pools of text, it is of great importance to gain an understanding of entities and their relationships. In this tutorial, we introduce data-driven methods to recognize typed entities of interest in massive, domain-specific text corpora. These methods can automatically identify token spans as entity mentions in documents and label their fine-grained types (e.g., people, product and food) in a scalable way. Since these methods do not rely on annotated data, predefined typing schema or hand-crafted features, they can be quickly adapted to a new domain, genre and language. We demonstrate on real datasets including various genres (e.g., news articles, discussion forum posts, and tweets), domains (general vs. bio-medical domains) and languages (e.g., English, Chinese, Arabic, and even low-resource languages like Hausa and Yoruba) how these typed entities aid in knowledge discovery and management. |
URL | http://doi.acm.org/10.1145/2882903.2912567 |
DOI | 10.1145/2882903.2912567 |
Citation Key | ren_automatic_2016 |