Biblio
In this article, I present the questions that I seek to answer in my PhD research. I posit to analyze natural language text with the help of semantic annotations and mine important events for navigating large text corpora. Semantic annotations such as named entities, geographic locations, and temporal expressions can help us mine events from the given corpora. These events thus provide us with useful means to discover the locked knowledge in them. I pose three problems that can help unlock this knowledge vault in semantically annotated text corpora: i. identifying important events; ii. semantic search; iii. and event analytics.
Online conversations, such as blogs, provide rich amount of information and opinions about popular queries. Given a query, traditional blog sites return a set of conversations often consisting of thousands of comments with complex thread structure. Since the interfaces of these blog sites do not provide any overview of the data, it becomes very difficult for the user to explore and analyze such a large amount of conversational data. In this paper, we present MultiConVis, a visual text analytics system designed to support the exploration of a collection of online conversations. Our system tightly integrates NLP techniques for topic modeling and sentiment analysis with information visualizations, by considering the unique characteristics of online conversations. The resulting interface supports the user exploration, starting from a possibly large set of conversations, then narrowing down to the subset of conversations, and eventually drilling-down to the set of comments of one conversation. Our evaluations through case studies with domain experts and a formal user study with regular blog readers illustrate the potential benefits of our approach, when compared to a traditional blog reading interface.
An increasing number of people are using online social networking services (SNSs), and a significant amount of information related to experiences in consumption is shared in this new media form. Text mining is an emerging technique for mining useful information from the web. We aim at discovering in particular tweets semantic patterns in consumers' discussions on social media. Specifically, the purposes of this study are twofold: 1) finding similarity and dissimilarity between two sets of textual documents that include consumers' sentiment polarities, two forms of positive vs. negative opinions and 2) driving actual content from the textual data that has a semantic trend. The considered tweets include consumers' opinions on US retail companies (e.g., Amazon, Walmart). Cosine similarity and K-means clustering methods are used to achieve the former goal, and Latent Dirichlet Allocation (LDA), a popular topic modeling algorithm, is used for the latter purpose. This is the first study which discover semantic properties of textual data in consumption context beyond sentiment analysis. In addition to major findings, we apply LDA (Latent Dirichlet Allocations) to the same data and drew latent topics that represent consumers' positive opinions and negative opinions on social media.
This paper describes a high-performance and space-efficient memory-resident datastore for text analytics systems based on a hash table for fast access, a dynamic trie for staging and a list of Level-Order Unary Degree Sequence (LOUDS) tries for compactness. We achieve efficient memory allocation and data placement by placing freqently access keys in the hash table, and infrequently accessed keys in the LOUDS tries without using conventional cache algorithms. Our algorithm also dynamically changes memory allocation sizes for these data structures according to the remaining available memory size. This technique yields 38.6% to 52.9% better throughput than a double array trie - a conventional fast and compact datastore.
Word clouds have emerged as a straightforward and visually appealing visualization method for text. They are used in various contexts as a means to provide an overview by distilling text down to those words that appear with highest frequency. Typically, this is done in a static way as pure text summarization. We think, however, that there is a larger potential to this simple yet powerful visualization paradigm in text analytics. In this work, we explore the usefulness of word clouds for general text analysis tasks. We developed a prototypical system called the Word Cloud Explorer that relies entirely on word clouds as a visualization method. It equips them with advanced natural language processing, sophisticated interaction techniques, and context information. We show how this approach can be effectively used to solve text analysis tasks and evaluate it in a qualitative user study.
- « first
- ‹ previous
- 1
- 2
- 3