Terms Mining in Document-Based NoSQL: Response to Unstructured Data
Field | Value |
--- | --- |
Title | Terms Mining in Document-Based NoSQL: Response to Unstructured Data |
Publication Type | Conference Paper |
Year of Publication | 2014 |
Authors | Lomotey, R.K., Deters, R. |
Conference Name | 2014 IEEE International Congress on Big Data (BigData Congress) |
Date Published | June |
Keywords | analytics-as-a-service framework, association rules, Big Data, classification, Classification algorithms, clustering, data mining, data mining techniques, database management systems, Databases, Dictionaries, document handling, document-based NoSQL, NoSQL, NoSQL database, pattern classification, pattern clustering, Semantics, term classification, Terms, terms mining, text analysis, topics mining, Unstructured Data Mining, unstructured data storage, Viterbi algorithm |
Abstract | Unstructured data mining has recently become topical due to the availability of high-dimensional and voluminous digital content (known as "Big Data") across the enterprise spectrum. Relational Database Management Systems (RDBMS) have been employed over the past decades for content storage and management, but the ever-growing heterogeneity of today's data calls for a new storage approach. Thus, NoSQL databases have emerged as the preferred storage facility, since they support unstructured data storage. This creates the need to explore efficient techniques for mining data from such NoSQL systems, since the available tools and frameworks, which are designed for RDBMS, are often not directly applicable. In this paper, we focus on topics and terms mining, based on clustering, in document-based NoSQL. This is achieved by adapting the architectural design of an analytics-as-a-service framework and by proposing the Viterbi algorithm to enhance the accuracy of terms classification in the system. The results from the pilot testing of our work show higher accuracy compared with some previously proposed techniques, such as parallel search. |
DOI | 10.1109/BigData.Congress.2014.99 |
Citation Key | 6906842 |
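The abstract names the Viterbi algorithm as the means of improving term classification but does not say how it is wired into the framework. As a rough illustration only, here is the standard Viterbi decoder over a hidden Markov model, assuming that hidden states are term classes and observations are document tokens. The state labels `TOPIC`/`OTHER`, the tokens, and every probability are hypothetical values invented for this sketch; it is not the authors' implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical term classes (hidden states); not taken from the paper.
STATES = ["TOPIC", "OTHER"]
# Toy token sequence (observations) drawn from a document.
OBSERVATIONS = ["nosql", "database", "the", "mining"]
# Illustrative probabilities, invented for this sketch.
START_P = {"TOPIC": 0.5, "OTHER": 0.5}
TRANS_P = {
    "TOPIC": {"TOPIC": 0.7, "OTHER": 0.3},
    "OTHER": {"TOPIC": 0.4, "OTHER": 0.6},
}
EMIT_P = {
    "TOPIC": {"nosql": 0.3, "database": 0.3, "mining": 0.3, "the": 0.1},
    "OTHER": {"nosql": 0.1, "database": 0.1, "mining": 0.1, "the": 0.7},
}


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unseen: float = 1e-6,
) -> Tuple[List[str], float]:
    """Return the most likely state sequence and its log-probability.

    Assumes a non-empty observation sequence and strictly positive
    start and transition probabilities (log(0) is undefined).
    Tokens missing from a state's emission table get probability `unseen`.
    """
    # best[t][s]: highest log-probability of any path that ends in s at step t.
    best: List[Dict[str, float]] = [{}]
    # back[t][s]: predecessor state on that best path (for backtracking).
    back: List[Dict[str, str]] = [{}]
    for s in states:
        best[0][s] = math.log(start_p[s]) + math.log(
            emit_p[s].get(observations[0], unseen)
        )
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s].get(observations[t], unseen))
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


path, log_prob = viterbi(OBSERVATIONS, STATES, START_P, TRANS_P, EMIT_P)
print(path)  # ['TOPIC', 'TOPIC', 'OTHER', 'TOPIC']
```

Working in log space avoids numeric underflow on long token sequences. With these toy numbers, the function word "the" is decoded as `OTHER` while the surrounding domain terms are decoded as `TOPIC`.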