A Novel Feature Hashing With Efficient Collision Resolution for Bag-of-Words Representation of Text Data

Title: A Novel Feature Hashing With Efficient Collision Resolution for Bag-of-Words Representation of Text Data
Publication Type: Conference Paper
Year of Publication: 2018
Authors: Eclarin, Bobby A.; Fajardo, Arnel C.; Medina, Ruji P.
Conference Name: Proceedings of the 2nd International Conference on Natural Language Processing and Information Retrieval
Publisher: ACM
ISBN Number: 978-1-4503-6551-2
Keywords: bag of words, Collision Resolution, composability, Feature Hashing, human factors, Metrics, pubcrawl, Scalability, text analytics, text mining
Abstract: Text mining is widely used in many areas to transform unstructured text data from sources such as patients' records, social media networks, insurance data, and news into an invaluable source of information. The Bag of Words (BoW) representation is a means of extracting features from text data for use in modeling. In text classification, each word in a document is assigned a weight based on its frequency within that document and its frequency across different documents; the words together with their weights form the BoW. One way to cope with voluminous data is the feature hashing method, also known as the hashing trick. However, collisions are inevitable and may alter the results of the whole feature generation and selection process. Using a vector data structure to resolve collisions improves lookup performance while keeping memory usage efficient.
DOI: 10.1145/3278293.3278301
Citation Key: eclarin_novel_2018
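The abstract describes the hashing trick and the collisions it causes. The sketch below is a minimal illustration under stated assumptions, not the authors' method. It uses zlib's crc32 as the hash and a deliberately tiny bucket count so that collisions occur. The chained variant, in which each bucket keeps a small list of [word, count] entries, is one generic way to keep colliding words apart. The function names `bucket_of`, `hashed_counts`, `chained_counts`, and `collisions` are hypothetical.

```python
import zlib

def bucket_of(word: str, n_buckets: int) -> int:
    """Map a word to a bucket index (crc32 is an illustrative, assumed hash choice)."""
    return zlib.crc32(word.encode("utf-8")) % n_buckets

def hashed_counts(tokens, n_buckets):
    """Plain hashing trick: words that collide are summed into the same slot."""
    vec = [0] * n_buckets
    for tok in tokens:
        vec[bucket_of(tok, n_buckets)] += 1
    return vec

def chained_counts(tokens, n_buckets):
    """Collision-aware variant (a generic chaining sketch, assumed here):
    each bucket holds a list of [word, count] entries, so colliding words keep separate counts."""
    buckets = [[] for _ in range(n_buckets)]
    for tok in tokens:
        chain = buckets[bucket_of(tok, n_buckets)]
        for entry in chain:
            if entry[0] == tok:
                entry[1] += 1
                break
        else:
            # Word not yet in this bucket: append a new entry instead of merging.
            chain.append([tok, 1])
    return buckets

def collisions(buckets):
    """Return {bucket index: words} for buckets holding more than one distinct word."""
    return {i: [w for w, _ in chain] for i, chain in enumerate(buckets) if len(chain) > 1}

# Usage: 7 distinct words in 4 buckets, so at least one collision is guaranteed.
tokens = "the cat sat on the mat with the hat".split()
plain = hashed_counts(tokens, 4)
chained = chained_counts(tokens, 4)
print(plain)
print(collisions(chained))
```

The plain vector loses track of which words share a slot. The chained buckets trade a little extra memory for exact per-word counts, and the lookup cost within a bucket grows with the length of its chain.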