Visible to the public Biblio

Filters: Author is Fajardo, Arnel C.  [Clear All Filters]
2019-02-14
Eclarin, Bobby A., Fajardo, Arnel C., Medina, Ruji P..  2018.  A Novel Feature Hashing With Efficient Collision Resolution for Bag-of-Words Representation of Text Data. Proceedings of the 2Nd International Conference on Natural Language Processing and Information Retrieval. :12-16.
Text Mining is widely used in many areas transforming unstructured text data from all sources such as patients' record, social media network, insurance data, and news, among others into an invaluable source of information. The Bag Of Words (BoW) representation is a means of extracting features from text data for use in modeling. In text classification, a word in a document is assigned a weight according to its frequency and frequency between different documents; therefore, words together with their weights form the BoW. One way to solve the issue of voluminous data is to use the feature hashing method or hashing trick. However, collision is inevitable and might change the result of the whole process of feature generation and selection. Using the vector data structure, the lookup performance is improved while resolving collision and the memory usage is also efficient.