Relaxing Orthogonality Assumption in Conceptual Text Document Similarity
Title | Relaxing Orthogonality Assumption in Conceptual Text Document Similarity |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Wang, Xiangru, Nourashrafeddin, Seyednaser, Milios, Evangelos |
Conference Name | Proceedings of the 2016 ACM Symposium on Document Engineering |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-4438-8 |
Keywords | composability, concept relatedness, Human Behavior, Metrics, pubcrawl, Scalability, semantic similarity, text analytics, text clustering, wikipedia |
Abstract | By reflecting the degree of proximity or remoteness of documents, a similarity measure plays a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space formed by words as dimensions. Words are considered independent of each other, and document similarity is computed based on lexical overlap. This assumption is also made in the bag-of-concepts representation of documents, where the space is formed by concepts. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. By employing Wikipedia as an external resource, we introduce five similarity measures using concept-concept relatedness. Experimental results on real text datasets reveal that eliminating the orthogonality assumption improves the quality of text clustering algorithms. |
URL | http://doi.acm.org/10.1145/2960811.2960813 |
DOI | 10.1145/2960811.2960813 |
Citation Key | wang_relaxing_2016 |
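
The abstract contrasts cosine similarity, which treats every dimension as orthogonal, with measures that use concept-concept relatedness. The record does not specify the paper's five measures, so the sketch below is not the authors' method. It illustrates the general idea with the well-known soft cosine measure, assuming a small hypothetical relatedness matrix. The concept names, the relatedness values, and the function names `cosine` and `soft_cosine` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Standard cosine similarity: dimensions are treated as orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def soft_cosine(a, b, rel):
    """Soft cosine: rel[i][j] is the assumed relatedness of concepts i and j.

    With rel equal to the identity matrix this reduces to plain cosine.
    """
    def bilinear(u, v):
        # Computes u^T * rel * v, so related dimensions contribute to the score.
        return sum(u[i] * rel[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical concept space: index 0 = "car", 1 = "automobile", 2 = "banana".
# The relatedness values are made up; the paper derives relatedness from Wikipedia.
relatedness = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
identity = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

doc_a = [1.0, 0.0, 0.0]  # mentions only "car"
doc_b = [0.0, 1.0, 0.0]  # mentions only "automobile"
doc_c = [0.0, 0.0, 1.0]  # mentions only "banana"

print(cosine(doc_a, doc_b))                    # no shared concept, so 0.0
print(soft_cosine(doc_a, doc_b, relatedness))  # related concepts score above 0
print(soft_cosine(doc_a, doc_c, relatedness))  # unrelated concepts stay at 0.0
```

In this toy setting the two documents share no concept, so plain cosine sees them as unrelated, while the relatedness-aware score credits the link between "car" and "automobile". Unrelated documents still score zero under both measures.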