Relaxing Orthogonality Assumption in Conceptual Text Document Similarity

Title: Relaxing Orthogonality Assumption in Conceptual Text Document Similarity
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Wang, Xiangru, Nourashrafeddin, Seyednaser, Milios, Evangelos
Conference Name: Proceedings of the 2016 ACM Symposium on Document Engineering
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4438-8
Keywords: composability, concept relatedness, Human Behavior, Metrics, pubcrawl, Scalability, semantic similarity, text analytics, text clustering, wikipedia
Abstract

By reflecting how close or distant documents are, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space whose dimensions are words. Words are treated as independent of each other, and document similarity is computed from lexical overlap alone. The bag-of-concepts representation of documents makes the same assumption, except that the dimensions of the space are concepts rather than words. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. Using Wikipedia as an external resource, we introduce five similarity measures based on concept-concept relatedness. Experimental results on real text datasets show that eliminating the orthogonality assumption improves the quality of text clustering.
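The effect of the orthogonality assumption can be illustrated with a minimal sketch. It contrasts plain cosine similarity with the soft cosine measure, a generic, well-known way of plugging a concept-relatedness matrix into the cosine formula. This is not one of the five measures proposed in the paper, and the concept names and relatedness scores below are hypothetical placeholders rather than Wikipedia-derived values.

```python
import math


def cosine(a, b):
    """Plain cosine similarity: every dimension is implicitly orthogonal to every other."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def soft_cosine(a, b, rel):
    """Soft cosine similarity (generic technique, not the paper's measure).

    rel[i][j] is the assumed relatedness of concepts i and j, with rel[i][i] == 1.
    rel should be positive semidefinite so the denominator is well defined.
    With rel equal to the identity matrix this reduces to plain cosine.
    """
    def bilinear(u, v):
        # Computes u^T * rel * v, letting related but distinct concepts contribute.
        return sum(u[i] * rel[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical concept indices: 0 = "Car", 1 = "Automobile", 2 = "Banana".
# Hypothetical relatedness scores, chosen only for illustration.
rel = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

doc1 = [1.0, 0.0, 0.0]  # bag of concepts mentioning only "Car"
doc2 = [0.0, 1.0, 0.0]  # bag of concepts mentioning only "Automobile"

print(cosine(doc1, doc2))                 # 0.0: no shared dimension
print(soft_cosine(doc1, doc2, rel))       # 0.9: related concepts now count
print(soft_cosine(doc1, doc2, identity))  # 0.0: identity matrix restores orthogonality
```

Under the orthogonal (identity) assumption the two documents look unrelated even though "Car" and "Automobile" denote nearly the same concept; a non-identity relatedness matrix removes that blind spot, which is the intuition behind relaxing the assumption.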

URL: http://doi.acm.org/10.1145/2960811.2960813
DOI: 10.1145/2960811.2960813
Citation Key: wang_relaxing_2016