2017-05-19
Wang, Xiangru, Nourashrafeddin, Seyednaser, Milios, Evangelos. 2016. Relaxing Orthogonality Assumption in Conceptual Text Document Similarity. Proceedings of the 2016 ACM Symposium on Document Engineering. pp. 69–78.

By reflecting how close or distant documents are from one another, similarity measures play a key role in text analytics. Traditional measures, e.g. cosine similarity, assume that documents are represented in an orthogonal space in which words form the dimensions. Words are considered independent of each other, and document similarity is computed from lexical overlap. The bag-of-concepts representation of documents makes the same assumption, except that the dimensions of the space are concepts rather than words. This paper proposes new semantic similarity measures that do not rely on the orthogonality assumption. Using Wikipedia as an external resource, we introduce five similarity measures based on concept-concept relatedness. Experimental results on real text datasets show that eliminating the orthogonality assumption improves the quality of text clustering algorithms.
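The abstract does not define the five proposed measures, so the sketch below does not reproduce them. Instead it uses a different, well-known construction, the soft cosine measure, purely to illustrate what relaxing orthogonality means. Ordinary cosine similarity implicitly uses an identity matrix between dimensions, so two documents about "Car" and "Automobile" score 0. A relatedness matrix `rel` (a hypothetical name) with nonzero off-diagonal entries lets related concepts contribute to the score. The concept names and the 0.8 relatedness value are made up for illustration; they are not the Wikipedia-derived relatedness used in the paper. `rel` is assumed to be symmetric and positive semi-definite so that the square root in the denominator is defined.

```python
import math


def cosine(a, b):
    """Standard cosine similarity: dimensions are treated as orthogonal,
    so only shared (overlapping) dimensions contribute to the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def soft_cosine(a, b, rel):
    """Soft cosine: a^T R b / sqrt((a^T R a) * (b^T R b)).

    rel[i][j] is an assumed relatedness score between concept i and
    concept j, with rel[i][i] == 1. With rel equal to the identity
    matrix this reduces to the ordinary cosine above."""
    def bilinear(u, v):
        # Sum over every pair of dimensions, weighted by their relatedness.
        return sum(u[i] * rel[i][j] * v[j]
                   for i in range(len(u)) for j in range(len(v)))

    denom = math.sqrt(bilinear(a, a) * bilinear(b, b))
    return bilinear(a, b) / denom if denom else 0.0


# Hypothetical bag-of-concepts vectors over ["Car", "Automobile", "Banana"].
doc1 = [1, 0, 0]  # mentions only "Car"
doc2 = [0, 1, 0]  # mentions only "Automobile"

# Hypothetical relatedness: "Car" and "Automobile" are strongly related.
rel = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

print(cosine(doc1, doc2))            # 0.0: no lexical/concept overlap
print(soft_cosine(doc1, doc2, rel))  # 0.8: related concepts now count
```

Under the orthogonality assumption the two documents look unrelated; once concept-concept relatedness is allowed into the computation, they become similar. How the relatedness scores are obtained (in the paper, from Wikipedia) is the part this sketch deliberately leaves out.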