Biblio
One way of evaluating the reusability of a test collection is to determine whether removing the unique contributions of some system would alter the preference order between that system and others. Rank correlation measures such as Kendall's tau are often used for this purpose. Rank correlation measures are appropriate for ordinal measures in which only preference order is important, but many evaluation measures produce system scores in which both the preference order and the magnitude of the score difference are important. Such measures are referred to as interval. Pearson's rho offers one way in which correlation can be computed over results from an interval measure such that smaller errors in the gap size are preferred. When seeking to improve over existing systems, we care the most about comparisons among the best systems. For that purpose we prefer head-weighed measures such as tau\_AP, which is designed for ordinal data. No present head weighted measure fully leverages the information present in interval effectiveness measures. This paper introduces such a measure, referred to as Pearson Rank.