Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams

Title: Time Adaptive Sketches (Ada-Sketches) for Summarizing Data Streams
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Shrivastava, Anshumali; König, Arnd Christian; Bilenko, Mikhail
Conference Name: Proceedings of the 2016 International Conference on Management of Data
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-3531-7
Keywords: Algorithm, approximate counting algorithms, big-data mining, composability, count-min sketches, hash algorithms, hashing, Metrics, pubcrawl, randomized algorithms, Resiliency, Scalability, Sketching, streaming
Abstract

Obtaining frequency information about data streams in limited space is a well-recognized problem in the literature. A number of recent practical applications (such as those in computational advertising) require temporally-aware solutions: obtaining historical count statistics for both time points and time ranges. In these scenarios, the accuracy of estimates is typically more important for recent instances than for older ones; we call this desirable property Time Adaptiveness. With this observation, [20] introduced the Hokusai technique, based on count-min sketches, for estimating the frequency of any given item at any given time. That approach is problematic in practice, as its memory requirements grow linearly with time and it produces discontinuities in the estimation accuracy. In this work, we describe a new method, Time-adaptive Sketches (Ada-sketches), that overcomes these limitations while extending, and strictly generalizing, several popular sketching algorithms. The core idea of our method is inspired by the well-known Dolby noise reduction procedure that dates back to the 1960s. The theoretical analysis presented may be of independent interest, as it provides clear results for the time-adaptive nature of the errors. An experimental evaluation on real streaming datasets demonstrates the superiority of the described method over Hokusai in estimating point and range queries over time. The method is simple to implement and offers a variety of design choices for future extensions. The simplicity of the procedure and its generalization of classic sketching techniques suggest that Ada-sketches could be widely applicable in practice.
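The abstract describes the method only at a high level. As a minimal sketch, assuming the Dolby analogy means scaling each update by an increasing weight f(t) before it enters a count-min sketch keyed on (item, time) (pre-emphasis) and dividing the estimate by f(t) at query time (de-emphasis), a point-query structure could look like the following. The class name `AdaCountMin`, the weight function `f`, and the use of Python's salted built-in `hash` in place of independent hash functions are all illustrative assumptions, not the paper's implementation; range queries are not covered.

```python
import random


class AdaCountMin:
    """Hypothetical time-adaptive count-min sketch over (item, time) keys.

    f is an assumed, monotonically increasing weight function of time.
    With f(t) == 1 this reduces to an ordinary count-min sketch.
    """

    def __init__(self, width, depth, f, seed=0):
        self.width = width
        self.depth = depth
        self.f = f
        rng = random.Random(seed)
        # One random salt per row; hashing (salt, item, t) stands in for
        # independent hash functions (an illustrative assumption).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.table = [[0.0] * width for _ in range(depth)]

    def _bucket(self, row, item, t):
        return hash((self.salts[row], item, t)) % self.width

    def update(self, item, t, count=1):
        # Pre-emphasis: scale the increment by f(t) before inserting it.
        weighted = self.f(t) * count
        for row in range(self.depth):
            self.table[row][self._bucket(row, item, t)] += weighted

    def query(self, item, t):
        # Standard count-min estimate: the minimum over all rows ...
        est = min(self.table[row][self._bucket(row, item, t)]
                  for row in range(self.depth))
        # ... followed by de-emphasis: divide by the same weight f(t).
        return est / self.f(t)


if __name__ == "__main__":
    # Exponential weights favour recent time steps (an assumed choice of f).
    sketch = AdaCountMin(width=50, depth=3, f=lambda t: 2.0 ** t, seed=1)
    for t in range(10):
        for item in range(20):
            sketch.update(item, t)
    print("estimate at t=0:", sketch.query(0, 0))
    print("estimate at t=9:", sketch.query(0, 9))
```

Under this assumption, a colliding key from time t' contributes f(t')/f(t) to the estimate for time t, so collisions with newer entries inflate old estimates heavily while older entries barely disturb recent ones; this is one way the time-adaptive error profile described in the abstract could arise. Since all weights are positive, the estimate never falls below the true count, as with a plain count-min sketch.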

URL: http://doi.acm.org/10.1145/2882903.2882946
DOI: 10.1145/2882903.2882946
Citation Key: shrivastava_time_2016