Sequential Data Cleaning: A Statistical Approach
Title | Sequential Data Cleaning: A Statistical Approach |
Publication Type | Conference Paper |
Year of Publication | 2016 |
Authors | Zhang, Aoqian; Song, Shaoxu; Wang, Jianmin |
Conference Name | Proceedings of the 2016 International Conference on Management of Data |
Publisher | ACM |
Conference Location | New York, NY, USA |
ISBN Number | 978-1-4503-3531-7 |
Keywords | likelihood-based cleaning, pubcrawl170201, speed changes |
Abstract | Errors are prevalent in data sequences, such as GPS trajectories or sensor readings. Existing methods for cleaning sequential data employ a constraint on value changing speeds and perform constraint-based repairing. While such speed constraints are effective in identifying large spike errors, small errors that do not significantly deviate from the truth, and indeed satisfy the speed constraints, can hardly be identified and repaired. To handle such small errors, in this paper, we propose a statistics-based cleaning method. Rather than declaring a broad constraint of max/min speeds, we model the probability distribution of speed changes. The repairing problem is thus to maximize the likelihood of the sequence w.r.t. the probability of speed changes. We formalize the likelihood-based cleaning problem, show its NP-hardness, devise exact algorithms, and propose several approximate/heuristic methods to trade off effectiveness for efficiency. Experiments on real data sets (in various applications) demonstrate the superiority of our proposal. |
URL | http://doi.acm.org/10.1145/2882903.2915233 |
DOI | 10.1145/2882903.2915233 |
Citation Key | zhang_sequential_2016 |
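
Illustration (not from the paper): the abstract contrasts a max/min speed constraint with a likelihood over speed changes. The sketch below is a minimal, assumed reading of that idea, not the authors' algorithm. It treats a speed change as the difference between consecutive speeds, fits a simple histogram to a reference sequence, and scores candidate sequences by log-likelihood. The function names, the histogram model, the bin width, the probability floor, the speed bound of 2, and all data values are hypothetical choices made for this example.

```python
import math
from collections import Counter

def speeds(times, values):
    """Rate of value change between consecutive points."""
    return [(values[i + 1] - values[i]) / (times[i + 1] - times[i])
            for i in range(len(values) - 1)]

def speed_changes(times, values):
    """Difference between consecutive speeds (assumed definition)."""
    s = speeds(times, values)
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]

def fit_distribution(changes, bin_width=1.0):
    """Empirical histogram of speed changes (a deliberately simple model)."""
    counts = Counter(round(c / bin_width) for c in changes)
    total = sum(counts.values())
    return {b: n / total for b, n in counts.items()}

def log_likelihood(times, values, dist, bin_width=1.0, floor=1e-6):
    """Sum of log-probabilities of the sequence's speed changes.
    Unseen bins get a small floor probability (an assumption)."""
    return sum(math.log(dist.get(round(c / bin_width), floor))
               for c in speed_changes(times, values))

# Hypothetical reference sequence used to estimate the distribution.
ref_times = list(range(10))
ref_values = [0, 1, 2, 3, 5, 6, 7, 8, 9, 10]
dist = fit_distribution(speed_changes(ref_times, ref_values))

# Hypothetical observation with a small error at index 3 (4 instead of 3).
times = list(range(6))
observed = [0, 1, 2, 4, 4, 5]
repaired = [0, 1, 2, 3, 4, 5]

# The erroneous sequence still satisfies an assumed speed bound of 2 ...
within_bound = all(abs(s) <= 2 for s in speeds(times, observed))

# ... but its speed changes are less likely than those of the repair.
observed_ll = log_likelihood(times, observed, dist)
repaired_ll = log_likelihood(times, repaired, dist)
```

Here `within_bound` is true, so a speed-constraint check alone would not flag the observation, while `repaired_ll` exceeds `observed_ll`, so a likelihood-based criterion prefers the repaired sequence. The sketch only scores given candidates; it does not search for the maximum-likelihood repair, which the abstract states is NP-hard in general.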