Visible to the public Biblio

Found 2348 results

Filters: Keyword is privacy  [Clear All Filters]
2018-09-05
Gai, K., Qiu, M..  2017.  An Optimal Fully Homomorphic Encryption Scheme. 2017 ieee 3rd international conference on big data security on cloud (bigdatasecurity), ieee international conference on high performance and smart computing (hpsc), and ieee international conference on intelligent data and security (ids). :101–106.

The expeditious expansion of the networking technologies have remarkably driven the usage of the distributedcomputing as well as services, such as task offloading to the cloud. However, security and privacy concerns are restricting the implementations of cloud computing because of the threats from both outsiders and insiders. The primary alternative of protecting users' data is developing a Fully Homomorphic Encryption (FHE) scheme, which can cover both data protections and data processing in the cloud. Despite many previous attempts addressing this approach, none of the proposed work can simultaneously satisfy two requirements that include the non-noise accuracy and an efficiency execution. This paper focuses on the issue of FHE design and proposes a novel FHE scheme, which is called Optimal Fully Homomorphic Encryption (O-FHE). Our approach utilizes the properties of the Kronecker Product (KP) and designs a mechanism of achieving FHE, which consider both accuracy and efficiency. We have assessed our scheme in both theoretical proofing and experimental evaluations with the confirmed and exceptional results.

Mayle, A., Bidoki, N. H., Masnadi, S., Boeloeni, L., Turgut, D..  2017.  Investigating the Value of Privacy within the Internet of Things. GLOBECOM 2017 - 2017 IEEE Global Communications Conference. :1–6.

Many companies within the Internet of Things (IoT) sector rely on the personal data of users to deliver and monetize their services, creating a high demand for personal information. A user can be seen as making a series of transactions, each involving the exchange of personal data for a service. In this paper, we argue that privacy can be described quantitatively, using the game- theoretic concept of value of information (VoI), enabling us to assess whether each exchange is an advantageous one for the user. We introduce PrivacyGate, an extension to the Android operating system built for the purpose of studying privacy of IoT transactions. An example study, and its initial results, are provided to illustrate its capabilities.

Jia, R., Dong, R., Ganesh, P., Sastry, S., Spanos, C..  2017.  Towards a theory of free-lunch privacy in cyber-physical systems. 2017 55th Annual Allerton Conference on Communication, Control, and Computing (Allerton). :902–910.

Emerging cyber-physical systems (CPS) often require collecting end users' data to support data-informed decision making processes. There has been a long-standing argument as to the tradeoff between privacy and data utility. In this paper, we adopt a multiparametric programming approach to rigorously study conditions under which data utility has to be sacrificed to protect privacy and situations where free-lunch privacy can be achieved, i.e., data can be concealed without hurting the optimality of the decision making underlying the CPS. We formalize the concept of free-lunch privacy, and establish various results on its existence, geometry, as well as efficient computation methods. We propose the free-lunch privacy mechanism, which is a pragmatic mechanism that exploits free-lunch privacy if it exists with the constant guarantee of optimal usage of data. We study the resilience of this mechanism against attacks that attempt to infer the parameter of a user's data generating process. We close the paper by a case study on occupancy-adaptive smart home temperature control to demonstrate the efficacy of the mechanism.

Gaikwad, V. S., Gandle, K. S..  2017.  Ideal complexity cryptosystem with high privacy data service for cloud databases. 2017 1st International Conference on Intelligent Systems and Information Management (ICISIM). :267–270.

Data storage in cloud should come along with high safety and confidentiality. It is accountability of cloud service provider to guarantee the availability and security of client data. There exist various alternatives for storage services but confidentiality and complexity solutions for database as a service are still not satisfactory. Proposed system gives alternative solution for database as a service that integrates benefits of different services along with advance encryption techniques. It yields possibility of applying concurrency on encrypted data. This alternative provides supporting facility to connect dispersed clients with elimination of intermediate proxy by which simplicity can acquired. Performance of proposed system evaluated on basis of theoretical analyses.

Li, C., Palanisamy, B., Joshi, J..  2017.  Differentially Private Trajectory Analysis for Points-of-Interest Recommendation. 2017 IEEE International Congress on Big Data (BigData Congress). :49–56.

Ubiquitous deployment of low-cost mobile positioning devices and the widespread use of high-speed wireless networks enable massive collection of large-scale trajectory data of individuals moving on road networks. Trajectory data mining finds numerous applications including understanding users' historical travel preferences and recommending places of interest to new visitors. Privacy-preserving trajectory mining is an important and challenging problem as exposure of sensitive location information in the trajectories can directly invade the location privacy of the users associated with the trajectories. In this paper, we propose a differentially private trajectory analysis algorithm for points-of-interest recommendation to users that aims at maximizing the accuracy of the recommendation results while protecting the privacy of the exposed trajectories with differential privacy guarantees. Our algorithm first transforms the raw trajectory dataset into a bipartite graph with nodes representing the users and the points-of-interest and the edges representing the visits made by the users to the locations, and then extracts the association matrix representing the bipartite graph to inject carefully calibrated noise to meet έ-differential privacy guarantees. A post-processing of the perturbed association matrix is performed to suppress noise prior to performing a Hyperlink-Induced Topic Search (HITS) on the transformed data that generates an ordered list of recommended points-of-interest. Extensive experiments on a real trajectory dataset show that our algorithm is efficient, scalable and demonstrates high recommendation accuracy while meeting the required differential privacy guarantees.

Takbiri, N., Houmansadr, A., Goeckel, D. L., Pishro-Nik, H..  2017.  Limits of location privacy under anonymization and obfuscation. 2017 IEEE International Symposium on Information Theory (ISIT). :764–768.

The prevalence of mobile devices and location-based services (LBS) has generated great concerns regarding the LBS users' privacy, which can be compromised by statistical analysis of their movement patterns. A number of algorithms have been proposed to protect the privacy of users in such systems, but the fundamental underpinnings of such remain unexplored. Recently, the concept of perfect location privacy was introduced and its achievability was studied for anonymization-based LBS systems, where user identifiers are permuted at regular intervals to prevent identification based on statistical analysis of long time sequences. In this paper, we significantly extend that investigation by incorporating the other major tool commonly employed to obtain location privacy: obfuscation, where user locations are purposely obscured to protect their privacy. Since anonymization and obfuscation reduce user utility in LBS systems, we investigate how location privacy varies with the degree to which each of these two methods is employed. We provide: (1) achievability results for the case where the location of each user is governed by an i.i.d. process; (2) converse results for the i.i.d. case as well as the more general Markov Chain model. We show that, as the number of users in the network grows, the obfuscation-anonymization plane can be divided into two regions: in the first region, all users have perfect location privacy; and, in the second region, no user has location privacy.

Palanisamy, B., Li, C., Krishnamurthy, P..  2017.  Group Differential Privacy-Preserving Disclosure of Multi-level Association Graphs. 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). :2587–2588.

Traditional privacy-preserving data disclosure solutions have focused on protecting the privacy of individual's information with the assumption that all aggregate (statistical) information about individuals is safe for disclosure. Such schemes fail to support group privacy where aggregate information about a group of individuals may also be sensitive and users of the published data may have different levels of access privileges entitled to them. We propose the notion ofεg-Group Differential Privacy that protects sensitive information of groups of individuals at various defined privacy levels, enabling data users to obtain the level of access entitled to them. We present a preliminary evaluation of the proposed notion of group privacy through experiments on real association graph data that demonstrate the guarantees on group privacy on the disclosed data.

Li, W., Song, T., Li, Y., Ma, L., Yu, J., Cheng, X..  2017.  A Hierarchical Game Framework for Data Privacy Preservation in Context-Aware IoT Applications. 2017 IEEE Symposium on Privacy-Aware Computing (PAC). :176–177.

Due to the increasing concerns of securing private information, context-aware Internet of Things (IoT) applications are in dire need of supporting data privacy preservation for users. In the past years, game theory has been widely applied to design secure and privacy-preserving protocols for users to counter various attacks, and most of the existing work is based on a two-player game model, i.e., a user/defender-attacker game. In this paper, we consider a more practical scenario which involves three players: a user, an attacker, and a service provider, and such a complicated system renders any two-player model inapplicable. To capture the complex interactions between the service provider, the user, and the attacker, we propose a hierarchical two-layer three-player game framework. Finally, we carry out a comprehensive numerical study to validate our proposed game framework and theoretical analysis.

2018-08-23
Wang, Ruowen, Azab, Ahmed M., Enck, William, Li, Ninghui, Ning, Peng, Chen, Xun, Shen, Wenbo, Cheng, Yueqiang.  2017.  SPOKE: Scalable Knowledge Collection and Attack Surface Analysis of Access Control Policy for Security Enhanced Android. Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security. :612–624.

SEAndroid is a mandatory access control (MAC) framework that can confine faulty applications on Android. Nevertheless, the effectiveness of SEAndroid enforcement depends on the employed policy. The growing complexity of Android makes it difficult for policy engineers to have complete domain knowledge on every system functionality. As a result, policy engineers sometimes craft over-permissive and ineffective policy rules, which unfortunately increased the attack surface of the Android system and have allowed multiple real-world privilege escalation attacks. We propose SPOKE, an SEAndroid Policy Knowledge Engine, that systematically extracts domain knowledge from rich-semantic functional tests and further uses the knowledge for characterizing the attack surface of SEAndroid policy rules. Our attack surface analysis is achieved by two steps: 1) It reveals policy rules that cannot be justified by the collected domain knowledge. 2) It identifies potentially over-permissive access patterns allowed by those unjustified rules as the attack surface. We evaluate SPOKE using 665 functional tests targeting 28 different categories of functionalities developed by Samsung Android Team. SPOKE successfully collected 12,491 access patterns for the 28 categories as domain knowledge, and used the knowledge to reveal 320 unjustified policy rules and 210 over-permissive access patterns defined by those rules, including one related to the notorious libstagefright vulnerability. These findings have been confirmed by policy engineers.

Zou, Yang, Zeng, Xiaoqin, Liu, Yufeng, Liu, Huiyi.  2017.  Partial Precedence of Context-sensitive Graph Grammars. Proceedings of the 10th International Symposium on Visual Information Communication and Interaction. :16–23.
Context-sensitive graph grammars have been rigorous formalisms for specifying visual programming languages, as they possess sufficient expressive powers and intuitive forms. Efficient parsing mechanisms are essential to these formalisms. However, the existent parsing algorithms are either inefficient or confined to a minority of graph grammars. This paper introduces the notion of partial precedence, defines the partial precedence graph of a graph grammar and theoretically unveils the existence of a valid parsing path conforming to the topological orderings of the partial precedence graph. Then, it provides algorithms for computing the partial precedence graph and presents an approach to improving general parsing algorithms with the graph based on the drawn conclusion. It is shown that the approach can considerably improve the efficiency of general parsing algorithms.
Deterding, Sebastian, Hook, Jonathan, Fiebrink, Rebecca, Gillies, Marco, Gow, Jeremy, Akten, Memo, Smith, Gillian, Liapis, Antonios, Compton, Kate.  2017.  Mixed-Initiative Creative Interfaces. Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems. :628–635.

Enabled by artificial intelligence techniques, we are witnessing the rise of a new paradigm of computational creativity support: mixed-initiative creative interfaces put human and computer in a tight interactive loop where each suggests, produces, evaluates, modifies, and selects creative outputs in response to the other. This paradigm could broaden and amplify creative capacity for all, but has so far remained mostly confined to artificial intelligence for game content generation, and faces many unsolved interaction design challenges. This workshop therefore convenes CHI and game researchers to advance mixed-initiative approaches to creativity support.

Vora, Keval, Tian, Chen, Gupta, Rajiv, Hu, Ziang.  2017.  CoRAL: Confined Recovery in Distributed Asynchronous Graph Processing. Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. :223–236.
Existing distributed asynchronous graph processing systems employ checkpointing to capture globally consistent snapshots and rollback all machines to most recent checkpoint to recover from machine failures. In this paper we argue that recovery in distributed asynchronous graph processing does not require the entire execution state to be rolled back to a globally consistent state due to the relaxed asynchronous execution semantics. We define the properties required in the recovered state for it to be usable for correct asynchronous processing and develop CoRAL, a lightweight checkpointing and recovery algorithm. First, this algorithm carries out confined recovery that only rolls back graph execution states of the failed machines to affect recovery. Second, it relies upon lightweight checkpoints that capture locally consistent snapshots with a reduced peak network bandwidth requirement. Our experiments using real-world graphs show that our technique recovers from failures and finishes processing 1.5x to 3.2x faster compared to the traditional asynchronous checkpointing and recovery mechanism when failures impact 1 to 6 machines of a 16 machine cluster. Moreover, capturing locally consistent snapshots significantly reduces intermittent high peak bandwidth usage required to save the snapshots – the average reduction in 99th percentile bandwidth ranges from 22% to 51% while 1 to 6 snapshot replicas are being maintained.
Seal, S. K., Cianciosa, M. R., Hirshman, S. P., Wingen, A., Wilcox, R. S., Unterberg, E. A..  2017.  Parallel Reconstruction of Three Dimensional Magnetohydrodynamic Equilibria in Plasma Confinement Devices. 2017 46th International Conference on Parallel Processing (ICPP). :282–291.

Fast, accurate three dimensional reconstructions of plasma equilibria, crucial for physics interpretation of fusion data generated within confinement devices like stellarators/ tokamaks, are computationally very expensive and routinely require days, even weeks, to complete using serial approaches. Here, we present a parallel implementation of the three dimensional plasma reconstruction code, V3FIT. A formal analysis to identify the performance bottlenecks and scalability limits of this new parallel implementation, which combines both task and data parallelism, is presented. The theoretical findings are supported by empirical performance results on several thousands of processor cores of a Cray XC30 supercomputer. Parallel V3FIT is shown to deliver over 40X speedup, enabling fusion scientists to carry out three dimensional plasma equilibrium reconstructions at unprecedented scales in only a few hours (instead of in days/weeks) for the first time.

Avrutin, E. A., Ryvkin, B. S., Kostamovaara, J. T..  2017.  Increasing output power of pulsed-eye safe wavelength range laser diodes by strong doping of the n-optical confinement layer. 2017 IEEE High Power Diode Lasers and Systems Conference (HPD). :17–18.

A semi-analytical model for internal optical losses at high power in a 1.5 μm laser diode with strong n-doping in the n-side of the optical confinement layer is created. The model includes intervalence band absorption by holes supplied by both current flow and two-photon absorption. The resulting losses are shown to be substantially lower than those in a similar, but weakly doped structure. Thus a significant improvement in the output power and efficiency by strong n-doping is predicted.

Ziegler, A., Luisier, M..  2017.  Phonon confinement effects in diffusive quantum transport simulations with the effective mass approximation and k·p method. 2017 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD). :25–28.

Despite the continuous shrinking of the transistor dimensions, advanced modeling tools going beyond the ballistic limit of transport are still critically needed to ensure accurate device investigations. For that purpose we present here a straight-forward approach to include phonon confinement effects into dissipative quantum transport calculations based on the effective mass approximation (EMA) and the k·p method. The idea is to scale the magnitude of the deformation potentials describing the electron-phonon coupling to obtain the same low-field mobility as with full-band simulations and confined phonons. This technique is validated by demonstrating that after adjusting the mobility value of n- and p-type silicon nanowire transistors, the resulting EMA and k·p I-V characteristics agree well with those derived from full-band studies.

Ji, X., Yao, X., Tadayon, M. A., Mohanty, A., Hendon, C. P., Lipson, M..  2017.  High confinement and low loss Si3N4waveguides for miniaturizing optical coherence tomography. 2017 Conference on Lasers and Electro-Optics (CLEO). :1–2.

We show high confinement thermally tunable, low loss (0.27 ± 0.04 dB/cm) Si3N4waveguides that are 42 cm long. We show that this platform can enable the miniaturization of traditionally bulky active OCT components.

Bader, S., Gerlach, P., Michalzik, R..  2017.  Optically controlled current confinement in parallel-driven VCSELs. 2017 Conference on Lasers and Electro-Optics Europe European Quantum Electronics Conference (CLEO/Europe-EQEC). :1–1.

We have presented a unique PT-VCSEL arrangement which experimentally demonstrates the process of optically controlled current confinement. Lessons learned will be transferred to future generations of solitary device which will be optimized with respect to the degree of confinement (depending on the parameters of the PT, in particular the current gain), threshold current and electro-optic efficiency.

Keeler, G. A., Campione, S., Wood, M. G., Serkland, D. K., Parameswaran, S., Ihlefeld, J., Luk, T. S., Wendt, J. R., Geib, K. M..  2017.  Reducing optical confinement losses for fast, efficient nanophotonic modulators. 2017 IEEE Photonics Society Summer Topical Meeting Series (SUM). :201–202.

We demonstrate high-speed operation of ultracompact electroabsorption modulators based on epsilon-near-zero confinement in indium oxide (In$_\textrm2$$_\textrm3$\$) on silicon using field-effect carrier density tuning. Additionally, we discuss strategies to enhance modulator performance and reduce confinement-related losses by introducing high-mobility conducting oxides such as cadmium oxide (CdO).

Malavolta, Giulio, Moreno-Sanchez, Pedro, Kate, Aniket, Maffei, Matteo, Ravi, Srivatsan.  2017.  Concurrency and Privacy with Payment-Channel Networks. Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. :455–471.
Permissionless blockchains protocols such as Bitcoin are inherently limited in transaction throughput and latency. Current efforts to address this key issue focus on off-chain payment channels that can be combined in a Payment-Channel Network (PCN) to enable an unlimited number of payments without requiring to access the blockchain other than to register the initial and final capacity of each channel. While this approach paves the way for low latency and high throughput of payments, its deployment in practice raises several privacy concerns as well as technical challenges related to the inherently concurrent nature of payments that have not been sufficiently studied so far. In this work, we lay the foundations for privacy and concurrency in PCNs, presenting a formal definition in the Universal Composability framework as well as practical and provably secure solutions. In particular, we present Fulgor and Rayo. Fulgor is the first payment protocol for PCNs that provides provable privacy guarantees for PCNs and is fully compatible with the Bitcoin scripting system. However, Fulgor is a blocking protocol and therefore prone to deadlocks of concurrent payments as in currently available PCNs. Instead, Rayo is the first protocol for PCNs that enforces non-blocking progress (i.e., at least one of the concurrent payments terminates). We show through a new impossibility result that non-blocking progress necessarily comes at the cost of weaker privacy. At the core of Fulgor and Rayo is Multi-Hop HTLC, a new smart contract, compatible with the Bitcoin scripting system, that provides conditional payments while reducing running time and communication overhead with respect to previous approaches. Our performance evaluation of Fulgor and Rayo shows that a payment with 10 intermediate users takes as few as 5 seconds, thereby demonstrating their feasibility to be deployed in practice.
Li, Xin.  2017.  Improved Non-malleable Extractors, Non-malleable Codes and Independent Source Extractors. Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing. :1144–1156.
In this paper we give improved constructions of several central objects in the literature of randomness extraction and tamper-resilient cryptography. Our main results are: (1) An explicit seeded non-malleable extractor with error � and seed length d=O(logn)+O(log(1/�)loglog(1/�)), that supports min-entropy k=Ω(d) and outputs Ω(k) bits. Combined with the protocol by Dodis and Wichs, this gives a two round privacy amplification protocol with optimal entropy loss in the presence of an active adversary, for all security parameters up to Ω(k/logk), where k is the min-entropy of the shared weak random source. Previously, the best known seeded non-malleable extractors require seed length and min-entropy O(logn)+log(1/�)2O�loglog(1/�), and only give two round privacy amplification protocols with optimal entropy loss for security parameter up to k/2O(�logk). (2) An explicit non-malleable two-source extractor for min entropy k � (1��)n, some constant �\textbackslashtextgreater0, that outputs Ω(k) bits with error 2�Ω(n/logn). We further show that we can efficiently uniformly sample from the pre-image of any output of the extractor. Combined with the connection found by Cheraghchi and Guruswami this gives a non-malleable code in the two-split-state model with relative rate Ω(1/logn). This exponentially improves previous constructions, all of which only achieve rate n�Ω(1). (3) Combined with the techniques by Ben-Aroya et. al, our non-malleable extractors give a two-source extractor for min-entropy O(logn loglogn), which also implies a K-Ramsey graph on N vertices with K=(logN)O(logloglogN). Previously the best known two-source extractor by Ben-Aroya et. al requires min-entropy logn 2O(�logn), which gives a Ramsey graph with K=(logN)2O(�logloglogN). We further show a way to reduce the problem of constructing seeded non-malleable extractors to the problem of constructing non-malleable independent source extractors. Using the non-malleable 10-source extractor with optimal error by Chattopadhyay and Zuckerman, we give a 10-source extractor for min-entropy O(logn). Previously the best known extractor for such min-entropy by Cohen and Schulman requires O(loglogn) sources. Independent of our work, Cohen obtained similar results to (1) and the two-source extractor, except the dependence on � is log(1/�)poly loglog(1/�) and the two-source extractor requires min-entropy logn poly loglogn.
Oleshchuk, V..  2017.  A trust-based security enforcement in disruption-tolerant networks. 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS). 1:514–517.

We propose an approach to enforce security in disruption- and delay-tolerant networks (DTNs) where long delays, high packet drop rates, unavailability of central trusted entity etc. make traditional approaches unfeasible. We use trust model based on subjective logic to continuously evaluate trustworthiness of security credentials issued in distributed manner by network participants to deal with absence of centralised trusted authorities.

Bailer, Werner.  2017.  Efficient Approximate Medoids of Temporal Sequences. Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. :3:1–3:6.
In order to compactly represent a set of data, its medoid (the element with minimum summed distance to all other elements) is a useful choice. This has applications in clustering, compression and visualisation of data. In multimedia data, the set of data is often sampled as a sequence in time or space, such as a video shot or views of a scene. The exact calculation of the medoid may be costly, especially if the distance function between elements is not trivial. While approximation methods for medoid selection exist, we show in this work that they do not perform well on sequences of images. We thus propose a novel algorithm for efficiently selecting an approximate medoid of a temporal sequence and assess its performance on two large-scale video data sets.
Tian, Sen, Ye, Songtao, Iqbal, Muhammad Faisal Buland, Zhang, Jin.  2017.  A New Approach to the Block-based Compressive Sensing. Proceedings of the 2017 International Conference on Computer Graphics and Digital Image Processing. :21:1–21:5.
The traditional block-based compressive sensing (BCS) approach considers the image to be segmented. However, there is not much literature available on how many numbers of blocks or segments per image would be the best choice for the compression and recovery methods. In this article, we propose a BCS method to find out the optimal way of image retrieval, and the number of the blocks to which into image should be divided. In the theoretical analysis, we analyzed the effect of noise under compression perspective and derived the range of error probability. Experimental results show that the number of blocks of an image has a strong correlation with the image recovery process. As the sampling rate M/N increases, we can find the appropriate number of image blocks by comparing each line.
Yu, Chenhan D., Levitt, James, Reiz, Severin, Biros, George.  2017.  Geometry-oblivious FMM for Compressing Dense SPD Matrices. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. :53:1–53:14.
We present GOFMM (geometry-oblivious FMM), a novel method that creates a hierarchical low-rank approximation, or "compression," of an arbitrary dense symmetric positive definite (SPD) matrix. For many applications, GOFMM enables an approximate matrix-vector multiplication in N log N or even N time, where N is the matrix size. Compression requires N log N storage and work. In general, our scheme belongs to the family of hierarchical matrix approximation methods. In particular, it generalizes the fast multipole method (FMM) to a purely algebraic setting by only requiring the ability to sample matrix entries. Neither geometric information (i.e., point coordinates) nor knowledge of how the matrix entries have been generated is required, thus the term "geometry-oblivious." Also, we introduce a shared-memory parallel scheme for hierarchical matrix computations that reduces synchronization barriers. We present results on the Intel Knights Landing and Haswell architectures, and on the NVIDIA Pascal architecture for a variety of matrices.
Zheng, Yan, Phillips, Jeff M..  2017.  Coresets for Kernel Regression. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. :645–654.
Kernel regression is an essential and ubiquitous tool for non-parametric data analysis, particularly popular among time series and spatial data. However, the central operation which is performed many times, evaluating a kernel on the data set, takes linear time. This is impractical for modern large data sets. In this paper we describe coresets for kernel regression: compressed data sets which can be used as proxy for the original data and have provably bounded worst case error. The size of the coresets are independent of the raw number of data points; rather they only depend on the error guarantee, and in some cases the size of domain and amount of smoothing. We evaluate our methods on very large time series and spatial data, and demonstrate that they incur negligible error, can be constructed extremely efficiently, and allow for great computational gains.