On Computational Thinking, Inferential Thinking and Data Science

Title: On Computational Thinking, Inferential Thinking and Data Science
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Jordan, Michael I.
Conference Name: Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures
Publisher: ACM
Conference Location: New York, NY, USA
ISBN Number: 978-1-4503-4210-0
Keywords: Big Data, big data privacy, communication, composability, compositionality, Computing Theory, Human Behavior, inference, parallelism, privacy, pubcrawl, Resiliency, Scalability, Statistics
Abstract

The rapid growth in the size and scope of datasets in science and technology has created a need for novel foundational perspectives on data analysis that blend the inferential and computational sciences. That classical perspectives from these fields are not adequate to address emerging problems in "Big Data" is apparent from their sharply divergent nature at an elementary level: in computer science, the growth of the number of data points is a source of "complexity" that must be tamed via algorithms or hardware, whereas in statistics, the growth of the number of data points is a source of "simplicity" in that inferences are generally stronger and asymptotic results can be invoked. On a formal level, the gap is made evident by the lack of a role for computational concepts such as "runtime" in core statistical theory and the lack of a role for statistical concepts such as "risk" in core computational theory. I present several research vignettes aimed at bridging computation and statistics, including the problem of inference under privacy and communication constraints, and ways to exploit parallelism so as to trade off the speed and accuracy of inference.

URL: https://dl.acm.org/doi/10.1145/2935764.2935826
DOI: 10.1145/2935764.2935826
Citation Key: jordan_computational_2016