It is well known that online services use various cookies to track users through their online service identifiers (IDs); in other words, when users access online services, they leave various "fingerprints" behind in cyberspace. As users roam the physical world while accessing online services via mobile devices, they also leave a series of "footprints", i.e., hints about their physical locations. This poses a potent new threat to user privacy. One can potentially correlate the "fingerprints" users leave in cyberspace with the "footprints" they leave in the physical world to infer private information about their physical lives, such as frequently visited locations or mobility trajectories. We refer to this problem as user physical world privacy leakage via user cyberspace privacy leakage. In this paper we address the following fundamental question: what kinds of user physical world privacy, and how much of it, might be leaked if one could obtain such diverse network datasets, even without any physical location information? To investigate this question in depth, we use network data collected over one month by a DPI (deep packet inspection) system at the routers of one of the largest Internet operators in Shanghai, China. We decompose the fundamental question into three problems: i) linking the various online user IDs that belong to the same person via mobility pattern mining; ii) classifying physical locations via aggregate user mobility patterns over time; and iii) tracking user physical mobility. By developing novel and effective methods for each of these problems, we demonstrate that user physical world privacy leakage via user cyberspace privacy leakage is not hypothetical but poses a real and potent threat to user privacy.
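To make problem i) more concrete, here is a minimal, hypothetical Python sketch of one simple way online IDs could be linked by their mobility patterns. It is not the method developed in the paper. It assumes each ID's footprints have already been reduced to a set of (time_slot, location_hint) pairs, where location_hint stands for some coarse location identifier derivable from network data (an assumption). ID pairs are then scored by Jaccard similarity. The names `jaccard` and `link_ids` and the 0.5 threshold are illustrative choices only.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of (time_slot, location_hint) footprints."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_ids(footprints: dict, threshold: float = 0.5) -> list:
    """Return (id_a, id_b, score) for every ID pair whose footprint overlap
    meets the threshold, highest score first.

    Hypothetical illustration: two IDs that repeatedly appear at the same
    location hint in the same time slot are treated as likely belonging
    to the same person.
    """
    links = []
    for id_a, id_b in combinations(sorted(footprints), 2):
        score = jaccard(footprints[id_a], footprints[id_b])
        if score >= threshold:
            links.append((id_a, id_b, score))
    return sorted(links, key=lambda t: -t[2])
```

For example, a cookie ID seen at three (time slot, location hint) pairs and an app ID seen at those same three pairs plus one more score 3/4 = 0.75 and would be linked. Plain set overlap ignores visit frequency and treats popular places as strong evidence, so a practical approach would likely need weighting (for instance, down-weighting locations shared by many users).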