Large-Scale Classification of IPv6-IPv4 Siblings with Variable Clock Skew
Title | Large-Scale Classification of IPv6-IPv4 Siblings with Variable Clock Skew |
Publication Type | Conference Paper |
Year of Publication | 2017 |
Authors | Scheitle, Q., Gasser, O., Rouhi, M., Carle, G. |
Conference Name | 2017 Network Traffic Measurement and Analysis Conference (TMA) |
Date Published | June |
Publisher | IEEE |
ISBN Number | 978-3-901882-95-1 |
Keywords | Clocks, Decision trees, feature extraction, Hardware, Internet, IP networks, IPv4 addresses, IPv6 addresses, IPv6-IPv4 siblings, large-scale classification, learning (artificial intelligence), machine-learned decision tree, network characteristics, Network reconnaissance, pattern classification, pubcrawl, Resiliency, Servers, TCP timestamps, Training, transport protocols, variable clock skew |
Abstract | Linking the growing IPv6 deployment to existing IPv4 addresses is an interesting field of research, be it for network forensics, structural analysis, or reconnaissance. In this work, we focus on classifying pairs of server IPv6 and IPv4 addresses as siblings, i.e., running on the same machine. Our methodology leverages active measurements of TCP timestamps and other network characteristics, which we measure against a diverse ground truth of 682 hosts. We define and extract a set of features, including estimation of variable (as opposed to constant) remote clock skew. On these features, we train a manually crafted algorithm as well as a machine-learned decision tree. By conducting several measurement runs and training in cross-validation rounds, we aim to create models that generalize well and do not overfit our training data. We find both models to exceed 99% precision in train and test performance. We validate scalability by classifying 149k siblings in a large-scale measurement of 371k sibling candidates. We argue that this methodology, thoroughly cross-validated and likely to generalize well, can aid comparative studies of IPv6 and IPv4 behavior in the Internet. Striving for applicability and replicability, we release ready-to-use source code and raw data from our study. |
URL | http://ieeexplore.ieee.org/document/8002901/ |
DOI | 10.23919/TMA.2017.8002901 |
Citation Key | scheitle_large-scale_2017 |
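The abstract's core idea, comparing remote clock skew observed over IPv4 and IPv6 via TCP timestamps, can be illustrated with a minimal sketch. This is not the authors' released code and does not reproduce their variable-skew features or decision tree. It assumes a constant skew, a known remote timestamp frequency `hz`, and uses hypothetical names (`estimate_skew_ppm`, `looks_like_siblings`) and an arbitrary placeholder tolerance.

```python
from statistics import mean

TS_MODULUS = 2 ** 32  # the TCP timestamp value (TSval) is a 32-bit counter


def estimate_skew_ppm(recv_times, tsvals, hz):
    """Estimate constant remote clock skew in parts per million.

    recv_times: local receive times in seconds (assumed monotonic).
    tsvals:     remote TSval observed at each receive time.
    hz:         assumed remote timestamp tick rate (ticks per second).
    """
    if len(recv_times) != len(tsvals) or len(recv_times) < 2:
        raise ValueError("need at least two matching samples")
    t0, v0 = recv_times[0], tsvals[0]
    # Local elapsed time since the first sample.
    xs = [t - t0 for t in recv_times]
    # Offset = remote elapsed time minus local elapsed time.
    # The modulo tolerates one wraparound of the 32-bit counter.
    ys = [((v - v0) % TS_MODULUS) / hz - x for x, v in zip(xs, tsvals)]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("receive times must not all be identical")
    # Ordinary least-squares slope of offset against local time.
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope * 1e6


def looks_like_siblings(skew_v4_ppm, skew_v6_ppm, tolerance_ppm=1.0):
    """Hypothetical rule: similar skew on both addresses hints at one machine.

    tolerance_ppm is an illustrative placeholder, not a value from the paper.
    """
    return abs(skew_v4_ppm - skew_v6_ppm) <= tolerance_ppm
```

In practice, one would collect `(receive time, TSval)` pairs separately for the IPv4 and the IPv6 address of a candidate pair, estimate a skew for each, and compare them. A single-feature threshold like this is far cruder than the trained models the abstract describes.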