Biblio
With the arrival of the Internet of Things (IoT), more devices appear online with default credentials or lacking proper security protocols. Consequently, we have seen a rise of powerful DDoS attacks originating from IoT devices in the last years. In most cases the devices were infected by bot malware through the telnet protocol. This has lead to several honeypot studies on telnet-based attacks. However, IoT installations also involve other protocols, for example for Machine-to-Machine communication. Those protocols often provide by default only little security. In this paper, we present a measurement study on attacks against or based on those protocols. To this end, we use data obtained from a /15 network telescope and three honey-pots with 15 IPv4 addresses. We find that telnet-based malware is still widely used and that infected devices are employed not only for DDoS attacks but also for crypto-currency mining. We also see, although at a much lesser frequency, that attackers are looking for IoT-specific services using MQTT, CoAP, UPnP, and HNAP, and that they target vulnerabilities of routers and cameras with HTTP.
Recent work in OS fingerprinting has focused on overcoming random distortion in network and user features during Internet-scale SYN scans. These classification techniques work under an assumption that all parameters of the profiled network are known a-priori – the likelihood of packet loss, the popularity of each OS, the distribution of network delay, and the probability of user modification to each default TCP/IP header value. However, it is currently unclear how to obtain realistic versions of these parameters for the public Internet and/or customize them to a particular network being analyzed. To address this issue, we derive a non-parametric Expectation-Maximization (EM) estimator, which we call Faulds, for the unknown distributions involved in single-probe OS fingerprinting and demonstrate its significantly higher robustness to noise compared to methods in prior work. We apply Faulds to a new scan of 67M webservers and discuss its findings.
In this study, we report on techniques and analyses that enable us to capture Internet-wide activity at individual IP address-level granularity by relying on server logs of a large commercial content delivery network (CDN) that serves close to 3 trillion HTTP requests on a daily basis. Across the whole of 2015, these logs recorded client activity involving 1.2 billion unique IPv4 addresses, the highest ever measured, in agreement with recent estimates. Monthly client IPv4 address counts showed constant growth for years prior, but since 2014, the IPv4 count has stagnated while IPv6 counts have grown. Thus, it seems we have entered an era marked by increased complexity, one in which the sole enumeration of active IPv4 addresses is of little use to characterize recent growth of the Internet as a whole. With this observation in mind, we consider new points of view in the study of global IPv4 address activity. Our analysis shows significant churn in active IPv4 addresses: the set of active IPv4 addresses varies by as much as 25% over the course of a year. Second, by looking across the active addresses in a prefix, we are able to identify and attribute activity patterns to networkm restructurings, user behaviors, and, in particular, various address assignment practices. Third, by combining spatio-temporal measures of address utilization with measures of traffic volume, and sampling-based estimates of relative host counts, we present novel perspectives on worldwide IPv4 address activity, including empirical observation of under-utilization in some areas, and complete utilization, or exhaustion, in others.
Maintaining and updating signature databases is a tedious task that normally requires a large amount of user effort. The problem becomes harder when features can be distorted by observation noise, which we call volatility. To address this issue, we propose algorithms and models to automatically generate signatures in the presence of noise, with a focus on stack fingerprinting, which is a research area that aims to discover the operating system (OS) of remote hosts using TCP/IP packets. Armed with this framework, we construct a database with 420 network stacks, label the signatures, develop a robust classifier for this database, and fingerprint 66M visible webservers on the Internet.