Biblio | CPS-VO

Pevny, Tomas, Ker, Andrew D.. 2018. Exploring Non-Additive Distortion in Steganography. Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security. :109-114.

Leading steganography systems make use of the Syndrome-Trellis Code (STC) algorithm to minimize a distortion function while encoding the desired payload, but this constrains the distortion function to be additive. The Gibbs Embedding algorithm works for a certain class of non-additive distortion functions, but has its own limitations and is highly complex. In this short paper we show that it is possible to modify the STC algorithm in a simple way, to minimize a non-additive distortion function suboptimally. We use it for two examples. First, applying it to the S-UNIWARD distortion function, we show that it does indeed reduce distortion, compared with minimizing the additive approximation currently used in image steganography, but that it makes the payload more – not less – detectable. This parallels research attempting to use Gibbs Embedding for the same task. Second, we apply it to distortion defined by the output of a specific detector, as a counter-move in the steganography game. However, unless the Warden is forced to move first (by fixing the detector) this is highly detectable.

Pevny, Tomas, Somol, Petr. 2016. Discriminative Models for Multi-instance Problems with Tree Structure. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :83–91.

Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.

Pevny, Tomas, Somol, Petr. 2016. Discriminative Models for Multi-instance Problems with Tree Structure. Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security. :83–91.

Modelling network traffic is gaining importance to counter modern security threats of ever increasing sophistication. It is though surprisingly difficult and costly to construct reliable classifiers on top of telemetry data due to the variety and complexity of signals that no human can manage to interpret in full. Obtaining training data with sufficiently large and variable body of labels can thus be seen as a prohibitive problem. The goal of this work is to detect infected computers by observing their HTTP(S) traffic collected from network sensors, which are typically proxy servers or network firewalls, while relying on only minimal human input in the model training phase. We propose a discriminative model that makes decisions based on a computer's all traffic observed during a predefined time window (5 minutes in our case). The model is trained on traffic samples collected over equally-sized time windows for a large number of computers, where the only labels needed are (human) verdicts about the computer as a whole (presumed infected vs. presumed clean). As part of training, the model itself learns discriminative patterns in traffic targeted to individual servers and constructs the final high-level classifier on top of them. We show the classifier to perform with very high precision, and demonstrate that the learned traffic patterns can be interpreted as Indicators of Compromise. We implement the discriminative model as a neural network with special structure reflecting two stacked multi instance problems. The main advantages of the proposed configuration include not only improved accuracy and ability to learn from gross labels, but also automatic learning of server types (together with their detectors) that are typically visited by infected computers.