Ever wondered why CNNs outperform old-school feature-based steganalysis detectors?

The folklore has it that CNNs can leverage localized artifacts, while feature-based detectors only leverage an accumulation of artifacts.

Our paper, "CNN Steganalyzers Leverage Local Embedding Artifacts", shows experimentally that CNNs can indeed leverage highly localized artifacts, but that they are also better at accumulating artifacts ("integrators") than feature-based steganalysis.

The EfficientNet layers are pretrained on ImageNet, while the inserted ResNet layers (and the FC layer) are randomly initialized. We show in the paper that pretraining the whole architecture isn't needed. Surprising, right?
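
For illustration, here is a minimal sketch of what such a post-stem insertion could look like with timm. This is not the paper's actual code: the block design, the number of inserted blocks, and the names PostStemBlock and build_modified_efficientnet are placeholders of mine.

```python
import torch
import torch.nn as nn
import timm


class PostStemBlock(nn.Module):
    """Plain residual block kept at stem resolution; randomly initialized."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.act(out + x)  # residual connection


def build_modified_efficientnet(name="efficientnet_b0", n_blocks=2):
    # ImageNet-pretrained EfficientNet from timm; only the inserted ResNet-style
    # blocks and the classifier start from random weights.
    model = timm.create_model(name, pretrained=True, num_classes=2)
    stem_channels = model.conv_stem.out_channels
    post_stem = nn.Sequential(*[PostStemBlock(stem_channels) for _ in range(n_blocks)])
    # Patch the inserted blocks in front of the original EfficientNet stages.
    model.blocks = nn.Sequential(post_stem, *model.blocks)
    return model


model = build_modified_efficientnet()
logits = model(torch.randn(1, 3, 256, 256))  # cover-vs-stego logits
print(logits.shape)  # torch.Size([1, 2])
```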

We compare the modified EfficientNets with the vanilla nets, as well as with the winning model of the ALASKA II competition (SE-ResNet18 with stem pooling and strides disabled), in terms of FLOPs and the memory needed to train with a batch size of 8 (not a batch size we recommend; we typically used a minimum batch size of 24).
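
If you want to produce this kind of comparison yourself, the sketch below shows one way to get rough numbers using fvcore's FLOP counter and PyTorch's CUDA memory statistics. The helper name training_footprint is mine, fvcore is just one of several FLOP counters you could use, and the snippet assumes a CUDA device is available.

```python
import torch
from fvcore.nn import FlopCountAnalysis  # assumes fvcore is installed


def training_footprint(model, batch_size=8, size=256, device="cuda"):
    """Rough numbers for comparing models: forward FLOPs for one image and
    peak GPU memory of one forward/backward pass at the given batch size."""
    model = model.to(device)

    # FLOPs of a single forward pass on one image (fvcore counts multiply-adds).
    model.eval()
    flops = FlopCountAnalysis(model, torch.randn(1, 3, size, size, device=device)).total()

    # Peak memory of one training step (the comparison above uses batch size 8).
    model.train()
    torch.cuda.reset_peak_memory_stats(device)
    x = torch.randn(batch_size, 3, size, size, device=device)
    model(x).sum().backward()
    peak_gib = torch.cuda.max_memory_allocated(device) / 2**30

    return flops, peak_gib
```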

On the ALASKA II dataset (figures below), the post-stem modification boosts performance by increasing the number of unpooled layers in the architecture while keeping the computational cost and memory requirements reasonable. The modified models reach state-of-the-art performance with less than half the FLOPs of the current best model. On BOSSBase, the post-stem modification of EfficientNet also achieves state-of-the-art performance. More details in the preprint.

PS1. Patching objects as in the code above is not best practice, but subclassing introduced a lot of boilerplate code… I would love to hear if you have a better solution for this.

PS2. We originally used Luke’s EfficientNet-PyTorch models for the paper, then switched to timm, which includes ready-to-use code for the different conv blocks.
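
For reference, the switch looks roughly like this; both snippets assume the respective packages (efficientnet_pytorch and timm) are installed and use the standard pretrained checkpoints each library ships.

```python
# EfficientNet-PyTorch (Luke's package)
from efficientnet_pytorch import EfficientNet
model_a = EfficientNet.from_pretrained("efficientnet-b0")

# timm equivalent; the conv blocks are exposed as model_b.blocks, which makes
# architecture surgery like the post-stem modification easier.
import timm
model_b = timm.create_model("efficientnet_b0", pretrained=True)
print(model_b.blocks)  # nn.Sequential of the EfficientNet stages
```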