A few days ago, I published a blog post on writing a Python program that transfers style onto a content image using Keras, which you can find here.
The reason I wrote it using Keras and not Tensorflow is that I had been trying to write a functioning Tensorflow style transfer program for two weeks and just couldn’t get it working. I gave up on Tensorflow and tried Keras.
If you’ve ever dealt with Keras, you might know that Keras has a package called applications, which contains classes like VGG19, VGG16 and ResNet.
Keras provides you with a native way of accessing those large public nets, whereas Tensorflow does not.
The problem I had with Tensorflow was that I couldn’t find working weights. I downloaded a couple of files that I found online, each of which was over 500 MB, and none of them seemed to work.
It might have been just my stupidity as a programmer, which is the most likely source of the problem, or it might have been an actual malfunction of the weights. I followed a couple of tutorials line by line, and the outcome still looked like crap. Not only did it fail to look like a good combination of the two images, but oftentimes it would just be a plain black or plain white image.
After two weeks of torture, I gave up and tried Keras. It was my first time using it, and I have to say… I like Tensorflow better.
Luckily for me, since Tensorflow 1.4, Keras is the official high-level API of Tensorflow, so I didn’t have to install an additional dependency.
You can find the repository here.
Creating the graph
First, we’ll create a class which will contain everything needed to create a Tensorflow computational graph.
At the top of the gist, I created a variable containing the URL of the weights.
All of the previous versions of the VGG net weights which I downloaded were over 500 MB. It turns out that if you’re just going to use the convolutional part of the net, you only need to download around 70 MB. Why was I not previously informed of this? This is a very important piece of information, and I do not like that I wasn’t told about it.
The last few layers, the fully connected layers used for prediction, take up 450-500 MB, which neither you nor I need for this program, yet every tutorial I read told me to download the entire 500 MB file.
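To see where those numbers come from, here is a rough back-of-the-envelope count. It is only a sketch and it makes assumptions the post doesn't state: the VGG19 layout (sixteen 3x3 convolutions followed by three fully connected layers on a 224x224 input) and 4-byte float32 parameters. The helper names are made up for this example.

```python
# Rough parameter count, assuming the VGG19 layout and float32 weights.
# (in_channels, out_channels) for each 3x3 convolution, block by block.
CONV_LAYERS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512), (512, 512),
]
# (inputs, outputs) for each fully connected layer; 25088 = 7 * 7 * 512.
FC_LAYERS = [(25088, 4096), (4096, 4096), (4096, 1000)]


def conv_params(layers, kernel=3):
    # Each convolution has kernel*kernel*in*out weights plus one bias per output.
    return sum(kernel * kernel * cin * cout + cout for cin, cout in layers)


def fc_params(layers):
    # Each dense layer has in*out weights plus one bias per output.
    return sum(nin * nout + nout for nin, nout in layers)


def megabytes(n_params, bytes_per_param=4):
    return n_params * bytes_per_param / 1e6


conv_total = conv_params(CONV_LAYERS)
fc_total = fc_params(FC_LAYERS)
print("convolutional part: %d parameters, %.1f MB" % (conv_total, megabytes(conv_total)))
print("fully connected part: %d parameters, %.1f MB" % (fc_total, megabytes(fc_total)))
```

Under those assumptions the convolutional part comes out at roughly 80 MB and the fully connected part at roughly 495 MB, which is in the same ballpark as the file sizes mentioned above.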
The method which I used to download the weights requires either Keras together with Tensorflow 1.1, or Tensorflow 1.4 on its own, which is the way I did it.
You can find another way to download the weights if you, for some stupid reason, do not want to upgrade your version of Tensorflow, but I was too lazy to find an alternative.
The last two methods are the standard way of calculating convolutions and pooling, but the first one is a bit different. The weights are stored in an .h5 file, which consists of datasets. As you can see, I first get the weights and the biases in a single dataset object, and then I create numpy arrays out of them.
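For readers without the gist in front of them, here is a minimal sketch of that weight-loading idea. It is not the code from the gist: load_layer_params and open_weights are hypothetical names, and the sketch assumes each layer is stored as one flat group holding a 4-D kernel and a 1-D bias. The dataset names inside the file depend on how it was saved, so the sketch picks the arrays by their number of dimensions instead of by name.

```python
import numpy as np


def load_layer_params(weights, layer_name):
    """Return (kernel, bias) as numpy arrays for one convolutional layer.

    `weights` is anything that maps layer names to groups of datasets,
    for example an open h5py.File (assumed layout: one flat group per layer).
    """
    group = weights[layer_name]
    arrays = [np.asarray(group[name]) for name in group.keys()]
    kernel = next(a for a in arrays if a.ndim == 4)  # (height, width, in, out)
    bias = next(a for a in arrays if a.ndim == 1)    # (out,)
    return kernel, bias


def open_weights(path):
    # Imported here so load_layer_params can be tried out without h5py installed.
    import h5py
    return h5py.File(path, "r")


# Stand-in for a real file: plain dicts shaped like the first VGG convolution.
fake_weights = {
    "block1_conv1": {
        "kernel": np.zeros((3, 3, 3, 64), dtype=np.float32),
        "bias": np.zeros(64, dtype=np.float32),
    }
}
kernel, bias = load_layer_params(fake_weights, "block1_conv1")
print(kernel.shape, bias.shape)
```

With a real file you would pass open_weights(path) instead of the dictionary; the layer name "block1_conv1" is itself an assumption about how the file is organised.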
After we load the content and style images (both need to be the same shape; you can find how I did it in this repository), we need to create a VGG object and create the graph. You can use the content image as the init image, or you can pass a random noise image.
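The two choices for the init image can be sketched like this. make_init_image is a hypothetical helper, and the 0-255 pixel range for the noise is an assumption; use whatever range your preprocessing expects.

```python
import numpy as np


def make_init_image(content_image, use_noise=False, seed=0):
    """Return the starting point for the optimization.

    Either a float32 copy of the content image, or uniform random noise
    of the same shape (assumed pixel range: 0 to 255).
    """
    if not use_noise:
        return content_image.astype(np.float32).copy()
    rng = np.random.RandomState(seed)
    noise = rng.uniform(0.0, 255.0, size=content_image.shape)
    return noise.astype(np.float32)


content_image = np.full((8, 8, 3), 128, dtype=np.uint8)
init_from_content = make_init_image(content_image)
init_from_noise = make_init_image(content_image, use_noise=True)
print(init_from_content.shape, init_from_noise.shape)
```

Starting from the content image usually keeps the output close to the original layout, while starting from noise leaves the optimizer to rebuild the content from scratch.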
Then comes the most important part, calculating the loss.
Calculating content loss is pretty straightforward. All we need to do is calculate the Euclidean distance between the feature maps of the content and the generated image at the specified layer and multiply it by some weight, which is a hyperparameter you can tune.
As I said, it’s pretty simple.
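In plain numpy, the idea looks roughly like this. It is a sketch and not the Tensorflow code from the repository: it uses the squared Euclidean distance (a common choice, assumed here), and the function name and the random stand-in feature maps are made up for illustration.

```python
import numpy as np


def content_loss(content_features, generated_features, weight=1.0):
    """Weighted squared Euclidean distance between two feature maps."""
    diff = generated_features - content_features
    return weight * np.sum(diff ** 2)


# Random arrays standing in for the activations of one VGG layer.
rng = np.random.RandomState(0)
content = rng.rand(4, 4, 8)
generated = rng.rand(4, 4, 8)

print(content_loss(content, content))                 # identical maps give zero loss
print(content_loss(content, generated, weight=0.5))   # different maps give a positive loss
```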
Style loss is a bit different. Since we need to transfer the texture of the style image, we calculate the Gram matrices of the generated and the style images at each chosen layer and then find the Euclidean distance between them. We multiply that by a constant, divide by the number of layers we take into account, and then multiply by the weight, which is also a hyperparameter.
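Here is a numpy sketch of that recipe, again not the repository's Tensorflow code. The constant is assumed to be 1 / (4 * C^2 * (H*W)^2), where C is the number of channels and H*W the number of positions in the feature map; that scaling is a common convention, and the post does not say which constant it uses. The function names are hypothetical.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlations of a (height, width, channels) feature map."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    return flat.T @ flat  # shape (channels, channels)


def style_loss(style_maps, generated_maps, weight=1.0):
    """Average the scaled Gram-matrix distance over the chosen layers."""
    total = 0.0
    for style, generated in zip(style_maps, generated_maps):
        h, w, c = style.shape
        scale = 1.0 / (4.0 * c ** 2 * (h * w) ** 2)  # assumed constant
        diff = gram_matrix(generated) - gram_matrix(style)
        total += scale * np.sum(diff ** 2)
    return weight * total / len(style_maps)


# Random arrays standing in for activations at two layers.
rng = np.random.RandomState(0)
style_maps = [rng.rand(8, 8, 4), rng.rand(4, 4, 6)]
generated_maps = [rng.rand(8, 8, 4), rng.rand(4, 4, 6)]
print(style_loss(style_maps, generated_maps))
```

The Gram matrix throws away where a feature occurs and keeps only which features occur together, which is why it captures texture and not layout.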
Sum it all up
After all that, we still have a couple of steps to go.
We need to sum up the content and style losses and add a third type of loss: total variation loss. This new type of loss really just measures how noisy the image is and tries to reduce that noise, so that our output looks smooth and not grainy.
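A minimal numpy sketch of that noise measure is below. It sums the squared differences between neighbouring pixels, which is one common formulation (absolute differences are another); the function name is made up and the repository may compute it differently.

```python
import numpy as np


def total_variation_loss(image):
    """Sum of squared differences between vertically and horizontally adjacent pixels."""
    dh = image[1:, :, :] - image[:-1, :, :]   # differences between rows
    dw = image[:, 1:, :] - image[:, :-1, :]   # differences between columns
    return np.sum(dh ** 2) + np.sum(dw ** 2)


flat_image = np.zeros((3, 3, 1))
speckled_image = np.zeros((3, 3, 1))
speckled_image[1, 1, 0] = 1.0  # a single noisy pixel in the middle

print(total_variation_loss(flat_image))
print(total_variation_loss(speckled_image))
```

A perfectly flat image scores zero, and every isolated speck adds to the total, so minimizing this term pushes the output toward smoother regions.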
Now the last, and the most rewarding part – the optimization.
Style transfer seems to work best with limited-memory BFGS (L-BFGS), which scipy includes in its optimize package. Unfortunately, we cannot simply pass our Tensorflow trainable Variable to the scipy method and sit back. Scipy does not support Tensorflow tensors as input.
But luckily, Tensorflow has created a scipy optimizer interface which does support all that good stuff. It’s called ScipyOptimizerInterface. I know, it’s very creative.
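To show what scipy expects on its side, here is a toy sketch that calls scipy's L-BFGS directly. This is not the Tensorflow interface and not the code from the repository: the "loss" is just the squared distance to a fixed target image, and all names are invented for the example. The point is that scipy wants a function over a flat float64 array that returns the loss and its gradient, which is the bridging work the interface does for you.

```python
import numpy as np
from scipy.optimize import minimize


def loss_and_grad(flat_image, target):
    """Toy loss: squared distance to a target, plus its gradient."""
    diff = flat_image - target.ravel()
    loss = np.sum(diff ** 2)
    grad = 2.0 * diff
    return loss, grad


target = np.linspace(0.0, 1.0, 12).reshape(2, 2, 3)  # stand-in for an image
x0 = np.zeros(target.size)                            # flat starting point

result = minimize(
    loss_and_grad,
    x0,
    args=(target,),
    method="L-BFGS-B",
    jac=True,                  # the function returns (loss, gradient)
    options={"maxiter": 50},
)
generated = result.x.reshape(target.shape)
print(result.fun)
```

In the real program the loss and gradient come from the Tensorflow graph instead of a numpy function, but the flatten, optimize and reshape pattern is the same.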
And one of my favourite Game of Thrones characters – King Robert, first of his name, King of the Andals and the Rhoynar and the First Men, Lord of the Seven Kingdoms and Protector of the Realm.
I know that I just copied and pasted the results from the Keras post, but it generally looks the same, and I am a bit of a slob.
You can find the repository here.