In this repository, I have developed a CycleGAN architecture with embedded self-attention layers that can solve three different complex tasks. The same principal network architecture has been used for all three tasks: colorizing a facial sketch, removing glasses and sunglasses, and turning a male face into a female face. Truth be told, the model does not exceed state-of-the-art performance on any of these tasks, but the architecture is powerful enough to understand each task it is given and produce considerably good results.
The CycleGAN concept used in this project is the same as in the original paper. The novel addition is the self-attention layers embedded in the U-Net generator and in the discriminator. The self-attention mechanism is inspired by the research paper Self-Attention Generative Adversarial Networks (SAGAN); I have modified the self-attention layer described in that paper for better results. In my case, the base formula for attention is shown below.
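For reference, the unmodified attention formulation from the SAGAN paper, which the layer in this repository starts from, can be written as follows (the exact modified form used here differs and is not reproduced):

$$
s_{ij} = f(x_i)^{\top} g(x_j), \qquad \beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}
$$

$$
o_j = v\!\left(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\right), \qquad y_j = \gamma\, o_j + x_j
$$

Here $f$, $g$, $h$ and $v$ are 1×1 convolutions, $N$ is the number of spatial locations in the feature map, and $\gamma$ is a learnable scalar initialised to zero.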
The base code for the self-attention layer is built around this formula. In the generator, the self-attention layers are added at the bottleneck of the U-Net and right before the output convolution layer. In the discriminator, they are added right before the zero-padding layer and right before the output layer.
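As a minimal sketch of such a block (not the exact layer used in this repository; it omits the LayerNormalization and Dropout that my layer also applies, and the channel reduction factor is an assumption):

```python
import tensorflow as tf
from tensorflow.keras import layers


class SelfAttention(layers.Layer):
    """Minimal SAGAN-style self-attention block (illustrative sketch only)."""

    def __init__(self, channels, **kwargs):
        super().__init__(**kwargs)
        self.f = layers.Conv2D(channels // 8, 1)  # query projection
        self.g = layers.Conv2D(channels // 8, 1)  # key projection
        self.h = layers.Conv2D(channels, 1)       # value projection
        # Learnable residual scale, initialised to zero as in the SAGAN paper.
        self.gamma = self.add_weight(name="gamma", shape=(), initializer="zeros")

    @staticmethod
    def hw_flatten(x):
        # (batch, H, W, C) -> (batch, H*W, C)
        s = tf.shape(x)
        return tf.reshape(x, (s[0], s[1] * s[2], x.shape[-1]))

    def call(self, x):
        f = self.hw_flatten(self.f(x))   # (batch, N, C//8)
        g = self.hw_flatten(self.g(x))   # (batch, N, C//8)
        h = self.hw_flatten(self.h(x))   # (batch, N, C)

        attention = tf.nn.softmax(tf.matmul(g, f, transpose_b=True))  # (batch, N, N)
        o = tf.matmul(attention, h)                                   # (batch, N, C)
        o = tf.reshape(o, tf.shape(x))
        return self.gamma * o + x        # residual connection
```

Such a block can then be dropped in at the positions described above, e.g. `x = SelfAttention(channels=512)(x)` at the U-Net bottleneck (the channel count here is only an example).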
- The entire architecture is built with TensorFlow.
- Matplotlib is used for visualization.
- NumPy is used for mathematical operations.
- OpenCV is used for image processing.
I have trained and validated the model with an image size of 256×256, training for over 800 epochs. The default parameters in config.py are the baseline parameters used for training on all three tasks.
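As a rough illustration of the kind of baseline values involved (the actual variable names in config.py may differ):

```python
# Hypothetical sketch of the baseline settings described above;
# the real names in config.py may differ.
IMG_HEIGHT = 256        # training/validation image height
IMG_WIDTH = 256         # training/validation image width
EPOCHS = 800            # epochs used for the reported results
LEAKY_RELU_ALPHA = 0.1  # default slope; raised to 0.4 for the glass-removal task
```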
The given task is to colorize an input facial sketch image.
Over Training examples
Over Validation examples
The given task is to transform a male face into a female face.
Over Training examples
Over Validation examples
The given task is to remove glasses and sunglasses from an input facial image. While training the model for this task, the alpha parameter of LeakyReLU was set to 0.4 instead of the default 0.1 used for the two tasks above.
Over Training examples
Over Validation examples
python main.py --height 256 --width 256 --epoch 300 --dataset "./dataset/" --subject 1
python main.py --train False --dataset "./validate/" --validate "face-1001.png" --subject 1
Saves the predicted image with the name "Gan_Output_face-1001.png".
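As a hypothetical sketch of how these flags could be parsed (the actual argument handling in main.py may differ):

```python
import argparse

# Hypothetical sketch of the command-line interface shown above;
# main.py may parse these flags differently.
parser = argparse.ArgumentParser()
parser.add_argument("--height", type=int, default=256)
parser.add_argument("--width", type=int, default=256)
parser.add_argument("--epoch", type=int, default=300)
parser.add_argument("--train", type=lambda s: s.lower() != "false", default=True)
parser.add_argument("--dataset", type=str, default="./dataset/")
parser.add_argument("--validate", type=str, default=None)  # e.g. "face-1001.png"
parser.add_argument("--subject", type=int, default=1)      # task/dataset selector (assumed meaning)
args = parser.parse_args()
```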
As mentioned above, the same principal architecture has solved all three tasks, but I have also found that modifying the self-attention layer by using
def hw_flatten(self, x):
    return layers.Reshape((-1, x.shape[-2], x.shape[-1]))(x)
instead of
def hw_flatten(self, x):
    return layers.Reshape((-1, x.shape[-2] * x.shape[-1]))(x)
has improved the model's output for particular individual cases. Removing the LayerNormalization and Dropout from the self-attention layer has also improved the performance for individual cases.
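To make the difference between the two reshapes concrete, the shapes below are for a hypothetical 16×16×256 bottleneck feature map:

```python
import numpy as np
from tensorflow.keras import layers

x = np.zeros((1, 16, 16, 256), dtype="float32")  # (batch, H, W, C)

# Modified version: keeps the last two axes separate.
a = layers.Reshape((-1, x.shape[-2], x.shape[-1]))(x)
print(a.shape)  # (1, 16, 16, 256)

# Original version: merges the last two axes.
b = layers.Reshape((-1, x.shape[-2] * x.shape[-1]))(x)
print(b.shape)  # (1, 16, 4096)
```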
The self-attention layer has been used in both the generator and the discriminator networks.
- The model could be further improved with further tuning of the convolution layers or other layers.
- A deeper U-Net architecture could also help improve the performance of the model.