This repository contains the Jupyter notebooks used for:
- creating word embeddings of the text descriptions of facial images in the datasets used
- StackGAN stages 1 and 2, which generate faces from those word embeddings
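The pipeline above can be sketched in miniature. This is a toy illustration, not the notebook code: the vocabulary, dimensions, and random word vectors are all hypothetical stand-ins. It shows the two stages the notebooks cover — embedding an attribute description (here by simply averaging per-word vectors) and building the conditioning input that a StackGAN-style stage-1 generator receives (noise concatenated with the text embedding).

```python
import numpy as np

EMBED_DIM = 8          # hypothetical; real embeddings are far larger
NOISE_DIM = 4
rng = np.random.default_rng(0)

# Hypothetical vocabulary of facial-attribute words with random vectors.
vocab = ["young", "woman", "blond", "hair", "smiling", "man", "beard"]
word_vectors = {w: rng.standard_normal(EMBED_DIM) for w in vocab}

def embed_description(text):
    """Average the vectors of known words (a simple bag-of-words embedding)."""
    words = [w for w in text.lower().split() if w in word_vectors]
    return np.mean([word_vectors[w] for w in words], axis=0)

def stage1_input(text_embedding):
    """Concatenate noise with the embedding, as StackGAN conditions its stage-1 generator."""
    z = rng.standard_normal(NOISE_DIM)
    return np.concatenate([z, text_embedding])

e = embed_description("a smiling young woman with blond hair")
g_in = stage1_input(e)
print(e.shape, g_in.shape)   # (8,) (12,)
```

In the actual StackGAN, the embedding additionally passes through conditioning augmentation (sampling from a Gaussian parameterized by the embedding) before concatenation, and the stage-2 generator refines the stage-1 output at higher resolution.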
Automatic synthesis of realistic images from visual attributes is an important problem in computer vision, with wide applications in law enforcement and entertainment, for instance in image editing, video games, or accessibility. With the advent of deep generative convolutional neural networks (CNNs), various attempts have been made to synthesize images from attributes or from text-based captions. These approaches to generating images from text descriptions and visual attributes differ subtly in the deep learning models they employ, such as the vanilla GAN, StackGAN, and conditional variational autoencoders (CVAEs).
The goal of this project is to generate facial images from text descriptions of facial attributes using deep learning methods.
Facial sketches are the main tool for identifying suspects in law enforcement, especially when witnesses can describe a suspect's facial attributes. These sketches are drawn by hand, a time-consuming process. Generating a realistic color image directly from a witness's description could therefore be extremely beneficial to law enforcement. Although image generation from text captions and attributes has attracted considerable recent interest in the research community, the specific problem of generating facial images has received far less attention. In this project, our objective is to generate realistic facial images from a given text-based description of facial attributes, which has the potential to significantly streamline and speed up producing a suspect's likeness for identification.