#stylegan2 #non-square #gcp
Notes 📝 based on the Training StyleGAN2 Part 2 video 🎥 taught in the StyleGAN2 DeepDive course 📚 by @Derrick Schultz and @Lia Coleman. The asterisk * on each numbered section links to the video timecode of the tutorial.
1. Start-up Server ⚙️ *
- Log in to GCP
- Click launch on your Compute account. (image?)
- Have your dataset ready and uploaded in 📁Google Drive.
- Vertical 🎨 Images: 768px by 1280px
- Square 🎨 Images: 1024px by 1024px
- Refer to the GitHub repo skyflynil/stylegan2 for detailed directions.
- For further information, refer to the StyleGAN2 paper, Analyzing and Improving the Image Quality of StyleGAN 📄 http://arxiv.org/abs/1912.04958
2. SSH Login Open in Browser Window 💻 *
Click SSH to connect; this opens a terminal in a browser window.
- Out of the box, the server doesn’t have a static IP (one can be set up)
3. Activate StyleGAN2 library 🐍 *
- In Terminal activate your anaconda environment.
conda activate stylegan
4. Set-up Dataset Folder 📁 *
- Move into the skyflynil folder and go into the folder datasets
- Place all TFRecords in the datasets folder.
- Create raw_datasets folder
mkdir raw_datasets
- (Why? To differentiate raw images from tfrecord folders)
- Go inside your new folder
cd raw_datasets
5. Upload Dataset images in GCP ⬆️ *
- Use gdown
- Pass it the ID of a file
- On GDrive, toggle link sharing on and copy the file ID from the link
gdown --id id-ofyour-gdrive-zip-file
- Google server to Google server transfers are really fast
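The file ID that gdown needs sits inside the share link. Below is a minimal Python sketch for pulling it out, assuming the link looks like one of the two common Google Drive shapes (`/file/d/<ID>/...` or `...?id=<ID>`). The function name `drive_file_id` and the example link are made up for illustration; other link shapes may exist.

```python
import re
from typing import Optional

# Assumed share-link shapes (an assumption, not an exhaustive list):
#   https://drive.google.com/file/d/<ID>/view?usp=sharing
#   https://drive.google.com/open?id=<ID>
_ID_PATTERNS = [
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),
]


def drive_file_id(share_link: str) -> Optional[str]:
    """Return the file ID found in a Drive share link, or None if no pattern matches."""
    for pattern in _ID_PATTERNS:
        match = pattern.search(share_link)
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical link; replace it with the one copied from GDrive.
    link = "https://drive.google.com/file/d/id-ofyour-gdrive-zip-file/view?usp=sharing"
    print("gdown --id", drive_file_id(link))
```

The printed line is the command to paste into the GCP terminal.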
6. Unzip 🔐 *
- Unzip your gdown file
unzip dataset-name.zip
- Clean up your raw_dataset folder by removing the zip file
rm dataset-name.zip
- Go back to the main StyleGAN2 folder
7. Create our TFRecords files 🔮 *
- Create TFRecords from your image files, rather than training from raw-images (optimization)
python dataset_tool.py create_from_images_raw --res_log2=8 ./dataset/dataset_name ./raw_datasets/dataset-name/
- You should now see a TFRecords file for your raw dataset
- (base) stylegan-ver: 1
Detailed instructions for training your StyleGAN2, from the skyflynil repo notes
Instead of an image size of 2^n x 2^n, you can now process an image size of (min_h * 2^n) x (min_w * 2^n) naturally. For example, for 640x384: min_h = 5, min_w = 3, n = 7. Please make sure all your raw images are preprocessed to the exact same size. To reduce the training set size, JPEG format is preferred.
For image size 1280x768 (h x w), you may choose (min_h, min_w, res_log2) as (10, 6, 7) or (5, 3, 8). The latter setup is preferred because it gives a deeper and smaller network. Change the res_log2 argument for dataset creation and training accordingly.
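To see which (min_h, min_w, res_log2) combinations fit a given image size, the rule above can be turned into a small calculation. This is a minimal sketch and not code from the repo; the function name `size_options` is made up for illustration.

```python
from typing import List, Tuple


def size_options(height: int, width: int) -> List[Tuple[int, int, int]]:
    """List every (min_h, min_w, res_log2) with
    height == min_h * 2**res_log2 and width == min_w * 2**res_log2,
    largest res_log2 first."""
    options = []
    res_log2 = 1
    # Keep going while both sides divide evenly by the next power of two.
    while height % (2 ** res_log2) == 0 and width % (2 ** res_log2) == 0:
        options.append((height // 2 ** res_log2, width // 2 ** res_log2, res_log2))
        res_log2 += 1
    return options[::-1]


if __name__ == "__main__":
    for min_h, min_w, res_log2 in size_options(1280, 768)[:2]:
        print(f"min_h={min_h} min_w={min_w} res_log2={res_log2}")
```

For 1280x768 the first two entries are the two setups quoted above, (5, 3, 8) and (10, 6, 7).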
For information on installing your anaconda environment check out the video StyleGAN2 installation on GCP 🎥
Documentation link for listing your anaconda environments in Terminal (conda env list)
8. Upload & Transfer Learn from a new model 📙 *
- If your dataset is made of 🎨 images with dimensions 1280px by 768px, or 768px by 1280px (non-square), you must transfer learn from a model trained on those dimensions.
- The model set by default for transfer learning is 1024px by 1024px.
- You can't transfer learn if the size of your model and the size of your dataset don't match.
- Setup Results Folder 📁 to Ignore ("rename") 🚫 Square Model *
mv results/00000-pretrained/network-snapshot-1000.pkl results/00000-pretrained/network-snapshot-1000.pkl-ignore
- Get the shareable link to your non-square pre-trained model .pkl file on Google Drive and gdown the file into the results folder.
gdown --id id-ofyour-pkl-file
9. Train Model ⚙️ *
python run_training.py --num-gpus=1 --data-dir=./dataset --config=config-f --dataset=your_dataset_name --mirror-augment=true --metric=none --total-kimg=20000 --min-h=5 --min-w=3 --res-log2=8 --result-dir="/content/drive/My Drive/stylegan2/results"
- data-dir should always point to the parent directory of your TFRecords folder.
- config: use config-e (512px) or config-f (1024px), depending on the size of the images you are outputting.
- dataset: the name of your dataset folder 📁
- total-kimg: 20000 (thousands of images to train for)
- res-log2 is the exponent of a power of 2
- so the model multiplies 2^8 by min-h=5 and min-w=3
- which gives us our 1280px by 768px training dimensions.
- width: 2^8 * 3 = 768
- height: 2^8 * 5 = 1280
- For 1280px by 768px you could also use res-log2=7 (with min-h=10 and min-w=6), but the repo recommends 8
- because that 8th level, although it makes the network a little bit deeper, makes it smaller overall
⚠️ You can’t do 16:9 or 720p aspect ratios.
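Before launching a long run, it can help to check that the three size flags multiply out to the size of your dataset images. This is a minimal sketch of that arithmetic, not code from the repo; `training_shape` is a made-up name, and the 3 channels assume RGB images.

```python
from typing import List


def training_shape(min_h: int, min_w: int, res_log2: int, channels: int = 3) -> List[int]:
    """Return [channels, height, width] for the given size flags."""
    scale = 2 ** res_log2  # res-log2=8 means a factor of 256
    return [channels, min_h * scale, min_w * scale]


if __name__ == "__main__":
    # Flags from the training command: --min-h=5 --min-w=3 --res-log2=8
    print(training_shape(5, 3, 8))
```

The result should match the Data shape line that training prints when it starts.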
10. Check on Your Training 👀 *
- We are ready to start training; it will display that it is training from your last 🌵 .pkl file
- It might be a bit slower the first time on the same machine because it is caching some files.
- Data shape = [3, 1280, 768] 📐
- Dynamic range = [0, 255] 🚥
- Range 0-255: the 256 values that we can work with
- Loading networks
- Custom Cuda commands compile
- (2- 5 minute wait) 🕓
- Outputting the architecture
- To confirm ✅, the terminal should output the process above 👆
- Building Tensorflow graph
- Training for 20000 kimg (the --total-kimg value)
- You will produce an initial 🌵 .pkl file from its current status.
- The output shows training picking up from the same image count
- Size of the mini-batches
- Size of gpumem is how much memory the training is using. (underestimates)
- Warning:
⚠️ Might throw an error if the dataset's TFRecords file is not the right shape.
11. Create Training Subprocess 🔗 *
- Closing the GCP terminal also terminates the training process.
- nohup: re-running the training script with nohup makes it a background process that lets us close the browser window while training continues.
- Other solutions: install GNU screen or tmux
- Now we will check nohup.out to assess whether this is working as a background process.
- CUDA version 4.0
- How much of the GPU you are using
- Process ID (PID) of the python process, using 15.7 GB
- This is your command running on the GPU
kill -9 PID
- 24 hours of training later 🕓
- In GCP head back to your server into the skyflynil folder 📁
- Since nohup has made the training run as a subprocess, you will have to run the nvidia-smi command to check that it is running properly
- Terminate the sub-process
ls results
- Head inside the results folder which will have the results labeled with your dataset name.
- After training on about 80k images, the results can start to reflect the dataset reasonably well.
- Training up to 500k images gets to a really good point.
- Truncation values will vary in displayed results.
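The notes use nohup for the background step. As an alternative illustration of the same idea (detach the process from the terminal and send its output to a log file), here is a minimal Python sketch using the standard subprocess module. It is not the method shown in the video; `launch_in_background` is a made-up name, and `start_new_session=True` only works on POSIX systems such as the GCP Linux server.

```python
import subprocess
from typing import List


def launch_in_background(command: List[str], log_path: str = "nohup.out") -> subprocess.Popen:
    """Start command detached from the terminal, appending its output to log_path."""
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdout=log_file,            # write normal output to the log
            stderr=subprocess.STDOUT,   # send errors to the same log
            stdin=subprocess.DEVNULL,   # do not wait on terminal input
            start_new_session=True,     # new session, so closing the terminal does not hang it up
        )
    # The child keeps its own copy of the log file handle after this point.
    return process


# Hypothetical usage, with flags shortened for readability:
# proc = launch_in_background(["python", "run_training.py", "--num-gpus=1"])
# print("PID to look for in nvidia-smi:", proc.pid)
```

The returned object carries the PID, which is the number to pass to kill if the run has to be stopped.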
14. Download file ⤵️ nohup.out 📑 *
- Download the nohup.out file
ls
pwd
- Open it in a text editor and scroll to the bottom
- A tick is a certain number of kimages, depending on what you set your mini-batch to
- The time per tick shows how quickly training is going
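Instead of scrolling through nohup.out by hand, the last tick line can be read with a short script. The sketch below assumes each tick line contains the fields `tick`, `kimg` and `sec/kimg` in that order, as in StyleGAN2's training output; the sample numbers are invented for illustration, and `remaining_hours` is a made-up name.

```python
import re
from typing import Optional

# Assumed line shape: "tick 10  kimg 80.0  ...  sec/kimg 45.00  ..."
TICK_RE = re.compile(r"tick\s+(\d+)\s+kimg\s+([\d.]+).*?sec/kimg\s+([\d.]+)")


def remaining_hours(log_text: str, total_kimg: float) -> Optional[float]:
    """Estimate hours left from the last tick line, or None if no tick line is found."""
    ticks = TICK_RE.findall(log_text)
    if not ticks:
        return None
    _tick, kimg, sec_per_kimg = ticks[-1]
    kimg_left = max(total_kimg - float(kimg), 0.0)
    return kimg_left * float(sec_per_kimg) / 3600.0


if __name__ == "__main__":
    # Invented sample line; real values come from your own nohup.out.
    sample = (
        "tick 10    kimg 80.0     lod 0.00  minibatch 32   time 1h 00m 00s   "
        "sec/tick 360.0   sec/kimg 45.00   maintenance 0.0    gpumem 15.7\n"
    )
    print(remaining_hours(sample, total_kimg=20000))
```

The estimate is rough, since it assumes the speed of the last tick holds for the rest of the run.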
Hope these notes help to break down the StyleGAN2 video tutorial for future reference. 🚀