Train an object classifier on ImageNet using multiple GPUs in Torch7
This repo shows how to train an object classifier on ImageNet/Cifar10/Cifar100/MNIST using a multi-threaded, multi-GPU approach.
- Several network architectures available for training (AlexNet, Overfeat, VGG, GoogLeNet, etc.);
- Multi-GPU support;
- Data loading/processing using multiple threads;
- Easily apply data augmentation;
- Integration with the dbcollection package.
- NVIDIA GPU with compute capability 3.5+ (2GB+ RAM)
- torch7
- torchnet
- dbcollection
The main script comes with several options, which can be listed by running it with the `--help` flag:

    th main.lua --help
To train a network using the default settings, simply run:

    th main.lua
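For context, `main.lua` drives training through torchnet's engine abstraction. The snippet below is a hedged sketch in the spirit of torchnet's MNIST example, not this repo's exact code; `model` and `trainIterator` are hypothetical placeholders:

```lua
require 'nn'
local tnt = require 'torchnet'
local optim = require 'optim'

-- The engine runs the forward/backward/update loop and fires hooks.
local engine = tnt.OptimEngine()
engine.hooks.onEndEpoch = function(state)
   print(('finished epoch %d'):format(state.epoch))
end

engine:train{
   network     = model,          -- hypothetical: an nn module (e.g. AlexNet)
   criterion   = nn.CrossEntropyCriterion(),
   iterator    = trainIterator,  -- hypothetical: a tnt.DatasetIterator
   optimMethod = optim.sgd,
   config      = {learningRate = 0.1},
   maxepoch    = 55,
}
```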
Note: You must have the ImageNet ILSVRC2012 dataset (or any other dataset) set up before running this script. For more information on how to set up your datasets using dbcollection, see here.
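For illustration, here is a minimal sketch of how a dataset's metadata might be loaded from a Torch7 script, assuming the dbcollection Lua wrapper exposes a `load` entry point like its Python counterpart; the module name, call signature and dataset string are assumptions, so check the dbcollection documentation for the actual API:

```lua
-- ASSUMPTION: the Torch7 dbcollection wrapper mirrors the Python API and
-- exposes load(); verify the real signature in the dbcollection docs.
local dbc = require 'dbcollection'

-- On first use this would download/process the dataset's metadata and cache
-- it; later calls return a loader backed by the cached HDF5 metadata file.
local ilsvrc = dbc.load('ilsvrc2012')
```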
By default, the script trains the AlexNet model on 1 GPU with the cuDNN backend and loads data from disk using 4 CPU threads.
To train an AlexNet model using two or more GPUs, set `nGPU` to the number of GPUs you want to use (in this example, two):

    th main.lua -nGPU 2 -netType alexnet
To specify which GPUs to use, set the `CUDA_VISIBLE_DEVICES` environment variable:

    CUDA_VISIBLE_DEVICES=0,1 th main.lua -nGPU 2 -netType alexnet

Note: this example selects the first two GPUs detected in your system (devices 0 and 1).
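Under the hood, multi-GPU training in Torch7 is commonly handled by replicating the network with cunn's `nn.DataParallelTable`, which splits each mini-batch across the selected GPUs. Below is a minimal sketch of that pattern; it is illustrative, not this repo's exact code (the full logic lives in `utils.lua`):

```lua
require 'cunn'

-- Sketch of the standard DataParallelTable pattern (illustrative only):
-- one replica of the model per GPU, mini-batches split along dimension 1.
local function makeDataParallel(model, nGPU)
   if nGPU > 1 then
      local dpt = nn.DataParallelTable(1)
      for gpu = 1, nGPU do
         cutorch.setDevice(gpu)
         dpt:add(model:clone():cuda(), gpu)  -- replica pinned to this GPU
      end
      cutorch.setDevice(1)  -- restore the default device
      return dpt
   end
   return model:cuda()
end
```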
To use more threads for data loading/processing, set the desired number with the `nThreads` flag:

    th main.lua -nThreads 2
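Internally, multi-threaded loading follows the standard Torch `threads` pool pattern: worker threads fetch and preprocess batches while the main thread consumes them. A rough sketch under that assumption (the repo's actual logic lives in `data.lua`; `loadBatch` is a hypothetical helper):

```lua
local Threads = require 'threads'
Threads.serialization('threads.sharedserialize')

-- Illustrative worker pool; not the repo's exact code.
local nThreads = 4
local pool = Threads(
   nThreads,
   function() require 'torch' end,                    -- run once per thread
   function(threadid) torch.manualSeed(threadid) end  -- per-thread seeding
)

pool:addjob(
   function()
      -- runs on a worker thread: fetch/decode/augment one mini-batch
      return loadBatch(128)  -- hypothetical helper returning inputs, targets
   end,
   function(inputs, targets)
      -- runs back on the main thread: hand the batch to the training loop
   end
)
pool:synchronize()
```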
For a complete list of available options, please see the `opts.lua` file or run `th main.lua --help` from the command line.
For most datasets, loading the necessary metadata (filenames, labels, etc.) from disk carries a very small, almost negligible overhead compared to loading it from memory.
To demonstrate this, benchmarking scripts for the ImageNet ILSVRC2012 and Cifar10 datasets are available under `benchmark/`. The table below reports the average time over 1000 data fetches with `batchsize=128` and `nThreads=4`.
The `train` times include heavier data augmentation preprocessing than the `test` times, which apply only minimal transformations (a sketch of such pipelines follows the table below).
Dataset | train (s) | test (s) |
---|---|---|
Cifar10 (disk) | 0.01509 | 0.00953 |
Cifar10 (ram) | 0.00772 | 0.00557 |
ILSVRC2012 (disk) | 0.34635 | 0.35729 |
ILSVRC2012 (ram) | 0.34553 | 0.36107 |
Note: these tests were performed on a 6-core Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz with 32GB RAM, a 2TB SSHD and Ubuntu 14.04. For datasets with bigger images like ImageNet, the relative disk overhead is very small and can be hidden by using enough cores or a faster disk.
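The gap between the two columns comes from the preprocessing pipelines. Below is a hedged sketch of typical train vs. test pipelines, using helper names in the style of the transform modules this repo derives from; the exact names and values are assumptions, not this repo's code:

```lua
local t = require 'transforms'  -- ASSUMPTION: exposes Compose/crop/flip helpers

-- Channel statistics as computed by statistics.lua (values here are the
-- commonly used ImageNet statistics, shown only as an example).
local meanstd = {mean = {0.485, 0.456, 0.406}, std = {0.229, 0.224, 0.225}}

-- Training: heavier, randomized augmentation.
local trainTransform = t.Compose{
   t.Scale(256),
   t.RandomCrop(224),
   t.HorizontalFlip(0.5),
   t.ColorNormalize(meanstd),
}

-- Testing: deterministic, minimal preprocessing.
local testTransform = t.Compose{
   t.Scale(256),
   t.CenterCrop(224),
   t.ColorNormalize(meanstd),
}
```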
- `main.lua` (~250 lines) - Script using torchnet's API for training and testing a network on ImageNet.
- `utils.lua` (~125 lines) - Multi-GPU functions for loading/storing/setting up a model.
- `transforms.lua` (~500 lines) - Data augmentation functions, mostly derived from here and here.
- `configs.lua` (~200 lines) - Sets up configurations (options, model, logger, etc.).
- `statistics.lua` (~100 lines) - Computes the dataset's mean/std statistics over 10000 samples and stores them in the `./cache` dir (a rough sketch follows this list).
- `model.lua` (~40 lines) - Creates/loads a model for training/testing.
- `data.lua` (~110 lines) - Contains the methods to fetch/load data for the available datasets.
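As a rough illustration of what `statistics.lua` computes, the snippet below estimates per-channel mean/std from a sample of images. `loadSample` is a hypothetical loader, and averaging per-image means is an approximation, not necessarily the repo's exact method:

```lua
require 'torch'

-- Hedged sketch of a per-channel mean/std estimate (illustrative only).
local nSamples = 10000
local sum, sqsum = torch.zeros(3), torch.zeros(3)
for i = 1, nSamples do
   local img = loadSample(i)  -- hypothetical: returns a 3xHxW float tensor
   for c = 1, 3 do
      sum[c] = sum[c] + img[c]:mean()
      sqsum[c] = sqsum[c] + img[c]:clone():pow(2):mean()
   end
end
local mean = sum / nSamples
local std = (sqsum / nSamples - torch.cmul(mean, mean)):sqrt()
-- mean/std would then be saved to ./cache for reuse by the data transforms.
```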
MIT license (see the LICENSE file)
This code has been inspired by torchnet's MNIST training example, soumith's multi-GPU ImageNet training code and @karandwivedi42's multigpu-torchnet.