Train a model
At its simplest, training a model can be invoked with
```
<train_cpp_binary> train --flagsfile=<path_to_flags>
```
The flags to the train binary can be passed in a flags file (see this example flags file) or given directly on the command line:
```
<train_cpp_binary> [train|continue|fork] \
    --datadir <path/to/data/> \
    --tokensdir <path/to/tokens/file/> \
    --archdir <path/to/architecture/files/> \
    --rundir <path/to/save/models/> \
    --arch <name_of_architecture.arch> \
    --train <train/datasets/ds1.lst,train/datasets/ds2.lst> \
    --valid <validation/datasets/ds1.lst,validation/datasets/ds2.lst> \
    --lexicon <path/to/lexicon.txt> \
    --lr=0.0001 \
    --lrcrit=0.0001
```
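A flags file simply lists one `--flag=value` per line. The sketch below shows what such a file might look like; all paths and values here are illustrative placeholders, not recommendations:

```
# train.cfg -- illustrative values only
--datadir=/path/to/data
--tokensdir=/path/to/tokens
--archdir=/path/to/arch
--rundir=/path/to/save/models
--arch=network.arch
--train=train/datasets/ds1.lst
--valid=validation/datasets/ds1.lst
--lexicon=/path/to/lexicon.txt
--lr=0.0001
--lrcrit=0.0001
```

It can then be passed as `<train_cpp_binary> train --flagsfile=train.cfg`.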
To define a model architecture file for training, see How to write architecture file?
Training supports three modes:
- `train`: Train a model from scratch on the given training data.
- `continue`: Continue training a saved model. This can be used, for example, to fine-tune with a smaller learning rate. The `continue` mode makes a best effort to resume training from the most recent checkpoint of a given model as if there were no interruptions.
- `fork`: Create and train a new model from a saved model. This can be used, for example, to adapt a saved model to a new dataset.
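The three modes differ only in the first positional argument and whether a saved model is supplied. The invocations below are a sketch; the exact positional argument each mode expects (run directory vs. model file) and all paths are assumptions for illustration:

```
# Train from scratch using a flags file
<train_cpp_binary> train --flagsfile=train.cfg

# Resume an interrupted run from its most recent checkpoint
<train_cpp_binary> continue <path/to/rundir/runname>

# Fork a saved model, e.g. to adapt it to a new dataset
<train_cpp_binary> fork <path/to/saved/model> --train=new/dataset.lst
```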
We give a short description of some of the more important flags here. A complete list of the flag definitions, with short descriptions of their meaning, can be found here.
The `datadir` flag is the base path where all the `train` and `valid`
dataset list files live. Every `train` path will be prefixed by `datadir`.
Multiple datasets can be passed to `train` and `valid` as a comma-separated
list. More details on dataset preparation can be found here.
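As a sketch of how these pieces fit together: with `--datadir=/data`, the entry `train/datasets/ds1.lst` resolves to `/data/train/datasets/ds1.lst`. Each list file describes one sample per line; the column layout shown below is an assumption for illustration, so refer to the dataset preparation docs for the authoritative format:

```
# <sample_id> <audio_path> <duration> <transcription>
train001 /data/audio/train001.flac 2540 hello world
train002 /data/audio/train002.flac 1873 good morning
```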
Similarly, `archdir` and `tokensdir` are (optional) base paths where the
architecture and tokens files live. For example, the complete architecture file path
will be `<archdir>/<arch>`. More details on specifying architecture files can be found here.
The `lexicon` flag specifies the lexicon file, which defines the token sequence for a given word.
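For instance, with character-level tokens a lexicon maps each word to its spelling, one word per line. This fragment is illustrative only; the exact layout, and the `|` word-boundary token used here, are assumptions tied to a particular tokens file:

```
# <word> <token sequence>
hello   h e l l o |
world   w o r l d |
```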
The `rundir` flag is the base directory where the model will be saved and
`runname` is the subdirectory that will be created to save the model and
training logs. If `runname` is unspecified, a directory name based on the date,
time and user will be created.
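For example, with the hypothetical values below, model snapshots and training logs would be written under `/checkpoints/exp1`:

```
<train_cpp_binary> train --flagsfile=train.cfg \
    --rundir=/checkpoints \
    --runname=exp1
```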
Most of the training hyperparameter flags have default values. Many of these you will not need to change. Some of the more important ones include:
- `lr` : The learning rate for the model parameters.
- `lrcrit` : The learning rate for the criterion parameters.
- `criterion` : Which criterion (i.e. loss function) to use. Options include `ctc`,
`asg`, and `seq2seq`.
- `batchsize` : The size of the minibatch to use per GPU.
- `maxgradnorm` : Clip the norm of the gradients of the model and criterion parameters
to this value. Note that the norm is computed, and clipped, over the aggregated model
and criterion parameters.