Advanced Training
Tricks: Gradient Accumulation, Gradient Clipping, and EMA
All of these training tricks can be configured in the model yaml config file. After setting them, run the tools/train.py script to start training.
Example Yaml Config
train:
  gradient_accumulation_steps: 2
  clip_grad: True
  clip_norm: 5.0
  ema: True
  ema_decay: 0.9999
Gradient Accumulation
Gradient accumulation is an effective way to address memory limitations and allows training with a large global batch size.
To enable it, set train.gradient_accumulation_steps to a value larger than 1 in the yaml config.
The equivalent global batch size would be
global_batch_size = batch_size * num_devices * gradient_accumulation_steps
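For example, with a per-device batch_size of 8 on 4 devices and gradient_accumulation_steps of 2, the equivalent global batch size is 8 * 4 * 2 = 64. Conceptually, gradient accumulation works as in the following NumPy sketch, where grad_fn, params, and the plain SGD update are hypothetical placeholders for illustration rather than MindOCR's actual training loop:

import numpy as np

def train_with_accumulation(params, micro_batches, grad_fn, lr=0.01, accumulation_steps=2):
    # Sum gradients over several micro-batches, then apply one optimizer
    # update (plain SGD here), mimicking a larger effective batch size.
    accumulated = [np.zeros_like(p) for p in params]
    for step, batch in enumerate(micro_batches, start=1):
        grads = grad_fn(params, batch)  # gradients for this micro-batch
        accumulated = [a + g for a, g in zip(accumulated, grads)]
        if step % accumulation_steps == 0:
            # average the accumulated gradients and update the parameters once
            params = [p - lr * a / accumulation_steps for p, a in zip(params, accumulated)]
            accumulated = [np.zeros_like(p) for p in params]
    return params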
Gradient Clipping
Gradient clipping is a method to address the gradient explosion/overflow problem and stabilize model convergence.
To enable it, set train.clip_grad to True and optionally adjust the norm threshold via train.clip_norm.
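Conceptually, clipping by global norm rescales all gradients when their combined L2 norm exceeds the threshold, as in this minimal NumPy sketch (a simplified illustration, not MindOCR's internal implementation; the clip_norm argument plays the role of train.clip_norm):

import numpy as np

def clip_by_global_norm(grads, clip_norm=5.0):
    # Compute the global L2 norm over all gradient tensors and rescale
    # them proportionally when it exceeds the clip_norm threshold.
    global_norm = np.sqrt(sum(np.sum(np.square(g)) for g in grads))
    if global_norm > clip_norm:
        grads = [g * (clip_norm / global_norm) for g in grads]
    return grads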
EMA
Exponential Moving Average (EMA) can be viewed as a model ensemble method that smooths the model weights.
It can help stabilize model convergence in training and usually leads to better model performance.
To enable it, set train.ema to True. You may also adjust train.ema_decay to control the decay rate.
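The update rule itself is simple; below is a minimal Python sketch of the idea (for illustration only, not MindOCR's implementation; the ema_decay argument corresponds to train.ema_decay):

def update_ema(ema_weights, model_weights, ema_decay=0.9999):
    # After each optimizer step, move the shadow (EMA) copy of the weights
    # a small fraction toward the current model weights.
    return [ema_decay * e + (1.0 - ema_decay) * w
            for e, w in zip(ema_weights, model_weights)]

At evaluation time, the EMA weights are typically used in place of the raw trained weights.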
Resume Training
Resuming training is useful when the training was interrupted unexpectedly.
To resume training, set model.resume to True in the yaml config as follows:
model:
  resume: True
By default, it will resume from the "train_resume.ckpt" checkpoint file located in the directory specified by train.ckpt_save_dir.
If you want to resume from another checkpoint, specify its path in model.resume as follows:
model:
  resume: /some/path/to/train_resume.ckpt
Training on OpenI Cloud Platform
Please refer to the MindOCR OpenI Training Guideline.