swish_optimize blog post (in Chinese): http://bindog.github.io/blog/2020/05/20/optimize-training-memory-by-op-fusion-gradient-checkpoint/