1 parent 4c0181c commit cb7fde9
.github/workflows/wheel.yml
@@ -127,6 +127,8 @@ jobs:
           args: "--recompute-swiglu --recompute-norm"
         - name: "Offload Opt"
           args: "--offload-opt-m --offload-opt-v --offload-master"
+        - name: "Offload Gradient"
+          args: "--shard-gradients --offload-grads"
         # While not strictly a recomputation, chunked attention should be bitwise identical, too
         - name: "Chunked attention"
           args: "--recompute-att --attn-bwd-chunks=2"
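The comment in the hunk above says these configurations should be bitwise identical. As a minimal sketch of what such a check could look like, the Python snippet below compares two sequences of floats by their IEEE-754 bit patterns rather than by numeric closeness. The helper name `bitwise_equal` and the sample values are hypothetical and are not taken from this repository's test harness.

```python
import struct


def bitwise_equal(a, b):
    """Return True if two float sequences have identical 64-bit patterns.

    Hypothetical helper for illustration only; it packs each sequence as
    raw doubles and compares the resulting bytes.
    """
    if len(a) != len(b):
        return False
    return struct.pack(f"{len(a)}d", *a) == struct.pack(f"{len(b)}d", *b)


# Illustrative values, not real training outputs.
baseline = [0.1 + 0.2, 1.0, -0.0]
same_bits = [0.1 + 0.2, 1.0, -0.0]
nearly_equal = [0.3, 1.0, 0.0]

identical = bitwise_equal(baseline, same_bits)
differs = bitwise_equal(baseline, nearly_equal)
```

A byte-level comparison is stricter than a tolerance check: values that agree to many decimal places, or `0.0` versus `-0.0`, still count as a mismatch.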