Commit

fix: add grad spike detection

percevalw committed Feb 19, 2025
1 parent 413e7f0 commit 45512a2
Showing 7 changed files with 201 additions and 56 deletions.
2 changes: 2 additions & 0 deletions changelog.md
@@ -10,6 +10,7 @@
 - `docs/tutorials/tuning.md`: New tutorial for hyperparameter tuning.
 - Provided a [detailed tutorial](./docs/tutorials/tuning.md) on hyperparameter tuning, covering usage scenarios and configuration options.
 - `ScheduledOptimizer` (e.g., `@core: "optimizer"`) now supports importing optimizers using their qualified name (e.g., `optim: "torch.optim.Adam"`).
+- Added grad spike detection to the `edsnlp.train` script, and per-weight-layer gradient logging.
 
 ### Changed
 
@@ -27,6 +28,7 @@
 - Ensure we don't overwrite the RNG of the data reader when calling `stream.shuffle()` with no seed
 - Raise an error if the batch size in `stream.shuffle(batch_size=...)` is not compatible with the stream
 - `eds.split` now keeps doc and span attributes in the sub-documents.
+- Fixed mini-batch accumulation for multi-task training
 
 # v0.15.0 (2024-12-13)
 
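As a rough illustration of what per-weight-layer gradient logging with spike detection can look like, here is a minimal sketch assuming a plain PyTorch model; the names `log_grad_norms`, `running`, `log_fn`, and `ratio_threshold` are illustrative, not edsnlp's actual API:

```python
import torch


def log_grad_norms(model, running, log_fn=print, ratio_threshold=10.0, beta=0.99):
    """Log each parameter's gradient norm and flag sudden spikes.

    `running` maps parameter names to an exponential moving average of
    past gradient norms and is updated in place.
    """
    for name, param in model.named_parameters():
        if param.grad is None:
            continue
        norm = param.grad.detach().norm().item()
        avg = running.get(name)
        # Flag a spike when the current norm far exceeds the running average.
        if avg is not None and norm > ratio_threshold * avg:
            log_fn(f"grad spike in {name}: {norm:.3g} (running avg {avg:.3g})")
        # Update the moving average after the spike check.
        running[name] = norm if avg is None else beta * avg + (1 - beta) * norm
        log_fn(f"grad_norm/{name} = {norm:.3g}")


# Usage on a toy model: compute gradients, then log per-layer norms.
model = torch.nn.Linear(4, 2)
stats = {}
model(torch.randn(8, 4)).sum().backward()
log_grad_norms(model, stats)
```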
4 changes: 2 additions & 2 deletions docs/tutorials/training.md
@@ -179,7 +179,7 @@ EDS-NLP supports training models either [from the command line](#from-the-comman
   val_data: ${ val_data }
   max_steps: 2000
   validation_interval: ${ train.max_steps//10 }
-  max_grad_norm: 1.0
+  grad_max_norm: 1.0
   scorer: ${ scorer }
   optimizer: ${ optimizer }
   # Do preprocessing in parallel on 1 worker
@@ -284,7 +284,7 @@ EDS-NLP supports training models either [from the command line](#from-the-comman
     val_data=val_data,
     scorer={"ner": ner_metric},
     optimizer=optimizer,
-    max_grad_norm=1.0,
+    grad_max_norm=1.0,
     output_dir="artifacts",
     # Do preprocessing in parallel on 1 worker
     num_workers=1,
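The renamed `grad_max_norm` setting corresponds to standard global-norm gradient clipping. A minimal sketch of the underlying mechanism in plain PyTorch (the `clip_grad_norm_` call is real PyTorch; treating it as exactly what edsnlp does internally is an assumption):

```python
import torch

# Toy model and one training step: with grad_max_norm = 1.0, gradients
# are rescaled whenever their combined L2 norm exceeds 1.0.
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
grad_max_norm = 1.0

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), grad_max_norm)
optimizer.step()
optimizer.zero_grad()
```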
2 changes: 1 addition & 1 deletion docs/tutorials/tuning.md
@@ -233,7 +233,7 @@ train:
   val_data: ${ val_data }
   max_steps: 400
   validation_interval: ${ train.max_steps//2 }
-  max_grad_norm: 1.0
+  grad_max_norm: 1.0
   scorer: ${ scorer }
   optimizer: ${ optimizer }
   num_workers: 2
(Diffs for the remaining 4 changed files not shown.)
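The changelog entry about mini-batch accumulation for multi-task training refers to accumulating gradients over several mini-batches before each optimizer step. A generic sketch of that pattern (illustrative only, not the edsnlp implementation; the two task losses below are arbitrary stand-ins):

```python
import torch

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4

optimizer.zero_grad()
for step, batch in enumerate(torch.randn(16, 8, 4)):
    out = model(batch)
    # Sum the per-task losses, then scale by the number of accumulated
    # mini-batches so the combined gradient matches one large batch.
    loss = (out[:, 0].pow(2).mean() + out[:, 1].abs().mean()) / accumulation_steps
    loss.backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```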
