0.4.1
Full compatibility with Transformers' models
Open-source is more than just public code. It's a mindset of sharing, being transparent and collaborating across organizations. It's about building on the shoulders of other projects and advancing the state of technology together. That's why we built on top of the great Transformers library by Hugging Face and are excited to release an even deeper compatibility today that simplifies the exchange & comparison of models.
1. Convert models from/to transformers
from farm.modeling.adaptive_model import AdaptiveModel

model = AdaptiveModel.convert_from_transformers("deepset/bert-base-cased-squad2", device="cpu", task_type="question_answering")
transformer_model = model.convert_to_transformers()
2. Load models from their new model hub:
from farm.modeling.language_model import LanguageModel
from farm.infer import Inferencer
LanguageModel.load("TurkuNLP/bert-base-finnish-cased-v1")
Inferencer.load("deepset/bert-base-cased-squad2", task_type="question_answering")
...
Better & Faster Training
Thanks to @BramVanroy and @johann-petrak we got some really hot new features here:
- Automatic Mixed Precision (AMP) Training: Speed up your training by ~35%! Model params are usually stored with FP32 precision. Some model layers don't need that precision and can be reduced to FP16, which speeds up training and reduces memory footprint. AMP is a smart way of figuring out for which params we can reduce precision without sacrificing performance (Read more). Test it by installing apex and setting "use_amp" to "O1" in one of the FARM example scripts (see the sketch after this list).
- More flexible Optimizers & Schedulers: Choose whatever optimizer you like from PyTorch, apex or Transformers. Take your preferred learning rate schedule from Transformers or PyTorch (Read more).
- Cross-validation: Get more reliable eval metrics on small datasets (see example).
- Early Stopping: With early stopping, the run stops once a chosen metric is not improving any further and you take the best model up to this point. This helps prevent overfitting on small datasets and reduces training time if your model doesn't improve any further (see example).
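Here is a rough sketch of how these pieces could be wired together, loosely following the FARM example scripts. It assumes a `model`, `data_silo` and `device` have already been set up; the concrete option values (optimizer name, schedule name, metric, patience) are illustrative and may differ between versions.

```python
# Sketch: AMP + custom optimizer/schedule + early stopping (values are illustrative)
from farm.modeling.optimization import initialize_optimizer
from farm.train import EarlyStopping, Trainer

model, optimizer, lr_schedule = initialize_optimizer(
    model=model,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=3,
    device=device,
    learning_rate=3e-5,
    optimizer_opts={"name": "AdamW"},        # any optimizer from PyTorch, apex or Transformers
    schedule_opts={"name": "LinearWarmup"},  # any schedule from Transformers or PyTorch
    use_amp="O1",                            # requires apex; enables mixed precision
)

early_stopping = EarlyStopping(metric="f1", mode="max", patience=3)

trainer = Trainer(
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=3,
    n_gpu=1,
    lr_schedule=lr_schedule,
    device=device,
    early_stopping=early_stopping,
)
model = trainer.train(model)
```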
Caching & Checkpointing
Save time if you run similar pipelines (e.g. only experimenting with model params): Store your preprocessed dataset & load it next time from cache:
data_silo = DataSilo(processor=processor, batch_size=batch_size, caching=True)
Start & stop training by saving checkpoints of the trainer:
from pathlib import Path
from farm.train import Trainer

trainer = Trainer.create_or_load_checkpoint(
    ...
    checkpoint_on_sigterm=True,
    checkpoint_every=200,
    checkpoint_root_dir=Path("/opt/ml/checkpoints/training"),
    resume_from_checkpoint="latest")
The checkpoints include the state of everything that matters (model, optimizer, lr_schedule ...) to resume training. This is particularly useful if your training crashes (e.g. because you are using spot cloud instances).
Integration with AWS SageMaker & Training from scratch
We are currently working a lot on simplifying large-scale training and deployment. As a first step, we are adding support for training on AWS SageMaker. The interesting part here is the option to use Spot Instances and save about 70% of costs compared to regular instances. This is particularly relevant for training models from scratch, which we introduce in a basic version in this release and will improve over the next weeks. See this tutorial to get started with using SageMaker for training on downstream tasks.
Windows support
FARM now also runs on Windows. This implies one breaking change: we now use pathlib and therefore expect all directory paths to be of type Path instead of str (#172).
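For example, code that previously passed plain string paths now needs to wrap them in pathlib.Path objects (a minimal sketch; the save directory below is just a placeholder):

```python
from pathlib import Path

# Before this release, directory arguments could be plain strings, e.g.
# model.save("saved_models/my_model")
# Now pass pathlib.Path objects instead:
save_dir = Path("saved_models/my_model")
model.save(save_dir)
processor.save(save_dir)
```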
A few more changes ...
Modelling
- [enhancement] ALBERT support #169
- [enhancement] DistilBERT support #187
- [enhancement] XLM-Roberta support #181
- [enhancement] Automatically infer layer dims of prediction head #195
- [bug] Implement next_sent_pred flag #198
QA
- [enhancement] Encoding of QA IDs #171
- [enhancement] Remove repeat QA preds from overlapping passages #186
- [enhancement] More options to control predictions of Question Answering Head #183
- [bug] Fix QA example #203
Training
- [enhancement] Use AMP instead of naive fp16. More optimizers. More LR Schedules. #133
- [bug] Fix for use AMP instead of naive fp16 (#133) #180
- [enhancement] Add early stopping and custom metrics #165
- [enhancement] Add checkpointing for training #188
- [enhancement] Add train loss to tqdm. add desc for data preproc. log only 2 samples #175
- [enhancement] Allow custom functions to aggregate loss of prediction heads #220
Eval
Data Handling
- [enhancement] Add caching of datasets in DataSilo #177
- [enhancement] Add option to limit number of processes in datasilo #174
- [enhancement] Add max_multiprocessing_chunksize as a param for DataSilo #168
- [enhancement] Issue59 - Add cross-validation for small datasets #167
- [enhancement] Add max_samples argument to TextClassificationProcessor #204
- [bug] Fix bug with added tokens #197
Other
- [other] Disable multiprocessing in lm_finetuning tests to reduce memory footprint #176
- [bug] Fix device arg in examples #184
- [other] Add error message to train/dev split fn #190
- [enhancement] Add more seeds #192
Thanks to all contributors for making FARMer's life better!
@brandenchan, @tanaysoni, @Timoeller, @tholor, @maknotavailable, @johann-petrak, @BramVanroy