0.4.3
🔢 Changed Multiprocessing in Inferencer
The Inferencer now uses a fixed pool of processes instead of creating a new one for every inference call.
This speeds up processing a bit and solves some problems when using it in combination with frameworks like gunicorn/FastAPI etc. (#329)
Old:
```python
...
inferencer.inference_from_dicts(dicts, num_processes=8)
```
New:
```python
inferencer = Inferencer(model, num_processes=8)
...
inferencer.inference_from_dicts(dicts)
```
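In practice this means you create the Inferencer once (e.g. at app startup) and reuse it across requests. A minimal sketch, assuming a QA model loaded from the model hub (the model name and input dict are illustrative):
```python
from farm.infer import Inferencer

# The process pool is created once here and reused for every call,
# which plays nicely with servers like gunicorn/FastAPI.
inferencer = Inferencer.load(
    "deepset/bert-base-cased-squad2",  # example model, any FARM-loadable model works
    task_type="question_answering",
    num_processes=8,
)

dicts = [{
    "qas": ["Who created FARM?"],
    "context": "FARM is an open-source transfer learning framework created by deepset.",
}]

# No num_processes argument here anymore - the fixed pool handles the work
predictions = inferencer.inference_from_dicts(dicts)
print(predictions)
```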
⏩ Streaming Inferencer
You can now also use the Inferencer in a "streaming mode". This is especially useful in production scenarios where the Inferencer is part of a bigger pipeline (e.g. consuming documents from Elasticsearch) and you want to get predictions as soon as they are available (#315)
Input: Generator yielding dicts with your text
Output: Generator yielding your predictions
```python
dicts = sample_dicts_generator()  # it can be a list of dicts or a generator object
results = inferencer.inference_from_dicts(dicts, streaming=True, multiprocessing_chunksize=20)
for prediction in results:  # results is a generator object that yields predictions
    print(prediction)
```
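For illustration, `sample_dicts_generator` could wrap any incremental data source; this stub (all names hypothetical) shows the expected shape of the input:
```python
def fetch_documents():
    # Hypothetical stand-in for a real source such as an Elasticsearch
    # scroll, a message queue, or a database cursor.
    yield "FARM makes transfer learning with language models simple."
    yield "Streaming mode yields predictions as soon as they are ready."

def sample_dicts_generator():
    # FARM's text classification input format: one dict per document.
    for doc in fetch_documents():
        yield {"text": doc}
```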
👵 👴 "Classic" baseline models for benchmarking + S3E Pooling
While Transformers are conquering many of the current NLP tasks, there are still quite a few tasks (e.g. some document classification) where they are complete overkill. Benchmarking Transformers against "classic" uncontextualized embedding models is common good practice and is now possible without switching frameworks. We added basic support for loading embedding models like GloVe, Word2vec and FastText and using them as "LanguageModels" in FARM (#285)
See the example script
We also added a new pooling method to get sentence or document embeddings from these models that can act as a strong baseline for transformer-based approaches (e.g. Sentence-BERT). The method is called S3E and was recently introduced by Wang et al. in "Efficient Sentence Embedding via Semantic Subspace Analysis" (#286)
See the example script
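A rough sketch of what this enables, following the embeddings-extraction workflow (the model name, extraction parameters, and result field are assumptions based on the example scripts; S3E pooling additionally needs statistics fitted on a corpus, which is omitted here):
```python
from farm.infer import Inferencer

# Load a "classic" embedding model (e.g. GloVe) as a FARM LanguageModel.
# Model name and parameter values are illustrative, not guaranteed.
inferencer = Inferencer.load(
    "glove-english-uncased-6B",          # assumed model name
    task_type="embeddings",
    extraction_strategy="reduce_mean",   # plain average pooling as a baseline
    extraction_layer=-1,
)

dicts = [{"text": "The quick brown fox jumps over the lazy dog"}]
result = inferencer.inference_from_dicts(dicts)
print(result[0]["vec"][:5])  # a slice of the sentence embedding (assumed field name)
```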
A few more changes ...
Modeling
- Cross-validation for Question-Answering #335
- Add option to use max_seq_len tokens for LM Adaptation/Training-from-scratch instead of real sentences #314
- Add english glove models #339
- Implicitly connect heads with processor + check for connection #337
Evaluation & Inference
- Registration of custom evaluation reports #331
- Standalone Evaluation with pretrained models #330
- tqdm progress bar in inferencer #338
- Group NER preds by sample #327
- Fix Processor configs when loading Inferencer #318
Other
- Fix the IOB2 to simple tags check #324
- Update config when saving model to include changes of parameters #323
- Fix Issues with NER format Conversion #322
- Fix error message in loading of Tokenizer #317
- Less verbosity, fix which Samples and Baskets are being thrown away #313
👨‍🌾 👩‍🌾 Thanks to all contributors for making FARMer's life better!
@brandenchan, @tanaysoni, @Timoeller, @tholor, @bogdankostic, @gsarti