
0.4.3

@tanaysoni tanaysoni released this 29 Apr 15:52
· 538 commits to master since this release

🔢 Changed Multiprocessing in Inferencer

The Inferencer now has a fixed pool of processes instead of creating a new one for every inference call.
This speeds up processing a bit and fixes problems when using it in combination with frameworks like gunicorn/FastAPI (#329)

Old:

    ...
    inferencer.inference_from_dicts(dicts, num_processes=8)

New:

    Inferencer(..., num_processes=8)
    ...
    inferencer.inference_from_dicts(dicts)
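The gain comes from reusing one long-lived worker pool across calls. A minimal sketch of the pattern in plain Python (not FARM's actual internals; `FixedPoolInferencer` and `_predict_one` are made-up names for illustration):

```python
from multiprocessing import Pool

def _predict_one(text):
    # Stand-in for real per-sample preprocessing + model inference.
    return {"text": text, "label": "A" if len(text) % 2 else "B"}

class FixedPoolInferencer:
    def __init__(self, num_processes=4):
        # The pool is created once, at construction time ...
        self.pool = Pool(processes=num_processes)

    def inference_from_dicts(self, dicts):
        # ... and every call reuses the same long-lived workers.
        return self.pool.map(_predict_one, [d["text"] for d in dicts])

    def close(self):
        self.pool.close()
        self.pool.join()

if __name__ == "__main__":
    inferencer = FixedPoolInferencer(num_processes=2)
    for _ in range(3):  # repeated calls, no pool re-creation
        results = inferencer.inference_from_dicts([{"text": "hello"}, {"text": "hi"}])
    print(results)
    inferencer.close()
```

Creating a pool per call pays process startup and teardown on every request, which is what caused friction with long-running servers such as gunicorn/FastAPI workers.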

⏩ Streaming Inferencer

You can now also use the Inferencer in a "streaming mode". This is especially useful in production scenarios where the Inferencer is part of a bigger pipeline (e.g. consuming documents from Elasticsearch) and you want to get predictions as soon as they are available (#315)

Input: Generator yielding dicts with your text
Output: Generator yielding your predictions

    dicts = sample_dicts_generator()  # it can be a list of dicts or a generator object
    results = inferencer.inference_from_dicts(dicts, streaming=True, multiprocessing_chunksize=20)
    for prediction in results:  # results is a generator object that yields predictions
        print(prediction)
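For illustration, `sample_dicts_generator` could be as simple as a generator yielding one dict per document (this stand-in is not part of FARM; only the `{"text": ...}` dict shape is taken from the example above):

```python
def sample_dicts_generator():
    # Stand-in source; in production this might pull documents from a
    # queue or an Elasticsearch scroll instead of a fixed list.
    texts = ["first document", "second document", "third document"]
    for text in texts:
        yield {"text": text}

# The generator is consumed lazily, one dict at a time, which is what
# lets streaming mode emit predictions before all input has arrived.
dicts = sample_dicts_generator()
print(next(dicts))  # → {'text': 'first document'}
```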

👵 👴 "Classic" baseline models for benchmarking + S3E Pooling

While Transformers are conquering many of the current NLP tasks, there are still quite a few tasks (e.g. some document classification problems) where they are complete overkill. Benchmarking Transformers against "classic" uncontextualized embedding models is common, good practice and is now possible without switching frameworks. We added basic support for loading embedding models like GloVe, Word2vec and FastText and using them as "LanguageModels" in FARM (#285)

See the example script
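To make the baseline idea concrete, here is a framework-independent sketch of the simplest such baseline: mean-pooling pretrained word vectors into a document embedding. The tiny embedding table is invented for illustration; in real use the vectors would be loaded from a GloVe/Word2vec/FastText file:

```python
# Toy embedding table; values are made up. In practice each row comes
# from a pretrained uncontextualized embedding file.
embeddings = {
    "the":   [0.1, 0.2, 0.3],
    "movie": [0.4, 0.0, 0.1],
    "was":   [0.0, 0.1, 0.0],
    "great": [0.5, 0.5, 0.2],
}

def mean_pooled_embedding(text, table):
    """Average the vectors of all known tokens; skip unknown tokens."""
    vecs = [table[tok] for tok in text.lower().split() if tok in table]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

doc_vec = mean_pooled_embedding("The movie was great", embeddings)
print(doc_vec)  # roughly [0.25, 0.2, 0.15]
```

Uncontextualized baselines like this are cheap to run and can be surprisingly competitive on some document classification tasks.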

We also added a new pooling method for deriving sentence or document embeddings from these models that can serve as a strong baseline for transformer-based approaches (e.g. Sentence-BERT). The method, called S3E, was recently introduced by Wang et al. in "Efficient Sentence Embedding via Semantic Subspace Analysis" (#286)

See the example script


A few more changes ...

Modeling

  • Cross-validation for Question-Answering #335
  • Add option to use max_seq_len tokens for LM Adaptation/Training-from-scratch instead of real sentences #314
  • Add english glove models #339
  • Implicitly connect heads with processor + check for connection #337

Evaluation & Inference

  • Registration of custom evaluation reports #331
  • Standalone Evaluation with pretrained models #330
  • tqdm progress bar in inferencer #338
  • Group NER preds by sample #327
  • Fix Processor configs when loading Inferencer #318

Other

  • Fix the IOB2 to simple tags check #324
  • Update config when saving model to include changes of parameters #323
  • Fix Issues with NER format Conversion #322
  • Fix error message in loading of Tokenizer #317
  • Less verbosity, Fix which Samples and Baskets being Thrown Away #313

👨‍🌾 👩‍🌾 Thanks to all contributors for making FARMer's life better!
@brandenchan, @tanaysoni, @Timoeller, @tholor, @bogdankostic, @gsarti