This toolkit improves the performance of HuggingFace transformer models on downstream NLP tasks by adapting them to the target domain of those tasks (e.g. BERT -> LawBERT).
The overall Domain Adaptation framework can be broken down into three phases:
- **Data Selection**: Select a relevant subset of documents from the in-domain corpus that is likely to be beneficial for domain pre-training (see below)
- **Vocabulary Augmentation**: Extend the vocabulary of the transformer model with domain-specific terminology
- **Domain Pre-Training**: Continue pre-training the transformer model on the in-domain corpus to learn the linguistic nuances of the target domain
After a model is domain-adapted, it can be fine-tuned on the downstream NLP task of choice, like any pre-trained transformer model.
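A domain-adapted checkpoint saved to disk loads into a task head exactly like any other pre-trained model. The sketch below is purely illustrative: the checkpoint directory, label count, and toy dataset are placeholder assumptions, and any standard fine-tuning setup would work the same way.

```python
from datasets import Dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Hypothetical directory holding a domain-adapted model and tokenizer
checkpoint = "./law-bert-domain-adapted"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

# Toy downstream dataset; replace with the real fine-tuning task
train_dataset = Dataset.from_dict(
    {"text": ["The contract is void.", "The clause is enforceable."], "label": [0, 1]}
).map(lambda x: tokenizer(x["text"], truncation=True, padding="max_length", max_length=64))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./finetuned", num_train_epochs=3),
    train_dataset=train_dataset,
)
trainer.train()
```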
This toolkit provides two classes, `DataSelector` and `VocabAugmentor`, to simplify the Data Selection and Vocabulary Augmentation steps respectively.
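Below is a minimal sketch of how the two classes might be wired together, assuming an sklearn-style `fit`/`transform` interface on `DataSelector` and a `get_new_tokens` method on `VocabAugmentor`; the constructor arguments shown are illustrative assumptions and may differ between versions, so treat the Colab guide below as the authoritative reference.

```python
# Minimal sketch; argument and method names are illustrative assumptions.
from transformers import AutoTokenizer
from transformers_domain_adaptation import DataSelector, VocabAugmentor

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Corpora are assumed to be lists of raw text documents
in_domain_corpus = ["<large collection of legal documents>"]
fine_tuning_corpus = ["<training texts of the downstream task>"]

# Data Selection: keep the most relevant portion of the in-domain corpus
selector = DataSelector(
    keep=0.5,                  # assumed: fraction of documents to keep
    tokenizer=tokenizer,
    similarity_metrics=["euclidean"],
    diversity_metrics=["entropy"],
)
selector.fit(fine_tuning_corpus)
selected_corpus = selector.transform(in_domain_corpus)

# Vocabulary Augmentation: grow the tokenizer with domain-specific terminology
augmentor = VocabAugmentor(
    tokenizer=tokenizer,
    cased=False,
    target_vocab_size=31_000,  # assumed: desired vocabulary size after augmentation
)
new_tokens = augmentor.get_new_tokens(fine_tuning_corpus)
tokenizer.add_tokens(new_tokens)
```

The augmented tokenizer (together with a correspondingly resized model embedding matrix, via `model.resize_token_embeddings(len(tokenizer))`) then feeds into the Domain Pre-Training phase.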
This package was developed on Python 3.6+ and can be installed using `pip`:

    pip install transformers-domain-adaptation
- Compatible with the HuggingFace ecosystem:
  - `transformers` 4.x
  - `tokenizers`
  - `datasets`
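For instance, the Domain Pre-Training phase can run on the standard `transformers` `Trainer` with a corpus loaded through `datasets`. The sketch below assumes a masked-language-modelling objective and a corpus file produced by the Data Selection step; the file path and hyperparameters are placeholders.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_card = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_card)
model = AutoModelForMaskedLM.from_pretrained(model_card)

# Hypothetical output of the Data Selection step, one document per line
corpus = load_dataset("text", data_files={"train": "selected_corpus.txt"})
corpus = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./domain-pre-trained", num_train_epochs=1),
    data_collator=DataCollatorForLanguageModeling(tokenizer),  # masked-LM collator
    train_dataset=corpus["train"],
)
trainer.train()
trainer.save_model("./domain-pre-trained")  # domain-adapted checkpoint for fine-tuning
```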
Please refer to our Colab guide!
TODO