Typically, the more data we have, the better the performance we can achieve 🤙. However, annotating a large amount of training data is often difficult and/or expensive 😞. Proper data augmentation is therefore a useful way to boost model performance.
Large-scale language models (LMs) are excellent few-shot learners and can be controlled via natural-language prompts. In this tip, we leverage three large-scale LMs (GPT-3, GPT-J, and GPT-Neo) and prompt engineering to generate very realistic samples from a very small dataset. The model takes two real samples from our dataset, embeds them in a carefully designed prompt, and generates an augmented, mixed sample influenced by both input sentences. Using the Emotion dataset and a pre-trained DistilBERT model, we show that this augmentation method boosts model performance and produces very realistic samples. For more information on text augmentation using large-scale LMs, see the GPT3Mix paper.
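To make the idea concrete, here is a minimal sketch of this prompt-based mixing (not the notebook's exact code): it embeds two real (text, label) pairs in a GPT3Mix-style prompt and samples a new example from a small GPT-Neo checkpoint via Hugging Face `transformers`. The prompt wording, the label parsing, and the choice of the 125M checkpoint are illustrative assumptions; any causal LM (GPT-3, GPT-J, GPT-Neo) can stand in.

```python
import random
from transformers import pipeline

# A small GPT-Neo checkpoint stands in for GPT-3/GPT-J; swap in a larger model for better samples.
generator = pipeline("text-generation", model="EleutherAI/gpt-neo-125M")

# The six labels of the Emotion dataset.
LABELS = ["sadness", "joy", "love", "anger", "fear", "surprise"]

def build_prompt(sample_a, sample_b):
    """Embed two real (text, label) pairs in a GPT3Mix-style prompt."""
    header = (
        "Each item in the following list contains a piece of text and its "
        f"emotion. The emotion is one of {', '.join(LABELS)}.\n"
    )
    examples = "".join(
        f"Text: {text} (Emotion: {label})\n" for text, label in (sample_a, sample_b)
    )
    # The trailing "Text:" cues the LM to continue the pattern with a new sample.
    return header + examples + "Text:"

def augment(dataset, n_new=1):
    """Generate n_new synthetic (text, label) pairs from random pairs of real samples."""
    augmented = []
    for _ in range(n_new):
        prompt = build_prompt(*random.sample(dataset, 2))
        out = generator(
            prompt,
            max_new_tokens=60,
            do_sample=True,
            temperature=0.9,
            return_full_text=False,
        )[0]["generated_text"]
        # Parse the completion back into (text, label); discard malformed outputs.
        if "(Emotion:" in out:
            text, _, rest = out.partition("(Emotion:")
            label = rest.split(")")[0].strip()
            if label in LABELS:
                augmented.append((text.strip(), label))
    return augmented

real = [
    ("i feel like a whole new person", "joy"),
    ("i feel so alone in this crowded room", "sadness"),
]
print(augment(real, n_new=3))
```

The parsed synthetic pairs can then be appended to the real training set before fine-tuning the DistilBERT classifier; filtering out malformed generations keeps label noise low.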
We recommend opening the notebook in Colab for an interactive experience and optimal rendering of the visuals 👇: