The gpt-2 model is one of the Generative Pre-trained Transformer (GPT) family of models, pre-trained on a very large corpus of English data in a self-supervised fashion. The GPT architecture is a deep neural network, specifically a transformer model, which uses attention in place of earlier recurrence- and convolution-based architectures. Attention mechanisms allow the model to selectively focus on the segments of the input text it predicts to be the most relevant. GPT-2 is trained with a simple objective: predict the next word, given all of the previous words within some text.
More details are provided in the paper, repository and model card.
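As a rough illustration of this next-word objective, the sketch below uses the Hugging Face `transformers` reference implementation of GPT-2 (not the converted OpenVINO IR) to pick the most likely next token for a short prompt; the `gpt2` checkpoint name and the example prompt are assumptions made only for the example.

```python
# Minimal next-token prediction sketch with the Hugging Face reference GPT-2.
# The "gpt2" checkpoint and the prompt are illustrative assumptions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "OpenVINO makes deep learning inference"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids   # shape [1, L]

with torch.no_grad():
    logits = model(input_ids).logits                           # shape [1, L, vocab]

next_token_id = int(logits[0, -1].argmax())                    # most likely next token
print(tokenizer.decode([next_token_id]))
```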
| Metric           | Value           |
| ---------------- | --------------- |
| Type             | Text Prediction |
| GFlops           | 293.0489        |
| MParams          | 175.6203        |
| Source framework | PyTorch*        |
GFlops are calculated for the 1, 1024 input shape, which is suitable for a long context.
Perplexity is obtained on the WikiText-2 raw character-level dataset for the converted model.
| Metric     | Value  |
| ---------- | ------ |
| Perplexity | 29.00% |
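For reference, perplexity is the exponential of the average next-token cross-entropy. The sketch below only illustrates that relationship; it is not the evaluation pipeline used to obtain the value above, and the function name and tensor shapes are assumptions.

```python
# Illustration of the perplexity metric: exp of the mean next-token
# cross-entropy. Not the evaluation pipeline used for the reported value.
import torch
import torch.nn.functional as F

def perplexity(logits: torch.Tensor, token_ids: torch.Tensor) -> float:
    """logits: [B, L, S] prediction scores, token_ids: [B, L] token ids."""
    # The logits at position t score the token at position t + 1.
    shifted_logits = logits[:, :-1, :].reshape(-1, logits.size(-1))
    targets = token_ids[:, 1:].reshape(-1)
    nll = F.cross_entropy(shifted_logits, targets)
    return float(torch.exp(nll))
```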
Token ids, name: `input`, dynamic shape in the format `B, L`, where:

- `B` - batch size
- `L` - sequence length
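A minimal sketch of preparing the `input` tensor in the `B, L` layout is shown below; it assumes the standard GPT-2 BPE tokenizer from the Hugging Face `transformers` package and an arbitrary example sentence.

```python
# Sketch: build the "input" tensor of token ids with shape [B, L].
# The Hugging Face tokenizer and the example text are assumptions.
import numpy as np
from transformers import GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
token_ids = tokenizer.encode("OpenVINO is an inference toolkit")
input_tensor = np.array([token_ids], dtype=np.int64)  # B = 1, L = len(token_ids)
print(input_tensor.shape)
```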
Prediction scores of language modeling head, name: `output`, dynamic shape `B, L, 50257` in the format `B, L, S`, where:

- `B` - batch size
- `L` - sequence length
- `S` - vocab size
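A sketch of reading these scores with the OpenVINO™ Runtime Python API and greedily picking the next token is given below; the IR path, the `CPU` device and the example token ids are assumptions and should be adjusted to your setup.

```python
# Sketch: run the converted IR and take a greedy next-token choice from the
# B, L, S scores. The IR path, "CPU" device and example token ids are assumptions.
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model("public/gpt-2/FP32/gpt-2.xml", "CPU")

input_tensor = np.array([[31373, 995, 0]], dtype=np.int64)   # arbitrary ids, B=1, L=3
result = compiled([input_tensor])
logits = result[compiled.output(0)]                          # shape [1, 3, 50257]
next_token_id = int(np.argmax(logits[0, -1]))                # greedy pick for next token
print(next_token_id)
```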
You can download models and, if necessary, convert them into OpenVINO™ IR format using the Model Downloader and other automation tools, as shown in the examples below.
An example of using the Model Downloader:

```sh
omz_downloader --name <model_name>
```
An example of using the Model Converter:

```sh
omz_converter --name <model_name>
```
The model can be used in the following demos provided by the Open Model Zoo to show its capabilities:
The original model is distributed under the MIT License.