
Commit 7d3c7d7

Author: Ben King (committed)
Add comment to explain delayed wrapping of tokenizer
1 parent e34a157, commit 7d3c7d7

1 file changed: +2, -0 lines changed


silnlp/nmt/hugging_face_config.py

Lines changed: 2 additions & 0 deletions
@@ -1212,6 +1212,8 @@ def translate(
 model.config.max_length = 512
 lang_codes: Dict[str, str] = self._config.data["lang_codes"]
 
+# The tokenizer isn't wrapped until after calling _create_inference_model,
+# because the tokenizer's input/output language codes are set there
 if isinstance(tokenizer, (NllbTokenizer, NllbTokenizerFast)):
     tokenizer = PunctuationNormalizingTokenizer(tokenizer)
 
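The comment says the wrapping has to wait until the language codes have been set. One plausible way that ordering can matter is sketched below. It assumes, hypothetically, a wrapper that forwards attribute reads to the wrapped tokenizer but does not forward attribute writes. `FakeNllbTokenizer`, `DelegatingWrapper`, `wrap_then_configure` and `configure_then_wrap` are illustrative stand-ins only, not silnlp's `PunctuationNormalizingTokenizer` or the Hugging Face NLLB classes.

```python
class FakeNllbTokenizer:
    """Hypothetical stand-in for an NLLB tokenizer (not the Hugging Face class)."""

    def __init__(self) -> None:
        self.src_lang = "eng_Latn"  # input language code
        self.tgt_lang = "fra_Latn"  # output language code

    def __call__(self, text: str) -> str:
        # Pretend to encode: prefix the text with the current source language code.
        return f"{self.src_lang} {text}"


class DelegatingWrapper:
    """Hypothetical wrapper that normalizes punctuation, then delegates.

    Attribute reads fall through to the wrapped tokenizer via __getattr__,
    but attribute writes are stored on the wrapper itself.
    """

    def __init__(self, inner: FakeNllbTokenizer) -> None:
        self._inner = inner

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup on the wrapper fails.
        return getattr(self._inner, name)

    def __call__(self, text: str) -> str:
        # Replace curly double quotes with straight ones before encoding.
        normalized = text.replace("\u201c", '"').replace("\u201d", '"')
        return self._inner(normalized)


def wrap_then_configure() -> str:
    tok = DelegatingWrapper(FakeNllbTokenizer())
    tok.src_lang = "swh_Latn"  # lands on the wrapper; the inner tokenizer is unchanged
    return tok("hello")


def configure_then_wrap() -> str:
    inner = FakeNllbTokenizer()
    inner.src_lang = "swh_Latn"  # set on the real tokenizer before wrapping
    tok = DelegatingWrapper(inner)
    return tok("hello")
```

Under these assumptions, `wrap_then_configure()` returns `"eng_Latn hello"` because the new source code never reaches the inner tokenizer, while `configure_then_wrap()` returns `"swh_Latn hello"`. Setting the language codes first and wrapping afterwards sidesteps the question of whether the wrapper forwards writes at all.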