
Bug in ErnieMConverter Class #41

@YiandLi

I am using the uie-m-large model, but ran into a bug in class ErnieMConverter(Converter) while the tokenizer is being loaded:

Traceback (most recent call last):
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/src/run.py", line 23, in <module>
    ie = UIEPredictor(model='uie-m-large', schema=schema, device="cuda" if torch.cuda.is_available() else "cpu")
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 146, in __init__
    self._prepare_predictor()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/uie_predictor.py", line 160, in _prepare_predictor
    self._tokenizer = ErnieMTokenizerFast.from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2017, in from_pretrained
    return cls._from_pretrained(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_base.py", line 2249, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 477, in __init__
    super().__init__(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/tokenization_utils_fast.py", line 114, in __init__
    fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/convert_slow_tokenizer.py", line 1342, in convert_slow_tokenizer
    return converter_class(transformer_tokenizer).converted()
  File "/Users/liuyilin/Downloads/NLP_project/Kaggle_PIIDD/uie_pytorch/tokenizer.py", line 576, in __init__
    from transformers.utils import sentencepiece_model_pb2 as model_pb2
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/transformers/utils/sentencepiece_model_pb2.py", line 91, in <module>
    _descriptor.EnumValueDescriptor(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/google/protobuf/descriptor.py", line 789, in __new__
    _message.Message._CheckCalledFromGeneratedFile()
TypeError: Descriptors cannot be created directly.
If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:
 1. Downgrade the protobuf package to 3.20.x or lower.
 2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates
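The error message lists two workarounds. The second one, setting PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python, can be applied from inside run.py, but only if it happens before transformers (and therefore google.protobuf) is imported. Below is a minimal sketch, assuming that workaround is sufficient for this setup. The helper names (protobuf_needs_workaround, apply_pure_python_protobuf, installed_protobuf_version) are hypothetical. The version threshold is an assumption taken from the error text's "3.20.x or lower" advice.

```python
import os
import re
from importlib import metadata
from typing import MutableMapping, Optional

ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def protobuf_needs_workaround(version: str) -> bool:
    """Return True if the protobuf version is newer than the 3.20.x line.

    Assumption: the error text recommends 3.20.x or lower, so anything above
    (3, 20) is treated as needing the pure-Python backend.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False  # unparseable version string: leave the environment alone
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) > (3, 20)


def apply_pure_python_protobuf(version: str, environ: MutableMapping[str, str]) -> bool:
    """Select the pure-Python protobuf backend when the version needs it.

    Returns True only if this call set the variable. An existing user choice
    is never overwritten.
    """
    if ENV_VAR in environ or not protobuf_needs_workaround(version):
        return False
    environ[ENV_VAR] = "python"
    return True


def installed_protobuf_version() -> Optional[str]:
    """Return the installed protobuf distribution version, or None if absent."""
    try:
        return metadata.version("protobuf")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # This must run before anything imports google.protobuf, i.e. before
    # importing transformers or the uie_pytorch predictor.
    version = installed_protobuf_version()
    if version is not None and apply_pure_python_protobuf(version, os.environ):
        print(f"protobuf {version}: set {ENV_VAR}=python")
```

As the error message warns, the pure-Python backend is noticeably slower. The other workaround it lists, pinning protobuf to the 3.20.x line (for example pip install "protobuf<=3.20.3"), avoids that cost.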
