protobuf DecodeError when updating a (weird) document #8
Comments
The same problem seems to appear when parsing lists like the following (from Wikipedia), which makes this problem much worse, since it is not limited to texts in the wrong language.
@f11r So, I think the problem is that you have
The response has binarized trees. This might also fix the error on the second document you mentioned.
Yes, this "fixes" it by segmenting/tokenizing the document again. However, I need to keep the tokenization the way it is provided to corenlp and only use the parser (I'm aware of the quality implications of using only part of the pipeline). If you think this is caused by the
Ah, I see. The problem is indeed caused by
Running the following code leads to a google.protobuf.message.DecodeError: Error parsing message.
This admittedly is a very weird document. A Chinese(?) text snuck into my English pipeline and caused this error. I then iteratively removed parts of the document; this is the smallest document for which my pipeline still produced the error. I then serialized it to a string, so I'm not certain whether the final document could be reduced further while still causing the error.
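The iterative reduction described above can be sketched roughly as follows. This is only an illustration of the general idea, not the code that was actually used: `minimize` and `triggers_error` are hypothetical names, and the stand-in predicate below replaces the real check, which would send the candidate document through the pipeline and report whether the DecodeError is raised.

```python
from typing import Callable, List


def minimize(parts: List[str],
             triggers_error: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop parts while the remaining document still fails."""
    assert triggers_error(parts), "the full document must reproduce the error"
    changed = True
    while changed:
        changed = False
        for i in range(len(parts)):
            # Try the document with part i removed.
            candidate = parts[:i] + parts[i + 1:]
            if candidate and triggers_error(candidate):
                parts = candidate
                changed = True
                break  # restart the scan on the smaller document
    return parts


# Stand-in predicate for illustration only (an assumption): pretend the
# error appears whenever both "bad1" and "bad2" are present.
def fake_triggers_error(parts: List[str]) -> bool:
    return "bad1" in parts and "bad2" in parts


reduced = minimize(["a", "bad1", "b", "bad2", "c"], fake_triggers_error)
print(reduced)
```

A reduction like this only guarantees that removing any single remaining part makes the error disappear. It does not guarantee that the result is the smallest possible failing document, especially when the parts are coarse.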
I'm using version 3.8.0 of corenlp, python-stanford-corenlp and corenlp-protobuf, and protobuf 3.5.0.post1.