google.protobuf.message.DecodeError: Error parsing message #154
Comments
Got the same problem here. I'd like to know what this error actually means.
Hi @legolego, I ran your provided example, and both text2 and text3 gave me a timeout issue. I think the main problem is that both of the sentences are too long for CoreNLP to process in the specified timeout period. This is especially true for the dependency parser and the coref annotator, which slow down significantly for very long sentences. After removing those slow annotators, the examples ran through for me, so in general I wasn't able to reproduce the protobuf error that you mentioned. Are you still seeing the same issue (even after removing those annotators)?
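A minimal sketch of raising the client timeout, assuming the stanfordnlp client; the annotator list and the 150-second value are illustrative, not the reporter's actual settings:

```python
from stanfordnlp.server import CoreNLPClient

text2 = "..."  # placeholder for one of the long sentences from this report

# Raise the per-request timeout (in milliseconds), since depparse and coref
# can slow down sharply on very long sentences.
with CoreNLPClient(annotators='tokenize,ssplit,pos,depparse',
                   timeout=150000, memory='4G') as client:
    ann = client.annotate(text2)
```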
Thanks for the reply :) Running with text2 works fine; running with text3 still fails with the same error. I don't think it is a timeout issue. Is there some other way to debug?
I was not able to run the test on a Windows system. I wonder if this is an issue only on Windows. Do you have access to a Linux or macOS system where you can test out the above example?
It would take me a while, I don't have good access to a different machine.
Actually, can you try removing the depparse and coref annotators and see if it still fails?
Ok, I tried a few different ways, including starting the server from a command prompt like this:
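A typical CoreNLP server launch, for reference; the flags here are the standard documented ones, not necessarily the exact ones used:

```
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
```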
The first time I run it after newly starting the server (but not in subsequent runs), with either text2 or text3, I see this in my DOS window:
My Java version is:
I changed my code a little to this:
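A minimal sketch of a client call with no annotators specified, consistent with the behavior described next; the details are assumptions, not the reporter's exact code:

```python
from stanfordnlp.server import CoreNLPClient

text3 = "..."  # placeholder for the failing sentence

# No annotators specified, so the server's default annotator list is used.
with CoreNLPClient(timeout=60000, memory='4G') as client:
    ann = client.annotate(text3)
    # The default list does not include 'parse', so parseTree is an empty
    # message and this prints nothing after the label.
    print('const_parse: ', ann.sentence[0].parseTree)
```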
And it finishes without the protobuf error, but it prints nothing after "const_parse: "
For your first trial with the command-line start of the CoreNLP server above, I am no expert on protobuf, but the warning message does suggest that it is an issue with protobuf rather than with the CoreNLP server. For your second trial with the Python start of the CoreNLP server above, it is expected that you won't see any constituency parse output, because you are not specifying annotators, and the default annotator list does not include parse. With all that said, I think neither of these two is related to the original protobuf decoding error, which I cannot reproduce on macOS or Ubuntu. @HsiehTommy, when you said you encountered the same error, are you also using a Windows OS?
Ok, I tried reinstalling some things. I found a newer protobuf.jar and ran the server again with it added to the classpath. I also tried reinstalling stanfordnlp with pip, but I keep seeing this error:
Which version of stanfordnlp are you installing?
I also manually took the latest versions of all the files in (\venv\Lib\site-packages\stanfordnlp\server and \stanfordnlp\protobuf) from your source code here, but I still have the protobuf error; only the line number changed a little (403 -> 432):
Is there something else I can try?
Regarding the pip errors, could you try installing with conda instead? Also, why is it saying that no matching version of torch can be found?
Ok, I'll try conda soon, but for stanfordnlp, I am installing 0.2.0...
If I try installing it without --no-deps, I get the errors:
and I didn't understand which library is complaining with this:
Yes, those versions do look like torch versions. This is really strange. With some quick searching, I found this StackOverflow issue, which suggests that you are the only one with this issue when installing stanfordnlp. And then I found this issue, which suggests that certain versions of torch are not available for Windows on PyPI, which could be causing this. And if you go to the pytorch website, their suggested pip installation for Windows pulls from their own wheel index rather than from PyPI. So the current solution for a Windows system is: 1) installing torch first with the command from the pytorch website, and then 2) installing stanfordnlp with pip.
I think going forward one way to solve this for Windows users is to have official conda distributions, given that pytorch itself recommends conda for Windows installs.
I'll try the conda install here shortly with a new python project. My current non-conda project worked previously on a different Windows computer, but I've had problems migrating to a new computer, and also to the new stanfordnlp library from the corenlp python library I was using before... pytorch is a new requirement that wasn't there before. I'll let you know how it goes, thank you!
I tried conda and have the same protobuf error with text3, but not text2...
I just tried the other python-corenlp library (https://github.com/stanfordnlp/python-stanford-corenlp) and I'm getting a similar message. This is in the new project with conda. With these changes from before (https://github.com/stanfordnlp/stanfordnlp/issues/154#issuecomment-553713985):
and everything else the same, my error is now:
Is there something in the formatting of my text maybe? It's strange both libraries give me basically the same error... Where else can I look?
If you start the client with the server's output visible, do you see anything useful in the server log when it fails?
Also, it's possible you are running out of RAM if you are running a constituency parse of a ridiculously long sentence like that.
Thank you for replying! I don't get anything obvious from the server output. And yes, it is a ridiculously long sentence.
The client line in python is:
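A representative construction, with the annotator list and memory size as assumptions rather than the reporter's exact values:

```python
from stanfordnlp.server import CoreNLPClient

client = CoreNLPClient(annotators='tokenize,ssplit,pos,lemma,ner,parse',
                       timeout=60000, memory='5G')
```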
Does it fail if you remove everything except the parse annotator?
Sure, it looks like the same error. If I start the server with just the parse annotator, I get this output (with the tokenize, ssplit, and pos annotators starting too, not sure why). My code is as follows:
What is the correct way to specify the annotators: from the DOS window, or in the client line in the code?
Since parse depends on tokenize, ssplit, and pos, those annotators are loaded automatically even when you only ask for parse. Starting the CoreNLP server from the command line and specifying annotators in the Python client both work; the client sends its annotator list with each request. For your given example above, again weirdly I was not able to reproduce the issue on either macOS or Linux. I am asking a colleague to test it on Windows for me and will get back if he can reproduce the issue. My current guess is that there was nothing wrong with the Python-end protobuf call, but that the CoreNLP server somehow messed up the serialized output string on a long input sentence, and when the Python-end protobuf tried to decode the serialized string it was unable to do so. Do you still see the same protobuf error if you change the first word "in" in text4 to some other word?
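A sketch of both ways of specifying annotators from the Python side; the per-request properties form is an assumption about the client version in use:

```python
from stanfordnlp.server import CoreNLPClient

text4 = "..."  # placeholder sentence

# Annotators fixed for the whole client session:
with CoreNLPClient(annotators='tokenize,ssplit,pos,parse') as client:
    ann = client.annotate(text4)

    # Or overridden per request via CoreNLP server properties:
    ann = client.annotate(text4,
                          properties={'annotators': 'tokenize,ssplit,pos,parse'})
```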
I'm not seeing the same error on Windows. In a clean download of corenlp, I ran the server directly from the command line.
I then copy & pasted your example code and got constituency parses back for both text4 and text5. My python protobuf is 3.6.1, stanfordnlp 0.2.0, not sure what else would be relevant. Do you have any further insight on how to trigger a problem? |
I upgraded my python protobuf to 3.10.0 and still got results for the const_parse for both sentences.
Not being there, it's hard to know for certain, but it might be something in your environment. You could try various things, like making a new virtual environment, installing stanfordnlp into that virtualenv, and checking whether the problem persists.
Thank you again for replying! :) Ok, I tried replacing "in" in text4 with "int", "it", "i", and "bob", and they all worked, and "in" again failed. My results from pip freeze are:

```
(patentmoto3) C:\gitProjects\patentmoto3>pip freeze
certifi==2019.9.11
cffi==1.13.2
chardet==3.0.4
corenlp-protobuf==3.8.0
idna==2.8
mkl-fft==1.0.15
mkl-random==1.1.0
mkl-service==2.3.0
numpy==1.17.2
olefile==0.46
Pillow==6.2.1
protobuf==3.10.0
pycparser==2.19
requests==2.22.0
scipy==1.3.1
six==1.13.0
stanford-corenlp==3.9.2
stanfordnlp==0.2.0
torch==1.3.1
torchvision==0.4.2
tqdm==4.38.0
urllib3==1.25.7
wincertstore==0.2
```
It's also weird that both this stanfordnlp library and the other python-stanford-corenlp library fail similarly... what would I look for in my environment that might cause that? I'll try making a new environment, though I have two now, one with venv and one with conda, and they both fail.
As I said earlier, I think it is also possible that there is nothing wrong with your Python environment, but that something is wrong on the CoreNLP side. I would also suggest trying to reinstall CoreNLP and then rerunning your example.
You should make sure the protobuf versions on the Python side and the Java side are compatible.
I'm a little suspicious about corenlp-protobuf being 3.8.0. It is deprecated at this point. Can you uninstall that and try again? What I did to get it to run was to download and run corenlp directly rather than run it through github. Does that help?
Thank you for your suggestion to re-download corenlp! That wasn't exactly it, but it led me to what I think is the answer :) I downloaded 3.9.2 and put it in its own new folder, and called it like @AngledLuffa did just above, and it worked! Then I started downloading both supplemental 3.9.2 english model jars from https://stanfordnlp.github.io/CoreNLP/ . First the small one (kbp) downloaded and I ran with it, and it worked; then the large one finished, and it failed! Removing the large model jar (not the kbp one; stanford-english-corenlp-2018-10-05-models.jar) fixed it! It works now in both the conda and venv environments. Thank you very much! Is there something I'm doing wrong with the models, apart from putting them into the same directory as corenlp? Could you let me know if the mac and linux versions fail here too? Any idea when a new compiled version will be available to download? Thank you!
Very weird. If it's not too much trouble, would you send us the output of the server with and without stanford-english-corenlp-2018-10-05-models.jar? I wonder if you've uncovered a bug in one of the models. I'm not sure I'll be able to debug it in the next few days, but it's something to look into.
Sure... this is the output without either model jar:
and the output with the non-kbp model added to the directory:
There's nothing obvious to me in there...
Different parser models are used in the two cases. So the error occurs with the englishSR parser but not with englishPCFG.
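To confirm which parser model is responsible, the model can be pinned per request; a sketch, where the two model paths are the standard ones shipped in the CoreNLP jars:

```python
from stanfordnlp.server import CoreNLPClient

text3 = "..."  # placeholder for the failing sentence

with CoreNLPClient(annotators='tokenize,ssplit,pos,parse') as client:
    # Shift-reduce parser (only in the large models jar): triggers the error here.
    ann = client.annotate(text3, properties={
        'parse.model': 'edu/stanford/nlp/models/srparser/englishSR.ser.gz'})

    # Default PCFG parser: works on the same sentence.
    ann = client.annotate(text3, properties={
        'parse.model': 'edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz'})
```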
Yep. I will investigate
Progress! I can recreate the error on my end. Now I guess I should try to fix it.
Yay! I'm glad I wasn't making it up! :D
I think we've diagnosed it (although we reserve the right to be wrong). Using the SR parser, the parse tree for this particular sentence goes deep enough that it exceeds protobuf's built-in recursion limit. There are various ways we can prevent this from happening in future versions, but ultimately the parse tree you get for this kind of sentence won't be too useful anyway. I suggest detecting the exception and skipping it for now.
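A sketch of that workaround, catching the decode failure and skipping the offending sentence; the annotator list is illustrative:

```python
from google.protobuf.message import DecodeError
from stanfordnlp.server import CoreNLPClient

texts = ["..."]  # placeholder sentences

with CoreNLPClient(annotators='tokenize,ssplit,pos,parse') as client:
    for text in texts:
        try:
            ann = client.annotate(text)
        except DecodeError:
            # The parse tree is too deep for protobuf's recursion limit; skip it.
            continue
```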
Good news. We found the python commands needed to change the recursion limit:

```python
from google.protobuf.pyext._message import SetAllowOversizeProtos
SetAllowOversizeProtos(True)
```

Just add that before calling the protocol buffer and it should work.
Ok, that worked for me too, thank you for all your help! :)
FYI, I've now made a simple fix at stanfordnlp/stanfordnlp@a55953f. The client code will now catch this DecodeError. Since this is in the dev branch, the fix will be included in the next release of the package.
On the java side, we're going to flatten any trees which have more than 80 layers (trees like this are basically going to be useless anyway). We could in theory change the wire format, but from what I understand, other people have written modules which process the current wire format, so we're more or less locked into it for now. This will be available in the next release of corenlp.
Description
I think this is similar to a bug in the old python library: python-stanford-corenlp.
I'm trying to copy the demo for the client here or here, but with my own texts... text2 works and text3 doesn't; the only difference between them is the very last word.
The error I get is:
To Reproduce
Steps to reproduce the behavior:
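A minimal sketch of the reproduction, assuming the stanfordnlp client, with text2/text3 standing in for the two long sentences:

```python
from stanfordnlp.server import CoreNLPClient

text2 = "..."  # long sentence that works
text3 = "..."  # the same sentence with the last word changed; this one fails

with CoreNLPClient(annotators='tokenize,ssplit,pos,parse',
                   timeout=60000, memory='4G') as client:
    ann = client.annotate(text2)  # succeeds
    ann = client.annotate(text3)  # raises google.protobuf.message.DecodeError
```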
Expected behavior
I expect it to finish. text=text2 succeeds, but text=text3 fails with the above error. The only difference between the texts is the last word 'his' (could really be anything I think).
Environment:
Additional context
I've also gotten a timeout error for some sentences, but it's intermittent. I'm not sure if they're related, but this is easier to reproduce.