Crawler exit stuck #769
Issue still present on 1.5.4; I strongly suspect this is somehow linked to the new retry logic and the use of sizeLimit / timeLimit. How can I help to further diagnose the problem?
Hm, just to confirm, the crawler prints
Have not seen that before, and I don't think it's related to retries, since all that happens after is:
We could add a timeout to closeLog() and setStatus(); we have not seen any issues stalling there before.
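For illustration, here is a minimal sketch of what such a timeout guard might look like, assuming the shutdown path simply awaits closeLog() and setStatus(); the withTimeout helper and the 5-second limits are hypothetical, not existing crawler code:

```ts
// Sketch (assumption, not the actual implementation): race a promise against a
// timer so a hung closeLog()/setStatus() call cannot block the exit path forever.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T | void> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<void>((resolve) => {
    timer = setTimeout(() => {
      console.warn(`${label} did not finish within ${ms}ms, continuing shutdown`);
      resolve();
    }, ms);
  });
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical usage in the shutdown sequence:
// await withTimeout(closeLog(), 5000, "closeLog()");
// await withTimeout(setStatus("done"), 5000, "setStatus()");
```

The timeout resolves rather than rejects, so a stalled call would be logged and skipped instead of turning the hang into an unhandled rejection during shutdown.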
No, sorry, look at the timestamps, it got stuck at
And the first task got stuck at
Crawler version: 1.5.1 (will update "soon")
We have a situation where the crawler gets interrupted due to the time limit but never exits. I sent a SIGTERM to the crawler and this is the result (we have multiple occurrences of the crawler getting stuck, and for two of them I sent a SIGTERM and got the same result as shown below). What is strange is that we have probably been blacklisted, because all pages before the time limit seem to be ending with a
Direct fetch of page URL timed out
error. I have another stuck task which is a bit different: