Hello everyone,
First of all, thanks for this useful and amazing piece of software!
Unfortunately, in my current project it is important to crawl the whole website, so the URLs on which the crawler hits a timeout should be rescheduled and visited again. Googling brought me to the old homepage of the project (https://code.google.com/p/crawler4j/issues/detail?id=261), where I found out that crawler4j retries several times.
However, the URLs causing timeouts appear only once in my log files. That alone doesn't necessarily mean the crawler is misbehaving, since they could have been fetched successfully on the first retry. But the URLs can't be found in my database after the crawler terminates either, which tells me that the retry didn't take place.
Could you help me with that?
Best,
Wojciech
You could keep the URLs in a list and check the status code of each page; if a URL has no status code, you could re-run the crawler on those URLs. shouldVisit(Page page, WebURL url) {
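In case it helps, here is a minimal sketch of that bookkeeping in plain Java. It does not use any crawler4j classes: `RetryTracker` and all of its methods are hypothetical names made up for illustration, and where exactly you call `recordFailure` and `recordSuccess` from your crawler depends on the crawler4j version you use. The methods are synchronized on the assumption that several crawler threads share one tracker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of crawler4j): remembers URLs whose fetch failed. */
class RetryTracker {
    private final int maxAttempts;
    // URL -> number of failed attempts so far; insertion order is kept for stable output.
    private final Map<String, Integer> failures = new LinkedHashMap<>();

    RetryTracker(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    /** Call when a fetch timed out or produced no status code. */
    synchronized void recordFailure(String url) {
        failures.merge(url, 1, Integer::sum);
    }

    /** Call when a URL was fetched successfully, so it is no longer retried. */
    synchronized void recordSuccess(String url) {
        failures.remove(url);
    }

    /** Failed URLs that still have attempts left; use them as seeds for the next run. */
    synchronized List<String> pendingRetries() {
        List<String> pending = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : failures.entrySet()) {
            if (entry.getValue() < maxAttempts) {
                pending.add(entry.getKey());
            }
        }
        return pending;
    }

    /** Failed URLs that have used up all their attempts. */
    synchronized List<String> givenUp() {
        List<String> exhausted = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : failures.entrySet()) {
            if (entry.getValue() >= maxAttempts) {
                exhausted.add(entry.getKey());
            }
        }
        return exhausted;
    }
}
```

After the crawl finishes, you would pass whatever `pendingRetries()` returns as seeds to a new crawl, and repeat until the list is empty or everything left is in `givenUp()`.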