Open
Labels
bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
Description
Hello, I was trying my crawler on your website (specifically on https://crawlee.dev/js/api/core/changelog) and I encountered this error:
[crawlee.crawlers._basic._basic_crawler] WARN Retrying request to https://crawlee.dev/js/api/core/changelog due to: URL should be absolute
File "python3.12/site-packages/yarl/_url.py", line 628, in _origin
    raise ValueError("URL should be absolute")
This only happens when I set respect_robots_txt_file=True. With respect_robots_txt_file=False the crawl does not fail. This is my crawler config in case it helps:
from datetime import timedelta

from crawlee.crawlers import AdaptivePlaywrightCrawler

crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
    playwright_crawler_specific_kwargs={
        "browser_type": "chromium",
        "headless": True,
    },
    configure_logging=True,
    use_session_pool=True,
    request_handler_timeout=timedelta(seconds=120),
    # The error only appears when this flag is enabled.
    respect_robots_txt_file=True,
)
I am not planning to crawl your page ;) I was just using it as an example, but it looks like there may be an error when robots.txt is checked for a relative URL.
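To illustrate the guess about relative URLs, here is a minimal sketch using only the standard library. It is not Crawlee's actual code: the helpers `is_absolute`, `robots_txt_url` and `resolve`, and the relative link `/js/api/core`, are hypothetical names and values chosen for the example. It only shows why deriving a robots.txt location from a relative URL fails, and how resolving the link against the page URL first avoids that.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(url: str) -> bool:
    """Return True when the URL carries both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


def robots_txt_url(url: str) -> str:
    """Build the robots.txt location for an absolute URL (hypothetical helper)."""
    if not is_absolute(url):
        # Same failure mode as in the traceback: an origin cannot be
        # derived from a relative URL.
        raise ValueError("URL should be absolute")
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def resolve(base_url: str, link: str) -> str:
    """Resolve a possibly relative link against the page it was found on."""
    return urljoin(base_url, link)


page = "https://crawlee.dev/js/api/core/changelog"
relative_link = "/js/api/core"  # hypothetical relative link found on the page

# Resolving first yields an absolute URL, so the robots.txt lookup succeeds.
absolute_link = resolve(page, relative_link)
print(robots_txt_url(absolute_link))
```

Calling `robots_txt_url(relative_link)` directly raises the same `ValueError` message as in the log, which is consistent with (but does not prove) the idea that a relative link reaches the robots.txt check before it is made absolute.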