ValueError("URL should be absolute") when crawling https://crawlee.dev/js/api/core/changelog and respecting robots.txt #1499
Description

@ericvg97

Hello, I was trying my crawler on your website (specifically on https://crawlee.dev/js/api/core/changelog) and I encountered this error:

    [crawlee.crawlers._basic._basic_crawler] WARN  Retrying request to https://crawlee.dev/js/api/core/changelog due to: URL should be absolute
      File "python3.12/site-packages/yarl/_url.py", line 628, in _origin
        raise ValueError("URL should be absolute")

This only happens when I set respect_robots_txt_file=True; with it set to False, the crawl doesn't fail. This is my crawler configuration in case it helps:

        crawler = AdaptivePlaywrightCrawler.with_beautifulsoup_static_parser(
            playwright_crawler_specific_kwargs={
                "browser_type": "chromium",
                "headless": True,
            },
            configure_logging=True,
            use_session_pool=True,
            request_handler_timeout=timedelta(seconds=120),
            respect_robots_txt_file=True,
        )

I am not planning to crawl your site ;) I was just using it as an example, but it looks like there may be a bug where the robots.txt check receives a relative URL?
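For what it's worth, the traceback points at yarl's URL origin computation, which refuses relative URLs. A minimal stdlib sketch of the behavior the robots.txt lookup presumably needs (robots_url is a hypothetical helper for illustration, not crawlee's actual code):

```python
from urllib.parse import urlsplit


def robots_url(page_url: str) -> str:
    """Derive the absolute robots.txt URL for the site hosting page_url.

    Mirrors yarl's behavior: computing an origin from a relative URL
    raises ValueError("URL should be absolute").
    """
    parts = urlsplit(page_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("URL should be absolute")
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


print(robots_url("https://crawlee.dev/js/api/core/changelog"))
# An absolute page URL resolves fine; a bare path like
# "/js/api/core/changelog" would raise the same ValueError seen above.
```

So my guess is that somewhere along the retry/redirect path a relative URL reaches the robots.txt check before being resolved against the page's base URL.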

Labels

bug: Something isn't working.
t-tooling: Issues with this label are in the ownership of the tooling team.
