PlaywrightCrawler doesn't have gotoOptions #1576

@phughesion-h3

Description

In the JavaScript version, PuppeteerCrawler has gotoOptions, which I believe lets you choose which waitUntil state (wait_until in Python) navigation should wait for.
https://crawlee.dev/js/api/puppeteer-crawler#PuppeteerGoToOptions

The Python PlaywrightCrawler just calls page.goto without any options, so wait_until defaults to "load".
https://github.com/apify/crawlee-python/blob/9d4ae6439c301abe7439281a5786b8f166d67623/src/crawlee/crawlers/_playwright/_playwright_crawler.py#L300C1-L301C1

Some sites take ages to load, and I would like my request_handler to run after "domcontentloaded", since I don't need the full page load to get what I need. As it is now, my request_handler is never called, because an issue on the site prevents the page from ever finishing loading.

I don't just want to increase the timeout; I want to be able to specify the options that _navigate passes to goto.
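
To make the request concrete, here is a minimal sketch of the kind of pass-through I mean. Everything in it is hypothetical and not part of crawlee today: the GotoOptions type, the goto_options parameter, and this standalone navigate function are names made up for illustration. The only thing taken from Playwright is that page.goto accepts wait_until, timeout, and referer keyword arguments. FakePage is a stand-in so the sketch runs without a browser.

```python
import asyncio
from typing import Any, Literal, Optional, TypedDict


class GotoOptions(TypedDict, total=False):
    """Hypothetical option bag mirroring keyword arguments of Playwright's page.goto."""

    wait_until: Literal["commit", "domcontentloaded", "load", "networkidle"]
    timeout: float  # milliseconds, as in Playwright
    referer: str


async def navigate(page: Any, url: str, goto_options: Optional[GotoOptions] = None) -> Any:
    """Forward user-supplied options to page.goto.

    With no options the call is page.goto(url), which keeps the "load" default.
    """
    return await page.goto(url, **(goto_options or {}))


class FakePage:
    """Stand-in for a Playwright page that only records how goto was called."""

    def __init__(self) -> None:
        self.calls = []

    async def goto(self, url: str, **kwargs: Any) -> None:
        self.calls.append((url, kwargs))


def demo() -> list:
    """Run one navigation with options and one without, and return the recorded calls."""
    page = FakePage()
    asyncio.run(navigate(page, "https://example.com", {"wait_until": "domcontentloaded"}))
    asyncio.run(navigate(page, "https://example.com"))
    return page.calls
```

With something like this, a crawler configured with {"wait_until": "domcontentloaded"} would hand control to request_handler as soon as the DOM is parsed, while crawlers that pass nothing would behave exactly as they do now.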

Metadata

    Labels

    t-tooling: Issues with this label are in the ownership of the tooling team.
