feat: add helper function send_request for PlaywrightCrawler using APIRequestContext bound to the browser context #1134

Open: wants to merge 3 commits into master

@Mantisus (Collaborator) commented Apr 3, 2025

Description

  • Add a helper function, send_request, for PlaywrightCrawler that uses an APIRequestContext bound to the browser context.
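A minimal sketch of what such a helper could look like. It assumes only these documented Playwright calls: APIRequestContext.fetch(url, method=..., headers=..., data=...), APIResponse.status, APIResponse.headers and the coroutine APIResponse.body(). The signature of send_request, the SimpleHttpResponse type and the fake classes are hypothetical illustrations, not the code in this PR or crawlee's actual API.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import Any, Optional, Protocol


class RequestContextLike(Protocol):
    """Structural stand-in for Playwright's APIRequestContext; only fetch() is used."""

    async def fetch(self, url: str, *, method: Optional[str] = None,
                    headers: Optional[dict[str, str]] = None,
                    data: Any = None) -> Any: ...


@dataclass
class SimpleHttpResponse:
    """Hypothetical response container, not crawlee's real HttpResponse type."""

    status_code: int
    headers: dict[str, str]
    body: bytes

    def read(self) -> bytes:
        return self.body


async def send_request(request_context: RequestContextLike, url: str, *,
                       method: str = 'GET',
                       headers: Optional[dict[str, str]] = None,
                       payload: Optional[bytes] = None) -> SimpleHttpResponse:
    """Send a request through a request context bound to the browser context.

    With Playwright this would be page.context.request, so the request shares
    the context's cookies (and, per the PR discussion, its proxy).
    """
    response = await request_context.fetch(url, method=method, headers=headers, data=payload)
    return SimpleHttpResponse(
        status_code=response.status,     # APIResponse.status
        headers=dict(response.headers),  # APIResponse.headers
        body=await response.body(),      # APIResponse.body() is a coroutine
    )


# Minimal fakes so the sketch runs without launching a browser.
@dataclass
class FakeAPIResponse:
    status: int
    headers: dict[str, str]
    content: bytes

    async def body(self) -> bytes:
        return self.content


class FakeRequestContext:
    def __init__(self) -> None:
        # Records every fetch() call as (url, method, headers, data).
        self.calls: list[tuple[str, Optional[str], Optional[dict[str, str]], Any]] = []

    async def fetch(self, url: str, *, method: Optional[str] = None,
                    headers: Optional[dict[str, str]] = None,
                    data: Any = None) -> FakeAPIResponse:
        self.calls.append((url, method, headers, data))
        return FakeAPIResponse(200, {'content-type': 'application/json'}, b'{"ok": true}')


if __name__ == '__main__':
    ctx = FakeRequestContext()
    resp = asyncio.run(send_request(ctx, 'https://example.com/api'))
    print(resp.status_code, resp.read())
```

Typing the helper against a small protocol rather than the concrete Playwright class keeps it testable without a browser; in a real crawler the argument would be the request context of the page's browser context.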

Issues

@Mantisus (Collaborator, Author) commented Apr 3, 2025

After discussing this with @janbuchar on Slack, we decided that this approach is the best fit for PlaywrightCrawler. The requests are sent from within the browser context, so they use the same proxies that were configured when the context was opened.

However, there is a problem: Playwright does not propagate the headers set with set_extra_http_headers to the request context, so requests sent through it do not carry the fingerprint headers.

This is relevant to #1055.

Fixing this would require refactoring so that the headers set when the browser context is opened are passed on to the crawling context. Alternatively, we could wait for Playwright to address this on their side.
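One possible shape for that refactoring, sketched under assumptions: keep the headers generated when the browser context was opened and pass them explicitly on every request made through the request context (APIRequestContext.fetch accepts a headers argument). merge_headers is a hypothetical helper, not part of crawlee or Playwright.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Optional


def merge_headers(context_headers: Optional[Mapping[str, str]],
                  request_headers: Optional[Mapping[str, str]]) -> dict[str, str]:
    """Combine context-level (fingerprint) headers with per-request headers.

    Header names are case-insensitive, so they are compared in lowercase;
    a per-request header replaces a context header with the same name.
    """
    merged: dict[str, str] = {}
    key_for: dict[str, str] = {}  # lowercase name -> spelling currently stored in `merged`
    for source in (context_headers or {}, request_headers or {}):
        for name, value in source.items():
            lowered = name.lower()
            if lowered in key_for:
                # Drop the earlier spelling so the header is sent only once.
                del merged[key_for[lowered]]
            key_for[lowered] = name
            merged[name] = value
    return merged


# Hypothetical usage with Playwright (not executed here):
#   fingerprint_headers = {...}  # headers generated when the context was opened
#   await context.set_extra_http_headers(fingerprint_headers)
#   response = await context.request.fetch(
#       url, headers=merge_headers(fingerprint_headers, user_headers))
```

Letting per-request headers win mirrors how a caller would expect an explicit header to override a default, while the fingerprint headers still reach requests that do not set them.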

@Mantisus requested review from @janbuchar and Copilot on Apr 3, 2025, 12:43

Copilot AI left a comment

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

tests/unit/crawlers/_playwright/test_playwright_crawler.py:486

  • Ensure that send_request_response.read() is decoded (e.g. using .decode('utf-8')) before passing it to json.loads to avoid potential type errors when parsing bytes.
check_data['send_request'] = dict(json.loads(send_request_response.read()))
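For reference, json.loads has accepted bytes and bytearray input since Python 3.6 (it detects UTF-8, UTF-16 or UTF-32), so if read() returns bytes, the line above should not raise a TypeError for that reason alone. Decoding explicitly, as suggested, still makes the expected encoding visible. The following is a hedged sketch with a hypothetical parse_json_object helper that also replaces the dict(...) wrapper with an explicit type check.

```python
from __future__ import annotations

import json
from typing import Any


def parse_json_object(body: bytes, encoding: str = 'utf-8') -> dict[str, Any]:
    """Decode a response body explicitly, then parse it as a JSON object."""
    data = json.loads(body.decode(encoding))
    if not isinstance(data, dict):
        # dict(json.loads(...)) would give a confusing error (or a surprising
        # result) for non-object JSON, so fail with a clear message instead.
        raise TypeError(f'expected a JSON object, got {type(data).__name__}')
    return data


# json.loads accepts bytes directly since Python 3.6, so both forms agree.
raw = b'{"headers": {"user-agent": "test"}}'
assert json.loads(raw) == parse_json_object(raw)
```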

@Mantisus self-assigned this on Apr 4, 2025
Successfully merging this pull request may close these issues:

  • Implement a playwright-based HttpClient implementation