feat: add header generator and integrate it into HTTPX client #530

Merged
vdusek merged 3 commits into master from add-fingerprint-generator on Sep 17, 2024

Collaborator

@vdusek vdusek commented Sep 16, 2024

Description

  • This is the first version of the header generator. It provides common HTTP headers, including the User-Agent header.
    • The user agent is picked at random from a pool of 1000 user agents randomly sampled from the Apify fingerprint dataset.
  • This is integrated into the HTTPX client and will be further used in the Playwright fingerprint injector (Mask Playwright's "headless" headers #401).
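
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the class name `SketchHeaderGenerator`, its method names, the three-entry user-agent pool, and the header values are all hypothetical stand-ins (the PR itself uses a pool of 1000 user agents from the Apify fingerprint dataset).

```python
import random

# Hypothetical stand-in pool of hand-written placeholder strings. The PR
# describes a pool of 1000 user agents from the Apify fingerprint dataset,
# which is not reproduced here.
USER_AGENT_POOL = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/128.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.5 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:129.0) Gecko/20100101 Firefox/129.0",
)

# Illustrative "common" headers; the values are assumptions, not taken from the PR.
COMMON_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


class SketchHeaderGenerator:
    """Hypothetical header generator: common headers plus a random User-Agent."""

    def __init__(self, user_agents=USER_AGENT_POOL, rng=None):
        self._user_agents = tuple(user_agents)
        # An injectable random source keeps the choice reproducible in tests.
        self._rng = rng or random.Random()

    def get_common_headers(self):
        # Return a copy so callers cannot mutate the shared defaults.
        return dict(COMMON_HEADERS)

    def get_random_user_agent_header(self):
        # Pick one user agent uniformly at random from the pool.
        return {"User-Agent": self._rng.choice(self._user_agents)}

    def get_headers(self):
        # Merge the common headers with the randomly chosen User-Agent.
        return {**self.get_common_headers(), **self.get_random_user_agent_header()}


generator = SketchHeaderGenerator(rng=random.Random(0))
headers = generator.get_headers()
```

A dictionary like `headers` could then be handed to an HTTPX client through its `headers` argument, for example `httpx.AsyncClient(headers=headers)`. How the actual integration in `src/crawlee/http_clients/_httpx.py` wires this up is not shown in this conversation.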

Issues

Testing

  • New unit tests implemented.

Checklist

  • CI passed

@github-actions github-actions bot added this to the 98th sprint - Tooling team milestone Sep 16, 2024
@github-actions github-actions bot added the t-tooling label (Issues with this label are in the ownership of the tooling team.) Sep 16, 2024
Contributor

@github-actions github-actions bot left a comment

⚠️ Pull Request Toolkit has failed!

Pull request is neither linked to an issue or epic nor labeled as adhoc!

@vdusek vdusek changed the title from "feat: add fingerprint generator and integrate it into HTTPX client [WIP]" to "feat: add header generator and integrate it into HTTPX client [WIP]" Sep 17, 2024
@github-actions github-actions bot added the tested label (Temporary label used only programmatically for some analytics.) Sep 17, 2024
@vdusek vdusek changed the title from "feat: add header generator and integrate it into HTTPX client [WIP]" to "feat: add header generator and integrate it into HTTPX client" Sep 17, 2024
@vdusek vdusek marked this pull request as ready for review September 17, 2024 08:57
src/crawlee/http_clients/_httpx.py (outdated, resolved)
src/crawlee/http_clients/_httpx.py (outdated, resolved)
Collaborator

@janbuchar janbuchar left a comment

LGTM

@vdusek vdusek merged commit b63f9f9 into master Sep 17, 2024
19 checks passed
@vdusek vdusek deleted the add-fingerprint-generator branch September 17, 2024 12:55
Labels
t-tooling: Issues with this label are in the ownership of the tooling team.
tested: Temporary label used only programmatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add header generator and integrate it into HTTPX client
2 participants