cloudflare managed challenges not passing anymore #37

Open
ydeagan opened this issue Feb 1, 2025 · 8 comments
Labels
detection (related to bot detection), unconfirmed (not reproduced yet)

Comments

@ydeagan

ydeagan commented Feb 1, 2025

Cloudflare managed challenges no longer pass with patchright.

Here's the website: https://www.searchpeoplefree.com/email/test/Z21haWwtY29t0

I tried several other sites as well. Here's my code:

import random
import uuid
import os
import threading
import tempfile
from patchright.sync_api import sync_playwright


class CloudSolver:
    def __init__(self, website_url, thread_count, proxies_file='proxies.txt'):
        self.proxies = self.load_proxies(proxies_file)
        self.results = {}
        self.solver_extension = os.path.join(
            os.path.dirname(__file__), "solver")
        self.ad_block_extension = os.path.join(
            os.path.dirname(__file__), "adblock")
        self.website_url = website_url
        self.thread_count = thread_count

    def load_proxies(self, proxies_file):
        with open(proxies_file, 'r') as file:
            return [proxy.strip() for proxy in file]

    def parse_proxy(self, proxy):
        auth, address = proxy.split('@')
        user, pass_ = auth.split(':')
        ip, port = address.split(':')
        return user, pass_, ip, port

    def solve_task(self, result_id):
        while True:
            random_chrome_user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
            proxy = random.choice(self.proxies)
            try:
                user, pass_, ip, port = self.parse_proxy(proxy)
                proxy_url = f"http://{user}:{pass_}@{ip}:{port}"

                with tempfile.TemporaryDirectory() as temp_dir:
                    with sync_playwright() as p:
                        browser = p.chromium.launch_persistent_context(
                            channel="chrome",
                            no_viewport=True,
                            user_data_dir=temp_dir, user_agent=random_chrome_user_agent,
                            args=[
                                f"--load-extension={self.solver_extension},{self.ad_block_extension}"
                            ],
                            headless=False,
                            proxy={"server": f"http://{ip}:{port}",
                                   "username": user, "password": pass_}
                        )
                        page = browser.new_page()
                        page.goto(self.website_url)

                        try:
                            page.wait_for_selector(
                                "text=review", state="hidden", timeout=30000)
                        except Exception:
                            continue

                        self.results[result_id] = {
                            "cookies": page.context.cookies(),
                            "proxy": proxy_url,
                            "user-agent": random_chrome_user_agent
                        }
                        break
            except Exception:
                continue

    def generate_solvers(self, thread_count):
        solve_threads = []
        for _ in range(thread_count):
            result_id = str(uuid.uuid4())
            thread = threading.Thread(
                target=self.solve_task, args=(result_id,))
            solve_threads.append(thread)
            thread.start()

        for thread in solve_threads:
            thread.join()

    def start(self):
        self.generate_solvers(self.thread_count)
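
A side note on `parse_proxy` in the code above: splitting on `@` and `:` raises a `ValueError` as soon as a password contains either character, and the blanket `except Exception: continue` then hides that failure. A minimal sketch of a more tolerant parser follows, assuming the same `user:pass@ip:port` line format. The function name `parse_proxy_settings` is hypothetical and not part of patchright; the returned keys mirror the `proxy` dict already used in the code above.

```python
from urllib.parse import unquote, urlsplit


def parse_proxy_settings(proxy: str) -> dict:
    """Turn 'user:pass@host:port' into a proxy dict with server/username/password."""
    # urlsplit only recognises credentials and port when a scheme is present,
    # so prepend one if the line does not carry it already.
    parts = urlsplit(proxy if "://" in proxy else f"http://{proxy}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy entry needs a host and a port: {proxy!r}")

    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    # urlsplit separates credentials at the LAST '@' and the FIRST ':',
    # so passwords containing '@' or ':' survive intact.
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

The result could be passed directly as `proxy=parse_proxy_settings(line)`. IPv6 proxy hosts are not handled by this sketch.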

@kaliiiiiiiiii
Collaborator

You're adding a lot of things that are not recommended, such as Chrome extensions and an overridden user agent.

Further, you might simply be sending too many requests from your device, or you are being fingerprinted.

Unless this is reproducible on a clean machine and IP without all of those extras, it does not fall within the current scope of patchright.
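
For anyone trying to confirm this, a stripped-down reproduction could look roughly like the sketch below. It is an assumption about what "clean" means here: no extensions, no user-agent override, no proxy, and only launch options that already appear in the reporter's code. `build_launch_kwargs`, `run_clean_repro`, and the `https://example.com` placeholder URL are hypothetical names for illustration.

```python
import tempfile

# Placeholder target; swap in the page you are actually testing.
TEST_URL = "https://example.com"


def build_launch_kwargs(user_data_dir: str) -> dict:
    """Launch options for a clean run: no extensions, no custom UA, no proxy."""
    return {
        "user_data_dir": user_data_dir,
        "channel": "chrome",
        "headless": False,
        "no_viewport": True,
    }


def run_clean_repro(url: str = TEST_URL) -> str:
    """Open the page in a fresh profile and return its title."""
    # Imported lazily so the helper above can be used without patchright installed.
    from patchright.sync_api import sync_playwright

    with tempfile.TemporaryDirectory() as profile_dir:
        with sync_playwright() as p:
            context = p.chromium.launch_persistent_context(
                **build_launch_kwargs(profile_dir)
            )
            try:
                page = context.new_page()
                page.goto(url)
                return page.title()
            finally:
                context.close()
```

Calling `run_clean_repro()` on a machine and IP that have not been used for scraping, and reporting whether the challenge page still appears, would help narrow down whether the extras are the cause.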

@kaliiiiiiiiii kaliiiiiiiiii added the unconfirmed (not reproduced yet) and detection (related to bot detection) labels Feb 1, 2025
@kaliiiiiiiiii kaliiiiiiiiii transferred this issue from Kaliiiiiiiiii-Vinyzu/patchright-python Feb 1, 2025
@Zamion101

Zamion101 commented Feb 1, 2025

I think this is because of a regression in Playwright and its new approach to handling proxies. I also have problems managing proxy connections with the new 1.50.1 version of Playwright. Everything works when I downgrade to 1.50.0.
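
When comparing a working and a failing setup, it helps to state the exact package versions in each. A small sketch using only the standard library follows; the helper name `installed_versions` is hypothetical, and the default package names are an assumption about what is installed in your environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("patchright", "playwright")) -> dict:
    """Map each package name to its installed version, or None if it is absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed in this environment.
            found[name] = None
    return found
```

Pasting the output of `print(installed_versions())` from both environments into the report makes a version regression easier to confirm or rule out.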

@battall

battall commented Feb 2, 2025

I'm seeing a similar inconsistency in passing Cloudflare challenges between platforms:

Test domain: 376betturkey.com

  • Same IP
  • Same Chromium version
  • No custom user agents
  • headless: false

Behavior:

  • macOS: Successfully bypasses Cloudflare's "click to verify" challenge
  • Linux (aarch64): Gets stuck at the verification screen

Additionally, on macOS, using any custom user agent (with devices[device]) results in the same behavior as Linux - failing to bypass.
I haven't been able to identify the exact cause or create a minimal reproduction case yet, but wanted to share this observation as it might be platform-specific.
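
Since the difference appears to follow the platform, a report is easier to compare if each run records the host it ran on. A minimal sketch with the standard `platform` module follows; `environment_summary` is a hypothetical helper name, and it describes only the host, not what the browser itself exposes to a page.

```python
import platform


def environment_summary() -> dict:
    """Collect basic host details worth attaching to a platform-specific report."""
    return {
        "system": platform.system(),    # e.g. the OS family name
        "machine": platform.machine(),  # CPU architecture as reported by the OS
        "release": platform.release(),
        "python": platform.python_version(),
    }
```

Running `print(environment_summary())` on both the passing and the failing machine gives a quick side-by-side of OS and architecture.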

@kaliiiiiiiiii
Collaborator

kaliiiiiiiiii commented Feb 2, 2025

> Linux (aarch64): Gets stuck at the verification screen

Linux aarch64 is probably a red flag for anti-bot systems (this also depends on your distro). Note that patchright currently does not intend to cover fingerprinting.

> using any custom user agent

This is to be expected, since there are plenty of other (redundant) indicators. There's a simple solution though: don't attempt to change the UA.

@Vinyzu
Contributor

Vinyzu commented Feb 3, 2025

> I think this is because of a regression in Playwright and its new approach to handling proxies. I also have problems managing proxy connections with the new 1.50.1 version of Playwright. Everything works when I downgrade to 1.50.0.

Can you point to the relevant source code for this in the playwright repository?
A quick skim through the version comparison didn't turn up anything in that direction...

@kaliiiiiiiiii
Collaborator

kaliiiiiiiiii commented Feb 3, 2025

If there is a breaking change, it would be somewhere in playwright/compare/v1.50.0...v1.50.1.

Maybe something changed and your proxy is now applied through some middleware (i.e. a MITM) that changes the TLS/SSL fingerprint?

@massive-bot

massive-bot commented Feb 3, 2025

I'm having a similar issue, and it occurs both with and without a proxy. I'm also using Google Chrome, not Chromium.
Some clues:

  • Locally I'm running on a mac. If I run my server (which will also create the patchright browser) on my mac, it passes.
  • If I build it in a Docker container, install Google Chrome stable, and then run the server in the container, it fails/gets stuck.
    RUN apt-get update && apt-get install -y wget gnupg && \
        wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - && \
        echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" > /etc/apt/sources.list.d/google-chrome.list && \
        apt-get update && \
        apt-get install -y google-chrome-stable && \
        rm -rf /var/lib/apt/lists/*

example site: https://www.crunchbase.com/organization/anthropic
The above is true with and without UA + fingerprinting modifications

@kaliiiiiiiiii
Collaborator

> I'm having a similar issue, and it occurs both with and without a proxy. I'm also using Google Chrome, not Chromium. Some clues:

Then it's probably not due to the proxy (though that doesn't necessarily mean proxies are free of issues).

>   • Locally I'm running on a mac. If I run my server (which will also create the patchright browser) on my mac, it passes.

That sounds like fingerprinting to me, which is currently not covered by patchright.

@Vinyzu I suppose this issue can be closed as not planned unless we can reproduce any of the reported issues on a regular machine?


6 participants