page_timout does not work for crawler.arun_many #436

Open
KanishkNavale opened this issue Jan 9, 2025 · 0 comments
@KanishkNavale

The code:

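from typing import List, Optional

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

# `logger`, `Document`, and `generate_docs_from_crawls` are defined elsewhere in the project.


async def get_documents_from_urls(urls: List[str]) -> Optional[List[Document]]:
    logger.info("Querying the internet for relevant information...")

    crawler = AsyncWebCrawler()
    await crawler.start()

    crawler_config = CrawlerRunConfig(
        excluded_tags=["form", "nav", "img", "script", "aside", "footer"],
        remove_overlay_elements=True,
        remove_forms=True,
        verbose=False,
        page_timeout=6000,  # milliseconds; expected to cap each page navigation
    )

    try:
        results = await crawler.arun_many(
            urls=urls,
            config=crawler_config,
        )
    finally:
        # Close the browser even if the crawl raises.
        await crawler.close()

    return generate_docs_from_crawls(results)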

The Error:

× Unexpected error in _crawl_web at line 1205 in _crawl_web (.venv/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
  Error: Failed on navigating ACS-GOTO:
  Page.goto: Timeout 60000ms exceeded.
  Call log:
  - navigating to "https://tintin.fandom.com/wiki/Tintin", waiting until "domcontentloaded"

  Code context:
  1200
  1201                       response = await page.goto(
  1202                           url, wait_until=config.wait_until, timeout=config.page_timeout
  1203                       )
  1204                   except Error as e:
  1205 →                     raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
  1206
  1207                   await self.execute_hook("after_goto", page, context=context, url=url, response=response)
  1208
  1209                   if response is None:
  1210                       status_code = 200
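
The log reports a 60000ms timeout even though page_timeout=6000 was configured, which suggests the configured value is not reaching page.goto on this code path. Until that is resolved, one possible stopgap is an outer deadline enforced with asyncio.wait_for, which does not depend on how the library handles page_timeout. This is only a sketch: crawl_with_deadline and its crawl_one parameter are hypothetical names, and crawl_one stands for any async callable that takes a URL.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional


async def crawl_with_deadline(
    crawl_one: Callable[[str], Awaitable[Any]],  # hypothetical: any async callable taking a URL
    url: str,
    timeout_ms: int = 6000,
) -> Optional[Any]:
    """Await crawl_one(url) and give up after timeout_ms milliseconds."""
    try:
        # asyncio.wait_for takes seconds, so convert from milliseconds.
        return await asyncio.wait_for(crawl_one(url), timeout=timeout_ms / 1000)
    except asyncio.TimeoutError:
        # The pending call is cancelled by wait_for; report "no result" to the caller.
        return None
```

Returning None on timeout keeps one slow page from failing the whole batch; the caller can filter out the None entries before building documents.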

Note:
The same configuration works as expected with crawler.arun(); the timeout is only ignored by crawler.arun_many().
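
Since crawler.arun() is reported to honor the configuration, a possible workaround is to fan out individual arun calls instead of calling arun_many. The sketch below assumes the crawler object exposes an awaitable arun(url=..., config=...) method; crawl_many_via_arun and max_concurrency are hypothetical names introduced for illustration, not part of the library.

```python
import asyncio
from typing import Any, List


async def crawl_many_via_arun(
    crawler: Any,          # assumed to expose `await crawler.arun(url=..., config=...)`
    urls: List[str],
    config: Any,
    max_concurrency: int = 5,  # hypothetical knob to avoid opening too many pages at once
) -> List[Any]:
    """Crawl each URL with its own arun call, limited to max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def crawl_one(url: str) -> Any:
        async with semaphore:
            return await crawler.arun(url=url, config=config)

    # return_exceptions=True keeps one failing URL from discarding the other results;
    # failures appear in the list as exception objects, in the same order as `urls`.
    return await asyncio.gather(*(crawl_one(u) for u in urls), return_exceptions=True)
```

In the function from the report, this would replace the arun_many call with `results = await crawl_many_via_arun(crawler, urls, crawler_config)`, with exception entries filtered out before the results are converted to documents.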

@KanishkNavale changed the title from "page_timout does not work for arun_may" to "page_timout does not work for crawler.arun_many" on Jan 9, 2025.