page_timeout does not work for crawler.arun_many #436

Closed
@KanishkNavale

Description

The code:

from typing import List, Optional

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

# `logger`, `Document`, and `generate_docs_from_crawls` come from my own project code.


async def get_documents_from_urls(urls: List[str]) -> Optional[List[Document]]:
    logger.info("Querying the internet for relevant information...")

    crawler = AsyncWebCrawler()
    await crawler.start()

    crawler_config = CrawlerRunConfig(
        excluded_tags=["form", "nav", "img", "script", "aside", "footer"],
        remove_overlay_elements=True,
        remove_forms=True,
        verbose=False,
        page_timeout=6000,  # 6 s, in milliseconds
    )

    results = await crawler.arun_many(
        urls=urls,
        config=crawler_config,
    )

    await crawler.close()
    return generate_docs_from_crawls(results)

The Error:

 × Unexpected error in _crawl_web at line 1205 in _crawl_web (.venv/lib/python3.12/site-packages/crawl4ai/async_crawler_strategy.py):
   Error: Failed on navigating ACS-GOTO:
   Page.goto: Timeout 60000ms exceeded.
   Call log:
   - navigating to "https://tintin.fandom.com/wiki/Tintin", waiting until "domcontentloaded"

   Code context:
   1200
   1201                       response = await page.goto(
   1202                           url, wait_until=config.wait_until, timeout=config.page_timeout
   1203                       )
   1204                   except Error as e:
   1205 →                     raise RuntimeError(f"Failed on navigating ACS-GOTO:\n{str(e)}")
   1206
   1207                   await self.execute_hook("after_goto", page, context=context, url=url, response=response)
   1208
   1209                   if response is None:
   1210                       status_code = 200

Note:
The traceback reports the default 60000 ms timeout being exceeded even though page_timeout is set to 6000 ms, so the config does not appear to be applied by crawler.arun_many(). The same configuration works fine with crawler.arun().
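
For comparison, this is roughly the per-URL variant where the timeout is respected for me. It is only a sketch: it reuses the same CrawlerRunConfig and the project-specific helpers (logger, Document, generate_docs_from_crawls) from the snippet above, and simply calls crawler.arun() once per URL instead of arun_many():

async def get_documents_from_urls_single(urls: List[str]) -> Optional[List[Document]]:
    # Same pipeline as above, but one crawler.arun() call per URL.
    crawler = AsyncWebCrawler()
    await crawler.start()

    crawler_config = CrawlerRunConfig(
        excluded_tags=["form", "nav", "img", "script", "aside", "footer"],
        remove_overlay_elements=True,
        remove_forms=True,
        verbose=False,
        page_timeout=6000,  # honored here: navigation aborts after ~6 s
    )

    results = []
    for url in urls:
        # arun() applies page_timeout as expected for a single URL
        results.append(await crawler.arun(url=url, config=crawler_config))

    await crawler.close()
    return generate_docs_from_crawls(results)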
