menloresearch/crawl4ai

πŸš€πŸ€– Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper

Note: This is a fork of the original Crawl4AI project, customized for MCP (Model Context Protocol) agent usage.

MCP Agent Setup Guide

This guide helps you set up Crawl4AI as an MCP (Model Context Protocol) agent.

Quick Start: Most users should choose Option 2 (Local Setup) as it supports both headed mode (with visible browser) and headless mode. Only use Docker if you specifically need a containerized environment and don't mind headless-only operation.

View Original README for complete project documentation.

πŸ“‹ Prerequisites

  • Python 3.10+
  • Docker and Docker Compose (for Docker setup)
  • Git

🐳 Option 1: Docker Setup (Headless Only)

⚠️ Important: Docker setup runs in headless mode only (no visual browser window). If you want to see the browser while crawling, use Option 2 instead.

Step 1: Configure Environment

# Copy the example environment file
cp deploy/docker/.llm.env.example .llm.env

Step 2: Set Configuration

# Copy the headless configuration
cp deploy/docker/config_headless.yml deploy/docker/config.yml

Step 3: Start the Service

# Build and start the container
docker compose up --build -d

πŸ’» Option 2: Local Setup (Recommended)

βœ… Recommended: Local setup supports both headless and headed modes. Choose headed mode to see the browser window while crawling (useful for debugging and development).

Step 1: Install Crawl4AI

# Install the package in development mode
pip install -e .

# Install Playwright browser dependencies
python -m playwright install --with-deps chromium

Step 2: Install API Dependencies

# Navigate to the Docker directory and install requirements
cd deploy/docker/
pip install -r requirements.txt

Step 3: Choose Your Configuration

For Headless Mode (No Visual Browser)

cp config_headless.yml config.yml

For Headed Mode (With Visual Browser)

cp config_headed.yml config.yml

Step 4: Start the Server

# Start the API server
gunicorn server:app \
  --bind 0.0.0.0:11235 \
  --workers 1 --threads 4 \
  --timeout 1800 --graceful-timeout 30 --keep-alive 300 \
  --log-level info \
  --worker-class uvicorn.workers.UvicornWorker

πŸš€ Next Steps

Once set up, your MCP agent will be running and ready to handle web crawling requests!

πŸ”— API Endpoints

  • Main API: http://localhost:11235
  • Schema: http://localhost:11235/mcp/schema - View the API schema and available endpoints
  • MCP SSE: http://localhost:11235/mcp/sse - Server-Sent Events endpoint for MCP
  • MCP Streamable HTTP: http://localhost:11235/mcp/http - Streamable HTTP endpoint for MCP (thanks to PR #1212)
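When scripting against the server, the endpoint URLs above can all be derived from one base URL. A minimal sketch in Python, assuming the default port 11235 from the gunicorn command above:

```python
def mcp_endpoints(base_url: str = "http://localhost:11235") -> dict[str, str]:
    """Build the MCP endpoint URLs exposed by the Crawl4AI server."""
    base = base_url.rstrip("/")
    return {
        "main": base,                    # main API
        "schema": f"{base}/mcp/schema",  # API schema and available endpoints
        "sse": f"{base}/mcp/sse",        # Server-Sent Events transport
        "http": f"{base}/mcp/http",      # streamable HTTP transport
    }
```

Pass a different base URL if you bound the server to another host or port.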

πŸ§ͺ Test with MCP Inspector

After setup, test your installation with the MCP Inspector:

# Run the MCP inspector
npx @modelcontextprotocol/inspector

When prompted by the MCP inspector, you can use either transport option:

Option 1: Server-Sent Events (SSE)

  • URL: http://127.0.0.1:11235/mcp/sse
  • Connection type: sse

Option 2: Streamable HTTP

  • URL: http://127.0.0.1:11235/mcp/http
  • Connection type: Streamable HTTP
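If the inspector fails to connect, first confirm the server is reachable at all. A small standard-library sketch, assuming the default address from the setup above:

```python
import urllib.request
import urllib.error


def server_reachable(url: str = "http://127.0.0.1:11235/mcp/schema",
                     timeout: float = 3.0) -> bool:
    """Return True if the Crawl4AI server answers at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError):
        return False
```

A False result usually means the gunicorn server (or Docker container) is not running, or is bound to a different port.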

Once connected, you can test the setup by calling available tools such as:

  • visit_tool - Visit and crawl web pages
  • google_search_markdown - Search Google and get markdown results

Calling a tool against a website of your choice confirms that your MCP agent is working correctly end to end.
