A comprehensive research tool that integrates multiple MCP servers with Anthropic's Claude for intelligent information gathering and organization.
- Pure MCP Integration: Connects to Brave, Puppeteer, and Notion MCP servers with full AI-driven tool selection
- Anthropic Tool Calling: Uses Claude to intelligently call Brave search and Puppeteer scraping tools
- Content Scraping: Extracts detailed information from web pages using intelligent tool selection
- Vector Storage: Stores research data in a Chroma vector database with semantic search capabilities
- Summarization: Generates comprehensive summaries of research findings
- Notion Integration: Organizes and saves research results in Notion
- Python 3.10+
- Node.js 16+ (for JavaScript-based MCP servers)
- Anthropic API key (Claude 3 Sonnet model access required)
- Brave Search API key
- Notion API token and workspace access
- Clone this repository
- Create and activate a virtual environment:

  ```
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```
Create a `.env` file in the project root with the following variables:

```
# Required API Keys
ANTHROPIC_API_KEY=your_anthropic_api_key_here
BRAVE_API_KEY=your_brave_api_key_here
NOTION_TOKEN=your_notion_token_here

# Optional Settings
NOTION_PAGE_ID=your_notion_page_id_here
CHROMA_PERSIST_DIRECTORY=research_db
```
- ANTHROPIC_API_KEY: Get from Anthropic Console
- BRAVE_API_KEY: Get from Brave Search API
- NOTION_TOKEN: Get from Notion Integrations
- NOTION_PAGE_ID (optional): ID of the Notion page where research will be saved
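A minimal sketch of how the application might validate these variables at startup (the `check_environment` helper is illustrative, not from the codebase):

```python
import os

# The three keys the application requires; optional settings are not checked.
REQUIRED_VARS = ["ANTHROPIC_API_KEY", "BRAVE_API_KEY", "NOTION_TOKEN"]

def check_environment() -> list[str]:
    """Return the names of any required variables that are missing or empty."""
    return [name for name in REQUIRED_VARS if not os.getenv(name)]

missing = check_environment()
print("Missing variables:", missing)  # an empty list means all keys are set
```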
The application stores research data in ChromaDB with semantic search capabilities. The setup relies on:
- PersistentClient: For reliable on-disk storage
- SentenceTransformer Embeddings: Using the "all-MiniLM-L6-v2" model for high-quality embeddings
- Automatic Data Persistence: Changes are automatically saved to disk
By default, data is stored in a local directory named research_db. You can modify this by setting the CHROMA_PERSIST_DIRECTORY environment variable.
```
# Optional: Install additional ChromaDB dependencies for better performance
pip install "chromadb[all]"
```

First-time setup may take longer as the sentence transformer model is downloaded.
You'll need to set up the following MCP servers:
- Brave Search MCP Server:

  ```
  npx -y @modelcontextprotocol/server-brave-search
  ```

  This will initialize the server and provide a path to the server script.

- Puppeteer MCP Server:

  ```
  npx -y @modelcontextprotocol/server-puppeteer
  ```

- Notion MCP Server:

  ```
  npx -y @modelcontextprotocol/server-notion
  ```
Make note of the paths to each server script, as you'll need them when running the application.
Run the application with:

```
python main.py "your research query" \
  --brave-server path/to/brave/server.js \
  --puppeteer-server path/to/puppeteer/server.js \
  --notion-server path/to/notion/server.js
```

For example:

```
python main.py "The impact of quantum computing on cybersecurity" \
  --brave-server ./node_modules/.bin/mcp-server-brave-search \
  --puppeteer-server ./node_modules/.bin/mcp-server-puppeteer \
  --notion-server ./node_modules/.bin/mcp-server-notion
```
- MCP Server Connection:
  - Connects to each MCP server
  - Lists available tools and their parameters
- Anthropic Tool Calling for Search:
  - Converts MCP tool schemas to Anthropic tool format
  - Uses Claude to intelligently select and call Brave search tools
  - No fallbacks: Claude is fully responsible for tool selection
- Anthropic Tool Calling for Scraping:
  - Analyzes each URL to determine the best scraping approach
  - Uses Claude to select appropriate Puppeteer tools based on the content
  - Strictly follows the MCP philosophy: only scrapes URLs for which Claude selects a tool
- Content Processing:
  - Stores all data in the ChromaDB vector database with semantic embeddings
  - Generates a comprehensive summary using Claude
- Result Organization:
  - Saves organized research results to Notion
  - Maintains a persistent local database of all research data for future queries
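The steps above can be sketched as a simple pipeline; every function body here is an illustrative stub standing in for the real MCP call, not project code:

```python
def search(query: str) -> list[str]:
    """Stub for Brave search via MCP: return URLs relevant to the query."""
    return ["https://example.com/a", "https://example.com/b"]

def scrape(url: str) -> str:
    """Stub for Puppeteer scraping via MCP: return page content."""
    return f"content of {url}"

def store(doc: str, db: list[str]) -> None:
    """Stub for ChromaDB insertion."""
    db.append(doc)

def summarize(db: list[str]) -> str:
    """Stub for Claude summarization over the stored documents."""
    return f"summary of {len(db)} documents"

def run_research(query: str) -> str:
    db: list[str] = []
    for url in search(query):
        store(scrape(url), db)
    return summarize(db)  # the real tool also saves the results to Notion

print(run_research("quantum computing"))
```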
The application follows a pure Model Context Protocol (MCP) workflow:
- Server Discovery: Connects to MCP servers and discovers available tools
- Tool Schema Conversion: Converts MCP tool schemas to Anthropic tool format
- AI-Driven Tool Selection: Claude has full responsibility for selecting appropriate tools
- Tool Execution: Executes MCP tool calls based on Claude's recommendations without fallbacks
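The schema-conversion step can be sketched as a small mapping; this assumes MCP tool listings expose an `inputSchema` field and that Anthropic's Messages API expects the snake-case `input_schema` (the sample tool below is illustrative):

```python
def mcp_tool_to_anthropic(tool: dict) -> dict:
    """Convert one MCP tool description to Anthropic's tool format."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        # MCP uses camelCase "inputSchema"; Anthropic expects "input_schema".
        "input_schema": tool.get(
            "inputSchema", {"type": "object", "properties": {}}
        ),
    }

# Hypothetical tool listing as an MCP server might return it.
mcp_tool = {
    "name": "brave_web_search",
    "description": "Search the web with Brave",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
anthropic_tool = mcp_tool_to_anthropic(mcp_tool)
```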
The application uses ChromaDB's PersistentClient with the sentence-transformers embedding model. You can customize these settings:
- Storage Path: Set `CHROMA_PERSIST_DIRECTORY` in your `.env` file
- Embedding Model: The code uses "all-MiniLM-L6-v2" by default, but you can modify it to use different models
- Collection Configuration: Metadata and settings can be adjusted in the code
For production use, consider setting up a dedicated directory on persistent storage.
For better Notion integration:
- Create a dedicated Notion integration at Notion Integrations
- Share a specific page or database with your integration
- Copy the page ID and set it as `NOTION_PAGE_ID` in your `.env` file
- Problem: Unable to connect to MCP servers
- Solution: Ensure servers are running and paths are correct. Try running the server commands in a separate terminal to verify they work.
- Problem: "Missing required environment variables" error
- Solution: Check that all required API keys are set in your `.env` file with the correct variable names.
- Problem: Claude is not selecting tools
- Solution: Ensure you're using version 0.18.0+ of the Anthropic Python SDK and a Claude 3 Sonnet (or newer) model.
- Problem: URLs aren't being scraped
- Solution: Check the logs to see if Claude is selecting tools. If not, try updating the prompts in the code.
- Problem: "Failed to initialize ChromaDB" error
- Solution: Ensure the directory specified in `CHROMA_PERSIST_DIRECTORY` is writable.
- Problem: Slow first-time startup
- Solution: This is normal as the sentence transformer model downloads. Subsequent runs will be faster.