An AI-powered automation tool that controls web browsers through natural language commands
Browser Agent is a Python application that lets users control web browsers through natural language instructions. It combines AI models, browser automation, and DOM analysis to navigate websites, fill forms, click buttons, and carry out multi-step web tasks described in plain language.
The agent analyzes pages, recovers from errors, and can ask the user for input during a task, which makes it a good fit for automating repetitive browser tasks, web scraping, site testing, and interactive browsing sessions.
Why use Browser Agent?
- Simplify Web Automation - No more complex automation scripts or browser extensions
- Reduce Learning Curve - Use natural language instead of programming syntax
- Improve Productivity - Automate repetitive web tasks with minimal effort
- Enhance Accessibility - Enable browser control for users with limited technical knowledge
- Rapid Prototyping - Quickly test and iterate on web workflows
Key features
- 🗣️ Natural Language Control: Control your browser with simple human language instructions
- 🔍 Intelligent Page Analysis: Automatic detection and mapping of interactive elements on web pages
- 🧭 Context-Aware Navigation: Smart navigation with history tracking and state awareness
- 📝 Form Handling: Fill forms, select options from dropdowns, and submit data seamlessly
- ⚡ Dynamic Content Support: Handle AJAX, infinite scrolling, popups, and dynamically loaded content
- 🔄 Error Recovery: Robust error detection and recovery strategies
- 👤 User Interaction: Request information from the user during task execution when needed
- 🌍 Multi-Browser Support: Connect to existing Chrome browsers or launch new instances
- 🔧 Multiple LLM Providers: Support for OpenAI, Azure OpenAI, Groq, and Anthropic
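To make the "Intelligent Page Analysis" item above more concrete, here is a minimal sketch of what detecting and numbering interactive elements could look like. It is not taken from the project's code: it uses only the standard library `html.parser` on a static HTML string, and the names `INTERACTIVE_TAGS`, `Element`, and `map_interactive_elements` are hypothetical. The agent itself presumably works against a live DOM through browser automation.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the project's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


@dataclass
class Element:
    index: int              # number a model could refer to, e.g. "click element 2"
    tag: str                # lowercased tag name
    attrs: dict             # attribute name -> value ("" when the attribute has no value)


class InteractiveElementMapper(HTMLParser):
    """Collects interactive elements in document order and numbers them."""

    def __init__(self) -> None:
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = {name: value or "" for name, value in attrs}
        # Hidden inputs are skipped because a user cannot interact with them.
        if tag == "input" and attr_map.get("type") == "hidden":
            return
        self.elements.append(Element(len(self.elements), tag, attr_map))


def map_interactive_elements(html: str) -> list:
    """Return the interactive elements found in an HTML string."""
    mapper = InteractiveElementMapper()
    mapper.feed(html)
    mapper.close()
    return mapper.elements


if __name__ == "__main__":
    sample = (
        '<form><input name="q"><input type="hidden" name="t">'
        '<button id="go">Search</button></form><a href="/about">About</a>'
    )
    for el in map_interactive_elements(sample):
        print(el.index, el.tag, el.attrs)
```

Numbering elements in document order gives a language model a compact way to refer to a target ("type into element 0, then click element 1") without needing full CSS selectors.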
- Python 3.11 or higher
- uv (Python package manager), recommended
- Chrome browser
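The prerequisites above can be checked before installing. The following is a small sketch, not part of the project, that uses only the standard library. The function names and the list of Chrome executable names are assumptions. On macOS and Windows, Chrome is often not on `PATH`, so a `False` result for Chrome does not prove it is missing.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)

# Executable names to look for on PATH; these vary by platform and are assumptions.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chrome",
    "chromium",
    "chromium-browser",
)


def python_ok(version=tuple(sys.version_info[:2])) -> bool:
    """True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def find_first(candidates):
    """Return the path of the first executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def check_prerequisites() -> dict:
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        "python>=3.11": python_ok(),
        "uv": shutil.which("uv") is not None,
        "chrome": find_first(CHROME_CANDIDATES) is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'not found'}")
```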
# Clone the repository
git clone https://github.com/yourusername/browser-agent.git
cd browser-agent
# Install dependencies (recommended: uv)
uv sync
# Install Playwright browsers
uv run playwright install chromium
# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
# Check system status
uv run main.py diagnose
# Launch the agent
uv run main.py run
# Run with a specific task
uv run main.py run --task "Navigate to google.com and search for 'AI news'"
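If you want to start the agent from another Python script, the `run` command shown above can be wrapped with `subprocess`. This is a sketch under assumptions: the helper names `build_command` and `run_agent` are hypothetical, the script is assumed to run from the repository root, and the meaning of the exit code is not documented here.

```python
import shlex
import subprocess
from typing import Optional


def build_command(task: Optional[str] = None) -> list:
    """Return the argument list for `uv run main.py run`, optionally with --task."""
    cmd = ["uv", "run", "main.py", "run"]
    if task:
        cmd += ["--task", task]
    return cmd


def run_agent(task: Optional[str] = None, cwd: str = ".") -> int:
    """Launch the agent and return the process exit code."""
    cmd = build_command(task)
    # shlex.join is used only to print a copy-pasteable version of the command.
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, cwd=cwd, check=False).returncode


if __name__ == "__main__":
    run_agent("Navigate to google.com and search for 'AI news'")
```

Passing the command as a list rather than a single shell string means the task text, including its inner quotes, reaches the CLI as one argument without extra escaping.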
Comprehensive documentation is available in the docs/ folder:
- Installation Guide - Complete installation instructions and troubleshooting
- Quick Start Guide - Get up and running in minutes
- Configuration Guide - Customize the agent for your needs
- CLI Reference - Complete command-line interface documentation
- Usage Examples - Real-world examples and use cases
- Browser Tools Reference - Complete guide to available automation tools
- Architecture Guide - Technical architecture and design patterns
- Troubleshooting Guide - Common issues and solutions
| Task | Command |
|---|---|
| Install | uv sync && uv run playwright install chromium |
| Run agent | uv run main.py run |
| Check status | uv run main.py diagnose |
| Get help | uv run main.py help |
| View configuration | uv run main.py config get |
- Web Automation: Automate repetitive web tasks
- Site Testing: Test web applications with natural language
- Data Collection: Extract information from websites
- Interactive Assistance: Guide users through complex processes
- Research Tasks: Gather and analyze information from multiple sources
Limitations
- Cannot handle CAPTCHA or complex authentication challenges
- May face challenges with highly dynamic web applications
- Not designed for high-security operations (banking, etc.)
- Performance depends on the complexity of the website and on how clearly the instructions are written
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
Made with ❤️ by rkvalandasu