A smart AI agent that controls your browser with natural language, using Playwright, LangChain, and advanced LLMs to navigate, analyze, and perform tasks.

rkvalandas/browser_agent

🌐 Browser Agent

An AI-powered automation tool that controls web browsers through natural language commands

AI Browser Navigation Demo

Click the image above to watch the demo video

🔍 Overview

Browser Agent is a Python application that lets users control web browsers with natural language instructions. It combines AI models, browser automation, and DOM analysis to navigate websites, fill forms, click buttons, and carry out multi-step web tasks.

The agent features intelligent page analysis, robust error handling, and flexible interaction capabilities, making it ideal for automating repetitive browser tasks, web scraping, site testing, and interactive browsing sessions.

Why use Browser Agent?
  • Simplify Web Automation - No more complex automation scripts or browser extensions
  • Reduce Learning Curve - Use natural language instead of programming syntax
  • Improve Productivity - Automate repetitive web tasks with minimal effort
  • Enhance Accessibility - Enable browser control for users with limited technical knowledge
  • Rapid Prototyping - Quickly test and iterate on web workflows

✨ Features

  • 🗣️ Natural Language Control: Control your browser with simple human language instructions
  • 🔍 Intelligent Page Analysis: Automatic detection and mapping of interactive elements on web pages
  • 🧭 Context-Aware Navigation: Smart navigation with history tracking and state awareness
  • 📝 Form Handling: Fill forms, select options from dropdowns, and submit data seamlessly
  • ⚡ Dynamic Content Support: Handle AJAX, infinite scrolling, popups, and dynamically loaded content
  • 🔄 Error Recovery: Robust error detection and recovery strategies
  • 👤 User Interaction: Request information from the user during task execution when needed
  • 🌍 Multi-Browser Support: Connect to existing Chrome browsers or launch new instances
  • 🔧 Multiple LLM Providers: Support for OpenAI, Azure OpenAI, Groq, and Anthropic
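
To make the "Intelligent Page Analysis" idea concrete, here is a minimal sketch of how interactive elements on a page could be detected and numbered so that a language model can refer to them by ID. It is not Browser Agent's implementation: the tag list, the `InteractiveElementMapper` class, and the `map_interactive_elements` helper are hypothetical, and the sketch parses a static HTML string with the standard library rather than a live page.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the project's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementMapper(HTMLParser):
    """Collects interactive elements and assigns each a sequential numeric ID."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            return  # hidden inputs are not something a user can act on
        label = (
            attributes.get("aria-label")
            or attributes.get("name")
            or attributes.get("href")
            or ""
        )
        self.elements.append(
            {"id": len(self.elements) + 1, "tag": tag, "label": label}
        )


def map_interactive_elements(html):
    """Return a list of {'id', 'tag', 'label'} dicts for the given HTML."""
    mapper = InteractiveElementMapper()
    mapper.feed(html)
    return mapper.elements


sample = """
<form>
  <input type="hidden" name="csrf" value="x">
  <input type="text" name="q" aria-label="Search">
  <button name="go">Search</button>
</form>
<a href="/about">About</a>
"""

for element in map_interactive_elements(sample):
    print(element["id"], element["tag"], element["label"])
```

A numbered map like this gives the model a compact vocabulary ("click element 2") instead of raw HTML. A real agent would also need visibility checks and live DOM access through the browser automation layer.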

🚀 Quick Start

Prerequisites

  • Python 3.11 or higher
  • uv (Python package manager), recommended
  • Chrome browser

Installation

# Clone the repository
git clone https://github.com/rkvalandas/browser_agent.git
cd browser_agent

# Install dependencies (recommended: uv)
uv sync

# Install Playwright browsers
uv run playwright install chromium

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
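
Before launching the agent, it can help to confirm that at least one provider key made it into .env. The sketch below parses KEY=VALUE lines and reports which providers look configured. The variable names are the ones conventionally read by each provider's SDK; whether this project's .env.example uses the same names is an assumption, and `parse_env` and `configured_providers` are hypothetical helpers, not part of Browser Agent.

```python
# Conventional SDK variable names; an assumption about this project's .env.example.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Azure OpenAI": "AZURE_OPENAI_API_KEY",
    "Groq": "GROQ_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env):
    """Return the providers whose key is present and non-empty."""
    return [
        name for name, var in PROVIDER_KEYS.items()
        if env.get(var, "").strip()
    ]


sample_env = """
# Example only; real keys go in your local .env
OPENAI_API_KEY="sk-example"
GROQ_API_KEY=
"""

found = configured_providers(parse_env(sample_env))
print("Configured providers:", ", ".join(found) or "none")
```

To check a real file, pass `pathlib.Path(".env").read_text()` to `parse_env` instead of the sample string.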

Basic Usage

# Check system status
uv run main.py diagnose

# Launch the agent
uv run main.py run

# Run with a specific task
uv run main.py run --task "Navigate to google.com and search for 'AI news'"
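
The same commands can be driven from a script when several tasks need to run in sequence. This is a sketch that uses only the `run` subcommand and `--task` flag shown above; `build_command` and `run_task` are hypothetical helper names, and actually executing a task requires uv and a checkout of the repository.

```python
import shlex
import subprocess


def build_command(task=None):
    """Build the argument list for `uv run main.py run`, optionally with --task."""
    command = ["uv", "run", "main.py", "run"]
    if task:
        command += ["--task", task]
    return command


def run_task(task):
    """Run one task and return the process exit code."""
    return subprocess.run(build_command(task), check=False).returncode


tasks = [
    "Navigate to google.com and search for 'AI news'",
]

for task in tasks:
    # Print the shell-quoted command rather than running it;
    # call run_task(task) to execute it for real.
    print(shlex.join(build_command(task)))
```

Passing the task as a single list element avoids shell quoting problems when the instruction itself contains quotes, as the example task does.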

📚 Documentation

Documentation is available in the docs/ folder, organized into Getting Started, Usage and Examples, and Advanced Topics.

Quick Reference

| Task | Command |
| --- | --- |
| Install | uv sync && uv run playwright install chromium |
| Run agent | uv run main.py run |
| Check status | uv run main.py diagnose |
| Get help | uv run main.py help |
| View configuration | uv run main.py config get |

💼 Use Cases

  • Web Automation: Automate repetitive web tasks
  • Site Testing: Test web applications with natural language
  • Data Collection: Extract information from websites
  • Interactive Assistance: Guide users through complex processes
  • Research Tasks: Gather and analyze information from multiple sources

⚠️ Limitations

  • Cannot handle CAPTCHA or complex authentication challenges
  • May face challenges with highly dynamic web applications
  • Not designed for high-security operations (banking, etc.)
  • Performance depends on website complexity and on how clearly the instructions are written

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by rkvalandasu
