🌐 Browser Agent

An AI-powered automation tool that controls web browsers through natural language commands

AI Browser Navigation Demo

📋 Table of Contents

Overview
Features
Quick Start
Documentation
Contributing
License

🔍 Overview

Browser Agent is a Python application that lets users control web browsers through natural language instructions. It combines AI models, browser automation, and DOM analysis to navigate websites, fill forms, click buttons, and carry out multi-step web tasks described in plain language.

The agent features intelligent page analysis, robust error handling, and flexible interaction capabilities, making it ideal for automating repetitive browser tasks, web scraping, site testing, and interactive browsing sessions.
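To make the "DOM analysis" idea concrete, here is a minimal sketch of how interactive elements on a page could be detected and indexed using only Python's standard library. This is not Browser Agent's actual implementation: the class name `InteractiveElementCollector`, the function `map_interactive_elements`, the set of tags, and the output format are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not the project's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveElementCollector(HTMLParser):
    """Collects interactive elements and gives each one a numeric index."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag in INTERACTIVE_TAGS:
            self.elements.append(
                {"index": len(self.elements), "tag": tag, "attrs": dict(attrs)}
            )


def map_interactive_elements(html: str) -> list[dict]:
    """Return the interactive elements found in an HTML string, in document order."""
    collector = InteractiveElementCollector()
    collector.feed(html)
    collector.close()
    return collector.elements


if __name__ == "__main__":
    page = (
        '<form><input name="q"><button type="submit">Search</button></form>'
        '<a href="/about">About</a>'
    )
    for element in map_interactive_elements(page):
        print(element)
```

An indexed list like this gives a language model a compact way to refer to page elements ("click element 1") instead of reasoning over raw HTML.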

Why use Browser Agent?

Simplify Web Automation - No more complex automation scripts or browser extensions
Reduce Learning Curve - Use natural language instead of programming syntax
Improve Productivity - Automate repetitive web tasks with minimal effort
Enhance Accessibility - Enable browser control for users with limited technical knowledge
Rapid Prototyping - Quickly test and iterate on web workflows

✨ Features

🗣️ Natural Language Control: Control your browser with simple human language instructions
🔍 Intelligent Page Analysis: Automatic detection and mapping of interactive elements on web pages
🧭 Context-Aware Navigation: Smart navigation with history tracking and state awareness
📝 Form Handling: Fill forms, select options from dropdowns, and submit data seamlessly
⚡ Dynamic Content Support: Handle AJAX, infinite scrolling, popups, and dynamically loaded content
🔄 Error Recovery: Robust error detection and recovery strategies
👤 User Interaction: Request information from the user during task execution when needed
🌍 Multi-Browser Support: Connect to existing Chrome browsers or launch new instances
🔧 Multiple LLM Providers: Support for OpenAI, Azure OpenAI, Groq, and Anthropic
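As an illustration of the error recovery feature listed above, the following sketch shows one common strategy: retrying a failed browser action with exponential backoff. The README does not describe which strategies Browser Agent uses, so the function `run_with_retry` and its parameters are hypothetical.

```python
import time


def run_with_retry(action, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, ... between tries.
    The sleep function is injectable so the logic can be tested without waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # a real agent would catch narrower error types
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_click():
        # Stand-in for a browser action that fails until the page has loaded.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("element not ready")
        return "clicked"

    print(run_with_retry(flaky_click, base_delay=0.1))
```

Backoff suits dynamically loaded pages, where an element that is missing now may appear a moment later.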

🚀 Quick Start

Prerequisites

Python 3.11 or higher
uv (Python package manager), recommended
Chrome browser

Installation

# Clone the repository
git clone https://github.com/yourusername/browser-agent.git
cd browser-agent

# Install dependencies (recommended: uv)
uv sync

# Install Playwright browsers
uv run playwright install chromium

# Set up environment variables
cp .env.example .env
# Edit .env with your API keys
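After editing .env, a quick check like the one below can confirm which providers have a key set. This is only a sketch: the variable names in `PROVIDER_KEYS` are assumptions based on common conventions, so check .env.example for the names the project really expects. The helpers `parse_env` and `configured_providers` are hypothetical and not part of Browser Agent.

```python
from pathlib import Path

# Hypothetical variable names; see .env.example for the ones the project uses.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "azure": "AZURE_OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return the providers whose key is present and non-empty."""
    return [name for name, key in PROVIDER_KEYS.items() if env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("Providers with a key set:", configured_providers(env) or "none")
```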

Basic Usage

# Check system status
uv run main.py diagnose

# Launch the agent
uv run main.py run

# Run with a specific task
uv run main.py run --task "Navigate to google.com and search for 'AI news'"

📚 Documentation

Comprehensive documentation is available in the docs/ folder:

Getting Started

Installation Guide - Complete installation instructions and troubleshooting
Quick Start Guide - Get up and running in minutes
Configuration Guide - Customize the agent for your needs

Usage and Examples

CLI Reference - Complete command-line interface documentation
Usage Examples - Real-world examples and use cases
Browser Tools Reference - Complete guide to available automation tools

Advanced Topics

Architecture Guide - Technical architecture and design patterns
Troubleshooting Guide - Common issues and solutions

Quick Reference

| Task | Command |
| --- | --- |
| Install | `uv sync && uv run playwright install chromium` |
| Run agent | `uv run main.py run` |
| Check status | `uv run main.py diagnose` |
| Get help | `uv run main.py help` |
| View configuration | `uv run main.py config get` |

💼 Use Cases

Web Automation: Automate repetitive web tasks
Site Testing: Test web applications with natural language
Data Collection: Extract information from websites
Interactive Assistance: Guide users through complex processes
Research Tasks: Gather and analyze information from multiple sources

⚠️ Limitations

Cannot handle CAPTCHA or complex authentication challenges
May face challenges with highly dynamic web applications
Not designed for high-security operations (banking, etc.)
Performance depends on website complexity and on how clearly the instructions are written

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Made with ❤️ by rkvalandasu
