deaspo/automated_browser

🤖 AI Browser Automation Agent & Analyzer

A sophisticated web automation platform that combines AI-powered browser control with intelligent session analysis. This system uses advanced language models to autonomously navigate websites, complete tasks, and provide detailed performance evaluations.

🌟 Features

Core Capabilities

  • 🧠 AI-Powered Browser Agent: Natural language task execution with intelligent decision-making
  • 📹 Comprehensive Session Recording: Full rrweb-based recording of all browser interactions
  • 🔍 Intelligent Performance Analysis: AI-powered evaluation and scoring of task completion
  • 🎮 Interactive Web Dashboard: Modern, responsive interface for task management
  • 📊 Session Replay System: Visual playback of recorded automation sessions
  • 💾 Export & Analytics: Downloadable session data with detailed metadata

Advanced Features

  • 🎯 Token Optimization: Intelligent HTML simplification reduces API costs by up to 90%
  • 🔄 Adaptive Retry Logic: Smart error recovery with contextual retry mechanisms
  • 📈 Performance Scoring: 1-10 similarity scoring with detailed reasoning
  • 🔒 Environment-based Configuration: Secure credential management
  • 📱 Responsive Design: Full mobile and desktop compatibility
  • 🚀 TypeScript Architecture: Type-safe, maintainable codebase
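
The token-optimization idea can be illustrated with a minimal sketch. It assumes the simplification drops markup the model does not need (scripts, styles, comments, presentational attributes) and collapses whitespace. `simplifyHtml` is a hypothetical helper, not the repository's actual implementation, and the regex approach is only a rough stand-in for a real HTML parser.

```typescript
// Hypothetical helper: shrink an HTML string before sending it to an LLM.
// Regex-based for brevity; this is an assumption, not the project's code.
function simplifyHtml(html: string): string {
  return html
    // Drop <script>, <style>, and <svg> blocks together with their contents.
    .replace(/<(script|style|svg)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Drop HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Drop presentational attributes that rarely help task planning.
    .replace(/\s(?:style|class)="[^"]*"/gi, "")
    // Collapse runs of whitespace, then remove whitespace between tags.
    .replace(/\s+/g, " ")
    .replace(/>\s+</g, "><")
    .trim();
}
```

Identifying attributes such as `id` are kept on purpose, since the agent still needs a way to address elements on the page.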

🏗️ Architecture & Technology

Backend Stack

  • Runtime: Node.js with TypeScript
  • Web Framework: Express.js with comprehensive API endpoints
  • Browser Automation: Playwright for cross-browser compatibility
  • AI Integration: Azure OpenAI API with GPT models
  • Session Recording: rrweb for DOM event capture
  • File Management: JSZip for archive creation

Frontend Stack

  • UI Framework: Vanilla JavaScript with modern ES6+ features
  • Styling: CSS Variables with responsive design principles
  • State Management: Class-based architecture with event delegation
  • Real-time Updates: Fetch API with async/await patterns

Key Components

🎯 AIBrowserAgent

Primary automation engine with intelligent task execution

  • Natural language task interpretation
  • Adaptive web navigation strategies
  • Comprehensive error handling and recovery
  • Token-optimized LLM communication
  • Session recording integration

🔍 AIAgentAnalyzer

Intelligent session evaluation system

  • Advanced event preprocessing and analysis
  • Multi-criteria performance scoring
  • Detailed reasoning generation
  • Configurable evaluation parameters

📹 AIAgentBrowserRecorder

Comprehensive session capture system

  • Real-time DOM event recording
  • Automatic file management
  • Metadata generation and storage
  • Safe error handling for problematic pages

🎬 AIAgentBrowserReplay

Visual session playback system

  • Local HTTP server for replay hosting
  • Interactive rrweb player integration
  • Browser automation for seamless viewing
  • Recording validation and error handling

🚀 Quick Start

Prerequisites

  • Node.js 18.0 or higher
  • npm 8.0 or higher
  • Azure OpenAI API access with deployment

1. Installation

# Clone the repository
git clone <repository-url>
cd repository_dir

# Install dependencies
npm install

# Build the project
npm run build

2. Environment Configuration

Create a .env file in the project root:

# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY="your_api_key_here"
AZURE_OPENAI_RESOURCE_NAME="your_resource_name"
AZURE_OPENAI_DEPLOYMENT_NAME="your_deployment_name"
AZURE_OPENAI_API_VERSION="your_model_api_version"

How to get Azure OpenAI credentials:

  1. Create an Azure OpenAI resource in the Azure portal
  2. Deploy a GPT-3.5-turbo or GPT-4 model
  3. Copy the API key, resource name, and deployment name
  4. Use the latest available API version
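
As a rough illustration of how the four variables fit together, the sketch below validates them and assembles the standard Azure OpenAI chat-completions URL. `readAzureEnv` and `chatCompletionsUrl` are hypothetical names; how the project actually reads its configuration is not shown in this README.

```typescript
// The four variable names from the .env example above.
const REQUIRED_VARS = [
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_RESOURCE_NAME",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
  "AZURE_OPENAI_API_VERSION",
] as const;

type AzureEnv = Record<(typeof REQUIRED_VARS)[number], string>;

// Hypothetical helper: fail fast when a variable is missing or empty.
function readAzureEnv(source: Record<string, string | undefined>): AzureEnv {
  const missing = REQUIRED_VARS.filter((name) => !source[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AzureEnv;
  for (const name of REQUIRED_VARS) {
    env[name] = source[name] as string;
  }
  return env;
}

// Builds the Azure OpenAI chat-completions URL for the configured deployment.
function chatCompletionsUrl(env: AzureEnv): string {
  const host = `https://${env.AZURE_OPENAI_RESOURCE_NAME}.openai.azure.com`;
  const path = `/openai/deployments/${env.AZURE_OPENAI_DEPLOYMENT_NAME}/chat/completions`;
  return `${host}${path}?api-version=${encodeURIComponent(env.AZURE_OPENAI_API_VERSION)}`;
}
```

In a Node.js process this would typically be called as `readAzureEnv(process.env)` after the .env file has been loaded. The API key itself is sent in the `api-key` request header rather than in the URL.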

3. Running the Application

Development Mode (with hot reload)

npm run dev

Production Mode

npm start

Replay Latest Recording

npm run replay

The dashboard will be available at http://localhost:3000

🎮 Usage Guide

Starting Automation Tasks

  1. Navigate to Dashboard: Open http://localhost:3000
  2. Enter Task Description: Provide clear, specific instructions
    Example: "Go to Wikipedia, search for 'Machine Learning',
    click on the first result, and scroll to the Applications section"
    
  3. Set Initial URL: Starting point for the automation
  4. Configure Max Retries: Number of attempts (1-50)
  5. Launch Agent: Monitor progress in console logs

Reviewing Results

  • Recordings Table: View all completed sessions with AI analysis
  • Performance Scores: Similarity ratings on a 1-10 scale, with reasoning
  • Session Replay: Visual playback with interactive controls
  • Download Archives: Complete session data in ZIP format

Advanced Features

Session Analysis

Each recording includes:

  • Task Summary: AI-generated description of actions taken
  • Similarity Score: Performance rating (1-10 scale)
  • Detailed Reasoning: Explanation of scoring decision
  • Timestamp & Metadata: Complete session information
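
The fields above could be modeled roughly as follows. The interface and its field names are assumptions made for illustration (the README does not show the metadata schema), and `normalizeScore` is a hypothetical helper that coerces a model's reply onto the 1-10 scale.

```typescript
// Hypothetical shape of one session's metadata; field names are assumptions.
interface SessionMetadata {
  taskSummary: string;
  similarityScore: number; // integer from 1 to 10
  reasoning: string;
  timestamp: string; // ISO 8601
}

// Coerce a raw score (number, or text such as "7/10") onto the 1-10 scale.
function normalizeScore(raw: unknown): number {
  const value = typeof raw === "number" ? raw : parseFloat(String(raw));
  if (!isFinite(value)) {
    throw new Error(`Not a numeric score: ${String(raw)}`);
  }
  return Math.min(10, Math.max(1, Math.round(value)));
}

function buildMetadata(
  taskSummary: string,
  rawScore: unknown,
  reasoning: string,
  now: Date = new Date()
): SessionMetadata {
  return {
    taskSummary,
    similarityScore: normalizeScore(rawScore),
    reasoning,
    timestamp: now.toISOString(),
  };
}
```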

Replay Controls

  • Play/pause/speed controls
  • Timeline scrubbing
  • Step-by-step navigation
  • Full-screen viewing

🧪 Testing Scenarios

E-commerce Automation

Task: "Navigate to Amazon, search for 'wireless headphones',
filter by ratings above 4 stars, and add the first result to cart"
URL: https://amazon.com

Information Gathering

Task: "Search Wikipedia for 'Quantum Computing', navigate to
the History section, and find the first mention of IBM"
URL: https://wikipedia.org

Form Interactions

Task: "Go to the contact page, fill out the form with test data,
and submit the inquiry"
URL: https://example-company.com

Social Media

Task: "Navigate to GitHub, search for 'playwright', and star
the official Microsoft repository"
URL: https://github.com

🔧 Configuration

Environment Variables

Variable                     | Description                         | Required
AZURE_OPENAI_API_KEY         | Azure OpenAI API authentication key |
AZURE_OPENAI_RESOURCE_NAME   | Azure resource identifier           |
AZURE_OPENAI_DEPLOYMENT_NAME | Model deployment name               |
AZURE_OPENAI_API_VERSION     | API version (recommend latest)      |

Application Settings

Recording Configuration:

  • Default recordings per page: 5
  • Maximum events analyzed: 100
  • Recording format: .vbrec (JSON)
  • Metadata format: .json
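
A minimal sketch of the 100-event cap, assuming the analyzer simply keeps the earliest events in a recording. The project may select or sample events differently. `selectEventsForAnalysis` is a hypothetical helper; the event shape follows rrweb's `type` / `data` / `timestamp` fields.

```typescript
// Minimal view of an rrweb event: numeric type, payload, and timestamp.
interface RecordedEvent {
  type: number;
  data: unknown;
  timestamp: number;
}

const MAX_EVENTS_ANALYZED = 100;

// Hypothetical helper: order events by time and keep at most `max` of them.
// The input array is copied first so the recording itself is left untouched.
function selectEventsForAnalysis(
  events: RecordedEvent[],
  max: number = MAX_EVENTS_ANALYZED
): RecordedEvent[] {
  return events
    .slice()
    .sort((a, b) => a.timestamp - b.timestamp)
    .slice(0, max);
}
```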

Agent Parameters:

  • Default max retries: 10
  • Browser timeout: 10 seconds
  • Navigation timeout: 60 seconds
  • Element wait timeout: 5 seconds
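
The retry behaviour can be sketched as a loop that passes the previous failure back into the next attempt, which is one plausible reading of "contextual" retries. The names `AGENT_DEFAULTS` and `runWithRetries` are hypothetical; only the numeric defaults come from the list above.

```typescript
// Defaults from the Agent Parameters list, with timeouts in milliseconds.
const AGENT_DEFAULTS = {
  maxRetries: 10,
  browserTimeoutMs: 10000,
  navigationTimeoutMs: 60000,
  elementWaitTimeoutMs: 5000,
};

// One attempt receives its 1-based index and the previous error message (if any).
type Attempt<T> = (attempt: number, previousError: string | null) => Promise<T>;

// Hypothetical helper: run `step` until it succeeds or attempts are exhausted.
async function runWithRetries<T>(
  step: Attempt<T>,
  maxRetries: number = AGENT_DEFAULTS.maxRetries
): Promise<T> {
  let previousError: string | null = null;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await step(attempt, previousError);
    } catch (err) {
      // Keep the failure so the next attempt can adapt its strategy.
      previousError = err instanceof Error ? err.message : String(err);
    }
  }
  throw new Error(`Task failed after ${maxRetries} attempts: ${previousError}`);
}
```

In an agent loop, the previous error message could be appended to the next LLM prompt so the model can choose a different action.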

📁 Project Structure

src/
├── classes/                    # Core TypeScript classes
│   ├── AIBrowserAgent.ts      # Main automation engine
│   ├── AIAgentAnalyzer.ts     # Performance analysis system
│   ├── AIAgentBrowserRecorder.ts # Session recording
│   └── AIAgentBrowserReplay.ts   # Session replay system
├── ui/
│   └── homepage.ts            # Dashboard HTML generator
├── main.ts                    # Express server & API routes
└── replay.ts                  # Standalone replay utility

recordings/                     # Generated session files
├── *.vbrec                    # Session recording files (JSON)
└── *.json                     # Session metadata

Configuration Files:
├── package.json               # Dependencies & scripts
├── tsconfig.json             # TypeScript configuration
├── eslint.config.mjs         # Linting rules
└── .env                      # Environment variables

🚦 API Endpoints

POST /start-agent

Start a new automation task

{
	"task": "Natural language task description",
	"url": "https://starting-url.com",
	"maxRetries": 10
}
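
A client-side sketch for preparing this request. It assumes the server accepts a JSON body on the default port and that `maxRetries` follows the 1-50 range given in the Usage Guide; `buildStartAgentRequest` is a hypothetical helper, and whether the server performs the same validation is not stated here.

```typescript
interface StartAgentRequest {
  task: string;
  url: string;
  maxRetries: number;
}

interface PreparedRequest {
  endpoint: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Hypothetical helper: validate the payload and prepare arguments for fetch().
function buildStartAgentRequest(
  payload: StartAgentRequest,
  baseUrl: string = "http://localhost:3000"
): PreparedRequest {
  if (payload.task.trim().length === 0) {
    throw new Error("task must not be empty");
  }
  if (!/^https?:\/\//i.test(payload.url)) {
    throw new Error("url must start with http:// or https://");
  }
  const retries = payload.maxRetries;
  if (Math.floor(retries) !== retries || retries < 1 || retries > 50) {
    throw new Error("maxRetries must be an integer between 1 and 50");
  }
  return {
    endpoint: `${baseUrl}/start-agent`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(payload),
    },
  };
}
```

The result can be passed straight to `fetch(request.endpoint, request.init)`.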

GET /recordings?page=1

Retrieve paginated recordings list

{
  "recordings": [...],
  "totalPages": 5,
  "currentPage": 1
}
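
The response shape above, combined with the default of 5 recordings per page, suggests pagination along the following lines. This is a sketch under that assumption; `paginate` is a hypothetical helper and the server's real logic may differ, for example in how it treats out-of-range pages.

```typescript
interface RecordingsPage<T> {
  recordings: T[];
  totalPages: number;
  currentPage: number;
}

// Hypothetical helper: slice a full list into the page shape shown above.
function paginate<T>(items: T[], page: number, perPage: number = 5): RecordingsPage<T> {
  const totalPages = Math.max(1, Math.ceil(items.length / perPage));
  // Clamp the requested page into the valid range instead of failing.
  const currentPage = Math.min(totalPages, Math.max(1, Math.floor(page)));
  const start = (currentPage - 1) * perPage;
  return {
    recordings: items.slice(start, start + perPage),
    totalPages,
    currentPage,
  };
}
```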

POST /replay

Start session replay

{
	"filename": "recording-file.vbrec"
}

GET /download/:filename

Download a session archive

Returns a ZIP file containing the recording and its metadata
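
Because the filename arrives as a URL parameter, a handler would normally check it before touching the file system. The sketch below assumes the metadata file shares the recording's base name with a .json extension, which is a guess based on the Project Structure listing. `archiveEntriesFor` is a hypothetical helper, not the project's code.

```typescript
// Hypothetical helper: validate a requested recording name and list the
// files that would go into its ZIP archive.
function archiveEntriesFor(filename: string): string[] {
  // Allow only plain file names ending in .vbrec; the character class
  // excludes "/" and "\", so path traversal such as "../x.vbrec" is rejected.
  if (!/^[A-Za-z0-9._-]+\.vbrec$/.test(filename)) {
    throw new Error(`Invalid recording filename: ${filename}`);
  }
  const base = filename.slice(0, -".vbrec".length);
  return [filename, `${base}.json`];
}
```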

🛠️ Development

Code Style & Standards

  • TypeScript: Strict type checking enabled
  • ESLint: Comprehensive linting with custom rules
  • Code Organization: Class-based architecture
  • Documentation: JSDoc comments for all public methods
  • Error Handling: Comprehensive try-catch blocks

Building & Testing

# Development build with watch mode
npm run dev

# Production build
npm run build

# Lint checking
npm run lint

# Type checking
npm run type-check

Contributing Guidelines

  1. Follow existing code style and documentation patterns
  2. Add comprehensive JSDoc comments for new methods
  3. Include error handling for all external API calls
  4. Test changes with multiple browser automation scenarios
  5. Update README for any new features or configuration options

🔍 Troubleshooting

Common Issues

Agent fails to start

  • Verify Azure OpenAI credentials in .env
  • Check API quota and deployment status
  • Ensure proper permissions for recordings directory

Recording playback fails

  • Verify recording file integrity
  • Check browser popup blockers
  • Ensure sufficient system memory

Task execution errors

  • Review browser console for detailed error messages
  • Verify target website accessibility
  • Check for anti-automation measures (CAPTCHA, rate limiting)

Performance optimization

  • Adjust maxRetries based on task complexity
  • Use specific, actionable task descriptions
  • Test with simpler scenarios first

Debug Mode

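Agent progress is written to the console of the process running the server. The line below only prints a marker message; it does not change the logging level on its own: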

console.log('Debug mode enabled');

📄 License

This project is licensed under the MIT License. See LICENSE file for details.

🤝 Support & Community

  • Issues: Report bugs and feature requests via GitHub Issues
  • Discussions: Join community discussions for tips and best practices
  • Documentation: Comprehensive inline documentation and examples
  • Updates: Regular updates with new features and improvements

Built with ❤️ using TypeScript, Playwright, and Azure OpenAI
