🤖 AI Browser Automation Agent & Analyzer

A sophisticated web automation platform that combines AI-powered browser control with intelligent session analysis. This system uses advanced language models to autonomously navigate websites, complete tasks, and provide detailed performance evaluations.

🌟 Features

Core Capabilities

🧠 AI-Powered Browser Agent: Natural language task execution with intelligent decision-making
📹 Comprehensive Session Recording: Full rrweb-based recording of all browser interactions
🔍 Intelligent Performance Analysis: AI-powered evaluation and scoring of task completion
🎮 Interactive Web Dashboard: Modern, responsive interface for task management
📊 Session Replay System: Visual playback of recorded automation sessions
💾 Export & Analytics: Downloadable session data with detailed metadata

Advanced Features

🎯 Token Optimization: Intelligent HTML simplification reduces API costs by up to 90%
🔄 Adaptive Retry Logic: Smart error recovery with contextual retry mechanisms
📈 Performance Scoring: 1-10 similarity scoring with detailed reasoning
🔒 Environment-based Configuration: Secure credential management
📱 Responsive Design: Full mobile and desktop compatibility
🚀 TypeScript Architecture: Type-safe, maintainable codebase
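
The token optimization idea above can be illustrated with a small sketch. It assumes that "HTML simplification" means removing markup the model does not need before the page is sent to the LLM; the function name `simplifyHtml` and the exact rules are hypothetical and are not taken from the project's source.

```typescript
// Hypothetical helper: an illustration of HTML simplification,
// not the project's actual implementation.
function simplifyHtml(html: string): string {
  return html
    // Remove elements whose contents rarely help with navigation decisions.
    .replace(/<(script|style|noscript|svg)\b[^>]*>[\s\S]*?<\/\1>/gi, "")
    // Remove HTML comments.
    .replace(/<!--[\s\S]*?-->/g, "")
    // Remove quoted inline style and event-handler attributes.
    .replace(/\s(?:style|on\w+)=(?:"[^"]*"|'[^']*')/gi, "")
    // Collapse runs of whitespace into single spaces.
    .replace(/\s+/g, " ")
    .trim();
}
```

Regex-based stripping is only a rough approximation; a DOM-based pass inside the browser page would handle nested or malformed markup more reliably.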

🏗️ Architecture & Technology

Backend Stack

Runtime: Node.js with TypeScript
Web Framework: Express.js with comprehensive API endpoints
Browser Automation: Playwright for cross-browser compatibility
AI Integration: Azure OpenAI API with GPT models
Session Recording: rrweb for DOM event capture
File Management: JSZip for archive creation

Frontend Stack

UI Framework: Vanilla JavaScript with modern ES6+ features
Styling: CSS Variables with responsive design principles
State Management: Class-based architecture with event delegation
Real-time Updates: Fetch API with async/await patterns

Key Components

🎯 AIBrowserAgent

Primary automation engine with intelligent task execution

Natural language task interpretation
Adaptive web navigation strategies
Comprehensive error handling and recovery
Token-optimized LLM communication
Session recording integration

🔍 AIAgentAnalyzer

Intelligent session evaluation system

Advanced event preprocessing and analysis
Multi-criteria performance scoring
Detailed reasoning generation
Configurable evaluation parameters

📹 AIAgentBrowserRecorder

Comprehensive session capture system

Real-time DOM event recording
Automatic file management
Metadata generation and storage
Safe error handling for problematic pages

🎬 AIAgentBrowserReplay

Visual session playback system

Local HTTP server for replay hosting
Interactive rrweb player integration
Browser automation for seamless viewing
Recording validation and error handling

🚀 Quick Start

Prerequisites

Node.js 18.0 or higher
npm 8.0 or higher
Azure OpenAI API access with deployment

1. Installation

# Clone the repository
git clone <repository-url>
cd repository_dir

# Install dependencies
npm install

# Build the project
npm run build

2. Environment Configuration

Create a .env file in the project root:

# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY="your_api_key_here"
AZURE_OPENAI_RESOURCE_NAME="your_resource_name"
AZURE_OPENAI_DEPLOYMENT_NAME="your_deployment_name"
AZURE_OPENAI_API_VERSION="your_model_api_version"

How to get Azure OpenAI credentials:

Create an Azure OpenAI resource in the Azure portal
Deploy a GPT-3.5-turbo or GPT-4 model
Copy the API key, resource name, and deployment name
Use the latest available API version
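
As a rough sketch of how these four variables fit together, the snippet below validates them and builds the Azure OpenAI chat completions REST URL. The names `AzureOpenAIConfig`, `loadAzureConfig`, and `chatCompletionsUrl` are hypothetical, and the project itself may use an SDK client rather than calling the REST endpoint directly.

```typescript
// Hypothetical helpers: names are illustrative, not the project's API.
interface AzureOpenAIConfig {
  apiKey: string;
  resourceName: string;
  deploymentName: string;
  apiVersion: string;
}

const REQUIRED_VARIABLES = [
  "AZURE_OPENAI_API_KEY",
  "AZURE_OPENAI_RESOURCE_NAME",
  "AZURE_OPENAI_DEPLOYMENT_NAME",
  "AZURE_OPENAI_API_VERSION",
];

// Fails fast with a list of every missing variable.
function loadAzureConfig(env: Record<string, string | undefined>): AzureOpenAIConfig {
  const missing = REQUIRED_VARIABLES.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    apiKey: env["AZURE_OPENAI_API_KEY"] as string,
    resourceName: env["AZURE_OPENAI_RESOURCE_NAME"] as string,
    deploymentName: env["AZURE_OPENAI_DEPLOYMENT_NAME"] as string,
    apiVersion: env["AZURE_OPENAI_API_VERSION"] as string,
  };
}

// Builds the REST URL for chat completions on a given deployment.
function chatCompletionsUrl(config: AzureOpenAIConfig): string {
  return (
    `https://${config.resourceName}.openai.azure.com/openai/deployments/` +
    `${config.deploymentName}/chat/completions` +
    `?api-version=${encodeURIComponent(config.apiVersion)}`
  );
}
```

In a Node.js process this would typically be called as `loadAzureConfig(process.env)`, assuming the .env file has already been loaded into the environment.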

3. Running the Application

Development Mode (with hot reload)

npm run dev

Production Mode

npm start

Replay Latest Recording

npm run replay

The dashboard will be available at http://localhost:3000

🎮 Usage Guide

Starting Automation Tasks

Navigate to Dashboard: Open http://localhost:3000

Enter Task Description: Provide clear, specific instructions

Example: "Go to Wikipedia, search for 'Machine Learning',
click on the first result, and scroll to the Applications section"

Set Initial URL: Starting point for the automation
Configure Max Retries: Number of attempts (1-50)
Launch Agent: Monitor progress in console logs

Reviewing Results

Recordings Table: View all completed sessions with AI analysis
Performance Scores: 1-10 similarity ratings with reasoning
Session Replay: Visual playback with interactive controls
Download Archives: Complete session data in ZIP format

Advanced Features

Session Analysis

Each recording includes:

Task Summary: AI-generated description of actions taken
Similarity Score: Performance rating (1-10 scale)
Detailed Reasoning: Explanation of scoring decision
Timestamp & Metadata: Complete session information

Replay Controls

Play/pause/speed controls
Timeline scrubbing
Step-by-step navigation
Full-screen viewing

🧪 Testing Scenarios

E-commerce Automation

Task: "Navigate to Amazon, search for 'wireless headphones',
filter by ratings above 4 stars, and add the first result to cart"
URL: https://amazon.com

Information Gathering

Task: "Search Wikipedia for 'Quantum Computing', navigate to
the History section, and find the first mention of IBM"
URL: https://wikipedia.org

Form Interactions

Task: "Go to the contact page, fill out the form with test data,
and submit the inquiry"
URL: https://example-company.com

Social Media

Task: "Navigate to GitHub, search for 'playwright', and star
the official Microsoft repository"
URL: https://github.com

🔧 Configuration

Environment Variables

| Variable | Description | Required |
| --- | --- | --- |
| `AZURE_OPENAI_API_KEY` | Azure OpenAI API authentication key | ✅ |
| `AZURE_OPENAI_RESOURCE_NAME` | Azure resource identifier | ✅ |
| `AZURE_OPENAI_DEPLOYMENT_NAME` | Model deployment name | ✅ |
| `AZURE_OPENAI_API_VERSION` | API version (recommend latest) | ✅ |

Application Settings

Recording Configuration:

Default recordings per page: 5
Maximum events analyzed: 100
Recording format: .vbrec (JSON)
Metadata format: .json

Agent Parameters:

Default max retries: 10
Browser timeout: 10 seconds
Navigation timeout: 60 seconds
Element wait timeout: 5 seconds
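
To make the retry setting concrete, here is a minimal sketch of a retry loop that passes earlier error messages into each new attempt, which is one way "contextual" retries could work. The names `AttemptContext` and `runWithRetries` are hypothetical; only the default of 10 attempts comes from the settings above.

```typescript
// Hypothetical sketch of a contextual retry loop, not the project's code.
interface AttemptContext {
  attempt: number;          // 1-based attempt counter
  previousErrors: string[]; // messages from earlier failed attempts
}

async function runWithRetries<T>(
  action: (context: AttemptContext) => Promise<T>,
  maxRetries = 10,
): Promise<T> {
  const previousErrors: string[] = [];
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      // Each attempt receives a copy of the errors seen so far, so the
      // caller can fold them into the next prompt or strategy.
      return await action({ attempt, previousErrors: [...previousErrors] });
    } catch (error) {
      previousErrors.push(error instanceof Error ? error.message : String(error));
    }
  }
  throw new Error(
    `Task failed after ${maxRetries} attempts: ${previousErrors.join("; ")}`,
  );
}
```

Handing the accumulated errors to the next attempt lets the agent tell the model what already went wrong instead of repeating the same action blindly.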

📁 Project Structure

src/
├── classes/                    # Core TypeScript classes
│   ├── AIBrowserAgent.ts      # Main automation engine
│   ├── AIAgentAnalyzer.ts     # Performance analysis system
│   ├── AIAgentBrowserRecorder.ts # Session recording
│   └── AIAgentBrowserReplay.ts   # Session replay system
├── ui/
│   └── homepage.ts            # Dashboard HTML generator
├── main.ts                    # Express server & API routes
└── replay.ts                  # Standalone replay utility

recordings/                     # Generated session files
├── *.vbrec                    # Recording files (JSON)
└── *.json                     # Session metadata

Configuration Files:
├── package.json               # Dependencies & scripts
├── tsconfig.json             # TypeScript configuration
├── eslint.config.mjs         # Linting rules
└── .env                      # Environment variables

🚦 API Endpoints

POST `/start-agent`

Start a new automation task

{
	"task": "Natural language task description",
	"url": "https://starting-url.com",
	"maxRetries": 10
}
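
A client could assemble and check this payload before sending it. The sketch below is illustrative only: `buildStartAgentRequest` is a hypothetical helper, the 1-50 range mirrors the dashboard's Max Retries field, and sending the body as JSON is an assumption about how the server parses requests.

```typescript
// Hypothetical client-side helper, not part of the project's API.
interface StartAgentRequest {
  task: string;
  url: string;
  maxRetries: number;
}

function buildStartAgentRequest(
  task: string,
  url: string,
  maxRetries = 10,
): StartAgentRequest {
  const trimmedTask = task.trim();
  if (trimmedTask.length === 0) {
    throw new Error("task must not be empty");
  }
  if (!/^https?:\/\//i.test(url)) {
    throw new Error("url must start with http:// or https://");
  }
  if (!Number.isInteger(maxRetries) || maxRetries < 1 || maxRetries > 50) {
    throw new RangeError("maxRetries must be an integer between 1 and 50");
  }
  return { task: trimmedTask, url, maxRetries };
}

// Possible usage (assumes the server accepts a JSON body):
// const payload = buildStartAgentRequest("Search for 'Machine Learning'", "https://wikipedia.org");
// await fetch("http://localhost:3000/start-agent", {
//   method: "POST",
//   headers: { "Content-Type": "application/json" },
//   body: JSON.stringify(payload),
// });
```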

GET `/recordings?page=1`

Retrieve paginated recordings list

{
  "recordings": [...],
  "totalPages": 5,
  "currentPage": 1
}
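
The response shape above could be produced by a pagination helper along these lines. This is a sketch under assumptions: `paginateRecordings` is a hypothetical name, and the page size of 5 is taken from the "Default recordings per page" setting rather than from the source code.

```typescript
// Hypothetical pagination helper matching the response shape above.
interface RecordingsPage<T> {
  recordings: T[];
  totalPages: number;
  currentPage: number;
}

function paginateRecordings<T>(
  all: T[],
  page: number,
  perPage = 5,
): RecordingsPage<T> {
  // An empty list is still reported as a single (empty) page.
  const totalPages = Math.max(1, Math.ceil(all.length / perPage));
  // Clamp invalid or out-of-range page numbers into [1, totalPages].
  const currentPage = Math.min(Math.max(1, Math.floor(page) || 1), totalPages);
  const start = (currentPage - 1) * perPage;
  return {
    recordings: all.slice(start, start + perPage),
    totalPages,
    currentPage,
  };
}
```

Clamping the page number means a stale or mistyped `page` query parameter returns the nearest valid page instead of an error.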

POST `/replay`

Start session replay

{
	"filename": "recording-file.vbrec"
}

GET `/download/:filename`

Download a session archive. Returns a ZIP file containing the recording and its metadata.

🛠️ Development

Code Style & Standards

TypeScript: Strict type checking enabled
ESLint: Comprehensive linting with custom rules
Code Organization: Class-based architecture
Documentation: JSDoc comments for all public methods
Error Handling: Comprehensive try-catch blocks

Building & Testing

# Development build with watch mode
npm run dev

# Production build
npm run build

# Lint checking
npm run lint

# Type checking
npm run type-check

Contributing Guidelines

Follow existing code style and documentation patterns
Add comprehensive JSDoc comments for new methods
Include error handling for all external API calls
Test changes with multiple browser automation scenarios
Update README for any new features or configuration options

🔍 Troubleshooting

Common Issues

Agent fails to start

Verify Azure OpenAI credentials in .env
Check API quota and deployment status
Ensure proper permissions for recordings directory

Recording playback fails

Verify recording file integrity
Check browser popup blockers
Ensure sufficient system memory

Task execution errors

Review browser console for detailed error messages
Verify target website accessibility
Check for anti-automation measures (CAPTCHA, rate limiting)

Performance optimization

Adjust maxRetries based on task complexity
Use specific, actionable task descriptions
Test with simpler scenarios first

Debug Mode

Enable detailed logging by setting console output level:

console.log('Debug mode enabled');

📄 License

This project is licensed under the MIT License. See LICENSE file for details.

🤝 Support & Community

Issues: Report bugs and feature requests via GitHub Issues
Discussions: Join community discussions for tips and best practices
Documentation: Comprehensive inline documentation and examples
Updates: Regular updates with new features and improvements

Built with ❤️ using TypeScript, Playwright, and Azure OpenAI
