A sophisticated web automation platform that combines AI-powered browser control with intelligent session analysis. This system uses advanced language models to autonomously navigate websites, complete tasks, and provide detailed performance evaluations.
- 🧠 AI-Powered Browser Agent: Natural language task execution with intelligent decision-making
- 📹 Comprehensive Session Recording: Full rrweb-based recording of all browser interactions
- 🔍 Intelligent Performance Analysis: AI-powered evaluation and scoring of task completion
- 🎮 Interactive Web Dashboard: Modern, responsive interface for task management
- 📊 Session Replay System: Visual playback of recorded automation sessions
- 💾 Export & Analytics: Downloadable session data with detailed metadata
- 🎯 Token Optimization: Intelligent HTML simplification reduces API costs by up to 90%
- 🔄 Adaptive Retry Logic: Smart error recovery with contextual retry mechanisms
- 📈 Performance Scoring: 1-10 similarity scoring with detailed reasoning
- 🔒 Environment-based Configuration: Secure credential management
- 📱 Responsive Design: Full mobile and desktop compatibility
- 🚀 TypeScript Architecture: Type-safe, maintainable codebase
- Runtime: Node.js with TypeScript
- Web Framework: Express.js with comprehensive API endpoints
- Browser Automation: Playwright for cross-browser compatibility
- AI Integration: Azure OpenAI API with GPT models
- Session Recording: rrweb for DOM event capture
- File Management: JSZip for archive creation
- UI Framework: Vanilla JavaScript with modern ES6+ features
- Styling: CSS Variables with responsive design principles
- State Management: Class-based architecture with event delegation
- Real-time Updates: Fetch API with async/await patterns
Primary automation engine with intelligent task execution
- Natural language task interpretation
- Adaptive web navigation strategies
- Comprehensive error handling and recovery
- Token-optimized LLM communication
- Session recording integration
Intelligent session evaluation system
- Advanced event preprocessing and analysis
- Multi-criteria performance scoring
- Detailed reasoning generation
- Configurable evaluation parameters
Comprehensive session capture system
- Real-time DOM event recording
- Automatic file management
- Metadata generation and storage
- Safe error handling for problematic pages
Visual session playback system
- Local HTTP server for replay hosting
- Interactive rrweb player integration
- Browser automation for seamless viewing
- Recording validation and error handling
- Node.js 18.0 or higher
- npm 8.0 or higher
- Azure OpenAI API access with deployment
# Clone the repository
git clone <repository-url>
cd repository_dir
# Install dependencies
npm install
# Build the project
npm run buildCreate a .env file in the project root:
# Azure OpenAI Configuration
AZURE_OPENAI_API_KEY="your_api_key_here"
AZURE_OPENAI_RESOURCE_NAME="your_resource_name"
AZURE_OPENAI_DEPLOYMENT_NAME="your_deployment_name"
AZURE_OPENAI_API_VERSION="your_model_api_version"How to get Azure OpenAI credentials:
- Create an Azure OpenAI resource in the Azure portal
- Deploy a GPT-3.5-turbo or GPT-4 model
- Copy the API key, resource name, and deployment name
- Use the latest available API version
npm run devnpm startnpm run replayThe dashboard will be available at http://localhost:3000
- Navigate to Dashboard: Open
http://localhost:3000 - Enter Task Description: Provide clear, specific instructions
Example: "Go to Wikipedia, search for 'Machine Learning', click on the first result, and scroll to the Applications section" - Set Initial URL: Starting point for the automation
- Configure Max Retries: Number of attempts (1-50)
- Launch Agent: Monitor progress in console logs
- Recordings Table: View all completed sessions with AI analysis
- Performance Scores: 0-100% similarity ratings with reasoning
- Session Replay: Visual playback with interactive controls
- Download Archives: Complete session data in ZIP format
Each recording includes:
- Task Summary: AI-generated description of actions taken
- Similarity Score: Performance rating (1-10 scale)
- Detailed Reasoning: Explanation of scoring decision
- Timestamp & Metadata: Complete session information
- Play/pause/speed controls
- Timeline scrubbing
- Step-by-step navigation
- Full-screen viewing
Task: "Navigate to Amazon, search for 'wireless headphones',
filter by ratings above 4 stars, and add the first result to cart"
URL: https://amazon.com
Task: "Search Wikipedia for 'Quantum Computing', navigate to
the History section, and find the first mention of IBM"
URL: https://wikipedia.org
Task: "Go to the contact page, fill out the form with test data,
and submit the inquiry"
URL: https://example-company.com
Task: "Navigate to GitHub, search for 'playwright', and star
the official Microsoft repository"
URL: https://github.com
| Variable | Description | Required |
|---|---|---|
AZURE_OPENAI_API_KEY |
Azure OpenAI API authentication key | ✅ |
AZURE_OPENAI_RESOURCE_NAME |
Azure resource identifier | ✅ |
AZURE_OPENAI_DEPLOYMENT_NAME |
Model deployment name | ✅ |
AZURE_OPENAI_API_VERSION |
API version (recommend latest) | ✅ |
Recording Configuration:
- Default recordings per page: 5
- Maximum events analyzed: 100
- Recording format: .vbrec (JSON)
- Metadata format: .json
Agent Parameters:
- Default max retries: 10
- Browser timeout: 10 seconds
- Navigation timeout: 60 seconds
- Element wait timeout: 5 seconds
src/
├── classes/ # Core TypeScript classes
│ ├── AIBrowserAgent.ts # Main automation engine
│ ├── AIAgentAnalyzer.ts # Performance analysis system
│ ├── AIAgentBrowserRecorder.ts # Session recording
│ └── AIAgentBrowserReplay.ts # Session replay system
├── ui/
│ └── homepage.ts # Dashboard HTML generator
├── main.ts # Express server & API routes
└── replay.ts # Standalone replay utility
recordings/ # Generated session files
├── *.vbrec # Binary recording files
└── *.json # Session metadata
Configuration Files:
├── package.json # Dependencies & scripts
├── tsconfig.json # TypeScript configuration
├── eslint.config.mjs # Linting rules
└── .env # Environment variables
Start a new automation task
{
"task": "Natural language task description",
"url": "https://starting-url.com",
"maxRetries": 10
}Retrieve paginated recordings list
{
"recordings": [...],
"totalPages": 5,
"currentPage": 1
}Start session replay
{
"filename": "recording-file.vbrec"
}Download session archive Returns ZIP file containing recording and metadata
- TypeScript: Strict type checking enabled
- ESLint: Comprehensive linting with custom rules
- Code Organization: Class-based architecture
- Documentation: JSDoc comments for all public methods
- Error Handling: Comprehensive try-catch blocks
# Development build with watch mode
npm run dev
# Production build
npm run build
# Lint checking
npm run lint
# Type checking
npm run type-check- Follow existing code style and documentation patterns
- Add comprehensive JSDoc comments for new methods
- Include error handling for all external API calls
- Test changes with multiple browser automation scenarios
- Update README for any new features or configuration options
Agent fails to start
- Verify Azure OpenAI credentials in
.env - Check API quota and deployment status
- Ensure proper permissions for recordings directory
Recording playback fails
- Verify recording file integrity
- Check browser popup blockers
- Ensure sufficient system memory
Task execution errors
- Review browser console for detailed error messages
- Verify target website accessibility
- Check for anti-automation measures (CAPTCHA, rate limiting)
Performance optimization
- Adjust
maxRetriesbased on task complexity - Use specific, actionable task descriptions
- Test with simpler scenarios first
Enable detailed logging by setting console output level:
console.log('Debug mode enabled');This project is licensed under the MIT License. See LICENSE file for details.
- Issues: Report bugs and feature requests via GitHub Issues
- Discussions: Join community discussions for tips and best practices
- Documentation: Comprehensive inline documentation and examples
- Updates: Regular updates with new features and improvements
Built with ❤️ using TypeScript, Playwright, and Azure OpenAI