An intelligent web automation agent that combines OpenAI's GPT-4 with Playwright browser automation to perform complex web tasks with visual feedback and interactive controls.
Watch Live Demo on X/Twitter
See the AI agent in action automating real websites with intelligent form filling and navigation!
- Task Analysis: Uses GPT-4 to understand and break down automation tasks
- Smart Element Detection: Automatically finds and matches form elements with task requirements
- Dynamic Action Planning: Creates step-by-step execution plans based on page content
- Adaptive Execution: Handles various website layouts and structures
- Real-time Status Updates: Visual badges showing current automation status
- Progress Indicators: Progress bars for multi-step operations
- Click Animations: Visual indicators showing where actions are performed
- Loading Spinners: Professional loaders during AI processing and wait times
- Element Highlighting: Visual feedback for targeted elements
- Fast Navigation: Optimized browser settings for quick page loads
- Reduced Timeouts: Streamlined wait times without sacrificing reliability
- Efficient AI Calls: Minimized token usage with focused prompts
- Smart Caching: Reuses browser instances when possible
- Mid-size Browser Window: Optimized 1200x800 viewport with 80% zoom
- Interactive Input System: Simple automation task input
- Comprehensive Logging: Detailed execution logs with timing information
- Error Handling: Robust error recovery and reporting
- Node.js >= 18.0.0
- npm or yarn package manager
- OpenAI API key

- Clone the repository:
git clone https://github.com/atish23/ai-web-automation-agent.git
cd ai-web-automation-agent
- Install dependencies:
npm install
- Set up environment variables:
# Create .env file
echo "OPENAI_API_KEY=your_openai_api_key_here" > .env

node ai-web-automation-agent.js

When you run the agent, you'll see a clean interface:
🎯 What would you like to do today?
❯ 🤖 Automate Website (describe any automation task)
  Exit (goodbye)
Task: "Go to ui.chaicode.com and fill signup form"
Task: "Navigate to YouTube and search for AI tutorials"
Task: "Go to GitHub and search for automation projects"
Task: "Visit Amazon and search for laptops"
The project follows a modular architecture for maintainability and extensibility:
ai-web-automation-agent/
├── classes/                              # Core utility classes
│   ├── Logger.js                         # Enhanced logging with colors & formatting
│   ├── Timer.js                          # Performance timing measurements
│   ├── BrowserUIAnimator.js              # Visual feedback & animations
│   └── BeautifulCLI.js                   # CLI styling & user interaction
├── tools/                                # Modular automation tools
│   ├── analyzeTaskTool.js                # AI-powered task analysis
│   ├── openBrowserTool.js                # Browser initialization
│   ├── analyzeFormDOMTool.js             # DOM-based form analysis
│   ├── findRelevantElementsTool.js       # Smart element discovery
│   ├── matchTaskWithElementsTool.js      # AI action plan creation
│   ├── clickAtCoordinatesTool.js         # Precise click automation
│   ├── fillFieldAtCoordinatesTool.js     # Form field filling
│   ├── executeActionPlanTool.js          # Action execution
│   └── checkExecutionStatusTool.js       # Status verification
├── ui/                                   # User interface components
│   └── UserInterface.js                  # Main UI orchestration
├── ai-web-automation-agent.js            # Main application entry point
├── package.json                          # Dependencies and scripts
└── README.md                             # This file
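The tool file names suggest a pipeline that runs from task analysis through to status verification. The sketch below shows one way such tools could be chained. `runPipeline`, the stub tools, and the `async (context) => result` shape are all assumptions made for illustration, since the real tool signatures are not documented here.

```javascript
// Hypothetical sketch: run tools in order, passing a shared context object
// from one step to the next. Each tool is modelled as
// { name, execute: async (context) => partialResult }.
async function runPipeline(tools, initialContext) {
  let context = { ...initialContext, trace: [] };
  for (const { name, execute } of tools) {
    const result = await execute(context);
    // Merge the tool's output into the context and record the step name.
    context = { ...context, ...result, trace: [...context.trace, name] };
  }
  return context;
}

// Stub tools that mimic the order implied by the tools/ directory.
const stubTools = [
  { name: 'analyze_task', execute: async (ctx) => ({ url: 'https://example.com', goal: ctx.task }) },
  { name: 'open_browser', execute: async () => ({ browserOpen: true }) },
  { name: 'find_relevant_elements', execute: async () => ({ elements: [{ label: 'Email', x: 120, y: 240 }] }) },
  {
    name: 'match_task_with_elements',
    execute: async (ctx) => ({ plan: ctx.elements.map((el) => ({ action: 'fill', ...el })) }),
  },
];
```

Calling `runPipeline(stubTools, { task: 'fill signup form' })` resolves to a context whose `trace` lists the tool names in execution order.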
BrowserUIAnimator handles all visual feedback and animations:
// Show loading spinner
await BrowserUIAnimator.showLoader(page, 'Processing...');
// Display status updates
await BrowserUIAnimator.showStatusBadge(page, 'Form filled!');
// Animate clicks
await BrowserUIAnimator.showClickAnimation(page, x, y);

Each automation capability is isolated in its own tool:
- analyzeTaskTool: AI-powered task understanding
- openBrowserTool: Optimized browser setup
- analyzeFormDOMTool: Form detection & analysis
- findRelevantElementsTool: Smart element discovery
- matchTaskWithElementsTool: AI action planning
- executeActionPlanTool: Step-by-step execution
- clickAtCoordinatesTool: Precise interactions
- fillFieldAtCoordinatesTool: Form filling
- checkExecutionStatusTool: Success verification
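To make the coordinate-based tools more concrete, here is a minimal sketch of how an action plan could be executed step by step. The plan format (`{ action, x, y, value }`) and the function name are assumptions. The only real APIs used are Playwright's `page.mouse.click(x, y)` and `page.keyboard.type(text)`.

```javascript
// Hypothetical action-plan executor (not the project's executeActionPlanTool).
// `page` is expected to expose Playwright's page.mouse.click(x, y) and
// page.keyboard.type(text).
async function executeActionPlan(page, plan) {
  const results = [];
  for (const [index, step] of plan.entries()) {
    try {
      if (step.action === 'click') {
        await page.mouse.click(step.x, step.y);
      } else if (step.action === 'fill') {
        // Focus the field by clicking it, then type the value.
        await page.mouse.click(step.x, step.y);
        await page.keyboard.type(step.value);
      } else {
        throw new Error(`Unknown action: ${step.action}`);
      }
      results.push({ index, ok: true });
    } catch (error) {
      // Record the failure and stop: later steps usually depend on earlier ones.
      results.push({ index, ok: false, error: error.message });
      break;
    }
  }
  return results;
}
```

Stopping at the first failed step is a design choice in this sketch. A status check like the one `checkExecutionStatusTool` describes could then decide whether to retry.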
Logger provides enhanced logging with colors and formatting:
Logger.info('Task started', { task });
Logger.tool('click_at_coordinates', 'completed', duration, { x, y });
Logger.success('Automation completed successfully');

UserInterface manages startup animations, user input, and graceful shutdown.
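Graceful shutdown on exit signals could be handled along the following lines. This is a sketch rather than the project's code: `registerGracefulShutdown` is a hypothetical helper, and `browser` is assumed to be a Playwright `Browser` or any object with an async `close()` method.

```javascript
// Hypothetical shutdown helper. The `exit` option exists only so the
// function can be exercised without terminating the process.
function registerGracefulShutdown(browser, { exit = process.exit } = {}) {
  let closing = false;

  async function shutdown(signal) {
    if (closing) return; // Ignore repeated signals while closing.
    closing = true;
    console.log(`Received ${signal}, closing browser...`);
    let code = 0;
    try {
      await browser.close();
    } catch (error) {
      console.error('Failed to close browser:', error.message);
      code = 1;
    }
    exit(code);
  }

  process.on('SIGINT', () => shutdown('SIGINT'));
  process.on('SIGTERM', () => shutdown('SIGTERM'));
  return shutdown;
}
```

The `closing` flag keeps a second Ctrl+C from triggering a second `browser.close()` while the first is still in flight.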
- Window Size: 1200x800 (mid-size for optimal viewing)
- Zoom Level: 80% (zoomed out for better overview)
- Position: Offset from top-left corner
- Performance: Optimized Chrome flags for faster automation
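As an illustration, the window settings above map onto Chromium command-line flags. The helper below is hypothetical (`buildWindowArgs` is not part of the project). Its defaults mirror the listed size and zoom, and the 100,50 position is taken from the launch flags shown in the configuration section.

```javascript
// Hypothetical helper that derives Chromium window flags from a config object.
function buildWindowArgs({ width = 1200, height = 800, x = 100, y = 50, zoom = 0.8 } = {}) {
  return [
    `--window-size=${width},${height}`,
    `--window-position=${x},${y}`,
    `--force-device-scale-factor=${zoom}`,
  ];
}

// Usage with Playwright (requires `npm install playwright`):
// const { chromium } = require('playwright');
// const browser = await chromium.launch({ headless: false, args: buildWindowArgs() });
```

Keeping the values in one config object makes it easier to try a different window size without editing the flag strings by hand.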
The agent includes a comprehensive animation system that provides:
- Status Badges: Slide-in notifications showing current status
- Progress Bars: Visual progress tracking for multi-step operations
- Loading Spinners: Professional loading indicators during AI processing
- Click Animations: Ripple effects showing where clicks occur
- Element Highlighting: Visual indicators for targeted elements
All animations use modern CSS with:
- Smooth transitions and easing functions
- Backdrop blur effects for modern appearance
- Auto-cleanup to prevent visual clutter
- Responsive design for various screen sizes
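A status badge with a transition, backdrop blur, and auto-cleanup might be injected roughly as follows. This is a sketch and not the actual `BrowserUIAnimator` code: the styling values and the `durationMs` option are assumptions, and `page` is assumed to be a Playwright `Page`, whose `page.evaluate(fn, arg)` runs `fn` inside the browser.

```javascript
// Hypothetical status badge injected into the page under automation.
async function showStatusBadge(page, message, { durationMs = 2000 } = {}) {
  await page.evaluate(({ message, durationMs }) => {
    // Everything in this callback runs in the browser, not in Node.js.
    const badge = document.createElement('div');
    badge.textContent = message;
    badge.style.cssText = [
      'position:fixed', 'top:16px', 'right:16px', 'z-index:2147483647',
      'padding:8px 14px', 'border-radius:8px', 'color:#fff',
      'background:rgba(20,20,20,0.7)', 'backdrop-filter:blur(6px)',
      'transition:opacity 0.3s ease', 'opacity:0',
    ].join(';');
    document.body.appendChild(badge);
    // Fade in on the next frame so the transition actually runs.
    requestAnimationFrame(() => { badge.style.opacity = '1'; });
    // Auto-cleanup: fade out, then remove the node once the fade is done.
    setTimeout(() => {
      badge.style.opacity = '0';
      setTimeout(() => badge.remove(), 300);
    }, durationMs);
  }, { message, durationMs });
}
```

Because the badge removes itself, callers do not need to track or clear it between steps.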
OPENAI_API_KEY=your_openai_api_key_here # Required: OpenAI API access
NODE_ENV=development # Optional: Environment setting

The agent launches Chrome with optimized settings:
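const browserArgs = [
  '--no-sandbox',                                    // Disable the Chromium sandbox
  '--disable-setuid-sandbox',                        // Disable the setuid sandbox (Linux)
  '--disable-web-security',                          // Turn off same-origin policy enforcement
  '--window-size=1200,800',                          // Mid-size window
  '--window-position=100,50',                        // Offset from the top-left corner
  '--force-device-scale-factor=0.8',                 // Render at 80% scale
  '--disable-blink-features=AutomationControlled',   // Hide the navigator.webdriver automation hint
  '--disable-features=VizDisplayCompositor'          // Disable the Viz display compositor feature
];

The agent includes comprehensive error handling: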
- Network timeouts: Graceful handling of slow-loading pages
- Element not found: Automatic retry and alternative strategies
- AI API errors: Fallback mechanisms and error reporting
- Browser crashes: Automatic browser restart and recovery
- Graceful shutdown: Clean browser closure on exit signals
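Automatic retry for cases such as a missing element or a slow page can be sketched with a small helper. `withRetry` is a hypothetical name, and the exponential backoff policy is an assumption rather than the agent's documented behavior.

```javascript
// Hypothetical retry helper with exponential backoff.
// `operation` receives the 1-based attempt number, so callers can switch to
// an alternative strategy on later attempts.
async function withRetry(operation, { retries = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

A caller could, for example, try a CSS selector on the first attempt and fall back to coordinate-based clicking on the second.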
- ⚡ 50% faster navigation with the domcontentloaded wait strategy
- 🎯 60% reduced wait times with optimized timeouts
- 🧠 Minimal token usage with focused AI prompts
- 🖱️ Instant interactions with reduced animation delays
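The `domcontentloaded` strategy refers to Playwright's `waitUntil` option on `page.goto`. With it, navigation resolves once the HTML has been parsed, instead of waiting for images and other subresources as the default `load` does. In this minimal sketch, the `fastGoto` name and the 15-second timeout are assumptions.

```javascript
// Hypothetical wrapper around Playwright's page.goto(url, options).
async function fastGoto(page, url, { timeoutMs = 15000 } = {}) {
  // 'domcontentloaded' fires earlier than the default 'load' event.
  return page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });
}
```

Pages that build their forms with client-side JavaScript may still need an explicit wait for the target element after navigation.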
- Browser launch: ~2-3 seconds
- Page navigation: ~1-2 seconds
- AI task analysis: ~2-4 seconds
- Action plan creation: ~3-5 seconds
- Form filling: ~1-2 seconds per field
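Timings like these can be collected with a small helper. The class below is a sketch of what a timer could look like, and the project's `classes/Timer.js` may differ. It relies only on the global `performance.now()`, which is available in Node.js 16 and later.

```javascript
// Hypothetical timer for measuring how long each automation step takes.
class Timer {
  constructor() {
    this.marks = new Map();
  }

  start(label) {
    this.marks.set(label, performance.now());
  }

  // Returns the elapsed milliseconds for `label` and clears its mark.
  stop(label) {
    const startedAt = this.marks.get(label);
    if (startedAt === undefined) {
      throw new Error(`Timer "${label}" was never started`);
    }
    this.marks.delete(label);
    return performance.now() - startedAt;
  }

  // Times an async function and returns { result, durationMs }.
  async measure(label, fn) {
    this.start(label);
    try {
      const result = await fn();
      return { result, durationMs: this.stop(label) };
    } catch (error) {
      this.marks.delete(label); // Do not leave a stale mark behind.
      throw error;
    }
  }
}
```

For example, `await timer.measure('page_navigation', () => page.goto(url))` yields a duration that can be passed on to a logger.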
- GPT-4 Vision: Can analyze page screenshots when needed
- Context Awareness: Maintains conversation context across operations
- Adaptive Prompting: Adjusts AI prompts based on task complexity
- Smart Retry: AI-powered retry strategies for failed operations
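For screenshot analysis, a vision-capable chat model accepts a user message that combines text with an image. The sketch below builds such a message in the OpenAI Chat Completions format from a PNG buffer, such as the one Playwright's `page.screenshot()` returns. `buildVisionMessage` is a hypothetical helper, and how the agent actually sends screenshots is not shown here.

```javascript
// Hypothetical helper: pair a text prompt with a PNG screenshot (a Buffer)
// encoded as a base64 data URL.
function buildVisionMessage(prompt, screenshot) {
  const dataUrl = `data:image/png;base64,${screenshot.toString('base64')}`;
  return {
    role: 'user',
    content: [
      { type: 'text', text: prompt },
      { type: 'image_url', image_url: { url: dataUrl } },
    ],
  };
}
```

The returned object can be appended to the `messages` array of a chat completion request.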
The agent is designed for easy extension:
// Add custom tools
const customTool = tool({
name: 'custom_action',
description: 'Performs custom automation',
parameters: z.object({...}),
async execute(params) {
// Custom automation logic
}
});

- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Commit changes: git commit -m 'Add amazing feature'
- Push to branch: git push origin feature/amazing-feature
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for the powerful GPT-4 API
- Microsoft Playwright for robust browser automation
- The Open Source Community for inspiration and tools
Built with ❤️ for intelligent web automation
- Demo: Watch on X/Twitter
- Author: @aatish2393
- Repository: GitHub