Skip to content

Latest commit

 

History

History
255 lines (197 loc) · 7.47 KB

File metadata and controls

255 lines (197 loc) · 7.47 KB

Implementation Summary

Overview

Successfully implemented a complete Model Context Protocol (MCP) server for debugging and automating native Electron applications on macOS.

What Was Built

Core Components

  1. MCP Server (src/server.ts, src/index.ts)

    • Built using @modelcontextprotocol/sdk v1.21.0
    • Stdio transport for communication
    • 24 registered tools across 4 categories
  2. CDP Client (src/lib/cdp-client.ts)

    • Chrome DevTools Protocol client using chrome-remote-interface
    • Manages connections to Electron apps
    • Supports DOM inspection, JavaScript execution, screenshots
  3. Accessibility Manager (src/lib/accessibility.ts)

    • macOS UI automation using robotjs
    • Mouse control (move, click, drag)
    • Keyboard control (type, press keys)
    • Screenshot capture
  4. Permissions Manager (src/lib/permissions.ts)

    • Checks and manages macOS system permissions
    • Uses node-mac-permissions
    • Provides user-friendly setup instructions

Tool Categories

1. DOM Inspection Tools (7 tools)

  • list_electron_targets - Discover Electron windows
  • connect_to_electron_target - Establish CDP connection
  • get_dom_tree - Retrieve DOM structure
  • query_selector - Find elements by CSS selector
  • query_selector_all - Find all matching elements
  • execute_javascript - Run code in Electron context
  • take_electron_screenshot - Capture Electron window

2. UI Automation Tools (9 tools)

  • get_mouse_position - Get cursor position
  • move_mouse - Move cursor
  • click - Click (left/right/middle)
  • double_click - Double click
  • drag - Drag and drop
  • type_text - Type text
  • press_key - Press keys with modifiers
  • take_screenshot - Capture screen/region
  • get_screen_size - Get display dimensions

3. Global Hotkey Tools (4 tools)

  • trigger_hotkey - Custom keyboard shortcuts
  • trigger_common_macos_hotkey - System hotkeys (Spotlight, Mission Control, etc.)
  • simulate_app_shortcut - App-specific shortcuts
  • send_key_sequence - Complex key sequences

4. Permission Tools (4 tools)

  • check_permission - Check single permission
  • check_all_permissions - Check all permissions
  • get_permission_instructions - Get setup guide
  • request_permission - Request permission

Example Electron App

Created a test application (examples/test-electron-app/) with:

  • Interactive UI elements (buttons, inputs)
  • Event logging
  • Styled interface
  • Pre-configured for debugging (--inspect=9222)

Technical Stack

Dependencies

Core:

  • @modelcontextprotocol/sdk ^1.21.0 - MCP protocol implementation
  • zod ^3.23.8 - Schema validation

Electron Debugging:

  • chrome-remote-interface ^0.33.2 - CDP client

macOS Automation:

  • robotjs ^0.6.0 - Native UI automation
  • node-mac-permissions ^2.3.0 - Permission management

Development:

  • typescript ^5.7.2
  • tsx ^4.19.2
  • eslint ^8.57.1
  • prettier ^3.4.2

Architecture

electron-native-mcp/
├── src/
│   ├── index.ts              # Entry point (stdio transport)
│   ├── server.ts             # MCP server configuration
│   ├── types/
│   │   └── index.ts          # TypeScript type definitions
│   ├── lib/
│   │   ├── cdp-client.ts     # Chrome DevTools Protocol client
│   │   ├── accessibility.ts  # macOS UI automation
│   │   └── permissions.ts    # Permission management
│   └── tools/
│       ├── dom/              # DOM inspection tools
│       ├── ui/               # UI automation tools
│       ├── hotkey/           # Hotkey tools
│       └── permissions/      # Permission tools
├── examples/
│   └── test-electron-app/    # Test Electron application
├── dist/                     # Compiled JavaScript
├── package.json
├── tsconfig.json
├── README.md
├── USAGE.md
└── IMPLEMENTATION.md

Key Features

1. DOM Inspection via CDP

  • Connect to any Electron app running with --inspect flag
  • Query DOM using CSS selectors
  • Execute arbitrary JavaScript
  • Take screenshots of web content

2. Native UI Automation

  • Control mouse and keyboard
  • Works with any macOS application
  • Pixel-perfect positioning
  • Support for all mouse buttons and keyboard modifiers

3. Global Hotkeys

  • Trigger system-wide keyboard shortcuts
  • Pre-configured common macOS hotkeys
  • Custom key combinations
  • Complex key sequences

4. Permission Management

  • Check permission status
  • Request permissions programmatically
  • User-friendly setup instructions
  • Supports all required macOS permissions

Implementation Challenges & Solutions

Challenge 1: nut.js Unavailability

Problem: nut.js is no longer publicly available on npm
Solution: Switched to robotjs for UI automation

Challenge 2: MCP SDK API Changes

Problem: Tool registration API differs from documentation
Solution: Used registerTool() method with proper schema structure

Challenge 3: TypeScript Type Compatibility

Problem: ToolResult type didn't match MCP SDK expectations
Solution: Added index signature [x: string]: unknown to ToolResult interface

Challenge 4: Permission Management

Problem: macOS requires multiple system permissions
Solution: Created comprehensive permission manager with instructions

Testing

Manual Testing Steps

  1. Build the server:

    npm install
    npm run build
  2. Run the test Electron app:

    cd examples/test-electron-app
    npm install
    npm start
  3. Test DOM inspection:

    • List targets
    • Connect to test app
    • Query selectors
    • Execute JavaScript
  4. Test UI automation:

    • Check permissions
    • Move mouse
    • Click buttons
    • Type text
  5. Test hotkeys:

    • Trigger Spotlight
    • Trigger custom shortcuts

Future Enhancements

Potential Improvements

  1. Enhanced Image Recognition

    • Add OCR capabilities
    • Template matching for UI elements
  2. Recording & Playback

    • Record user actions
    • Replay automation sequences
  3. Better Error Handling

    • More descriptive error messages
    • Automatic retry logic
  4. Performance Optimization

    • Connection pooling for CDP
    • Caching of DOM queries
  5. Additional Platforms

    • Windows support
    • Linux support
  6. Advanced CDP Features

    • Network interception
    • Console log capture
    • Performance profiling

Known Limitations

  1. macOS Only: Currently only supports macOS due to permission system and robotjs
  2. RobotJS Dependencies: Requires native compilation (Xcode Command Line Tools)
  3. CDP Port: Assumes default port 9222 (configurable)
  4. Screenshot Format: Limited to PNG/JPEG for Electron, PNG for native
  5. Permission Prompts: Some permissions require manual user action

Conclusion

Successfully delivered a fully functional MCP server that meets all requirements:

✅ See native Electron app DOM (via CDP)
✅ Click buttons in native Electron app (via robotjs)
✅ Trigger global hotkeys on macOS (via robotjs)

The server provides 24 tools across 4 categories, comprehensive documentation, and a test application for validation.

Files Created

  • Core implementation: 13 TypeScript files
  • Configuration: 5 files (package.json, tsconfig.json, eslint, prettier, gitignore)
  • Documentation: 3 files (README.md, USAGE.md, IMPLEMENTATION.md)
  • Example app: 4 files (package.json, main.js, index.html, README.md)

Total: 25 files, ~2500 lines of code