# Direct Simulator/IDE Integration Strategy

> Research and architecture notes for moving agentation-mobile beyond the web UI mirror approach to direct simulator and IDE integration.

## Problem Statement

Currently, agentation-mobile's annotation workflow requires a separate browser tab running the web UI, which mirrors the simulator or emulator screen. The goal is to bring the annotation experience directly onto, or next to, the simulator/emulator, similar to how RocketSim attaches its tooling to the iOS Simulator window.

## Competitive Landscape

### Existing Tools (Single-Platform, No AI Agent Integration)

| Tool | Platform | What It Does | AI Integration |
|------|----------|--------------|----------------|
| [RocketSim](https://www.rocketsim.app) | iOS/Xcode only | macOS companion app: floating panel next to the Simulator with grids, rulers, color picker, Figma design overlay, network monitor, recording | None |
| [UIInspector](https://forums.swift.org/t/uiinspector-runtime-ui-debugging-tool-for-ios/80109) | iOS only | In-app runtime overlay: dimension measurement, 3D hierarchy, color picker, property inspector, grid overlay | None |
| [DebugSwift](https://github.com/DebugSwift/DebugSwift) | iOS only | In-app toolkit: grid overlay, 3D view hierarchy, touch indicators, SwiftUI render tracking | None |
| [Android Layout Inspector](https://developer.android.com/studio/debug/layout-inspector) | Android only | Built into Android Studio: real-time view hierarchy, 3D mode, Compose recomposition tracking, property inspection | None |
| [uhufor/inspector](https://github.com/uhufor/inspector) | Android only | Debug overlay library for visual inspection of layout elements | None |
| [Agentation](https://agentation.dev/) (web) | Web only | Click elements, annotate with intent/severity, structured output for AI agents, MCP integration, React component detection, sessions, webhooks | Full MCP |

### Key Insight

Every existing tool is locked to one platform. None of them combines:
1. Runtime UI inspection (element trees, bounding boxes, component hierarchies)
2. Structured annotation for AI agents (intent, severity, MCP tools, threaded conversations)
3. Cross-platform support (Swift, Kotlin, React Native, Flutter)

Among the tools surveyed above, agentation-mobile is the only project at this intersection.

### Agentation (Web) 2.0 Feature Parity

Agentation 2.0 added MCP integration, sessions, an annotation schema with intent/severity, webhooks, and React component detection. agentation-mobile already has all of these, plus multi-device sessions, native platform bridges, a recording engine, SQLite persistence, SSE event streaming, and 18+ MCP tools.

## What We Already Have

### Bridge Architecture (Already Connects to Simulators)

The bridges already integrate directly with simulators via native tooling:

| Bridge | Connection Method | Element Source | Screenshot | Source Locations |
|--------|-------------------|----------------|------------|------------------|
| **Android** | ADB (`adb shell uiautomator dump`) | UIAutomator XML + SDK HTTP (port 4748) | `adb exec-out screencap -p` | Provided by SDK |
| **iOS** | `xcrun simctl` | Accessibility API + SDK HTTP (port 4748) | `xcrun simctl io <device> screenshot` | Provided by SDK |
| **React Native** | CDP/Hermes WebSocket via Metro | React fiber tree + native fallback | Delegates to native bridge | Fiber `_debugSource` |
| **Flutter** | Dart VM Service WebSocket | Widget tree + render objects | Delegates to native bridge | `creationLocation` via VM Service |
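
The Android row's element source is the XML written by `uiautomator dump`, where each `<node>` carries a `bounds` attribute of the form `[left,top][right,bottom]` in device pixels. A minimal TypeScript sketch of turning that output into rectangles might look like the following. The `UiNode` shape is hypothetical, and the regex scan is a simplification (it assumes double-quoted attributes and does not unescape XML entities) that a real bridge would replace with a proper XML parser.

```typescript
// Rectangle in device pixels with a top-left origin.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical subset of the attributes a bridge might keep per node.
interface UiNode {
  className: string;
  resourceId: string;
  text: string;
  bounds: Rect;
}

// UIAutomator encodes bounds as "[left,top][right,bottom]".
function parseBounds(bounds: string): Rect | null {
  const m = /^\[(-?\d+),(-?\d+)\]\[(-?\d+),(-?\d+)\]$/.exec(bounds.trim());
  if (!m) return null;
  const [left, top, right, bottom] = m.slice(1, 5).map(Number);
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Naive scan of <node ...> start tags. Assumes attribute values are
// double-quoted and leaves XML entities (e.g. &amp;) unescaped.
function parseNodes(xml: string): UiNode[] {
  const nodes: UiNode[] = [];
  for (const tag of xml.match(/<node\b[^>]*>/g) ?? []) {
    const attr = (name: string): string => {
      const m = new RegExp(`\\s${name}="([^"]*)"`).exec(tag);
      return m ? m[1] : "";
    };
    const bounds = parseBounds(attr("bounds"));
    if (bounds) {
      nodes.push({
        className: attr("class"),
        resourceId: attr("resource-id"),
        text: attr("text"),
        bounds,
      });
    }
  }
  return nodes;
}
```

Nodes whose `bounds` attribute is missing or malformed are skipped rather than reported with a zero-size rectangle.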

All bridges implement the `IPlatformBridge` interface and use a dual data source approach:
- **System API** (UIAutomator, Accessibility, CDP, VM Service) for broad coverage
- **In-app SDK** (HTTP server on port 4748) for source locations and enriched data
- Results are merged via spatial overlap matching (50%+ bounding box intersection)
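
One way to read "50%+ bounding box intersection" is an intersection-over-union (IoU) of at least 0.5 between a system element and an SDK element. The sketch below assumes that definition (the actual metric may instead divide by one box's area) and uses hypothetical `SystemElement`/`SdkElement` shapes; it attaches the best-overlapping SDK element's component name and source location to each system element.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }
interface SourceLocation { file: string; line: number; }

// Hypothetical element shapes from the two data sources.
interface SystemElement { id: string; label: string; bounds: Rect; }
interface SdkElement { component: string; bounds: Rect; source?: SourceLocation; }
interface MergedElement extends SystemElement { component?: string; source?: SourceLocation; }

function area(r: Rect): number {
  return Math.max(0, r.width) * Math.max(0, r.height);
}

function intersectionArea(a: Rect, b: Rect): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Intersection-over-union: 1 for identical boxes, 0 for disjoint ones.
function iou(a: Rect, b: Rect): number {
  const inter = intersectionArea(a, b);
  const union = area(a) + area(b) - inter;
  return union > 0 ? inter / union : 0;
}

// Enriches each system element with the best-overlapping SDK element,
// provided the overlap reaches the threshold (assumed to be 0.5 IoU).
function mergeElements(system: SystemElement[], sdk: SdkElement[], threshold = 0.5): MergedElement[] {
  return system.map((el) => {
    let best: SdkElement | undefined;
    let bestScore = 0;
    for (const candidate of sdk) {
      const score = iou(el.bounds, candidate.bounds);
      if (score > bestScore) {
        best = candidate;
        bestScore = score;
      }
    }
    if (!best || bestScore < threshold) return { ...el };
    return { ...el, component: best.component, source: best.source };
  });
}
```

IoU penalizes size mismatches, so a small child view inside a large container is not matched to that container; a containment-style ratio would be more permissive but more prone to false matches.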

### In-App SDKs (Already Exist)

| SDK | Status | What It Does |
|-----|--------|--------------|
| `sdks/react-native` | Functional | Fiber tree walker, `AgentationOverlay` component, `AgentationProvider` context, animation detection, webhook support |
| `sdks/kotlin` | Functional | HTTP server on port 4748, element tree endpoint, hit-test endpoint, `ElementInspector` integration |
| `sdks/swift` | Functional | HTTP server on port 4748 (BSD sockets + GCD), same endpoints as the Kotlin SDK |
| `sdks/flutter` | Placeholder | Minimal config package; Flutter integration relies on Dart VM Service inspector extensions |
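
The hit-test endpoint's job is to map a point to an element. A minimal sketch over a hypothetical `ElementNode` tree (not the SDKs' actual response format) returns the path from the root to the deepest element containing the point, which is also what an overlay needs to show the component hierarchy. It assumes that children later in the array are drawn on top, so they are tested first.

```typescript
interface Rect { x: number; y: number; width: number; height: number; }

// Hypothetical tree node; not the SDKs' actual response format.
interface ElementNode {
  name: string;
  bounds: Rect;
  children: ElementNode[];
}

// Half-open containment so adjacent elements never both claim an edge pixel.
function contains(r: Rect, px: number, py: number): boolean {
  return px >= r.x && px < r.x + r.width && py >= r.y && py < r.y + r.height;
}

// Returns the path from `node` down to the deepest element containing the point,
// or [] if the point is outside `node`. Later children are assumed to be drawn
// on top of earlier ones, so they are tested first.
function hitTest(node: ElementNode, px: number, py: number): ElementNode[] {
  if (!contains(node.bounds, px, py)) return [];
  for (let i = node.children.length - 1; i >= 0; i--) {
    const path = hitTest(node.children[i], px, py);
    if (path.length > 0) return [node, ...path];
  }
  return [node];
}
```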

### The Gap

The bridges and SDKs already do the hard work (element extraction, screenshots, source mapping); the web UI is only a visualization and interaction layer on top of them. The open question is how to provide a better UX for the annotation step itself.

## Integration Paths

### Path 1: macOS Companion App (RocketSim Approach)

**Effort:** Medium | **Impact:** High | **Recommended: Yes (primary)**

A native macOS SwiftUI app that overlays on, or docks beside, any simulator window.

**Technical approach:**
- Use `CGWindowListCopyWindowInfo` to find Simulator.app / Android Emulator windows by owning process name
- Create a transparent `NSWindow` with `level = .floating` positioned relative to the simulator window
- Track simulator window position/size changes in real time (via an accessibility observer or polling)
- Render the annotation UI (element highlighting, click-to-annotate, comment panel) on the overlay
- Translate overlay click coordinates to device coordinates by scaling with the ratio of the device screen size to the on-screen size of the device area
- Communicate with the agentation-mobile server via HTTP (same as the web UI)
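
Translating coordinates is a linear scale once the overlay knows where the device screen sits inside the simulator window. The sketch below assumes the tracker has already found that content rectangle (excluding title bar and bezel) in the same coordinate space as the click; the function names are illustrative, not the planned `CoordinateMapper.swift` API. It is written in TypeScript for brevity, and the arithmetic ports directly to Swift.

```typescript
interface Point { x: number; y: number; }
interface Size { width: number; height: number; }
interface Rect extends Point, Size {}

// Maps a click on the overlay to a device pixel. `content` is the area of the
// simulator window that shows the device screen, in the same coordinate space
// as `click`. Returns null for clicks outside the device screen.
function overlayToDevice(click: Point, content: Rect, device: Size): Point | null {
  const x = ((click.x - content.x) * device.width) / content.width;
  const y = ((click.y - content.y) * device.height) / content.height;
  if (x < 0 || y < 0 || x >= device.width || y >= device.height) return null;
  return { x: Math.floor(x), y: Math.floor(y) };
}

// Inverse mapping, used to draw device-space element bounds on the overlay.
function deviceToOverlay(p: Point, content: Rect, device: Size): Point {
  return {
    x: content.x + (p.x * content.width) / device.width,
    y: content.y + (p.y * content.height) / device.height,
  };
}
```

For example, with a 1170x2532 px device screen shown in a 390x844 pt area whose top-left corner is at (100, 150), a click at (295, 572) maps to device pixel (585, 1266). The inverse function is what element highlighting needs to draw device-space bounding boxes on the overlay.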
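
**Key macOS APIs:**
- `CGWindowListCopyWindowInfo(_:_:)` - enumerate windows and read their bounds
- `NSWindow` with `.floating` level, `isOpaque = false`, and a clear `backgroundColor`
- `AXObserver` / accessibility notifications for tracking window moves and resizes (requires the user to grant Accessibility permission)
- `NSEvent.addGlobalMonitorForEvents(matching:handler:)` - observe clicks sent to other apps; global monitors are read-only and cannot intercept events, so capturing a click requires the overlay window itself to accept mouse events (toggle `ignoresMouseEvents`)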
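
`CGWindowListCopyWindowInfo` returns an array of dictionaries keyed by constants such as `kCGWindowOwnerName`, `kCGWindowLayer`, and `kCGWindowBounds` (itself a dictionary with `X`, `Y`, `Width`, `Height`). The sketch below models those dictionaries as plain TypeScript objects to show the selection and coordinate logic the Swift app would need. Two assumptions to verify: that the iOS Simulator's owner name is `"Simulator"`, and that picking the largest layer-0 window is a good enough heuristic when several devices are open. The conversion step matters because Quartz window bounds use a top-left origin on the primary display, while `NSWindow` frames use a bottom-left origin.

```typescript
// Plain-object model of the dictionaries CGWindowListCopyWindowInfo returns.
interface CGWindowBounds { X: number; Y: number; Width: number; Height: number; }
interface CGWindowInfo {
  kCGWindowNumber: number;
  kCGWindowLayer: number;
  kCGWindowOwnerName?: string;
  kCGWindowBounds: CGWindowBounds;
}

// AppKit-style frame: origin at the bottom-left of the primary display.
interface Frame { x: number; y: number; width: number; height: number; }

// Picks the largest normal-level (layer 0) window owned by `ownerName`.
// "Simulator" as the owner name and "largest wins" are assumptions.
function findSimulatorWindow(windows: CGWindowInfo[], ownerName = "Simulator"): CGWindowInfo | undefined {
  const area = (w: CGWindowInfo) => w.kCGWindowBounds.Width * w.kCGWindowBounds.Height;
  return windows
    .filter((w) => w.kCGWindowOwnerName === ownerName && w.kCGWindowLayer === 0)
    .sort((a, b) => area(b) - area(a))[0];
}

// Quartz window bounds have a top-left origin on the primary display;
// NSWindow frames have a bottom-left origin, so the y axis is flipped.
function toAppKitFrame(b: CGWindowBounds, primaryScreenHeight: number): Frame {
  return { x: b.X, y: primaryScreenHeight - (b.Y + b.Height), width: b.Width, height: b.Height };
}

// For polling: reposition the overlay only when the window moved or resized.
function frameChanged(prev: Frame | undefined, next: Frame): boolean {
  return (
    prev === undefined ||
    prev.x !== next.x ||
    prev.y !== next.y ||
    prev.width !== next.width ||
    prev.height !== next.height
  );
}
```

A polling loop would call the real API on a timer, run these steps, and call `setFrame(_:display:)` on the overlay window only when `frameChanged` returns true; the `AXObserver` route avoids polling at the cost of the Accessibility permission prompt.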

**Why this is the RocketSim model:**
RocketSim is not an Xcode extension; it is a standalone macOS app that detects the Simulator window and positions itself accordingly. This approach is proven in practice.

**Pros:**
- Works with all simulators (iOS Simulator, Android Emulator, any device mirror window)
- No IDE dependency
- Unified experience across all platforms
- Single codebase (Swift/SwiftUI)

**Cons:**
- macOS-only (acceptable for iOS work, since the iOS Simulator only runs on macOS, but it leaves out Android developers on Windows and Linux)
- Requires a separate app install
- Must handle varied simulator window configurations (window scaling, device bezels, multiple open simulators)

**Implementation structure:**
```
apps/
  macos-companion/
    AgentationMobile/
      App.swift                # Main app entry
      SimulatorTracker.swift   # CGWindowListCopyWindowInfo + window tracking
      OverlayWindow.swift      # NSWindow overlay management
      AnnotationPanel.swift    # Side panel UI (SwiftUI)
      ElementHighlighter.swift # Overlay rendering for element boundaries
      ServerClient.swift       # HTTP client to agentation-mobile server
      CoordinateMapper.swift   # Screen-to-device coordinate translation
```

### Path 2: Enhanced In-App SDK Overlays (UIInspector/DebugSwift Approach)

**Effort:** Low-Medium | **Impact:** Medium | **Recommended: Yes (parallel)**

Enhance the existing SDKs to render annotation overlays directly inside the running app.

**Interaction model:**
- A long-press or shake gesture activates inspection mode
- The overlay highlights elements as the finger moves, showing the component name and hierarchy
- Tap to select an element, then type a comment
- The annotation is sent to the server and is immediately available to the AI agent via MCP
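
The interaction model above is a small state machine. A platform-neutral TypeScript sketch (closest to the React Native SDK) could look like this; the state names, `ElementRef`, and `AnnotationDraft` are hypothetical, and the `hover` action assumes the element has already been resolved by hit-testing.

```typescript
// Hypothetical element reference produced by hit-testing.
interface ElementRef { id: string; component: string; }

type InspectorState =
  | { mode: "idle" }
  | { mode: "inspecting"; hovered?: ElementRef }
  | { mode: "annotating"; element: ElementRef; comment: string };

type InspectorAction =
  | { type: "activate" }                    // long-press or shake detected
  | { type: "hover"; element?: ElementRef } // finger moved over an element
  | { type: "select" }                      // tap on the hovered element
  | { type: "edit"; comment: string }
  | { type: "submit" }
  | { type: "cancel" };

// Hypothetical payload handed to the server client on submit.
interface AnnotationDraft { elementId: string; component: string; comment: string; }

interface Transition { state: InspectorState; emit?: AnnotationDraft; }

function reduce(state: InspectorState, action: InspectorAction): Transition {
  switch (action.type) {
    case "activate":
      return { state: state.mode === "idle" ? { mode: "inspecting" } : state };
    case "hover":
      return { state: state.mode === "inspecting" ? { mode: "inspecting", hovered: action.element } : state };
    case "select":
      if (state.mode === "inspecting" && state.hovered) {
        return { state: { mode: "annotating", element: state.hovered, comment: "" } };
      }
      return { state };
    case "edit":
      if (state.mode !== "annotating") return { state };
      return { state: { ...state, comment: action.comment } };
    case "submit": {
      if (state.mode !== "annotating") return { state };
      const comment = state.comment.trim();
      if (comment === "") return { state }; // ignore empty annotations
      return {
        state: { mode: "idle" },
        emit: { elementId: state.element.id, component: state.element.component, comment },
      };
    }
    case "cancel":
      return { state: { mode: "idle" } };
  }
}
```

Keeping the transitions in a pure reducer makes them easy to unit-test and lets each SDK wire in its own gesture sources (long-press, shake, or a debug menu) without changing the flow.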

**Per-platform work:**

| SDK | Current State | Work Needed |
|-----|---------------|-------------|
| React Native | `AgentationOverlay` exists | Enhance with element highlighting on touch, annotation input UI, gesture activation |
| Swift (iOS) | HTTP server only | Add a `UIWindow`-level overlay with hit-testing, element highlight views, and an annotation sheet |
| Kotlin (Android) | HTTP server only | Add a `WindowManager` or `FrameLayout` overlay with touch interception and element highlighting |
| Flutter | Placeholder | Build a Dart package with an `Overlay` widget, an element tree walker using `WidgetInspectorService`, and annotation UI |

**Pros:**
- Cross-platform: works everywhere the app runs
- Works on physical devices, not just simulators
- No additional macOS app needed
- Developers already integrate the SDK

**Cons:**
- Four separate overlay implementations to maintain
- The Flutter SDK needs to be built from scratch in Dart
- Requires SDK integration in every app being tested
- Cannot annotate screens the app does not render itself (splash screens, system dialogs)

### Path 3: IDE Extensions

**Effort:** High | **Impact:** High per IDE | **Recommended: Later phase**

#### VS Code Extension (React Native + Flutter developers)
- WebView-based side panel showing a live device mirror plus annotation tools
- Tree view showing the element hierarchy from bridge data
- Click an element in the panel -> highlight it on the device; click a node in the tree -> open the source file at that element's location
- Uses the VS Code `window.registerWebviewViewProvider` API
- Communicates with the agentation-mobile server via HTTP

#### Android Studio / IntelliJ Plugin
- Custom `ToolWindow` with an embedded JCEF (Chromium) WebView
- Could integrate with existing Layout Inspector data
- Shows the annotation panel alongside the emulator panel
- Uses the [IntelliJ Platform Plugin SDK](https://plugins.jetbrains.com/docs/intellij/android-studio.html)

#### Xcode
- **Not feasible as an extension.** XcodeKit Source Editor Extensions can only modify source text, not add UI panels
- The macOS companion app (Path 1) is the Xcode integration story
- A Source Editor Extension could add an "Open in Agentation" menu command that launches the companion app

**Pros:** Deep IDE integration; feels native

**Cons:** Three separate implementations, high maintenance, and Xcode offers no supported way to add custom UI panels

### Path 4: Enhanced Web UI with OS-Level Integration

**Effort:** Low | **Impact:** Medium | **Recommended: Quick wins**

Keep the web UI but make it feel more native:

- **Tauri/Electron wrapper**: `agentation-mobile open` launches a frameless desktop window that can be positioned alongside the simulator
- **Always-on-top mode**: the wrapper window stays above other windows (a plain browser tab cannot request this)
- **Deep links to IDE**: clicking an annotation opens `vscode://file/{path}:{line}` (VS Code) or runs `xed --line {line} {path}` (Xcode)
- **OS notifications**: desktop notifications when a new annotation arrives while the web UI is in the background
- **Keyboard shortcuts**: global hotkeys to toggle annotation mode (also requires the desktop wrapper, since browsers cannot register global hotkeys)
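
The two deep link formats above can be built from an annotation's source location as sketched below. `vscode://file/{path}:{line}` (with an optional `:{column}`) is VS Code's URL handler format and `xed --line` is Xcode's command-line opener; the `SourceLocation` type and the function names are hypothetical. Because a web page cannot run `xed`, the sketch returns an argument vector for the server (or desktop wrapper) to spawn, while the VS Code URL can be used directly as a link.

```typescript
// Hypothetical shape of an annotation's resolved source location.
interface SourceLocation {
  file: string; // absolute path on the developer's machine
  line: number;
  column?: number;
}

// VS Code's URL handler: vscode://file/{absolute path}:{line}[:{column}].
// encodeURI escapes spaces but keeps "/" and ":" as-is.
function vscodeUrl(loc: SourceLocation): string {
  if (!loc.file.startsWith("/")) {
    throw new Error(`expected an absolute POSIX path, got "${loc.file}"`);
  }
  const position = loc.column === undefined ? `${loc.line}` : `${loc.line}:${loc.column}`;
  return `vscode://file${encodeURI(loc.file)}:${position}`;
}

// Argument vector for `xed --line {line} {path}`. Meant for a process spawner
// on the server or desktop wrapper, not for interpolation into a shell string.
function xedArgv(loc: SourceLocation): string[] {
  return ["xed", "--line", String(loc.line), loc.file];
}

type DeepLink =
  | { kind: "url"; url: string }         // usable directly as a link in the web UI
  | { kind: "command"; argv: string[] }; // must be executed outside the browser

function deepLinkFor(editor: "vscode" | "xcode", loc: SourceLocation): DeepLink {
  return editor === "vscode"
    ? { kind: "url", url: vscodeUrl(loc) }
    : { kind: "command", argv: xedArgv(loc) };
}
```

Passing `argv` to a spawner rather than building a shell command keeps file paths with spaces or shell metacharacters from being misinterpreted.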

## Recommended Phased Approach

### Phase 1: Quick Wins (Path 4)
- Add deep link support (annotation -> IDE) to the web UI
- Add an `agentation-mobile open` CLI command that launches the web UI in the default browser with a specific window size (browsers may ignore size requests, in which case the Path 4 desktop wrapper is needed)
- Measure the usability improvement over the current browser-tab workflow

### Phase 2: SDK Overlays (Path 2)
- Enhance the React Native `AgentationOverlay` with a tap-to-annotate flow
- Add a visual overlay to the Swift SDK (`UIWindow`-level)
- Add a visual overlay to the Kotlin SDK (`WindowManager`)
- Build the Flutter SDK with a Dart overlay
- Together these give direct on-device annotation for all four platforms

### Phase 3: macOS Companion App (Path 1)
- Build the SwiftUI macOS app
- Implement simulator window tracking
- Overlay the annotation UI on top of any simulator
- Distribute via Homebrew cask or direct download
- This is the premium experience and the main differentiator

### Phase 4: IDE Extensions (Path 3)
- VS Code extension first (largest audience among React Native and Flutter developers)
- Android Studio plugin second
- Xcode: covered by the macOS companion app

## Architecture Note

The AI agent loop (MCP tools, event bus, server, store) remains unchanged regardless of which frontend is used. An annotation can come from:
- Web UI click
- SDK overlay tap
- macOS companion app click
- IDE extension interaction
- Direct API call

All paths converge on the same HTTP API -> EventBus -> MCP pipeline. This is a strength of the current architecture: adding a frontend requires no changes on the agent-facing side.
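
The convergence point can be sketched as a single `submitAnnotation` entry point feeding a typed event bus, with the SSE stream and the MCP layer as subscribers. Everything here (`AnnotationOrigin`, the event name, the `EventBus` class) is hypothetical and stands in for the server's actual implementation; the point is that frontends differ only in the `origin` they report.

```typescript
// Hypothetical origin tags, one per frontend listed above.
type AnnotationOrigin = "web-ui" | "sdk-overlay" | "macos-companion" | "ide-extension" | "api";

interface Annotation {
  id: string;
  origin: AnnotationOrigin;
  deviceId: string;
  comment: string;
}

type Listener<T> = (event: T) => void;

// Minimal typed publish/subscribe bus keyed by event name.
class EventBus<Events extends Record<string, unknown>> {
  // `any` is confined here; on() and emit() enforce the per-event types.
  private readonly listeners = new Map<keyof Events, Set<Listener<any>>>();

  on<K extends keyof Events>(type: K, listener: Listener<Events[K]>): () => void {
    const set = this.listeners.get(type) ?? new Set<Listener<any>>();
    this.listeners.set(type, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  emit<K extends keyof Events>(type: K, event: Events[K]): void {
    // Copy first so listeners can unsubscribe while being notified.
    for (const listener of [...(this.listeners.get(type) ?? [])]) listener(event);
  }
}

type ServerEvents = { "annotation.created": Annotation };

const bus = new EventBus<ServerEvents>();

// The single entry point every frontend's HTTP request would end up calling.
// Subscribers (an SSE stream, the MCP layer) see the frontend only via `origin`.
function submitAnnotation(input: Omit<Annotation, "id">, nextId: () => string): Annotation {
  if (input.comment.trim() === "") throw new Error("comment must not be empty");
  const annotation: Annotation = { ...input, id: nextId() };
  bus.emit("annotation.created", annotation);
  return annotation;
}
```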

## References

- [RocketSim](https://www.rocketsim.app) - macOS companion app model
- [RocketSim GitHub Issues](https://github.com/AvdLee/RocketSimApp/issues/102) - window positioning discussion
- [CGWindowListCopyWindowInfo](https://developer.apple.com/documentation/coregraphics/1455137-cgwindowlistcopywindowinfo) - macOS window enumeration
- [NSWindow](https://developer.apple.com/documentation/appkit/nswindow) - overlay window creation
- [UIInspector](https://forums.swift.org/t/uiinspector-runtime-ui-debugging-tool-for-ios/80109) - in-app overlay model
- [DebugSwift](https://github.com/DebugSwift/DebugSwift) - iOS debugging toolkit
- [Android Layout Inspector](https://developer.android.com/studio/debug/layout-inspector) - Android Studio built-in
- [Agentation 2.0](https://agentation.dev/blog/introducing-agentation-2) - web-only predecessor
- [XcodeKit](https://developer.apple.com/documentation/xcodekit/creating-a-source-editor-extension) - limited Xcode extension API
- [IntelliJ Platform Plugin SDK](https://plugins.jetbrains.com/docs/intellij/android-studio.html) - Android Studio plugins
- [ADB UI hierarchy extraction](https://www.repeato.app/extracting-layout-and-view-information-via-adb/) - UIAutomator dump details