
Commit e28cd73

web-flowclaude
andcommitted
Align with Agentation: SQLite persistence, event naming, session lifecycle, env vars, and 12-gap implementation
- Add SQLite store (IStore interface, SqliteStore, createStore factory) with WAL mode and graceful memory fallback
- Rename event types from colon to dot notation (annotation.created, session.updated, etc.)
- Add session status lifecycle (active/approved/closed) with PATCH endpoint
- Add thread message IDs and toAFS() interop conversion
- Add env var support (AGENTATION_MOBILE_STORE, _PORT, _WEBHOOK_URL, _WEBHOOKS, _WEBHOOK_SECRET, _EVENT_RETENTION_DAYS)
- Add dismiss with reason (required reason param adds thread message before dismissing)
- Enhance export detail levels (compact/standard/detailed/forensic) with ComponentDetectionMode
- Add SSE filtering by deviceId and platform query params
- Add webhookUrl support to React Native SDK
- Add startAll() convenience function for programmatic server+MCP startup
- Enhance CLI init with MCP auto-registration, add doctor diagnostics command
- Generate JSON Schema (Draft 2020-12) from Zod schemas with build:schema script
- Add direct simulator integration research doc
- Update tests for new event names and dismiss reason requirement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent dcd7bb8 commit e28cd73

32 files changed

Lines changed: 2414 additions & 318 deletions

biome.json

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@
42 42
".turbo",
43 43
"sdks/swift/.build",
44 44
"sdks/kotlin/.gradle",
45-
"sdks/kotlin/build"
45+
"sdks/kotlin/build",
46+
"schema"
46 47
]
47 48
}
48 49
}
Lines changed: 235 additions & 0 deletions
@@ -0,0 +1,235 @@
1+
# Direct Simulator/IDE Integration Strategy
2+
3+
> Research and architecture notes for moving agentation-mobile beyond the web UI mirror approach to direct simulator and IDE integration.
4+
5+
## Problem Statement
6+
7+
Currently, agentation-mobile's annotation workflow requires a separate browser tab running the web UI, which screen-mirrors the simulator device. The goal is to bring the annotation experience directly onto or next to the simulator/emulator, similar to how tools like RocketSim overlay on top of the iOS Simulator.
8+
9+
## Competitive Landscape
10+
11+
### Existing Tools (Single-Platform, No AI Agent Integration)
12+
13+
| Tool | Platform | What It Does | AI Integration |
14+
|------|----------|-------------|----------------|
15+
| [RocketSim](https://www.rocketsim.app) | iOS/Xcode only | macOS companion app: floating panel next to Simulator with grids, rulers, color picker, Figma design overlay, network monitor, recording | None |
16+
| [UIInspector](https://forums.swift.org/t/uiinspector-runtime-ui-debugging-tool-for-ios/80109) | iOS only | In-app runtime overlay: dimension measurement, 3D hierarchy, color picker, property inspector, grid overlay | None |
17+
| [DebugSwift](https://github.com/DebugSwift/DebugSwift) | iOS only | In-app toolkit: grid overlay, 3D view hierarchy, touch indicators, SwiftUI render tracking | None |
18+
| [Android Layout Inspector](https://developer.android.com/studio/debug/layout-inspector) | Android only | Built into Android Studio: real-time view hierarchy, 3D mode, Compose recomposition tracking, property inspection | None |
19+
| [uhufor/inspector](https://github.com/uhufor/inspector) | Android only | Debug overlay library for visual layout element inspection | None |
20+
| [Agentation](https://agentation.dev/) (web) | Web only | Click elements, annotate with intent/severity, structured output for AI agents. MCP integration. React component detection. Sessions, webhooks | Full MCP |
21+
22+
### Key Insight
23+
24+
Every existing tool is locked to one platform. No tool combines:
25+
1. Runtime UI inspection (element trees, bounding boxes, component hierarchies)
26+
2. Structured annotation for AI agents (intent, severity, MCP tools, threaded conversations)
27+
3. Cross-platform support (Swift, Kotlin, React Native, Flutter)
28+
29+
Among the tools surveyed above, agentation-mobile is the only project at this intersection.
30+
31+
### Agentation (Web) 2.0 Feature Parity
32+
33+
Agentation 2.0 added: MCP integration, sessions, annotation schema with intent/severity, webhooks, React component detection. agentation-mobile already has all of this, plus multi-device sessions, native platform bridges, recording engine, SQLite persistence, SSE event streaming, and 18+ MCP tools.
34+
35+
## What We Already Have
36+
37+
### Bridge Architecture (Already Connects to Simulators)
38+
39+
The bridges already integrate directly with simulators via native tooling:
40+
41+
| Bridge | Connection Method | Element Source | Screenshot | Source Locations |
42+
|--------|------------------|---------------|------------|-----------------|
43+
| **Android** | ADB (`adb shell uiautomator dump`) | UIAutomator XML + SDK HTTP (port 4748) | `adb exec-out screencap -p` | SDK provides |
44+
| **iOS** | `xcrun simctl` | Accessibility API + SDK HTTP (port 4748) | `xcrun simctl io <device> screenshot` | SDK provides |
45+
| **React Native** | CDP/Hermes WebSocket via Metro | React fiber tree + native fallback | Delegates to native | Fiber `_debugSource` |
46+
| **Flutter** | Dart VM Service WebSocket | Widget tree + render objects | Delegates to native | `creationLocation` in VM |
47+
48+
All bridges implement the `IPlatformBridge` interface and use a dual data source approach:
49+
- **System API** (UIAutomator, Accessibility, CDP, VM Service) for broad coverage
50+
- **In-app SDK** (HTTP server on port 4748) for source locations and enriched data
51+
- Elements from the two sources are merged via spatial overlap matching (50%+ bounding box intersection)
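
The spatial overlap matching mentioned in the list above could look roughly like the following. This is a minimal sketch, not the bridge code: the `Box` shape, the function names, and the choice to measure the 50% threshold against the smaller of the two boxes are all assumptions, since the notes do not say which area the ratio is taken over.

```typescript
// Axis-aligned bounding box in device pixels (hypothetical shape for this sketch).
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Area of the intersection of two boxes; 0 when they do not overlap.
function intersectionArea(a: Box, b: Box): number {
  const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x);
  const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y);
  return w > 0 && h > 0 ? w * h : 0;
}

// Overlap ratio relative to the smaller box (assumption: the notes only say
// "50%+ bounding box intersection", not which area it is measured against).
function overlapRatio(a: Box, b: Box): number {
  const smaller = Math.min(a.width * a.height, b.width * b.height);
  return smaller > 0 ? intersectionArea(a, b) / smaller : 0;
}

// For each system-API element, pick the SDK element with the highest overlap
// at or above the threshold. Returns the SDK index, or -1 when nothing matches.
function matchElements(system: Box[], sdk: Box[], threshold = 0.5): number[] {
  return system.map((s) => {
    let best = -1;
    let bestRatio = threshold;
    sdk.forEach((k, i) => {
      const r = overlapRatio(s, k);
      if (r >= bestRatio) {
        best = i;
        bestRatio = r;
      }
    });
    return best;
  });
}
```

Measuring against the smaller box lets a small SDK element nested inside a large system container still count as a match; intersection-over-union would be a stricter alternative.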
52+
53+
### In-App SDKs (Already Exist)
54+
55+
| SDK | Status | What It Does |
56+
|-----|--------|-------------|
57+
| `sdks/react-native` | Functional | Fiber tree walker, `AgentationOverlay` component, `AgentationProvider` context, animation detection, webhook support |
58+
| `sdks/kotlin` | Functional | HTTP server on port 4748, element tree endpoint, hit-test endpoint, `ElementInspector` integration |
59+
| `sdks/swift` | Functional | HTTP server on port 4748 (BSD sockets + GCD), same endpoints as Kotlin SDK |
60+
| `sdks/flutter` | Placeholder | Minimal config package. Flutter integration relies on Dart VM Service inspector extensions |
61+
62+
### The Gap
63+
64+
The bridges and SDKs already do the hard work (element extraction, screenshots, source mapping). The web UI is just a visualization/interaction layer. The question is about providing a better UX for the annotation step itself.
65+
66+
## Integration Paths
67+
68+
### Path 1: macOS Companion App (RocketSim Approach)
69+
70+
**Effort:** Medium | **Impact:** High | **Recommended: Yes (primary)**
71+
72+
A native macOS SwiftUI app that overlays on/beside any simulator window.
73+
74+
**Technical approach:**
75+
- Use `CGWindowListCopyWindowInfo` to find Simulator.app / Android Emulator windows by process name
76+
- Create a transparent `NSWindow` with `level = .floating` positioned relative to the simulator window
77+
- Track simulator window position/size changes in real-time (via accessibility observer or polling)
78+
- Render annotation UI (element highlighting, click-to-annotate, comment panel) on the overlay
79+
- Translate click coordinates on overlay to device coordinates using screen size ratio
80+
- Communicate with agentation-mobile server via HTTP (same as web UI)
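
The coordinate translation step in the list above is a simple per-axis scale. The sketch below is written in TypeScript purely for illustration (the companion app itself would do this in Swift), and it assumes the overlay covers exactly the device's screen area with no title bar or bezel offset. The `Point` and `Size` types and the `overlayToDevice` name are hypothetical.

```typescript
interface Size {
  width: number;
  height: number;
}

interface Point {
  x: number;
  y: number;
}

// Map a click in overlay coordinates to device pixel coordinates using the
// ratio between the device screen size and the overlay size.
// Assumption: the overlay's origin coincides with the device screen's origin.
function overlayToDevice(click: Point, overlay: Size, device: Size): Point {
  const scaleX = device.width / overlay.width;
  const scaleY = device.height / overlay.height;
  // Clamp so clicks on the overlay's edge stay inside the device bounds.
  const clamp = (v: number, max: number) => Math.min(Math.max(v, 0), max - 1);
  return {
    x: clamp(Math.round(click.x * scaleX), device.width),
    y: clamp(Math.round(click.y * scaleY), device.height),
  };
}
```

A real tracker would first subtract the simulator window's chrome (title bar, device bezel) from the click position before applying the scale.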
81+
82+
**Key macOS APIs:**
83+
- `CGWindowListCopyWindowInfo(_:_:)` - enumerate windows, get bounds
84+
- `NSWindow` with `.floating` level, `isOpaque = false`, transparent background
85+
- `AXObserver` / accessibility notifications for window move/resize tracking
86+
- `NSEvent.addGlobalMonitorForEvents` for observing clicks outside the overlay when needed (global monitors can observe events but cannot consume or modify them)
87+
88+
**Why this is the RocketSim model:**
89+
RocketSim is NOT an Xcode extension - it's a standalone macOS app that detects the Simulator window and positions itself accordingly. This is the proven approach.
90+
91+
**Pros:**
92+
- Works with ALL simulators (iOS Simulator, Android Emulator, any device mirror)
93+
- No IDE dependency
94+
- Unified experience across all platforms
95+
- Single codebase (Swift/SwiftUI)
96+
97+
**Cons:**
98+
- macOS-only (acceptable for iOS, since the iOS Simulator only runs on macOS; Android Emulator users on Windows or Linux would not be covered)
99+
- Requires separate app install
100+
- Need to handle various simulator window configurations
101+
102+
**Implementation structure:**
103+
```
104+
apps/
105+
macos-companion/
106+
AgentationMobile/
107+
App.swift # Main app entry
108+
SimulatorTracker.swift # CGWindowListCopyWindowInfo + window tracking
109+
OverlayWindow.swift # NSWindow overlay management
110+
AnnotationPanel.swift # Side panel UI (SwiftUI)
111+
ElementHighlighter.swift # Overlay rendering for element boundaries
112+
ServerClient.swift # HTTP client to agentation-mobile server
113+
CoordinateMapper.swift # Screen-to-device coordinate translation
114+
```
115+
116+
### Path 2: Enhanced In-App SDK Overlays (UIInspector/DebugSwift Approach)
117+
118+
**Effort:** Low-Medium | **Impact:** Medium | **Recommended: Yes (parallel)**
119+
120+
Enhance existing SDKs to render annotation overlays directly inside the running app.
121+
122+
**Interaction model:**
123+
- Long-press or shake gesture activates inspection mode
124+
- Overlay highlights elements as finger moves, showing component name/hierarchy
125+
- Tap to select element, type comment
126+
- Annotation is sent to server and immediately available to AI agent via MCP
127+
128+
**Per-platform work:**
129+
130+
| SDK | Current State | Work Needed |
131+
|-----|--------------|-------------|
132+
| React Native | `AgentationOverlay` exists | Enhance with element highlighting on touch, annotation input UI, gesture activation |
133+
| Swift (iOS) | HTTP server only | Add `UIWindow`-level overlay with hit-testing, element highlight views, annotation sheet |
134+
| Kotlin (Android) | HTTP server only | Add `WindowManager` overlay or `FrameLayout` overlay with touch interception, element highlighting |
135+
| Flutter | Placeholder | Build Dart package with `Overlay` widget, element tree walker using `WidgetInspectorService`, annotation UI |
136+
137+
**Pros:**
138+
- Cross-platform, works everywhere the app runs
139+
- Works on physical devices (not just simulators)
140+
- No additional macOS app needed
141+
- Developers already integrate the SDK
142+
143+
**Cons:**
144+
- 4 separate overlay implementations to maintain
145+
- Flutter SDK needs to be built from scratch in Dart
146+
- Requires SDK integration in every app being tested
147+
- Cannot annotate native screens outside the app (splash screens, system dialogs)
148+
149+
### Path 3: IDE Extensions
150+
151+
**Effort:** High | **Impact:** High per IDE | **Recommended: Later phase**
152+
153+
#### VS Code Extension (React Native + Flutter developers)
154+
- WebView-based side panel showing live device mirror + annotation tools
155+
- Tree view showing element hierarchy from bridge data
156+
- Click element in panel -> highlights on device, click in tree -> scrolls to source file
157+
- Uses VS Code's `registerWebviewViewProvider` API
158+
- Communicates with agentation-mobile server via HTTP
159+
160+
#### Android Studio / IntelliJ Plugin
161+
- Custom `ToolWindow` with embedded JCEF (Chromium) WebView
162+
- Could integrate with existing Layout Inspector data
163+
- Show annotation panel alongside emulator panel
164+
- Uses [IntelliJ Platform Plugin SDK](https://plugins.jetbrains.com/docs/intellij/android-studio.html)
165+
166+
#### Xcode
167+
- **Not feasible as extension.** XcodeKit Source Editor Extensions can only modify source text, not add UI panels
168+
- The macOS companion app (Path 1) IS the Xcode integration story
169+
- Could add "Open in Agentation" menu item via Source Editor Extension that launches companion app
170+
171+
**Pros:** Deep IDE integration, feels native
172+
**Cons:** A separate implementation per IDE, high maintenance, and Xcode's extension API cannot host UI panels
173+
174+
### Path 4: Enhanced Web UI with OS-Level Integration
175+
176+
**Effort:** Low | **Impact:** Medium | **Recommended: Quick wins**
177+
178+
Keep web UI but make it feel more native:
179+
180+
- **Tauri/Electron wrapper**: `agentation-mobile open` launches frameless desktop window that can be positioned alongside simulator
181+
- **Always-on-top mode**: Window stays above other windows (practical in the desktop wrapper; a regular browser tab generally cannot do this)
182+
- **Deep links to IDE**: Click annotation -> opens `vscode://file/{path}:{line}` or `xed --line {line} {path}` (Xcode)
183+
- **OS notifications**: Desktop notifications when new annotation arrives while web UI is in background
184+
- **Keyboard shortcuts**: Global hotkeys to toggle annotation mode
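
The deep-link formats in the list above can be generated from an annotation's source location. The following is a minimal sketch under stated assumptions: `SourceLocation`, `vscodeLink`, and `xedCommand` are hypothetical names, paths are assumed to be absolute POSIX paths, and only the `vscode://file/{path}:{line}` and `xed --line {line} {path}` forms quoted in this document are used.

```typescript
// Hypothetical shape for an annotation's source location.
interface SourceLocation {
  path: string; // absolute POSIX path, e.g. "/Users/me/App.tsx"
  line: number; // 1-based line number
}

// Build a vscode://file/{path}:{line} URL.
// encodeURI escapes spaces but keeps "/" intact; characters such as "#" or "?"
// in a path are not handled by this sketch.
function vscodeLink(loc: SourceLocation): string {
  if (!loc.path.startsWith("/")) {
    throw new Error("expected an absolute path");
  }
  return `vscode://file${encodeURI(loc.path)}:${loc.line}`;
}

// Build the argument vector for `xed --line {line} {path}`.
// Returning an array (rather than a shell string) avoids quoting problems
// when the result is passed to a process-spawning API.
function xedCommand(loc: SourceLocation): string[] {
  return ["xed", "--line", String(loc.line), loc.path];
}
```

For example, `vscodeLink({ path: "/Users/me/My App/App.tsx", line: 12 })` yields `vscode://file/Users/me/My%20App/App.tsx:12`.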
185+
186+
## Recommended Phased Approach
187+
188+
### Phase 1: Quick Wins (Path 4)
189+
- Add deep link support (annotation -> IDE) to web UI
190+
- Add `agentation-mobile open` CLI command that launches web UI in default browser with specific window size
191+
- Test usability improvement
192+
193+
### Phase 2: SDK Overlays (Path 2)
194+
- Enhance React Native `AgentationOverlay` with tap-to-annotate flow
195+
- Add visual overlay to Swift SDK (UIWindow-level)
196+
- Add visual overlay to Kotlin SDK (WindowManager)
197+
- Build Flutter SDK with Dart overlay
198+
- This gives direct-on-device annotation for all 4 platforms
199+
200+
### Phase 3: macOS Companion App (Path 1)
201+
- Build SwiftUI macOS app
202+
- Implement simulator window tracking
203+
- Overlay annotation UI on top of any simulator
204+
- Distribute via Homebrew cask / direct download
205+
- This is the premium experience and the main differentiator
206+
207+
### Phase 4: IDE Extensions (Path 3)
208+
- VS Code extension first (largest developer audience for RN + Flutter)
209+
- Android Studio plugin second
210+
- Xcode: handled by macOS companion app
211+
212+
## Architecture Note
213+
214+
The AI agent loop (MCP tools, event bus, server, store) remains unchanged regardless of which frontend is used. The annotation can come from:
215+
- Web UI click
216+
- SDK overlay tap
217+
- macOS companion app click
218+
- IDE extension interaction
219+
- Direct API call
220+
221+
All paths converge on the same HTTP API -> EventBus -> MCP pipeline. This is a strength of the current architecture.
222+
223+
## References
224+
225+
- [RocketSim](https://www.rocketsim.app) - macOS companion app model
226+
- [RocketSim GitHub Issues](https://github.com/AvdLee/RocketSimApp/issues/102) - window positioning discussion
227+
- [CGWindowListCopyWindowInfo](https://developer.apple.com/documentation/coregraphics/1455137-cgwindowlistcopywindowinfo) - macOS window enumeration
228+
- [NSWindow](https://developer.apple.com/documentation/appkit/nswindow) - overlay window creation
229+
- [UIInspector](https://forums.swift.org/t/uiinspector-runtime-ui-debugging-tool-for-ios/80109) - in-app overlay model
230+
- [DebugSwift](https://github.com/DebugSwift/DebugSwift) - iOS debugging toolkit
231+
- [Android Layout Inspector](https://developer.android.com/studio/debug/layout-inspector) - Android Studio built-in
232+
- [Agentation 2.0](https://agentation.dev/blog/introducing-agentation-2) - web-only predecessor
233+
- [XcodeKit](https://developer.apple.com/documentation/xcodekit/creating-a-source-editor-extension) - limited Xcode extension API
234+
- [IntelliJ Platform Plugin SDK](https://plugins.jetbrains.com/docs/intellij/android-studio.html) - Android Studio plugins
235+
- [ADB UI hierarchy extraction](https://www.repeato.app/extracting-layout-and-view-information-via-adb/) - UIAutomator dump details
