feat(agent): wire automate/ax_interact computer tools (3/8 of #3307)#3342
Conversation
Run enigo keyboard/mouse on the app main thread via a native-registry executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge, and downscaled screenshots so the model can see them. Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
… loop Adds the Rust-internal automate engine (poll-until-stable settle, playback verification), the AXEnabled diagnostics field + settle primitives on ax_interact, the Music fast-path, and the Windows UIA superset. Exposes launch_platform as pub(crate) so the automate loop can launch apps mid-flow. Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
…trator Registers the AutomateTool (multi-step UI flows in one call) and the ax_interact denylist/opt-in plumbing; adds the catalog toggle, tool definition, and orchestrator prompt guidance (automate + screenshot/ mouse/keyboard fallback for Electron apps with empty AX trees). Slice 3/7 of tinyhumansai#3307 (tool wiring + prompts).
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds a new ChangesApp Automation Tool Feature
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…tool-wiring # Conflicts: # docs/voice-automate-plan.md # src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs # src/openhuman/accessibility/app_fastpaths/music.rs # src/openhuman/accessibility/automate.rs # src/openhuman/tools/impl/browser/screenshot.rs # src/openhuman/tools/impl/computer/main_thread.rs
Take main's versions of the already-merged slice-1/slice-2 files (screenshot, main_thread, automate, music, fastpaths_tests, plan doc).
…ansai#3340 review) Per @oxoxDev: MouseTool/KeyboardTool inherited external_effect=false, so neither hit the ApprovalGate — PermissionLevel::Dangerous alone does NOT trigger it (the gate keys off external_effect_with_args). With computer_control.enabled, blind clicks / arbitrary keystrokes could run unattended on an auto-approved turn, with no sensitive-app denylist. - Override external_effect → true on both tools (gate every action). - Wrap the main-thread input executor in catch_unwind so an enigo FFI panic can't unwind across the app main thread. - Correct the user_filter.rs / ax_interact.rs comments that wrongly claimed Dangerous fires the gate. - Tests: assert both tools route through the gate.
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@app/src/utils/toolDefinitions.ts`:
- Around line 50-53: The new user-visible literals in the ToolDefinition object
(displayName and description in toolDefinitions.ts) must be replaced with i18n
keys and resolved at render time via the app i18n hook; update the
ToolDefinition to use translation keys (e.g. "tools.appAutomation.displayName"
and "tools.appAutomation.description") instead of hard-coded strings, and ensure
any component that renders these fields calls useT() (or maps the definition
through t(def.displayName) / t(def.description)) so translations are applied at
render-time; edit the ToolDefinition entries for displayName and description and
update consuming UI code that reads these properties to call useT()/t(...)
before rendering.
In `@docs/voice-system-actions.md`:
- Around line 365-368: The fenced code block containing the stack trace
(enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty →
dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP) is unlabeled;
update that triple-backtick fence to include the language tag "text" so the
block becomes ```text and the snippet renders and lints correctly.
- Around line 352-372: The section currently claims both that the crash is fixed
("Keyboard/mouse now run on the app main thread" / "registered main-thread
synthetic-input executor") and that the blocker is unresolved ("THE BLOCKER —
OpenHuman-2026-06-03-170058.ips" / "must stay disabled"), so pick one canonical
status and remove the contradictory text: either mark Change 1.15 as fully fixed
(keep the MainThreadInputOp/run_input_on_main description, keep notes about
computer_control.enabled and added tools to orchestrator, and delete the "THE
BLOCKER" paragraph), or mark it as unresolved (remove or tone down the "crash
fixed" sentences and the claim about a registered main-thread executor, and keep
the blocker paragraph and instructions to keep keyboard/mouse disabled); ensure
references to MainThreadInputOp, run_input_on_main, enigo, and the
OpenHuman-2026-06-03-170058.ips trace remain consistent with the chosen state.
In `@src/openhuman/tools/impl/computer/automate.rs`:
- Around line 14-24: The AutomateTool implementation (AutomateTool struct and
its uses of is_sensitive_app, AutomateOptions, RealBackend) must be moved out of
the deprecated tools/impl location into the owning domain module’s tools.rs (or
a tools/ submodule) and then re-exported from the shared tools surface module;
specifically, move the AutomateTool code into the domain module’s tools.rs or
tools/*, update module declarations/imports there, and add a pub use in the
shared tools mod to expose AutomateTool instead of leaving new domain code under
src/openhuman/tools/impl/.
- Line 105: The log line logs the raw goal which may contain sensitive user
data; update the logging in the automate execution path (the log::info! call
that includes app and goal) to avoid printing goal verbatim—instead compute a
redacted or sanitized representation (for example a truncated summary, hashed
value, or placeholder) and log that (reference the log::info! invocation and the
goal variable), keeping app as before but replacing goal with the redacted_goal
value when calling log::info!.
- Around line 78-80: The tool currently always returns true from
external_effect(), causing the harness to prompt approvals even when
execute_with_options() will refuse (disabled mutations, missing args, or
denylisted apps); update the tool to avoid dead-end approvals by either (A)
implementing external_effect_with_args(&self, args: &ArgsType) to mirror the
checks in execute_with_options() and return true only when those checks pass, or
(B) make external_effect() perform the same quick pre-checks (disabled
mutations, required args present, app not denylisted) and return false when the
request will be refused; reference the external_effect /
external_effect_with_args methods and the execute_with_options (and execute)
logic to ensure the pre-checks are identical to the refusal conditions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 99e92adf-7e65-4358-8273-46e825b509c1
⛔ Files ignored due to path filters (1)
app/src-tauri/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (14)
app/src-tauri/src/lib.rsapp/src/utils/toolDefinitions.tsdocs/voice-system-actions.mdsrc/openhuman/agent_registry/agents/orchestrator/agent.tomlsrc/openhuman/agent_registry/agents/orchestrator/prompt.mdsrc/openhuman/tools/impl/computer/automate.rssrc/openhuman/tools/impl/computer/ax_interact.rssrc/openhuman/tools/impl/computer/keyboard.rssrc/openhuman/tools/impl/computer/keyboard_tests.rssrc/openhuman/tools/impl/computer/mod.rssrc/openhuman/tools/impl/computer/mouse.rssrc/openhuman/tools/impl/computer/mouse_tests.rssrc/openhuman/tools/ops.rssrc/openhuman/tools/user_filter.rs
Remove the stale 'THE BLOCKER / fix not yet done / keep disabled' paragraph that contradicted the '✅ crash fixed' status; reframe as root-cause-now-fixed and note the catch_unwind guard. Add the `text` language to the trace fence.
Independent review (beyond the CodeRabbit pass)Reviewed the tool-wiring slice — Security fix verified
Reviewed clean / noted
No further correctness issues. LGTM once CI is green. |
It isn't on main and its 'mouse/keyboard run without an approval prompt' text contradicts the tinyhumansai#3342 ApprovalGate fix. Keep tinyhumansai#3344 a clean UI slice; a corrected desktop-control prompt can land as its own follow-up.
…ch slice Same stale 'runs without an approval prompt' section as tinyhumansai#3344 — not on main, contradicts the tinyhumansai#3342 ApprovalGate fix. Tracked for a corrected follow-up.
…lGate Ship the orchestrator's desktop-control playbook (carried from the original voice work) with the gating line corrected: mouse/keyboard now route through the ApprovalGate (tinyhumansai#3342), so the prompt no longer claims they 'run without an approval prompt'. Resolves the deferred follow-up.
Summary
Slice 3/8 of #3307 — wire the automate/ax_interact computer tools into the orchestrator.
AutomateTool(multi-step UI flows in one call) and theax_interactdenylist/opt-in plumbing.user_filter), tool definition, and orchestrator prompt guidance (automate + screenshot/mouse/keyboard fallback for Electron apps with empty AX trees).Files (9)
tools/impl/computer/{automate,ax_interact,mod}.rs,tools/ops.rs,tools/user_filter.rs,app/src/utils/toolDefinitions.ts,agent_registry/agents/orchestrator/{agent.toml,prompt.md},docs/voice-system-actions.md.Summary by CodeRabbit
New Features
Bug Fixes
Documentation