
feat(agent): wire automate/ax_interact computer tools (3/8 of #3307) #3342

Merged: M3gA-Mind merged 7 commits into tinyhumansai:main from M3gA-Mind:feat/voice-split-3-tool-wiring on Jun 4, 2026.

M3gA-Mind (Collaborator) commented Jun 4, 2026

Summary

Slice 3/8 of #3307: wire the automate/ax_interact computer tools into the orchestrator.

  • Registers AutomateTool (multi-step UI flows in one call) and the ax_interact denylist/opt-in plumbing.
  • Adds the catalog toggle (user_filter), tool definition, and orchestrator prompt guidance (automate + screenshot/mouse/keyboard fallback for Electron apps with empty AX trees).

Files (9)

tools/impl/computer/{automate,ax_interact,mod}.rs, tools/ops.rs, tools/user_filter.rs, app/src/utils/toolDefinitions.ts, agent_registry/agents/orchestrator/{agent.toml,prompt.md}, docs/voice-system-actions.md.

Part of the #3307 split. PR #3307 (72 files) is being replaced by 8 small, dependency-ordered PRs (a merge train). The branches are stacked, so each PR shows only its own slice once its predecessors have merged and it has been rebased onto main.
Stacked on slice 2 (#3341).

Summary by CodeRabbit

  • New Features

    • Added App Automation (multi-step UI automation) and a new "App Automation" Settings toggle.
    • Keyboard and mouse actions now route through the approval gate for safer external-effect handling.
  • Bug Fixes

    • Hardened synthetic input/main-thread handling to prevent crashes and improve stability.
    • Improved main-thread dispatch and safety for automation-related actions.
  • Documentation

    • Expanded voice-system and automation docs with implementation status, milestones, and fine-tuning backlog.

M3gA-Mind added 3 commits June 4, 2026 14:09
Run enigo keyboard/mouse on the app main thread via a native-registry
executor; enigo's macOS TSMGetInputSourceProperty traps off-thread and
crashes the CEF host. Adds mouse/keyboard tools, the main_thread bridge,
and downscaled screenshots so the model can see them.

Slice 1/7 of tinyhumansai#3307 (was the 'computer control' area).
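The commit above moves enigo keyboard/mouse calls onto the app main thread through an executor. A minimal, std-only sketch of that pattern follows: worker threads queue boxed closures on a channel, the main thread drains the queue, and each caller blocks on its own reply channel. `Job`, `MainThreadHandle`, `run_on_main`, and `drain_main_queue` are hypothetical names, and `std::sync::mpsc` stands in for whatever channel the real code uses; the actual executor is registered through the native registry in the Tauri host and is not shown on this page.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread;

// Hypothetical job type: a boxed closure that must run on the main thread.
type Job = Box<dyn FnOnce() + Send + 'static>;

/// Handle that worker threads use to queue work onto the main thread.
#[derive(Clone)]
struct MainThreadHandle {
    tx: Sender<Job>,
}

impl MainThreadHandle {
    /// Queue `f` on the main thread and block until it returns a result.
    fn run_on_main<T, F>(&self, f: F) -> Result<T, String>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move || {
            // Ignore a failed send: the caller may have stopped waiting.
            let _ = reply_tx.send(f());
        });
        self.tx
            .send(job)
            .map_err(|_| "main-thread executor is gone".to_string())?;
        reply_rx
            .recv()
            .map_err(|_| "job dropped without a reply".to_string())
    }
}

/// Run queued jobs on the current (main) thread until every handle is dropped.
fn drain_main_queue(rx: Receiver<Job>) {
    for job in rx {
        job();
    }
}

fn main() {
    let main_id = thread::current().id();
    let (tx, rx) = mpsc::channel::<Job>();
    let handle = MainThreadHandle { tx };

    // A tool call on a worker thread asks the main thread to do the input work.
    let worker = thread::spawn(move || {
        handle.run_on_main(move || thread::current().id() == main_id)
    });

    // Returns once the worker finishes and its handle (the only sender) is dropped.
    drain_main_queue(rx);
    let ran_on_main = worker.join().unwrap().unwrap();
    println!("ran on main thread: {ran_on_main}");
}
```

Blocking the caller on a per-call reply keeps the tool API synchronous while guaranteeing the FFI call itself only ever executes on the thread that drains the queue.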
… loop

Adds the Rust-internal automate engine (poll-until-stable settle, playback
verification), the AXEnabled diagnostics field + settle primitives on
ax_interact, the Music fast-path, and the Windows UIA superset. Exposes
launch_platform as pub(crate) so the automate loop can launch apps mid-flow.

Slice 2/7 of tinyhumansai#3307 (accessibility/automate engine).
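"Poll-until-stable settle" names a general technique: keep re-reading some observable state (for example, a snapshot of an app's AX tree) until it stops changing for a few polls in a row, with a hard timeout as a backstop. The sketch below assumes the snapshot can be compared with `PartialEq`; `settle`, `Settle`, and the parameter names are illustrative and are not the automate engine's actual API.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Result of waiting for a value to settle.
#[derive(Debug, PartialEq)]
enum Settle<T> {
    /// The value matched itself on `stable_polls` consecutive re-polls.
    Stable(T),
    /// The timeout expired first; carries the last value observed.
    TimedOut(T),
}

/// Re-poll `snapshot` every `interval` until it returns the same value on
/// `stable_polls` consecutive polls, or until `timeout` elapses.
fn settle<T, F>(mut snapshot: F, interval: Duration, stable_polls: u32, timeout: Duration) -> Settle<T>
where
    T: PartialEq,
    F: FnMut() -> T,
{
    let deadline = Instant::now() + timeout;
    let mut last = snapshot();
    let mut streak = 0;
    loop {
        if streak >= stable_polls {
            return Settle::Stable(last);
        }
        if Instant::now() >= deadline {
            return Settle::TimedOut(last);
        }
        thread::sleep(interval);
        let next = snapshot();
        if next == last {
            streak += 1;
        } else {
            // Something changed: restart the streak from the new value.
            streak = 0;
            last = next;
        }
    }
}

fn main() {
    // Simulated UI state that changes on the first few polls, then holds steady.
    let mut polls: u32 = 0;
    let outcome = settle(
        || {
            polls += 1;
            polls.min(3)
        },
        Duration::from_millis(5),
        2,
        Duration::from_secs(1),
    );
    println!("{outcome:?}");
}
```

Requiring a streak of matches, rather than a single repeat, is meant to avoid declaring the UI settled when two samples taken mid-transition happen to coincide.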
…trator

Registers the AutomateTool (multi-step UI flows in one call) and the
ax_interact denylist/opt-in plumbing; adds the catalog toggle, tool
definition, and orchestrator prompt guidance (automate + screenshot/
mouse/keyboard fallback for Electron apps with empty AX trees).

Slice 3/7 of tinyhumansai#3307 (tool wiring + prompts).
coderabbitai bot (Contributor) commented Jun 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5d2d8f93-53be-4cd4-a9a3-961702fbe033

📥 Commits

Reviewing files that changed from the base of the PR and between 52dc7c4 and 09263fc.

📒 Files selected for processing (1)
  • docs/voice-system-actions.md
✅ Files skipped from review due to trivial changes (1)
  • docs/voice-system-actions.md

📝 Walkthrough


Adds a new automate tool for multi-step UI automation with safety gates, panic-hardening for main-thread synthetic-input dispatch, approval-gating changes for keyboard/mouse, tool registry and frontend wiring, orchestrator prompt updates, and expanded documentation and tests.

Changes

App Automation Tool Feature

  • Main-thread dispatch panic safety (app/src-tauri/src/lib.rs): The synthetic-input queued closure is wrapped in std::panic::catch_unwind to prevent FFI panics from unwinding across the app main thread; panics are converted to clean error results sent via a oneshot channel.
  • AutomateTool core implementation and safety gates (src/openhuman/tools/impl/computer/automate.rs): New AutomateTool with allow_mutations implements Tool; it validates app and goal, enforces the denylist and mutation gating, loads config, runs the automation backend, and returns results with a "Steps:" summary. Includes unit tests for identity, schema, gating, and denylist behavior.
  • Shared sensitive-app boundary (src/openhuman/tools/impl/computer/ax_interact.rs): Makes is_sensitive_app pub(crate) for reuse and updates the mutation-disabled refusal message to guide users to enable "App UI Control / App Automation" in Settings.
  • Approval gating for input tools (src/openhuman/tools/impl/computer/keyboard.rs, src/openhuman/tools/impl/computer/mouse.rs, src/openhuman/tools/impl/computer/*_tests.rs): KeyboardTool and MouseTool now declare external_effect() -> true, routing actions through the ApprovalGate via external_effect_with_args; tests verify this behavior.
  • Tool module wiring and registration (src/openhuman/tools/impl/computer/mod.rs, src/openhuman/tools/ops.rs, src/openhuman/tools/user_filter.rs): Adds the automate submodule and re-export, registers AutomateTool in all_tools_with_runtime (configured with ax_interact_mutations), and adds automate to TOOL_FAMILIES with default_enabled: true.
  • Frontend UI and tool catalog (app/src/utils/toolDefinitions.ts): Adds automate to TOOL_CATALOG as "App Automation" (category System, defaultEnabled: true), mapped to the Rust tool automate.
  • Orchestrator agent configuration and guidance (src/openhuman/agent_registry/agents/orchestrator/agent.toml, src/openhuman/agent_registry/agents/orchestrator/prompt.md): Adds automate to the orchestrator's named tools and expands the prompt with keyboard-first desktop control guidance, app-focus management, Electron/Chromium handling, and a Slack keyboard-navigation example.
  • Documentation of automation features and status (docs/voice-system-actions.md): Documents the automate(app, goal) internal loop and milestones (Change 1.14) and the main-thread dispatch fix with related notes (Change 1.15), marks Always-On Listening as implemented, adds a fine-tuning backlog, and updates checklist statuses.
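The walkthrough says AutomateTool validates `app` and `goal`, enforces the sensitive-app denylist, and applies the mutation opt-in before running anything. A minimal sketch of that ordering follows, assuming plain string arguments and a boolean opt-in. `is_sensitive_app` and `SENSITIVE_APPS` are names taken from this PR, but the list contents and matching rule here are placeholders; `Refusal` and `check_automate_request` are hypothetical.

```rust
/// Placeholder entries: the real SENSITIVE_APPS contents are not shown
/// on this page, so these names are illustrative only.
const SENSITIVE_APPS: &[&str] = &["keychain access", "1password"];

/// Case-insensitive exact-name match against the denylist (assumed semantics).
fn is_sensitive_app(app: &str) -> bool {
    let app = app.trim().to_lowercase();
    SENSITIVE_APPS.iter().any(|s| app == *s)
}

/// Why a request was refused before any UI action ran (hypothetical type).
#[derive(Debug, PartialEq)]
enum Refusal {
    MissingArg(&'static str),
    SensitiveApp(String),
    MutationsDisabled,
}

/// Validate arguments first, then the denylist, then the mutation opt-in.
fn check_automate_request(
    app: Option<&str>,
    goal: Option<&str>,
    allow_mutations: bool,
) -> Result<(), Refusal> {
    let app = app
        .map(str::trim)
        .filter(|a| !a.is_empty())
        .ok_or(Refusal::MissingArg("app"))?;
    goal.map(str::trim)
        .filter(|g| !g.is_empty())
        .ok_or(Refusal::MissingArg("goal"))?;
    if is_sensitive_app(app) {
        return Err(Refusal::SensitiveApp(app.to_string()));
    }
    if !allow_mutations {
        return Err(Refusal::MutationsDisabled);
    }
    Ok(())
}

fn main() {
    let requests = [
        (Some("Music"), Some("play the next track"), true),
        (Some("Music"), None, true),
        (Some("1Password"), Some("open the vault"), true),
        (Some("Music"), Some("play the next track"), false),
    ];
    for (app, goal, allow) in requests {
        println!("{:?} -> {:?}", app, check_automate_request(app, goal, allow));
    }
}
```

Keeping every refusal condition in one predicate would also make it easy to reuse the same checks wherever an approval decision is made.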

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tinyhumansai/openhuman#3341: Related to the accessibility::automate::run engine and automation fast-paths that the new automate tool invokes.
  • tinyhumansai/openhuman#3340: Related to the synthetic-input main-thread executor; both PRs modify main-thread dispatch behavior.
  • tinyhumansai/openhuman#3118: Related to TOOL_FAMILIES retention and default-enabled tool family handling, which intersects with adding automate as default-enabled.

Suggested reviewers

  • graycyrus
  • senamakel

Poem

🐰 I hopped through code to plant a tool,
A gentle gate to keep things cool,
Keypresses, clicks—now checked and neat,
A careful loop to guide each feat,
Automation hums, the rabbit's pleased.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately summarizes the main change: wiring the automate and ax_interact computer tools into the orchestrator as part of a stacked PR series.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.00%, which meets the required threshold of 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


M3gA-Mind (Collaborator, Author) commented:

📚 Stacked PR series (8 total) — split from #3307

Merge in order, starting from the base of the stack (1); each PR builds on the one above it:

  1. #3340 feat(computer): main-thread synthetic-input executor + CEF crash fix (1/8 of #3307)
  2. #3341 feat(accessibility): AX/UIA perception + automate engine (2/8 of #3307)
  3. #3342 feat(agent): wire automate/ax_interact computer tools (3/8 of #3307)
  4. #3343 feat(voice): Phase 2 always-on listening engine + RPC (4/8 of #3307)
  5. #3344 feat(voice): always-on Settings toggle + debug panel + i18n (5/8 of #3307)
  6. #3345 feat(notch): always-visible macOS notch status pill (6/8 of #3307)
  7. #3346 feat(voice): Phase 3 fast command router (7/8 of #3307)
  8. #3362 feat(accessibility): vision-click fallback for Electron/partial-AX apps (8/8 of #3307); Phase 1.5 complete

Tracker: docs/voice-system-actions.md.

M3gA-Mind added 3 commits June 4, 2026 18:04
…tool-wiring

# Conflicts:
#	docs/voice-automate-plan.md
#	src/openhuman/accessibility/app_fastpaths/fastpaths_tests.rs
#	src/openhuman/accessibility/app_fastpaths/music.rs
#	src/openhuman/accessibility/automate.rs
#	src/openhuman/tools/impl/browser/screenshot.rs
#	src/openhuman/tools/impl/computer/main_thread.rs
Take main's versions of the already-merged slice-1/slice-2 files
(screenshot, main_thread, automate, music, fastpaths_tests, plan doc).
…ansai#3340 review)

Per @oxoxDev: MouseTool/KeyboardTool inherited external_effect=false, so
neither hit the ApprovalGate — PermissionLevel::Dangerous alone does NOT
trigger it (the gate keys off external_effect_with_args). With
computer_control.enabled, blind clicks / arbitrary keystrokes could run
unattended on an auto-approved turn, with no sensitive-app denylist.

- Override external_effect → true on both tools (gate every action).
- Wrap the main-thread input executor in catch_unwind so an enigo FFI
  panic can't unwind across the app main thread.
- Correct the user_filter.rs / ax_interact.rs comments that wrongly
  claimed Dangerous fires the gate.
- Tests: assert both tools route through the gate.
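The fix wraps each queued input closure in `std::panic::catch_unwind` and turns a panic into an error reply. A std-only sketch of that guard follows; `run_guarded` and `panic_message` are hypothetical names, and the reply channel is `std::sync::mpsc` here, while the PR text describes a oneshot channel. Two caveats: `catch_unwind` only intercepts Rust unwinding panics, so it cannot catch a native trap such as the `SIGTRAP` from `TSMGetInputSourceProperty` (that case is avoided by running on the main thread in the first place), and it has no effect in a build compiled with `panic = "abort"`.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc;

/// Extract a readable message from a panic payload.
fn panic_message(payload: Box<dyn Any + Send>) -> String {
    if let Some(s) = payload.downcast_ref::<&str>() {
        (*s).to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Run one queued input op; a Rust panic inside `op` becomes an `Err` reply
/// instead of unwinding out of the main-thread loop.
fn run_guarded<T, F>(op: F, reply: mpsc::Sender<Result<T, String>>)
where
    F: FnOnce() -> T,
{
    // Sketch assumption: nothing `op` touched is reused after it panics,
    // so asserting unwind safety is acceptable here.
    let result = panic::catch_unwind(AssertUnwindSafe(op))
        .map_err(|payload| format!("input op panicked: {}", panic_message(payload)));
    // The caller may have stopped waiting; a failed send is not an error here.
    let _ = reply.send(result);
}

fn main() {
    let (tx, rx) = mpsc::channel();
    run_guarded(|| 7, tx.clone());
    run_guarded(|| -> i32 { panic!("keycode lookup failed") }, tx);
    // Both replies arrive; the loop keeps running after the panic.
    for reply in rx {
        println!("{reply:?}");
    }
}
```

The default panic hook still prints the panic message to stderr; the guard only stops the unwind from escaping the executor loop.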
M3gA-Mind marked this pull request as ready for review June 4, 2026 12:47
M3gA-Mind requested a review from a team June 4, 2026 12:47
coderabbitai bot added the labels feature (Net-new user-facing capability or product behavior.), rust-core (Core Rust runtime in src/: CLI, core_server, shared infrastructure.), and agent (Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/.) on Jun 4, 2026

coderabbitai bot (Contributor) left a review:

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@app/src/utils/toolDefinitions.ts`:
- Around lines 50-53: The new user-visible literals in the ToolDefinition object
(displayName and description in toolDefinitions.ts) must be replaced with i18n
keys and resolved at render time via the app i18n hook; update the
ToolDefinition to use translation keys (e.g. "tools.appAutomation.displayName"
and "tools.appAutomation.description") instead of hard-coded strings, and ensure
any component that renders these fields calls useT() (or maps the definition
through t(def.displayName) / t(def.description)) so translations are applied at
render-time; edit the ToolDefinition entries for displayName and description and
update consuming UI code that reads these properties to call useT()/t(...)
before rendering.

In `@docs/voice-system-actions.md`:
- Around lines 365-368: The fenced code block containing the stack trace
(enigo::macos::get_layoutdependent_keycode → TSMGetInputSourceProperty →
dispatch_assert_queue → _dispatch_assert_queue_fail → SIGTRAP) is unlabeled;
update that triple-backtick fence to include the language tag "text" so the
block becomes ```text and the snippet renders and lints correctly.
- Around lines 352-372: The section currently claims both that the crash is fixed
("Keyboard/mouse now run on the app main thread" / "registered main-thread
synthetic-input executor") and that the blocker is unresolved ("THE BLOCKER —
OpenHuman-2026-06-03-170058.ips" / "must stay disabled"), so pick one canonical
status and remove the contradictory text: either mark Change 1.15 as fully fixed
(keep the MainThreadInputOp/run_input_on_main description, keep notes about
computer_control.enabled and added tools to orchestrator, and delete the "THE
BLOCKER" paragraph), or mark it as unresolved (remove or tone down the "crash
fixed" sentences and the claim about a registered main-thread executor, and keep
the blocker paragraph and instructions to keep keyboard/mouse disabled); ensure
references to MainThreadInputOp, run_input_on_main, enigo, and the
OpenHuman-2026-06-03-170058.ips trace remain consistent with the chosen state.

In `@src/openhuman/tools/impl/computer/automate.rs`:
- Around lines 14-24: The AutomateTool implementation (AutomateTool struct and
its uses of is_sensitive_app, AutomateOptions, RealBackend) must be moved out of
the deprecated tools/impl location into the owning domain module’s tools.rs (or
a tools/ submodule) and then re-exported from the shared tools surface module;
specifically, move the AutomateTool code into the domain module’s tools.rs or
tools/*, update module declarations/imports there, and add a pub use in the
shared tools mod to expose AutomateTool instead of leaving new domain code under
src/openhuman/tools/impl/.
- Line 105: The log line logs the raw goal which may contain sensitive user
data; update the logging in the automate execution path (the log::info! call
that includes app and goal) to avoid printing goal verbatim—instead compute a
redacted or sanitized representation (for example a truncated summary, hashed
value, or placeholder) and log that (reference the log::info! invocation and the
goal variable), keeping app as before but replacing goal with the redacted_goal
value when calling log::info!.
- Around lines 78-80: The tool currently always returns true from
external_effect(), causing the harness to prompt approvals even when
execute_with_options() will refuse (disabled mutations, missing args, or
denylisted apps); update the tool to avoid dead-end approvals by either (A)
implementing external_effect_with_args(&self, args: &ArgsType) to mirror the
checks in execute_with_options() and return true only when those checks pass, or
(B) make external_effect() perform the same quick pre-checks (disabled
mutations, required args present, app not denylisted) and return false when the
request will be refused; reference the external_effect /
external_effect_with_args methods and the execute_with_options (and execute)
logic to ensure the pre-checks are identical to the refusal conditions.
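For the comment on line 105 about logging `goal` verbatim, one option is to log a summary instead: the character count plus a short fingerprint so repeated requests can be correlated within a run. This is a sketch, not the merged fix, and `redact_goal` is a hypothetical helper. `DefaultHasher` is not cryptographic and its output is not guaranteed to stay the same across Rust releases, so the fingerprint is only a debugging aid; a short goal could still be guessed from it, so logging only the length is the stricter choice.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Summarize a free-text goal for logs: length plus a short fingerprint,
/// never the text itself. Hypothetical helper, not the merged code.
fn redact_goal(goal: &str) -> String {
    let mut hasher = DefaultHasher::new();
    goal.hash(&mut hasher);
    // Truncate the 64-bit hash to 32 bits; enough to correlate lines in one run.
    let fingerprint = hasher.finish() as u32;
    format!("<goal: {} chars, fp={:08x}>", goal.chars().count(), fingerprint)
}

fn main() {
    let app = "Slack";
    let goal = "message Dana about the payroll numbers";
    // In the tool this would be a log::info! call; println! keeps the sketch std-only.
    println!("[automate] app={app} goal={}", redact_goal(goal));
}
```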

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 99e92adf-7e65-4358-8273-46e825b509c1

📥 Commits

Reviewing files that changed from the base of the PR and between 7c08704 and 52dc7c4.

⛔ Files ignored due to path filters (1)
  • app/src-tauri/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • app/src-tauri/src/lib.rs
  • app/src/utils/toolDefinitions.ts
  • docs/voice-system-actions.md
  • src/openhuman/agent_registry/agents/orchestrator/agent.toml
  • src/openhuman/agent_registry/agents/orchestrator/prompt.md
  • src/openhuman/tools/impl/computer/automate.rs
  • src/openhuman/tools/impl/computer/ax_interact.rs
  • src/openhuman/tools/impl/computer/keyboard.rs
  • src/openhuman/tools/impl/computer/keyboard_tests.rs
  • src/openhuman/tools/impl/computer/mod.rs
  • src/openhuman/tools/impl/computer/mouse.rs
  • src/openhuman/tools/impl/computer/mouse_tests.rs
  • src/openhuman/tools/ops.rs
  • src/openhuman/tools/user_filter.rs

Remove the stale 'THE BLOCKER / fix not yet done / keep disabled' paragraph
that contradicted the '✅ crash fixed' status; reframe as root-cause-now-fixed
and note the catch_unwind guard. Add the `text` language to the trace fence.
M3gA-Mind (Collaborator, Author) commented:

Independent review (beyond the CodeRabbit pass)

Reviewed the tool-wiring slice — AutomateTool/AxInteractTool registration (tools/ops.rs, user_filter.rs), the orchestrator exposure (agent.toml, prompt.md), and the gating fix added here for @oxoxDev's #3340 blocker.

Security fix verified

  • MouseTool + KeyboardTool now override external_effect → true; confirmed the gate keys off external_effect_with_args (engine/tools.rs:162), which defaults to external_effect(). So every mouse/keyboard call now routes through the ApprovalGate. Tests assert it.
  • Main-thread executor wrapped in catch_unwind — an enigo FFI panic can no longer unwind across the app main thread.
  • AutomateTool/AxInteractTool gate correctly (external_effect_with_args/permission_level_with_args), reuse the SENSITIVE_APPS denylist + ax_interact_mutations opt-in.
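The first point above says the gate consults `external_effect_with_args`, which defaults to `external_effect()`, so a tool that only sets `PermissionLevel::Dangerous` is never gated. The sketch below reproduces that relationship with a pared-down trait. The trait shape, the raw-`&str` args, `needs_approval`, and `MouseToolBeforeFix` are simplifications for illustration, not the code in engine/tools.rs.

```rust
/// Simplified permission levels (hypothetical subset of the real enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PermissionLevel {
    Safe,
    Dangerous,
}

/// Stand-in for the tool trait, reduced to the hooks the gate discussion needs.
/// Args are a raw JSON string here; the real signature is not shown in the PR.
trait Tool {
    fn name(&self) -> &str;
    fn permission_level(&self) -> PermissionLevel {
        PermissionLevel::Safe
    }
    /// Static answer: does calling this tool affect the outside world?
    fn external_effect(&self) -> bool {
        false
    }
    /// Per-call answer the gate consults; falls back to the static one.
    fn external_effect_with_args(&self, _args: &str) -> bool {
        self.external_effect()
    }
}

/// Pre-fix shape: marked Dangerous but never overrides external_effect.
struct MouseToolBeforeFix;
impl Tool for MouseToolBeforeFix {
    fn name(&self) -> &str {
        "mouse (before fix)"
    }
    fn permission_level(&self) -> PermissionLevel {
        PermissionLevel::Dangerous
    }
}

/// Post-fix shape: every action reports an external effect.
struct MouseTool;
impl Tool for MouseTool {
    fn name(&self) -> &str {
        "mouse"
    }
    fn permission_level(&self) -> PermissionLevel {
        PermissionLevel::Dangerous
    }
    fn external_effect(&self) -> bool {
        true
    }
}

/// Hypothetical gate check: it keys off the per-call hook only, so the
/// permission level by itself never triggers an approval prompt.
fn needs_approval(tool: &dyn Tool, args: &str) -> bool {
    tool.external_effect_with_args(args)
}

fn main() {
    let args = r#"{"action":"click","x":10,"y":20}"#;
    let tools: [&dyn Tool; 2] = [&MouseToolBeforeFix, &MouseTool];
    for tool in tools {
        println!(
            "{} ({:?}): approval required = {}",
            tool.name(),
            tool.permission_level(),
            needs_approval(tool, args)
        );
    }
}
```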

Reviewed clean / noted

  • Corrected the user_filter.rs + ax_interact.rs comments that wrongly implied Dangerous triggers the gate.
  • i18n: the new automate catalog entry uses literal displayName/description strings, consistent with the rest of the existing TOOL_CATALOG (every entry has used literals since Phase 1). A full-catalog useT() migration is tracked as a follow-up rather than being fixed piecemeal here.

No further correctness issues. LGTM once CI is green.

M3gA-Mind merged commit cd31484 into tinyhumansai:main on Jun 4, 2026
22 checks passed
M3gA-Mind added a commit to M3gA-Mind/openhuman that referenced this pull request Jun 4, 2026
It isn't on main and its 'mouse/keyboard run without an approval prompt'
text contradicts the tinyhumansai#3342 ApprovalGate fix. Keep tinyhumansai#3344 a clean UI slice;
a corrected desktop-control prompt can land as its own follow-up.
M3gA-Mind added a commit to M3gA-Mind/openhuman that referenced this pull request Jun 4, 2026
…ch slice

Same stale 'runs without an approval prompt' section as tinyhumansai#3344 — not on main,
contradicts the tinyhumansai#3342 ApprovalGate fix. Tracked for a corrected follow-up.
M3gA-Mind added a commit to M3gA-Mind/openhuman that referenced this pull request Jun 4, 2026
…lGate

Ship the orchestrator's desktop-control playbook (carried from the original
voice work) with the gating line corrected: mouse/keyboard now route through
the ApprovalGate (tinyhumansai#3342), so the prompt no longer claims they 'run without an
approval prompt'. Resolves the deferred follow-up.