
feat: add eval framework and coordinate-based input tools #453

Merged
shivammittal274 merged 1 commit into main from feat/eval
Mar 16, 2026

@shivammittal274
Contributor

Summary

  • Add hover_at, type_at, drag_at coordinate-based input tools to the server (3 new MCP tools, 3 new Browser methods)
  • Export server internals (./browser, ./agent/tool-loop, ./tools/registry) so eval can import them
  • Add the full eval framework as apps/eval: multi-agent evaluation with support for single-agent, orchestrator-executor (Clado Action), Gemini computer use, and Yutori Navigator
  • Include a dashboard UI, multiple graders (WebVoyager, Mind2Web, FARA, performance), parallel execution, and datasets (WebVoyager, Mind2Web, WebBench, BrowseComp)
  • Nest eval-targets inside apps/eval for coordinate-click test fixtures
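
To illustrate how the three coordinate tools might map onto the new Browser methods, here is a minimal sketch. The PR names hover_at, type_at, drag_at and hoverAt, typeAt, dragAt but does not show their signatures, so the parameter names (x, y, fromX, toX, steps), the FakeBrowser class, the RecordedInput type, and the callTool dispatcher are all assumptions made for illustration. The fake only records the input events a real implementation would presumably send to the page.

```typescript
// Hypothetical event record; a real Browser would drive the page instead.
type RecordedInput =
  | { kind: "move" | "down" | "up"; x: number; y: number }
  | { kind: "text"; text: string };

// Stand-in for the server's Browser class (assumed method signatures).
class FakeBrowser {
  readonly events: RecordedInput[] = [];

  // Move the pointer to (x, y) without pressing a button.
  hoverAt(x: number, y: number): void {
    this.events.push({ kind: "move", x, y });
  }

  // Click at (x, y) to focus whatever is there, then type the text.
  typeAt(x: number, y: number, text: string): void {
    this.events.push(
      { kind: "move", x, y },
      { kind: "down", x, y },
      { kind: "up", x, y },
      { kind: "text", text },
    );
  }

  // Press at the start point, move in `steps` linear increments, release at the end.
  dragAt(fromX: number, fromY: number, toX: number, toY: number, steps = 4): void {
    this.events.push(
      { kind: "move", x: fromX, y: fromY },
      { kind: "down", x: fromX, y: fromY },
    );
    for (let i = 1; i <= steps; i++) {
      const t = i / steps;
      this.events.push({
        kind: "move",
        x: fromX + (toX - fromX) * t,
        y: fromY + (toY - fromY) * t,
      });
    }
    this.events.push({ kind: "up", x: toX, y: toY });
  }
}

type ToolArgs = Record<string, number | string>;

// Hypothetical dispatcher from tool name to Browser method.
function callTool(browser: FakeBrowser, name: string, args: ToolArgs): void {
  const num = (key: string): number => {
    const value = args[key];
    if (typeof value !== "number") {
      throw new Error(`${name}: "${key}" must be a number`);
    }
    return value;
  };
  switch (name) {
    case "hover_at":
      browser.hoverAt(num("x"), num("y"));
      break;
    case "type_at":
      browser.typeAt(num("x"), num("y"), String(args["text"] ?? ""));
      break;
    case "drag_at":
      browser.dragAt(num("fromX"), num("fromY"), num("toX"), num("toY"));
      break;
    default:
      throw new Error(`unknown tool: ${name}`);
  }
}

// Usage: record a drag from the origin to (100, 50).
const demoBrowser = new FakeBrowser();
callTool(demoBrowser, "drag_at", { fromX: 0, fromY: 0, toX: 100, toY: 50 });
```

Recording events rather than dispatching them keeps the sketch runnable without a browser; the same shape could back a test double for the coordinate-click fixtures.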

Test plan

  • Server typecheck passes
  • Eval typecheck passes
  • bun run src/index.ts --help loads correctly
  • Dashboard HTML functions verified (the lint auto-fix had renamed the onclick handlers; this is fixed, and a biome ignore was added to prevent it recurring)
  • Run a test eval with bun run eval -c configs/webvoyager-test.json

🤖 Generated with Claude Code

- Add hover_at, type_at, drag_at coordinate tools to server
- Add hoverAt, typeAt, dragAt methods to Browser class
- Export server internals (browser, tool-loop, registry) for eval imports
- Copy eval app from enterprise repo with agents, graders, runner, dashboard
- Nest eval-targets inside apps/eval
- Adapt sessionExecutionDir → workingDir for current server API
- Add biome ignore for dashboard HTML to prevent lint breaking onclick handlers
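
As a rough picture of how several graders could be applied to each task result and rolled up into a pass rate, consider the sketch below. It does not reproduce the WebVoyager, Mind2Web, FARA, or performance graders from this PR; the TaskResult, Grade, and Grader shapes and the two toy graders (answerPresent, withinBudget) are hypothetical names invented for illustration.

```typescript
// Hypothetical result of running one eval task.
interface TaskResult {
  taskId: string;
  finalAnswer: string;
  durationMs: number;
}

// Hypothetical verdict from one grader.
interface Grade {
  grader: string;
  pass: boolean;
  detail: string;
}

type Grader = (result: TaskResult) => Grade;

// Toy grader: passes when the agent produced a non-empty answer.
const answerPresent: Grader = (r) => {
  const ok = r.finalAnswer.trim().length > 0;
  return { grader: "answer-present", pass: ok, detail: ok ? "non-empty answer" : "empty answer" };
};

// Toy performance grader: passes when the task finished within maxMs.
const withinBudget = (maxMs: number): Grader => (r) => ({
  grader: "within-budget",
  pass: r.durationMs <= maxMs,
  detail: `${r.durationMs} ms against a ${maxMs} ms budget`,
});

interface TaskGrades {
  taskId: string;
  grades: Grade[];
}

// Apply every grader to every result.
function gradeAll(results: TaskResult[], graders: Grader[]): TaskGrades[] {
  return results.map((r) => ({ taskId: r.taskId, grades: graders.map((g) => g(r)) }));
}

// Fraction of tasks for which all graders passed (0 for an empty run).
function passRate(all: TaskGrades[]): number {
  if (all.length === 0) return 0;
  const passed = all.filter((t) => t.grades.every((g) => g.pass)).length;
  return passed / all.length;
}

// Usage with made-up results.
const sampleGrades = gradeAll(
  [
    { taskId: "a", finalAnswer: "42", durationMs: 1000 },
    { taskId: "b", finalAnswer: "", durationMs: 500 },
  ],
  [answerPresent, withinBudget(2000)],
);
const sampleRate = passRate(sampleGrades);
```

Treating a grader as a plain function keeps graders composable: a run can mix correctness and performance checks without the runner knowing which is which.
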
@greptile-apps
Contributor

greptile-apps bot commented Mar 16, 2026

Too many files changed for review. (125 files found, 100 file limit)

shivammittal274 merged commit 2905622 into main on Mar 16, 2026
2 checks passed
github-actions bot locked and limited conversation to collaborators on Mar 16, 2026
