@kevinswint kevinswint commented Jan 1, 2026

Add Screenshot Capture Tool

Summary

This PR adds a new capture_screenshot MCP tool that enables Claude to visually analyze Roblox Studio workspaces by capturing screenshots of the Studio window.

Motivation

Currently, Claude can interact with Roblox Studio through code execution (run_code) and asset insertion (insert_model), but lacks the ability to visually inspect the workspace. This screenshot capability enables powerful new workflows:

  • Visual debugging: Claude can see UI layout issues, visual glitches, or rendering problems
  • Design verification: Claude can verify that changes look correct after making modifications
  • Iterative development: Claude can capture multiple views to analyze scenes from different angles (combined with camera control via run_code)

Implementation

Architecture

  • Rust-only implementation - No Luau plugin changes needed
  • Cross-platform support - macOS and Windows

Platform-Specific Details

macOS:

  • Uses Swift script to query window list via CoreGraphics API
  • Obtains Roblox Studio window ID without requiring Accessibility permissions
  • Captures window using screencapture command-line tool
  • Requires Screen Recording permission for Terminal/MCP client
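
A minimal sketch of how the capture step above could be assembled in Rust, assuming the Swift helper has already produced a numeric window ID. The function names and the output path are hypothetical and are not taken from this PR; the `-x`, `-o`, and `-l` flags are standard `screencapture` options.

```rust
use std::path::Path;
use std::process::Command;

/// Builds (but does not run) a `screencapture` invocation for one window.
/// `window_id` is assumed to be the CoreGraphics window number reported
/// by the Swift helper; this function name is illustrative only.
fn screencapture_command(window_id: u32, output: &Path) -> Command {
    let mut cmd = Command::new("screencapture");
    cmd.arg("-x") // do not play the shutter sound
        .arg("-o") // omit the window shadow in window-capture mode
        .arg("-l") // capture the window with the given ID
        .arg(window_id.to_string())
        .arg(output);
    cmd
}

/// Collects the arguments as strings, which is handy for logging.
fn command_args(cmd: &Command) -> Vec<String> {
    cmd.get_args()
        .map(|a| a.to_string_lossy().into_owned())
        .collect()
}
```

Calling `.status()` on the returned `Command` would run the capture; without Screen Recording permission the call is expected to fail or produce an unusable image, so the exit status is worth checking.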

Windows:

  • Uses PowerShell with System.Drawing to capture window
  • Finds Roblox Studio by window title matching
  • No additional permissions required
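
Title matching can be sketched as a small pure function. The `"- Roblox Studio"` substring comes from the macOS filter described in this PR's commit notes; applying the same rule to Windows titles is an assumption, and the function names and sample titles are hypothetical.

```rust
/// Returns true when a window title looks like the main Studio window.
/// Assumption: main-window titles contain "- Roblox Studio", while
/// auxiliary windows (for example "Output") do not.
fn is_main_studio_window(title: &str) -> bool {
    title.contains("- Roblox Studio")
}

/// Picks the first title that matches the main-window rule, if any.
fn pick_main_window<'a>(titles: &[&'a str]) -> Option<&'a str> {
    titles.iter().copied().find(|t| is_main_studio_window(t))
}
```

Returning `Option` lets the caller turn "no Studio window found" into a descriptive error rather than capturing the wrong window.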

Output Format

  • Returns base64-encoded PNG data via MCP protocol
  • Compatible with Claude's multimodal capabilities for visual analysis
  • Typical screenshot size: ~1.5MB (encoded), 3420x1918 pixels on Retina displays
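
For a sense of the encoding overhead: base64 emits 4 characters per 3 input bytes, so a ~1.5MB encoded payload corresponds to roughly 1.1MB of raw image data. The PR itself uses the `base64` crate; the std-only encoder below is a swapped-in illustration of the standard alphabet with `=` padding, not the code from this PR.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Length of the padded base64 encoding of `n` input bytes.
fn encoded_len(n: usize) -> usize {
    (n + 2) / 3 * 4
}

/// Encodes bytes with the standard base64 alphabet and `=` padding.
fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity(encoded_len(input.len()));
    for chunk in input.chunks(3) {
        // Pack up to three bytes into a 24-bit group, zero-filling the tail.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[(n >> 18) as usize & 63] as char);
        out.push(ALPHABET[(n >> 12) as usize & 63] as char);
        // Positions with no source byte are emitted as padding.
        out.push(if chunk.len() > 1 {
            ALPHABET[(n >> 6) as usize & 63] as char
        } else {
            '='
        });
        out.push(if chunk.len() > 2 {
            ALPHABET[n as usize & 63] as char
        } else {
            '='
        });
    }
    out
}
```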

Testing

✅ Tested on macOS 14.x (Sonoma) with Roblox Studio

  • Window detection works correctly (finds Studio even when not focused)
  • Screenshot capture produces valid PNG images
  • Base64 encoding and MCP transmission successful
  • Claude can receive and analyze the screenshot data

⚠️ Windows implementation is untested (no Windows development environment available)

Documentation

  • Updated README with complete tool documentation
  • Added usage examples and prompts
  • Documented macOS Screen Recording permission requirement
  • Added note about window focus requirements (none needed)

Files Changed

  • Cargo.toml - Added base64 dependency
  • Cargo.lock - Dependency updates
  • src/rbx_studio_server.rs - Screenshot tool implementation (~155 lines)
  • README.md - Tool documentation and examples (~38 lines)

Breaking Changes

None - This is a purely additive change.

Future Enhancements

Potential follow-up work (out of scope for this PR):

  • Configurable screenshot parameters (region, format, quality)
  • Video capture support
  • Built-in camera movement presets
  • Annotation/markup capabilities

Checklist

  • Code follows project style and conventions
  • Cross-platform implementation (macOS + Windows)
  • Documentation updated (README)
  • Tested locally on macOS
  • No breaking changes
  • Commit message follows conventional commits format

Updates (Jan 6)

  • Fixed Windows implementation - Added missing Win32/RECT type definitions (script was non-functional before)
  • Fixed image handling - Now returns Content::image instead of Content::text, so Claude processes it as a proper image rather than tokenizing base64 text
  • Increased resolution - Bumped from 1024 to 4096px for full Retina/4K quality
  • Added timeouts - 10-second timeout on Swift/PowerShell to prevent hangs
  • Better error messages - Standardized messaging, added macOS permission hint
  • Code cleanup - Extracted shared process_screenshot() function, added named constants
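
One way to put a timeout on an external helper using only the standard library is to poll `try_wait` until a deadline and kill the child if it is still running. This is a sketch under that assumption, not the code from this PR; the function names and the 20ms polling interval are hypothetical.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Waits for `child` to exit. Returns `Ok(None)` if the timeout elapsed,
/// in which case the child has been killed and reaped.
fn wait_with_timeout(child: &mut Child, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            let _ = child.kill(); // ignore the error if it exited just now
            child.wait()?; // reap the process so it does not linger
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(20));
    }
}

/// Spawns `cmd` and applies the timeout above.
fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.spawn()?;
    wait_with_timeout(&mut child, timeout)
}
```

With the 10-second limit mentioned above, a call would pass `Duration::from_secs(10)`. If the helper writes a lot to a piped stdout, the pipe needs to be drained separately, since this sketch only waits on the exit status.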


josharagon commented Jan 1, 2026

Beautiful. I was literally planning on doing something similar yesterday. I will test on Windows.

@kevinswint
Author

@josharagon Thanks! Really appreciate you testing on Windows - I don't have a Windows environment to test on, so that would be super helpful. Let me know if you run into any issues!

Add a new capture_screenshot MCP tool that captures the Roblox Studio
window and returns base64-encoded JPEG data. This enables Claude to
visually analyze workspaces, debug UI issues, and verify changes.

Implementation details:
- Cross-platform support for macOS and Windows
- macOS: Uses Swift to filter for main Studio window + screencapture
- Windows: Uses PowerShell with System.Drawing for screen capture
- Images resized to max 1024x1024px while maintaining aspect ratio
- JPEG compression at quality 85 for optimal clarity/size balance
- Returns base64-encoded data (~93K chars) via MCP protocol

Window selection:
- macOS: Filters for windows with " - Roblox Studio" in title
- Prevents capturing auxiliary windows (Output, etc.)

Requirements:
- macOS requires Screen Recording permission for Terminal/MCP client
- Windows has no additional permission requirements

Documentation:
- Updated README with tool descriptions and technical details
- Added TESTING.md with testing guidelines and benchmarks
- Added setup instructions for macOS Screen Recording permissions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@kevinswint kevinswint force-pushed the add-screenshot-capture-tool branch from 8135db2 to da94594 Compare January 4, 2026 06:00
Kevin Swint and others added 3 commits January 6, 2026 16:23
Critical fixes:
- Return Content::image instead of Content::text for proper multimodal handling
- Fix Windows PowerShell script with proper Win32/RECT type definitions

Improvements:
- Extract shared process_screenshot() function to eliminate code duplication
- Add configurable constants for image dimensions and JPEG quality
- Add timeouts to external commands (Swift/PowerShell) to prevent hangs
- Standardize error messages across platforms
- Add -NoProfile and -ExecutionPolicy Bypass flags for Windows reliability
- Add proper resource disposal in Windows script
- Improve error handling with more descriptive messages
- Relax Swift window matching (removed leading space requirement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Now that images are properly returned as Content::image (not text),
we can use higher resolution without hitting token limits. Bumped
from 1024 to 2048 for better detail and legibility.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
4096 max dimension captures native resolution on most displays without
downscaling, while still capping extremely large windows. Small windows
are unaffected (no upscaling occurs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
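
The max-dimension cap described in the commits above (downscale only, aspect ratio preserved, no upscaling) can be sketched as a pure function. The name `fit_within` and the rounding choice are assumptions for illustration, not the code from this PR.

```rust
/// Computes output dimensions so the longer side is at most `max_dim`,
/// preserving aspect ratio. Images already within the limit are returned
/// unchanged, so small windows are never upscaled.
fn fit_within(width: u32, height: u32, max_dim: u32) -> (u32, u32) {
    let longest = width.max(height);
    if longest == 0 || longest <= max_dim {
        return (width, height);
    }
    // Scale each side by max_dim / longest, rounding to nearest.
    // u64 arithmetic avoids overflow in the multiplication.
    let scale = |side: u32| -> u32 {
        let scaled = (side as u64 * max_dim as u64 + longest as u64 / 2) / longest as u64;
        scaled.max(1) as u32 // never collapse a side to zero pixels
    };
    (scale(width), scale(height))
}
```

With the 3420x1918 Retina capture mentioned in the PR description, a 1024 cap would shrink the image to 1024x574, while a 4096 cap leaves it at native resolution.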
@kevinswint
Author

Hey @josharagon, I've pushed some updates to PR #47:

  • Fixed the Windows PowerShell script (added missing Win32/RECT type definitions - it was broken before)
  • Screenshots now return as proper images instead of base64 text
  • Bumped resolution to 4096px for full quality
  • Added timeouts and better error handling

Would love your help testing on Windows when you get a chance, thanks!
