VisionTest - MCP Server for Mobile Automation

An MCP (Model Context Protocol) server that lets AI agents interact with Android devices and iOS simulators: tap, swipe, type, read UI elements, and launch apps.

What It Does

  • Android + iOS automation through a single MCP server
  • UI interaction: tap, swipe, type text, find elements, read screen hierarchy
  • App management: list, inspect, and launch apps
  • Device detection: automatically finds connected Android devices and booted iOS simulators
  • Zero-config iOS: uses the pre-built test bundle when installed and falls back to a source build otherwise

Prerequisites

  • JDK 17 or higher
  • macOS or Linux (arm64 or x86_64)
  • Android Platform Tools, which provide adb (for Android automation); download them from the Android developer site
  • Xcode Command Line Tools (for iOS simulator automation, macOS only)
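
The JDK requirement can be checked up front. The following is a minimal sketch, not the installer's actual check; java_major and require_java17 are names made up for this example. It parses the version string printed by java -version, including the legacy 1.x scheme used up to Java 8.

```bash
# Extract the major version from a `java -version` line, e.g.
#   openjdk version "17.0.9" 2023-10-17  -> 17
#   java version "1.8.0_292"             -> 8
java_major() {
  local ver
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) ver=${ver#1.} ;;  # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${ver%%[!0-9]*}"  # keep only the leading digits
}

# Hypothetical helper: succeed only if the java on PATH is version 17 or newer.
require_java17() {
  local line major
  # java prints its version to stderr, so merge it into stdout first.
  line=$(java -version 2>&1 | grep 'version "' | head -n 1)
  major=$(java_major "$line")
  [ -n "$major" ] && [ "$major" -ge 17 ]
}
```

Calling require_java17 || echo "JDK 17 or higher is required" >&2 before installing gives an early, readable failure.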

Installation

Quick Install (Recommended)

curl -fsSL https://github.com/docer1990/visiontest/releases/latest/download/install.sh | bash

This will:

  • Check that Java 17+ is installed
  • Download the latest release JAR, Android APKs, and iOS test bundle
  • Create a visiontest command in ~/.local/bin/
  • Verify all downloads via SHA-256 checksums

You can customize the install directory:

VISIONTEST_DIR="$HOME/my-tools/visiontest" curl -fsSL https://github.com/docer1990/visiontest/releases/latest/download/install.sh | bash

To update, re-run the same command.
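
For readers who download release files by hand, the checksum step mentioned above can be reproduced roughly as follows. This is a sketch under assumptions: the installer's real logic and the names of its checksum files are not shown in this README, and sha256_of and verify_sha256 are hypothetical helpers. It uses sha256sum where available (typical on Linux) and falls back to shasum -a 256 (typical on macOS).

```bash
# Print the SHA-256 digest of a file as lowercase hex.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# Succeed only if the file's digest equals the expected lowercase hex string.
verify_sha256() {
  local file="$1" expected="$2" actual
  actual=$(sha256_of "$file")
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

A call such as verify_sha256 visiontest.jar "$expected" || exit 1 stops a manual install when a download is corrupted; the expected value has to come from the release's published checksums.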

Configure Your AI Coding Tool

Claude Code

claude mcp add visiontest java -- -jar ~/.local/share/visiontest/visiontest.jar

Claude Desktop

Edit the config file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Linux: ~/.config/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "visiontest": {
      "command": "java",
      "args": ["-jar", "/ABSOLUTE/PATH/TO/.local/share/visiontest/visiontest.jar"]
    }
  }
}

Note: Replace /ABSOLUTE/PATH/TO with your home directory (e.g. /Users/yourname on macOS, /home/yourname on Linux). JSON does not expand ~.
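
Because the config is plain JSON, one way to avoid hand-editing the path is to let the shell expand $HOME and print the entry. This is a minimal sketch that assumes the default install location used in this README; visiontest_mcp_json is a hypothetical helper name, and it does not escape quotes or backslashes in the path.

```bash
# Print an mcpServers entry with an absolute JAR path.
# $1 (optional): path to visiontest.jar; defaults to the standard install location.
visiontest_mcp_json() {
  local jar="${1:-$HOME/.local/share/visiontest/visiontest.jar}"
  cat <<EOF
{
  "mcpServers": {
    "visiontest": {
      "command": "java",
      "args": ["-jar", "$jar"]
    }
  }
}
EOF
}
```

Run visiontest_mcp_json and merge the output into the config file by hand; redirecting it straight into the file would overwrite any other servers already configured there.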

GitHub Copilot CLI

Add to ~/.copilot/mcp-config.json:

{
  "mcpServers": {
    "visiontest": {
      "command": "java",
      "args": ["-jar", "/ABSOLUTE/PATH/TO/.local/share/visiontest/visiontest.jar"],
      "type": "stdio"
    }
  }
}

OpenAI Codex CLI

codex mcp add visiontest -- java -jar ~/.local/share/visiontest/visiontest.jar

Or add to ~/.codex/config.toml:

[mcp_servers.visiontest]
command = "java"
args = ["-jar", "/ABSOLUTE/PATH/TO/.local/share/visiontest/visiontest.jar"]

OpenCode

Add to opencode.json (project root or ~/.config/opencode/opencode.json):

{
  "mcp": {
    "visiontest": {
      "type": "local",
      "command": ["java", "-jar", "/ABSOLUTE/PATH/TO/.local/share/visiontest/visiontest.jar"]
    }
  }
}

Build from Source

For development or contributing, see CONTRIBUTING.md.

Usage

Your AI coding tool discovers all available tools automatically via MCP. Just ask it to interact with a device and it will use the right tools.
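
To make the discovery step concrete, this is roughly what an MCP client sends over stdio before it can list tools: an initialize request, an initialized notification, then tools/list, one JSON object per line. The sketch below only prints those messages. The client name, version, and protocol version string are assumptions chosen for illustration, and mcp_list_tools_requests is a hypothetical helper.

```bash
# Print the three newline-delimited JSON-RPC messages an MCP client
# sends to reach the point where it can list a server's tools.
mcp_list_tools_requests() {
  printf '%s\n' \
    '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"readme-demo","version":"0.0.1"}}}' \
    '{"jsonrpc":"2.0","method":"notifications/initialized"}' \
    '{"jsonrpc":"2.0","id":2,"method":"tools/list"}'
}
```

Piping the output into the server (mcp_list_tools_requests | visiontest) should produce the tool list as JSON, although how the server behaves when stdin closes is not documented here, so treat that as an experiment rather than a supported interface.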

Android Workflow

1. install_automation_server     →  Install APKs (one-time setup)
2. start_automation_server       →  Start the JSON-RPC server
3. get_interactive_elements      →  Get interactive elements with tap coordinates
4. android_tap_by_coordinates    →  Tap using centerX/centerY
5. android_input_text            →  Type text into focused field

iOS Workflow

1. ios_start_automation_server   →  Start XCUITest server (pre-built or source build)
2. ios_get_interactive_elements  →  Get interactive elements with tap coordinates
3. ios_tap_by_coordinates        →  Tap using centerX/centerY
4. ios_input_text                →  Type text into focused field

Available Tools

Device Management: available_device_android, list_apps_android, info_app_android, launch_app_android, ios_available_device, ios_list_apps, ios_info_app, ios_launch_app

Android Automation: install_automation_server, start_automation_server, automation_server_status, get_ui_hierarchy, get_interactive_elements, find_element, android_tap_by_coordinates, android_swipe, android_swipe_direction, android_swipe_on_element, android_get_device_info, android_input_text, android_press_back, android_press_home

iOS Automation: ios_start_automation_server, ios_automation_server_status, ios_get_ui_hierarchy, ios_get_interactive_elements, ios_find_element, ios_tap_by_coordinates, ios_swipe, ios_swipe_direction, ios_get_device_info, ios_input_text, ios_press_home, ios_stop_automation_server

CLI Usage

The same operations are also available as direct CLI commands — no MCP client needed:

visiontest automation_server_status -p android
visiontest get_interactive_elements -p ios
visiontest tap_by_coordinates -p android 100 200
visiontest screenshot -p ios --output ./screenshot.png
visiontest swipe_direction -p android up --distance long --speed fast

Every command requires --platform android or --platform ios (alias -p). Run visiontest --help for the full command list, or visiontest <command> --help for per-command usage.

With no arguments, visiontest starts the MCP stdio server.
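
In scripts, it can help to validate arguments before calling the CLI. The sketch below wraps the tap_by_coordinates form shown above. tap_center, is_uint, and the VISIONTEST_DRY_RUN variable are hypothetical names used only in this example, and restricting coordinates to non-negative integers is an assumption, not documented CLI behavior.

```bash
# Succeed if $1 is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
  esac
}

# Tap at the given coordinates (for example an element's centerX/centerY).
# Usage: tap_center <android|ios> <x> <y>
tap_center() {
  local platform="$1" x="$2" y="$3"
  case "$platform" in
    android|ios) ;;
    *) echo "platform must be android or ios" >&2; return 2 ;;
  esac
  if ! is_uint "$x" || ! is_uint "$y"; then
    echo "coordinates must be non-negative integers" >&2
    return 2
  fi
  set -- visiontest tap_by_coordinates -p "$platform" "$x" "$y"
  if [ -n "${VISIONTEST_DRY_RUN:-}" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"
  fi
}
```

With VISIONTEST_DRY_RUN=1 set, tap_center android 100 200 prints the command instead of executing it, which is handy when a script is being developed without a device attached.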

Exit Codes

Code  Meaning
0     Success
1     Generic failure
2     Usage error (missing/invalid args)
3     Automation server not reachable
4     Device/simulator not found
5     Platform not supported for this command
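
A script can turn these codes into readable messages. The following is a small sketch; describe_exit_code and run_visiontest are hypothetical helper names, and the messages are copied from the list above.

```bash
# Map a visiontest exit code to the meaning documented in this README.
describe_exit_code() {
  case "$1" in
    0) echo "Success" ;;
    1) echo "Generic failure" ;;
    2) echo "Usage error (missing/invalid args)" ;;
    3) echo "Automation server not reachable" ;;
    4) echo "Device/simulator not found" ;;
    5) echo "Platform not supported for this command" ;;
    *) echo "Unknown exit code: $1" ;;
  esac
}

# Run visiontest with the given arguments, report failures on stderr,
# and pass the original exit code through to the caller.
run_visiontest() {
  local rc=0
  visiontest "$@" || rc=$?
  [ "$rc" -eq 0 ] || echo "visiontest: $(describe_exit_code "$rc")" >&2
  return "$rc"
}
```

For example, run_visiontest automation_server_status -p android prints "visiontest: Automation server not reachable" on stderr when the command exits with code 3.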

Configuration

Environment Variables

Variable                      Default                    Description
VISION_TEST_LOG_LEVEL         PRODUCTION                 PRODUCTION, DEVELOPMENT, DEBUG
VISION_TEST_APK_PATH          (auto-detected)            Explicit path to Android test APK
VISION_TEST_IOS_PROJECT_PATH  (auto-detected)            Explicit path to iOS .xcodeproj
VISIONTEST_DIR                ~/.local/share/visiontest  Override install directory (must be under $HOME)
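
The VISIONTEST_DIR rule can be pre-checked in a wrapper script. This sketch applies the documented default and a simple prefix test against $HOME. It is not the installer's own validation, resolve_visiontest_dir is a hypothetical name, and a prefix test does not catch paths that escape $HOME through ".." segments or symlinks.

```bash
# Print the effective install directory, or fail if it is not under $HOME.
resolve_visiontest_dir() {
  local dir="${VISIONTEST_DIR:-$HOME/.local/share/visiontest}"
  case "$dir" in
    "$HOME"/*) printf '%s\n' "$dir" ;;
    *) echo "VISIONTEST_DIR must be under \$HOME: $dir" >&2; return 1 ;;
  esac
}
```

The other variables are read by the server at startup, so they can be set for a single run, for example VISION_TEST_LOG_LEVEL=DEBUG visiontest.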

Ports

  • Android: 9008 (requires ADB port forwarding, set up automatically)
  • iOS: 9009 (no port forwarding needed — simulators share the Mac's network)
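
Port forwarding is set up automatically, but when the Android server seems unreachable it can be useful to confirm that a forward for port 9008 exists. The sketch below filters the output of adb forward --list, whose lines have the form "serial local remote". has_forward is a hypothetical helper, and the device-side port is not stated in this README, so only the local side is checked.

```bash
# Read `adb forward --list` output on stdin and succeed if the given
# local TCP port is forwarded for any device.
# Usage: adb forward --list | has_forward 9008
has_forward() {
  awk -v want="tcp:$1" '$2 == want { found = 1 } END { exit (found ? 0 : 1) }'
}
```

A typical check is adb forward --list | has_forward 9008 || echo "no forward for port 9008" >&2.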

Future Plans

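  • Text input/typing support (available: android_input_text, ios_input_text)
  • Screenshot capture via UIAutomator / XCUITest (available in the CLI: visiontest screenshot)
  • CLI mode (direct command-line usage without MCP) (available: see CLI Usage)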
  • Long press operations
  • Wait/sync operations for E2E testing
  • Multi-device coordination
  • Generic app install/uninstall
  • Clipboard operations (read/write)
  • Physical iOS device support
  • WebSocket support for real-time updates
  • Notification/status bar interaction
  • Permission dialog automation
  • Video recording of automation sessions
  • Separate CLI-only artifact (smaller download, no MCP dependencies)

Contributing

See CONTRIBUTING.md for build-from-source instructions, architecture details, JSON-RPC API reference, testing guide, and how to extend VisionTest.

License

This project is licensed under the MIT License - see the LICENSE file for details.
