The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

naicud/JARVIS-desktop


🤖 JARVIS

The Ultra-Evolved AI PC Controller

Jarvis is a multimodal AI agent stack that lets you control your computer through natural-language text and voice commands.

Features • Quick Start • Showcase • Contributing


🚀 Unleash the Power of Jarvis

Jarvis transforms your interaction with the digital world. No more complex menus or repetitive tasks: just tell Jarvis what you need, and watch it navigate your PC, browser, and apps with human-like precision.

✨ Key Capabilities

  • 🗣️ Voice & Text Control - Seamlessly switch between typing and talking to your computer.
  • 👁️ Visual Intelligence - Powered by advanced Vision-Language Models (VLM) for screen understanding.
  • 🖱️ Native GUI Automation - Precise mouse and keyboard control across all applications.
  • 🌐 Hybrid Browser Agent - Advanced web navigation using DOM and visual grounding.
  • 🧰 MCP Ecosystem - Extensible through Model Context Protocol (MCP) tool integration.
  • 🔐 Privacy First - Secure processing with support for local and private models.

📺 Showcase

Complex Flight Booking

"Help me book the earliest flight from San Jose to New York on September 1st and the last return flight on September 6th on Priceline"

Video: agent-tars-new-flight.mp4

| Action | Local Operator | Remote Operator |
| --- | --- | --- |
| VS Code Automation | computer-use-triple-speed.mp4 | remote-computer-operators.mp4 |
| GitHub Exploration | browser-use-triple-speed.mp4 | remote-browser-operators.mp4 |

🛠️ Quick Start

For Developers (CLI)

Get up and running in seconds with our high-performance CLI.

# Launch instantly with npx
npx @agent-tars/cli@latest

# Or install globally (Requires Node.js >= 22)
npm install @agent-tars/cli@latest -g

# Run Jarvis with your preferred provider
agent-tars --provider anthropic --model claude-3-7-sonnet-latest --apiKey YOUR_API_KEY
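If you would rather not type the API key on the command line, the launch command above can be wrapped in a small script. The following is a minimal sketch, not part of the CLI itself: the helper names (`buildAgentTarsArgs`, `optionsFromEnv`) and the environment variable names (`JARVIS_PROVIDER`, `JARVIS_MODEL`, `JARVIS_API_KEY`) are assumptions made for illustration. It uses only the three flags shown in the command above.

```typescript
// Hypothetical wrapper types and helpers; none of these names come from the CLI.
interface LaunchOptions {
  provider: string;
  model: string;
  apiKey: string;
}

// Build the argument list for `agent-tars` from the flags shown above:
// --provider, --model, and --apiKey.
function buildAgentTarsArgs(opts: LaunchOptions): string[] {
  if (!opts.apiKey) {
    throw new Error("Missing API key; set it in the environment first");
  }
  return ["--provider", opts.provider, "--model", opts.model, "--apiKey", opts.apiKey];
}

// Read settings from an environment-like object. The variable names are
// assumptions; the defaults mirror the example command above.
function optionsFromEnv(env: Record<string, string | undefined>): LaunchOptions {
  return {
    provider: env.JARVIS_PROVIDER ?? "anthropic",
    model: env.JARVIS_MODEL ?? "claude-3-7-sonnet-latest",
    apiKey: env.JARVIS_API_KEY ?? "",
  };
}
```

In a Node.js script, the resulting array could be passed to `spawn("agent-tars", args, { stdio: "inherit" })` from `node:child_process`, with `optionsFromEnv(process.env)` supplying the options.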

Jarvis Desktop App

For a full native experience with a sleek UI:

  1. Clone the repository.
  2. Run pnpm install.
  3. Start the dev environment: npm run dev:ui-tars.
  4. Follow the Desktop Quick Start Guide for deeper configuration.

📚 Documentation & Resources

| Resource | Link |
| --- | --- |
| 🏠 Website | TBD |
| 📖 Guides | Documentation |
| 🛠️ SDK | Build on Jarvis |
| 🎮 Showcase | Use Cases & Examples |

🤝 Contributing

We welcome contributions from the community! Whether it's a bug fix, a new feature, or better documentation, check out our CONTRIBUTING.md to get started.


📜 License & Credits

Jarvis is open-source software licensed under the Apache License 2.0.

Note

This project is based on and was originally forked from UI-TARS-desktop.


Built with ❤️ for the future of Human-Computer Interaction.
