OmniVision Sandbox

A powerful, multi-turn Streamlit UI designed for testing, comparing, and pushing the limits of Vision-Language Models (VLMs). Whether you are running local instances (like Gemma, Qwen) or cloud APIs (OpenAI GPT-4o, Google Gemini), this sandbox provides a unified interface to craft complex multi-modal prompts.

✨ Features

  • Universal Model Support: Easily switch between local models (via vLLM/Ollama) and commercial APIs (OpenAI, Gemini) using a centralized config.py. Fully supports modern reasoning models (e.g., Qwen 3.5/3.6, OpenAI o-series).
  • 🧠 Advanced Reasoning (Chain-of-Thought): Extracts and displays the hidden "thinking" process of reasoning models inside a collapsible UI widget.
  • Rich Media Handling:
    • Images: Upload or paste screenshots directly. Preserves multi-line text and prompt formatting perfectly in the chat history.
    • Video: Native video parsing and frame sampling support.
  • Smart EXIF & Location Extraction: Automatically extracts EXIF data, XMP metadata, and IPTC tags from uploaded images.
  • Reverse Geocoding: Converts GPS coordinates found in images into human-readable addresses using Nominatim.
  • Dynamic Variable Injection: Automatically maps extracted locations and timestamps to variables (e.g., {geo_1}, {time_1}) that you can use in your text prompts (see the sketch after this list).
  • Granular Parameter Control: Adjust Temperature, Top P, Max Tokens, Seeds, Video Sampling Frames, and Thinking Toggles dynamically from the sidebar based on what the active model supports.
  • Transparent Execution: Inspect exact JSON payloads, raw chain-of-thought outputs, token usage, and execution times for every turn.
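The variable-injection feature amounts to a simple substitution step before the prompt is sent. A minimal sketch, assuming hypothetical helper names (`build_variables` and `inject_variables` are illustrations, not the app's actual functions; see webapp.py for the real implementation):

```python
# Minimal sketch of variable injection (hypothetical helpers; the app's
# real function names live in webapp.py and may differ).

def build_variables(images_metadata):
    """Map extracted per-image metadata to numbered prompt variables."""
    variables = {}
    for i, meta in enumerate(images_metadata, start=1):
        if meta.get("address"):            # from reverse geocoding
            variables[f"geo_{i}"] = meta["address"]
        if meta.get("timestamp"):          # from EXIF DateTimeOriginal
            variables[f"time_{i}"] = meta["timestamp"]
    return variables

def inject_variables(prompt: str, variables: dict) -> str:
    """Replace {geo_1}-style placeholders with their extracted values."""
    for name, value in variables.items():
        prompt = prompt.replace("{" + name + "}", str(value))
    return prompt

variables = build_variables([
    {"address": "Alexanderplatz, Berlin, Germany", "timestamp": "2024-05-01 14:32"},
])
print(inject_variables("Where was this taken? Hint: {geo_1} at {time_1}.", variables))
```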

🛠️ Prerequisites

You will need Python 3.8+.

The following dependencies are required to run the application:

  • streamlit (UI framework)
  • openai (API client for both OpenAI and local OpenAI-compatible endpoints)
  • Pillow (Image processing and EXIF extraction)
  • geopy (Reverse geocoding)
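If you prefer pinned installs, these map to a minimal requirements.txt along the following lines (the version floors are illustrative, not tested minimums):

```text
streamlit>=1.30
openai>=1.0
Pillow>=10.0
geopy>=2.4
```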

🚀 Installation & Setup

1. Clone the repository:

```bash
git clone https://github.com/yourusername/omnivision-sandbox.git
cd omnivision-sandbox
```

2. Install dependencies:

```bash
pip install streamlit openai Pillow geopy
```

3. Configure your API Keys: Set your environment variables for the remote models you wish to use. You can do this in your terminal or via a .env file:

```bash
export OPENAI_API_KEY="your-openai-key"
export GEMINI_API_KEY="your-gemini-key"
```

(Note: Local models running on localhost:8000 or localhost:11434 do not require API keys).

4. Edit the Configuration (Optional): Open config.py to modify the LLM_PROFILES list. You can add your own custom local models, change default system prompts, or adjust geocoding limits.
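For orientation, a profile entry might look roughly like the sketch below. The field names here are illustrative assumptions; the authoritative schema is whatever config.py actually defines:

```python
# Illustrative shape of an LLM_PROFILES entry -- the field names are
# assumptions; consult config.py for the real schema.
LLM_PROFILES = [
    {
        "name": "Qwen (local vLLM)",
        "base_url": "http://localhost:8000/v1",   # OpenAI-compatible endpoint
        "api_key_env": None,                      # local models need no key
        "model": "qwen2.5-vl",
        "supports_thinking": True,
        "supports_video": True,
        "system_prompt": "You are a helpful vision assistant.",
    },
]
```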

5. Run the Application:

```bash
streamlit run webapp.py
```

🧩 How to Use the Builder

  1. Select a Model: Use the sidebar to choose your target LLM. If the model supports it, a "🧠 Enable Thinking" toggle will dynamically appear in the parameters menu.
  2. Add Blocks: Use the bottom toolbar to add Text, Image, or Video blocks.
  3. Upload Media: Drag and drop an image or paste from your clipboard. The app will immediately attempt to read its EXIF data.
  4. Use Variables: If an image has GPS/Time data, the app will display variables like {geo_1} and {time_1}. Write your text prompt like this: "What is the architectural style of the building in this image? It was taken at {geo_1}."
  5. Send: Click Assemble & Send to LLM to process the payload (a sketch of a typical payload follows this list). Once the model replies, you can expand the "🧠 View Model Thinking" widget to see its internal logic.
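For reference, an image turn sent to an OpenAI-compatible endpoint generally looks like the payload below. This is a sketch of the standard chat-completions multimodal format, not necessarily byte-for-byte what the app assembles; the model name and endpoint are placeholders:

```python
import base64

from openai import OpenAI

# Local OpenAI-compatible endpoint; no real key required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")

with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="qwen2.5-vl",  # illustrative model name
    temperature=0.7,
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is the architectural style of this building? "
                         "It was taken at Alexanderplatz, Berlin, Germany."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```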

⚠️ Notes on Geocoding Rate Limits

This app uses OpenStreetMap's Nominatim service for reverse geocoding. To comply with its usage policy, the app enforces a strict 1-second pause between requests, so a large batch of GPS-tagged images will take roughly one second per image to resolve. Plan accordingly if you are in a rush.
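The pause can be implemented with geopy's built-in RateLimiter; a minimal sketch (the user_agent string and coordinates are placeholders):

```python
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

# Nominatim's usage policy requires an identifying user agent and
# at most one request per second.
geolocator = Nominatim(user_agent="omnivision-sandbox-example")
reverse = RateLimiter(geolocator.reverse, min_delay_seconds=1)

# (latitude, longitude) in decimal degrees, e.g. converted from EXIF GPS tags
location = reverse((52.5219, 13.4132), language="en")
print(location.address if location else "No address found")
```

RateLimiter transparently sleeps between calls, so batch uploads simply queue up rather than violating the policy.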
