A modern web application that converts text and images to structured data using Google's Gemini models through Vertex AI. This app provides a user-friendly interface for individual data processing with Pydantic schema validation.
- Multi-Modal Input: Process text, images, or both simultaneously
- Pydantic Schema Support: Define data models in Python format and auto-convert to JavaScript
- Schema Validation: Validate AI outputs against your defined schemas
- Side-by-Side Visualization: View input and output data side by side
- Modern UI: Beautiful, responsive design with intuitive controls
- Real-time Processing: Direct integration with Google Vertex AI
- Error Handling: Comprehensive error reporting and validation feedback
- Google Cloud Project with Vertex AI API enabled
- Authentication set up for your project
- Modern web browser (Chrome, Firefox, Safari, Edge)
-
Download the App
# Save the index.html file to your local machine -
Get Your Access Token
# Install Google Cloud CLI if you haven't already # https://cloud.google.com/sdk/docs/install # Authenticate and get access token gcloud auth login gcloud auth print-access-token
-
Open the App
- Open
index.htmlin your web browser - Or serve it using a local server:
# Python python -m http.server 8000 # Node.js npx serve .
- Open
- Project ID: Enter your Google Cloud project ID
- Location: Select your preferred region (default: us-central1)
- Model: Choose between Gemini models
- Access Token: Paste your GCP access token
You can define your data schema in several ways:
Click on the provided example schemas (Person or Product) to load them automatically.
class Person(BaseModel):
first_name: str
last_name: str
age: Optional[int] = None
email: Optional[str] = None
middle_names: List[str] = []
titles: List[str] = []
extra_info: List[str] = []The app will auto-convert Python schemas, but you can also define them directly.
Choose your input type:
- Text: Paste text data for processing
- Image: Upload an image file (drag & drop supported)
- Text + Image: Combine both for multi-modal processing
- Click "🚀 Process Data" to send your request
- View results side-by-side with input
- Check validation status against your schema
Input: "John Smith, age 35, works as a software engineer at Google"
Schema: Person model
Output: Structured JSON with extracted fields
Input: Photo of a business card
Schema: Contact model
Output: Extracted contact information
Input: Product description + product image
Schema: Product model
Output: Complete product data structure
The app uses the OpenAI-compatible Vertex AI endpoint:
https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{location}/endpoints/openapi
Python Pydantic models are converted to JSON Schema format for validation:
str→stringint→integerfloat→numberbool→booleanList[T]→arrayOptional[T]→ nullable fieldDict→object
Client-side validation checks:
- Required fields presence
- Data type matching
- Array/object structure
- Gemini 2.0 Flash Lite: Fast, cost-effective processing
- Gemini 2.5 Flash Preview: Enhanced capabilities
us-central1(default)us-east1us-west1europe-west1
- Responsive Design: Works on desktop and mobile
- Drag & Drop: Easy image uploads
- Real-time Feedback: Loading states and error messages
- Syntax Highlighting: JSON output formatting
- Auto-resizing: Text areas adapt to content
- Access tokens are stored only in browser memory
- No data is persisted locally
- All communication is HTTPS encrypted
- Tokens expire automatically (typically 1 hour)
-
"Access token invalid"
- Regenerate token:
gcloud auth print-access-token - Check token expiration (usually 1 hour)
- Regenerate token:
-
"API call failed"
- Verify project ID and region
- Check Vertex AI API is enabled
- Ensure proper authentication
-
"Schema validation errors"
- Review schema definition
- Check required vs optional fields
- Verify data types match
-
"Failed to parse JSON"
- Model output may be malformed
- Try adjusting the prompt
- Check model temperature settings
This project is open source and available under the MIT License.
Feel free to submit issues, feature requests, or pull requests to improve the application.
Note: This app is designed for individual data processing. For bulk processing, refer to the original Python script (main.py).