Autonomous Research Reliability Engineer — Converts scientific publications into verifiable, executable computational workflows using Gemini 3 AI agents.
Built for the Gemini 3 Hackathon.
- About
- The Problem
- Our Solution
- Gemini 3 Integration
- Architecture
- Key Features
- Data Flow Pipeline
- Tech Stack
- Quick Start
- Project Structure
- Testing
- Documentation
- License
- Team
- Acknowledgements
VeriFlow is an end-to-end platform that tackles the research reproducibility crisis by autonomously converting scientific publications into executable computational workflows. Given a PDF of a research paper, VeriFlow uses a LangGraph-orchestrated pipeline of three specialized Gemini 3 AI agents to extract the methodology, generate standards-compliant CWL v1.3 workflows with Docker containers, validate them through a self-healing retry loop, and review the output for scientific correctness — all with real-time WebSocket streaming to an interactive Vue 3 frontend.
Scientific research faces a reproducibility crisis: surveys report that more than 70% of researchers have tried and failed to reproduce another scientist's experiments. Key barriers include:
- Implicit methodology — Critical experimental details are buried in PDF publications as unstructured text
- Missing computational environments — Papers describe tools and models without specifying exact versions, dependencies, or configurations
- No executable artifacts — Methodologies exist only as prose, not as runnable code
- Manual workflow creation — Converting a paper's methodology into an executable pipeline requires deep domain expertise and significant engineering effort
VeriFlow bridges the gap between scientific publications and executable workflows through an autonomous, AI-driven pipeline:
- Upload a scientific publication (PDF) with optional user context
- Scholar Agent (Gemini 3 Pro) extracts the full methodology as a structured ISA-JSON hierarchy using native PDF upload, grounding with Google Search, and agentic vision for diagram analysis
- Engineer Agent (Gemini 3 Pro) generates a complete CWL v1.3 workflow with Dockerfiles, tool definitions, and infrastructure code using the extracted ISA-JSON and repository context
- Validate Node checks generated artifacts for structural correctness (Dockerfile has FROM, CWL has cwlVersion) — if errors are found, the self-healing loop retries the Engineer up to 3 times with error context
- Reviewer Agent (Gemini 3 Flash) critiques the final output for scientific correctness, comparing the ISA extraction against the generated code
- Plan & Apply — Users can chat with any agent, refine directives, and restart the workflow from any node
- Export results as a SPARC SDS-compliant ZIP with full provenance tracking
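The structural checks performed by the Validate Node can be sketched in plain Python. This is a minimal illustration, not the project's validate_node: the function name validate_artifacts, the dictionary layout of the artifacts, and the line-based matching are all assumptions.

```python
from typing import Dict, List


def validate_artifacts(artifacts: Dict[str, str]) -> List[str]:
    """Return a list of structural errors; an empty list means valid.

    `artifacts` maps a file name to its text content (hypothetical layout).
    """
    errors: List[str] = []
    for name, content in artifacts.items():
        lines = [line.strip() for line in content.splitlines()]
        if name.endswith("Dockerfile"):
            # A usable Dockerfile needs at least one FROM instruction.
            if not any(line.upper().startswith("FROM ") for line in lines):
                errors.append(f"{name}: missing FROM instruction")
        elif name.endswith(".cwl"):
            # A CWL document declares its version with a cwlVersion key.
            if not any(line.startswith("cwlVersion:") for line in lines):
                errors.append(f"{name}: missing cwlVersion declaration")
    return errors
```

The returned error strings are the kind of context that could be passed back to the Engineer on a retry.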
VeriFlow leverages 4 Gemini 3 features through the google-genai SDK (from google import genai):
| # | Feature | How It's Used | Agent(s) |
|---|---|---|---|
| 1 | Pydantic Structured Output | All agents use Pydantic BaseModel subclasses as response_schema parameter with response_mime_type="application/json" for type-safe, validated JSON responses (AnalysisResult, WorkflowResult, ValidationResult, ErrorTranslationResult) | All 3 |
| 2 | Native PDF Upload | types.Part.from_bytes(data=file_data, mime_type="application/pdf") for multimodal publication ingestion; the entire PDF is sent to Gemini for full-document analysis | Scholar |
| 3 | Thought Signature Preservation | _extract_thoughts() captures reasoning chains from response.candidates[].content.parts where part.thought == True, preserving reasoning across multi-turn conversations for iterative CWL generation and validation-fix loops | Engineer, Reviewer |
| 4 | Async Streaming | client.aio.models.generate_content_stream() for real-time token-by-token streaming via WebSocket to the frontend console | All 3 |
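The Thought Signature Preservation feature in the table above can be illustrated without calling the API. The sketch below walks a response-shaped object and collects the text of parts flagged as thoughts. It is a stand-in for _extract_thoughts(), not its actual implementation: the SimpleNamespace objects only imitate the candidates/content/parts layout described in the table, and the text attribute on each part is an assumption.

```python
from types import SimpleNamespace
from typing import Any, List


def extract_thoughts(response: Any) -> List[str]:
    """Collect the text of every part flagged as a thought."""
    thoughts: List[str] = []
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            # `thought` may be None on ordinary parts, so compare explicitly.
            if getattr(part, "thought", None) is True and getattr(part, "text", None):
                thoughts.append(part.text)
    return thoughts


# Fake response imitating the shape described above (not a real SDK object).
fake_response = SimpleNamespace(
    candidates=[
        SimpleNamespace(
            content=SimpleNamespace(
                parts=[
                    SimpleNamespace(thought=True, text="Check the base image first."),
                    SimpleNamespace(thought=None, text='{"status": "ok"}'),
                ]
            )
        )
    ]
)
print(extract_thoughts(fake_response))  # ['Check the base image first.']
```

Using getattr with defaults keeps the helper tolerant of empty or partially populated responses.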
| Agent | Model | Thinking Budget | Responsibilities |
|---|---|---|---|
| ScholarAgent | gemini-3-pro-preview | HIGH (24,576) | PDF analysis, ISA-JSON extraction, confidence scoring, tool/model identification |
| EngineerAgent | gemini-3-pro-preview | HIGH (24,576) | CWL v1.3 workflow generation, Dockerfile creation, infrastructure code |
| ReviewerAgent | gemini-3-flash-preview | MEDIUM (8,192) | ISA vs code critique, scientific correctness validation, approval/rejection decision |
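The per-agent settings in this table could be held in a small configuration structure. The sketch below is hypothetical: AgentConfig, its field names, and the config_for helper are illustrative and are not taken from the project's config.yaml.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AgentConfig:
    model: str
    thinking_level: str
    thinking_budget: int  # numeric value listed in the table


AGENTS: Dict[str, AgentConfig] = {
    "scholar": AgentConfig("gemini-3-pro-preview", "HIGH", 24_576),
    "engineer": AgentConfig("gemini-3-pro-preview", "HIGH", 24_576),
    "reviewer": AgentConfig("gemini-3-flash-preview", "MEDIUM", 8_192),
}


def config_for(agent_name: str) -> AgentConfig:
    """Look up an agent's settings, failing loudly on unknown names."""
    try:
        return AGENTS[agent_name.lower()]
    except KeyError:
        raise ValueError(f"unknown agent: {agent_name!r}") from None
```

Keeping the values in one place makes it easier to swap a model or budget without touching agent code.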
All Gemini 3 interactions go through a single GeminiClient class:
    from typing import Dict, List

    from google import genai
    from google.genai import types


    class GeminiClient:
        model_name = "gemini-3.0-flash"  # Default fallback

        async def analyze_file(self, file_path, prompt, model, stream_callback):
            """Native PDF upload via Part.from_bytes + JSON response + async streaming"""

        async def generate_content(self, prompt, model, response_schema, stream_callback):
            """Text-only structured generation with optional streaming"""

        def _extract_thoughts(self, response) -> List[str]:
            """Chain-of-thought extraction from response candidates"""

        def _robust_parse_json(self, text) -> Dict:
            """json_repair-based parsing for Markdown backticks and malformed JSON"""

| Service | Port | Purpose |
|---|---|---|
| backend | 8000 | FastAPI backend (Python 3.11) |
| frontend | 3000 | Vue 3 SPA via Nginx |
| postgres | 5432 | PostgreSQL 15 (Airflow database) |
| minio | 9000/9001 | S3-compatible object storage (4 buckets) |
| minio-init | - | Ephemeral bucket initialization |
| airflow-apiserver | 8080 | Airflow 3.0.6 REST API server |
| airflow-scheduler | - | Airflow task scheduler (LocalExecutor) |
| dind | - | Docker-in-Docker for CWL execution |
| cwl | - | CWL runner (cwltool) |
| veriflow-sandbox | - | Sandbox for script execution (PyTorch + nnU-Net) |
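The _robust_parse_json helper in the GeminiClient outline above deals with model output wrapped in Markdown code fences. A rough standard-library approximation is sketched below. It only strips a surrounding fence and extracts the outermost JSON object; it does not repair malformed JSON the way the json_repair package does, and the function name parse_fenced_json is hypothetical.

```python
import json
import re
from typing import Any, Dict

FENCE = "`" * 3  # built indirectly so the example can sit inside a fenced block
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


def parse_fenced_json(text: str) -> Dict[str, Any]:
    """Parse a JSON object that may be wrapped in a Markdown code fence."""
    match = FENCED_BLOCK.search(text)
    candidate = match.group(1) if match else text
    # Fall back to the outermost braces if prose surrounds the object.
    start, end = candidate.find("{"), candidate.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    return json.loads(candidate[start : end + 1])
```

Truly malformed JSON (trailing commas, unquoted keys) still raises json.JSONDecodeError here, which is the gap a repair library is meant to fill.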
A StateGraph with 4 nodes (Scholar, Engineer, Validate, Reviewer) orchestrates the full PDF-to-workflow pipeline. Conditional edges enable a self-healing retry loop where validation failures automatically route back to the Engineer with error context, up to 3 iterations.
Upload a scientific paper and VeriFlow autonomously extracts the methodology, generates executable workflows, validates them through a self-healing loop, and reviews them for scientific correctness — no manual intervention required.
The Scholar Agent extracts structured investigation hierarchies following the ISA (Investigation-Study-Assay) standard, with per-field confidence scores and source page references using Gemini 3's native PDF upload.
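To make per-field confidence scores and page references concrete, the sketch below models one extracted field and a filter for low-confidence values. The class, the field names, the sample values, and the 0.7 threshold are all hypothetical and do not reflect the project's AnalysisResult schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedField:
    name: str          # illustrative dotted path into the ISA hierarchy
    value: str
    confidence: float  # 0.0 to 1.0, as reported by the extraction step
    source_page: int   # page of the PDF the value was taken from


def needs_review(fields: List[ExtractedField], threshold: float = 0.7) -> List[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [f.name for f in fields if f.confidence < threshold]


fields = [
    ExtractedField("study.title", "Example study", 0.95, 1),
    ExtractedField("assay.model_version", "example-model v2", 0.55, 7),
]
print(needs_review(fields))  # ['assay.model_version']
```

A viewer could use such a filter to highlight values that a user should check against the cited page.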
The Engineer Agent produces standards-compliant Common Workflow Language workflows with:
- Step-by-step CommandLineTool definitions with InitialWorkDirRequirement embedded scripts
- Auto-generated Dockerfiles for each tool
- Data format adapters between incompatible step types
- Repository context analysis (reads repo files up to 50KB for informed generation)
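Repository context gathering can be sketched as a size-capped file reader. The example assumes the 50KB limit applies to the combined content of all files and that only a few text extensions are read; both are guesses, and collect_repo_context is a hypothetical name.

```python
from pathlib import Path
from typing import Iterable

MAX_CONTEXT_BYTES = 50 * 1024  # 50KB cap (assumed to be a total, not per file)


def collect_repo_context(
    repo_root: str,
    suffixes: Iterable[str] = (".py", ".md", ".txt", ".yaml", ".yml"),
    max_bytes: int = MAX_CONTEXT_BYTES,
) -> str:
    """Concatenate file contents until the byte budget is used up.

    Only file contents count toward the budget; the per-file headers do not.
    """
    allowed = set(suffixes)
    chunks = []
    remaining = max_bytes
    for path in sorted(Path(repo_root).rglob("*")):
        if remaining <= 0:
            break
        if not path.is_file() or path.suffix not in allowed:
            continue
        data = path.read_bytes()[:remaining]  # truncate the last file if needed
        remaining -= len(data)
        text = data.decode("utf-8", errors="ignore")
        chunks.append(f"# File: {path.relative_to(repo_root)}\n{text}")
    return "\n\n".join(chunks)
```

Sorting the paths keeps the result deterministic, so the same repository yields the same prompt context on every run.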
The Validate node checks generated artifacts and the LangGraph conditional edges route:
- Back to Engineer (retry with error context) if validation fails and retry_count < 3
- Forward to Reviewer (final critique) if validation passes or max retries reached
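The routing rule above amounts to a small decision function. The sketch below is framework-free and only mirrors the described behaviour: the state keys validation_errors and retry_count follow names used in this document, while route_after_validate and MAX_RETRIES are illustrative. In LangGraph, a function of this shape would typically be supplied as the condition of a conditional edge.

```python
from typing import Any, Dict

MAX_RETRIES = 3


def route_after_validate(state: Dict[str, Any]) -> str:
    """Pick the next node after validation."""
    has_errors = bool(state.get("validation_errors"))
    if has_errors and state.get("retry_count", 0) < MAX_RETRIES:
        return "engineer"  # retry with the error context held in the state
    return "reviewer"      # validation passed, or retries are exhausted
```

For example, a state with one error and retry_count of 1 routes to "engineer", while the same error with retry_count of 3 routes to "reviewer".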
Users can chat with any agent about their output, formulate specific directives, and restart the workflow from any node with those directives applied:
- POST /api/v1/chat/{run_id}/{agent_name}: Discuss agent output
- POST /api/v1/chat/{run_id}/{agent_name}/apply: Apply directive and restart
All agent output is streamed token-by-token via WebSocket to the frontend console using Gemini 3's generate_content_stream() API, with the SmartMessageRenderer component providing intelligent rendering of JSON, Markdown, Dockerfiles, and CWL code blocks.
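The streaming path can be pictured as an async loop that forwards each chunk to a callback as soon as it arrives. In the sketch below, fake_token_stream stands in for the model stream and the send callback stands in for a WebSocket write; neither is the project's code, and no Gemini or FastAPI call is made.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for a model stream: yields one word at a time."""
    for word in text.split():
        await asyncio.sleep(0)  # give control back to the event loop
        yield word + " "


async def stream_to_client(
    chunks: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
) -> str:
    """Forward each chunk immediately and return the full text at the end."""
    received: List[str] = []
    async for chunk in chunks:
        await send(chunk)
        received.append(chunk)
    return "".join(received)


async def demo() -> List[str]:
    sent: List[str] = []

    async def send(chunk: str) -> None:
        sent.append(chunk)  # a real server would write to the WebSocket here

    await stream_to_client(fake_token_stream("cwlVersion found in workflow"), send)
    return sent
```

Running asyncio.run(demo()) returns the chunks in the order they were forwarded, which is the property the console relies on.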
Export results as a standards-compliant ZIP containing:
- dataset_description.json: Dataset metadata
- manifest.xlsx: File manifest with checksums
- provenance.json: W3C PROV derivation tracking
- derivative/: Output files organized by execution step
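The export layout can be approximated with the standard library. The sketch below writes dataset_description.json, provenance.json, and derivative/ files into an in-memory ZIP and records SHA-256 checksums. The JSON field names are placeholders rather than the SPARC SDS or W3C PROV schemas, the checksum algorithm is an assumption, and manifest.xlsx is omitted because it needs a spreadsheet library.

```python
import hashlib
import io
import json
import zipfile
from typing import Dict


def build_export_zip(outputs: Dict[str, bytes]) -> bytes:
    """Bundle step outputs into a ZIP and return its bytes.

    `outputs` maps a path relative to derivative/ to the file content.
    """
    checksums = {
        f"derivative/{name}": hashlib.sha256(data).hexdigest()
        for name, data in outputs.items()
    }
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Placeholder metadata; the real SDS fields are not reproduced here.
        archive.writestr("dataset_description.json", json.dumps({"name": "example"}))
        archive.writestr("provenance.json", json.dumps({"checksums": checksums}, indent=2))
        for name, data in outputs.items():
            archive.writestr(f"derivative/{name}", data)
    return buffer.getvalue()
```

Calling build_export_zip({"step1/out.txt": b"hello"}) yields an archive whose derivative/ folder is organized by step, matching the layout listed above.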
Vue 3 frontend with:
- Left: PDF upload + ISA hierarchy viewer with confidence scores
- Center: Interactive Vue Flow workflow graph with custom nodes
- Right: Results visualization and SDS export
- Bottom: Real-time console with agent streaming via SmartMessageRenderer
Scientific Publication (PDF) + User Context + Repository Path
|
v
[POST /api/v1/orchestrate]
|
v
VeriFlowService.run_workflow()
|
v
+=== LangGraph StateGraph ================================+
| |
| ScholarAgent (Gemini 3 Pro) |
| - Native PDF Upload (Part.from_bytes) |
| - Grounding with Google Search |
| - Pydantic Structured Output (AnalysisResult) |
| - Thinking: HIGH (24,576) + Async Streaming |
| | |
| v |
| ISA-JSON Hierarchy + Confidence Scores |
| | |
| v |
| EngineerAgent (Gemini 3 Pro) |
| - Pydantic Structured Output (WorkflowResult) |
| - Repository Context (up to 50KB of source files) |
| - Previous validation_errors injected into prompt |
| - Thinking: HIGH (24,576) + Async Streaming |
| | |
| v |
| CWL Workflow + Dockerfiles + Infrastructure Code |
| | |
| v |
| Validate Node (System) |
| - Dockerfile has FROM instruction? |
| - CWL has cwlVersion declaration? |
| | |
| Valid? --No + retry<3--> Back to Engineer |
| | |
| Yes (or max retries) |
| | |
| v |
| ReviewerAgent (Gemini 3 Flash) |
| - ISA vs Generated Code critique |
| - Thought Signature Preservation |
| - Thinking: MEDIUM (8,192) + Async Streaming |
| - Decision: approved / rejected |
| |
+=========================================================+
|
v (WebSocket streaming throughout)
Vue 3 Frontend — Real-time Console + ISA Viewer + Graph
|
v (optional)
ExecutionEngine --> CWLParser --> DAGGenerator
|
v
Airflow 3.0.6 --> DockerOperator --> cwltool
|
v
SDS ZIP Export (dataset_description + manifest + provenance)
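The optional execution path at the bottom of the diagram turns a workflow into a DAG. One way to derive a valid execution order from step dependencies is a topological sort, sketched here with Python's graphlib module. The step names and the dependency mapping are invented for illustration; how CWLParser and DAGGenerator actually do this is not described in this document.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def execution_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Order steps so that each one runs after the steps it depends on.

    `dependencies` maps a step to the set of steps whose outputs it consumes.
    Raises graphlib.CycleError if the steps form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


steps = {
    "preprocess": set(),
    "segment": {"preprocess"},
    "evaluate": {"segment"},
}
print(execution_order(steps))  # ['preprocess', 'segment', 'evaluate']
```

Detecting cycles up front is useful because a workflow with circular step dependencies cannot be scheduled.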
| Layer | Technology |
|---|---|
| AI | Gemini 3 (google-genai SDK) — gemini-3-pro-preview, gemini-3-flash-preview |
| Orchestration | LangGraph (StateGraph with conditional edges, self-healing retry loop) |
| Backend | Python 3.11, FastAPI, Pydantic, uvicorn, json-repair |
| Frontend | Vue 3.5, Vue Flow 1.41, Pinia, Tailwind CSS 4, TypeScript, Vite 6, markdown-it |
| Real-time | WebSocket (FastAPI native), SmartMessageRenderer |
| Execution | Apache Airflow 3.0.6 (LocalExecutor), CWL v1.3, Docker-in-Docker, cwltool |
| Storage | SQLite (sessions), PostgreSQL 15 (Airflow), MinIO (S3-compatible, 4 buckets) |
| Standards | ISA-JSON, SPARC SDS, CWL v1.3, W3C PROV |
- Docker & Docker Compose
- A Gemini API key from Google AI Studio
# 1. Clone the repository
git clone https://github.com/ABI-CTT-Group/VeriFlow.git
cd VeriFlow
# 2. Configure environment
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
# 3. Start all services
docker compose up -d
# 4. Open the app
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000/docs
# Airflow UI: http://localhost:8080
# MinIO Console: http://localhost:9001

# Backend
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8000
# Frontend
cd frontend
npm install
npm run dev

VeriFlow/
+-- backend/ # Python FastAPI backend
| +-- app/
| | +-- agents/ # ScholarAgent, EngineerAgent, ReviewerAgent (class-based)
| | +-- api/ # 5 REST API routers + WebSocket endpoint
| | +-- graph/ # LangGraph StateGraph + node implementations
| | | +-- workflow.py # StateGraph definition with conditional edges
| | | +-- nodes.py # scholar_node, engineer_node, validate_node, reviewer_node
| | +-- models/ # Pydantic schemas (Gemini structured output)
| | +-- services/ # GeminiClient, VeriFlowService, WebSocketManager, SQLiteDB
| | +-- state.py # AgentState TypedDict (LangGraph shared state)
| | +-- main.py # FastAPI entry point
| +-- config.yaml # Agent model & thinking level configuration
| +-- prompts.yaml # Versioned prompt templates per agent
| +-- examples/ # Pre-computed agent outputs for MAMA-MIA demo
| +-- tests/ # pytest tests (unit + integration)
+-- frontend/ # Vue 3 + TypeScript + Tailwind CSS 4
| +-- src/
| | +-- components/ # Vue components including SmartMessageRenderer
| | +-- stores/ # Pinia workflow + console stores
| | +-- services/ # API client (axios) + WebSocket service
| | +-- utils/ # dagre layout utilities
+-- airflow/ # Custom Airflow 3.0.6 image + DAGs
+-- cwl/ # CWL runner service (cwltool)
+-- sandbox/ # Sandbox Docker environment (PyTorch + nnU-Net)
+-- docs/ # Architecture diagrams (Mermaid, draw.io, about, testing)
+-- docker-compose.yml # 10-service orchestration (development)
+-- docker-compose.prod.yml # Production configuration (GHCR images)
+-- .env.example # Environment variable template
+-- SPEC.md # Technical specification
# Backend unit tests
cd backend && python -m pytest tests/ -v
# Backend tests in Docker
docker compose run --rm backend pytest tests/ -v
# Frontend tests (Vitest)
cd frontend && npx vitest run

- Technical Specification (SPEC.md): Full implementation spec with API endpoints, data structures, and agent details
VeriFlow is fully open source and distributed under the Apache License 2.0. See LICENSE for more information.
- Thiranja Prasad Babarenda Gamage
- Jagir Hussan
- Tim Wu
- Linkun Gao
- Patrick Zhang
- Emily Clark Murphy
- Holly Hogan-Murphy
- Chinchien Lin
- We would like to thank the organizers of the Gemini 3 Hackathon for organizing this event and for all the support documentation
- ARDC Nectar Research Cloud for providing the infrastructure for hosting this project
