MaZheZJU/marine-image-api

OceanGPT-X

OceanGPT-X is an intelligent marine image recognition service under the OceanGPT project, providing a unified multi-model inference API for marine biology research, underwater robot vision, and sonar image interpretation. With one-click deployment, users can upload marine images via REST API or the Streamlit demo and receive species-level identification results.

Architecture

OceanGPT-X employs a multi-model fusion inference strategy, combining FAISS vector retrieval, OceanCLIP (a marine-adapted vision-language model fine-tuned from BioCLIP), and a suite of YOLOv5/YOLOv11-cls detection and classification models for efficient and accurate image recognition.

Inference Pipeline

Each input image passes through the following stages:

  1. FAISS Vector Retrieval — Uses BioCLIP pre-trained features to search the retrieval database. If similarity exceeds the threshold (default 0.90), returns the match directly, skipping further inference.
  2. Router Classifier — If no match is found, a YOLOv11-cls router model classifies the image as "sonar" or "biological".
  3. Branch Inference:
    • Sonar branch: A YOLOv5 classifier categorizes sonar targets into 15 classes (e.g., side-scan sonar, multibeam, cube).
    • Biological branch: A YOLOv5 fish/coral binary classifier determines the category, then either a fish detector or coral detector performs fine-grained species identification.
  4. Cross-Validation Fusion — In the biological branch, the detector result is cross-validated against OceanCLIP's Top-N matches at the genus level. If they agree, a fused result is output (source: oceanclip+detector); otherwise OceanCLIP takes priority; if OceanCLIP is unavailable, the detector result is used as fallback.
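
The cascade above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the code in app/services/: the `Prediction` type, the `genus`, `fuse`, and `predict` names, and the stage callables passed in are all hypothetical. It also assumes that the router returns the probability of "sonar", that the genus is the first word of a species name, and that on agreement the matching OceanCLIP candidate becomes the fused label.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Prediction:
    label: str
    score: float
    source: str


def genus(species: str) -> str:
    """Genus taken as the first word of a binomial name (assumption)."""
    return species.replace("_", " ").split()[0].lower()


def fuse(detector: Prediction, clip_top_n: Optional[List[Prediction]]) -> Prediction:
    """Cross-validate the detector result against OceanCLIP's Top-N by genus."""
    if not clip_top_n:
        # OceanCLIP unavailable: the detector result is the fallback.
        return detector
    for candidate in clip_top_n:
        if genus(candidate.label) == genus(detector.label):
            # Agreement at genus level: emit a fused result.
            return Prediction(candidate.label, candidate.score, "oceanclip+detector")
    # Disagreement: OceanCLIP takes priority.
    return clip_top_n[0]


def predict(
    image,
    retrieve: Callable,             # image -> Prediction or None (FAISS best match)
    route: Callable,                # image -> probability that the image is sonar
    classify_sonar: Callable,       # image -> Prediction
    classify_fish_coral: Callable,  # image -> "fish" or "coral"
    detect: Callable,               # (kind, image) -> Prediction
    clip_top_n: Callable,           # image -> list of Prediction, or None
    threshold: float = 0.90,
    router_threshold: float = 0.5,
) -> Prediction:
    # 1. Retrieval shortcut: a close enough match skips all further inference.
    hit = retrieve(image)
    if hit is not None and hit.score > threshold:
        return hit
    # 2. Router: sonar vs. biological.
    if route(image) >= router_threshold:
        # 3a. Sonar branch.
        return classify_sonar(image)
    # 3b. Biological branch: pick the detector, then 4. cross-validate.
    kind = classify_fish_coral(image)
    return fuse(detect(kind, image), clip_top_n(image))
```

Passing the stages in as callables keeps the sketch independent of any model library; in a real service each callable would wrap one of the loaded models.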

Models

All model weights and data files are hosted in the OceanGPT-X Collection on Hugging Face:

| Repository | Model File | Task | Architecture | Classes |
|---|---|---|---|---|
| zjunlp/Ocean-router | cls_bio_sonar/best.pt | Sonar vs. Bio routing | YOLOv11-cls | 2 |
| zjunlp/Ocean-router | fish_coral_cls/best.pt | Fish vs. Coral binary | YOLOv5 | 2 |
| zjunlp/Ocean-yolo | fish_detector/best.pt | Fish species detection | YOLOv5 | Multi-class |
| zjunlp/Ocean-yolo | coral_detector/best.pt | Coral species detection | YOLOv5 | Multi-class |
| zjunlp/Ocean-yolo | sonar_detector/best.pt | Sonar target detection | YOLOv5 | 15 |
| zjunlp/OceanCLIP-0.15B | oceanclip-bio/epoch_50.pt | Zero-shot species ID | CLIP (ViT-B/16) | Term-driven |
| zjunlp/OceanCLIP-0.15B | bioclip/open_clip_pytorch_model.bin | BioCLIP base weights | CLIP (ViT-B/16) | |
| zjunlp/Ocean-FAISS | faiss/index.faiss | FAISS retrieval index | | |
| zjunlp/Ocean-FAISS | metadata/metadata.jsonl | Image metadata (species, location, capture info) | | |
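
To fetch a single file rather than the whole set, something like the following may work. It is a sketch only: it relies on the `huggingface_hub` package and its `hf_hub_download` function, and the `fetch` and `target_dir` helpers and the one-folder-per-repository layout are assumptions that need not match what scripts/download_assets.py produces.

```python
from pathlib import Path

# (repository, file) pairs copied from the table above.
ASSETS = [
    ("zjunlp/Ocean-router", "cls_bio_sonar/best.pt"),
    ("zjunlp/Ocean-router", "fish_coral_cls/best.pt"),
    ("zjunlp/Ocean-yolo", "fish_detector/best.pt"),
    ("zjunlp/Ocean-yolo", "coral_detector/best.pt"),
    ("zjunlp/Ocean-yolo", "sonar_detector/best.pt"),
    ("zjunlp/OceanCLIP-0.15B", "oceanclip-bio/epoch_50.pt"),
    ("zjunlp/OceanCLIP-0.15B", "bioclip/open_clip_pytorch_model.bin"),
    ("zjunlp/Ocean-FAISS", "faiss/index.faiss"),
    ("zjunlp/Ocean-FAISS", "metadata/metadata.jsonl"),
]


def target_dir(repo_id: str, root: str = "downloaded_assets") -> Path:
    """Hypothetical layout: one sub-folder per repository under root."""
    return Path(root) / repo_id.split("/")[-1]


def fetch(repo_id: str, filename: str, root: str = "downloaded_assets") -> Path:
    """Download one file from the Hub and return its local path."""
    # Imported lazily so the manifest can be inspected without the package.
    from huggingface_hub import hf_hub_download

    local = hf_hub_download(
        repo_id=repo_id, filename=filename, local_dir=target_dir(repo_id, root)
    )
    return Path(local)


if __name__ == "__main__":
    # Example: fetch only the router weights.
    print(fetch(*ASSETS[0]))
```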

Quick Start

1. Install Dependencies

conda env create -f environment.yml
conda activate marine-api

2. Clone YOLOv5 Source Code

Required for loading YOLOv5-format models (sonar, fish, coral):

git clone https://github.com/ultralytics/yolov5 ./yolov5

Cloning into ./yolov5 (the default location) lets the service find the source automatically. If you clone elsewhere, set the YOLOV5_DIR environment variable to that path.

3. Download Models & Data

All model weights and data files are hosted on Hugging Face: huggingface.co/collections/zjunlp/oceangpt-x

python scripts/download_assets.py

This downloads:

  • 7 model weights (Router, Sonar classifier, Fish/Coral binary, Fish detector, Coral detector, OceanCLIP checkpoint + terms)
  • BioCLIP base model for feature encoding
  • FAISS retrieval index
  • Metadata for image lookup

Custom download directory:

python scripts/download_assets.py --download-dir ./my-models

4. Configure Environment (Optional)

All paths default to the downloaded_assets/ directory created by the download script. No manual configuration is required to start the service.

Only set environment variables if you use custom paths:

export YOLOV5_DIR=/path/to/yolov5

To adjust inference parameters:

export THRESHOLD=0.85
export TOPK=10

5. Start the Service

uvicorn app.main:app --host 0.0.0.0 --port 8000

Development mode with auto-reload:

uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Streamlit Demo

Launch the interactive web demo:

streamlit run streamlit/demo.py

API Endpoints

Health Check

GET /health

Returns the loading status of each model module.
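
A minimal way to query this endpoint from Python, using only the standard library. The `fetch_health` helper is hypothetical, and the sketch assumes the response body is JSON; the exact fields are not documented here, so the result is returned as-is.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:8000", opener=urlopen, timeout=5.0):
    """GET /health and return the decoded JSON body."""
    url = f"{base_url.rstrip('/')}/health"
    # `opener` is injectable so the function can be exercised without a server.
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_health())
```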

Prediction

POST /predict

| Field | Type | Description |
|---|---|---|
| file | image | Image to classify |

Example:

curl -X POST http://localhost:8000/predict -F "file=@test/soner_cube.png"

Or open http://localhost:8000/docs for interactive API documentation.
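
The same request can be sent from Python without third-party packages. The sketch below builds the multipart body by hand; `build_multipart` and `predict` are hypothetical helper names, and it assumes the service answers with JSON. Only the `file` field name comes from the endpoint description above.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib.request import Request, urlopen


def build_multipart(field: str, filename: str, data: bytes):
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def predict(image_path: str, base_url: str = "http://localhost:8000") -> dict:
    """POST an image to /predict and return the decoded response."""
    path = Path(image_path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    req = Request(
        f"{base_url.rstrip('/')}/predict",
        data=body,
        method="POST",
        headers={"Content-Type": content_type},
    )
    with urlopen(req, timeout=60) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    print(predict("test/soner_cube.png"))
```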

Configuration

Key environment variables:

| Variable | Default | Description |
|---|---|---|
| THRESHOLD | 0.90 | FAISS retrieval similarity threshold |
| ROUTER_THRESHOLD | 0.5 | Probability threshold for sonar routing |
| USE_OCEANCLIP | true | Enable OceanCLIP species identification |
| TOPK | 5 | Number of FAISS retrieval results |
| DEVICE | cuda | Computation device (cuda or cpu) |
| YOLOV5_DIR | ./yolov5 | YOLOv5 source directory |
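
One way such variables could be read is shown below. This is an illustrative sketch, not the contents of app/core/: the `Settings` class and `load_settings` function are hypothetical, and the set of strings accepted as "true" for USE_OCEANCLIP is an assumption. The variable names and defaults are those listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    threshold: float = 0.90
    router_threshold: float = 0.5
    use_oceanclip: bool = True
    topk: int = 5
    device: str = "cuda"
    yolov5_dir: str = "./yolov5"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    d = Settings()
    # Assumed truthy spellings; anything else disables OceanCLIP.
    flag = env.get("USE_OCEANCLIP", "true").strip().lower()
    return Settings(
        threshold=float(env.get("THRESHOLD", d.threshold)),
        router_threshold=float(env.get("ROUTER_THRESHOLD", d.router_threshold)),
        use_oceanclip=flag in {"1", "true", "yes"},
        topk=int(env.get("TOPK", d.topk)),
        device=env.get("DEVICE", d.device),
        yolov5_dir=env.get("YOLOV5_DIR", d.yolov5_dir),
    )


if __name__ == "__main__":
    print(load_settings())
```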

Project Structure

app/
  api/          # FastAPI routes (/health, /predict)
  core/         # Configuration and global state
  services/     # Model loading, retrieval, classification, fusion
  main.py       # Application entry point
scripts/
  download_assets.py  # One-command model + data downloader
streamlit/
  demo.py       # Streamlit demo UI
test/           # Sample test images

Test Samples

Four sample images are provided in test/:

  • test/coral_Acropora Cervicornis_1.png — Coral (Acropora cervicornis)
  • test/fish_Amphiprion_clarkii_62.png — Fish (Amphiprion clarkii)
  • test/soner_cube.png — Sonar (cube)
  • test/fish.png — Out-of-domain fish (aquarium white background)

Note

This repository does not include model weights or data files. Download them via scripts/download_assets.py. All paths default to the download script's output directory — zero configuration needed.
