OceanGPT-X is an intelligent marine image recognition service under the OceanGPT project, providing a unified multi-model inference API for marine biology research, underwater robot vision, and sonar image interpretation. With one-click deployment, users can upload marine images via REST API or the Streamlit demo and receive species-level identification results.
OceanGPT-X employs a multi-model fusion inference strategy, combining FAISS vector retrieval, OceanCLIP (a marine-adapted vision-language model fine-tuned from BioCLIP), and a suite of YOLOv5/YOLOv11-cls detection and classification models for efficient and accurate image recognition.
For each input image, the system proceeds as follows:
- FAISS Vector Retrieval: Uses BioCLIP pre-trained features to search the retrieval database. If the similarity exceeds the threshold (default 0.90), the match is returned directly and further inference is skipped.
- Router Classifier: If no match is found, a YOLOv11-cls router model classifies the image as "sonar" or "biological".
- Branch Inference:
  - Sonar branch: A YOLOv5 classifier categorizes sonar targets into 15 classes (e.g., side-scan sonar, multibeam, cube).
  - Biological branch: A YOLOv5 fish/coral binary classifier determines the category, then either a fish detector or a coral detector performs fine-grained species identification.
- Cross-Validation Fusion: In the biological branch, the detector result is cross-validated against OceanCLIP's Top-N matches at the genus level. If they agree, a fused result is output (source: oceanclip+detector); otherwise OceanCLIP takes priority. If OceanCLIP is unavailable, the detector result is used as a fallback.
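The retrieval shortcut and the genus-level fusion rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the service's actual code: the `Prediction` type, the function names, and the `faiss`, `oceanclip`, and `detector` source labels are hypothetical (only `oceanclip+detector` appears in this README), and returning the matching OceanCLIP candidate as the fused species is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Prediction:
    species: str  # binomial name, e.g. "Amphiprion clarkii"
    score: float  # similarity or confidence in [0, 1]


def genus(species: str) -> str:
    """Return the lowercase genus (first word) of a binomial name."""
    parts = species.replace("_", " ").split()
    return parts[0].lower() if parts else ""


def retrieval_shortcut(hits: Sequence[Prediction],
                       threshold: float = 0.90) -> Optional[dict]:
    """Return the best retrieval hit if its similarity exceeds the threshold."""
    if not hits:
        return None
    best = max(hits, key=lambda h: h.score)
    if best.score > threshold:
        return {"species": best.species, "source": "faiss"}
    return None  # fall through to the router and branch models


def fuse(detector: Optional[Prediction],
         clip_top_n: Optional[Sequence[Prediction]]) -> Optional[dict]:
    """Cross-validate the detector result against OceanCLIP's Top-N matches."""
    if not clip_top_n:
        # OceanCLIP unavailable: fall back to the detector result, if any.
        if detector is None:
            return None
        return {"species": detector.species, "source": "detector"}
    if detector is not None:
        for candidate in clip_top_n:
            if genus(candidate.species) == genus(detector.species):
                # Genus agreement. Assumption: report the OceanCLIP candidate.
                return {"species": candidate.species,
                        "source": "oceanclip+detector"}
    # No genus agreement: OceanCLIP takes priority.
    best = max(clip_top_n, key=lambda c: c.score)
    return {"species": best.species, "source": "oceanclip"}
```

For example, a detector result of "Amphiprion clarkii" combined with an OceanCLIP Top-N list that contains any Amphiprion species yields the `oceanclip+detector` source, while a Top-N list with no Amphiprion entry yields OceanCLIP's best match alone.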
All model weights and data files are hosted in the OceanGPT-X Collection on Hugging Face:
| Repository | Model File | Task | Architecture | Classes |
|---|---|---|---|---|
| zjunlp/Ocean-router | cls_bio_sonar/best.pt | Sonar vs. Bio routing | YOLOv11-cls | 2 |
| zjunlp/Ocean-router | fish_coral_cls/best.pt | Fish vs. Coral binary | YOLOv5 | 2 |
| zjunlp/Ocean-yolo | fish_detector/best.pt | Fish species detection | YOLOv5 | Multi-class |
| zjunlp/Ocean-yolo | coral_detector/best.pt | Coral species detection | YOLOv5 | Multi-class |
| zjunlp/Ocean-yolo | sonar_detector/best.pt | Sonar target detection | YOLOv5 | 15 |
| zjunlp/OceanCLIP-0.15B | oceanclip-bio/epoch_50.pt | Zero-shot species ID | CLIP (ViT-B/16) | Term-driven |
| zjunlp/OceanCLIP-0.15B | bioclip/open_clip_pytorch_model.bin | BioCLIP base weights | CLIP (ViT-B/16) | — |
| zjunlp/Ocean-FAISS | faiss/index.faiss | FAISS retrieval index | — | — |
| zjunlp/Ocean-FAISS | metadata/metadata.jsonl | Image metadata (species, location, capture info) | — | — |
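The files in the table can also be fetched individually with the `huggingface_hub` client. The sketch below is not how scripts/download_assets.py is implemented; it only shows one way to pull the same files. The `ASSETS` list, the `download_all` helper, and the one-folder-per-repository layout are hypothetical, and it assumes every repository is a model-type repository (a dataset repository would need `repo_type="dataset"`).

```python
from pathlib import Path

# (repo_id, filename) pairs copied from the table above.
ASSETS = [
    ("zjunlp/Ocean-router", "cls_bio_sonar/best.pt"),
    ("zjunlp/Ocean-router", "fish_coral_cls/best.pt"),
    ("zjunlp/Ocean-yolo", "fish_detector/best.pt"),
    ("zjunlp/Ocean-yolo", "coral_detector/best.pt"),
    ("zjunlp/Ocean-yolo", "sonar_detector/best.pt"),
    ("zjunlp/OceanCLIP-0.15B", "oceanclip-bio/epoch_50.pt"),
    ("zjunlp/OceanCLIP-0.15B", "bioclip/open_clip_pytorch_model.bin"),
    ("zjunlp/Ocean-FAISS", "faiss/index.faiss"),
    ("zjunlp/Ocean-FAISS", "metadata/metadata.jsonl"),
]


def download_all(download_dir: str = "downloaded_assets") -> list:
    """Fetch every listed file into download_dir and return the local paths."""
    # Imported lazily so the manifest can be inspected without the dependency.
    from huggingface_hub import hf_hub_download

    paths = []
    for repo_id, filename in ASSETS:
        # Hypothetical layout: one sub-folder per repository name.
        target = Path(download_dir) / repo_id.split("/")[-1]
        paths.append(
            hf_hub_download(repo_id=repo_id, filename=filename, local_dir=target)
        )
    return paths
```

Calling `download_all()` requires network access and the `huggingface_hub` package. The provided download script remains the supported path because the service expects its directory layout.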
Create and activate the conda environment:

    conda env create -f environment.yml
    conda activate marine-api

The YOLOv5 source is required for loading YOLOv5-format models (sonar, fish, coral):

    git clone https://github.com/ultralytics/yolov5 ./yolov5

Cloning to ./yolov5 lets the service auto-detect the source directory. If you use a different path, set the YOLOV5_DIR environment variable.
All model weights and data files are hosted on Hugging Face at huggingface.co/collections/zjunlp/oceangpt-x. Download them with the provided script:

    python scripts/download_assets.py

This downloads:
- 7 model weights (Router, Sonar classifier, Fish/Coral binary, Fish detector, Coral detector, OceanCLIP checkpoint + terms)
- BioCLIP base model for feature encoding
- FAISS retrieval index
- Metadata for image lookup
Custom download directory:

    python scripts/download_assets.py --download-dir ./my-models

All paths default to the downloaded_assets/ directory created by the download script.
No manual configuration is required to start the service.
Set environment variables only if you use custom paths:

    export YOLOV5_DIR=/path/to/yolov5

To adjust inference parameters:

    export THRESHOLD=0.85
    export TOPK=10

Start the API service:

    uvicorn app.main:app --host 0.0.0.0 --port 8000

Development mode with auto-reload:

    uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

Launch the interactive web demo:

    streamlit run streamlit/demo.py

GET /health
Returns the loading status of each model module.
POST /predict
| Field | Type | Description |
|---|---|---|
| file | image | Image to classify |
Example:

    curl -X POST http://localhost:8000/predict -F "file=@test/soner_cube.png"

Or open http://localhost:8000/docs for interactive API documentation.
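The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call by posting the image as multipart/form-data in the `file` field. The helper names `build_predict_request` and `predict` are hypothetical, and it assumes the endpoint answers with a JSON body, whose schema is not documented here.

```python
import json
import mimetypes
import uuid
from pathlib import Path
from urllib import request


def build_predict_request(image_path, base_url="http://localhost:8000"):
    """Build a multipart/form-data POST for /predict with the image in `file`."""
    path = Path(image_path)
    boundary = uuid.uuid4().hex
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return request.Request(
        f"{base_url}/predict",
        data=head + path.read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def predict(image_path, base_url="http://localhost:8000", timeout=60.0):
    """Send the image to the running service and return the decoded JSON."""
    req = build_predict_request(image_path, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)  # assumption: the response body is JSON
```

With the service running locally, `predict("test/soner_cube.png")` returns whatever the API sends back for that sample image.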
Key environment variables:
| Variable | Default | Description |
|---|---|---|
| THRESHOLD | 0.90 | FAISS retrieval similarity threshold |
| ROUTER_THRESHOLD | 0.5 | Probability threshold for sonar routing |
| USE_OCEANCLIP | true | Enable OceanCLIP species identification |
| TOPK | 5 | Number of FAISS retrieval results |
| DEVICE | cuda | Computation device (cuda or cpu) |
| YOLOV5_DIR | ./yolov5 | YOLOv5 source directory |
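As an illustration of how these variables and their defaults fit together, the sketch below reads them into a typed settings object. It is not the code in app/core: the `Settings` class and `load_settings` function are hypothetical names, and the accepted boolean spellings for USE_OCEANCLIP are an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the table above.
    threshold: float = 0.90
    router_threshold: float = 0.5
    use_oceanclip: bool = True
    topk: int = 5
    device: str = "cuda"
    yolov5_dir: str = "./yolov5"


def _as_bool(raw: str) -> bool:
    # Assumption: these spellings count as true, anything else as false.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        threshold=float(env.get("THRESHOLD", d.threshold)),
        router_threshold=float(env.get("ROUTER_THRESHOLD", d.router_threshold)),
        use_oceanclip=_as_bool(env.get("USE_OCEANCLIP", str(d.use_oceanclip))),
        topk=int(env.get("TOPK", d.topk)),
        device=env.get("DEVICE", d.device),
        yolov5_dir=env.get("YOLOV5_DIR", d.yolov5_dir),
    )
```

Passing a plain dictionary instead of the real environment, as in `load_settings({"THRESHOLD": "0.85", "TOPK": "10"})`, makes the parsing easy to check in isolation.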
Project layout:

    app/
        api/                  # FastAPI routes (/health, /predict)
        core/                 # Configuration and global state
        services/             # Model loading, retrieval, classification, fusion
        main.py               # Application entry point
    scripts/
        download_assets.py    # One-command model + data downloader
    streamlit/
        demo.py               # Streamlit demo UI
    test/                     # Sample test images
Four test images are provided in test/:
- test/coral_Acropora Cervicornis_1.png: Coral (Acropora cervicornis)
- test/fish_Amphiprion_clarkii_62.png: Fish (Amphiprion clarkii)
- test/soner_cube.png: Sonar (cube)
- test/fish.png: Out-of-domain fish (aquarium, white background)
This repository does not include model weights or data files. Download them via scripts/download_assets.py. All paths default to the download script's output directory — zero configuration needed.
