
UniSurgSAM: A Universal Promptable Model for Reliable Surgical Video Segmentation



🚀 Code Coming Soon!

We are currently preparing the code for public release. Stay tuned! 🎉

The code and pre-trained models will be released soon, including:

  • Training and inference scripts
  • Pre-trained weights for all datasets
  • Dataset preparation guidelines
  • Evaluation benchmarks

⭐ Star this repository to get notified when the code is released!


📖 Overview

UniSurgSAM is a universal promptable video object segmentation (PVOS) framework designed for reliable surgical video segmentation. It supports visual, textual, and audio prompts within a unified architecture, enabling flexible human-AI interaction for computer-assisted surgery.

Key Features

  • 🎯 Multi-Modal Prompts: Visual, textual, and audio prompts within a unified architecture
  • ⚡ Real-Time Performance: 55 FPS with linguistic prompts / 68 FPS with visual prompts
  • 🏥 Clinical Reliability: Presence-aware decoding to suppress hallucinations
  • 📏 Multi-Granular: Whole-object, part-level, and subpart segmentation
  • 🔄 Closed-Loop Design: Automatic failure recovery via adaptive state transition
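To illustrate the presence-aware idea behind the clinical-reliability feature, here is a minimal sketch (not the released implementation) in which a decoder emits a scalar presence logit alongside the per-pixel mask logits, and the mask is suppressed when the presence probability is low instead of hallucinating a segment. The function name and threshold are hypothetical.

```python
import numpy as np

def presence_aware_decode(mask_logits, presence_logit, threshold=0.5):
    """Hypothetical sketch of presence-aware decoding.

    Alongside per-pixel mask logits, the decoder emits a scalar presence
    logit; when the sigmoid presence probability falls below `threshold`,
    the mask is zeroed out rather than segmenting an absent target.
    """
    presence_prob = 1.0 / (1.0 + np.exp(-presence_logit))  # sigmoid
    mask = (mask_logits > 0).astype(np.uint8)              # binarize logits
    if presence_prob < threshold:
        mask = np.zeros_like(mask)  # target judged absent: empty mask
    return mask, presence_prob

# Target present (high presence logit): mask is kept
kept_mask, p_hi = presence_aware_decode(np.array([[2.0, -1.0]]), presence_logit=3.0)
# Target absent (low presence logit): mask is suppressed to all zeros
empty_mask, p_lo = presence_aware_decode(np.array([[2.0, -1.0]]), presence_logit=-3.0)
```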

Architecture Highlights

UniSurgSAM employs a two-stage paradigm with decoupled decoders:

  • Stage I: Multi-modal promptable initialization with Reliable Presence-Aware Decoding (RPAD)
  • Stage II: Boundary-aware Long-term Tracking (BLT) with diversity-driven memory
  • Adaptive State Transition (AST): closed-loop failure recovery
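The two-stage loop with closed-loop recovery could be sketched as below. This is a hypothetical outline, not the released API: `init_fn` stands in for Stage I (promptable initialization), `track_fn` for Stage II (long-term tracking over a mask memory), and the confidence check stands in for the adaptive state transition that re-initializes when tracking fails.

```python
def segment_video(frames, prompt, init_fn, track_fn, conf_threshold=0.5):
    """Toy sketch of a two-stage promptable pipeline with failure recovery.

    `init_fn(frame, prompt)` and `track_fn(frame, memory)` are hypothetical
    placeholders returning (mask, confidence). When tracking confidence
    drops below `conf_threshold`, we fall back to re-initialization.
    """
    masks, memory = [], []
    state = "init"
    for frame in frames:
        if state == "init":
            mask, conf = init_fn(frame, prompt)   # Stage I: prompt-driven init
            state = "track"
        else:
            mask, conf = track_fn(frame, memory)  # Stage II: memory-based tracking
            if conf < conf_threshold:             # failure detected by AST-style check:
                mask, conf = init_fn(frame, prompt)  # re-initialize from the prompt
        memory.append(mask)                       # grow the mask memory
        masks.append(mask)
    return masks

# Toy usage with dummy stand-ins: tracking "fails" on the third frame.
def init_fn(frame, prompt):
    return "init", 0.9

def track_fn(frame, memory):
    return "track", (0.3 if len(memory) == 2 else 0.8)

result = segment_video([1, 2, 3, 4], None, init_fn, track_fn)
```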

For more details, please refer to our paper and project page.


🎬 Demo Videos

Check out our project page for video demonstrations and detailed results.


πŸ“ Citation

If you find our work useful, please consider citing:

@article{liu2025unisurgsam,
  title={UniSurgSAM: A Universal Promptable Model for Reliable Surgical Video Segmentation},
  author={Liu, Haofeng and Wang, Ziyue and Kong, Alex Y. W. and Qin, Guanyi and Gao, Mingqi and Low, Chang Han and Chan, Lap Yan Lennon and Jin, Yueming},
  journal={arXiv preprint},
  year={2026}
}

📧 Contact

For questions or collaborations, please contact:


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
