We are currently preparing the code for public release. Stay tuned! 🚀
The code and pre-trained models will be released soon, including:
- Training and inference scripts
- Pre-trained weights for all datasets
- Dataset preparation guidelines
- Evaluation benchmarks
⭐ Star this repository to get notified when the code is released!
UniSurgSAM is a universal promptable video object segmentation (PVOS) framework designed for reliable surgical video segmentation. It supports visual, textual, and audio prompts within a unified architecture, enabling flexible human-AI interaction for computer-assisted surgery.
- 🎯 Multi-Modal Prompts: Visual, textual, and audio prompts within a unified architecture (see the usage sketch after this list)
- ⚡ Real-Time Performance: 55 FPS with linguistic prompts / 68 FPS with visual prompts
- 🏥 Clinical Reliability: Presence-aware decoding to suppress hallucinations
- 🔍 Multi-Granular: Whole-object, part-level, and subpart segmentation
- 🔄 Closed-Loop Design: Automatic failure recovery via adaptive state transition
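Since the code is not yet public, the snippet below is only a hypothetical sketch of what multi-modal prompting might look like. The package name `unisurgsam`, the `UniSurgSAM` class, `from_pretrained`, `segment_video`, the file paths, and the prompt schema are all illustrative assumptions, not the released interface.

```python
# Hypothetical sketch -- none of these names come from the released code.
from unisurgsam import UniSurgSAM  # assumed package/class name

model = UniSurgSAM.from_pretrained("checkpoints/unisurgsam.pt")  # assumed checkpoint path

# Any one of the three prompt modalities can initialize segmentation:
prompt = {"type": "text", "value": "grasper, jaw only"}        # textual, part-level
# prompt = {"type": "points", "value": [(412, 288)]}           # visual (click on target)
# prompt = {"type": "audio", "value": "prompts/command.wav"}   # audio (spoken command)

masks = model.segment_video("videos/surgery_clip.mp4", prompt=prompt)
```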
UniSurgSAM employs a two-stage paradigm with decoupled decoders (a control-flow sketch follows the list):
- Stage I: Multi-modal promptable initialization with RPAD (Reliable Presence-Aware Decoding)
- Stage II: Boundary-aware long-term tracking (BLT) with diversity-driven memory
- AST: Adaptive state transition for closed-loop failure recovery
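To make the control flow concrete, here is a minimal, self-contained sketch of how the two stages and AST could interact. It assumes a scalar tracking confidence and injected placeholder callables (`init_with_prompt`, `track`, `recovery_threshold` are illustrative, not the actual implementation):

```python
from typing import Any, Callable, Iterable, Optional, Tuple

def run_two_stage(
    frames: Iterable[Any],
    prompt: Any,
    init_with_prompt: Callable[[Any, Any], Any],      # Stage I: promptable init (RPAD)
    track: Callable[[Any, Any], Tuple[Any, float]],   # Stage II: BLT -> (mask, confidence)
    recovery_threshold: float = 0.5,                  # assumed scalar AST trigger
) -> list:
    """Control-flow sketch of the two-stage paradigm; all callables are placeholders."""
    masks = []
    state: Optional[Any] = None
    for frame in frames:
        if state is None:
            # Stage I: initialize from the (visual/textual/audio) prompt.
            # RPAD is presence-aware, so it can report that the target is
            # absent instead of hallucinating a mask.
            state = init_with_prompt(frame, prompt)
        # Stage II: boundary-aware long-term tracking against a
        # diversity-driven memory of past target appearances.
        mask, confidence = track(state, frame)
        if confidence < recovery_threshold:
            # AST: adaptive state transition -- on suspected failure, drop
            # the tracking state and re-initialize from the original prompt.
            state = None
        masks.append(mask)
    return masks
```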
For more details, please refer to our paper. Check out our project page for video demonstrations and detailed results.
If you find our work useful, please consider citing:
@article{liu2025unisurgsam,
  title={UniSurgSAM: A Universal Promptable Model for Reliable Surgical Video Segmentation},
  author={Liu, Haofeng and Wang, Ziyue and Kong, Alex Y. W. and Qin, Guanyi and Gao, Mingqi and Low, Chang Han and Chan, Lap Yan Lennon and Jin, Yueming},
  journal={arXiv preprint},
  year={2026}
}

For questions or collaborations, please contact:
- Yueming Jin: ymjin@nus.edu.sg
- Haofeng Liu: haofeng.liu@u.nus.edu
This project is licensed under the MIT License - see the LICENSE file for details.