feat: support LongCat-Flash-Chat Model #82

Open

VegetaPn wants to merge 6 commits into ISEEKYAN:main from VegetaPn:release/os

Conversation

@VegetaPn

Add support for the LongCat Flash model to mbridge, including the model architecture definition (MoE, MLA, Router, ZeroExpert), the Megatron transformer configuration, safetensor IO adaptation, and example scripts for loading, inference, and export.

MTP is not included yet.

Changes to existing mbridge code:

  1. mbridge/core/bridge.py

Functional changes:

  • Relaxed MoE expert matching: .mlp.experts.linear_fc → .experts.linear_fc (6 occurrences). Dropping the .mlp prefix requirement supports LongCat's MoE structure; existing models are unaffected, since their
    parameter paths still contain the .experts.linear_fc substring.

Non-functional changes:

  • Added tqdm progress bar and timing logs during weight loading
  • Cached model.state_dict() to avoid repeated calls
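The relaxed matching described above can be sketched as a simple substring check. This is an illustrative sketch, not the actual code in mbridge/core/bridge.py; the function name and parameter paths are hypothetical.

```python
# Hypothetical sketch of the relaxed MoE expert matching. The real logic in
# mbridge/core/bridge.py may be structured differently.

def is_moe_expert_weight(param_name: str) -> bool:
    # Old check required the ".mlp" prefix and missed LongCat's layout:
    #     ".mlp.experts.linear_fc" in param_name
    # The relaxed check only looks for the expert sub-path itself:
    return ".experts.linear_fc" in param_name

# Conventional MoE path: still matches, since it contains the sub-path.
print(is_moe_expert_weight("decoder.layers.0.mlp.experts.linear_fc1.weight"))  # True
# LongCat-style path without the .mlp prefix: now matches.
print(is_moe_expert_weight("decoder.layers.0.experts.linear_fc1.weight"))      # True
# Dense layer: never matches.
print(is_moe_expert_weight("decoder.layers.0.mlp.linear_fc1.weight"))          # False
```

Because the old pattern is a superset of the new one, every name the old check matched is still matched, which is why existing models are unaffected.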
  2. mbridge/core/safetensor_io.py
  • Added logging throughout save operations
  • Partial file saving: Previously, weights that couldn't form a complete shard file were silently dropped. Now they are saved as partial files with warnings about missing weights.
  • Added file-existence check in merge_tmp_safetensor_files to avoid crashes
  • Used .clone() when reading tensors to prevent reference issues
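The .clone() change guards against saving tensors that are views into a shared buffer, which can be mutated or freed after reading. A minimal sketch of the failure mode, using NumPy's .copy() as a stand-in for torch's tensor.clone():

```python
import numpy as np

# Hypothetical illustration of why tensors should be copied out of a shared
# buffer before saving. mbridge uses torch's .clone(); numpy's .copy() plays
# the same role here.

state = np.arange(6, dtype=np.float32)
view = state[:3]          # a view: shares memory with `state`
copy = state[:3].copy()   # an independent copy

state[0] = 99.0           # later mutation of the source buffer
print(view[0])            # 99.0 -- the view reflects the change
print(copy[0])            # 0.0  -- the copy is unaffected
```

Saving the view after the mutation would silently write the wrong value; the copy is safe regardless of what happens to the source buffer.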
  3. mbridge/core/util.py
  • Bug fix: Added mpu.set_virtual_pipeline_model_parallel_rank(i) inside the VPP loop. This call was previously missing, so every model chunk saw a stale virtual rank, which could cause incorrect pipeline stage detection.
  4. mbridge/models/__init__.py
  • Registered LongCatFlashBridge import.
