This issue tracks development of the Megatron backend for SkyRL for large-scale MoE training.
Megatron-Core MoE tech report: https://arxiv.org/pdf/2603.07685
Prior tracking issue: #203
New Features
- [P0] Support R3 - Rollout Routing Replay ([skyrl-train] Enable routing replay in SkyRL #815); see the routing-replay sketch after this list
- [P2] Megatron dynamic context parallel support ([megatron][perf] Integrate megatron dynamic context parallelism #1019)
- [P2] Enable `TransformerConfig.cp_comm_type="a2a"` for Ulysses-style sequence parallelism in Megatron @devpatelio; see the config sketch after this list
- [P2] Support Virtual Pipeline Parallel for improving training throughput
- [P2] Support Megatron native FSDP
- [P1] Enable dynamic batch sizing in Megatron
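
For the routing replay item, here is a minimal sketch of the core idea, assuming a generic top-k MoE router in PyTorch (the function name and tensor layout are illustrative, not SkyRL or Megatron APIs): the expert indices recorded during rollout are reused in the trainer's forward pass, so both sides dispatch tokens to the same experts even when their router logits differ slightly.

```python
import torch

def replayed_topk_routing(router_logits: torch.Tensor,
                          recorded_experts: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    """router_logits: [num_tokens, num_experts] from the trainer's router.
    recorded_experts: [num_tokens, top_k] expert indices logged by the rollout engine."""
    # Gate values are still computed from the current logits so gradients flow
    # to the router, but the *selection* of experts is pinned to the rollout choices.
    probs = torch.softmax(router_logits, dim=-1)
    routing_weights = torch.gather(probs, dim=-1, index=recorded_experts)
    routing_weights = routing_weights / routing_weights.sum(dim=-1, keepdim=True)
    return routing_weights, recorded_experts
```

In practice the recorded indices would be carried alongside the rollout trajectories and threaded into each MoE layer's router during the training forward pass.

For the `cp_comm_type="a2a"` item, a config sketch, assuming a Megatron-Core version that exposes `cp_comm_type` on `TransformerConfig`; the model sizes below are placeholders:

```python
from megatron.core.transformer import TransformerConfig

cfg = TransformerConfig(
    num_layers=32,
    hidden_size=4096,
    num_attention_heads=32,
    context_parallel_size=4,   # split each sequence across 4 CP ranks
    cp_comm_type="a2a",        # all-to-all (Ulysses-style) exchange instead of ring/p2p
)
```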
Megatron + LoRA
- [P1] Support in-memory, LoRA-only weight sync for Megatron + LoRA ([train] Support LoRA-only weight syncing for Megatron backend #1336); see the sketch after this list
- [P1] LoRA rank normalization via Megatron-Bridge ([feature] LoRA rank normalization for MoE models NVIDIA-NeMo/Megatron-Bridge#2964)
- [P0] Improve LoRA checkpointing support and fix/note upstream bugs (Dense LoRA adapters lose TP shards during PEFT-filtered checkpoint save NVIDIA-NeMo/Megatron-Bridge#2240)
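
A minimal sketch of the LoRA-only sync idea, assuming the conventional `lora_A`/`lora_B` parameter naming; the helper is hypothetical, not SkyRL's actual weight-sync API:

```python
import torch

def collect_lora_state(model: torch.nn.Module) -> dict[str, torch.Tensor]:
    # Keep only LoRA adapter tensors so the weight sync to the inference engine
    # ships the adapters rather than the full MoE checkpoint.
    return {
        name: param.detach()
        for name, param in model.named_parameters()
        if "lora_" in name
    }
```

Shipping only these tensors keeps the in-memory transfer proportional to the adapter size; the rollout engine then applies (or merges) the adapters on top of its base weights.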
Bugs/Improvements
- [P0] Add a DAPO recipe with MoE + Megatron to CI ([CI] Add a DAPO recipe with the Megatron backend to CI #1322)
- [P1] Debug the GLM + LoRA train/infer mismatch issue (see the diagnostic sketch below)
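
As a starting point for the mismatch debugging item, a small diagnostic sketch (a hypothetical helper, not an existing SkyRL utility) that quantifies how far the trainer's per-token log-probs drift from the rollout engine's on the same sampled tokens:

```python
import torch

def logprob_mismatch_stats(train_logprobs: torch.Tensor,
                           rollout_logprobs: torch.Tensor) -> dict[str, float]:
    """Both inputs hold per-token log-probs of the sampled tokens:
    one set recomputed by the trainer, one reported by the rollout engine."""
    diff = train_logprobs - rollout_logprobs
    return {
        "mean_abs_diff": diff.abs().mean().item(),
        "max_abs_diff": diff.abs().max().item(),
        # Positive mean means the trainer assigns higher log-prob on average.
        "mean_signed_diff": diff.mean().item(),
    }
```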