## Info

- model: https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/modeling_mixtral.py
- config: https://github.com/huggingface/transformers/blob/main/src/transformers/models/mixtral/configuration_mixtral.py
- commit id: 5af7d41e49bbfc8319f462eb45253dcb3863dfb7

## Usage

How to apply the InternEvo patch to support Variable-Length and Intern Sequence Parallel training:

```shell
patch modeling_mixtral.py internevo.patch
```
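The patch step relies on the standard GNU `patch` tool. As a sketch of the workflow, a `--dry-run` pass can confirm the diff applies cleanly before the file is modified; `demo.py` and `demo.patch` below are hypothetical stand-ins for `modeling_mixtral.py` and `internevo.patch`:

```shell
# Create a sample file and a matching unified diff to demonstrate the workflow.
printf 'hello\n' > demo.py
printf -- '--- demo.py\n+++ demo.py\n@@ -1 +1 @@\n-hello\n+hello, patched\n' > demo.patch

# Check that the patch applies cleanly without touching the file.
patch --dry-run demo.py demo.patch

# Apply it for real (same invocation style as: patch modeling_mixtral.py internevo.patch).
patch demo.py demo.patch
```

If the dry run reports a failed hunk, the local copy of the file likely diverges from the commit the patch was generated against (see the commit id above).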