Development Roadmap (2025 H2) #460

@chenghuaWang

Here is the development roadmap for H2 2025. We will pin this roadmap in Issues and track most of our subsequent work here. Each version of the roadmap will be archived in MLLM's documentation, together with some outlooks. Contributions and feedback are welcome.

We plan to release a major MLLM version every year. The version for H2 2025 will be 2.0.0; its main updates are listed in the Focus section.

Focus

  • Refactoring from mllm-v1: implement a more streamlined project structure, introduce a simple and user-friendly eager mode, and provide an MLLM static graph IR
  • Support for more backends: P0: CANN; P1: CUDA/AMD NPU
  • Experimental: compilation from MLLM static graph IR to the NPU backend
  • Provide user-friendly components such as pymllm, mllm-cli, and MllmCSdk to broaden adoption of the MLLM project
  • Enhance the benchmarking system, with a focus on optimizing Arm kernels

Engine

Model coverage

Kernels

Backends

Performance

Quantization

  • kai: Quantize on any machine, pack on Arm (make mllm-convertor --pipeline xxx_kai_pipeline available on any device).
  • GGUF: Support the GGUF Q4_K and Q6_K quantization methods for .mllm files. @HayzelHan

Compile

KV Cache Management

Pymllm

  • ✔️ Waiting: Awaiting PyPI's approval of our organization's application. pymllm is now available on macOS (pip install pymllm).

Production Stack
