Commits
30 commits
a9a366e
finish hw1 2 hw3 waiting
Saberlilya Feb 2, 2026
a2cf465
Assignment #3: Qwen2 inference with KV cache
Saberlilya Feb 2, 2026
7d5b027
trigger github actions
Saberlilya Feb 2, 2026
67f8f53
Assignment #3: add qwen2 model implementation
Saberlilya Feb 2, 2026
f658111
fix: add libllaisys qwen2 ctypes bindings
Saberlilya Feb 2, 2026
44cf45b
fix: track python libllaisys and add qwen2 ctypes bindings
Saberlilya Feb 2, 2026
42514a8
fix: windows build (no exceptions across C boundary)
Saberlilya Feb 2, 2026
6f77d90
fix: windows build warnings in tensor loops
Saberlilya Feb 2, 2026
5e7dd8d
docs: update comments for better readability
Saberlilya Feb 4, 2026
6b9d013
docs: update comments for better readability
Saberlilya Feb 4, 2026
915f394
chore: rerun ci
Saberlilya Feb 9, 2026
64a26b0
chore: trigger official CI
Saberlilya Feb 9, 2026
84ac6cd
chore: rerun ci on my repo
Saberlilya Feb 9, 2026
7ec901b
chore: trigger ci (non-md change)
Saberlilya Feb 9, 2026
dae9421
feat: complete llaisys projects 1 2 3 6
Saberlilya Mar 11, 2026
f1522d6
feat: add MetaX/MACA backend and submission docs
Saberlilya Mar 11, 2026
0f385a0
fix: release hf gpu cache before llaisys infer
Saberlilya Mar 11, 2026
d75c940
docs: trim submission materials
Saberlilya Mar 11, 2026
3b12931
Update pr_zh.md
Saberlilya Mar 11, 2026
9cd02ac
Update reproduce_zh.md
Saberlilya Mar 11, 2026
b6e3b17
Update README_ZN.md
Saberlilya Mar 11, 2026
9507553
Update report_zh.md
Saberlilya Mar 11, 2026
46c8a02
fix: track decoder model files and fix ci builds
Saberlilya Mar 14, 2026
702742f
fix: align qwen2 python wrapper with decoder base
Saberlilya Mar 14, 2026
97450c2
docs: clean submission materials before final pr
Saberlilya Mar 14, 2026
d441885
docs: harden reproduce environment checks
Saberlilya Mar 14, 2026
586d107
docs: sync metax submission docs
Saberlilya Mar 14, 2026
d51708d
docs: finalize submission wording
Saberlilya Mar 14, 2026
be32fb3
docs: merge submission docs into full course report
Saberlilya Mar 14, 2026
910dfb2
docs: add project 6 to submission scope
Saberlilya Mar 15, 2026
22 changes: 21 additions & 1 deletion .gitignore
@@ -87,4 +87,24 @@ htmlcov/
# Windows
Thumbs.db
ehthumbs.db
desktop.ini

# Ignore large model files
models/
!python/llaisys/models/
!python/llaisys/models/*.py
*.safetensors
*.bin
*.pth

# Ignore build artifacts
build/
.xmake/
*.o
*.so
/libllaisys/
bin/

# Ignore Python caches
__pycache__/
*.pyc
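The hunk above combines a blanket `models/` exclude with re-includes for the Python wrapper directory. Git cannot re-include a file whose parent directory is excluded, which is why `!python/llaisys/models/` must appear before `!python/llaisys/models/*.py`. A quick way to sanity-check rules like these is `git check-ignore` in a throwaway repo; this sketch uses the paths from the hunk above:

```shell
# Verify the ignore/re-include pattern in a scratch repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'models/\n!python/llaisys/models/\n!python/llaisys/models/*.py\n' > .gitignore
mkdir -p models python/llaisys/models
touch models/weights.safetensors python/llaisys/models/qwen2.py
# Files under the excluded models/ directory are ignored...
git check-ignore -q models/weights.safetensors && echo "model weights ignored"
# ...while the re-included wrapper directory stays trackable.
git check-ignore -q python/llaisys/models/qwen2.py || echo "wrapper .py still tracked"
```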
70 changes: 69 additions & 1 deletion README.md
@@ -4,7 +4,7 @@
<a href="README.md" target="README.md">English</a> |
<a href="README_ZN.md" target="README_ZN.md">中文</a>
</p>

<!-- trigger ci -->
## Introduction

LLAISYS (Let's Learn AI SYStem) is an educational project that aims to give new and future AI engineers a platform for learning how to build AI systems from scratch. LLAISYS consists of several assignments, which help students learn and build the basic modules, and projects that challenge them to add more advanced features to their systems. LLAISYS uses C++ as its primary programming language for the system backend, which is compiled into shared libraries exposing C-language APIs. The frontend code is written in Python and calls these APIs to provide convenient testing and interaction with other frameworks such as PyTorch.
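The backend/frontend split described above follows the standard ctypes pattern: load the compiled shared library, declare argument and return types for each C function, then call it from Python. As a minimal sketch, the C math library stands in below for the real `libllaisys` backend (whose symbols are not shown in this diff):

```python
# Sketch of the ctypes pattern the Python frontend uses: load a shared
# library exposing a C API and declare argument/return types before calling.
# libm is only a stand-in here for the actual libllaisys backend.
import ctypes
import ctypes.util

lib = ctypes.CDLL(ctypes.util.find_library("m"))
lib.cos.restype = ctypes.c_double      # declare the C return type
lib.cos.argtypes = [ctypes.c_double]   # declare the C parameter types
print(lib.cos(0.0))  # 1.0
```

Declaring `argtypes`/`restype` up front is what keeps the Python-to-C boundary type-safe; the real bindings in `python/llaisys` would do the same for each exported `libllaisys` function.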
@@ -429,3 +429,71 @@ Introduce Tensor Parallelism to LLAISYS. Shard your model across multiple device
## Project #6: Support New Models

Support another model type than the one we use for homework in LLAISYS.

## Chinese Submission Docs

- Overview: [docs/submission_zh.md](docs/submission_zh.md)
- Report: [docs/report_zh.md](docs/report_zh.md)
- Reproduce: [docs/reproduce_zh.md](docs/reproduce_zh.md)
- PR Text: [docs/pr_zh.md](docs/pr_zh.md)

## Current Submission Status For This Fork

This section is appended for course submission and does not change the original assignment description above.

### Scope

- This submission is organized as a complete course delivery covering Assignments 1/2/3 and Projects 1/2/3/6
- Project #2 uses MetaX/MACA as the second backend
- Project #6 adds Llama/TinyLlama model support through the shared decoder-only path
- Only implementation code and formal submission docs are tracked for submission

### Verified Environment

- Local CPU dev environment: Python `3.12.3`, xmake `v3.0.7+20260308`
- MetaX validation environment:

- GPU: `MetaX C500`
- `mx-smi`: `2.2.9`
- `MACA`: `3.2.1.10`
- Driver: `3.0.11`
- Compiler: `mxcc 1.0.0`
- Python: `3.10.10`
- PyTorch: `2.6.0+metax3.2.1.3`

### Verified Commands

```bash
## Local CPU path
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1

## Chat service minimal validation
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'

## New model validation entry
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1

## MetaX path
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```
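The curl calls above can also be issued from Python. The following is a stdlib-only client sketch for the endpoint shown; the URL and payload shape are taken from the curl example, not from any published client API of this repository:

```python
# Minimal stdlib client for the chat endpoint exercised by curl above.
import json
from urllib import request

def chat(base_url, messages, max_tokens=8, stream=False):
    """POST a chat request to /v1/chat/completions and return parsed JSON."""
    payload = {"messages": messages, "stream": stream, "max_tokens": max_tokens}
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server above to be running):
# print(chat("http://127.0.0.1:8011", [{"role": "user", "content": "hi"}]))
```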

### Notes

- This section combines verified local CPU commands and verified MetaX commands.
- MetaX is not a C++-level CUDA drop-in platform, so the backend is adapted separately.
- Hugging Face verification still uses `torch.cuda` semantics because the local MetaX PyTorch build exposes CUDA-compatible device APIs.
- The external MetaX PDF in the repo root is intentionally kept untracked and is not part of the git submission.
67 changes: 67 additions & 0 deletions README_ZN.md
@@ -430,3 +430,70 @@ python test/test_infer.py --model [dir_path/to/model] --test --device nvidia
## Project #6: Support New Models

In LLAISYS, support a model type other than the one used for the homework.

## Submission Notes For This Fork

This section is appended for the coursework submission of this fork and does not change the original assignment description above.

### Scope

- This submission is organized as a complete course delivery covering Assignments 1/2/3 and Projects 1/2/3/6
- Project #2 uses MetaX/MACA as the second backend
- Project #6 adds a `Llama/TinyLlama` model support path
- Only implementation code and formal submission docs are tracked for submission

### Verified Environment

- Local CPU dev environment: Python `3.12.3`, xmake `v3.0.7+20260308`
- MetaX validation environment:

- GPU: `MetaX C500`
- `mx-smi`: `2.2.9`
- `MACA`: `3.2.1.10`
- Driver: `3.0.11`
- Compiler: `mxcc 1.0.0`
- Python: `3.10.10`
- PyTorch: `2.6.0+metax3.2.1.3`

### Verified Commands

```bash
## Local CPU path
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1

## Chat service minimal validation
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'

## New model validation entry
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1

## MetaX path
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```

### Submission Docs

- Overview: [`docs/submission_zh.md`](docs/submission_zh.md)
- Report: [`docs/report_zh.md`](docs/report_zh.md)
- Reproduce: [`docs/reproduce_zh.md`](docs/reproduce_zh.md)
- PR Text: [`docs/pr_zh.md`](docs/pr_zh.md)

### Notes

- This section combines the verified local CPU commands and the verified MetaX commands.
- MetaX is not CUDA drop-in compatible at the C++ SDK level, so the backend must be adapted separately.
- The PyTorch layer keeps `torch.cuda` semantics, so the Hugging Face comparison tests still reuse the CUDA namespace.
85 changes: 85 additions & 0 deletions docs/pr_zh.md
@@ -0,0 +1,85 @@
# GitHub PR Text

## Title

`feat: complete LLAISYS assignments 1 2 3 and projects 1 2 3 6`

## Body

This PR completes the following LLAISYS coursework and adds the Chinese submission docs:

- Assignment #1: Tensor
- Assignment #2: Operators
- Assignment #3: Large Language Model Inference
- Project #1: CPU Optimization
- Project #2: Second Platform (MetaX/MACA)
- Project #3: Chat Service
- Project #6: New Model Support

### Main Changes

- Complete the Tensor basics, including `load`, `isContiguous`, `view`, `permute`, and `slice`
- Complete the key CPU operators: `argmax`, `embedding`, `linear`, `rms_norm`, `rope`, `self_attention`, `swiglu`
- Complete the Qwen2 inference pipeline, weight loading, and token-level comparison validation
- Optimize the CPU hotspot operators with OpenMP
- Add a standalone `METAX` device type and a `--metax-gpu=y` build switch
- Wire up the MetaX/MACA runtime and key operator paths, with `linear` backed by `mcblasGemmEx`
- Implement the chat service and its streaming response API
- Add C++/Python wrappers for the `Llama/TinyLlama` path and automatic model-type dispatch based on `config.json`
- Add the submission overview, implementation report, and reproduction guide
- This PR contains only implementation code and formal submission docs; local study materials and external PDFs are excluded

### Verified Commands

Local CPU path:

```bash
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1
```

Chat service minimal validation:

```bash
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'
```

New model validation entry:

```bash
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1
```

MetaX path:

```bash
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```

### Notes

- Assignments #1/#2/#3 and Projects #1/#3/#6 were validated mainly in the local CPU environment
- Project #2 was validated on a real MetaX `MetaX C500` machine
- MetaX is not a CUDA drop-in compatible platform at the C/C++ SDK level, so the backend uses a separate adaptation
- Inference validation currently focuses on `Qwen2`; Project #6 provides the `Llama/TinyLlama` integration and a local model directory validation entry
- The current machine has no NVIDIA hardware, so this PR includes no new on-device regression data for `--device nvidia`
- The external PDF in the repository root stays untracked and is not committed

### Submission Docs

- Overview: [`submission_zh.md`](submission_zh.md)
- Implementation report: [`report_zh.md`](report_zh.md)
- Reproduction guide: [`reproduce_zh.md`](reproduce_zh.md)