Commits
30 commits
a9a366e
finish hw1 2 hw3 waiting
Saberlilya Feb 2, 2026
a2cf465
Assignment #3: Qwen2 inference with KV cache
Saberlilya Feb 2, 2026
7d5b027
trigger github actions
Saberlilya Feb 2, 2026
67f8f53
Assignment #3: add qwen2 model implementation
Saberlilya Feb 2, 2026
f658111
fix: add libllaisys qwen2 ctypes bindings
Saberlilya Feb 2, 2026
44cf45b
fix: track python libllaisys and add qwen2 ctypes bindings
Saberlilya Feb 2, 2026
42514a8
fix: windows build (no exceptions across C boundary)
Saberlilya Feb 2, 2026
6f77d90
fix: windows build warnings in tensor loops
Saberlilya Feb 2, 2026
5e7dd8d
docs: update comments for better readability
Saberlilya Feb 4, 2026
6b9d013
docs: update comments for better readability
Saberlilya Feb 4, 2026
915f394
chore: rerun ci
Saberlilya Feb 9, 2026
64a26b0
chore: trigger official CI
Saberlilya Feb 9, 2026
84ac6cd
chore: rerun ci on my repo
Saberlilya Feb 9, 2026
7ec901b
chore: trigger ci (non-md change)
Saberlilya Feb 9, 2026
dae9421
feat: complete llaisys projects 1 2 3 6
Saberlilya Mar 11, 2026
f1522d6
feat: add MetaX/MACA backend and submission docs
Saberlilya Mar 11, 2026
0f385a0
fix: release hf gpu cache before llaisys infer
Saberlilya Mar 11, 2026
d75c940
docs: trim submission materials
Saberlilya Mar 11, 2026
3b12931
Update pr_zh.md
Saberlilya Mar 11, 2026
9cd02ac
Update reproduce_zh.md
Saberlilya Mar 11, 2026
b6e3b17
Update README_ZN.md
Saberlilya Mar 11, 2026
9507553
Update report_zh.md
Saberlilya Mar 11, 2026
46c8a02
fix: track decoder model files and fix ci builds
Saberlilya Mar 14, 2026
702742f
fix: align qwen2 python wrapper with decoder base
Saberlilya Mar 14, 2026
97450c2
docs: clean submission materials before final pr
Saberlilya Mar 14, 2026
d441885
docs: harden reproduce environment checks
Saberlilya Mar 14, 2026
586d107
docs: sync metax submission docs
Saberlilya Mar 14, 2026
d51708d
docs: finalize submission wording
Saberlilya Mar 14, 2026
be32fb3
docs: merge submission docs into full course report
Saberlilya Mar 14, 2026
910dfb2
docs: add project 6 to submission scope
Saberlilya Mar 15, 2026
22 changes: 21 additions & 1 deletion .gitignore
@@ -87,4 +87,24 @@ htmlcov/
# Windows
Thumbs.db
ehthumbs.db
desktop.ini

# Ignore large model files
models/
!python/llaisys/models/
!python/llaisys/models/*.py
*.safetensors
*.bin
*.pth

# Ignore build artifacts
build/
.xmake/
*.o
*.so
/libllaisys/
bin/

# Ignore Python caches
__pycache__/
*.pyc
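The hunk above combines a blanket `models/` exclude with re-includes for the Python wrapper directory. Git cannot re-include a file whose parent directory is excluded, which is why `!python/llaisys/models/` must appear before `!python/llaisys/models/*.py`. A quick way to sanity-check rules like these is `git check-ignore` in a throwaway repo; this sketch uses the paths from the hunk above:

```shell
# Verify the ignore/re-include pattern in a scratch repo.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
printf 'models/\n!python/llaisys/models/\n!python/llaisys/models/*.py\n' > .gitignore
mkdir -p models python/llaisys/models
touch models/weights.safetensors python/llaisys/models/qwen2.py
# Files under the excluded models/ directory are ignored...
git check-ignore -q models/weights.safetensors && echo "model weights ignored"
# ...while the re-included wrapper directory stays trackable.
git check-ignore -q python/llaisys/models/qwen2.py || echo "wrapper .py still tracked"
```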
70 changes: 69 additions & 1 deletion README.md
@@ -4,7 +4,7 @@
<a href="README.md" target="README.md">English</a> |
<a href="README_ZN.md" target="README_ZN.md">中文</a>
</p>

<!-- trigger ci -->
## Introduction

LLAISYS (Let's Learn AI SYStem) is an educational project that aims to give new and future AI engineers a platform for learning how to build AI systems from scratch. LLAISYS consists of several assignments, which help students learn and build the basic modules, and projects that challenge them to add more advanced features to their systems. LLAISYS uses C++ as its primary programming language for the system backend, which is compiled into shared libraries exposing C-language APIs. The frontend code is written in Python and calls these APIs to provide convenient testing and interaction with other frameworks such as PyTorch.
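The backend/frontend split described above follows the standard ctypes pattern: load the compiled shared library, declare argument and return types for each C function, then call it from Python. As a minimal sketch, the C math library stands in below for the real `libllaisys` backend (whose symbols are not shown in this diff):

```python
# Sketch of the ctypes pattern the Python frontend uses: load a shared
# library exposing a C API and declare argument/return types before calling.
# libm is only a stand-in here for the actual libllaisys backend.
import ctypes
import ctypes.util

lib = ctypes.CDLL(ctypes.util.find_library("m"))
lib.cos.restype = ctypes.c_double      # declare the C return type
lib.cos.argtypes = [ctypes.c_double]   # declare the C parameter types
print(lib.cos(0.0))  # 1.0
```

Declaring `argtypes`/`restype` up front is what keeps the Python-to-C boundary type-safe; the real bindings in `python/llaisys` would do the same for each exported `libllaisys` function.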
@@ -429,3 +429,71 @@ Introduce Tensor Parallelism to LLAISYS. Shard your model across multiple device
## Project #6: Support New Models

Support another model type than the one we use for homework in LLAISYS.

## Chinese Submission Docs

- Overview: [docs/submission_zh.md](docs/submission_zh.md)
- Report: [docs/report_zh.md](docs/report_zh.md)
- Reproduce: [docs/reproduce_zh.md](docs/reproduce_zh.md)
- PR Text: [docs/pr_zh.md](docs/pr_zh.md)

## Current Submission Status For This Fork

This section is appended for course submission and does not change the original assignment description above.

### Scope

- This submission is organized as a complete course delivery covering Assignments 1/2/3 and Projects 1/2/3/6
- Project #2 uses MetaX/MACA as the second backend
- Project #6 adds Llama/TinyLlama model support through the shared decoder-only path
- Only implementation code and formal submission docs are tracked for submission

### Verified Environment

- Local CPU dev environment: Python `3.12.3`, xmake `v3.0.7+20260308`
- MetaX validation environment:

- GPU: `MetaX C500`
- `mx-smi`: `2.2.9`
- `MACA`: `3.2.1.10`
- Driver: `3.0.11`
- Compiler: `mxcc 1.0.0`
- Python: `3.10.10`
- PyTorch: `2.6.0+metax3.2.1.3`

### Verified Commands

```bash
## Local CPU path
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1

## Chat service minimal validation
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'

## New model validation entry
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1

## MetaX path
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```
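The curl calls above can also be issued from Python. The following is a stdlib-only client sketch for the endpoint shown; the URL and payload shape are taken from the curl example, not from any published client API of this repository:

```python
# Minimal stdlib client for the chat endpoint exercised by curl above.
import json
from urllib import request

def chat(base_url, messages, max_tokens=8, stream=False):
    """POST a chat request to /v1/chat/completions and return parsed JSON."""
    payload = {"messages": messages, "stream": stream, "max_tokens": max_tokens}
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server above to be running):
# print(chat("http://127.0.0.1:8011", [{"role": "user", "content": "hi"}]))
```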

### Notes

- This section combines verified local CPU commands and verified MetaX commands.
- MetaX is not a C++-level CUDA drop-in platform, so the backend is adapted separately.
- Hugging Face verification still uses `torch.cuda` semantics because the local MetaX PyTorch build exposes CUDA-compatible device APIs.
- The external MetaX PDF in the repo root is intentionally kept untracked and is not part of the git submission.
67 changes: 67 additions & 0 deletions README_ZN.md
@@ -430,3 +430,70 @@ python test/test_infer.py --model [dir_path/to/model] --test --device nvidia
## Project #6: Support New Models

In LLAISYS, support a model type other than the one used for the homework.

## Submission Notes For This Fork

This section is appended for the coursework submission of this fork and does not change the original assignment description above.

### Scope

- This submission is organized as a complete course delivery covering Assignments 1/2/3 and Projects 1/2/3/6
- Project #2 uses MetaX/MACA as the second backend
- Project #6 adds a `Llama/TinyLlama` model support path
- Only implementation code and formal submission docs are tracked for submission

### Verified Environment

- Local CPU dev environment: Python `3.12.3`, xmake `v3.0.7+20260308`
- MetaX validation environment:

- GPU: `MetaX C500`
- `mx-smi`: `2.2.9`
- `MACA`: `3.2.1.10`
- Driver: `3.0.11`
- Compiler: `mxcc 1.0.0`
- Python: `3.10.10`
- PyTorch: `2.6.0+metax3.2.1.3`

### Verified Commands

```bash
## Local CPU path
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1

## Chat service minimal validation
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'

## New model validation entry
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1

## MetaX path
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```

### Submission Docs

- Overview: [`docs/submission_zh.md`](docs/submission_zh.md)
- Report: [`docs/report_zh.md`](docs/report_zh.md)
- Reproduce: [`docs/reproduce_zh.md`](docs/reproduce_zh.md)
- PR Text: [`docs/pr_zh.md`](docs/pr_zh.md)

### Notes

- This section combines the verified local CPU commands and the verified MetaX commands.
- MetaX is not CUDA drop-in compatible at the C++ SDK level, so the backend must be adapted separately.
- The PyTorch layer keeps `torch.cuda` semantics, so the Hugging Face comparison tests still reuse the CUDA namespace.
85 changes: 85 additions & 0 deletions docs/pr_zh.md
@@ -0,0 +1,85 @@
# GitHub PR Text

## Title

`feat: complete LLAISYS assignments 1 2 3 and projects 1 2 3 6`

## Body

This PR completes the following LLAISYS coursework and adds the Chinese submission docs:

- Assignment #1: Tensor
- Assignment #2: Operators
- Assignment #3: Large Language Model Inference
- Project #1: CPU Optimization
- Project #2: Second Platform (MetaX/MACA)
- Project #3: Chat Service
- Project #6: New Model Support

### Main Changes

- Complete the Tensor basics, including `load`, `isContiguous`, `view`, `permute`, and `slice`
- Complete the key CPU operators: `argmax`, `embedding`, `linear`, `rms_norm`, `rope`, `self_attention`, `swiglu`
- Complete the Qwen2 inference pipeline, weight loading, and token-level comparison validation
- Optimize the CPU hotspot operators with OpenMP
- Add a standalone `METAX` device type and a `--metax-gpu=y` build switch
- Wire up the MetaX/MACA runtime and key operator paths, with `linear` backed by `mcblasGemmEx`
- Implement the chat service and its streaming response API
- Add C++/Python wrappers for the `Llama/TinyLlama` path and automatic model-type dispatch based on `config.json`
- Add the submission overview, implementation report, and reproduction guide
- This PR contains only implementation code and formal submission docs; local study materials and external PDFs are excluded

### Verified Commands

Local CPU path:

```bash
xmake f --nv-gpu=n --metax-gpu=n -cv
xmake -r

python test/test_tensor.py
python test/test_runtime.py --device cpu
python test/test_ops.py --device cpu
python test/test_infer.py --device cpu --test --model models/DeepSeek-R1-Distill-Qwen-1.5B --prompt hi --max_steps 1
```

Chat service minimal validation:

```bash
PYTHONPATH=python python -m llaisys.chat.server --model models/DeepSeek-R1-Distill-Qwen-1.5B --device cpu --host 127.0.0.1 --port 8011
curl --noproxy '*' -s http://127.0.0.1:8011/health
curl --noproxy '*' -s -X POST http://127.0.0.1:8011/v1/chat/completions -H 'Content-Type: application/json' -d '{"messages":[{"role":"user","content":"你好"}],"stream":false,"max_tokens":8}'
```

New model validation entry:

```bash
python test/test_infer.py --device cpu --test --model /path/to/local/llama_or_tinyllama_model --prompt hi --max_steps 1
```

MetaX path:

```bash
XMAKE_ROOT=y xmake f --metax-gpu=y -cv
XMAKE_ROOT=y xmake -r
XMAKE_ROOT=y xmake install

python test/test_runtime.py --device metax
python test/test_ops.py --device metax
python test/test_infer.py --device metax --test --model_id trl-internal-testing/tiny-Qwen2ForCausalLM-2.5 --prompt hi --max_steps 1
```

### Notes

- Assignments #1/#2/#3 and Projects #1/#3/#6 were validated mainly in the local CPU environment
- Project #2 was validated on a real MetaX `MetaX C500` machine
- MetaX is not a CUDA drop-in compatible platform at the C/C++ SDK level, so the backend uses a separate adaptation
- Inference validation currently focuses on `Qwen2`; Project #6 provides the `Llama/TinyLlama` integration and a local model directory validation entry
- The current machine has no NVIDIA hardware, so this PR includes no new on-device regression data for `--device nvidia`
- The external PDF in the repository root stays untracked and is not committed

### Submission Docs

- Overview: [`submission_zh.md`](submission_zh.md)
- Implementation report: [`report_zh.md`](report_zh.md)
- Reproduction guide: [`reproduce_zh.md`](reproduce_zh.md)