Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
项目详情:Llaisys
1. 项目概览
主要实现了项目3:构建了一个能与单用户实时对话的聊天机器人, 实现随机采样,支持 Temperature、Top-K、Top-P,支持流式传输和会话管理(编辑对话、重新回答、历史对话),支持page attention和前缀匹配kv cache复用。额外做了cuda算子。能用nvidia推理。
2. 项目结构
项目文件结构清晰,主要分为核心 C++ 实现、Python 绑定和测试用例三部分。
3. 复现步骤
3.1. 环境准备
在开始之前,请确保您的系统满足以下要求:
安装依赖示例 (Ubuntu)
3.2. 编译项目
项目使用
xmake进行构建管理。3.2.1 配置构建
CPU 版本 (默认):
NVIDIA GPU 版本:
如果您的机器有 NVIDIA 显卡并安装了 CUDA,可以开启 GPU 支持:
3.2.2 编译
执行以下命令进行编译。编译完成后,生成的共享库会自动复制到 Python 包目录 (
python/llaisys/libllaisys/)。3.3. 运行项目
项目包含一个兼容 OpenAI API 的聊天服务器和一个简单的 Web UI。
3.3.1 准备模型
您需要下载 Qwen2 模型权重(如 Qwen2-0.5B-Instruct 或其他尺寸)。
假设模型下载到了
/path/to/Qwen2-0.5B-Instruct。3.3.2 启动服务器
设置
PYTHONPATH以便 Python 能找到llaisys包,然后启动服务器。3.3.3 访问服务
服务器启动后,您可以:
http://localhost:8000/进行对话。http://localhost:8000/v1/chat/completions。API 测试示例: