feat: vLLM도입 (close #7) #10

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Merged

sunnyanna0 merged 1 commit into main from feature/#4-자체모델

Jul 13, 2025

Member

sunnyanna0 commented Jul 13, 2025

⭐Key Changes

vLLM GCP서버에 설치 후, vLLM 적용하였습니다.

🖼️ 테스트 결과

추론 지연 시간 감소: 첫 토큰 응답 시간 기준 평균 6.2초 → 1.4초, 최대 ~77% 개선
GPU 메모리 절감: paged-attention 구조를 활용해 메모리 사용량 최대 20% 절약
멀티유저 처리 가능성 증가: 동시에 여러 요청을 처리할 수 있어 확장성 확보
뉴스 재구성 서비스 안정화: 길고 복잡한 뉴스 요약 및 변환 시, 일관되고 빠른 응답 제공

📌 issue

close ✨ [Feature] - vLLM 기반 LLM 추론 구조로 전환 #7


          feat: vLLM도입 (close #7)

58a050f

sunnyanna0 self-assigned this

sunnyanna0 merged commit e980e88 into main

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet