-
Notifications
You must be signed in to change notification settings - Fork 655
fix: consistent model recipes and update simplified doc #3858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughComprehensive documentation rewrite for recipes with expanded guides, prerequisites, and deployment instructions. New Kubernetes Job manifest added for automated model caching. Script updated to dynamically reference job names instead of using hardcoded values. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes The changes are heterogeneous in nature (documentation, infrastructure configuration, and script logic), but each component involves straightforward modifications without complex reasoning requirements. Documentation rewrites lack logical density, YAML configuration is standard Kubernetes manifest structure, and the script change is a direct refactor from hardcoded to dynamic naming. Poem
Pre-merge checks❌ Failed checks (2 warnings, 1 inconclusive)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (3)
recipes/deepseek-r1/model-cache/model-download.yaml (3)
19-19: Use pinned container image digest for reproducibility.The image
python:3.10-slimuses a mutable tag, so different runs could pull different versions. Use a specific version tag or SHA256 digest to ensure consistent builds and easier debugging.Example:
- image: python:3.10-slim + image: python:3.10-slim@sha256:YOUR_DIGEST_HERE
15-40: Add security context to restrict privilege escalation and enforce least-privilege.The container currently runs as root without privilege escalation restrictions. Add a securityContext to align with Kubernetes security best practices:
spec: restartPolicy: Never + securityContext: + runAsNonRoot: true + runAsUser: 1000 + fsGroup: 1000 containers: - name: model-download image: python:3.10-slim command: ["sh", "-c"] + securityContext: + allowPrivilegeEscalation: false + capabilities: + drop: + - ALLNote: If the model download requires root access, document why and accept the security trade-off explicitly.
17-37: Add resource requests and limits to prevent unbounded cluster resource consumption.The container lacks CPU/memory resource constraints, which can destabilize the cluster if the download is large or the pip install is resource-intensive.
containers: - name: model-download image: python:3.10-slim + resources: + requests: + cpu: "2" + memory: "4Gi" + limits: + cpu: "4" + memory: "8Gi" command: ["sh", "-c"]Adjust values based on your model size and cluster capacity.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
recipes/README.md(1 hunks)recipes/deepseek-r1/model-cache/model-download.yaml(1 hunks)recipes/run.sh(1 hunks)
🧰 Additional context used
🪛 Checkov (3.2.334)
recipes/deepseek-r1/model-cache/model-download.yaml
[medium] 3-44: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[medium] 3-44: Minimize the admission of root containers
(CKV_K8S_23)
🪛 markdownlint-cli2 (0.18.1)
recipes/README.md
320-320: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
|
Could you add some detail to the title? |
|
Should we also put the model-cache and model-download in the same yaml so they get executed at once instead of two commands? |
| kubectl apply -f hf_hub_secret/hf_hub_secret.yaml -n ${NAMESPACE} | ||
|
|
||
| # Deploy model with automatic download and benchmarking | ||
| ./run.sh --model llama-3-70b --framework vllm agg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel agg should also be a flag. It should be clear you have to provide 3 args - --model, --framework, --deployment or something like that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added --deployment
| ./run.sh --model llama-3-70b --framework vllm agg | ||
|
|
||
| # Or skip model download if model has been already downloaded to model cache PVC | ||
| ./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the model download technically be a no-op? Is there harm in having it run if it will just exit successfully if it finds the model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, removed the slip-model-cache
| kubectl delete namespace $NAMESPACE | ||
| ``` | ||
|
|
||
| ## Contributing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm glad we added a contributing. But its 500 lines down at the bottom. Maybe its own CONTRIBUTING.md
| kubectl exec -n $NAMESPACE -it $(kubectl get pods -n $NAMESPACE -l job-name=$PERF_JOB_NAME -o jsonpath='{.items[0].metadata.name}') -- ls -la /model-cache/perf/ | ||
| ``` | ||
|
|
||
| ## Model-Specific Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just add these model specific examples as ReadMe.md under the model folder itself. This markdown is too long now.
| # Replace "your-storage-class-name" with your actual storage class | ||
|
|
||
| ## Model Download |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be step 1 in manual model deployment?
| kubectl apply -n $NAMESPACE -f <model>/<framework>/<mode>/perf.yaml | ||
| ``` | ||
|
|
||
| ## Prerequisites |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could just be a link to somewhere with a good explanation? Do we need this extra text here? Someone can just go to the link and make sure its setup. Maybe you could have a command or two to check that thing are setup properly but don't think we should have an entire explanation here. It takes up too much valuable readme space.
| ./run.sh --skip-model-cache --model llama-3-70b --framework vllm agg | ||
| ``` | ||
|
|
||
| ### Option 2: Manual Deployment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this manual deployment. Short and sweet. Do we need the other ones with all the extra information? Maybe that can be a different markdown? These are basically just duplicating info so we should just keep one or the other.
See
Manual Model Deployment
|
Main feedback is the Readme needs to be much more concise. There is a lot of extras in there I think you can just remove. |
Overview:
Details:
closes nvbug: https://nvbugspro.nvidia.com/bug/5609103
closes: DYN-1338
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Documentation
New Features
Bug Fixes