-
Notifications
You must be signed in to change notification settings - Fork 655
feat: add HuggingFace cache checking to sanity_check.py #3890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Keiven Chang <[email protected]>
WalkthroughThe changes add a new Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Areas requiring extra attention:
Poem
Pre-merge checks✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
deploy/sanity_check.py (4)
16-16: Standardize “Hugging Face” naming and sync help text.Use “Hugging Face” consistently and align the options text with argparse help.
- - HuggingFace model cache (detailed with --thorough-check) + - Hugging Face model cache (detailed with --thorough-check) @@ - --thorough-check Enable thorough checking (file permissions, directory sizes, HuggingFace model details) + --thorough-check Enable thorough checking (file permissions, directory sizes, disk space, Hugging Face model details)Also applies to: 92-93
1361-1367: Tighten exceptions, add debug logs, and fix unused loop var (ruff).Narrow broad excepts, add debug logging instead of silent pass, and rename dirnames to _dirnames.
- try: - stat_info = os.stat(item_path) + try: + stat_info = os.stat(item_path) # Use the earlier of creation time or modification time download_time = min(stat_info.st_ctime, stat_info.st_mtime) download_date = self._format_timestamp_pdt(download_time) - except Exception: + except OSError as e: + logging.debug("HF cache: stat failed for %s: %s", item_path, e) download_date = "unknown" @@ - except Exception: - pass + except OSError as e: + logging.debug("HF cache: listing failed for %s: %s", cache_path, e) @@ - for dirpath, dirnames, filenames in os.walk(directory): + for dirpath, _dirnames, filenames in os.walk(directory): for filename in filenames: @@ - except Exception: - pass + except OSError as e: + logging.debug("HF cache: size scan failed for %s: %s", directory, e)Based on static analysis hints.
Also applies to: 1370-1377, 1386-1389, 1394-1395
1308-1319: Use model name as the node label for readability.Makes the tree easier to scan than “Model N”.
- model_node = NodeInfo( - label=f"Model {i+1}", - desc=f"{model_name}, downloaded={download_date}, size={size_str}", - status=NodeStatus.INFO, - ) + model_node = NodeInfo( + label=model_name, + desc=f"downloaded={download_date}, size={size_str}", + status=NodeStatus.INFO, + )
1320-1329: Optional: also show when HF_TOKEN is not set.If helpful, add a small INFO/WARNING when HF_TOKEN is unset to hint at auth-protected models.
def _add_hf_token_info(self): """Add HF_TOKEN information if the environment variable is set.""" - if os.environ.get("HF_TOKEN"): + if os.environ.get("HF_TOKEN"): token_node = NodeInfo( label="HF_TOKEN", desc="<set>", status=NodeStatus.INFO, ) self.add_child(token_node) + else: + self.add_child(NodeInfo(label="HF_TOKEN", desc="not set", status=NodeStatus.INFO))
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deploy/dynamo_check.py(0 hunks)deploy/sanity_check.py(5 hunks)
💤 Files with no reviewable changes (1)
- deploy/dynamo_check.py
🧰 Additional context used
🪛 Ruff (0.14.1)
deploy/sanity_check.py
1365-1365: Do not catch blind exception: Exception
(BLE001)
1372-1372: Do not catch blind exception: Exception
(BLE001)
1376-1377: try-except-pass detected, consider logging the exception
(S110)
1376-1376: Do not catch blind exception: Exception
(BLE001)
1386-1386: Loop control variable dirnames not used within loop body
Rename unused dirnames to _dirnames
(B007)
1394-1395: try-except-pass detected, consider logging the exception
(S110)
1394-1394: Do not catch blind exception: Exception
(BLE001)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (4)
deploy/sanity_check.py (4)
19-26: Docs update looks good.Clear rationale for standalone behavior and hard-coded paths.
47-50: Example output changes look fine.
57-57: Example HF cache line looks good.Will reflect accurate counts after model filtering fix below.
337-339: Integration into SystemInfo is correct.Runs in non-terse mode; respects --thorough-check.
- Only count models--* directories, excluding datasets--, spaces--, blobs - Gate size calculation on thorough_check flag to keep default mode fast - Add compute_sizes parameter with documentation Signed-off-by: Keiven Chang <[email protected]>
Overview:
Add HuggingFace model cache checking to sanity_check.py for better pre-deployment validation.
Details:
HuggingFaceInfoclass to check~/.cache/huggingface/hub--thorough-checkHF_TOKENenvironment variable statusdeploy/dynamo_check.pyWhere should the reviewer start?
HuggingFaceInfoclass indeploy/sanity_check.py(lines 1242-1410)Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
/coderabbit profile chill
Summary by CodeRabbit