feat: add HuggingFace cache checking to sanity_check.py #3890

keivenchang · 2025-10-24T23:33:05Z

Overview:

Add HuggingFace model cache checking to sanity_check.py for better pre-deployment validation.

Details:

- Add HuggingFaceInfo class to check ~/.cache/huggingface/hub
- Display model count in default mode, detailed info with --thorough-check
- Check HF_TOKEN environment variable status
- Update documentation and example output
- Remove deprecated deploy/dynamo_check.py

Where should the reviewer start?

HuggingFaceInfo class in deploy/sanity_check.py (lines 1242-1410)

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

/coderabbit profile chill

Summary by CodeRabbit

New Features
- Deployment verification now includes HuggingFace model cache analysis.
- Thorough-check mode displays detailed HuggingFace model cache information and token status.
- Help documentation updated to reflect HuggingFace cache monitoring.

Signed-off-by: Keiven Chang <[email protected]>

coderabbitai · 2025-10-24T23:38:01Z

Walkthrough

The changes add a new HuggingFaceInfo class to analyze and report HuggingFace model cache state as part of the diagnostic system. This class integrates into the SystemInfo diagnostic tree, with optional detailed model enumeration in thorough-check mode. Help text is updated to reflect this addition.

Changes

Cohort / File(s)	Summary
HuggingFace Cache Diagnostics `deploy/sanity_check.py`	Adds `HuggingFaceInfo` class to detect and report cached Hugging Face models under `~/.cache/huggingface/hub`, with optional thorough-check mode for detailed model enumeration and HF_TOKEN indicator. Integrates into `SystemInfo` construction via `add_child()`. Updates help text and intro narrative to reflect HuggingFace cache inspection. Note: Duplicate class definition present in file.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring extra attention:

- Duplicate HuggingFaceInfo class definition: two identical class definitions appear in the same file; verify intent and consolidate or remove one copy.
- Integration into SystemInfo subtree: confirm the add_child() call is correctly positioned and doesn't introduce circular dependencies.
- Cache path assumptions: validate that ~/.cache/huggingface/hub path handling is robust across platforms and edge cases.
- HF_TOKEN handling: ensure sensitive token information is appropriately handled and not leaked in output.

Poem

🐰 A cache so deep, with models galore,
HuggingFace treasures we now explore!
Though duplicates dance where one should be,
Our diagnostics bloom, for all to see! 🌱

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title Check	✅ Passed	The PR title "feat: add HuggingFace cache checking to sanity_check.py" directly and clearly describes the primary change in the changeset. According to the raw summary and PR objectives, the main contribution is the introduction of a new HuggingFaceInfo class that adds HuggingFace model cache checking capability to the sanity_check.py file. The title is concise, specific, and uses conventional commit formatting (feat:), making it easy for teammates scanning history to understand the main addition. While the PR also includes the deletion of deploy/dynamo_check.py and documentation updates, these are secondary to the primary feature being added, which the title appropriately captures.
Description Check	✅ Passed	The PR description follows the required template structure with all key sections properly filled out. The Overview section provides a clear summary of the purpose, the Details section outlines specific changes including the new HuggingFaceInfo class, display modes, HF_TOKEN checking, and removal of the deprecated file, and the "Where should the reviewer start?" section helpfully points reviewers to the primary code location (lines 1242-1410). The Related Issues section is present as required, though empty, which is acceptable if no GitHub issues are associated. The description provides sufficient context and specificity for reviewers to understand the changes without being overly verbose.
Docstring Coverage	✅ Passed	Docstring coverage is 81.82% which is sufficient. The required threshold is 80.00%.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (4)

deploy/sanity_check.py (4)

16-16: Standardize “Hugging Face” naming and sync help text.

Use “Hugging Face” consistently and align the options text with argparse help.

- - HuggingFace model cache (detailed with --thorough-check)
+ - Hugging Face model cache (detailed with --thorough-check)
@@
-    --thorough-check  Enable thorough checking (file permissions, directory sizes, HuggingFace model details)
+    --thorough-check  Enable thorough checking (file permissions, directory sizes, disk space, Hugging Face model details)

Also applies to: 92-93

1361-1367: Tighten exceptions, add debug logs, and fix unused loop var (ruff).

Narrow broad excepts, add debug logging instead of silent pass, and rename dirnames to _dirnames.

                         try:
                             stat_info = os.stat(item_path)
                             # Use the earlier of creation time or modification time
                             download_time = min(stat_info.st_ctime, stat_info.st_mtime)
                             download_date = self._format_timestamp_pdt(download_time)
-                        except Exception:
+                        except OSError as e:
+                            logging.debug("HF cache: stat failed for %s: %s", item_path, e)
                             download_date = "unknown"
@@
-        except Exception:
-            pass
+        except OSError as e:
+            logging.debug("HF cache: listing failed for %s: %s", cache_path, e)
@@
-            for dirpath, dirnames, filenames in os.walk(directory):
+            for dirpath, _dirnames, filenames in os.walk(directory):
                 for filename in filenames:
@@
-        except Exception:
-            pass
+        except OSError as e:
+            logging.debug("HF cache: size scan failed for %s: %s", directory, e)

Based on static analysis hints.

Also applies to: 1370-1377, 1386-1389, 1394-1395
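Taken together, the suggestions above (narrow `OSError`, debug logging, `_dirnames`) could look roughly like the following standalone sketch. The function name directory_size_bytes is hypothetical, and the use of `os.lstat` is an assumption that is not part of the suggested diff: it avoids following the symlinks that the Hugging Face cache places under snapshots/, so blob contents are not counted twice.

```python
import logging
import os


def directory_size_bytes(directory: str) -> int:
    """Sum file sizes under `directory`, logging (not raising) on OS errors."""
    total = 0

    def _on_walk_error(err: OSError) -> None:
        # os.walk swallows listing errors unless an onerror callback is given.
        logging.debug("HF cache: size scan failed for %s: %s", directory, err)

    for dirpath, _dirnames, filenames in os.walk(directory, onerror=_on_walk_error):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            try:
                # lstat: measure the link itself rather than its target.
                total += os.lstat(file_path).st_size
            except OSError as e:
                logging.debug("HF cache: stat failed for %s: %s", file_path, e)
    return total
```

A missing or unreadable directory yields 0 rather than an exception, which matches the best-effort behavior of the original `except ... pass` while still leaving a trace at debug level.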

1308-1319: Use model name as the node label for readability.

Makes the tree easier to scan than “Model N”.

-            model_node = NodeInfo(
-                label=f"Model {i+1}",
-                desc=f"{model_name}, downloaded={download_date}, size={size_str}",
-                status=NodeStatus.INFO,
-            )
+            model_node = NodeInfo(
+                label=model_name,
+                desc=f"downloaded={download_date}, size={size_str}",
+                status=NodeStatus.INFO,
+            )
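The diff does not show how model_name is derived. As a sketch only, assuming the usual hub cache layout in which a repository `org/name` is stored as a directory named `models--org--name`, a readable label could be recovered as below. The helper name repo_id_from_cache_dir is hypothetical.

```python
from typing import Optional


def repo_id_from_cache_dir(dirname: str) -> Optional[str]:
    """Map a cache directory name such as 'models--org--name' to 'org/name'.

    Returns None for entries that are not model directories.
    """
    prefix = "models--"
    if not dirname.startswith(prefix):
        return None
    # The cache layout replaces "/" in the repo id with "--".
    return dirname[len(prefix):].replace("--", "/")
```

One caveat: a repository name that itself contains `--` cannot be distinguished from a path separator by this simple replacement, so the result is a display label rather than a guaranteed repo id.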

1320-1329: Optional: also show when HF_TOKEN is not set.

If helpful, add a small INFO/WARNING when HF_TOKEN is unset to hint at auth-protected models.

     def _add_hf_token_info(self):
         """Add HF_TOKEN information if the environment variable is set."""
         if os.environ.get("HF_TOKEN"):
             token_node = NodeInfo(
                 label="HF_TOKEN",
                 desc="<set>",
                 status=NodeStatus.INFO,
             )
             self.add_child(token_node)
+        else:
+            self.add_child(NodeInfo(label="HF_TOKEN", desc="not set", status=NodeStatus.INFO))
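The same idea, isolated from the NodeInfo tree, can be sketched as a pure function that reports only whether the token is present and never echoes its value. The name hf_token_status is hypothetical; the `<set>` and `not set` strings are taken from the diff above.

```python
import os
from typing import Mapping, Optional


def hf_token_status(env: Optional[Mapping[str, str]] = None) -> str:
    """Describe HF_TOKEN presence without revealing the token itself."""
    env = os.environ if env is None else env
    # An empty string is treated the same as an unset variable.
    return "<set>" if env.get("HF_TOKEN") else "not set"
```

Keeping the check separate from output formatting makes it straightforward to verify that the secret never reaches the rendered tree.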

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 48b622c and 1e91bb4.

📒 Files selected for processing (2)

deploy/dynamo_check.py (0 hunks)
deploy/sanity_check.py (5 hunks)

💤 Files with no reviewable changes (1)

deploy/dynamo_check.py

🧰 Additional context used

🪛 Ruff (0.14.1)

deploy/sanity_check.py

1365-1365: Do not catch blind exception: Exception

(BLE001)

1372-1372: Do not catch blind exception: Exception

(BLE001)

1376-1377: try-except-pass detected, consider logging the exception

(S110)

1376-1376: Do not catch blind exception: Exception

(BLE001)

1386-1386: Loop control variable dirnames not used within loop body

Rename unused dirnames to _dirnames

(B007)

1394-1395: try-except-pass detected, consider logging the exception

(S110)

1394-1394: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: vllm (arm64)
GitHub Check: operator (arm64)
GitHub Check: operator (amd64)
GitHub Check: Build and Test - dynamo

🔇 Additional comments (4)

deploy/sanity_check.py (4)

19-26: Docs update looks good.

Clear rationale for standalone behavior and hard-coded paths.

47-50: Example output changes look fine.

57-57: Example HF cache line looks good.

Will reflect accurate counts after model filtering fix below.

337-339: Integration into SystemInfo is correct.

Runs in non-terse mode; respects --thorough-check.

deploy/sanity_check.py

- Only count models--* directories, excluding datasets--, spaces--, blobs
- Gate size calculation on thorough_check flag to keep default mode fast
- Add compute_sizes parameter with documentation

Signed-off-by: Keiven Chang <[email protected]>
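A minimal sketch of what this commit message describes, assuming a flat listing of the cache directory: only `models--*` directories are counted, and sizes are computed only when requested. The function names and return shape are hypothetical; only the compute_sizes parameter name comes from the commit message.

```python
import os
from typing import List, Optional, Tuple


def _dir_size(path: str) -> int:
    """Best-effort sum of file sizes under `path` (symlinks not followed)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                continue
    return total


def list_cached_models(
    cache_path: str, compute_sizes: bool = False
) -> List[Tuple[str, Optional[int]]]:
    """Return (directory name, size or None) for each models--* directory."""
    models: List[Tuple[str, Optional[int]]] = []
    try:
        entries = sorted(os.listdir(cache_path))
    except OSError:
        return models  # Missing or unreadable cache: report no models.
    for entry in entries:
        full_path = os.path.join(cache_path, entry)
        # Skip datasets--*, spaces--*, blobs, lock files, and plain files.
        if not entry.startswith("models--") or not os.path.isdir(full_path):
            continue
        size = _dir_size(full_path) if compute_sizes else None
        models.append((entry, size))
    return models
```

In default mode a caller would pass `compute_sizes=False`, so the scan costs a single directory listing; `--thorough-check` would pass `True` and accept the full recursive walk.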

feat: add HuggingFace cache checking to sanity_check.py

1e91bb4

Signed-off-by: Keiven Chang <[email protected]>

keivenchang requested a review from a team as a code owner October 24, 2025 23:33

pull-request-size bot added the size/L label Oct 24, 2025

github-actions bot added the feat label Oct 24, 2025

keivenchang self-assigned this Oct 24, 2025

keivenchang requested review from biswapanda, hhzhang16 and mohammedabdulwahhab October 24, 2025 23:34

coderabbitai bot reviewed Oct 24, 2025

copy-pr-bot bot temporarily deployed to GITLAB October 25, 2025 02:53 Inactive

copy-pr-bot bot deployed to GITLAB October 25, 2025 02:54 Active
