Doc 1144 #1737
…hots
- Add step-by-step screenshots for cluster creation workflow
- Add cluster architecture section with InfiniBand and NCCL explanations
- Add file share integration section with persistence details
- Add network settings table with use cases
- Add firewall settings section (conditional)
- Add API automation section
- Add verification commands (nvidia-smi, file share mount check)
- Fix terminology explanations (InfiniBand, NCCL, DDP, Slurm)
- Fix style guide compliance (impersonal voice, present tense)
- Move screenshots to proper directory structure

- Add create-a-bare-metal-gpu-cluster to GPU cloud sidebar
- Remove old screenshots from incorrect location (images/edge-ai/)
- Screenshots moved to proper structure (images/docs/edge-ai/ai-infrastructure/)

- Create dedicated article for Spot Bare Metal GPU clusters
- Explain reclamation process (24-hour notice, email notification)
- Document data preservation (file shares, object storage not affected)
- Add best practices for checkpointing and interruption handling
- Add screenshots for Spot selector and warning dialog
- Add to GPU cloud sidebar

- Create dedicated article for post-creation cluster management
- Document cluster details page navigation
- Add resize operations (scale up, scale down, delete specific node)
- Add power actions (individual and bulk)
- Document network interface management
- Add console access instructions
- Document tags and user actions log
- Add cluster deletion with warnings
- List current limitations
- Add to GPU cloud sidebar

- Add Spot vs On-demand comparison table
- Explain capacity source difference (dedicated vs unused capacity)
- Expand reclamation process with warning block
- Add detailed terms from UI warning
- Enhance best practices with specific recommendations
- Add checkpoint interval guidance (1-4 hours)
- Improve workload suitability guidance

- Add data deletion timeline table (immediate for worker nodes, 48h for volumes)
- Clarify that data deletion happens immediately upon suspension, not after 24h
- Add billing details (per minute, aggregated hourly, entire node)
- Restructure data preservation section with actionable strategies
- Add pre-reclamation transfer recommendation

- Add explicit note that only one email is sent (no follow-up reminders)
- Clarify that deletion happens without additional warnings after 24h

- Replace 'suspension' with 'deletion' per SME interview
- Spot reclamation is direct deletion, not account suspension
- Simplify data deletion table (remove 48h volumes row, not applicable)
- Clarify that local NVMe is erased as part of deletion process

- Add minimum balance requirement (500 EUR/USD for card payments)
- Add bank transfer option with Sales contact
- Link to GPU Cloud billing page

- Clarify what appears in cluster type selector
- Describe warning banner visual appearance (yellow)
- Specify that flavor card shows hourly and monthly rates

- Add GPU cluster type selector screenshot
- Add Spot selected with warning banner screenshot
- Add Spot flavor card with pricing screenshot
- Add cluster capacity section overview screenshot
- Replace old screenshots with higher quality versions
- Improve UI-to-text correlation throughout article
- Add 'Out of Stock' explanation in Availability section

- Remove spot-selected-with-warning.png (duplicated selector + warning)
- Remove cluster-capacity-section.png (duplicated selector view)
- Keep only 3 unique screenshots: selector, warning banner, price
- Update article to remove reference to deleted screenshot

- Add step-region.png showing region selector
- Add step-gpu-cluster-type.png showing Spot selected with warning
- Update Availability section with more informative screenshot
- Add region screenshot to Creating section step 3
- Remove duplicate gpu-cluster-type-selector.png

… screenshots with unique ones
- Remove redundant content and repetitions
- Simplify text structure, remove bullet pseudo-headers
- Fix broken link to object storage
- Remove Best practices section

…hots and style improvements
# Conflicts: # docs.json
# Conflicts: # edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster.mdx
# Conflicts: # edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster.mdx
…ate-an-ai-cluster.mdx
Summary
This PR replaces the outdated "Create an AI cluster" article with three focused articles covering the complete GPU cluster lifecycle:
Changes by article
DOC-1145: Create a Bare Metal GPU cluster
DOC-944: Spot Bare Metal GPU
DOC-1146: Manage a Bare Metal GPU cluster
Other changes
- docs.json sidebar with new article entries under GPU cloud group
- getting-started.mdx and configure-file-shares.mdx for consistency
Stats