Merged
43 commits
a3a71eb
DOC-1145: Add Bare Metal GPU cluster creation article with screenshots
Jan 15, 2026
6075715
Changed links because of new article
Jan 15, 2026
5906f99
DOC-1145: Update create-a-bare-metal-gpu-cluster article with screens…
Jan 15, 2026
7bc3f42
DOC-1145: Add docs.json sidebar entry and cleanup old screenshot paths
Jan 15, 2026
a1c1ab6
DOC-944: Add Spot Bare Metal GPU article
Jan 15, 2026
b6fd81b
DOC-1146: Add Manage a Bare Metal GPU cluster article
Jan 15, 2026
dc51b71
DOC-944: Expand Spot Bare Metal GPU article with detailed information
Jan 15, 2026
a1ec732
DOC-944: Add critical data deletion timeline and billing details
Jan 15, 2026
91f7828
DOC-944: Clarify single notification policy
Jan 15, 2026
c15e9f4
DOC-944: Fix terminology - deletion not suspension
Jan 15, 2026
80ec382
DOC-944: Add payment requirements section
Jan 15, 2026
61798f5
DOC-944: Fix remaining 'suspension' terminology
Jan 15, 2026
edf6dca
DOC-944: Improve UI-to-text correlation
Jan 15, 2026
f034068
DOC-944: Add comprehensive screenshots for Spot article
Jan 15, 2026
fb77952
DOC-944: Remove duplicate screenshots, keep only unique ones
Jan 15, 2026
d87580e
DOC-944: Add distinct screenshots for Spot creation process
Jan 15, 2026
16fd190
docs(edge-ai): update Spot Bare Metal GPU article - Replace duplicate…
Jan 15, 2026
2b29012
docs: update sidebar link to create-a-bare-metal-gpu-cluster
Jan 15, 2026
5855640
docs: sync GPU cloud sidebar with DOC-944 and DOC-1145
Jan 15, 2026
58fc764
docs: sync GPU cloud sidebar with DOC-944 and DOC-1146
Jan 15, 2026
4c64c54
DOC-1146: Update Manage a Bare Metal GPU cluster article with screens…
Jan 16, 2026
105b4a1
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
6f6b702
Merge branch 'DOC-1146' into DOC-1144
Jan 16, 2026
dfc7071
DOC-1149: Add About GPU Cloud article replacing about-our-ai-infrastr…
Jan 16, 2026
4d70d0c
DOC-1146: Streamline Manage a Bare Metal GPU cluster article
Jan 16, 2026
fbe16f8
Merge branch 'DOC-1146' into DOC-1144
Jan 16, 2026
6cb714d
DOC-1145: Streamline Create a Bare Metal GPU cluster article
Jan 16, 2026
50abf6f
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
22385bd
DOC-1145: Integrate Spot GPU link organically, remove Info block
Jan 16, 2026
c33e812
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
2f9b6fb
DOC-1145: Integrate Spot GPU link organically, remove Info block
Jan 16, 2026
f3ae926
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
e7f906b
DOC-1145: Integrate Spot GPU link organically, remove Info block
Jan 16, 2026
9bf4ffe
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
59d0e1f
DOC-1145: Streamline file share integration section
Jan 16, 2026
3f1a73d
Merge branch 'DOC-1145' into DOC-1144
Jan 16, 2026
4a53500
Merge remote-tracking branch 'origin/main' into DOC-1144
Jan 16, 2026
2bd1457
final corrections after proofreading
Jan 17, 2026
4da288f
final corrections after proofreading
Jan 17, 2026
9cc47e4
Final corrections after proofreading
Jan 17, 2026
acab9e2
Polishing
Jan 17, 2026
a0b8a03
Final proofreading
Jan 19, 2026
2e0f2cd
Merge main into DOC-1144, resolve conflict by deleting deprecated cre…
Jan 19, 2026
8 changes: 4 additions & 4 deletions cloud/file-shares/configure-file-shares.mdx
@@ -15,7 +15,7 @@
</Info>

<Tip>
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-an-ai-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
</Tip>

The creation flow for both types starts the same. Use the steps below and follow the instructions for the selected file share type.
@@ -42,7 +42,7 @@
Manually change the existing Bare Metal network interface in the OS settings.
</Info>

## Configure file shares for Linux VMs and Bare Metal

This section describes creating and connecting a standard NFS-based file share using a private network. It can be used with Linux virtual machines or bare-metal servers.

@@ -56,9 +56,9 @@

2. In the **Basic settings** panel, enter _File Share name_, specify _Size_, and select **Standard** as the _File Share type_.

![File Shares Standard 4](/images/file-shares-standard-4.png)

3. In the **Network settings** panel, select the private _Network_ and _Subnetwork_ to use for the file share.

4. In the **Access** panel, click **Add rule** and specify the IP addresses of the resources that should have access to the file share, and their access modes.
5. Set **Additional options**, if required.

@@ -107,7 +107,7 @@
</Info>

<Tip>
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-an-ai-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
</Tip>

### Step 1. Create a VAST share
@@ -116,7 +116,7 @@
2. In the **Basic settings** panel, enter _File Share name_, specify _Size_, and select **VAST** as the _File Share type_.
3. Set **Additional options**, if required.

![File Share Vast](/images/file-share-vast.png)

When the VAST share type is selected, the controls in the **Network settings** and **Access** panels are disabled, and the network is assigned automatically.

@@ -139,19 +139,19 @@

The VAST network only becomes available after the file share has been created. It is a third, dedicated network, distinct from the public and private networks.

![File Share Details](/images/file-share-details.png)

#### Adding a VAST interface to an existing compute resource

While the VAST interface can be attached to an already-provisioned GPU cluster or compute resource, this requires additional manual network configuration and is not the standard workflow.

<Tip>
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-an-ai-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
The best practice is to create VAST shares **when** creating [GPU clusters](/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster) or **before** provisioning the corresponding [compute resources](/cloud/virtual-instances/types-of-virtual-machines) (such as VMs).
</Tip>

**Attach VAST network interface**

1. Go to server **Resource** settings ([VM](/cloud/virtual-instances/create-an-instance), [Bare Metal](/cloud/bare-metal-servers/create-a-bare-metal-server), or [GPU cluster](/edge-ai/ai-infrastructure/create-an-ai-cluster)).
1. Go to server **Resource** settings ([VM](/cloud/virtual-instances/create-an-instance), [Bare Metal](/cloud/bare-metal-servers/create-a-bare-metal-server), or [GPU cluster](/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster)).
2. Select the **Networking** tab and click **Add interface**.
3. Click the **Network** drop-down and select the **VAST network**, then click **Add**.
4. Once the interface is added, **note the following details** for use in subsequent steps:
@@ -172,7 +172,7 @@
3. Note the **interface name** (such as `enp8s0` or `enp4s0`) for use in the following steps.
4. Configure the interface as described below for the relevant instance type.

##### **Configuring VAST interface for VMs and GPU clusters**

Replace `198.51.100.25` with the IP address from Step 2. Replace `enp8s0` with the interface name from above, if different.
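As an illustration, assigning the address to the interface might look like the following (a non-persistent sketch, not the documented procedure; the `/24` prefix is an assumption, and the address and interface name are the placeholders from above):

```shell
# Bring the VAST interface up and assign the IP address from Step 2.
# Replace enp8s0 and 198.51.100.25/24 with your own values.
sudo ip link set enp8s0 up
sudo ip addr add 198.51.100.25/24 dev enp8s0

# Verify that the interface now carries the address:
ip addr show enp8s0
```

These settings do not survive a reboot; use your distribution's network configuration tooling to make them persistent.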

@@ -313,7 +313,7 @@
Always use NFS version 3 (`vers=3`) when mounting VAST file shares. If the system does not support the `nconnect` option, install the [VAST Enhanced NFS Client](https://vastnfs.vastdata.com/docs/4.0/download.html).
</Info>
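For example, a mount command could look like this (a hedged sketch; the server address, export path, mount point, and `nconnect` value are placeholders, use the values from your file share details):

```shell
# Create a mount point and mount the VAST share over NFSv3.
# 198.51.100.10:/exports/share1 and /mnt/vast are placeholders.
sudo mkdir -p /mnt/vast
sudo mount -t nfs -o vers=3,nconnect=16 198.51.100.10:/exports/share1 /mnt/vast

# Confirm the share is mounted:
df -h /mnt/vast
```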

![File Share Mount](/images/file-share-mount.png)

## Resizing file shares

16 changes: 11 additions & 5 deletions docs.json
@@ -642,8 +642,10 @@
{
"group": "GPU cloud",
"pages": [
"edge-ai/ai-infrastructure/about-our-ai-infrastructure",
"edge-ai/ai-infrastructure/create-an-ai-cluster"
"edge-ai/ai-infrastructure/about-gpu-cloud",
"edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster",
"edge-ai/ai-infrastructure/spot-bare-metal-gpu",
"edge-ai/ai-infrastructure/manage-a-bare-metal-gpu-cluster"
]
},
{
@@ -1623,15 +1625,19 @@
},
{
"source": "/docs/cloud/ai-Infrustructure/about-our-ai-infrastructure",
"destination": "/docs/edge-ai/ai-infrastructure/about-our-ai-infrastructure"
"destination": "/docs/edge-ai/ai-infrastructure/about-gpu-cloud"
},
{
"source": "/docs/cloud/ai-Infrustructure/create-an-ai-cluster",
"destination": "/docs/edge-ai/ai-infrastructure/create-an-ai-cluster"
"destination": "/docs/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster"
},
{
"source": "/docs/cloud/ai-Infrustructure/about-virtual-vpod",
"destination": "/docs/edge-ai/ai-infrastructure/about-our-ai-infrastructure"
"destination": "/docs/edge-ai/ai-infrastructure/about-gpu-cloud"
},
{
"source": "/docs/edge-ai/ai-infrastructure/about-our-ai-infrastructure",
"destination": "/docs/edge-ai/ai-infrastructure/about-gpu-cloud"
},
{
"source": "/docs/edge-ai/inference-at-the-edge/:slug*",
92 changes: 92 additions & 0 deletions edge-ai/ai-infrastructure/about-gpu-cloud.mdx
@@ -0,0 +1,92 @@
---
title: About GPU Cloud
sidebarTitle: About GPU Cloud
---

GPU Cloud provides dedicated compute infrastructure for machine learning workloads. Use GPU clusters to train models, run inference, and process large-scale AI tasks.

## What is a GPU cluster

A GPU cluster is a group of interconnected servers, each equipped with multiple high-performance GPUs. Clusters are designed for workloads that require massive parallel processing power, such as training large language models (LLMs), fine-tuning foundation models, running inference at scale, and high-performance computing (HPC) tasks.

<Frame>
<img src="/images/docs/edge-ai/ai-infrastructure/about-gpu-cloud/create-cluster-page.png" alt="GPU Cloud create cluster page showing region selection, cluster type, and GPU configuration options" />
</Frame>

All nodes in a cluster share the same configuration: operating system image, network settings, and storage mounts. This ensures consistent behavior across the cluster.

## Cluster types

Gcore offers two types of GPU clusters:

| Type | Description | Best for |
|------|-------------|----------|
| **Bare Metal GPU** | Dedicated physical servers with guaranteed resources. No virtualization overhead | Production workloads, long-running training jobs, and latency-sensitive inference |
| **Spot Bare Metal GPU** | Same hardware as Bare Metal, but at a reduced price (up to 50% discount). Instances can be preempted with a 24-hour notice when capacity is needed | Fault-tolerant training with checkpointing, batch processing, development, and testing |

<Info>
Spot instances are ideal for workloads that can handle interruptions. When a Spot cluster is reclaimed, you receive an email notification 24 hours before deletion. Use this time to save critical data to file shares or object storage.
</Info>

Clusters can scale to hundreds of nodes. Production deployments with 250+ nodes in a single cluster are supported, limited only by regional stock availability.

## Available configurations

Select a configuration based on your workload requirements:

| Configuration | GPUs | Interconnect | RAM | Storage | Use case |
|--------------|------|--------------|-----|---------|----------|
| H100 with InfiniBand | 8x NVIDIA H100 80GB | 3.2 Tbit/s InfiniBand | 2TB | 8x 3.84TB NVMe | Distributed LLM training requiring high-speed inter-node communication |
| H100 (bm3-ai-ndp) | 8x NVIDIA H100 80GB | 3.2 Tbit/s InfiniBand | 2TB | 6x 3.84TB NVMe | Distributed training and latency-sensitive inference at scale |
| A100 with InfiniBand | 8x NVIDIA A100 80GB | 800 Gbit/s InfiniBand | 2TB | 8x 3.84TB NVMe | Multi-node ML training and HPC workloads |
| A100 without InfiniBand | 8x NVIDIA A100 80GB | 2x 100 Gbit/s Ethernet | 2TB | 8x 3.84TB NVMe | Single-node training, inference for large models requiring more than 48GB VRAM |
| L40S | 8x NVIDIA L40S | 2x 25 Gbit/s Ethernet | 2TB | 4x 7.68TB NVMe | Inference, fine-tuning small to medium models requiring less than 48GB VRAM |

Outbound data transfer (egress) from GPU clusters is free. For pricing details, see [GPU Cloud billing](/edge-ai/billing).

## InfiniBand networking

InfiniBand is a high-bandwidth, low-latency interconnect technology used for communication between nodes in a cluster.

InfiniBand is configured automatically when you create a cluster. If the selected configuration includes InfiniBand network cards, all nodes are placed in the same InfiniBand domain with no manual setup required.

H100 configurations typically have 8 InfiniBand ports per node, each creating a dedicated network interface.

InfiniBand matters most for distributed training, where models that don't fit on a single node require frequent gradient synchronization between GPUs. The same applies to multi-node inference when large models are split across servers. In these cases, InfiniBand reduces communication overhead significantly compared to Ethernet.

For single-node workloads or independent batch jobs that don't require node-to-node communication, InfiniBand provides no benefit. Standard Ethernet configurations work equally well and may be more cost-effective.
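On a node whose configuration includes InfiniBand, the link state can typically be inspected with standard InfiniBand utilities (a hedged sketch; tool availability and interface naming depend on the image):

```shell
# List InfiniBand devices and port state (requires the infiniband-diags package).
ibstat

# Show the per-port network interfaces; IB interface names vary by image.
ip -brief link
```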

## Storage options

GPU clusters support two storage types:

| Storage type | Persistence | Performance | Use case |
|-------------|-------------|-------------|----------|
| Local NVMe | Temporary (deleted with cluster) | Highest IOPS, lowest latency | Training data cache, checkpoints during training |
| File shares | Persistent (independent of cluster) | Network-attached, lower latency than object storage | Datasets, model weights, shared checkpoints |

Learn more about [configuring file shares](/cloud/file-shares/configure-file-shares) for persistent storage and sharing data between nodes.

## Cluster lifecycle

```
Create --> Configure --> Run workloads --> Resize (optional) --> Delete
```

1. **Create**: Select region, GPU type, number of nodes, image, and network settings when [creating a Bare Metal GPU cluster](/edge-ai/ai-infrastructure/create-a-bare-metal-gpu-cluster).

2. **Configure**: Connect via SSH to each node, install required dependencies, and mount file shares to prepare the environment for workloads.

3. **Run workloads**: Execute training jobs, run inference services, and process data.

4. **Resize**: Add or remove nodes based on demand. New nodes inherit the cluster configuration, which you can manage in the [Bare Metal GPU cluster details](/edge-ai/ai-infrastructure/manage-a-bare-metal-gpu-cluster).

5. **Delete**: Remove the cluster when no longer needed. Local storage is erased; file shares remain.
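The Configure step above might look like the following on each node (an illustrative sketch; the user name, addresses, and paths are placeholder assumptions):

```shell
# On your workstation: connect to a cluster node (placeholder address).
ssh ubuntu@203.0.113.10

# On the node: mount a persistent file share for datasets and checkpoints.
sudo mkdir -p /mnt/data
sudo mount -t nfs -o vers=3 192.0.2.5:/exports/datasets /mnt/data

# Verify that the GPUs are visible before starting workloads.
nvidia-smi
```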


<Info>
GPU clusters may take 15–40 minutes to provision, and their configuration (image, network, and storage) is fixed at creation. Local NVMe storage is temporary, so critical data should be saved to persistent file shares. Spot clusters can be interrupted with a 24-hour notice, and cluster size is limited by available regional stock.
</Info>

Hardware firewall support is available on servers equipped with BlueField network cards, enhancing network security for GPU clusters.

33 changes: 0 additions & 33 deletions edge-ai/ai-infrastructure/about-our-ai-infrastructure.mdx

This file was deleted.
