
Commit e26ff65

TELCODOCS-1786: Updated AMD link
1 parent 9d908f4 commit e26ff65

7 files changed

+173 -2 lines changed

_topic_maps/_topic_map.yml

+2
@@ -3525,6 +3525,8 @@ Topics:
   File: about-hardware-accelerators
 - Name: NVIDIA GPU architecture
   File: nvidia-gpu-architecture
+- Name: AMD GPU Operator
+  File: amd-gpu-operator
 ---
 Name: Backup and restore
 Dir: backup_and_restore

hardware_accelerators/about-hardware-accelerators.adoc

+2 -2
@@ -6,7 +6,7 @@ include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-Specialized hardware accelerators play a key role in the emerging generative artificial intelligence and machine learning (AI/ML) industry. Specifically, hardware accelerators are essential to the training and serving of large language and other foundational models that power this new technology. Data scientists, data engineers, ML engineers, and developers can take advantage of the specialized hardware acceleration for data-intensive transformations and model development and serving. Much of that ecosystem is open source, with a number of contributing partners and open source foundations.
+Specialized hardware accelerators play a key role in the emerging generative artificial intelligence and machine learning (AI/ML) industry. Specifically, hardware accelerators are essential to the training and serving of large language and other foundational models that power this new technology. Data scientists, data engineers, ML engineers, and developers can take advantage of the specialized hardware acceleration for data-intensive transformations and model development and serving. Much of that ecosystem is open source, with several contributing partners and open source foundations.
 
 Red{nbsp}Hat {product-title} provides support for cards and peripheral hardware that add processing units that comprise hardware accelerators:
 
@@ -39,7 +39,7 @@ include::modules/hardware-accelerators.adoc[leveloffset=+1]
 * link:https://docs.redhat.com/en/documentation/red_hat_openshift_ai_self-managed/2-latest/html/introduction_to_red_hat_openshift_ai/index[Introduction to Red Hat OpenShift AI]
 
 * link:https://docs.nvidia.com/datacenter/cloud-native/openshift/latest/index.html[
-NVIDIA GPU Operator on Red Hat OpenShift Container Platform]
+NVIDIA GPU Operator on Red Hat {product-title}]
 
 * link:https://www.amd.com/en/products/accelerators/instinct.html[AMD Instinct Accelerators]
 
@@ -0,0 +1,12 @@
+// Module included in the following assemblies:
+//
+// * hardware_accelerators/amd-gpu-operator.adoc
+
+:_content-type: CONCEPT
+[id="amd-about-amd-gpu-operator_{context}"]
+= About the AMD GPU Operator
+
+The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers who use Red Hat OpenShift AI to create artificial intelligence and machine learning (AI/ML) applications. Using the GPU to accelerate specific functions can reduce CPU processing and memory usage, which improves overall application speed and eases memory and bandwidth constraints.
+
+
+
@@ -0,0 +1,30 @@
+:_mod-docs-content-type: ASSEMBLY
+[id="amd-gpu-operator"]
+= AMD GPU Operator
+include::_attributes/common-attributes.adoc[]
+:context: amd-gpu-operator
+
+toc::[]
+
+Combining AMD Instinct GPU accelerators with the AMD GPU Operator in your {product-title} cluster lets you use GPU computing capabilities for machine learning, generative AI, and other GPU-accelerated applications.
+
+This documentation provides the information you need to enable, configure, and test the AMD GPU Operator. For more information, see link:https://www.amd.com/en/products/accelerators/instinct.html[AMD Instinct™ Accelerators].
+
+:FeatureName: AMD GPU Operator
+
+include::modules/amd-about-amd-gpu-operator.adoc[leveloffset=+1]
+
+include::modules/amd-installing-gpu-operator.adoc[leveloffset=+1]
+
+.Next steps
+
+. Install the xref:../hardware_enablement/psap-node-feature-discovery-operator.adoc#installing-the-node-feature-discovery-operator_node-feature-discovery-operator[Node Feature Discovery Operator].
+
+. Install the xref:../hardware_enablement/kmm-kernel-module-management.adoc#kmm-install_kernel-module-management-operator[Kernel Module Management Operator].
+
+. Install and configure the link:https://instinct.docs.amd.com/projects/gpu-operator/en/main/installation/openshift-olm.html#install-amd-gpu-operator[AMD GPU Operator].
+
+include::modules/amd-testing-the-amd-gpu-operator.adoc[leveloffset=+1]
+
+
+
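The test pod in this assembly requests the `amd.com/gpu` extended resource, so it can only be scheduled after at least one node advertises that resource. The following is a minimal shell sketch, not part of the committed documentation, for checking this before you run the test. It assumes that the AMD GPU Operator exposes GPUs under the `amd.com/gpu` resource name that the test pod requests; `total_amd_gpus` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sum the amd.com/gpu allocatable counts from
# whitespace-separated "<node-name> <count>" lines read on stdin.
# A node that does not advertise the resource prints an empty count,
# so its second field is empty and awk adds it as 0.
total_amd_gpus() {
  awk '{ sum += $2 } END { print sum + 0 }'
}

# Example usage against a cluster (requires oc and a logged-in session):
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.amd\.com/gpu}{"\n"}{end}' \
#     | total_amd_gpus
```

A result of `0` suggests that the `rocminfo` test pod would stay in the `Pending` phase, because no node can satisfy its `amd.com/gpu: 1` request.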
+12
@@ -0,0 +1,12 @@
+// Module included in the following assemblies:
+//
+// * hardware_accelerators/amd-gpu-operator.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="amd-about-amd-gpu-operator_{context}"]
+= About the AMD GPU Operator
+
+The hardware acceleration capabilities of the AMD GPU Operator provide enhanced performance and cost efficiency for data scientists and developers who use Red Hat OpenShift AI to create artificial intelligence and machine learning (AI/ML) applications. Using the GPU to accelerate specific functions can reduce CPU processing and memory usage, which improves overall application speed and eases memory and bandwidth constraints.
+
+
+
+14
@@ -0,0 +1,14 @@
+// Module included in the following assemblies:
+//
+// * hardware_accelerators/amd-gpu-operator.adoc
+
+:_mod-docs-content-type: REFERENCE
+[id="amd-installing-gpu-operator_{context}"]
+= Installing the AMD GPU Operator
+
+As a cluster administrator, you can install the AMD GPU Operator by using the OpenShift CLI and the web console. The installation is a multi-step procedure: you install the Node Feature Discovery Operator, then the Kernel Module Management Operator, and then the AMD GPU Operator. Complete the following steps in order to install the AMD community release of the Operator.
+
+
+
+
+
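Because the AMD GPU Operator is installed only after the Node Feature Discovery and Kernel Module Management Operators, it can help to confirm that each prerequisite Operator finished installing before you continue. The following is a minimal shell sketch under stated assumptions: each Operator's ClusterServiceVersion (CSV) reports `Succeeded` in `.status.phase` when installation completes, `all_succeeded` is a hypothetical helper, and the `openshift-nfd` and `openshift-kmm` namespaces in the usage comments are assumptions, so substitute the namespaces you chose.

```shell
#!/bin/sh
# Hypothetical helper: read whitespace-separated CSV phases on stdin and
# exit 0 only if at least one phase was read and every phase is "Succeeded".
all_succeeded() {
  awk '{ for (i = 1; i <= NF; i++) { seen++; if ($i != "Succeeded") bad++ } }
       END { if (seen > 0 && bad == 0) exit 0; exit 1 }'
}

# Example usage against a cluster (namespaces are assumptions):
#   oc get csv -n openshift-nfd -o jsonpath='{.items[*].status.phase}' | all_succeeded && echo "NFD Operator ready"
#   oc get csv -n openshift-kmm -o jsonpath='{.items[*].status.phase}' | all_succeeded && echo "KMM Operator ready"
```

Treating empty input as a failure avoids reporting success for a namespace where no CSV exists yet.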
@@ -0,0 +1,101 @@
+// Module included in the following assemblies:
+//
+// * hardware_accelerators/amd-gpu-operator.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="amd-testing-the-amd-gpu-operator_{context}"]
+= Testing the AMD GPU Operator
+
+Use the following procedure to verify the AMD GPU Operator installation by running the `rocminfo` utility in a pod and viewing its log output for an AMD Instinct MI210 GPU.
+
+.Procedure
+
+. Create a YAML file that defines a pod that runs `rocminfo`:
++
+[source,terminal]
+----
+$ cat << EOF > rocminfo.yaml
+
+apiVersion: v1
+kind: Pod
+metadata:
+  name: rocminfo
+spec:
+  containers:
+  - image: docker.io/rocm/pytorch:latest
+    name: rocminfo
+    command: ["/bin/sh","-c"]
+    args: ["rocminfo"]
+    resources:
+      limits:
+        amd.com/gpu: 1
+      requests:
+        amd.com/gpu: 1
+  restartPolicy: Never
+EOF
+----
+
+. Create the `rocminfo` pod:
++
+[source,terminal]
+----
+$ oc create -f rocminfo.yaml
+----
++
+.Example output
+[source,terminal]
+----
+pod/rocminfo created
+----
+
+. Check the `rocminfo` log for the agents that are detected on a node with one MI210 GPU:
++
+[source,terminal]
+----
+$ oc logs rocminfo | grep -A5 "Agent"
+----
++
+.Example output
+[source,terminal]
+----
+HSA Agents
+==========
+*******
+Agent 1
+*******
+Name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+Uuid: CPU-XX
+Marketing Name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+Vendor Name: CPU
+--
+Agent 2
+*******
+Name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+Uuid: CPU-XX
+Marketing Name: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz
+Vendor Name: CPU
+--
+Agent 3
+*******
+Name: gfx90a
+Uuid: GPU-024b776f768a638b
+Marketing Name: AMD Instinct MI210
+Vendor Name: AMD
+----
+
+. Delete the pod:
++
+[source,terminal]
+----
+$ oc delete -f rocminfo.yaml
+----
++
+.Example output
+[source,terminal]
+----
+pod "rocminfo" deleted
+----
+
+
+
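The `oc logs` step in the testing procedure assumes that the pod has already run to completion, and it relies on reading the output by eye. The following minimal shell sketch checks the result automatically. It assumes, as in the example output, that GPU agents report `Vendor Name: AMD` and CPU agents report `Vendor Name: CPU`; `count_amd_gpu_agents` is a hypothetical helper, and the `oc wait --for=jsonpath=...` form in the usage comment requires an `oc` client recent enough to support JSONPath wait conditions.

```shell
#!/bin/sh
# Hypothetical helper: count rocminfo agents whose "Vendor Name" field is AMD.
# Reads rocminfo output on stdin and prints the count (0 if none are found).
# Lines such as "Marketing Name: AMD Instinct MI210" are not counted because
# their first field is "Marketing", not "Vendor".
count_amd_gpu_agents() {
  awk '$1 == "Vendor" && $2 == "Name:" && $3 == "AMD" { n++ } END { print n + 0 }'
}

# Example usage against a cluster:
#   oc wait --for=jsonpath='{.status.phase}'=Succeeded pod/rocminfo --timeout=300s
#   oc logs rocminfo | count_amd_gpu_agents
# For the single-MI210 example output in the procedure, the count is 1.
```

Comparing the printed count with the number of GPUs you expect on the node gives a quick pass or fail signal for the test.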
