Commit a16f0c4

Add nouveau issue to troubleshooting.rst
Signed-off-by: Andrew Chen <[email protected]>
1 parent baf96a6 commit a16f0c4

1 file changed

+23
-0
lines changed

gpu-operator/troubleshooting.rst

Lines changed: 23 additions & 0 deletions
@@ -26,6 +26,29 @@ If you are facing an issue that is not covered by this page, please file an issu
`NVIDIA GPU Operator GitHub repository <https://github.com/NVIDIA/gpu-operator/issues>`_.


**************************************************
The ``nouveau`` driver fails to initialize the GPU
**************************************************

.. rubric:: Observation
   :class: h4

- The GPU driver fails to initialize the GPU, and the system journal logs contain the error ``Failed to enable MSI-X``.
- All GPU Operator pods become stuck in the ``Init`` state.

.. rubric:: Root Cause
   :class: h4

- The ``nouveau`` Linux kernel module is loaded.
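
To confirm this root cause on an affected node, you can check whether the module is loaded and search the kernel log for the error. The following is a minimal sketch rather than an official diagnostic. The ``nouveau_loaded`` helper name is hypothetical, and the sketch assumes a Linux host where ``/proc/modules`` is readable and ``journalctl`` is installed.

.. code-block:: shell

   # Hypothetical helper: succeeds if nouveau appears in the given module
   # list. Defaults to /proc/modules, the same data that lsmod reports.
   nouveau_loaded() {
       modules_file="${1:-/proc/modules}"
       [ -r "$modules_file" ] && grep -q '^nouveau ' "$modules_file"
   }

   if nouveau_loaded; then
       echo "nouveau is loaded"
   else
       echo "nouveau is not loaded"
   fi

   # Search kernel messages in the journal for the MSI-X error, if possible.
   if command -v journalctl >/dev/null 2>&1; then
       journalctl -k --no-pager | grep -i 'Failed to enable MSI-X' \
           || echo "MSI-X error not found in the kernel log"
   fi

If the helper reports that ``nouveau`` is loaded and the MSI-X error appears in the log, the node most likely matches this issue.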

.. rubric:: Action
   :class: h4

The ``nouveau`` driver must be blacklisted when using NVIDIA vGPU.

To resolve this issue, follow the instructions in the `NVIDIA AI Enterprise: VMware Deployment Guide <https://docs.nvidia.com/ai-enterprise/deployment/vmware/latest/nouveau.html#disable-nouveau>`_
to disable ``nouveau`` on your operating system or distribution.
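
The linked guide remains the authoritative procedure. As a rough illustration only, on many Linux distributions disabling ``nouveau`` comes down to writing a ``modprobe`` configuration file, rebuilding the initramfs, and rebooting. In the sketch below, the ``write_nouveau_blacklist`` helper and the ``blacklist-nouveau.conf`` file name are assumptions (the file name is a common convention, not a requirement), and the initramfs command depends on the distribution.

.. code-block:: shell

   # Hypothetical helper: writes a modprobe configuration file that
   # blacklists nouveau into the given directory (default: /etc/modprobe.d).
   write_nouveau_blacklist() {
       conf_dir="${1:-/etc/modprobe.d}"
       printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
           > "$conf_dir/blacklist-nouveau.conf"
   }

   # Typical use, run as root, followed by an initramfs rebuild and a reboot:
   #   write_nouveau_blacklist
   #   update-initramfs -u    # Debian and Ubuntu
   #   dracut --force         # RHEL and similar distributions
   #   reboot

Rebuilding the initramfs matters because the module can otherwise still be loaded from the early boot image before the configuration file is read.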

***********************************
GPU Operator pods are stuck in Init
***********************************
