GPU Reset Behavior on Out-of-Memory with Intel Arc and xe Driver #842

@arguellocarlos

Hi there,

I'm looking for some insight into how GPU reset is handled when running into out-of-memory (OOM) issues.

My system is running:

  • Kernel: 6.15.9.arch1-1
  • Intel oneAPI Base Toolkit: 2025.2
  • Intel Compute Runtime: 25.27.34303.5
  • Driver: xe

Hardware:

  • AMD Ryzen 9 9900X
  • Intel Arc B580
  • 48GB DDR5 RAM @ 6200MHz

When I run AI workloads like image generation, the GPU occasionally runs out of memory. When that happens, the entire desktop freezes and becomes completely unresponsive, requiring a hard reboot to recover. I'm particularly wary of hard resets since I have a couple of mechanical drives configured in a RAID array, and I'd really prefer to avoid any risk of data corruption or filesystem damage.
