
GPU hangs when using GLK #749

Open
chrisdecker08 opened this issue Jul 23, 2024 · 5 comments

Comments

@chrisdecker08

Users have been seeing errors when tonemapping with FFmpeg on kernel 5.17 or newer. I have attached the log from one such user, captured with PrintDebugMessages and PrintIoctlEntries enabled. The line that caught my eye is "ERROR: GPU HANG detected!". The user is running Ubuntu 22.04.4 LTS (kernel 6.5.0).

log.txt

@eero-t

eero-t commented Aug 23, 2024

Is this with the 5.17 version of the i915 kernel GPU driver (which would be very old), or with a newer i915 DKMS package? If the latter, which version?

@chrisdecker08
Author

How would I figure that out?

@eero-t

eero-t commented Aug 23, 2024

I missed that the log was from a 6.5 kernel, which is considerably newer. `dkms status` tells whether any DKMS kernel modules are installed, and the related DEB packages can be listed with `dpkg -l '*dkms*'`.
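A minimal sketch that gathers the details asked for above in one go, assuming a Debian/Ubuntu system. The function name `report_driver_info` is hypothetical; `uname -r`, `dkms status` and `dpkg -l` are the standard commands, and the script simply skips whichever tool is not installed.

```shell
#!/bin/sh
# Sketch: collect kernel and DKMS details for a bug report.
# report_driver_info is a hypothetical helper name; assumes Debian/Ubuntu tooling.

report_driver_info() {
    # Running kernel release, e.g. the 6.5.0 series from the log.
    echo "Kernel: $(uname -r)"

    # List DKMS-built kernel modules, if the dkms tool is installed at all.
    if command -v dkms >/dev/null 2>&1; then
        echo "DKMS modules:"
        dkms status
    else
        echo "DKMS modules: dkms not installed"
    fi

    # Quote the glob so the shell passes it to dpkg instead of expanding it
    # against files in the current directory. dpkg exits non-zero when
    # nothing matches, so print a placeholder in that case.
    if command -v dpkg >/dev/null 2>&1; then
        echo "DKMS packages:"
        dpkg -l '*dkms*' 2>/dev/null || echo "  none found"
    fi
}

report_driver_info
```

If the "DKMS modules" section lists nothing for i915, the driver in use is most likely the one bundled with the running kernel.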

Note that there are quite a lot of possible reasons for GPU hangs:

The last one can be checked by greatly increasing the hang timer, or by disabling it completely, to see whether the operation actually completes when given enough time: https://www.intel.com/content/www/us/en/docs/oneapi/installation-guide-linux/2023-2/gpu-disable-hangcheck.html

Note: if you disable hang checking completely, make sure you have remote access to the computer, in case it really is a GPU hang!
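A minimal sketch for inspecting and toggling hang checking at runtime, assuming the i915 module exposes a boolean `enable_hangcheck` parameter under /sys/module/i915/parameters that root can write, and that it still takes effect on the running kernel. The function names and the `HANGCHECK_PARAM` override are hypothetical, added only so the logic can be tried against a scratch file. A persistent alternative would be passing the same parameter on the kernel command line as `i915.enable_hangcheck=0`, using the usual module.parameter=value syntax.

```shell
#!/bin/sh
# Sketch: inspect or toggle i915 hang checking via its module parameter.
# HANGCHECK_PARAM is a hypothetical override for testing against a scratch
# file; on a real system leave it unset so the sysfs path is used.
HANGCHECK_PARAM=${HANGCHECK_PARAM:-/sys/module/i915/parameters/enable_hangcheck}

show_hangcheck() {
    # Prints Y (hang checking enabled) or N (disabled).
    cat "$HANGCHECK_PARAM"
}

set_hangcheck() {
    # $1 must be Y or N; writing to the real sysfs path requires root.
    case "$1" in
        Y|N) echo "$1" > "$HANGCHECK_PARAM" ;;
        *) echo "usage: set_hangcheck Y|N" >&2; return 2 ;;
    esac
}
```

For example, run `set_hangcheck N` as root, retry the FFmpeg command, then restore the default with `set_hangcheck Y`.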

Unless the operation is just too slow for the default hang timer value, I think it's best to report this against the media driver, as it's an FFmpeg use case.

@eero-t

eero-t commented Aug 23, 2024

Of the timing values under sysfs, I think the hang timer is the "heartbeat" one:

```console
$ head /sys/class/drm/card0/engine/*/*heart*_ms
==> /sys/class/drm/card0/engine/bcs0/heartbeat_interval_ms <==
2500

==> /sys/class/drm/card0/engine/rcs0/heartbeat_interval_ms <==
2500

==> /sys/class/drm/card0/engine/vcs0/heartbeat_interval_ms <==
2500

==> /sys/class/drm/card0/engine/vecs0/heartbeat_interval_ms <==
2500
```
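The per-engine values are easier to compare with one line per engine, and "greatly increasing the hang timer" amounts to writing a larger number into each of these files. A minimal sketch, assuming the sysfs layout shown above (card0 with one directory per engine) and that root can write a new value into heartbeat_interval_ms; `show_heartbeats`, `set_heartbeats` and the `DRM_CARD` override are hypothetical names.

```shell
#!/bin/sh
# Sketch: print each engine's heartbeat interval on one line, and optionally
# raise it on every engine. DRM_CARD is a hypothetical override; it defaults
# to the card0 path from the output above.
DRM_CARD=${DRM_CARD:-/sys/class/drm/card0}

show_heartbeats() {
    for f in "$DRM_CARD"/engine/*/heartbeat_interval_ms; do
        [ -e "$f" ] || continue              # glob matched nothing
        engine=$(basename "$(dirname "$f")") # e.g. rcs0
        printf '%s %s\n' "$engine" "$(cat "$f")"
    done
}

set_heartbeats() {
    # $1: new interval in milliseconds (digits only); needs root on sysfs.
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_heartbeats MILLISECONDS" >&2; return 2 ;;
    esac
    for f in "$DRM_CARD"/engine/*/heartbeat_interval_ms; do
        [ -e "$f" ] || continue
        echo "$1" > "$f"
    done
}
```

For example, `set_heartbeats 30000` (as root) would raise the interval from 2500 ms to 30 s on every engine before re-running the FFmpeg command. Values written to sysfs are not persistent and return to their defaults after a reboot.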

Does dmesg show which of the above GLK engines (copy, 3D/compute, video, video enhancement) stays unresponsive for too long?
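One way to answer that from the kernel log is to count which engine names appear in i915 hang or reset messages. A rough sketch, assuming the engine name (bcs0, rcs0, vcs0, vecs0, ...) appears verbatim in those lines; the exact message wording varies between kernel versions, so the filter is deliberately loose. `hung_engines` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: list engine names mentioned in i915 hang/reset lines, with counts.
# Assumption: engine names appear verbatim; message wording varies by kernel.

hung_engines() {
    # Reads kernel log text on stdin, e.g. `sudo dmesg | hung_engines`.
    grep -iE 'i915.*(hang|reset)' \
        | grep -oE '(bcs|rcs|vcs|vecs)[0-9]+' \
        | sort | uniq -c \
        | awk '{ print $2, $1 }'   # "engine count", one engine per line
}
```

Usage: `sudo dmesg | hung_engines`. An engine that dominates the counts is the most likely one to be stuck.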

@eero-t

eero-t commented Aug 23, 2024

Ouch, I just noticed this from the log: vaapi=vaapi:/dev/dri/renderD128,driver=i965

That is: https://packages.ubuntu.com/jammy/i965-va-driver

I.e. the legacy VA-API driver for hardware predating GLK, instead of a media driver that Intel still supports.
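A quick way to check which driver a given FFmpeg log used is to pull out the driver= option. A minimal sketch; `va_driver_from_log` and `check_va_driver` are hypothetical helpers, and the pattern assumes the option appears as ",driver=NAME" as in the log line above (presumably passed via FFmpeg's -init_hw_device option). Switching to the iHD media driver would then mean, for example, using driver=iHD in that same option or setting the libva environment variable LIBVA_DRIVER_NAME=iHD, with the driver installed (on Ubuntu it is packaged as intel-media-va-driver).

```shell
#!/bin/sh
# Sketch: report which VA-API driver an FFmpeg log says it used,
# flagging the legacy i965 driver.

va_driver_from_log() {
    # Reads FFmpeg log text on stdin; prints the NAME from the first
    # ",driver=NAME" occurrence, or nothing if there is none.
    sed -n 's/.*,driver=\([A-Za-z0-9_]*\).*/\1/p' | head -n 1
}

check_va_driver() {
    drv=$(va_driver_from_log)
    case "$drv" in
        i965) echo "legacy i965 driver in use; consider the iHD media driver" ;;
        "")   echo "no driver= option found in log" ;;
        *)    echo "driver: $drv" ;;
    esac
}
```

Usage: `check_va_driver < log.txt`.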
