Adder WS became unstable after 6.12 kernel update #351

robotman40 opened this issue Mar 16, 2025 · 8 comments
Comments

@robotman40

robotman40 commented Mar 16, 2025

Ever since I updated my Adder WS to the 6.12.10 kernel, my system becomes unstable and most things stop working shortly after booting and logging in. Here is the log I got from journalctl -ke -b -1:

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? vfs_read+0x16f/0x380

Mar 16 11:35:33 pop-os-laptop kernel: ? vfs_read+0x16f/0x380

Mar 16 11:35:33 pop-os-laptop kernel: ? ksys_read+0x73/0x100

Mar 16 11:35:33 pop-os-laptop kernel: ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0

Mar 16 11:35:33 pop-os-laptop kernel: ? syscall_exit_to_user_mode+0x38/0x1d0

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? syscall_exit_to_user_mode+0x38/0x1d0

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? do_syscall_64+0x8a/0x170

Mar 16 11:35:33 pop-os-laptop kernel: ? entry_SYSCALL_64_after_hwframe+0x76/0x7e

Mar 16 11:35:33 pop-os-laptop kernel: </TASK>

Mar 16 11:35:39 pop-os-laptop kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=925, name=nvidia-powerd, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080205a 0x4).

Mar 16 11:35:45 pop-os-laptop kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=936, name=nv_queue, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x2080a7d7 0x2).

Mar 16 11:35:51 pop-os-laptop kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30). The GPU likely needs to be reset.

Mar 16 11:36:01 pop-os-laptop kernel: pcieport 0000:00:1a.0: AER: Correctable error message received from 0000:00:1a.0

Mar 16 11:36:01 pop-os-laptop kernel: pcieport 0000:00:1a.0: PCIe Bus Error: severity=Correctable, type=Physical Layer, (Receiver ID)

Mar 16 11:36:01 pop-os-laptop kernel: pcieport 0000:00:1a.0: device [8086:7a48] error status/mask=00000001/00002000

Mar 16 11:36:01 pop-os-laptop kernel: pcieport 0000:00:1a.0: [ 0] RxErr (First)
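
For readers following the AER lines above, here is a minimal Python sketch of how the `error status/mask=00000001/00002000` field could be decoded. It assumes the bit names the Linux AER driver prints for correctable errors (bit 0 is `RxErr`, which matches the `[ 0] RxErr` line in the log); the function name `decode_correctable` is hypothetical and only for illustration.

```python
import re

# Bit positions of the PCIe Correctable Error Status register, using the
# short names the Linux AER driver prints (assumed here, not taken from the log
# except for bit 0, RxErr).
AER_CORRECTABLE_BITS = {
    0: "RxErr",
    6: "BadTLP",
    7: "BadDLLP",
    8: "Rollover",
    12: "Timeout",
    13: "NonFatalErr",
    14: "CorrIntErr",
    15: "HeaderOF",
}

STATUS_RE = re.compile(r"error status/mask=([0-9a-fA-F]{8})/([0-9a-fA-F]{8})")


def _names(value):
    """Return the names of all known bits set in value, lowest bit first."""
    return [name for bit, name in sorted(AER_CORRECTABLE_BITS.items())
            if (value >> bit) & 1]


def decode_correctable(line):
    """Decode a 'status/mask' AER log line into (status_names, mask_names)."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    mask = int(match.group(2), 16)
    return _names(status), _names(mask)


if __name__ == "__main__":
    sample = ("Mar 16 11:36:01 pop-os-laptop kernel: pcieport 0000:00:1a.0: "
              "device [8086:7a48] error status/mask=00000001/00002000")
    print(decode_correctable(sample))
```

Under these assumptions, the logged status has only the receiver-error bit set, and the mask hides only advisory non-fatal errors.
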

I am defaulting to kernel 6.9.3, which works correctly. Interestingly, when I first got my Adder WS and installed Fedora KDE 41 (which was on kernel 6.12 at the time), I encountered the exact same issue. I thought it was a Fedora issue, but it may be a kernel issue, since it worked fine under 6.11 on Fedora (although the Nvidia drivers wouldn't load).

Any chance this could be fixed?

@leviport
Member

Which generation of Adder WS?

@robotman40
Author

> Which generation of Adder WS?

Latest generation addw4

@leviport
Member

Roughly how long does it take to freeze after login? On my addw4, everything seems fine with 6.12 running.

@robotman40
Author

robotman40 commented Mar 17, 2025

> Roughly how long does it take to freeze after login? On my addw4, everything seems fine with 6.12 running.

It varies: it could be immediately after I log in or about 15 minutes later.

The best way to see whether it happens is to try opening apps like Steam, Brave, or Wine apps. They will not launch properly and the system will become unstable.

@robotman40
Author

> Roughly how long does it take to freeze after login? On my addw4, everything seems fine with 6.12 running.

Forgot to add, but I'm currently on Nvidia driver 565.77. Should I update?

@leviport
Member

Probably worth a try. I'm running nvidia-driver-570-open on mine currently.

@robotman40
Author

robotman40 commented Mar 22, 2025

> Probably worth a try. I'm running nvidia-driver-570-open on mine currently.

Unfortunately, even after upgrading to Nvidia 570, I still have this issue on 6.12.10. As before, using 6.9.3 works. This is the log I got this time:


Mar 22 09:37:18 pop-os kernel: warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
Mar 22 09:39:09 pop-os kernel: NVRM: kgspBootstrap_TU102: Failed to boot GSP.
Mar 22 09:39:09 pop-os kernel: NVRM: gpuPowerManagementResume: GSP boot failed at resume (bootMode 0x1): 0x55
Mar 22 09:39:15 pop-os kernel: NVRM: _kgspLogXid119: ********************************* GSP Timeout **********************************
Mar 22 09:39:15 pop-os kernel: NVRM: _kgspLogXid119: Note: Please also check logs above.
Mar 22 09:39:15 pop-os kernel: NVRM: GPU at PCI:0000:01:00: GPU-9be914ea-49c6-1853-1723-75ae269c4089
Mar 22 09:39:15 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1650, name=kworker/2:2, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL)>
Mar 22 09:39:15 pop-os kernel: NVRM: GPU0 GSP RPC buffer contains function 76 (GSP_RM_CONTROL) and data 0x000000002080205b 0x0000000000000004.
Mar 22 09:39:15 pop-os kernel: NVRM: GPU0 RPC history (CPU -> GSP):
Mar 22 09:39:15 pop-os kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration actively_polling
Mar 22 09:39:15 pop-os kernel: NVRM:      0    76   GSP_RM_CONTROL        0x000000002080205b 0x0000000000000004 0x000630f102a02632 0x0000000000000000          y
Mar 22 09:39:15 pop-os kernel: NVRM:     -1    47   UNLOADING_GUEST_DRIVE 0x0000000000000000 0x0000000000000000 0x000630f1025cbfa7 0x000630f102600d38 216465us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -2    10   FREE                  0x00000000c1e00028 0x0000000000000000 0x000630f1025cbb09 0x000630f1025cbe23    794us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -3    10   FREE                  0x000000000000000a 0x0000000000000000 0x000630f1025cb6ec 0x000630f1025cbb06   1050us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -4    10   FREE                  0x000000000000000b 0x0000000000000000 0x000630f1025ca868 0x000630f1025cab06    670us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -5    10   FREE                  0x0000000000000006 0x0000000000000000 0x000630f1025ca377 0x000630f1025ca85c   1253us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -6    10   FREE                  0x0000000000000002 0x0000000000000000 0x000630f1025c95e7 0x000630f1025ca281   3226us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -7    10   FREE                  0x0000000000000005 0x0000000000000000 0x000630f1025c8dd5 0x000630f1025c95e1   2060us  
Mar 22 09:39:15 pop-os kernel: NVRM: GPU0 RPC event history (CPU <- GSP):
Mar 22 09:39:15 pop-os kernel: NVRM:     entry function                   data0              data1              ts_start           ts_end             duration during_incomplete_rpc
Mar 22 09:39:15 pop-os kernel: NVRM:      0    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000630f1025d4861 0x000630f1025d4862      1us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -1    4128 GSP_POST_NOCAT_RECORD 0x0000000000000002 0x0000000000000028 0x000630f1025d03b7 0x000630f1025d03b9      2us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -2    4111 PERF_BRIDGELESS_INFO_ 0x0000000000000000 0x0000000000000000 0x000630f1025d01cc 0x000630f1025d01cc           
Mar 22 09:39:15 pop-os kernel: NVRM:     -3    4099 POST_EVENT            0x0000000000000021 0x0000000000000100 0x000630f1012ed0eb 0x000630f1012ed0f9     14us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -4    4128 GSP_POST_NOCAT_RECORD 0x0000000000000005 0x00000000016f356c 0x000630f1008bd596 0x000630f1008bd599      3us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -5    4099 POST_EVENT            0x0000000000000021 0x0000000000000008 0x000630f10089b1e9 0x000630f10089b1ee      5us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -6    4099 POST_EVENT            0x0000000000000021 0x0000000000000001 0x000630f10088e159 0x000630f10088e163     10us  
Mar 22 09:39:15 pop-os kernel: NVRM:     -7    4108 UCODE_LIBOS_PRINT     0x0000000000000000 0x0000000000000000 0x000630f100870e8b 0x000630f100870e8b           
Mar 22 09:39:15 pop-os kernel: CPU: 2 UID: 0 PID: 1650 Comm: kworker/2:2 Tainted: G           OE      6.12.10-76061203-generic #202412060638~1740154617~22.04~b4b3ebc
Mar 22 09:39:15 pop-os kernel: Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Mar 22 09:39:15 pop-os kernel: Hardware name: System76 Adder WS/Adder WS, BIOS 2024-07-08_926f73d 06/28/2024
Mar 22 09:39:15 pop-os kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Mar 22 09:39:15 pop-os kernel: Call Trace:
Mar 22 09:39:15 pop-os kernel:  <TASK>
Mar 22 09:39:15 pop-os kernel:  dump_stack_lvl+0x76/0xa0
Mar 22 09:39:15 pop-os kernel:  dump_stack+0x10/0x20
Mar 22 09:39:15 pop-os kernel:  os_dump_stack+0xe/0x20 [nvidia]
Mar 22 09:39:15 pop-os kernel:  _kgspRpcRecvPoll+0x54c/0x620 [nvidia]
Mar 22 09:39:15 pop-os kernel:  _issueRpcAndWait+0x71/0x360 [nvidia]
Mar 22 09:39:15 pop-os kernel:  rpcRmApiControl_GSP+0x67e/0xa10 [nvidia]
Mar 22 09:39:15 pop-os kernel:  subdeviceCtrlCmdPerfSetPowerstate_KERNEL+0xa4/0x1b0 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? rmresControl_Prologue_IMPL+0x22/0x230 [nvidia]
Mar 22 09:39:15 pop-os kernel:  resControl_IMPL+0x1d1/0x1e0 [nvidia]
Mar 22 09:39:15 pop-os kernel:  serverControl+0x48f/0x5c0 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
Mar 22 09:39:15 pop-os kernel:  ? acpi_ds_create_operand+0xf0/0x4d0
Mar 22 09:39:15 pop-os kernel:  _rmapiRmControl+0x544/0x840 [nvidia]
Mar 22 09:39:15 pop-os kernel:  rmapiControlWithSecInfo+0x79/0x140 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? acpi_evaluate_integer+0x6a/0x100
Mar 22 09:39:15 pop-os kernel:  rmapiControl+0x24/0x40 [nvidia]
Mar 22 09:39:15 pop-os kernel:  RmPowerSourceChangeEvent+0x58/0x70 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? nv_acpi_get_powersource+0x4b/0x90 [nvidia]
Mar 22 09:39:15 pop-os kernel:  RmPowerManagement+0x193/0x198 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? os_acquire_spinlock+0x12/0x30 [nvidia]
Mar 22 09:39:15 pop-os kernel:  RmGcxPowerManagement+0x1b8/0x360 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
Mar 22 09:39:15 pop-os kernel:  ? rmGpuLockIsOwner+0x29/0x90 [nvidia]
Mar 22 09:39:15 pop-os kernel:  rm_transition_dynamic_power+0x88/0x137 [nvidia]
Mar 22 09:39:15 pop-os kernel:  ? ktime_get+0x3f/0xf0
Mar 22 09:39:15 pop-os kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  nv_pmops_runtime_resume+0x73/0x100 [nvidia]
Mar 22 09:39:15 pop-os kernel:  pci_pm_runtime_resume+0xa0/0x100
Mar 22 09:39:15 pop-os kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  __rpm_callback+0x4d/0x170
Mar 22 09:39:15 pop-os kernel:  ? ktime_get_mono_fast_ns+0x40/0xe0
Mar 22 09:39:15 pop-os kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  rpm_callback+0x64/0x70
Mar 22 09:39:15 pop-os kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  rpm_resume+0x4d5/0x6d0
Mar 22 09:39:15 pop-os kernel:  __pm_runtime_resume+0x4e/0x80
Mar 22 09:39:15 pop-os kernel:  pci_device_shutdown+0x23/0x90
Mar 22 09:39:15 pop-os kernel:  nv_indicate_not_idle+0x2b/0x40 [nvidia]
Mar 22 09:39:15 pop-os kernel:  os_ref_dynamic_power+0x135/0x280 [nvidia]
Mar 22 09:39:15 pop-os kernel:  rm_power_source_change_event+0xa0/0x150 [nvidia]
Mar 22 09:39:15 pop-os kernel:  nv_acpi_powersource_hotplug_event+0x88/0x90 [nvidia]
Mar 22 09:39:15 pop-os kernel:  acpi_ev_notify_dispatch+0x56/0xa0
Mar 22 09:39:15 pop-os kernel:  acpi_os_execute_deferred+0x17/0x40
Mar 22 09:39:15 pop-os kernel:  process_one_work+0x178/0x3d0
Mar 22 09:39:15 pop-os kernel:  worker_thread+0x2b8/0x3e0
Mar 22 09:39:15 pop-os kernel:  ? _raw_spin_lock_irqsave+0xe/0x20
Mar 22 09:39:15 pop-os kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  kthread+0xe1/0x110
Mar 22 09:39:15 pop-os kernel:  ? __pfx_kthread+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  ret_from_fork+0x44/0x70
Mar 22 09:39:15 pop-os kernel:  ? __pfx_kthread+0x10/0x10
Mar 22 09:39:15 pop-os kernel:  ret_from_fork_asm+0x1a/0x30
Mar 22 09:39:15 pop-os kernel:  </TASK>
Mar 22 09:39:15 pop-os kernel: NVRM: _kgspLogXid119: ********************************************************************************
Mar 22 09:39:15 pop-os kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Mar 22 09:39:15 pop-os kernel: NVRM: subdeviceCtrlCmdPerfSetPowerstate_KERNEL: NV2080_CTRL_CMD_PERF_SET_POWERSTATE RPC failed
Mar 22 09:39:21 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=1650, name=kworker/2:2, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL)>
Mar 22 09:39:21 pop-os kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Mar 22 09:39:21 pop-os kernel: NVRM: subdeviceCtrlCmdPerfSetPowerstate_KERNEL: NV2080_CTRL_CMD_PERF_SET_POWERSTATE RPC failed
Mar 22 09:39:21 pop-os kernel: NVRM: rm_power_source_change_event: rm_power_source_change_event: Failed to handle Power Source change event, status=0x65
Mar 22 09:39:27 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, pid=792, name=nv_queue, Timeout after 6s of waiting for RPC response from GPU0 GSP! Expected function 76 (GSP_RM_CONTROL) (0x>
Mar 22 09:39:27 pop-os kernel: NVRM: _issueRpcAndWait: rpcRecvPoll timedout for fn 76!
Mar 22 09:39:27 pop-os kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x65
Mar 22 09:39:33 pop-os kernel: NVRM: Rate limiting GSP RPC error prints for GPU at PCI:0000:01:00 (printing 1 of every 30).  The GPU likely needs to be reset.
Mar 22 09:39:51 pop-os kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x65
Mar 22 09:40:16 pop-os kernel: rfkill: input handler enabled
Mar 22 09:40:21 pop-os kernel: NVRM: RmCheckForGcxSupportOnCurrentState: NVRM, Failed to get GCx pre-requisite, status=0x65

From what I've been reading, these logs might suggest my GPU driver is at fault, but then why is it working on your setup?
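
To compare runs across kernels or driver versions, the Xid lines in a log like the one above could be tallied with a short Python sketch. This is only an illustration: the regular expression is written against the line format shown in this thread, the sample lines are abbreviated copies of the ones posted here, and `summarize_xids` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches NVRM Xid lines in the format seen in this thread, e.g.
# "NVRM: Xid (PCI:0000:01:00): 119, pid=792, name=nv_queue, Timeout ..."
XID_RE = re.compile(
    r"NVRM: Xid \(PCI:(?P<pci>[0-9a-fA-F:.]+)\): (?P<xid>\d+), "
    r"pid=(?P<pid>\d+), name=(?P<name>[^,]+),"
)


def summarize_xids(lines):
    """Count Xid events per (Xid code, reporting process name)."""
    counts = Counter()
    for line in lines:
        match = XID_RE.search(line)
        if match:
            counts[(int(match.group("xid")), match.group("name"))] += 1
    return counts


# Abbreviated sample lines taken from the log posted above.
SAMPLE = [
    "Mar 22 09:39:15 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, "
    "pid=1650, name=kworker/2:2, Timeout after 6s of waiting for RPC response",
    "Mar 22 09:39:21 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, "
    "pid=1650, name=kworker/2:2, Timeout after 6s of waiting for RPC response",
    "Mar 22 09:39:27 pop-os kernel: NVRM: Xid (PCI:0000:01:00): 119, "
    "pid=792, name=nv_queue, Timeout after 6s of waiting for RPC response",
    "Mar 22 09:40:16 pop-os kernel: rfkill: input handler enabled",
]

if __name__ == "__main__":
    for (xid, name), count in sorted(summarize_xids(SAMPLE).items()):
        print(f"Xid {xid} reported by {name}: {count}")
```

In practice the input could be the saved output of the same `journalctl -ke -b -1` command used earlier in the thread, read line by line from a file.
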

@robotman40
Author

robotman40 commented Mar 28, 2025

I updated to the latest kernel and the Nvidia 570 open driver that was pushed recently. While the system no longer locks up, a lot of my games have stopped working properly. Here are a few examples:

  • Forza Horizon 5 - Will show an incompatible graphics driver warning before launching to a black screen
  • Assetto Corsa EVO - Just crashes while launching
  • Persona 3 Reload - Runs at a very low frame rate.

Ultrakill seems to be working fine though, which makes me suspect it happens with vkd3d-proton; I made a bug report there too. Any way I can collect debugging info for this for you guys?

NVM, this was anecdotal. I'm still having the same lockup issue.
