
Segfault running rpk redpanda tune net #22856

Open
Anbu42 opened this issue Aug 13, 2024 · 6 comments
Labels
kind/bug Something isn't working

Comments

@Anbu42

Anbu42 commented Aug 13, 2024

Version & Environment

Redpanda version: v24.2.2 (the same problem was observed in 23.1.11)

Operating System: Ubuntu 20.04.6 LTS (Focal Fossa)
5.4.0-192-generic #212-Ubuntu SMP Fri Jul 5 09:47:39 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

What went wrong?

rpk redpanda tune net
TUNER  APPLIED  ENABLED  SUPPORTED  ERROR
net    false    true     true       err=signal: segmentation fault (core dumped), stderr=

Verbose output:

12:00:00.614  DEBUG  Looking for interface with '[0.0.0.0 0.0.0.0]' addresses
12:00:00.614  DEBUG  Checking 'lo' address '127.0.0.1/8'
12:00:00.614  DEBUG  Checking 'lo' address '::1/128'
12:00:00.614  DEBUG  Checking 'eth0' address '172.22.254.154/24'
12:00:00.614  DEBUG  Checking 'eth0' address 'fe80::a859:b6ff:fe19:2943/64'
12:00:00.614  DEBUG  Checking 'br-5ce485394f2d' address '10.0.0.1/27'
12:00:00.614  DEBUG  Checking 'br-5ce485394f2d' address 'fe80::42:3cff:fef4:881c/64'
12:00:00.614  DEBUG  Checking 'veth5421a81' address 'fe80::64e7:baff:feb1:e744/64'
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'veth5421a81' virtual interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'br-5ce485394f2d' virtual interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'veth5421a81' virtual interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'br-5ce485394f2d' virtual interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'veth5421a81' virtual interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'br-5ce485394f2d' virtual interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'veth5421a81' virtual interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'br-5ce485394f2d' virtual interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'veth5421a81' virtual interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.614  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.614  DEBUG  Skipping tuning of 'br-5ce485394f2d' virtual interface
12:00:00.614  DEBUG  Checking if 'hwloc-calc-redpanda' & 'hwloc-distrib-redpanda' are present...
12:00:00.614  DEBUG  Checking if 'hwloc-calc-redpanda' & 'hwloc-distrib-redpanda' are present...
12:00:00.614  DEBUG  Tuner parameters &{Mode: CPUMask:all RebootAllowed:false Disks:[] Directories:[/var/lib/redpanda/data] Nics:[veth5421a81 eth0 br-5ce485394f2d]}
12:00:00.614  DEBUG  Checking 'NIC IRQs affinity static'
12:00:00.614  DEBUG  Checking if 'veth5421a81' is HW interface
12:00:00.614  DEBUG  Checking if 'veth5421a81' is bond interface
12:00:00.614  DEBUG  Checking if 'eth0' is HW interface
12:00:00.614  DEBUG  Getting NIC 'eth0' IRQs
12:00:00.614  DEBUG  Reading IRQs of '/sys/class/net/eth0/device', with deviceInfo name pattern 'eth0'
12:00:00.615  DEBUG  Reading '/proc/interrupts' file...
12:00:00.615  DEBUG  Reading '/sys/class/net/eth0/device' device IRQs from /proc/interrupts
12:00:00.615  DEBUG  DeviceInfo '/sys/class/net/eth0/device' IRQs '[]'
12:00:00.615  DEBUG  Reading '/proc/interrupts' file...
12:00:00.615  DEBUG  Checking if 'eth0' is bond interface
12:00:00.615  DEBUG  Checking if 'br-5ce485394f2d' is HW interface
12:00:00.615  DEBUG  Checking if 'br-5ce485394f2d' is bond interface
12:00:00.615  DEBUG  Running command 'ps' with arguments '[--no-headers -C irqbalance]'
12:00:00.621  DEBUG  Getting banned IRQs
12:00:00.621  DEBUG  Check 'NIC IRQs affinity static' passed, skipping tuning
12:00:00.621  DEBUG  Checking 'NIC eth0 IRQ affinity set'
12:00:00.621  DEBUG  Checking if 'eth0' is HW interface
12:00:00.621  DEBUG  'eth0' is HW interface
12:00:00.621  DEBUG  Running command 'hwloc-calc-redpanda' with arguments '[all]'
12:00:00.660  DEBUG  Checking if 'eth0' is HW interface
12:00:00.661  DEBUG  Getting number of Rx queues for 'eth0'
12:00:00.661  DEBUG  Getting max RSS queues count for 'eth0'
12:00:00.661  DEBUG  NIC 'eth0' uses 'vif' driver
12:00:00.661  DEBUG  Running command 'hwloc-calc-redpanda' with arguments '[--restrict 0x000003ff --number-of core machine:0]'
12:00:00.696  DEBUG  Running command 'hwloc-calc-redpanda' with arguments '[--restrict 0x000003ff --number-of PU machine:0]'
12:00:00.735  DEBUG  Using mq mode (hardcoded) for 'eth0': '10' cores, '10' PUs and '8' rx queues
12:00:00.735  DEBUG  Getting max RSS queues count for 'eth0'
12:00:00.735  DEBUG  NIC 'eth0' uses 'vif' driver
12:00:00.735  DEBUG  Getting NIC 'eth0' IRQs
12:00:00.735  DEBUG  Reading IRQs of '/sys/class/net/eth0/device', with deviceInfo name pattern 'eth0'
12:00:00.735  DEBUG  Reading '/proc/interrupts' file...
12:00:00.735  DEBUG  Reading '/sys/class/net/eth0/device' device IRQs from /proc/interrupts
12:00:00.735  DEBUG  DeviceInfo '/sys/class/net/eth0/device' IRQs '[]'
12:00:00.735  DEBUG  Reading '/proc/interrupts' file...
12:00:00.736  DEBUG  Computing IRQ CPU mask for 'mq' mode and input CPU mask '0x000003ff'
12:00:00.736  DEBUG  IRQs CPU mask '0x000003ff'
12:00:00.736  DEBUG  Calculating distribution 'eth0' IRQs
12:00:00.736  DEBUG  Running command 'hwloc-distrib-redpanda' with arguments '[0 --single --restrict 0x000003ff]'
TUNER  APPLIED  ENABLED  SUPPORTED  ERROR
net    false    true     true       err=signal: segmentation fault (core dumped), stderr
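The mask '0x000003ff' that the tuner passes to hwloc-calc-redpanda and hwloc-distrib-redpanda in the log above is a CPU bitmask. As a quick sanity check (a small Python sketch, not part of rpk; `cpus_in_mask` is a hypothetical helper), decoding it shows that it covers CPUs 0 through 9, which matches the '10' cores and '10' PUs the tuner reports:

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the CPU indices whose bits are set in an affinity bitmask (hypothetical helper)."""
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus


mask = int("0x000003ff", 16)  # the --restrict mask from the log
print(hex(mask), cpus_in_mask(mask))  # 0x3ff [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(len(cpus_in_mask(mask)))        # 10
```

Each set bit selects one logical CPU (a PU in hwloc terms), so 0x3ff (ten low bits set) selects ten of them.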

What should have happened instead?

The tuner should either fail with a clear error or tune the NIC correctly.
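A minimal sketch of the "error" branch, assuming a hypothetical planning step that runs before hwloc-distrib-redpanda is invoked. If the IRQ lookup comes back empty (as in the log, where eth0's IRQs are '[]' and hwloc-distrib-redpanda is then called with a first argument of 0), it fails with a descriptive error instead of computing a distribution over zero IRQs. The names `TuneError` and `plan_irq_distribution` are made up for illustration, and the round-robin assignment is a simple stand-in, not what hwloc-distrib actually does:

```python
class TuneError(Exception):
    """Hypothetical error raised when the net tuner cannot proceed safely."""


def plan_irq_distribution(irqs: list[int], cpus: list[int]) -> dict[int, int]:
    """Map each IRQ to a CPU, refusing to run on empty input (hypothetical helper)."""
    if not irqs:
        raise TuneError("no IRQs found for NIC; cannot compute an IRQ distribution")
    if not cpus:
        raise TuneError("CPU mask selects no CPUs")
    # Round-robin stand-in for hwloc-distrib: IRQ i goes to cpus[i % len(cpus)].
    return {irq: cpus[i % len(cpus)] for i, irq in enumerate(irqs)}


print(plan_irq_distribution([24, 25, 26], [0, 1]))  # {24: 0, 25: 1, 26: 0}

try:
    plan_irq_distribution([], list(range(10)))
except TuneError as err:
    print(f"error: {err}")
```

Checking for the empty case up front turns a crash in an external helper into an error the TUNER table can report directly.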

How to reproduce the issue?

  1. Run Redpanda v24.2.2 on a virtual machine with Ubuntu 20.04.6 LTS.
  2. Run rpk redpanda mode prod
  3. Run rpk redpanda tune net

JIRA Link: CORE-6865

Anbu42 added the kind/bug (Something isn't working) label on Aug 13, 2024
@gostiunin

Oh, I have the same problem.

@StephanDollberg
Member

This is a known issue on systems with less common NIC setups, such as Azure, other VMs, and similar environments.

The actual segfault comes from hwloc-distrib (issue here), but the main problem is that the tuner fails to parse the NICs' interrupt lines. This needs custom handling for each platform.
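To illustrate the parsing problem, here is a rough Python sketch that looks up a NIC's IRQs by matching a name pattern against /proc/interrupts. It is similar in spirit to the "Reading IRQs ... with deviceInfo name pattern 'eth0'" steps in the log, but it is not rpk's actual implementation. The helper `nic_irqs` and both sample /proc/interrupts snippets are made up for illustration. If a platform's interrupt lines don't mention the interface name, the lookup silently returns an empty list, matching the '[]' result in the log:

```python
import re


def nic_irqs(proc_interrupts: str, name_pattern: str) -> list[int]:
    """Return IRQ numbers whose /proc/interrupts description matches name_pattern (hypothetical)."""
    irqs = []
    for line in proc_interrupts.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        # Skip the CPU header (no ':') and non-numeric rows such as NMI or LOC.
        if not sep or not irq.isdigit():
            continue
        if re.search(name_pattern, rest):
            irqs.append(int(irq))
    return irqs


# Made-up sample where the NIC's queue interrupts are named after the interface.
NAMED = """           CPU0       CPU1
  24:        100        200   PCI-MSI 524288-edge      eth0-TxRx-0
  25:        300        400   PCI-MSI 524289-edge      eth0-TxRx-1
 NMI:          0          0   Non-maskable interrupts
"""

# Made-up sample where nothing in the description mentions 'eth0'.
UNNAMED = """           CPU0       CPU1
  30:        500        600   virt-irq-chip  vnic-queue-0
  31:        700        800   virt-irq-chip  vnic-queue-1
"""

print(nic_irqs(NAMED, "eth0"))    # [24, 25]
print(nic_irqs(UNNAMED, "eth0"))  # []
```

Name-based matching works only when the driver labels its interrupts with the interface name, which is why different platforms need their own handling.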

@Anbu42
Author

Anbu42 commented Aug 14, 2024

Interestingly, I have another Redpanda cluster in the same environment, and network tuning works there.

It’s also not clear how much the cluster’s performance will degrade without network tuning.

@frvade

frvade commented Aug 16, 2024

So, @StephanDollberg, what would you suggest as a workaround? Should we just skip tune net, or do something else? If we skip it, what would the consequences be for the cluster's network performance?

@StephanDollberg
Member

Yeah, just skip it for now. It's hard to generalize the performance impact, as it depends heavily on the system.

In many cases it is negligible. In others (many-core servers) the impact can be larger, but that part isn't working correctly right now anyway, for other reasons.

@frvade

frvade commented Aug 16, 2024

Thanks a lot! 🙏

Projects
None yet
Development

No branches or pull requests

4 participants