nwipe getting out of memory error #332

Open
rdellaripa opened this issue Feb 19, 2025 · 25 comments

Comments

@rdellaripa

I boot fine, set the parameters, and get nwipe running; then, after a while, it gets killed with an out-of-memory error. I don't think it's the drive: I managed to run dd across the whole thing with no errors.

[21103.027907] ip invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=2, oom_score_adj=0
[21103.027929] CPU: 0 UID: 0 PID: 28563 Comm: ip Tainted: G        W        N 6.11.11 #2
[21103.027938] Tainted: [W]=WARN, [N]=TEST
[21103.027940] Hardware name: Hewlett-Packard HP Pavilion 17 Notebook PC /226A, BIOS F.43 08/19/2015
[21103.027944] Call Trace:
[21103.027949]  <TASK>
[21103.027953]  dump_stack_lvl+0x53/0x70
[21103.027965]  dump_header+0x3f/0x1a0
[21103.027973]  oom_kill_process+0xfd/0x210
[21103.027979]  out_of_memory+0x242/0x570
[21103.027985]  __alloc_pages_slowpath.constprop.0+0xac3/0xce0
[21103.027995]  __alloc_pages_noprof+0x2a7/0x310
[21103.028001]  rtl_open+0x1ae/0x4c0
[21103.028009]  __dev_open+0xe4/0x180
[21103.028015]  __dev_change_flags+0x1ac/0x220
[21103.028020]  dev_change_flags+0x21/0x60
[21103.028026]  devinet_ioctl+0x30e/0x7c0
[21103.028033]  inet_ioctl+0x1c5/0x1e0
[21103.028041]  sock_do_ioctl+0x77/0x140
[21103.028048]  sock_ioctl+0x208/0x330
[21103.028053]  __x64_sys_ioctl+0x92/0xd0
[21103.028059]  do_syscall_64+0x54/0x110
[21103.028064]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[21103.028073] RIP: 0033:0x7f85a2576938
[21103.028080] Code: 00 00 48 8d 44 24 08 48 89 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 b8 10 00 00 00 c7 44 24 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 07 89 d0 c3 0f 1f 40 00 48 8b 15 b1 54 0d
[21103.028084] RSP: 002b:00007fff08faf588 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[21103.028091] RAX: ffffffffffffffda RBX: 00007f85a26f6eea RCX: 00007f85a2576938
[21103.028095] RDX: 00007fff08faf600 RSI: 0000000000008914 RDI: 0000000000000003
[21103.028099] RBP: 0000000000000001 R08: 000000000000000b R09: 0000000000000803
[21103.028102] R10: 0000000000000012 R11: 0000000000000246 R12: 0000000000000003
[21103.028105] R13: 00007fff08faf600 R14: ffffffffffffffff R15: 00007fff08faf7a0
[21103.028109]  </TASK>
[21103.028112] Mem-Info:
[21103.028116] active_anon:29456 inactive_anon:179120 isolated_anon:0
                active_file:0 inactive_file:1432291 isolated_file:0
                unevictable:0 dirty:26380 writeback:0
                slab_reclaimable:71899 slab_unreclaimable:6403
                mapped:1850 shmem:205597 pagetables:299
                sec_pagetables:0 bounce:0
                kernel_misc_reclaimable:0
                free:24153 free_pcp:1877 free_cma:0
[21103.028126] Node 0 active_anon:117824kB inactive_anon:716480kB active_file:0kB inactive_file:5729164kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:7400kB dirty:105520kB writeback:0kB shmem:822388kB writeback_tmp:0kB kernel_stack:2092kB pagetables:1196kB sec_pagetables:0kB all_unreclaimable? no
[21103.028135] Node 0 DMA free:14744kB boost:0kB min:20kB low:32kB high:44kB reserved_highatomic:0KB active_anon:148kB inactive_anon:468kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[21103.028146] lowmem_reserve[]: 0 1919 6581 0
[21103.028154] Node 0 DMA32 free:55628kB boost:0kB min:3024kB low:4988kB high:6952kB reserved_highatomic:0KB active_anon:124kB inactive_anon:0kB active_file:0kB inactive_file:1781680kB unevictable:0kB writepending:28020kB present:2031428kB managed:1965692kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[21103.028164] lowmem_reserve[]: 0 0 4662 0
[21103.028172] Node 0 Normal free:26240kB boost:0kB min:7344kB low:12116kB high:16888kB reserved_highatomic:0KB active_anon:117552kB inactive_anon:716076kB active_file:0kB inactive_file:3948460kB unevictable:0kB writepending:77188kB present:5226496kB managed:5029416kB mlocked:0kB bounce:0kB free_pcp:7484kB local_pcp:0kB free_cma:0kB
[21103.028187] lowmem_reserve[]: 0 0 0 0
[21103.028193] Node 0 DMA: 78*4kB (M) 56*8kB (M) 38*16kB (M) 16*32kB (M) 3*64kB (M) 1*128kB (M) 1*256kB (M) 0*512kB 2*1024kB (UM) 1*2048kB (U) 2*4096kB (M) = 14744kB
[21103.028221] Node 0 DMA32: 11571*4kB (UE) 1185*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55764kB
[21103.028241] Node 0 Normal: 6023*4kB (UM) 336*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 26780kB
[21103.028260] 1637931 total pagecache pages
[21103.028262] 0 pages in swap cache
[21103.028264] Free swap  = 0kB
[21103.028266] Total swap = 0kB
[21103.028267] 1818480 pages RAM
[21103.028269] 0 pages HighMem/MovableOnly
[21103.028270] 65863 pages reserved
[21103.028271] 0 pages cma reserved
[21103.028273] Tasks state (memory values in pages):
[21103.028275] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[21103.028280] [    145]     0   145     1049      668       32        0       636    49152        0             0 syslogd
[21103.028288] [    149]     0   149     1049      635       32        0       603    49152        0             0 klogd
[21103.028294] [    164]     0   164      629      231        0        0       231    45056        0             0 thd
[21103.028300] [    171]   100   171     1175      301       33        0       268    53248        0             0 dbus-daemon
[21103.028306] [    175]     0   175      670      330       32        0       298    45056        0             0 rpcbind
[21103.028312] [    180]     0   180     1091      777       64        0       713    49152        0             0 shredos_net.sh
[21103.028318] [    186]   101   186    19262      756      205        0       551    57344        0             0 ntpd
[21103.028324] [    192]     0   192     1049      619       32        0       587    45056        0             0 crond
[21103.028329] [    200]     0   200      689      101        0        0       101    45056        0             0 tftpd
[21103.028335] [    206]     0   206      945      352       46        0       306    45056        0             0 collectd
[21103.028340] [    207]     0   207     1049      642       32        0       610    49152        0             0 getty
[21103.028346] [    208]     0   208     1124      786       96        0       690    53248        0             0 nwipe_launcher
[21103.028352] [    209]     0   209     1157      902       96        0       806    53248        0             0 sh
[21103.028358] [    458]     0   458    43839     2618     1824        0       794    81920        0             0 nwipe
[21103.028364] [  28560]     0 28560     1049      623       32        0       591    49152        0             0 ifup
[21103.028369] [  28563]     0 28563     1049      564       32        0       532    45056        0             0 ip
[21103.028375] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=nwipe,pid=458,uid=0
[21103.028414] Out of memory: Killed process 458 (nwipe) total-vm:175356kB, anon-rss:7296kB, file-rss:0kB, shmem-rss:3176kB, UID:0 pgtables:80kB oom_score_adj:0
@PartialVolume
Owner

ip invoked the oom-killer.

Nwipe was killed only because it was the biggest consumer of memory at that moment; the out-of-memory condition might have nothing to do with nwipe.

As ip invoked the oom-killer, maybe it's something to do with the network. Can you plug an Ethernet lead in and connect it to a network running a DHCP server? I'm wondering whether the ip software running in the background is causing a memory leak. If plugging into a network fixes the problem, it will give me an idea of where to look.
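
The distinction between the process that triggered the OOM killer and the process that was killed can be read straight from the report. The following is a minimal sketch, not part of nwipe or ShredOS; the function name `summarize_oom` and the regular expressions are illustrative assumptions, and the two log lines are copied from the first report in this thread.

```python
import re

# Two lines copied from the first kernel log in this thread.
LOG = """\
[21103.027907] ip invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=2, oom_score_adj=0
[21103.028414] Out of memory: Killed process 458 (nwipe) total-vm:175356kB, anon-rss:7296kB, file-rss:0kB, shmem-rss:3176kB, UID:0 pgtables:80kB oom_score_adj:0
"""

# Hypothetical patterns written for this log layout only.
INVOKED = re.compile(r"\] (\S+) invoked oom-killer: .*?order=(\d+)")
KILLED = re.compile(r"Killed process (\d+) \((.+?)\) total-vm:(\d+)kB, anon-rss:(\d+)kB")


def summarize_oom(log: str) -> dict:
    """Return who asked for memory and who was killed, from one OOM report."""
    invoked = INVOKED.search(log)
    killed = KILLED.search(log)
    if invoked is None or killed is None:
        raise ValueError("no complete OOM report found")
    order = int(invoked.group(2))
    return {
        "requester": invoked.group(1),
        "order": order,
        "request_kb": (2 ** order) * 4,  # assumes 4 kB pages
        "victim": killed.group(2),
        "victim_pid": int(killed.group(1)),
        "victim_total_vm_kb": int(killed.group(3)),
        "victim_anon_rss_kb": int(killed.group(4)),
    }


summary = summarize_oom(LOG)
print(summary)
```

For this report the requester is ip and the victim is nwipe, which is why the kill on its own says little about nwipe's memory use.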

@PartialVolume
Owner

PartialVolume commented Feb 19, 2025

Here is something else you can do to determine whether it's a network issue.

Once nwipe has appeared, press ALT+F2 to switch virtual terminals and run pkill shredos_net.sh, then press ALT+F1 and start the wipes. See whether that then works without the out-of-memory error. This kills the script that manages the Ethernet connection.

@PartialVolume
Owner

PartialVolume commented Feb 19, 2025

Can you also post the output of lspci -k so we have a record of your Ethernet hardware and which driver is loaded?

@rdellaripa
Author

I have it running right now with a network connection. If that fails, I'll try the rest.

@rdellaripa
Author

Well, it worked unplugged. Now I can't get it to fail again. I'll update again when I find some consistency.

@rdellaripa
Author

I'm afraid I don't have any technical details, but it seems to correlate to choosing Lagged Fibonacci as the PRNG.

@PartialVolume
Owner

> I'm afraid I don't have any technical details, but it seems to correlate to choosing Lagged Fibonacci as the PRNG.

I'm going to ask @Knogle to comment as the author of the lagged Fibonacci PRNG.

For what it's worth, I've used lagged Fibonacci to wipe at least 50 drives on two different systems, a Dell i7 8-core with 8 GB (8 drives simultaneously) and a Clevo i7 20-core (one drive at a time), and haven't experienced any out-of-memory errors.

@rdellaripa Can you post the output of lspci -k so we can maybe find a correlation with the hardware? Thanks.

@rdellaripa
Author

00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Root Complex
Subsystem: Hewlett-Packard Company Device 226a
00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Mullins [Radeon R4/R5 Graphics] (rev 05)
DeviceName: Onboard IGD
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: radeon
00:01.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Kabini HDMI/DP Audio
Subsystem: Hewlett-Packard Company Device 226a
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: pcieport
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: pcieport
00:02.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: pcieport
00:02.5 PCI bridge: Advanced Micro Devices, Inc. [AMD] Family 16h Processor Functions 5:1
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: pcieport
00:08.0 Encryption controller: Advanced Micro Devices, Inc. [AMD] Kabini/Mullins PSP-Platform Security Processor
Subsystem: Hewlett-Packard Company Device 226a
00:10.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB XHCI Controller (rev 11)
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: xhci_hcd
00:11.0 SATA controller: Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode] (rev 40)
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: ahci
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: ehci-pci
00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB EHCI Controller (rev 39)
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: ehci-pci
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 42)
Subsystem: Hewlett-Packard Company Device 226a
00:14.2 Audio device: Advanced Micro Devices, Inc. [AMD] FCH Azalia Controller (rev 02)
Subsystem: Hewlett-Packard Company Device 226a
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 11)
Subsystem: Hewlett-Packard Company Device 226a
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 3
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Family 16h (Models 30h-3fh) Processor Function 5
02:00.0 Network controller: Realtek Semiconductor Co., Ltd. RTL8188EE Wireless Network Adapter (rev 01)
DeviceName: Realtek Focus RTL8188EE bgn 1x1 PCI-e HMC WW
Subsystem: Hewlett-Packard Company RTL8188EE mini-PCIe card
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL810xE PCI Express Fast Ethernet controller (rev 07)
DeviceName: Realtek PCIe FE Family Controller
Subsystem: Hewlett-Packard Company Device 226a
Kernel driver in use: r8169
04:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5229 PCI Express Card Reader (rev 01)
DeviceName: Realtek PCIE CardReader
Subsystem: Hewlett-Packard Company Device 226a

@Knogle
Contributor

Knogle commented Feb 28, 2025

I hope you are doing fine :) Unfortunately, I don't think this has anything to do with nwipe. According to the trace, the OOM killer was invoked by "ip", which is responsible for networking.
The nwipe process had about 175 MB of virtual memory (total-vm) at that time, of which only about 10 MB was resident, so I don't expect any issue from there.
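
The nwipe figures can be cross-checked against the task dump, which reports memory in pages rather than kB. The sketch below is only a worked conversion; it assumes 4 kB pages (the usual x86-64 page size), and the numbers are copied from the nwipe row (pid 458) of the first report.

```python
PAGE_KB = 4  # assumption: 4 kB pages

# Columns copied from the nwipe row (pid 458) of the first task dump.
nwipe = {
    "total_vm": 43839,
    "rss": 2618,
    "rss_anon": 1824,
    "rss_file": 0,
    "rss_shmem": 794,
}


def pages_to_kb(pages: int) -> int:
    """Convert a page count from the task dump to kB."""
    return pages * PAGE_KB


sizes_kb = {name: pages_to_kb(pages) for name, pages in nwipe.items()}
for name, kb in sizes_kb.items():
    print(f"{name:>10}: {kb:>7} kB ({kb / 1024:.1f} MiB)")
```

The converted values agree with the final "Killed process" line: total-vm 175356 kB, anon-rss 7296 kB and shmem-rss 3176 kB. The resident total (rss) is 10472 kB, so the 175 MB figure is virtual address space rather than memory actually held.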

@rdellaripa
Author

Continuing to test. Just throwing these out here in case they're helpful later.

/var/log/messages:
[13229.407609] ip invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=2, oom_score_adj=0
[13229.407631] CPU: 1 UID: 0 PID: 3061 Comm: ip Tainted: G W N 6.11.11 #2
[13229.407639] Tainted: [W]=WARN, [N]=TEST
[13229.407641] Hardware name: Hewlett-Packard HP Pavilion 17 Notebook PC /226A, BIOS F.43 08/19/2015
[13229.407645] Call Trace:
[13229.407651] <TASK>
[13229.407655] dump_stack_lvl+0x53/0x70
[13229.407666] dump_header+0x3f/0x1a0
[13229.407674] oom_kill_process+0xfd/0x210
[13229.407680] out_of_memory+0x242/0x570
[13229.407686] __alloc_pages_slowpath.constprop.0+0xac3/0xce0
[13229.407694] __alloc_pages_noprof+0x2a7/0x310
[13229.407700] rtl_open+0x1ae/0x4c0
[13229.407709] __dev_open+0xe4/0x180
[13229.407715] __dev_change_flags+0x1ac/0x220
[13229.407720] dev_change_flags+0x21/0x60
[13229.407725] devinet_ioctl+0x30e/0x7c0
[13229.407733] inet_ioctl+0x1c5/0x1e0
[13229.407740] sock_do_ioctl+0x77/0x140
[13229.407747] sock_ioctl+0x208/0x330
[13229.407752] __x64_sys_ioctl+0x92/0xd0
[13229.407758] do_syscall_64+0x54/0x110
[13229.407763] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[13229.407772] RIP: 0033:0x7f1f2a5a9938
[13229.407778] Code: 00 00 48 8d 44 24 08 48 89 54 24 e0 48 89 44 24 c0 48 8d 44 24 d0 48 89 44 24 c8 b8 10 00 00 00 c7 44 24 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 07 89 d0 c3 0f 1f 40 00 48 8b 15 b1 54 0d
[13229.407783] RSP: 002b:00007ffc7f2ab048 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[13229.407790] RAX: ffffffffffffffda RBX: 00007f1f2a729eea RCX: 00007f1f2a5a9938
[13229.407793] RDX: 00007ffc7f2ab0c0 RSI: 0000000000008914 RDI: 0000000000000003
[13229.407797] RBP: 0000000000000001 R08: 000000000000000b R09: 0000000000000803
[13229.407800] R10: 0000000000000012 R11: 0000000000000246 R12: 0000000000000003
[13229.407803] R13: 00007ffc7f2ab0c0 R14: ffffffffffffffff R15: 00007ffc7f2ab260
[13229.407807] </TASK>
[13229.407810] Mem-Info:
[13229.407813] active_anon:29493 inactive_anon:178787 isolated_anon:0
active_file:16 inactive_file:1431402 isolated_file:0
unevictable:0 dirty:50114 writeback:0
slab_reclaimable:72284 slab_unreclaimable:6445
mapped:2481 shmem:205539 pagetables:337
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:0
free:25086 free_pcp:608 free_cma:0
[13229.407824] Node 0 active_anon:117972kB inactive_anon:715148kB active_file:64kB inactive_file:5725608kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:9924kB dirty:200456kB writeback:0kB shmem:822156kB writeback_tmp:0kB kernel_stack:2144kB pagetables:1348kB sec_pagetables:0kB all_unreclaimable? no
[13229.407833] Node 0 DMA free:13952kB boost:0kB min:20kB low:32kB high:44kB reserved_highatomic:0KB active_anon:264kB inactive_anon:1132kB active_file:0kB inactive_file:8kB unevictable:0kB writepending:8kB present:15996kB managed:15360kB mlocked:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[13229.407844] lowmem_reserve[]: 0 1919 6581 0
[13229.407852] Node 0 DMA32 free:63784kB boost:0kB min:3024kB low:4988kB high:6952kB reserved_highatomic:0KB active_anon:140kB inactive_anon:692kB active_file:0kB inactive_file:1771508kB unevictable:0kB writepending:193188kB present:2031428kB managed:1965692kB mlocked:0kB bounce:0kB free_pcp:384kB local_pcp:0kB free_cma:0kB
[13229.407862] lowmem_reserve[]: 0 0 4662 0
[13229.407869] Node 0 Normal free:22608kB boost:0kB min:7344kB low:12116kB high:16888kB reserved_highatomic:0KB active_anon:117576kB inactive_anon:713532kB active_file:64kB inactive_file:3953628kB unevictable:0kB writepending:6512kB present:5226496kB managed:5029416kB mlocked:0kB bounce:0kB free_pcp:2024kB local_pcp:0kB free_cma:0kB
[13229.407879] lowmem_reserve[]: 0 0 0 0
[13229.407886] Node 0 DMA: 111*4kB (M) 61*8kB (M) 32*16kB (M) 13*32kB (M) 1*64kB (M) 2*128kB (M) 0*256kB 1*512kB (M) 1*1024kB (U) 1*2048kB (U) 2*4096kB (M) = 13956kB
[13229.407917] Node 0 DMA32: 13559*4kB (UE) 1222*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 64012kB
[13229.407938] Node 0 Normal: 3554*4kB (UM) 1130*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 23256kB
[13229.407960] 1636992 total pagecache pages
[13229.407962] 0 pages in swap cache
[13229.407963] Free swap = 0kB
[13229.407965] Total swap = 0kB
[13229.407967] 1818480 pages RAM
[13229.407968] 0 pages HighMem/MovableOnly
[13229.407970] 65863 pages reserved
[13229.407972] 0 pages cma reserved
[13229.407973] Tasks state (memory values in pages):
[13229.407975] [ pid ] uid tgid total_vm rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
[13229.407981] [ 143] 0 143 1049 616 32 0 584 49152 0 0 syslogd
[13229.407989] [ 147] 0 147 1049 618 32 0 586 45056 0 0 klogd
[13229.407996] [ 163] 0 163 629 231 0 0 231 40960 0 0 thd
[13229.408002] [ 170] 100 170 1175 329 33 0 296 49152 0 0 dbus-daemon
[13229.408009] [ 174] 0 174 670 329 32 0 297 49152 0 0 rpcbind
[13229.408015] [ 179] 0 179 1091 757 64 0 693 49152 0 0 shredos_net.sh
[13229.408021] [ 185] 101 185 19262 784 204 0 580 61440 0 0 ntpd
[13229.408027] [ 191] 0 191 1049 621 32 0 589 45056 0 0 crond
[13229.408034] [ 199] 0 199 689 101 0 0 101 45056 0 0 tftpd
[13229.408039] [ 205] 0 205 945 381 78 0 303 49152 0 0 collectd
[13229.408046] [ 206] 0 206 914 676 64 0 612 53248 0 0 login
[13229.408051] [ 207] 0 207 1124 825 96 0 729 40960 0 0 nwipe_launcher
[13229.408058] [ 208] 0 208 1157 973 128 0 845 45056 0 0 sh
[13229.408064] [ 459] 0 459 42670 1405 640 0 765 77824 0 0 nwipe
[13229.408070] [ 3624] 0 3624 1356 1112 480 0 632 53248 0 0 top
[13229.408076] [ 3642] 0 3642 1150 991 128 0 863 53248 0 0 bash
[13229.408082] [ 4387] 0 4387 1150 489 123 0 366 53248 0 0 bash
[13229.408088] [ 4388] 0 4388 2479 975 128 0 847 57344 0 0 tee
[13229.408095] [ 2880] 0 2880 2479 1028 128 0 900 65536 0 0 sleep
[13229.408101] [ 3058] 0 3058 1049 635 32 0 603 45056 0 0 ifup
[13229.408106] [ 3061] 0 3061 1049 556 32 0 524 53248 0 0 ip
[13229.408112] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/,task=nwipe,pid=459,uid=0
[13229.408154] Out of memory: Killed process 459 (nwipe) total-vm:170680kB, anon-rss:2560kB, file-rss:0kB, shmem-rss:3060kB, UID:0 pgtables:76kB oom_score_adj:0

meminfo from just before and just after the oom:
Sun Mar 2 16:52:10 UTC 2025
MemTotal: 7010468 kB
MemFree: 50704 kB
MemAvailable: 6004220 kB
Buffers: 5757772 kB
Cached: 822288 kB
SwapCached: 0 kB
Active: 118032 kB
Inactive: 6473260 kB
Active(anon): 117968 kB
Inactive(anon): 715360 kB
Active(file): 64 kB
Inactive(file): 5757900 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 223828 kB
Writeback: 0 kB
AnonPages: 11088 kB
Mapped: 9872 kB
Shmem: 822152 kB
KReclaimable: 287928 kB
Slab: 313696 kB
SReclaimable: 287928 kB
SUnreclaim: 25768 kB
KernelStack: 2160 kB
PageTables: 1056 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3505232 kB
Committed_AS: 874144 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 12260 kB
VmallocChunk: 0 kB
Percpu: 1440 kB
CmaTotal: 0 kB
CmaFree: 0 kB
DirectMap4k: 16196 kB
DirectMap2M: 5160960 kB
DirectMap1G: 2097152 kB

Sun Mar 2 16:53:10 UTC 2025
MemTotal: 7010468 kB
MemFree: 6101760 kB
MemAvailable: 6049556 kB
Buffers: 1112 kB
Cached: 822276 kB
SwapCached: 0 kB
Active: 119772 kB
Inactive: 712280 kB
Active(anon): 118656 kB
Inactive(anon): 712112 kB
Active(file): 1116 kB
Inactive(file): 168 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 176 kB
Writeback: 0 kB
AnonPages: 8736 kB
Mapped: 8940 kB
Shmem: 822104 kB
KReclaimable: 10000 kB
Slab: 35792 kB
SReclaimable: 10000 kB
SUnreclaim: 25792 kB
KernelStack: 2140 kB
PageTables: 1484 kB
SecPageTables: 0 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 3505232 kB
Committed_AS: 838524 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 12244 kB
VmallocChunk: 0 kB
Percpu: 1440 kB
CmaTotal: 0 kB
CmaFree: 0 kB
DirectMap4k: 16196 kB
DirectMap2M: 5160960 kB
DirectMap1G: 2097152 kB
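
To see which fields actually moved between the two snapshots, the values can be diffed field by field. This is a minimal sketch with hypothetical helper names (`parse_meminfo`, `diff_meminfo`); the sample values are a subset of the two snapshots above.

```python
# A few fields copied from the two /proc/meminfo snapshots above.
BEFORE = """\
MemFree: 50704 kB
MemAvailable: 6004220 kB
Buffers: 5757772 kB
Cached: 822288 kB
Dirty: 223828 kB
Shmem: 822152 kB
SReclaimable: 287928 kB
SUnreclaim: 25768 kB
"""
AFTER = """\
MemFree: 6101760 kB
MemAvailable: 6049556 kB
Buffers: 1112 kB
Cached: 822276 kB
Dirty: 176 kB
Shmem: 822104 kB
SReclaimable: 10000 kB
SUnreclaim: 25792 kB
"""


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each field name to its value in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def diff_meminfo(before: str, after: str) -> list[tuple[str, int]]:
    """Return (field, change in kB) pairs, largest absolute change first."""
    b, a = parse_meminfo(before), parse_meminfo(after)
    deltas = [(name, a[name] - b[name]) for name in b if name in a]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)


changes = diff_meminfo(BEFORE, AFTER)
for name, delta in changes:
    print(f"{name:<14}{delta:>+12} kB")
```

The two largest changes are MemFree (+6051056 kB) and Buffers (-5756660 kB), while MemAvailable barely moves. That pattern suggests the memory was sitting in reclaimable block-device cache rather than being leaked by a process, although the snapshots alone do not prove it.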

@rdellaripa
Author

Further testing has shown that this issue occurs with ISAAC and ISAAC64 as well as lagged Fibonacci. I have not been able to reproduce it with Mersenne Twister or XORoshiro.

@DerGregor

I would also like to join in here. I have exactly the same "ip invoked oom-killer" problem.
See the attached log.
The only setting I changed was the wipe method, to "Fill with Zeros".

> As IP invoked the oom-killer maybe it's something to do with the network. Can you plug an ethernet lead in and connect it to a network running a DHCP server. I'm wondering whether IP software running in the background is causing a memory leak. If plugging into a network fixes the problem then it will give me an idea where to look.

I have disabled the onboard LAN adapter in the BIOS.
So far the wipe process is running. I had never gotten above 0.3% before; I am now at 1.5%.

dmesg_2025-03-05-11-52-53_system_uuid_00000000-0000-0000-0000-4ccc6ad83f18.txt

@Knogle
Contributor

Knogle commented Mar 7, 2025

Can you maybe run ip -V, please? I'll try to find out whether there is some sort of issue with ip.

@DerGregor

> Can you maybe run ip -V please, i'll try to find out if there is some sort of issue with ip.

I hope I understand you correctly. I opened another console with ALT+F2 and entered ip -V, but this parameter does not seem to exist.

Image

@DerGregor

DerGregor commented Mar 7, 2025

Short update: the system is still running and stable with the network adapter deactivated.
Image
I will let the system run over the weekend and report back on whether it was successful. I may then run it again with the network adapter activated and an active network connection. Maybe it really only happens when you are offline and not with an active connection.

Is there a reason why ShredOS comes with network support?

@PartialVolume
Owner

PartialVolume commented Mar 8, 2025

Whether you use the network is entirely optional. However, ShredOS supports NTP (Network Time Protocol), so if you are plugged into a network with internet access it will attempt to contact an NTP server to check the date and time and adjust the system clock if it is wrong. If you don't want to connect to a network, you can set the system date and time via Nwipe's config menu 'c' (accessed from the drive selection screen) or alternatively from the command line.

Network access is required if you are PXE booting.

Network access is also required if you want to upload your PDFs and logs automatically to a lftp, ftp or sftp server.

In addition, you can store Nwipe's config and customer list files remotely, so if you are running a hundred ShredOS systems they all boot up with the same customer list and config. This is configured via the shredos_output=".." and shredos_config=".." options placed on the kernel command line in grub.cfg.

@PartialVolume
Owner

PartialVolume commented Mar 8, 2025

> maybe it really only happens when you are offline and not with an active connection

That's a possibility. I generally run ShredOS with an active, internet-enabled network on my desktop systems; however, I run laptop wipes without a network connection.

When I get a chance, I'll run the desktops (which have more than one Ethernet connection) without a network connection and see if I can reproduce the out-of-memory error. It might also be specific to your hardware, as well as only happening without an active network connection. I need to take a closer look at your lspci -k output.

@PartialVolume
Owner

One other thought: when there is no active network connection, the program called 'ip' is executed periodically to check the Ethernet port status (I don't remember how often, whether every 5 seconds or every 30 seconds). This polling stops once an IP address has been requested and received from the DHCP server. So it's possible that, if the program called 'ip' is not freeing allocated memory on exit, memory will slowly get used up.

As noted in the previous comments about the -V option, the version of 'ip' being used is the stripped-down BusyBox version. That is just something to bear in mind when searching for the root cause of this issue.

@DerGregor

> I need to take a closer look at your lspci -k output.

Yes, I'll provide it on Monday (with the LAN adapter enabled).

> This polling doesn't occur once a IP address has been requested and received from the DHCP server so it's possible that if the program called 'ip' is not freeing up allocated memory on exit then slowly memory will get used up.

I could also test that on Monday. I still have a few old routers lying around here, so at least there would be a network to connect to, albeit without internet.

@rdellaripa
Author

rdellaripa commented Mar 8, 2025

I'm actually running a wipe right now connected just to an old switch, so it doesn't have DHCP, internet, etc.

What I don't understand at this point is that I've run top -oVIRT in another console, and I haven't seen ip just hanging out there slowly eating up memory. But it is obvious something is, because MemFree goes from 50704 kB to 6101760 kB free when the oom triggers...despite nwipe (or ntpd, the second largest process) growing in size over time. I thought maybe a device driver wasn't freeing memory, but lsmod doesn't show anything.

PS: I just noticed that pre-oom, meminfo shows Buffers as 5757772 kB and after, 1112 kB.
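
One reading that fits these numbers is fragmentation rather than a leak: the failing request is order=2, that is 2^2 = 4 contiguous pages (16 kB with 4 kB pages), and the call trace shows it coming from rtl_open when the interface is brought up. The sketch below checks how many free blocks of at least that size each zone had. It is an illustration under stated assumptions, not a confirmed diagnosis; the helper names are hypothetical and the free-list lines are copied from the first OOM report.

```python
import re

# Free-list lines copied from the first OOM report in this thread.
FREE_LISTS = """\
Node 0 DMA: 78*4kB (M) 56*8kB (M) 38*16kB (M) 16*32kB (M) 3*64kB (M) 1*128kB (M) 1*256kB (M) 0*512kB 2*1024kB (UM) 1*2048kB (U) 2*4096kB (M) = 14744kB
Node 0 DMA32: 11571*4kB (UE) 1185*8kB (UE) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 55764kB
Node 0 Normal: 6023*4kB (UM) 336*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 26780kB
"""

BLOCK = re.compile(r"(\d+)\*(\d+)kB")


def free_blocks(line: str) -> dict[int, int]:
    """Map block size in kB to the number of free blocks of that size."""
    return {int(size): int(count) for count, size in BLOCK.findall(line)}


def blocks_at_least(line: str, min_kb: int) -> int:
    """Count free blocks that are min_kb or larger."""
    return sum(n for size, n in free_blocks(line).items() if size >= min_kb)


ORDER = 2
NEEDED_KB = (2 ** ORDER) * 4  # order=2 -> 4 contiguous pages -> 16 kB (4 kB pages assumed)

usable = {}
for line in FREE_LISTS.splitlines():
    zone = line.split(":")[0].split()[-1]
    usable[zone] = blocks_at_least(line, NEEDED_KB)
    print(f"{zone:<7} free blocks >= {NEEDED_KB} kB: {usable[zone]}")
```

DMA32 and Normal hold tens of MB of free memory, but only as 4 kB and 8 kB blocks, so neither has a single free 16 kB block. The small DMA zone does have larger blocks, but it is typically held back by lowmem_reserve. That would explain an OOM kill while several GB sit in reclaimable Buffers, though nothing in this thread confirms it as the root cause.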

@DerGregor

First of all, good news: switching off the NIC helped. My disks were successfully zeroed. Duration: approx. 44-45 hours.
Today I re-enabled the NIC in the BIOS and started a wipe again. As expected, the process failed again after 5-10 minutes.
Here is another log:
dmesg_2025-03-10-09-14-23_system_uuid_00000000-0000-0000-0000-4ccc6ad83f18.txt
Here is the requested lspci -k output:

lspci.txt

> This polling doesn't occur once a IP address has been requested and received from the DHCP server so it's possible that if the program called 'ip' is not freeing up allocated memory on exit then slowly memory will get used up.

I have now started a new wipe, but this time I have connected an old TP-Link router: a network with DHCP etc. but without internet.
The process has been running smoothly so far, for at least 30 minutes. That's a good sign to me, because the system never ran that long on the runs that crashed.

@rdellaripa
Author

Attached is the output of

cat /proc/meminfo
slabtop -o -s c | head -20
tail -25 /var/log/messages

run once a minute, snipped to the minute before and after the OOM.
Again, most of the memory seems to be in buffers. I'm not sure at the moment how to dig deeper on that.

meminfo-2025-03-10.txt
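
One further thing that could be sampled alongside meminfo is /proc/buddyinfo, which lists the number of free blocks per allocation order for each zone. The sketch below is a hedged illustration only: the function names are hypothetical, ShredOS may not ship Python (so the same idea might need to be redone as a shell loop), and the sample text is the DMA32 and Normal free lists from the first OOM report rewritten in /proc/buddyinfo layout.

```python
import time
from pathlib import Path


def parse_buddyinfo(text: str) -> dict[str, list[int]]:
    """Map 'node/zone' to free-block counts, where list index = allocation order."""
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", "").split()
        # Expected shape: Node <n> zone <name> <count order 0> <count order 1> ...
        if len(parts) > 4 and parts[0] == "Node" and parts[2] == "zone":
            zones[f"{parts[1]}/{parts[3]}"] = [int(p) for p in parts[4:]]
    return zones


def high_order_free(counts: list[int], min_order: int = 2) -> int:
    """Count free blocks of at least min_order."""
    return sum(counts[min_order:])


def sample_forever(path: str = "/proc/buddyinfo", interval_s: int = 60) -> None:
    """Print one timestamped line per zone every interval_s seconds."""
    while True:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for zone, counts in parse_buddyinfo(Path(path).read_text()).items():
            print(stamp, zone, "order>=2 free blocks:", high_order_free(counts))
        time.sleep(interval_s)


# Sample built from the DMA32 and Normal free lists in the first OOM report,
# rewritten in /proc/buddyinfo layout for illustration.
SAMPLE = """\
Node 0, zone    DMA32  11571   1185      0      0      0      0      0      0      0      0      0
Node 0, zone   Normal   6023    336      0      0      0      0      0      0      0      0      0
"""
parsed = parse_buddyinfo(SAMPLE)
print({zone: high_order_free(counts) for zone, counts in parsed.items()})
```

Calling `sample_forever()` in a second console during a wipe would show whether the count of order-2 and larger blocks falls towards zero before the OOM, which the once-a-minute meminfo totals cannot show.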

@PartialVolume
Owner

Can you try killing the ntpd process and the dhcpd process in turn, to try to identify which one is causing the issue?

With no network connection I still can't reproduce the issue on my laptop. I was reading about a buffer overflow problem in ntpd, triggered by a rogue NTP server, but that was way back in the early 2000s. So it may be worth killing those processes to see if they are the cause.

@rdellaripa
Author

I killed the ntpd process and the issue recurred. I can't kill the dhcpd process, as it's spawned from an ifup call in shredos_net.sh.

@PartialVolume
Owner

I'm guessing, then, that killing shredos_net.sh stops the out-of-memory error?
