- 
                Notifications
    You must be signed in to change notification settings 
- Fork 129
mempressure: Fix Linux memory detection and reduce memory check overhead #219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
mempressure: Fix Linux memory detection and reduce memory check overhead #219
Conversation
        
          
                src/util/mempressure.cpp
              
                Outdated
          
        
      | // Format: "MemAvailable: 1234567 kB" | ||
| size_t pos = line.find_first_of("0123456789"); | ||
| if (pos != std::string::npos) { | ||
| uint64_t mem_kb = std::stoull(line.substr(pos)); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have exception handling for invalid number format in case std::stoull throws an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a try/catch block around this
        
          
                src/init.cpp
              
                Outdated
          
        
      | // During normal operation: check every 10 minutes (between block arrivals). | ||
| // FlushStateToDisk calls SystemNeedsMemoryReleased() which uses the cached flag. | ||
| std::function<void()> check_memory_pressure; | ||
| check_memory_pressure = [&node, &scheduler, &check_memory_pressure]() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Self-referential lambda capture is undefined behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely need to fix this
        
          
                src/validation.cpp
              
                Outdated
          
        
      |  | ||
| // After flushing, immediately recheck memory pressure to update the flag | ||
| // This prevents repeated flushes on the same memory pressure event | ||
| if (SystemNeedsMemoryReleased()) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition where the flag could be set again between check and update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback Kyle… all of this needs more work. Simply a proof of concept sketch for now… making sure it’s addressing Luke’s issue first. Looks like it’s on the right track conceptually, I need to put some real work into it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes definitely on the right track, great job!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition where the flag could be set again between check and update.
good consideration though.. i'm reviewing your points now.
regarding the race condition - though yes, technically it exists but is benign. the actual code is changing a bit as I move forward (i.e. the recheck for mempressure will only happen IF we are releasing cache memory because of mempressure - it will no longer happen at the end of ANY flush) but it will still exist. not sure there is more to do here. the flag being set is atomic. don't want to add mutex in the middle of FlushStateToDisk, etc..
that said, if you have a suggestion please share. I'm not a c++ expert! ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense.
fee03eb    to
    775b944      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid implementation. The overflow check is the only real concern, and it's an edge case (would need 16+ exabytes in /proc/meminfo).
For macOS implementation, consider adding:
#ifdef __APPLE__
    // Use vm_stat or host_statistics64()
#endif
        
          
                src/util/mempressure.cpp
              
                Outdated
          
        
      | if (pos != std::string::npos) { | ||
| try { | ||
| uint64_t mem_kb = std::stoull(line.substr(pos)); | ||
| uint64_t available_memory = mem_kb * 1024; // Convert kB to bytes | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider ading an overflow check before multiplication:
if (mem_kb > UINT64_MAX / 1024) {
    // Treat huge values as "plenty of memory"
    break;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easy enough. done.
b000f56    to
    8310027      
    Compare
  
    | 
 Can you share the steps to test this on Windows? | 
| 
 No idea (yet) specifically for windows. Generally testing has been setting up an environment that allows easier testing of our required conditions. Conditions being simply low available memory (default setting is <64MB). In my Linux testing, I have been using an Ubuntu VM with 4GB RAM and setting dbcachesize to 3.5GB. Fairly guaranteed to run out of memory during IBD. probably easier would be a VM with 2GB mem, etc… other potentials on a system with more total RAM, would be to artificially stress the memory to trigger low mem while running bitcoind in IBD… something like stress-ng or a script to use up/allocate a bunch of memory. generally I’m watching debug.log for the Memory pressure output.. | 
| 
 Docker Memory Pressure Testing GuideThis guide shows how to test the memory pressure detection features in a Docker container with limited memory. Prerequisites
 Step 1: Build the Docker ImageFrom the Bitcoin repository root, build the Docker image: docker build -t bitcoin-knots:mempressure -f contrib/docker/Dockerfile .This compiles your modified code into a Docker image based on Alpine Linux. Note: The build may take 10-30 minutes depending on your system. Step 2: Prepare Test Directory and ConfigCreate a directory for the blockchain data and a minimal config file: # Create data directory
mkdir -p ~/bitcoin_mempressure_test
# Create bitcoin.conf with high dbcache to trigger memory pressure
cat > ~/bitcoin_mempressure_test/bitcoin.conf << 'EOF'
dbcache=1800
EOFStep 3: Run Container with Memory LimitRun the container with a 2GB memory limit: docker run -it --rm --memory=2g --name bitcoin-test \
  -v ~/bitcoin_mempressure_test:/var/lib/bitcoind \
  bitcoin-knots:mempressure \
  -conf=/var/lib/bitcoind/bitcoin.conf -printtoconsoleCommand explanation:
 Step 4: Monitor Memory Pressure DetectionWatch the logs for memory pressure events. You should see messages like: On Linux (container detected via cgroup): On Windows (container detected via Job Object): Note: Memory pressure triggers when available memory drops below the threshold (default 64MB = 67108864 bytes). With dbcache=1800 in a 2GB container, you should see pressure events as the cache fills up during IBD or block processing. Make sure that after a memory pressure event in log, we then see a Flush and the next UpdateTip cache should be 0. we should NOT immediately get another Flush attempt. If you want to go wild and watch mem usage to confirm memory release and filter debug.log to useful info: watch -n 1 "docker stats bitcoin-test --no-stream"
tail -f ~/bitcoin_mempressure_test/debug.log | grep -E '(CheckMemoryPressure|Flushing large)'CleanupRemove test data when done: rm -rf ~/bitcoin_mempressure_testRemove Docker image: docker rmi bitcoin-knots:mempressure | 
3df8675    to
    e293be4      
    Compare
  
    … for macOS Linux: Use /proc/meminfo MemAvailable with cgroup memory limit detection for container environments Windows: Use GlobalMemoryStatusEx with Job Object support for container macOS: Use kern.memorystatus_vm_pressure_level sysctl to detect kernel CheckMemoryPressure(): updates atomic flag SystemNeedsMemoryReleased(): read of cached flag Schedule CheckMemoryPressure to run every 10 seconds
7889ded    to
    83d51c7      
    Compare
  
    83d51c7    to
    100bad0      
    Compare
  
    
Use /proc/meminfo MemAvailable instead of sysinfo() freeram+bufferram on Linux. MemAvailable accounts for reclaimable memory (page cache, etc), providing accurate memory availability.
Added support for macOS using kern.memorystatus_vm_pressure_level sysctl
Added support for containers on both linux and windows. (new macOS native containers to TBD)
Memory pressure is periodically via Scheduler (10s interval). SystemNeedsMemoryReleased() (used in FlushStateToDisk) now reads a cached flag instead of rechecking available memory each time. After a Flush/Sync, Update flag immediately after to prevent repeated flushes.
Fixes #188
Addresses #216
Regarding #216 point:
After much experimentation found that the custom memory allocator (PoolAllocator -> Pool.h - which uses a singly-linked list for its free list), is fairly hard to use in context of trying to free partial amounts of allocated memory. The current way of freeing this memory is to completely destroy and recreate it (in ReallocateCache which happens in Flush). All or nothing. So, doing anything other than a full Flush, will not help with memory.
Using Flush when we have a low available memory seems to work just fine though. (tested multiple times on linux, with <64MB of available memory).
Adding some instructions for creating a Docker image to test containers, in post below.
Needs further testing altogether. Specifically:
Windows testing
Linux testing (Tested)
MacOS testing
Container testing: Linux container, Windows container, Mac container
Edit 10/20/25: Still working on best metrics for determining container mempressure.