Conversation


@DmT021 DmT021 commented May 24, 2025

This change separates module loading from symbol preloading to ensure a deterministic module loading order. Parallel module loading was causing non-deterministic behavior, but we only need to load symbols in parallel as that's the heavy part of the process.

Discussion https://discourse.llvm.org/t/non-deterministic-module-order-in-lldb/86446


DmT021 commented May 24, 2025

@swift-ci Please test

@DmT021 DmT021 marked this pull request as ready for review May 25, 2025 16:45
@DmT021 DmT021 requested a review from a team as a code owner May 25, 2025 16:45
/// will likely be provided.
lldb::ModuleSP GetOrCreateModule(
const ModuleSpec &module_spec, bool notify,
llvm::function_ref<void(std::function<void()>)> scheduleSymbolsPreload,
Do we need a separate overload for this or could we use a default value of nullptr for scheduleSymbolsPreload? It looks like llvm::function_ref has a suitable operator bool() defined.

I suppose it might require updating more callsites that currently pass the Status output pointer.
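
For illustration, a minimal sketch of the default-argument variant being suggested (not the actual header; it assumes llvm::function_ref's nullptr constructor and explicit operator bool(), and mirrors the existing Status *error_ptr parameter):

```
// Sketch only: one declaration with a defaulted scheduler instead of a
// separate overload.
lldb::ModuleSP GetOrCreateModule(
    const ModuleSpec &module_spec, bool notify,
    llvm::function_ref<void(std::function<void()>)> scheduleSymbolsPreload =
        nullptr,
    Status *error_ptr = nullptr);

// Inside the implementation, scheduling would then be conditional:
//   if (scheduleSymbolsPreload)
//     scheduleSymbolsPreload([module_sp] { module_sp->PreloadSymbols(); });
//   else
//     module_sp->PreloadSymbols();
```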

@@ -0,0 +1,26 @@
C_SOURCES := main.c
LD_EXTRAS := -L. -la -lb -lc -ld -le

I ran into a few problems when running this test on Linux.

  1. The -lc was confusing the locally built libc.so with the "real" libc and causing linking errors.
  2. The linked shared libraries did not use a full path, so only the basename is stored in the DT_NEEDED section. This does not resolve to a unique file location, so they are skipped by the code that loads dependent modules.

I worked around the first problem by not linking libc.so into a.out. But then, when running the test with trace enabled, I see the following output:

First run module list: ['a.out']
Run 2 module list: ['a.out']
...

Obviously this is not going to detect a non-deterministic module order on Linux. But if we change it to link to the full paths of the shared objects:

--- a/lldb/test/API/functionalities/deterministic_module_order/Makefile
+++ b/lldb/test/API/functionalities/deterministic_module_order/Makefile
@@ -1,9 +1,12 @@
 C_SOURCES := main.c
-LD_EXTRAS := -L. -la -lb -lc -ld -le
+include Makefile.rules
+
+# Use full path to shared libraries to make test effective on linux.
+LD_EXTRAS := $(BUILDDIR)/liba.so $(BUILDDIR)/libb.so $(BUILDDIR)/libc.so $(BUILDDIR)/libd.so $(BUILDDIR)/libe.so

 a.out: liba libb libc libd libe

-include Makefile.rules
+

Then the test looks like it would work:

First run module list: ['a.out', 'liba.so', 'libb.so', 'libc.so', 'libd.so', 'libe.so']
Run 2 module list: ['a.out', 'liba.so', 'libb.so', 'libc.so', 'libd.so', 'libe.so']
...

ModuleSpec module_spec(platform_dependent_file_spec, m_arch.GetSpec());
auto image_module_sp = GetOrCreateModule(
module_spec, false /* notify */,
[&](auto preloadSymbols) { task_group.async(preloadSymbols); });

I think it would be good to gate the parallel loading of modules here based on the target.parallel-module-load setting.
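
For illustration, a hedged sketch of what that gating might look like (assuming the setting is reachable here through a TargetProperties accessor such as GetParallelModuleLoad(); this is not part of the actual patch):

```
ModuleSpec module_spec(platform_dependent_file_spec, m_arch.GetSpec());
auto image_module_sp = GetOrCreateModule(
    module_spec, false /* notify */, [&](auto preloadSymbols) {
      // Hypothetical gate: only hand the preload work to the task group
      // when target.parallel-module-load is enabled; otherwise run inline.
      if (GetParallelModuleLoad())
        task_group.async(preloadSymbols);
      else
        preloadSymbols();
    });
```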

zhyty added a commit to llvm/llvm-project that referenced this pull request Nov 14, 2025
Another attempt at resolving the deadlock issue @GeorgeHuyubo discovered
(his previous
[attempt](#160225)).

This change can be summarized as follows:
* Plumb through a boolean flag to force no preload in
`GetOrCreateModules` all the way through to `LoadModuleAtAddress`.
* Parallelize `Module::PreloadSymbols` separately from
`Target::GetOrCreateModule` and its caller `LoadModuleAtAddress` (this
is what avoids the deadlock).

These changes roughly maintain the performance characteristics of the
previous implementation of parallel module loading. Testing on targets
with between 5000 and 14000 modules, I saw numbers similar to before,
and often more than 10% faster in the new implementation across multiple
trials for these massive targets. I think that's because we have less
lock contention with this approach.

# The deadlock

See [bt.txt](https://github.com/user-attachments/files/22524471/bt.txt)
for a sample backtrace of LLDB when the deadlock occurs.

As @GeorgeHuyubo explains in his
[PR](#160225), this is an ABBA
deadlock that happens when a thread context-switches out of
`Module::PreloadSymbols`, goes into `Target::GetOrCreateModule` for
another module, possibly entering this block:
```
      if (!module_sp) {
        // The platform is responsible for finding and caching an appropriate
        // module in the shared module cache.
        if (m_platform_sp) {
          error = m_platform_sp->GetSharedModule(
              module_spec, m_process_sp.get(), module_sp, &search_paths,
              &old_modules, &did_create_module);
        } else {
          error = Status::FromErrorString("no platform is currently set");
        }
      }
```
`Module::PreloadSymbols` holds a module-level mutex, and then
`GetSharedModule` *attempts* to hold the mutex of the global shared
`ModuleList`. So, this thread holds the module mutex, and waits on the
global shared `ModuleList` mutex.

A competing thread may execute `Target::GetOrCreateModule`, enter the
same block as above, grabbing the global shared `ModuleList` mutex.
Then, in `ModuleList::GetSharedModule`, we eventually call
`ModuleList::FindModules` which eventually waits for the `Module` mutex
held by the first thread (via `Module::GetUUID`). Thus, we deadlock.
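
To make the lock ordering concrete, here is a minimal, self-contained illustration of the ABBA pattern described above, using plain std::mutex stand-ins rather than the actual LLDB types:

```
#include <mutex>
#include <thread>

// Stand-ins for the two locks involved; not the actual LLDB types.
std::mutex module_mutex;      // per-Module mutex (held by PreloadSymbols)
std::mutex module_list_mutex; // global shared ModuleList mutex

void thread_a() {
  // Mirrors: Module::PreloadSymbols (module mutex held), then
  // GetSharedModule wants the shared ModuleList mutex.
  std::lock_guard<std::mutex> m(module_mutex);
  std::lock_guard<std::mutex> l(module_list_mutex); // waits on thread_b
}

void thread_b() {
  // Mirrors: Target::GetOrCreateModule -> ModuleList::GetSharedModule ->
  // FindModules -> Module::GetUUID, which wants the module mutex.
  std::lock_guard<std::mutex> l(module_list_mutex);
  std::lock_guard<std::mutex> m(module_mutex); // waits on thread_a
}

int main() {
  // Opposite acquisition orders (ABBA): deadlocks when the timing lines up.
  std::thread a(thread_a), b(thread_b);
  a.join();
  b.join();
}
```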

## Reproducing the deadlock

It might be worth noting that I've never been able to observe this
deadlock during live debugging (e.g. launching or attaching to
processes); however, we were able to consistently reproduce it with
core dumps when using the following settings:
```
(lldb) settings set target.parallel-module-load true
(lldb) settings set target.preload-symbols true
(lldb) settings set symbols.load-on-demand false
(lldb) target create --core /some/core/file/here
# deadlock happens
```

## How this change avoids this deadlock

This change avoids concurrent executions of `Module::PreloadSymbols`
with `Target::GetOrCreateModule` by waiting until after the
`Target::GetOrCreateModule` executions to run `Module::PreloadSymbols`
in parallel. This avoids ever holding a Module lock and *then* acquiring
the ModuleList lock, since the `Target::GetOrCreateModule` executions
always take the shared ModuleList lock first (from what I've read and
tested).
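
As a rough sketch of that ordering (loop and variable names invented for illustration; signatures simplified, not the actual code in this patch), the caller-side pattern is: create all modules first, collect them, and only then fan the preloading out to the thread pool:

```
// Hypothetical sketch of the deferred-preload pattern: module creation runs
// serially (so no Module mutex is held while the shared ModuleList mutex is
// being acquired), and PreloadSymbols only runs afterwards, in parallel.
std::vector<lldb::ModuleSP> modules;
for (const ModuleSpec &spec : module_specs)   // module_specs: assumed input
  if (lldb::ModuleSP m = target.GetOrCreateModule(spec, /*notify=*/false))
    modules.push_back(m);

llvm::ThreadPoolTaskGroup task_group(Debugger::GetThreadPool());
for (const lldb::ModuleSP &module_sp : modules)
  task_group.async([module_sp] { module_sp->PreloadSymbols(); });
task_group.wait();
```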

## Why not read-write lock?

Some feedback in #160225 was to
replace the mutexes used in these components with read-write locks. This
might be a good idea overall, but I don't think it would *easily*
resolve this specific deadlock. `Module::PreloadSymbols` would probably
need a write lock on the Module, so even if we took a read lock in
`Module::GetUUID` we would still contend. Maybe the `ModuleList` lock
could be a read lock that converts to a write lock when it needs to
update the module list, but it seems likely that some thread would try
to update the shared module list and then the write lock would contend
again.

Perhaps with deeper architectural changes, we could fix this issue?

# Other attempts

One downside of this approach (and the former approach of parallel
module loading) is that each DYLD would need to implement this pattern
itself. With @clayborg's help, I looked at a few other approaches:
* In `Target::GetOrCreateModule`, backgrounding the
`Module::PreloadSymbols` call by adding it directly to the thread pool
via `Debugger::GetThreadPool().async()`. This required adding a lock to
`Module::SetLoadAddress` (there should probably be one there already)
since `ObjectFileELF::SetLoadAddress` is not thread-safe (it updates
sections). Unfortunately, during execution, this causes the symbol
preloading to run synchronously with `Target::GetOrCreateModule`,
preventing us from truly parallelizing the execution.
* In `Module::PreloadSymbols`, backgrounding the `symtab` and `sym_file`
`PreloadSymbols` calls individually, but this ran into similar issues as
the above.
* Passing a callback function like
swiftlang#10746 instead of the
boolean I use in this change. It's functionally the same change IMO,
with some design tradeoffs:
  * Pro: the caller doesn't need to explicitly call
`Module::PreloadSymbols` itself, and can instead call whatever function
is passed into the callback.
  * Con: the caller needs to delay the execution of the callback so that
it occurs after the `GetOrCreateModule` logic, otherwise we run into the
same issue. I thought this would be trickier for the caller, requiring
some kind of condition variable or otherwise storing the calls to
execute afterwards.

# Test Plan:
```
ninja check-lldb
```

---------

Co-authored-by: Tom Yang <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Nov 15, 2025
… (#166480)
