Fix leaks in asynchronous mode #358

JuanGonzalezCaminero · 2025-03-12T09:25:29Z

This fixes the extraction of leaked tracks in Async mode. The previous extraction of leaks didn't work in most cases for two main reasons:

The size of the buffer used for extracting the leaked tracks is limited, so if there are too many, the extraction needs to be done in multiple steps. However, there was no mechanism to do this, tracks would get postponed until the next extraction step, but the corresponding events were notified that all leaks had been extracted. This means that whenever the number of leaks exceeded the buffer size the overflow was lost.
There wasn't a way to guarantee that all leaks corresponding to the event requesting a flush had been extracted. The tracks are not extracted in order of insertion, and the transport kernels were allowed to insert new tracks into the leak queues in-between transfers of tracks from device to host. This means that we couldn't know when all leaks corresponding to an event had actually been extracted.

With these changes:

The leak queues are frozen whenever an event requests a flush of their leaks. We keep two leak queues per particle type, we swap them at the start of the leak extraction, and don't swap them back until the extraction ends and another event requests a flush. This ensures that when we finish transferring the frozen queue, all leaks associated to that event have been extracted.
We use a new state machine that doesn't return to the idle state until all leaks have been transferred. The functionality is similar to that of the scoring thread, but instead of running separately from the transport loop, the work is done by the AdePT thread.
Returning all leaks uncovers an issue with the MPMC queues used to store leaked tracks during transport. These queues have a fixed size and need to be emptied before they can be used again, and it is left to the user to check whether the queue is full. The queues were inadvertedly overflowing in specific use cases (Such as the testEm3 geometry when running large events), this has been solved by stopping transport in situations where the queues are close to overflowing but a leak extraction is already in progress. Since this is not an issue in more realistic use cases, this mechanism ensures that the execution will be error-free in all situations without affecting runtime in realistic runs.
In specific use cases like running large events in testEm3 the results were sometimes non-reproducible. This was likely due to some interference between the leak extraction, hit extraction, and particle injection, which ran on the same stream. The issue is solved by running these three operations on different streams, but should be revisited in the future.

The new ExtractStates are the following:

Idle: No extraction is in progress
ExtractionRequested: An event requested a flush, waiting for transport to finish
- We move from Idle to ExtractionRequested when an event has requested a flush, and their EventState is HitsFlushed. The number of particles in flight for that event is guaranteed to be 0 at this point
- We wait for the transport kernels to finish running as they are writing to the same leak queues that we need to transfer
TracksNeedTransfer: An event requested a flush, leak buffer on device has tracks to transfer
- Transport has finished, the leak queues are frozen and we can start transferring tracks
PreparingTracks: Tracks are being copied to the staging buffer
- Tracks are copied from the leak queues to the transfer buffer, the numbers of transferred and remaining tracks are copied to the host.
TracksReadyToCopy: Staging buffer is ready to be copied to host
CopyingTracks: Tracks are being copied to host
TracksOnHost: Some or all the tracks have been transferred from device to host and are waiting in the copy buffer
SavingTracks: Tracks are being copied to per-event queues
- Tracks are copied from the transfer buffer, which can contain tracks for all events, into per-event queues
TracksSaved: Tracks have been moved from the copy buffer to their respective per-event queues
- All tracks are stored in the per-event queues, so the transfer buffer can be re-used. If there are any remaining leaks on GPU the state goes back to TracksNeedTransfer, otherwise the state moves to Idle

phsft-bot · 2025-03-12T09:26:01Z

Can one of the admins verify this patch?

This PR introduces a new feature for the async mode: `/adept/FinishLastNParticlesOnCPU`. It is not complete yet, as they are currently just killed and not pushed to the CPU, as this will require #358 to work and be merged. Still, killing the last N particles can currently be used to assess some performance bottlenecks with the magnetic field. Also, a timer is added to the printout of the async printouts. This allows for the plotting the numbers of flight vs time and not just only vs iterations, as this severely skews it towards the fast iterations at the tail: <img width="637" alt="Screenshot 2025-03-16 at 10 45 45" src="https://github.com/user-attachments/assets/b09065a3-567f-4e92-8528-95d60ff3b772" />

andresailer · 2025-03-17T16:12:47Z

CMakeLists.txt

@@ -42,6 +42,12 @@ set(CMAKE_CUDA_STANDARD_REQUIRED ${CMAKE_CXX_STANDARD_REQUIRED})
 set(CMAKE_CUDA_EXTENSIONS OFF)
 set(CMAKE_INCLUDE_DIRECTORIES_PROJECT_BEFORE ON)

+
+# DEBUG
+# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fsanitize=address -fno-omit-frame-pointer -g") 


Can I interest you in our builds with no-omit-framepointer enabled in the compiler (gcc14fp)?

SeverinDiederichs

Great, thank you for this big effort! See minor comments below

SeverinDiederichs · 2025-04-15T14:23:23Z

test/regression/CMakeLists.txt

-file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3.gdml" DESTINATION "${PROJECT_BINARY_DIR}")
-set(TESTING_GDML "${PROJECT_BINARY_DIR}/testEm3.gdml")
+# file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3.gdml" DESTINATION "${PROJECT_BINARY_DIR}")
+# file(COPY "${PROJECT_SOURCE_DIR}/examples/data/testEm3_regions.gdml" DESTINATION "${PROJECT_BINARY_DIR}")


This seems to be a leftover and should be reverted. (the commented out code was fixed and the now removed code is actually correct)

We are actually not using the copied files as we give the absolute path in the scripts. I can uncomment these lines and change the path in the scripts too, both things work for me

Ah, if it is not used, then it can be fully deleted :) I just remember the commenting being a quick fix to get rid of the problem fixed in #379

SeverinDiederichs · 2025-04-15T14:24:09Z

test/regression/IntegrationTest/src/RunAction.cc

-    std::cout << "Run time: " << time << "\n";
+    G4cout << "Run time: " << time << "\n";


I think this was changed because with G4cout it didn't print the time anymore. Can you confirm it actually prints the time although it is not using std::cout?

Ah I didn't remember which one was correct, as both were printing the time. I changed it because I thought it hadn't been updated, will revert

If both work, then no preference from my side

SeverinDiederichs

See minor comments

SeverinDiederichs · 2025-04-28T11:04:28Z

include/AdePT/core/AsyncAdePTTransport.cuh

+          // if (gpuState.extractState.load(std::memory_order_acquire) == ExtractState::Idle &&
+          //   std::any_of(eventStates.begin(), eventStates.end(), [](const auto &eventState) {
+          //     return eventState.load(std::memory_order_acquire) == EventState::HitsFlushed;
+          //   })) {


Suggested change

// if (gpuState.extractState.load(std::memory_order_acquire) == ExtractState::Idle &&

// std::any_of(eventStates.begin(), eventStates.end(), [](const auto &eventState) {

// return eventState.load(std::memory_order_acquire) == EventState::HitsFlushed;

// })) {

Looks like some leftover?

SeverinDiederichs · 2025-04-28T11:05:59Z

include/AdePT/core/AsyncAdePTTransport.cuh

+          const unsigned int grid_size      = (trackBuffer.fNumFromDevice + block_size - 1) / block_size;
+          FillFromDeviceBuffer<<<grid_size, block_size, 0, extractStream>>>(
+              allLeaked, trackBuffer.fromDevice_dev.get(), trackBuffer.fNumFromDevice,
+              // printtotal, allLeaked, trackBuffer.fromDevice_dev.get(), trackBuffer.fNumFromDevice,


Suggested change

// printtotal, allLeaked, trackBuffer.fromDevice_dev.get(), trackBuffer.fNumFromDevice,

SeverinDiederichs · 2025-04-28T11:10:34Z

include/AdePT/core/AsyncAdePTTransport.cuh

-        if (auto &toDevice = trackBuffer.getActiveBuffer(); toDevice.nTrack.load(std::memory_order_acquire) > 0) {
+        if (auto &toDevice = trackBuffer.getActiveBuffer(); toDevice.nTrack > 0) {


Why was the atomic read removed here?

SeverinDiederichs · 2025-04-28T11:10:44Z

include/AdePT/core/AsyncAdePTTransport.cuh

-          const auto nInject = std::min(toDevice.nTrack.load(std::memory_order_acquire), toDevice.maxTracks);
+          const auto nInject = std::min(toDevice.nTrack.load(), toDevice.maxTracks);


Here I understand that we use the lock now

SeverinDiederichs

Great, thank you for this big effort!

JuanGonzalezCaminero added the bug Type: Something isn't working label Mar 12, 2025

SeverinDiederichs mentioned this pull request Mar 16, 2025

Enable event finishing #360

Merged

JuanGonzalezCaminero force-pushed the async_edep branch from 479f247 to 2e750f5 Compare March 17, 2025 16:09

andresailer reviewed Mar 17, 2025

View reviewed changes

JuanGonzalezCaminero force-pushed the async_edep branch 3 times, most recently from e9b1bcc to b7cc64e Compare April 8, 2025 16:34

JuanGonzalezCaminero added 3 commits April 11, 2025 17:13

Fix extraction of leaks in Async mode

fa43e3a

Fix memory leak

fa8ca36

Cleanup

f76cc97

JuanGonzalezCaminero force-pushed the async_edep branch 2 times, most recently from fd5cef9 to e5b4da4 Compare April 15, 2025 08:32

JuanGonzalezCaminero marked this pull request as ready for review April 15, 2025 10:22

SeverinDiederichs reviewed Apr 15, 2025

View reviewed changes

JuanGonzalezCaminero added 7 commits April 28, 2025 10:25

New test script and geometry

2375fa7

Adding a reproducibility test when running in multiple regions

696d893

Remove unused state

70afe9f

Fix unchecked leak queue overflow

98d2a1f

Process leaks on separate stream

b764895

Separate hit transfer and injection streams

04f70b8

Format

967c93e

JuanGonzalezCaminero force-pushed the async_edep branch from b749ae4 to 01d449c Compare April 28, 2025 08:26

Remove debug printouts

db4007d

JuanGonzalezCaminero force-pushed the async_edep branch from 01d449c to db4007d Compare April 28, 2025 08:54

SeverinDiederichs reviewed Apr 28, 2025

View reviewed changes

SeverinDiederichs approved these changes Apr 28, 2025

View reviewed changes

JuanGonzalezCaminero merged commit 0bf239b into apt-sim:master Apr 28, 2025
3 checks passed

		std::cout << "Run time: " << time << "\n";
		G4cout << "Run time: " << time << "\n";

		if (auto &toDevice = trackBuffer.getActiveBuffer(); toDevice.nTrack.load(std::memory_order_acquire) > 0) {
		if (auto &toDevice = trackBuffer.getActiveBuffer(); toDevice.nTrack > 0) {

		const auto nInject = std::min(toDevice.nTrack.load(std::memory_order_acquire), toDevice.maxTracks);
		const auto nInject = std::min(toDevice.nTrack.load(), toDevice.maxTracks);

Fix leaks in asynchronous mode #358

Fix leaks in asynchronous mode #358

Uh oh!

Conversation

JuanGonzalezCaminero commented Mar 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phsft-bot commented Mar 12, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SeverinDiederichs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SeverinDiederichs left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SeverinDiederichs left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

JuanGonzalezCaminero commented Mar 12, 2025 •

edited

Loading