fix(csharp): key CloudFetchDownloader.downloadTasks by ChunkIndex #436

Open
msrathore-db wants to merge 1 commit into main from fix/cloudfetch-downloader-tracking-dict
@msrathore-db
Collaborator

Summary

Re-keys the active-downloads bookkeeping dictionary in CloudFetchDownloader.DownloadFilesAsync from <Task, IDownloadResult> to <long, Task> so the continuation's TryRemove actually removes its own entry.

Background

The dict was declared as:

var downloadTasks = new ConcurrentDictionary<Task, IDownloadResult>();

…and populated with downloadTasks[downloadTask] = downloadResult, where downloadTask is the continuation returned by DownloadFileAsync(...).ContinueWith(t => { ... }).

Inside the continuation lambda, however, t is the antecedent task (the DownloadFileAsync task itself), not the continuation. So:

.ContinueWith(t =>
{
    ...
    downloadTasks.TryRemove(t, out _);  // never matches; entries accumulate
});

Entries are never removed. The dict grows by one entry for every chunk in a query and is only freed when DownloadFilesAsync returns (the variable goes out of scope). As a result, Task.WhenAll(downloadTasks.Keys) at the end-of-results guard awaits every download started in the query, including the ones that finished hours earlier. This is harmless, because awaiting completed tasks is essentially free, but it is wasteful, and the code's claim that the dict tracks active downloads is wrong.
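
The mismatch can be reproduced outside the driver. The following is a minimal sketch, not the CloudFetchDownloader code: AntecedentKeyDemo and FakeDownloadAsync are hypothetical stand-ins, and the dict stores a plain long instead of an IDownloadResult to keep the example self-contained.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class AntecedentKeyDemo
{
    // Hypothetical stand-in for DownloadFileAsync (assumption, not the real method).
    private static Task FakeDownloadAsync(long chunkIndex) => Task.Delay(10);

    // Returns how many entries are still in the dict after every download has finished.
    public static async Task<int> RunAsync(int chunkCount)
    {
        var downloadTasks = new ConcurrentDictionary<Task, long>();

        for (long i = 0; i < chunkCount; i++)
        {
            Task continuation = FakeDownloadAsync(i).ContinueWith(t =>
            {
                // t is the antecedent (the FakeDownloadAsync task). The dict is keyed
                // by the continuation, so this lookup never finds anything to remove.
                downloadTasks.TryRemove(t, out _);
            });

            // The key is the task returned by ContinueWith, a different object from t.
            downloadTasks[continuation] = i;
        }

        await Task.WhenAll(downloadTasks.Keys);
        return downloadTasks.Count;
    }
}
```

Under these assumptions, RunAsync(5) returns 5: every continuation has run, yet no entry was removed.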

Fix

  • Key the dict by downloadResult.ChunkIndex (a long, already unique per chunk in a query).
  • Capture chunkIndex before the lambda so the closure can use it for TryRemove.
  • Update Task.WhenAll to use .Values (the tasks now live as values, not keys).
  • Drop the IDownloadResult value type. Every reference was checked, and the value was never read out of the dict: it was only assigned, removed without being read, or ignored via out _. The IDownloadResult referenced inside the continuation comes from the foreach loop variable via closure capture, not from a dict lookup.
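
The re-keyed pattern described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the actual diff: ChunkIndexKeyDemo, FakeDownloadAsync, and the gate are hypothetical. The gate exists only so that every entry is registered before any continuation runs, which makes the demo's result deterministic. The description does not say how the real loop orders registration and completion.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ChunkIndexKeyDemo
{
    // Hypothetical stand-in for DownloadFileAsync; it waits on a gate so the demo
    // can register every entry before any download completes.
    private static async Task FakeDownloadAsync(Task gate)
    {
        await gate;
        await Task.Delay(1);
    }

    // Returns how many entries are still in the dict after every download has finished.
    public static async Task<int> RunAsync(int chunkCount)
    {
        var downloadTasks = new ConcurrentDictionary<long, Task>();
        var gate = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        for (long i = 0; i < chunkCount; i++)
        {
            // Capture the key before the lambda so the closure removes
            // the same key that the add below uses.
            long chunkIndex = i;

            Task continuation = FakeDownloadAsync(gate.Task).ContinueWith(antecedent =>
            {
                downloadTasks.TryRemove(chunkIndex, out _);
            });

            downloadTasks[chunkIndex] = continuation;
        }

        // Values returns a snapshot; copy it before releasing the downloads.
        var pending = new List<Task>(downloadTasks.Values);
        gate.SetResult(true);

        await Task.WhenAll(pending);
        return downloadTasks.Count;
    }
}
```

Under these assumptions, RunAsync(5) returns 0, because each continuation removes its own entry once its download completes.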

Why a separate PR

This came up during review of #183 (straggler download mitigation). It's not straggler-related — the bug pre-existed in the generic CloudFetch download loop. Splitting it out so it can be reviewed on its own merits.

Test plan

  • dotnet build succeeds with 0 warnings, 0 errors
  • All 693 unit tests pass (dotnet test --filter "FullyQualifiedName~Unit")
  • CloudFetch E2E tests against a live workspace (requires endpoint config — local run failed all 21 in 1ms each on connection setup; not caused by this change)

This pull request and its description were written by Isaac.

The bookkeeping dict was declared as ConcurrentDictionary<Task, IDownloadResult>
and populated with `downloadTasks[downloadTask] = downloadResult` where
downloadTask is the continuation returned by DownloadFileAsync(...).ContinueWith(...).
Inside that continuation, however, the lambda's `t` parameter is the ANTECEDENT
task (the DownloadFileAsync task itself), not the continuation. So
`downloadTasks.TryRemove(t, ...)` never matched any key and entries accumulated
for the lifetime of DownloadFilesAsync. The dict is local and GC'd at method
exit so there's no observable leak, but the code lies about what it tracks
("active" downloads vs. "every download seen") and Task.WhenAll at the end-of-
results guard awaits already-completed tasks needlessly.

Switch the key to downloadResult.ChunkIndex (a long that's already unique per
chunk in a query) so the continuation's TryRemove uses the same key the add
used. Update Task.WhenAll to use .Values.

The IDownloadResult value was never read out of the dict — every site either
ignored the value or wrote it — so dropping it is safe.