Open
Labels
- area:io (IO formats/read/write)
- area:storage (Storages: local/s3/adls/gcs)
- help wanted (Extra attention is needed)
Description
Summary
Remote runs currently list all inputs and then download every file up front, before precheck or processing starts. This wastes bandwidth and temp space when a run aborts early or only needs a subset of the files.
Proposed change
- Keep the initial list step to compute the input set.
- Download inputs just-in-time per file before precheck/read.
- Cache downloaded paths to avoid double downloads across stages.
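The proposed flow could look roughly like the sketch below. It is not taken from the codebase: `LazyDownloader`, `local_path`, and the `fetch` callable are hypothetical names standing in for whatever the storage layer actually provides. The point is only that the first request for a URI triggers the download and later requests reuse the cached path.

```python
from pathlib import Path
from typing import Callable, Dict


class LazyDownloader:
    """Hypothetical helper: download a remote input on first use, then reuse it."""

    def __init__(self, fetch: Callable[[str, Path], None], temp_dir: Path) -> None:
        # `fetch` is an assumed callable that copies one remote URI to a local path.
        self._fetch = fetch
        self._temp_dir = temp_dir
        self._cache: Dict[str, Path] = {}

    def local_path(self, uri: str) -> Path:
        """Return the local copy of `uri`, downloading it only if not cached."""
        cached = self._cache.get(uri)
        if cached is not None:
            return cached
        # Prefix with a counter so two URIs with the same basename do not collide.
        basename = uri.rsplit("/", 1)[-1]
        target = self._temp_dir / f"{len(self._cache):06d}_{basename}"
        self._fetch(uri, target)
        self._cache[uri] = target
        return target
```

With a shared instance, precheck and read can both call `local_path(uri)` and only the first call pays for the download.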
Notes
- This likely touches the run -> precheck -> read pipeline, because InputFile currently expects source_local_path to exist.
- Dry-run must keep its list-only behavior.
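One way to handle both notes, sketched under assumptions: make the local path resolve lazily on first access, so a dry run that never touches it never downloads. `LazyInputFile`, `resolve`, and `run` below are illustrative stand-ins, not the project's actual `InputFile` or run entry point; only the attribute name `source_local_path` comes from this issue.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterable, List, Optional


@dataclass
class LazyInputFile:
    """Hypothetical stand-in for InputFile with a lazily resolved local path."""

    uri: str
    # Assumed callable that downloads (or looks up) the local copy of a URI.
    resolve: Callable[[str], Path]
    _local: Optional[Path] = field(default=None, init=False, repr=False)

    @property
    def source_local_path(self) -> Path:
        # Resolve on first access only; later accesses reuse the stored path.
        if self._local is None:
            self._local = self.resolve(self.uri)
        return self._local


def run(
    uris: Iterable[str],
    resolve: Callable[[str], Path],
    process: Callable[[Path], str],
    dry_run: bool = False,
) -> List[str]:
    """List inputs first; touch local paths only when actually processing."""
    inputs = [LazyInputFile(uri, resolve) for uri in uris]
    if dry_run:
        # List-only: source_local_path is never accessed, so nothing downloads.
        return [f.uri for f in inputs]
    return [process(f.source_local_path) for f in inputs]
```

If processing raises on the first file, the remaining files are never resolved, which is the saving this issue is after.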
Acceptance
- No behavior change in results.
- Reduced temp usage and unnecessary downloads for remote sources.