@pranavdev022 (Contributor) commented Oct 17, 2025

What changes were proposed in this pull request?

This PR optimizes memory management for cached local relations when cloning Spark sessions by implementing reference counting instead of data replication.

Current behavior:

  • When a session is cloned, cached local relation data stored in the block manager is replicated.
  • Each clone creates a duplicate copy of the data with a new block ID.
  • This causes unnecessary memory pressure.

Proposed changes:

  • Implement reference counting for cached local relations during session cloning.
  • Retain the same block ID and data reference when cloning sessions, incrementing a reference count instead of copying the data.
  • Add a hash-to-blockId mapping in ArtifactManager for efficient block lookup.
  • Remove blocks from block manager memory when their reference count reaches zero.
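
The proposed changes above can be illustrated with a minimal sketch. This is not the code in this PR: it is a toy Python model, and `BlockStore`, `Session`, `retain`, `release`, and the `cache_N` block ID format are all hypothetical names chosen for illustration. It only shows the idea of sharing one block between a session and its clone, and freeing the block when the last reference is released.

```python
import hashlib
import itertools
from dataclasses import dataclass


@dataclass
class _Entry:
    data: bytes
    ref_count: int


class BlockStore:
    """Hypothetical stand-in for the block manager: block ID -> ref-counted data."""

    def __init__(self):
        self._blocks = {}
        self._ids = itertools.count()

    def put(self, data):
        # Store new data under a fresh block ID with an initial count of 1.
        block_id = f"cache_{next(self._ids)}"
        self._blocks[block_id] = _Entry(data, 1)
        return block_id

    def retain(self, block_id):
        # A new holder shares the existing block; nothing is copied.
        self._blocks[block_id].ref_count += 1

    def release(self, block_id):
        # Drop one reference and free the block once nobody holds it.
        entry = self._blocks[block_id]
        entry.ref_count -= 1
        if entry.ref_count == 0:
            del self._blocks[block_id]

    def ref_count(self, block_id):
        entry = self._blocks.get(block_id)
        return entry.ref_count if entry else 0


class Session:
    """Hypothetical session holding a hash -> block ID mapping."""

    def __init__(self, store):
        self._store = store
        self._hash_to_block = {}

    def cache(self, data):
        # Key the cached relation by content hash for later lookup.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._hash_to_block:
            self._hash_to_block[digest] = self._store.put(data)
        return digest

    def block_id(self, digest):
        return self._hash_to_block[digest]

    def clone(self):
        # Reuse the same block IDs and bump their counts instead of copying data.
        copy = Session(self._store)
        for digest, block_id in self._hash_to_block.items():
            self._store.retain(block_id)
            copy._hash_to_block[digest] = block_id
        return copy

    def close(self):
        # Release every block this session holds.
        for block_id in self._hash_to_block.values():
            self._store.release(block_id)
        self._hash_to_block.clear()
```

In this sketch, closing the original session leaves the block alive for the clone, and the block is removed only when the clone is closed as well. The sketch is single-threaded; a real implementation would need the increments and decrements to be safe under concurrent access.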

Why are the changes needed?

Cloning sessions is a common operation in Spark applications (e.g., for creating isolated execution contexts). The current approach of duplicating cached data can significantly increase memory footprint, especially when:

  • Sessions are cloned frequently
  • Cached relations contain large datasets
  • Multiple clones exist simultaneously

This optimization reduces memory pressure and improves performance by avoiding unnecessary data copies.

Does this PR introduce any user-facing change?

No. This is an internal optimization that improves memory efficiency without changing user-facing APIs or behavior.

How was this patch tested?

  • Added unit tests that verify the reference-counting logic.
  • Ran the existing unit tests for ArtifactManager and session cloning.

Was this patch authored or co-authored using generative AI tooling?

No

@pranavdev022 force-pushed the clone-artifactmanager-fix branch 7 times, most recently from b55b86b to 386757d, on October 23, 2025 12:44
@github-actions github-actions bot added the BUILD label Oct 23, 2025
@pranavdev022 changed the title from "[WIP] [SPARK-XXXXX][SQL] Optimize memory usage in session cloning with ref-counted cached local relations" to "[WIP] [SPARK-54001][SQL] Optimize memory usage in session cloning with ref-counted cached local relations" on Oct 23, 2025
@pranavdev022 changed the title from "[WIP] [SPARK-54001][SQL] Optimize memory usage in session cloning with ref-counted cached local relations" to "[SPARK-54001][SQL] Optimize memory usage in session cloning with ref-counted cached local relations" on Oct 23, 2025
@pranavdev022 force-pushed the clone-artifactmanager-fix branch from 386757d to 784e5a1 on October 23, 2025 13:56
@hvanhovell (Contributor) left a comment

LGTM

@hvanhovell (Contributor) commented

Merging to master.
