[SPARK-54001][SQL] Optimize memory usage in session cloning with ref-counted cached local relations
### What changes were proposed in this pull request?
This PR optimizes memory management for cached local relations when cloning Spark sessions by implementing reference counting instead of data replication.
**Current behavior:**
- When a session is cloned, cached local relation data stored in the block manager is replicated.
- Each clone creates a duplicate copy of the data with a new block ID.
- This causes unnecessary memory pressure.
**Proposed changes:**
- Implement reference counting for cached local relations during session cloning.
- Retain the same block ID and data reference when cloning sessions, incrementing a ref count instead of copying the data.
- Add a hash-to-blockId mapping in `ArtifactManager` for efficient block lookup.
- Clean up blocks from block manager memory when ref count reaches zero
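
The proposed changes can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `FakeBlockStore`, `RefCountedBlock`, `SessionCache`, and the `cache_<hash>` block ID format are hypothetical names that only model the idea of sharing one block across clones and removing it when the last reference is released.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the block manager: maps block IDs to bytes.
final class FakeBlockStore {
  private val blocks = mutable.Map.empty[String, Array[Byte]]
  def put(id: String, data: Array[Byte]): Unit = synchronized { blocks(id) = data }
  def remove(id: String): Unit = synchronized { blocks.remove(id) }
  def contains(id: String): Boolean = synchronized { blocks.contains(id) }
  def size: Int = synchronized { blocks.size }
}

// Hypothetical shared handle: one block ID plus a reference count.
final class RefCountedBlock(val blockId: String, store: FakeBlockStore) {
  private var refs = 1

  // Called when a clone starts sharing the block.
  def retain(): RefCountedBlock = synchronized {
    require(refs > 0, "block already released")
    refs += 1
    this
  }

  // Called when a session stops using the block; the last release frees it.
  def release(): Unit = synchronized {
    require(refs > 0, "block already released")
    refs -= 1
    if (refs == 0) store.remove(blockId)
  }

  def refCount: Int = synchronized(refs)
}

// Hypothetical per-session cache holding the hash-to-block mapping.
final class SessionCache(store: FakeBlockStore) {
  private val hashToBlock = mutable.Map.empty[String, RefCountedBlock]

  def cache(hash: String, data: Array[Byte]): Unit = synchronized {
    if (!hashToBlock.contains(hash)) {
      val id = s"cache_$hash" // assumed ID format, for illustration only
      store.put(id, data)
      hashToBlock(hash) = new RefCountedBlock(id, store)
    }
  }

  def blockIdFor(hash: String): Option[String] =
    synchronized { hashToBlock.get(hash).map(_.blockId) }

  def refCountFor(hash: String): Option[Int] =
    synchronized { hashToBlock.get(hash).map(_.refCount) }

  // Cloning shares every block and bumps its ref count; no bytes are copied.
  def cloneSession(): SessionCache = synchronized {
    val copy = new SessionCache(store)
    hashToBlock.foreach { case (hash, block) => copy.hashToBlock(hash) = block.retain() }
    copy
  }

  // Closing a session releases its references.
  def close(): Unit = synchronized {
    hashToBlock.values.foreach(_.release())
    hashToBlock.clear()
  }
}
```

In this sketch, cloning a session leaves a single entry in the store, and the block is removed only after both the parent and the clone have been closed.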
### Why are the changes needed?
Cloning sessions is a common operation in Spark applications (e.g., for creating isolated execution contexts). The current approach of duplicating cached data can significantly increase memory footprint, especially when:
- Sessions are cloned frequently
- Cached relations contain large datasets
- Multiple clones exist simultaneously
This optimization reduces memory pressure and improves performance by avoiding unnecessary data copies.
### Does this PR introduce _any_ user-facing change?
No. This is an internal optimization that improves memory efficiency without changing user-facing APIs or behavior.
### How was this patch tested?
- Added unit tests to verify that the reference counting logic works correctly.
- Relied on the existing unit tests for `ArtifactManager` and session cloning.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #52651 from pranavdev022/clone-artifactmanager-fix.
Authored-by: pranavdev022 <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>