Add direct conversion of out-of-core arrays to PyTorch and TensorFlow tensors by Copilot · Pull Request #47 · j143/ooc

Copilot · 2025-11-20T01:37:16Z

Enables seamless conversion of memory-mapped Paper arrays to device tensors for deep learning workflows. Minimizes memory usage through zero-copy conversion where possible.

Changes

Core API (`paper/numpy_api.py`)

- ndarray.to_torch(device=None) - Converts to a PyTorch tensor with optional device specification
  - Zero-copy for writable memmaps on CPU
  - Handles read-only memmaps with minimal copy overhead
  - Supports CUDA device transfer
- ndarray.to_tensorflow() - Converts to a TensorFlow tensor via the buffer protocol
- Both methods compute lazy arrays automatically before conversion
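
The copy behaviour listed above can be illustrated with plain NumPy memmaps. This is a minimal sketch, not the code in `paper/numpy_api.py`: the file name, shape, and variable names are made up for the illustration, and the PyTorch step only runs if `torch` happens to be installed.

```python
import os
import tempfile

import numpy as np

# Hypothetical on-disk array, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")
np.arange(12, dtype=np.float32).tofile(path)

# "r+" maps the file read-write; "r" maps it read-only.
writable = np.memmap(path, dtype=np.float32, mode="r+", shape=(3, 4))
readonly = np.memmap(path, dtype=np.float32, mode="r", shape=(3, 4))

# Viewing a memmap as a base ndarray does not copy the mapped pages.
view = np.asarray(writable)
shares = np.shares_memory(view, writable)

# A read-only mapping needs an explicit copy before it can be written to.
private = np.array(readonly)  # np.array copies by default
copied = not np.shares_memory(private, readonly)

try:
    import torch
except ImportError:  # PyTorch is an optional dependency
    torch = None

if torch is not None:
    tensor = torch.from_numpy(view)  # shares memory with `view`
    tensor[0, 0] = 99.0
    assert writable[0, 0] == 99.0  # the write is visible through the memmap
```

The writable mapping can be wrapped without a copy, while the read-only mapping is copied first so the resulting tensor is safe to write to. Whether the PR copies in exactly this way is an assumption.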

Testing (`tests/test_tensor_conversion.py`)

17 test cases covering materialized/lazy arrays, CPU/GPU devices, error handling
Validates zero-copy behavior and computation correctness

Documentation

Demo showing complete out-of-core workflow (examples/tensor_conversion_demo.py)
Updated README with API reference and usage patterns

Usage

import numpy as np
from paper import numpy_api as pnp

# Load large array with memory mapping
arr = pnp.load("matrix.dat", shape=(10000, 10000), dtype=np.float32)

# Build a lazy computation, then materialize it
result = (arr * 2).compute()

# Convert efficiently to device tensors
torch_cpu = result.to_torch()              # zero-copy for writable arrays
torch_gpu = result.to_torch(device='cuda') # copy to GPU
tf_tensor = result.to_tensorflow()          # buffer protocol conversion

Notes

PyTorch and TensorFlow are optional dependencies (graceful ImportError)
Zero-copy optimization applies to writable memory-mapped arrays on CPU
Lazy arrays are materialized once during conversion
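
A graceful ImportError for an optional dependency is often implemented with a small import helper. The sketch below is one possible shape for it, using only the standard library; `require_optional` and its message text are hypothetical and not taken from the PR.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, or raise an ImportError naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user which call needs the package.
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}', "
            "which is not installed."
        ) from exc
```

A conversion method could then call `torch = require_optional("torch", "ndarray.to_torch()")` at the top of its body, so that importing the package itself never requires PyTorch or TensorFlow.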

Original prompt

This section details the original issue you should resolve

<issue_title>direct conversion of out-of-core arrays to device tensors, support for DLPack or memory mapping</issue_title>
<issue_description>. Direct Conversion of Out-of-Core Arrays to Device Tensors

Allow users to seamlessly convert large arrays managed out-of-core (on disk, not in RAM) into tensors compatible with PyTorch, TensorFlow, JAX, or other frameworks that run on GPU or other devices.

The conversion should avoid unnecessary data copies, enabling direct access by deep learning libraries for maximum performance.

Key Goals:

Minimize memory usage: Only the relevant chunk/page/batch should be loaded into device memory.

Zero-copy or memory-mapped access: Whenever possible, avoid copying data by sharing memory pointers (especially for NumPy/PyTorch/cuDF integrations).

API Example:
from paper import numpy_api as pnp
import numpy as np

# Load a large out-of-core array (no data read yet)
arr = pnp.load("large_matrix.dat", shape=(10000, 10000), dtype=np.float32)

# Build your computation graph (lazy, nothing loaded)
c = arr * 2 + 4

# Execute the computation plan – now you get a result object
result = c.compute()

# Convert result to PyTorch and TensorFlow tensors
torch_tensor = result.to_torch()      # Efficient, memory-mapped conversion
tf_tensor = result.to_tensorflow()    # Efficient, memory-mapped conversion
Memory mapping: Map disk data into RAM (using mmap), then wrap as tensor without physical copy.

Tests
Add unit tests</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes direct conversion of out-of-core arrays to device tensors, support for DLPack or memory mapping #46

Co-authored-by: j143 <53068787+j143@users.noreply.github.com>

j143

Solid implementation overall with thoughtful design. However, before merging, I'd like clarification on several critical questions:

API DESIGN & STABILITY

Public API Status: Should to_torch() and to_tensorflow() be marked experimental in docstrings? First-cut APIs often need breaking changes.
Zero-Copy Preconditions: Document EXACTLY when zero-copy is guaranteed:
- Writable memmap on CPU: ✓
- Read-only memmap: falls back to copy
- Non-contiguous/sliced arrays: unclear
- Alignment, endianness requirements?
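
One way to make these preconditions explicit is a small predicate that lists the reasons an array cannot be shared as-is. The sketch below is an assumption about what such a check could look like, based on documented `torch.from_numpy` behaviour (non-native byte order and negative strides are rejected, and writing through a tensor built from a read-only array is undefined). The name `zero_copy_blockers` is hypothetical, and alignment is not covered.

```python
import numpy as np


def zero_copy_blockers(arr):
    """Return the reasons `arr` would need a copy before torch.from_numpy()."""
    reasons = []
    if not arr.flags.writeable:
        # Sharing a read-only buffer would allow undefined writes via the tensor.
        reasons.append("read-only buffer")
    if not arr.dtype.isnative:
        reasons.append("non-native byte order")
    if any(stride < 0 for stride in arr.strides):
        # Reversed slices such as a[::-1] produce negative strides.
        reasons.append("negative stride")
    return reasons


a = np.zeros((4, 4), dtype=np.float32)
print(zero_copy_blockers(a))        # contiguous and writable: no blockers
print(zero_copy_blockers(a.T))      # transposed: positive strides, still shareable
print(zero_copy_blockers(a[::-1]))  # reversed: negative stride
```

Under these assumptions a transposed or forward-sliced array stays zero-copy, while a reversed slice does not. An empty result does not prove that the PR takes the zero-copy path; it only means these three checks found no obstacle.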

PYTORCH-SPECIFIC CONCERNS

CUDA Safety (line 231): cpu_tensor.to(device) can fail silently if CUDA unavailable. Need error handling BEFORE .to() call.
Memmap Tensor Lifetime: What happens if underlying file is closed/resized while tensor is in use? Document constraints for training loops.
Gradient Flow: Test that zero-copy tensors support requires_grad=True. Current tests don't verify this.
Non-Contiguous Arrays: Add tests for transposed/sliced arrays to verify torch.from_numpy() semantics.
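
The CUDA safety point could be addressed by validating the requested device before any transfer is attempted. The following is a hedged sketch rather than the PR's code: `validate_device` is a hypothetical helper, it only recognises `cpu` and `cuda` device strings (an assumption), and the `cuda_is_available` parameter exists so the check can be tested without a GPU or even without PyTorch installed.

```python
def validate_device(device, cuda_is_available=None):
    """Return a normalized device string, raising before any .to() call."""
    if device is None:
        return "cpu"
    name = str(device)  # also accepts torch.device objects
    kind = name.split(":", 1)[0]
    if kind not in ("cpu", "cuda"):
        raise ValueError(f"Unsupported device {name!r}; expected 'cpu' or 'cuda[:N]'")
    if kind == "cuda":
        if cuda_is_available is None:
            import torch  # imported lazily so CPU-only callers never need it

            cuda_is_available = torch.cuda.is_available
        if not cuda_is_available():
            raise RuntimeError(f"Device {name!r} requested but CUDA is not available")
    return name
```

A conversion method would call `validate_device(device)` first and only then run `cpu_tensor.to(device)`, so the caller gets one clear error message regardless of how the underlying transfer would have failed.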

TENSORFLOW CONCERNS

Buffer Protocol: Which TF API is used? Test both eager and graph execution modes.
Device Placement: Always CPU? Should match PyTorch with device parameter?
Dtype Coverage: Tests only check float32. Need float16, int64, bool, complex.

TESTING GAPS

Lazy Array Behavior: Is the materialized result cached, or recomputed on subsequent calls? Test calling .to_torch() twice on the same lazy array.
Interleaved Operations: No test for .compute() -> .to_torch() sequences.
Error Modes: Invalid device specs, undersized tensors, unsupported dtypes.
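
The caching question can be turned into a test by counting how often the underlying computation runs. The sketch below uses a hypothetical stand-in class, `LazyResult`, with a `to_list()` method playing the role of `to_torch()`; it does not use the real Paper API and only illustrates the shape such a test could take if caching is the intended contract.

```python
class LazyResult:
    """Hypothetical stand-in for a lazy array that caches its materialized value."""

    def __init__(self, thunk):
        self._thunk = thunk
        self._value = None
        self._computed = False
        self.compute_calls = 0  # how many times the thunk actually ran

    def compute(self):
        if not self._computed:
            self._value = self._thunk()
            self._computed = True
            self.compute_calls += 1
        return self._value

    def to_list(self):
        # Stand-in for to_torch()/to_tensorflow(): materialize, then convert.
        return list(self.compute())


lazy = LazyResult(lambda: [x * 2 for x in range(3)])
lazy.compute()            # explicit materialization first
first = lazy.to_list()    # interleaved conversion
second = lazy.to_list()   # repeated conversion on the same lazy object

assert first == second
assert lazy.compute_calls == 1  # cached: the computation ran exactly once
```

If recomputation on every call were the intended behaviour instead, the final assertion would change to expect one run per call, which is why the contract needs to be documented either way.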

DOCUMENTATION

Failure Examples: Add README example showing error when GPU unavailable.
Copy Semantics: Clearly document when zero-copy vs. actual copy happens.

This is good foundational work with solid error handling. These questions are mainly about clarifying API contracts before this becomes stable.

Initial plan

ee5696f

Copilot AI assigned Copilot and j143 Nov 20, 2025

Copilot started work on behalf of j143 November 20, 2025 01:37

Copilot AI and others added 3 commits November 20, 2025 01:44

Add tensor conversion methods to_torch() and to_tensorflow()

2064de8

Co-authored-by: j143 <53068787+j143@users.noreply.github.com>

Add tensor conversion demo example

b76bfc1

Co-authored-by: j143 <53068787+j143@users.noreply.github.com>

Update README with tensor conversion documentation

dc05211

Co-authored-by: j143 <53068787+j143@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Add direct conversion of out-of-core arrays to device tensors~~ Add direct conversion of out-of-core arrays to PyTorch and TensorFlow tensors Nov 20, 2025

Copilot AI requested a review from j143 November 20, 2025 01:50

Copilot finished work on behalf of j143 November 20, 2025 01:50

j143 reviewed Dec 6, 2025

j143 approved these changes Mar 8, 2026

j143 marked this pull request as ready for review March 8, 2026 17:33

Copilot wants to merge 4 commits into main from copilot/direct-conversion-out-of-core-arrays
