direct conversion of out-of-core arrays to device tensors, support for DLPack or memory mapping #46

@j143

Description

Direct Conversion of Out-of-Core Arrays to Device Tensors

Allow users to seamlessly convert large arrays managed out-of-core (on disk, not in RAM) into tensors compatible with PyTorch, TensorFlow, JAX, or other frameworks that run on GPU or other devices.

The conversion should avoid unnecessary data copies, enabling direct access by deep learning libraries for maximum performance.

Key Goals:

  1. Minimize memory usage: Only the relevant chunk/page/batch should be loaded into device memory.
  2. Zero-copy or memory-mapped access: Whenever possible, avoid copying data by sharing the underlying memory buffer (especially for NumPy/PyTorch/cuDF integrations).
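
To make the two goals above concrete, here is a minimal sketch of chunk-wise access using plain NumPy. It is not the `paper` implementation: `iter_row_chunks` and the small stand-in file are hypothetical names introduced only for illustration. Each chunk is a slice of an `np.memmap`, so pages are read from disk only when the chunk is actually touched.

```python
import os
import tempfile

import numpy as np


def iter_row_chunks(path, shape, dtype, rows_per_chunk):
    """Hypothetical helper: yield (start_row, chunk) pairs from a raw file.

    Each chunk is a slice of the memory map, not an eager copy of the file.
    """
    # np.memmap maps the file lazily; pages are read from disk when accessed.
    mm = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    for start in range(0, shape[0], rows_per_chunk):
        yield start, mm[start:start + rows_per_chunk]


# Small stand-in file so the sketch runs anywhere (assumption, not real data).
path = os.path.join(tempfile.mkdtemp(), "small_matrix.dat")
np.arange(12, dtype=np.float32).reshape(4, 3).tofile(path)

chunk_sums = []
for start, chunk in iter_row_chunks(path, (4, 3), np.float32, rows_per_chunk=2):
    # Only this chunk would need to be transferred to the device.
    chunk_sums.append(float(chunk.sum()))

print(chunk_sums)  # [15.0, 51.0]
```

In a real integration, the body of the loop is where a chunk would be handed to the target framework and moved to the device, one batch at a time.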

API Example:

from paper import numpy_api as pnp
import numpy as np

# Load a large out-of-core array (no data read yet)
arr = pnp.load("large_matrix.dat", shape=(10000, 10000), dtype=np.float32)

# Build your computation graph (lazy, nothing loaded)
c = arr * 2 + 4

# Execute the computation plan – now you get a result object
result = c.compute()

# Convert result to PyTorch and TensorFlow tensors
torch_tensor = result.to_torch()      # Efficient, memory-mapped conversion
tf_tensor = result.to_tensorflow()    # Efficient, memory-mapped conversion
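
One possible way to back conversions like the ones above is the DLPack protocol. The sketch below is an assumption about how this could work, not the library's actual design: `ComputedResult` is a hypothetical stand-in for the object returned by `compute()`, and it simply forwards `__dlpack__` and `__dlpack_device__` to an underlying NumPy array. NumPy's own `np.from_dlpack` is used as the consumer so the example runs without a deep learning framework installed.

```python
import numpy as np


class ComputedResult:
    """Hypothetical stand-in for the object returned by compute()."""

    def __init__(self, data):
        self._data = data  # in-memory NumPy array holding the result

    def __dlpack__(self, **kwargs):
        # Delegate to NumPy's DLPack exporter; by default no data is copied.
        return self._data.__dlpack__(**kwargs)

    def __dlpack_device__(self):
        # Report the device of the underlying buffer (CPU for NumPy arrays).
        return self._data.__dlpack_device__()


source = np.arange(6, dtype=np.float32)
result = ComputedResult(source)

# Any DLPack consumer could stand here; NumPy's is used to keep the sketch
# free of framework dependencies.
view = np.from_dlpack(result)

# A write to the source is visible through the view, so memory is shared.
source[0] = 99.0
shares_memory = bool(view[0] == 99.0)
```

With PyTorch installed, `torch.from_dlpack(result)` would be the analogous consumer call. Whether `to_torch()` should wrap it this way is a design choice left open here.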

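Memory mapping: Map the on-disk data into the process's address space (using mmap), then wrap the mapping as a tensor without a physical copy. Pages are loaded into RAM only when they are accessed.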
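
The memory-mapping idea can be sketched with the standard library's `mmap` module and `np.frombuffer`. This is an illustration under stated assumptions, not the proposed implementation: the file name and the tiny array are stand-ins. The final check writes through the mapping and reads through the array to show that both refer to the same storage.

```python
import mmap
import os
import tempfile

import numpy as np

# Small stand-in file (assumption for illustration only).
path = os.path.join(tempfile.mkdtemp(), "small_vector.dat")
np.arange(6, dtype=np.float32).tofile(path)

with open(path, "r+b") as f:
    # Map the whole file; length 0 means "entire file".
    mapped = mmap.mmap(f.fileno(), 0)

# Wrap the mapping as an array without copying its contents.
arr = np.frombuffer(mapped, dtype=np.float32)

# Write through the mapping, then read through the array.
mapped[0:4] = np.float32(42.0).tobytes()
shares_storage = float(arr[0]) == 42.0
```

Assuming PyTorch is installed, `torch.from_numpy(arr)` would then produce a CPU tensor that shares this same buffer. Moving it to a GPU still requires a host-to-device transfer.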

Tests
Add unit tests
