Direct Conversion of Out-of-Core Arrays to Device Tensors
Allow users to seamlessly convert large arrays managed out-of-core (on disk, not in RAM) into tensors compatible with PyTorch, TensorFlow, JAX, or other frameworks that run on GPU or other devices.
The conversion should avoid unnecessary data copies, enabling direct access by deep learning libraries for maximum performance.
Key Goals:
- Minimize memory usage: Only the relevant chunk/page/batch should be loaded into device memory.
- Zero-copy or memory-mapped access: Whenever possible, avoid copying data by sharing memory pointers (especially for NumPy/PyTorch/cuDF integrations).
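The first goal can be illustrated with a minimal sketch that walks a memory-mapped file one row block at a time, so only the current batch needs to be resident. It uses plain NumPy (`np.memmap`) rather than the `paper` API, and the helper `iter_row_batches`, the file name, and the batch size are assumptions made up for illustration.

```python
import os
import tempfile

import numpy as np


def iter_row_batches(array, batch_rows):
    """Yield consecutive row blocks of `array` as views (hypothetical helper)."""
    for start in range(0, array.shape[0], batch_rows):
        yield array[start:start + batch_rows]


# Small stand-in file so the sketch runs quickly; a real array would be far larger.
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")
np.ones((1000, 64), dtype=np.float32).tofile(path)

# Opening the memmap reads no data; pages are loaded when a batch is touched.
arr = np.memmap(path, dtype=np.float32, mode="r", shape=(1000, 64))

total = 0.0
batch_shapes = []
for batch in iter_row_batches(arr, batch_rows=256):
    # In a real pipeline, this is where the batch would be handed to the
    # framework and moved to the device.
    batch_shapes.append(batch.shape)
    total += float(batch.sum())
```

Each `batch` is a view onto the mapping, so the only unavoidable copy in such a pipeline would be the host-to-device transfer of that one batch.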
API Example:
from paper import numpy_api as pnp
import numpy as np
# Load a large out-of-core array (no data read yet)
arr = pnp.load("large_matrix.dat", shape=(10000, 10000), dtype=np.float32)
# Build your computation graph (lazy, nothing loaded)
c = arr * 2 + 4
# Execute the computation plan – now you get a result object
result = c.compute()
# Convert result to PyTorch and TensorFlow tensors
torch_tensor = result.to_torch() # Efficient, memory-mapped conversion
tf_tensor = result.to_tensorflow() # Efficient, memory-mapped conversion
Memory mapping: Map the on-disk data into the process's address space (using mmap), then wrap it as a tensor without a physical copy.
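As a rough sketch of this idea, assuming the result is backed by a raw file of float32 values, `np.memmap` can map the file and `torch.from_numpy` can wrap the mapping as a CPU tensor that shares the same buffer. This is not how `to_torch()` is necessarily implemented; the file name and shape below are placeholders, and the PyTorch step is skipped if PyTorch is not installed.

```python
import os
import tempfile

import numpy as np

# Placeholder file standing in for "large_matrix.dat"; kept small for the sketch.
shape = (1000, 100)
path = os.path.join(tempfile.mkdtemp(), "large_matrix.dat")
np.arange(shape[0] * shape[1], dtype=np.float32).reshape(shape).tofile(path)

# Map the file into the address space; pages are read lazily on first access.
mapped = np.memmap(path, dtype=np.float32, mode="r+", shape=shape)

# A basic slice is a view onto the same mapping, not a copy.
block = mapped[:10]
zero_copy = np.shares_memory(block, mapped)

try:
    import torch  # optional dependency for this sketch

    # torch.from_numpy shares the buffer with the NumPy array (CPU tensors only).
    tensor = torch.from_numpy(mapped)
    tensor[0, 0] = 42.0  # the write is visible through the NumPy mapping
except ImportError:
    mapped[0, 0] = 42.0  # fallback so the sketch still runs without PyTorch

wrote_through = float(mapped[0, 0]) == 42.0
```

The sharing applies to CPU tensors only: moving the tensor to a GPU copies the data into device memory, so in practice the zero-copy step would cover the disk-to-host path.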
Tests
Add unit tests