direct conversion of out-of-core arrays to device tensors, support for DLPack or memory mapping

Direct Conversion of Out-of-Core Arrays to Device Tensors

Allow users to seamlessly convert large arrays managed out-of-core (on disk, not in RAM) into tensors compatible with PyTorch, TensorFlow, JAX, or other frameworks that run on GPU or other devices.

The conversion should avoid unnecessary data copies, enabling direct access by deep learning libraries for maximum performance.

Key Goals:
1. Minimize memory usage: Only the relevant chunk/page/batch should be loaded into device memory.
2. Zero-copy or memory-mapped access: Whenever possible, avoid copying data by sharing the underlying memory buffer (especially for NumPy/PyTorch/cuDF integrations).
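
As a rough illustration of the first goal, the sketch below streams a disk-backed matrix one row block at a time using plain NumPy. The helper name `iter_row_batches`, the demo file, and the batch size are hypothetical and are not part of the proposed API; the point is only that each step materializes a single batch in RAM.

```python
import os
import tempfile

import numpy as np


def iter_row_batches(path, shape, dtype, batch_rows):
    """Yield consecutive row blocks of a raw binary matrix stored on disk.

    Hypothetical helper for illustration only; not the library's API.
    """
    # mode="r" maps the file read-only; no data is read until it is sliced.
    mapped = np.memmap(path, dtype=dtype, mode="r", shape=shape)
    for start in range(0, shape[0], batch_rows):
        # np.array(...) copies just this block into ordinary RAM.
        yield np.array(mapped[start:start + batch_rows])


# Small stand-in for a large on-disk matrix (10 rows x 4 columns).
path = os.path.join(tempfile.mkdtemp(), "demo_matrix.dat")
np.arange(40, dtype=np.float32).reshape(10, 4).tofile(path)

batch_shapes = [
    batch.shape for batch in iter_row_batches(path, (10, 4), np.float32, 4)
]
```

Each yielded batch could then be handed to a framework-specific conversion and moved to the device, so peak host memory stays close to one batch rather than the whole array.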

API Example:

```python
from paper import numpy_api as pnp
import numpy as np

# Load a large out-of-core array (no data read yet)
arr = pnp.load("large_matrix.dat", shape=(10000, 10000), dtype=np.float32)

# Build your computation graph (lazy, nothing loaded)
c = arr * 2 + 4

# Execute the computation plan – now you get a result object
result = c.compute()

# Convert result to PyTorch and TensorFlow tensors
torch_tensor = result.to_torch()      # Efficient, memory-mapped conversion
tf_tensor = result.to_tensorflow()    # Efficient, memory-mapped conversion

```

Memory mapping: Map the file into the process's address space (using mmap) so pages are read from disk on demand, then wrap the mapped buffer as a tensor without a physical copy.
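
A minimal sketch of the memory-mapping idea, assuming the data is a raw binary file and using only `np.memmap` plus, if PyTorch happens to be installed, `torch.from_numpy`. The file name and shapes are made up for the example, and this is not how `to_torch()` is necessarily implemented.

```python
import os
import tempfile

import numpy as np

# Small stand-in for "large_matrix.dat" (hypothetical demo file).
path = os.path.join(tempfile.mkdtemp(), "matrix.dat")
np.arange(12, dtype=np.float32).tofile(path)

# mode="r+" gives a writable mapping; torch.from_numpy warns on read-only arrays.
mapped = np.memmap(path, dtype=np.float32, mode="r+", shape=(3, 4))

# A plain ndarray view over the same mapped pages; no data is copied.
view = np.asarray(mapped)
shares_buffer = np.shares_memory(mapped, view)

try:
    import torch

    # CPU tensor that shares the mapped buffer with the NumPy view.
    tensor = torch.from_numpy(view)
except ImportError:
    # PyTorch is optional for this sketch.
    tensor = None
```

Note that the zero-copy part only applies to host (CPU) tensors: moving the tensor to a GPU, for example with `tensor.to("cuda")`, still copies the data into device memory.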

Tests
Add unit tests

direct conversion of out-of-core arrays to device tensors, support for DLPack or memory mapping #46