[API Proposal]: IsDense/Sparse/Contiguous #111964

michaelgsharp · 2025-01-29T17:58:13Z

Background and motivation

Tensors can be either dense (all elements are represented in memory and sit next to each other) or sparse (either the elements are not next to each other in memory, for example after slicing, or the strides have been manipulated to conserve memory so that the tensor represents more elements than are actually present in memory). We can determine this internally, but there is no public way to do so. When users need to know whether a tensor is dense or sparse, they have to work out the calculation themselves. We need to expose this to users.

We expect this to be a common query, so it should be exposed as a property rather than calculated on each call.
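
To make the definition above concrete, here is a minimal sketch of one way such a check could be computed from a tensor's lengths and strides. It assumes a row-major layout, and `StrideLayout`, `IsDense`, and `IsSparse` here are hypothetical names for illustration only; this is not the internal implementation.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
public static class StrideLayout
{
    // Returns true when the strides are exactly the row-major strides implied
    // by the lengths, i.e. every element is stored once with no gaps.
    public static bool IsDense(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        if (lengths.Length != strides.Length)
            throw new ArgumentException("lengths and strides must have the same rank.");

        // Assumption: a tensor with no elements is treated as dense.
        foreach (nint length in lengths)
        {
            if (length == 0)
                return true;
        }

        nint expectedStride = 1;
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            // A dimension of length 1 is never stepped over, so its stride is ignored.
            if (lengths[i] != 1 && strides[i] != expectedStride)
                return false;

            expectedStride *= lengths[i];
        }

        return true;
    }

    // Under the proposal's definition, anything that is not dense is sparse.
    public static bool IsSparse(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
        => !IsDense(lengths, strides);
}
```

Under these assumptions, a 2 x 2 tensor with strides [2, 1] is dense, while the same shape with strides [0, 0] (stride manipulation) or [3, 1] (a slice of a wider tensor) is reported as sparse.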

API Proposal

namespace System.Numerics.Tensors;

public interface IReadOnlyTensor<TSelf, T> : IEnumerable<T>
        where TSelf : IReadOnlyTensor<TSelf, T>
{
    bool IsSparse { get; }
    // TODO: Should we have a helper ToDense? Other frameworks do.
}

API Usage

Tensor<int> dense = Tensor.Create<int>([1, 2, 3, 4], [2, 2]);
// Will be false: all four elements are present and adjacent in memory.
bool denseIsSparse = dense.IsSparse;

// Create a tensor with only 1 element in memory but actually representing a 2 x 2 tensor with all values 1.
Tensor<int> broadcast = Tensor.Create<int>([1], [2, 2], [0, 0]);
// Will be true: the strides of 0 map every index to the same element.
bool broadcastIsSparse = broadcast.IsSparse;
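
To illustrate why strides of [0, 0] let a single stored element stand in for a whole 2 x 2 tensor, here is a minimal sketch of the usual index-to-offset calculation for a strided layout. `StrideIndexing` and `OffsetOf` are hypothetical names used only for this example and are not part of the proposed API.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
public static class StrideIndexing
{
    // Maps a logical index to an offset into the backing buffer:
    // offset = sum over i of indexes[i] * strides[i].
    public static nint OffsetOf(ReadOnlySpan<nint> indexes, ReadOnlySpan<nint> strides)
    {
        if (indexes.Length != strides.Length)
            throw new ArgumentException("indexes and strides must have the same rank.");

        nint offset = 0;
        for (int i = 0; i < indexes.Length; i++)
        {
            offset += indexes[i] * strides[i];
        }

        return offset;
    }
}
```

With strides [2, 1], the index [1, 0] maps to offset 2, so each of the four indexes reaches its own element. With strides [0, 0], every index maps to offset 0, so the same stored value is read for all four positions.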

Alternative Designs

IsSparse is what PyTorch uses. We could do the inverse on our side with something like IsDense, but we would then need another property, IsContiguous, since dense and contiguous could be separate. Sparse does carry some nuance about exactly what it means in other frameworks, though (see glossary below).

IsView - could be used to refer to anything that is not fully dense/contiguous. See glossary below for additional details.

ONNX Runtime does not have any property to represent this; they just check whether the tensor is a DenseTensor<T>. I don't think this is a good approach for us.

IsDistinct - could be used when the data matches exactly 1 to 1 with its representation. I could not find any other framework that uses this though, so it would be a departure from existing frameworks.

Risks

This would be a new API in a preview object, so the risks are minimal.

Glossary

PyTorch Sparse - https://pytorch.org/docs/stable/sparse.html. PyTorch uses Sparse to refer to a tensor where "elements are mostly zero valued." They support 5 different formats of sparse (see prior link). If a tensor is sparse, they also track how many dimensions are represented in a sparse format and how many are represented in a dense format (in the same tensor).

PyTorch Views - https://pytorch.org/docs/stable/tensor_view.html. PyTorch uses a view to essentially let you know that this "Tensor" is actually pointing to the memory of another tensor. Kinda like our TensorSpan. But these views don't have to be contiguous.

PyTorch IsContiguous - https://pytorch.org/docs/stable/generated/torch.Tensor.is_contiguous.html. Basically the same as we would consider contiguous (the data is contiguous in memory), but they provide additional options/details about how the memory is laid out.

The OnnxRuntime C# API doesn't have sparse tensors, but their Python API does, and it matches PyTorch exactly: https://onnxruntime.ai/docs/api/python/api_summary.html#sparsetensor. In fact you can bind PyTorch tensors directly as input/output.

TensorFlow also supports sparse tensors: https://www.tensorflow.org/guide/sparse_tensor. They are interpreted the same way as in PyTorch, but TensorFlow supports only 1 format compared to PyTorch's 5.


michaelgsharp · 2025-01-29T17:58:22Z

@tannergooding

dotnet-policy-service · 2025-01-29T17:58:38Z

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

tannergooding · 2025-01-29T18:04:04Z

We should probably list some of the other names under the Alternative Designs section and give a brief summary of how different ecosystems use Dense vs Contiguous vs Sparse, etc.

We should likely also put the consideration that this is a property as we expect it to be a common query and therefore cached in the tensor, rather than dynamically determined each call.

hez2010 · 2025-01-29T19:22:46Z

IMO IsDense seems to be the opposite to what I'm thinking about.
For example, in PyTorch, we have IsSparse instead: https://pytorch.org/docs/stable/generated/torch.Tensor.is_sparse.html

michaelgsharp · 2025-02-05T07:53:08Z

Interesting.

OnnxRuntime C# api currently just checks if the class is derived from DenseTensor. I think IsSparse is probably better because that can refer to both non-dense and non-contiguous with only a single property (unless we want them split out like that).

I think I would change it personally to use IsSparse instead.


michaelgsharp added the api-suggestion label Jan 29, 2025

dotnet-issue-labeler bot added the area-System.Numerics.Tensors label Jan 29, 2025

dotnet-policy-service bot added the untriaged label Jan 29, 2025

michaelgsharp changed the title ~~[API Proposal]: IsDense~~ [API Proposal]: IsContiguous Feb 21, 2025

michaelgsharp changed the title ~~[API Proposal]: IsContiguous~~ [API Proposal]: IsDense/Sparse/Contiguous Feb 21, 2025
