[API Proposal]: IsDense/Sparse/Contiguous #111964

Open
michaelgsharp opened this issue Jan 29, 2025 · 5 comments
Labels
api-suggestion: Early API idea and discussion, it is NOT ready for implementation
area-System.Numerics.Tensors
untriaged: New issue has not been triaged by the area owner

Comments

@michaelgsharp
Member

michaelgsharp commented Jan 29, 2025

Background and motivation

Tensors can be either dense (all elements are represented in memory and sit next to each other) or sparse (either the elements are not next to each other in memory, for example after slicing, or the strides are manipulated to conserve memory so that the tensor represents more elements than are actually present in memory). We have a way to determine this internally, but there is no public way of doing so. When a user needs to know whether a tensor is dense or sparse, they have to work out how to calculate it themselves. We need to expose this to users.

We expect this to be a common query, so it should be a property rather than something calculated on each call.
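To make the "calculate it themselves" case concrete, here is a minimal sketch of the check a user might write today. It assumes a row-major layout and treats "dense" as "every element stored exactly once, with no gaps". `LayoutCheck` and `IsDense` are hypothetical names for illustration and are not part of System.Numerics.Tensors; the library's internal check may differ.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
static class LayoutCheck
{
    // Returns true when the strides describe a gap-free row-major layout
    // in which every logical element has its own slot in memory.
    // Empty tensors (a length of 0) are not treated specially in this sketch.
    public static bool IsDense(long[] lengths, long[] strides)
    {
        if (lengths.Length != strides.Length)
            throw new ArgumentException("lengths and strides must have the same rank.");

        long expected = 1; // stride the innermost dimension must have
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            // A dimension of length 1 is never stepped over, so its stride is irrelevant.
            if (lengths[i] != 1 && strides[i] != expected)
                return false;
            expected *= lengths[i];
        }
        return true;
    }
}
```

Under these assumptions, a 2 x 2 tensor with strides [2, 1] is dense, while the same shape with strides [0, 0] (one shared element) or [4, 1] (a slice of a wider tensor) is not.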

API Proposal

namespace System.Numerics.Tensors;

public interface IReadOnlyTensor<TSelf, T> : IEnumerable<T>
    where TSelf : IReadOnlyTensor<TSelf, T>
{
    bool IsSparse { get; }

    // TODO: Should we have a helper ToDense? Other frameworks do.
}

API Usage

Tensor<int> dense = Tensor.Create<int>([1, 2, 3, 4], [2, 2]);
// Will be false: all four elements are stored next to each other.
bool denseIsSparse = dense.IsSparse;

// Create a tensor with only 1 element in memory but actually representing a 2 x 2 tensor with all values 1.
Tensor<int> broadcast = Tensor.Create<int>([1], [2, 2], [0, 0]);
// Will be true: four logical elements share a single stored value.
bool broadcastIsSparse = broadcast.IsSparse;
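The second case works because a stride of 0 means "do not move in memory when this index changes". A minimal sketch of the usual index-to-offset arithmetic shows why every index of the 2 x 2 tensor lands on the same stored element. `StrideMath.Offset` is a hypothetical helper, not a library API.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Numerics.Tensors.
static class StrideMath
{
    // Linear offset into the backing buffer: the sum of index[i] * stride[i].
    public static long Offset(long[] indices, long[] strides)
    {
        if (indices.Length != strides.Length)
            throw new ArgumentException("indices and strides must have the same rank.");

        long offset = 0;
        for (int i = 0; i < indices.Length; i++)
            offset += indices[i] * strides[i];
        return offset;
    }
}
```

With strides [2, 1], index (1, 1) maps to offset 3; with strides [0, 0], every index maps to offset 0, so a buffer holding a single value is enough.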

Alternative Designs

IsSparse is what PyTorch uses. We could instead expose the inverse, something like IsDense, but we would then need a second property, IsContiguous, since dense and contiguous can be separate concepts. Sparse also carries some nuance about exactly what it means in other frameworks (see the glossary below).

IsView - could be used to refer to anything that is not fully dense/contiguous. See glossary below for additional details.

ONNX Runtime does not have a property to represent this; it just checks whether the tensor is a DenseTensor<T>. I don't think this is a good approach for us.

IsDistinct - could be used when the data matches exactly 1 to 1 with its representation. No other frameworks that I could find use this though, so it would be very different from existing frameworks.

Risks

This would be a new API in a preview object, so the risks are minimal.

Glossary

PyTorch Sparse - https://pytorch.org/docs/stable/sparse.html. PyTorch uses Sparse to refer to a tensor where "elements are mostly zero valued." They support 5 different formats of sparse (see prior link). If a tensor is sparse, they also track how many dimensions are represented in a sparse format and how many are represented in a dense format (in the same tensor).

PyTorch Views - https://pytorch.org/docs/stable/tensor_view.html. PyTorch uses a view to essentially let you know that this "Tensor" is actually pointing to the memory of another tensor. Kinda like our TensorSpan. But these views don't have to be contiguous.

PyTorch IsContiguous - https://pytorch.org/docs/stable/generated/torch.Tensor.is_contiguous.html. Basically the same as we would consider contiguous (the data is contiguous in memory), but they provide additional options/details about how the memory is laid out.

The ONNX Runtime C# API doesn't have sparse tensors, but its Python API does, and it matches PyTorch exactly: https://onnxruntime.ai/docs/api/python/api_summary.html#sparsetensor. In fact, you can bind PyTorch tensors directly as input/output.

TensorFlow also supports sparse tensors: https://www.tensorflow.org/guide/sparse_tensor. They are interpreted the same way as in PyTorch, but TensorFlow supports only 1 format compared to PyTorch's 5.

@michaelgsharp michaelgsharp added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Jan 29, 2025
@michaelgsharp
Member Author

@tannergooding

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 29, 2025
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics-tensors
See info in area-owners.md if you want to be subscribed.

@tannergooding
Member

We should probably list some of the other names under the Alternative Designs section and give a brief summary of how different ecosystems use Dense vs Contiguous vs Sparse, etc.

We should likely also put the consideration that this is a property as we expect it to be a common query and therefore cached in the tensor, rather than dynamically determined each call.
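As a rough illustration of the "cached rather than dynamically determined" point, the sketch below computes the flag once at construction, which is only sound because the lengths and strides never change afterwards. `TensorLayout` is a hypothetical type, and the row-major density test it uses is an assumption, not the actual Tensor<T> implementation.

```csharp
using System;

// Hypothetical type for illustration; not the actual Tensor<T> implementation.
sealed class TensorLayout
{
    private readonly long[] _lengths;
    private readonly long[] _strides;

    // Computed once in the constructor; later reads cost a single field load.
    public bool IsSparse { get; }

    public TensorLayout(long[] lengths, long[] strides)
    {
        if (lengths.Length != strides.Length)
            throw new ArgumentException("lengths and strides must have the same rank.");

        // Defensive copies keep the cached flag valid if the caller mutates its arrays.
        _lengths = (long[])lengths.Clone();
        _strides = (long[])strides.Clone();
        IsSparse = !IsRowMajorDense(_lengths, _strides);
    }

    // Assumed definition of "dense": gap-free row-major strides with no element reuse.
    private static bool IsRowMajorDense(long[] lengths, long[] strides)
    {
        long expected = 1;
        for (int i = lengths.Length - 1; i >= 0; i--)
        {
            if (lengths[i] != 1 && strides[i] != expected) return false;
            expected *= lengths[i];
        }
        return true;
    }
}
```

The trade-off is one extra bool per tensor and a single O(rank) pass at creation, in exchange for a constant-time query on every later call.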

@hez2010
Contributor

hez2010 commented Jan 29, 2025

IMO IsDense seems to be the opposite to what I'm thinking about.
For example, in PyTorch, we have IsSparse instead: https://pytorch.org/docs/stable/generated/torch.Tensor.is_sparse.html

@michaelgsharp
Member Author

Interesting.

OnnxRuntime C# api currently just checks if the class is derived from DenseTensor. I think IsSparse is probably better because that can refer to both non-dense and non-contiguous with only a single property (unless we want them split out like that).

I think I would change it personally to use IsSparse instead.
 


@michaelgsharp michaelgsharp changed the title [API Proposal]: IsDense [API Proposal]: IsContiguous Feb 21, 2025
@michaelgsharp michaelgsharp changed the title [API Proposal]: IsContiguous [API Proposal]: IsDense/Sparse/Contiguous Feb 21, 2025