Memory efficient lazy multiplication possible? #552

Open
vsaase opened this issue Jan 2, 2021 · 7 comments

@vsaase

vsaase commented Jan 2, 2021

Hi,
do you have support for something like this, without allocating all n*m memory:

n = 1000
m = 2000
i = Index(n)
j = Index(m)
A = ITensor(i)
B = ITensor(j)
C = A * B  # outer product: materializes an n×m tensor

I am specifically looking for functionality like the lazy Kronecker products provided by Kronecker.jl and LazyArrays.jl, but without merging the axes the way a Kronecker product does.

thanks,
Victor

@vsaase
Copy link
Author

vsaase commented Jan 2, 2021

Maybe like this?

import Base: +, *

using ITensors

# Wrapper holding two ITensors and a deferred binary operation `op`
struct LazyITensor{op}
    A::ITensor
    B::ITensor
end

# Apply a lazy product (A * B) * b as A * (B * b),
# so the outer product A * B is never formed
*(a::LazyITensor{*}, b::ITensor) = a.A * (a.B * b)

# Apply a lazy sum (A + B) * b as A * b + B * b
*(a::LazyITensor{+}, b::ITensor) = a.A * b + a.B * b
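
A quick usage sketch, assuming the definitions above (the index sizes are just illustrative):

i = Index(1000)
j = Index(2000)
A = randomITensor(i)
B = randomITensor(j)
v = randomITensor(j)

C = LazyITensor{*}(A, B)  # deferred outer product; no 1000×2000 tensor is allocated
w = C * v                 # evaluates A * (B * v): a scalar contraction, then a scaling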

@dkarrasch

LinearMaps.jl has support for all kinds of lazy "matrix" manipulations: addition/subtraction, scalar/operator multiplication, Kronecker products, horizontal/vertical/diagonal concatenation. You can use any AbstractMatrix types as building blocks.
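
For instance, a lazy Kronecker product might look like this (a minimal sketch with arbitrary demo sizes; see the LinearMaps.jl docs for details):

using LinearAlgebra, LinearMaps

A = rand(3, 3)
B = rand(4, 4)

# kron of two LinearMaps is lazy: the 12×12 matrix is never formed
K = kron(LinearMap(A), LinearMap(B))

x = rand(12)
y = K * x  # applied matrix-free

# agrees with the dense Kronecker product
@assert y ≈ kron(A, B) * x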

@mtfishman
Member

mtfishman commented Jan 4, 2021

@vsaase, that is a good question and in general we would like to have lazy operations available for ITensor operations. We were even discussing making operations lazy by default, but with current Julia semantics (not having any reference counting) it is a bit "dangerous" because of something like this:

n = 1000
m = 2000
i = Index(n)
j = Index(m)
A = ITensor(i)
B = ITensor(j)
# Say we made this lazy (returning something like a `LazyITensor{*}`)
C = A * B
# The following modifies the eventual result of `C` once the contraction is materialized,
# which could cause a lot of confusion/unwanted bugs. With reference counting we 
# could determine that there are multiple references to `A` and perform an appropriate copy here
A[1] = 2.0

I think for now it would be good to have it as opt-in, so you would specify that an ITensor involved is going to be lazy, and the laziness "propagates":

A = lazy(ITensor(i)) # Could be a macro `A = @lazy ITensor(i)`
B = ITensor(j)
C = A * B # Since `A` is a lazy ITensor, this operation is performed lazily (returning something like a `LazyITensor{*}`)
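
A minimal sketch of what that opt-in wrapper could look like (hypothetical; `lazy`, `LazyMarked`, and `LazyContraction` are made-up names, not part of ITensors.jl):

using ITensors

struct LazyMarked
  tensor::ITensor
end
lazy(T::ITensor) = LazyMarked(T)

struct LazyContraction
  factors::Vector{ITensor}
end

# Laziness propagates through `*`
Base.:*(a::LazyMarked, b::ITensor) = LazyContraction([a.tensor, b])
Base.:*(a::LazyContraction, b::ITensor) = LazyContraction(vcat(a.factors, [b]))

# Materialize by contracting the stored factors (here, left to right)
materialize(a::LazyContraction) = reduce(*, a.factors)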

A complication that comes about is that once you start contracting tensors lazily, determining the most efficient contraction sequence in general scales exponentially with the number of tensors being contracted. There are many heuristic algorithms to determine a good contraction sequence, which we plan to implement but haven't had the time yet. For references on this, see for example https://arxiv.org/pdf/1304.6112.pdf and https://arxiv.org/pdf/2001.08063.pdf. In addition, TensorOperations.jl currently has functionality for contraction sequence optimization: https://jutho.github.io/TensorOperations.jl/stable/indexnotation/#Contraction-order-and-@tensoropt-macro
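
As a small illustration of the @tensoropt macro mentioned above (a sketch with arbitrary sizes; see its documentation for the exact cost-specification syntax):

using TensorOperations

A = rand(2, 100)
B = rand(100, 2)
C = rand(2, 100)

# @tensoropt optimizes the pairwise contraction order; index
# dimensions can be hinted so the cost model matches the actual sizes
@tensoropt (a => 2, b => 100, c => 2, d => 100) D[a, d] := A[a, b] * B[b, c] * C[c, d]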

Another potential complication is determining what operations materialize the lazy ITensors or continue to propagate the laziness, but perhaps that is fairly solvable.

Anyway, without all of this machinery, you will have to determine the optimal sequence for your own contraction. My main question is: once you have created your "lazy" contraction A * B, what would you like to do with it? If you just want to contract it with more ITensors, like D and E, you can just do (D * A) * (B * E), if you have determined that is the best contraction order (so no laziness is needed).

If you want to use it as part of a linear map in an iterative solver, I could imagine you might want to compose a variety of linear maps which may involve outer products, in which case laziness could be useful. In that case, you could make your own wrapper type or use something like LinearMaps.jl. I've found that it is easy enough to write specialized wrapper types for linear maps involving ITensors in conjunction with KrylovKit.jl, for example:

using ITensors
using KrylovKit

# Linear map representing:
# A + α |x><x|
struct LinearMap
  A::ITensor
  x::ITensor
  α::Number
end

# Define the operation when acting
# on a vector:
# (A + α |x><x|)|v>
function (M::LinearMap)(v::ITensor)
  Mv = M.A * v + M.α * M.x' * (dag(M.x) * v)
  return noprime(Mv)
end

i = Index(100)

A = randomITensor(i', dag(i))
x = randomITensor(i)
α = 2.0

M = LinearMap(A, x, α)

v0 = randomITensor(i)
λs, vs = eigsolve(M, v0, 1)

@show norm(M(vs[1]) - λs[1] * vs[1])

In theory we could provide some built-in linear maps involving ITensors which have lazy operations defined, but there can be a very wide range of operators that are of interest for different problems so I think it is outside the scope of the package (and lazy compositions could be handled by packages like LinearMaps.jl).

Cheers,
Matt

@vsaase
Author

vsaase commented Jan 5, 2021

Thanks for the long and detailed answer! Indeed, I want to solve a linear problem with Krylov propagation, so I think using LinearMaps.jl is the way to go. Another issue is that some of the linear maps must be sparse matrices, which seems to prevent me from using tensor notation and forces me back to Kronecker products.

@mtfishman
Member

Out of curiosity, what kind of sparsity are you interested in? We have support for block sparse tensors in ITensors.jl (based on a block sparse tensor type defined in https://github.com/ITensor/NDTensors.jl). By making the blocks all have size 1, we can make general sparse tensors, which we have used successfully in physics applications. Currently the interface in ITensors.jl for sparse tensors is geared more towards physics applications, but I have been thinking about adding a less physics-oriented interface.
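
For instance, a sketch of an Index whose blocks all have size 1 (using the QN interface shown further below in this thread):

using ITensors

# Four blocks of size 1: the resulting ITensors are effectively
# element-wise sparse, storing only the symmetry-allowed blocks
i = Index([QN(0) => 1, QN(1) => 1, QN(2) => 1, QN(3) => 1], "i")
A = randomITensor(QN(0), i', dag(i))
@show nnzblocks(A)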

@vsaase
Author

vsaase commented Jan 5, 2021

That sounds interesting, I will have a look at it. At first I dismissed it because of the uninviting README of NDTensors.jl and the lack of documentation. I am simulating magnetic resonance imaging, which involves Kronecker products of spatial and spin dynamics. The sparse matrices are generated from nearest-neighbor interactions on a spatial grid due to flow and diffusion, but other parts of the dynamics can also become sparse.
Can you point me to a physics example?

@mtfishman
Member

mtfishman commented Jan 5, 2021

Currently NDTensors is still more of an "internal" package for use within ITensors, though it is a lot more stable now than when I wrote that warning initially. It implements a Tensor type that is an AbstractArray with a more traditional array interface. The Tensor type can have a variety of sparsity structures, for example dense, block sparse, and diagonal. That is where all of the "work" happens in ITensors (i.e. when ITensors are contracted, they are converted to Tensors which are then contracted with a traditional einsum-like contraction function). Once the NDTensors API is a bit more established and we have a wide enough range of efficient operations defined, we will make the package more official. You can take a look at Section 9 of our paper to get a brief overview of the interface for block sparse tensors: https://arxiv.org/pdf/2007.14822.pdf

For a physics example, you could look here: https://github.com/ITensor/ITensors.jl/blob/master/examples/dmrg/1d_heisenberg_conserve_spin.jl#L12-L24 (and at other examples in that directory). If you print out some of the tensors of the MPO object, for example with @show H[50], you can see the type of sparsity that we use:

using ITensors

N = 100
sites = siteinds("S=1", N; conserve_qns = true)
# Define the types of interactions
ampo = AutoMPO()
for j in 1:N-1
  ampo .+= 0.5, "S+", j, "S-", j+1
  ampo .+= 0.5, "S-", j, "S+", j+1
  ampo .+=      "Sz", j, "Sz", j+1
end
# Create the tensors from the interactions
H = MPO(ampo, sites)
println("Before making more sparse")
@show dim(H[N÷2])
@show nnz(H[N÷2])
@show prod(nblocks(inds(H[N÷2])))
@show nnzblocks(H[N÷2])

println("\nAfter making more sparse")
H = splitblocks(linkinds, H)
@show dim(H[N÷2])
@show nnz(H[N÷2])
@show prod(nblocks(inds(H[N÷2])))
@show nnzblocks(H[N÷2])

This example is solving for the ground state of the Heisenberg model (a quantum model of spin degrees of freedom interacting locally), and the MPO encodes the interactions between the spin degrees of freedom. Like in your application, there are multiple levels of sparsity, some arising from the global spin symmetry that gets conserved and some arising from the sparsity of the interactions (which gets exploited by the function splitblocks(linkinds, H), which splits certain degrees of freedom into blocks of size 1 and drops zero blocks).

A simple example of constructing block sparse tensors with symmetry is the following:

using ITensors

# Block sizes
d = 1
i = Index([QN(0, 2) => d, QN(1, 2) => d], "i")

# Parity conserving ITensors (ITensors have a well defined parity)
# By default they have parity 0. All blocks consistent with the specified symmetry
# sector are filled with random elements.
A = randomITensor(i', dag(i))
B = randomITensor(i', dag(i))
C = randomITensor(QN(1, 2), i', dag(i))

# This errors: you can't set an element that is inconsistent
# with the specified symmetry sector
#A[i' => 1, i => 2] = 1.0

# Add them
A + B

# Can't add QN ITensors with different symmetry sector
#A + C

# Contract them
A' * B
A' * C

# Combine the indices to turn into a vector
comb = combiner(i', dag(i))
A * comb

Some background about block sparse tensors that conserve symmetries can be found here: http://itensor.org/docs.cgi?vers=julia&page=book/block_sparse as well as in the ITensor paper I linked to above.

If your tensors don't obey a certain symmetry, but still have a sparse structure (which is also fairly common in our applications, especially for tensors representing interactions), I have considered adding the following type of interface:

using ITensors

# Block sizes
d = 1
i = Index([d, d], "i")

# Initially they have no blocks
A = ITensor(i', dag(i))
B = ITensor(i', dag(i))
# Add blocks as needed
A[i' => 1, i => 1] = 11.0
A[i' => 1, i => 2] = 12.0

So it is very similar to the interface above when there are symmetries involved, but only the block structure is specified when creating the tensors, not the symmetry. Note that this is not available right now, but the infrastructure exists in NDTensors to do it, so it would be fairly easy to add. If this seems useful for your applications, please let me know.
