Want to understand nextOverflow calculation in std/math/emulated/field_ops.go > mulPreCond #662

avras · 2023-04-26T15:07:10Z

I am unable to understand the calculation of nextOverflow in the mulPreCond method of std/math/emulated/field_ops.go. Here is a link to the code.

The mulPreCond function is as follows:

```go
func (f *Field[T]) mulPreCond(a, b *Element[T]) (nextOverflow uint, err error) {
	reduceRight := a.overflow < b.overflow
	nbResLimbs := nbMultiplicationResLimbs(len(a.Limbs), len(b.Limbs))
	nextOverflow = f.fParams.BitsPerLimb() + uint(math.Log2(float64(2*nbResLimbs-1))) + 1 + a.overflow + b.overflow
	if nextOverflow > f.maxOverflow() {
		err = overflowError{op: "mul", nextOverflow: nextOverflow, maxOverflow: f.maxOverflow(), reduceRight: reduceRight}
	}
	return
}
```

The inputs a and b are elements whose Limbs fields are slices containing the limbs of the multiplication operands. The function nbMultiplicationResLimbs(len(a.Limbs), len(b.Limbs)) returns len(a.Limbs) + len(b.Limbs) - 1, which is the number of limbs in the product of a and b. For example, if a and b have 4 limbs each, the product f will have 7 limbs.

The number of terms in a product limb f_i is at most the minimum of the number of limbs in a and b. So the maximum bitwidth of a product limb would be f.fParams.BitsPerLimb()*2 + a.overflow + b.overflow + ceil(Log2(Min(len(a.Limbs), len(b.Limbs)))). The last term accounts for the carry bits from adding these terms.

To get the overflow value, we can subtract one f.fParams.BitsPerLimb(). So the value for nextOverflow would be f.fParams.BitsPerLimb() + a.overflow + b.overflow + ceil(Log2(Min(len(a.Limbs), len(b.Limbs)))).

I could not figure out the reasoning behind the uint(math.Log2(float64(2*nbResLimbs-1))) term in the current nextOverflow calculation. Is it something to do with the constraint check in the mul function?
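To make the gap between the two expressions concrete, here is a minimal Go sketch that evaluates both for the 4-limb example. The function names `currentOverflow`, `proposedOverflow`, `ceilLog2` and `minUint` are hypothetical helpers written for this illustration and are not part of gnark; the 64-bit limb width is also just an assumed parameter.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
)

// ceilLog2 returns ceil(log2(n)) for n >= 1 using integer arithmetic.
func ceilLog2(n uint) uint {
	return uint(bits.Len(n - 1))
}

// minUint returns the smaller of two unsigned values.
func minUint(a, b uint) uint {
	if a < b {
		return a
	}
	return b
}

// currentOverflow mirrors the expression quoted from mulPreCond above,
// taking plain numbers instead of Element values (illustrative only).
func currentOverflow(bitsPerLimb, lenA, lenB, overflowA, overflowB uint) uint {
	nbResLimbs := lenA + lenB - 1 // what nbMultiplicationResLimbs is described to return
	return bitsPerLimb + uint(math.Log2(float64(2*nbResLimbs-1))) + 1 + overflowA + overflowB
}

// proposedOverflow is the tighter bound suggested in this post.
func proposedOverflow(bitsPerLimb, lenA, lenB, overflowA, overflowB uint) uint {
	return bitsPerLimb + overflowA + overflowB + ceilLog2(minUint(lenA, lenB))
}

func main() {
	// Assumed parameters: 64-bit limbs, 4 limbs per operand, no prior overflow.
	fmt.Println("current: ", currentOverflow(64, 4, 4, 0, 0))  // 68
	fmt.Println("proposed:", proposedOverflow(64, 4, 4, 0, 0)) // 66
}
```

With these assumed parameters the two expressions differ by two bits, and the whole difference comes from the logarithmic term.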

ivokub · 2023-04-26T21:30:45Z

Maintainer

Very good observations! And you are completely right: we compute the overflow such that the check in the mul function fits into the scalar field. At a very high level, part of the check computes f(c) = f_1 * c + f_2 * c^2 + f_3 * c^3 + ... + f_6 * c^6 for small constants c in 1,...,n. But looking at it now, it seems that we do not take into account that c also adds a few bits. Usually the constants are very small and we have some spare bits here and there, so it should be safe, but I think it is a bug.

And your remark really makes me think that we actually should compute two overflows in the mulPreCond function -- the actual overflow of the limbs and the "safe" overflow for checking in mul. But I guess it wouldn't save much as the change happens inside log2, so it may be a few bits here or there.

By the way, empirically it is never worth amortising multiplications before modular reductions. Every multiplication roughly doubles the number of limbs, so the final modular reduction becomes really expensive. Let's say we have 4-limb elements: a single mul gives about 8 limbs, and a second mul (of two 8-limb elements) gives about 16 limbs. Then when reducing we have to binary decompose 16 very saturated (close to the bitwidth of the scalar field) elements. We can improve this with table lookups, but it is still expensive.
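As a rough illustration of this growth, the sketch below tracks the limb count across chained unreduced multiplications, using the len(a.Limbs) + len(b.Limbs) - 1 rule stated in the first post. The helper names `productLimbs` and `limbsAfterChainedMuls` are hypothetical and not part of gnark. Under that rule the exact counts come out slightly below the round figures of 8 and 16 used in this reply.

```go
package main

import "fmt"

// productLimbs returns the limb count of an unreduced product,
// following the lenA + lenB - 1 rule quoted earlier in the thread.
func productLimbs(lenA, lenB int) int {
	return lenA + lenB - 1
}

// limbsAfterChainedMuls returns the limb counts obtained by repeatedly
// multiplying the running result with itself, without reducing in between.
func limbsAfterChainedMuls(start, rounds int) []int {
	counts := []int{start}
	n := start
	for i := 0; i < rounds; i++ {
		n = productLimbs(n, n)
		counts = append(counts, n)
	}
	return counts
}

func main() {
	// Starting from 4-limb elements, two chained multiplications.
	fmt.Println(limbsAfterChainedMuls(4, 2)) // [4 7 13]
}
```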

Due to that, I'm actually thinking about losing plain mul and replacing it with optimized modmul implementation. This would make the overflow computation irrelevant. But I haven't figured the optimized modmul out yet and it is waiting behind some other urgent work :).

avras · 2023-04-27T06:54:32Z

Author

Thanks for the quick response! I was not expecting one within a few hours.

It does not seem like the constraint check in mul needs a scalar field check as long as the limb products do not overflow. This technique is from the xJsnark paper as far as I know. Here is the relevant portion from Section IV.B.

For $c \in \{1, 2, \ldots, 2m-1\}$, we check constraints of the form $$\left(\sum_{i=0}^{2m-2} z_i c^i\right) = \left(\sum_{i=0}^{m-1} x_i c^i\right)\left(\sum_{i=0}^{m-1} y_i c^i\right).$$

There $x_i$'s and $y_i$'s represent the limbs of the inputs being multiplied. Suppose these limbs have bitwidths f.fParams.BitsPerLimb() + x.overflow and f.fParams.BitsPerLimb() + y.overflow respectively.

Let $$f(c) = \left(\sum_{i=0}^{m-1} x_i c^i\right)\left(\sum_{i=0}^{m-1} y_i c^i\right),$$ i.e. the right hand side of the above set of constraints as a function of $c$. Given $x$ and $y$, the values of $f(c)$ are known. So the $z_i$'s need to satisfy the following linear system of equations.

$$\begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2^2 & \cdots & 2^{2m-2} \\ 1 & 3 & 3^2 & \cdots & 3^{2m-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 2m-1 & (2m-1)^2 & \cdots & (2m-1)^{2m-2} \end{bmatrix} \begin{bmatrix} z_0 \\ z_1 \\ z_2 \\ \vdots \\ z_{2m-2} \end{bmatrix} = \begin{bmatrix} f(1) \\ f(2) \\ f(3) \\ \vdots \\ f(2m-1) \end{bmatrix}$$

Let $p$ be the modulus of the native field. The coefficient matrix on the left is a Vandermonde matrix with distinct nodes $1, 2, \ldots, 2m-1$ (they remain distinct modulo $p$ as long as $2m-1 < p$), so it has an inverse. So the above system has a unique solution for the $z_i$'s in the native field.

We know one solution of the above linear system. It is given by

$$z_k = \underset{i+j=k}{\sum_{i=0}^{m-1} \sum_{j=0}^{m-1}} x_i y_j \bmod p,$$

for $k=0,1,2,\ldots,2m-2$. By uniqueness, this must be the only solution.

The maximum bitwidth of $$\underset{i+j=k}{\sum_{i=0}^{m-1} \sum_{j=0}^{m-1}} x_i y_j$$ is bounded by f.fParams.BitsPerLimb() *2 + x.overflow + y.overflow + Ceil(Log2(m)). The last term is for the carry bits from adding at most $m$ products of the form $x_iy_j$ where each such product occupies f.fParams.BitsPerLimb() *2 + x.overflow + y.overflow bits.
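This bound can be sanity-checked numerically. The sketch below, using math/big, sets every limb to its maximum value and compares the bit length of the largest coefficient (the middle one, which sums exactly $m$ products) against the bound. All function names are hypothetical helpers for this illustration, and the native capacity of 253 bits is an assumption corresponding to a 254-bit modulus such as the BN254 scalar field.

```go
package main

import (
	"fmt"
	"math/big"
	"math/bits"
)

// maxLimb returns 2^width - 1, the largest value a limb of that bitwidth can hold.
func maxLimb(width uint) *big.Int {
	v := new(big.Int).Lsh(big.NewInt(1), width)
	return v.Sub(v, big.NewInt(1))
}

// worstCaseBits returns the bit length of m * xMax * yMax, i.e. the largest
// possible middle coefficient z_{m-1} when both inputs have m limbs.
// xWidth and yWidth are BitsPerLimb plus the respective overflow.
func worstCaseBits(m int, xWidth, yWidth uint) int {
	prod := new(big.Int).Mul(maxLimb(xWidth), maxLimb(yWidth))
	prod.Mul(prod, big.NewInt(int64(m)))
	return prod.BitLen()
}

// boundBits returns xWidth + yWidth + ceil(log2(m)) for m >= 1.
func boundBits(m int, xWidth, yWidth uint) int {
	return int(xWidth+yWidth) + bits.Len(uint(m-1))
}

func main() {
	const m = 4
	const bitsPerLimb, xOverflow, yOverflow = 64, 0, 0 // assumed parameters
	const nativeCapacity = 253                        // assumption: 254-bit native modulus

	worst := worstCaseBits(m, bitsPerLimb+xOverflow, bitsPerLimb+yOverflow)
	bound := boundBits(m, bitsPerLimb+xOverflow, bitsPerLimb+yOverflow)
	fmt.Println("worst case bits:", worst)
	fmt.Println("bound bits:     ", bound)
	fmt.Println("fits natively:  ", bound <= nativeCapacity)
}
```

For 4 limbs of 64 bits with no overflow, both the worst case and the bound come out at 130 bits, so the bound is tight in this instance.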

If we can ensure that this maximum bitwidth does not exceed $\lceil \log_2 p \rceil - 1$ (the capacity of the native field), we are guaranteed that there is no wraparound in the equation below. This is the "as long as the limb products do not overflow" clause at the beginning of my response.

$$z_k = \underset{i+j=k}{\sum_{i=0}^{m-1} \sum_{j=0}^{m-1}} x_i y_j \bmod p.$$

So we can assume equality without the $\bmod p$. That is, we have

$$z_k = \underset{i+j=k}{\sum_{i=0}^{m-1} \sum_{j=0}^{m-1}} x_i y_j.$$

Let me know if I got something wrong in this argument. I had an (albeit quick) look at the arkworks-rs/nonnative code. The pre_mul_reduce function there does not seem to account for the bitwidth of the constraint check. For instance, the bitwidth of $c^i$ is not considered.

ivokub Apr 27, 2023
Maintainer

Yup, seems right. This is definitely one way to compute z_k without needing to take into account the widths of the small constants c.

However, this is not always the optimal approach in-circuit. If we compute

$$z_k = \underset{i+j=k}{\sum_{i=0}^{m-1} \sum_{j=0}^{m-1}} x_i y_j$$

directly, then we have nbLimbs^2 products to compute. In a SNARK circuit each product adds a new constraint. And for PLONK, we also have to add a constraint for every addition. So the complexity is O(nbLimbs^2).

However, if we compute all z_k outside the circuit using hints (non-deterministic assignment), then we have to check that all z_k are correctly computed. For that, we can consider the polynomials

$$Z(c) = \sum_i z_i c^i$$
$$X(c) = \sum_i x_i c^i$$
$$Y(c) = \sum_i y_i c^i$$

and have to show that

$$Z(c) = X(c) Y(c).$$

We could use the Schwartz-Zippel lemma, but the problem is that we don't have randomness in-circuit! That turns out not to be a problem: the degree of Z(c) is small (at most 2*nbLimbs-2), so it is sufficient to evaluate all polynomials at 2*nbLimbs-1 distinct points and compare. In R1CS we get the added benefit that if the points are constants, then multiplying a variable by a constant is free (we only multiply the coefficients), as are additions. So the total cost is about 2*nbLimbs * 2 constraints (for every constant c we need one multiplication X(c) * Y(c) and one equality check), which is O(nbLimbs). So, asymptotically it is cheaper.

In practice, the asymptotics kick in around nblimbs = 3,4 (there are other factors also).
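The hint-then-check idea can be sketched outside any circuit with math/big. In the sketch below, `convolve` plays the role of the hint that computes the z_k, and `checkProduct` evaluates all three polynomials at the constants 1, ..., len(z) and compares. Every name here is a hypothetical helper and not the gnark implementation, and the modulus 2^61 - 1 is only a toy stand-in for the native scalar field.

```go
package main

import (
	"fmt"
	"math/big"
)

// limbs builds a limb slice from small integers (toy helper).
func limbs(vals ...int64) []*big.Int {
	out := make([]*big.Int, len(vals))
	for i, v := range vals {
		out[i] = big.NewInt(v)
	}
	return out
}

// toyModulus returns the prime 2^61 - 1, standing in for the native field modulus.
func toyModulus() *big.Int {
	p := new(big.Int).Lsh(big.NewInt(1), 61)
	return p.Sub(p, big.NewInt(1))
}

// convolve computes z_k = sum over i+j=k of x_i*y_j, as a hint would outside the circuit.
func convolve(x, y []*big.Int) []*big.Int {
	z := make([]*big.Int, len(x)+len(y)-1)
	for k := range z {
		z[k] = new(big.Int)
	}
	for i, xi := range x {
		for j, yj := range y {
			z[i+j].Add(z[i+j], new(big.Int).Mul(xi, yj))
		}
	}
	return z
}

// eval returns sum of coeffs[i]*c^i modulo p, using Horner's rule.
func eval(coeffs []*big.Int, c, p *big.Int) *big.Int {
	acc := new(big.Int)
	for i := len(coeffs) - 1; i >= 0; i-- {
		acc.Mul(acc, c)
		acc.Add(acc, coeffs[i])
		acc.Mod(acc, p)
	}
	return acc
}

// checkProduct reports whether Z(c) = X(c)*Y(c) mod p for c = 1..len(z).
func checkProduct(x, y, z []*big.Int, p *big.Int) bool {
	for c := int64(1); c <= int64(len(z)); c++ {
		point := big.NewInt(c)
		rhs := new(big.Int).Mul(eval(x, point, p), eval(y, point, p))
		rhs.Mod(rhs, p)
		if eval(z, point, p).Cmp(rhs) != 0 {
			return false
		}
	}
	return true
}

func main() {
	x, y, p := limbs(1, 2, 3, 4), limbs(5, 6, 7, 8), toyModulus()
	z := convolve(x, y)
	fmt.Println("honest hint accepted:  ", checkProduct(x, y, z, p))
	z[3].Add(z[3], big.NewInt(1)) // tamper with one coefficient
	fmt.Println("tampered hint accepted:", checkProduct(x, y, z, p))
}
```

For 4-limb inputs this check uses 7 evaluation points, compared with the 16 limb products of the direct computation.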

avras Apr 27, 2023
Author

Thanks for the clarifications!
