Slow initial runs followed by blazing fast computation #8415
Comments
Hi @h-OUS-e, I was able to replicate the code you shared. The initial slowness is due to loading and compiling the kernel. Once the kernel is loaded and compiled, it is cached for reuse. Code:
Results: From the results, you can see that the first run (testOptim: 1) takes longer than the subsequent runs (testOptim: 2, testOptim: 3, and testOptim: 4). The first run pays the cost of kernel loading and compilation, while the later multiplications take less time because the compiled kernels are cached. Let me know if this helps! Thank you! |
Thank you @shmishra99. Is loading and compiling the kernel always slow? Preloading dummy tensors on initialization seemed to solve the issue, but for my main application the range of possible tensor shapes is too large, and preloading them all would make the initial page load very slow. I would basically need to warm up every possible shape, the largest being [200,200] or more. Is there a better solution? For now I set the backend to the CPU with tf.setBackend('cpu'), and to my surprise it has been very fast, which appears to meet the needs of my application. Beyond that, I remain curious whether there is a way to still utilize the GPU without the slow kernel loading and compilation time. Thanks! |
@h-OUS-e , As per my understanding, kernels will need to be loaded at some point, whether it's during initial page load or when tensors are executed. While the CPU backend might perform well in terms of kernel loading, GPU backends are generally faster for tensor operations. |
I see, and you’re right that the CPU is slower for larger operations. Would you know of a fast way to preload the kernels with dummy tensors, assuming I want to load all possible tensors of shape (n,n)? Thank you! |
@h-OUS-e , To preload the tensors, you can use the following code:
However, this approach is not highly recommended, as it may slow down the initial page load. Please let me know if this helps. Thank you! |
Thank you @shmishra99. It helps, but as you mentioned, it slows down the initial page load. Are there other solutions? I looked at tf.env().set('ENGINE_COMPILE_ONLY', true);, but I couldn't get it to produce meaningful results. |
Hi @h-OUS-e , Sorry for the late response. As I understand it, loading and compiling the kernel will take some time in all cases, so the app will experience some latency during page load or afterward. Thank You!! |
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you. |
This issue was closed due to lack of activity after being marked stale for the past 7 days.
Hello,
I am seeing an issue where an initial computation, a basic matrix multiplication using TFJS, is extremely slow: the first run takes 800ms, while the second takes 5ms. I mocked up a simpler version for debugging and get ~90ms on the first run and 1ms on subsequent runs. The JS code is here:
document.getElementById("testOptim").addEventListener('click', function () {
  console.time('testOptim: ');
  const a = tf.randomNormal([15, 3]);
  const b = tf.randomNormal([15, 15]);
  const c = tf.matMul(b, a); // [15,15] x [15,3] -> [15,3]
  console.timeEnd('testOptim: ');
});
And I am using the latest tfjs module: <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest/dist/tf.min.js"></script>
Based on previously reported issues, I suspect the problem is that the initial compilation and caching of kernels is slow, but I couldn't find a good solution. Any directions or thoughts would be appreciated!