Slow initial runs followed by blazing fast computation #8415
Comments
Hi @h-OUS-e, I was able to replicate the code you shared. The initial slowness is due to loading and compiling the kernel. Once the kernel is loaded and compiled, it is cached for reuse. Code:
Results: From the results, you can see that the first run (testOptim: 1) takes longer than the subsequent runs (testOptim: 2, testOptim: 3, and testOptim: 4). The first run pays the cost of kernel loading and compilation, while the later multiplications take less time because the compiled kernels are cached. Let me know if this helps! Thank you! |
Thank you @shmishra99. Is loading and compiling the kernel always slow? Preloading dummy tensors on initialization seemed to solve the issue, but for my main application the range of possible tensor shapes is too large, and preloading them all would make the initial page load very slow. I would basically need to warm up every possible shape, the largest being [200,200] or more. Is there a better solution? For now I set the backend to the CPU with tf.setBackend('cpu'), and to my surprise it has been very fast, which appears to meet the needs of my application. Beyond that, I remain curious whether there is a way to still utilize the GPU without the slow kernel loading and compilation time. Thanks! |
@h-OUS-e , As per my understanding, kernels will need to be loaded at some point, whether it's during initial page load or when tensors are executed. While the CPU backend might perform well in terms of kernel loading, GPU backends are generally faster for tensor operations. |
I see, and you’re right that the CPU is slower for larger operations. Would you know of a fast way to preload the kernels with dummy tensors, assuming I want to load all possible tensors of shape (n,n)? Thank you! |
@h-OUS-e , To preload the tensors, you can use the following code:
However, this approach is not highly recommended, as it may slow down the initial page load. Please let me know if this helps. Thank you! |
Thank you @shmishra99. It helps, but as you mentioned, it slows down the initial page load. Are there other solutions? I looked at tf.env().set('ENGINE_COMPILE_ONLY', true);, but I couldn't get it to produce meaningful results. |
Hi @h-OUS-e , Sorry for the late response. As I understand it, loading and compiling the kernel will take some time in all cases, so the app will experience some latency during page load or afterward. Thank You!! |
This issue has been marked stale because it has had no recent activity for 7 days. It will be closed if no further activity occurs. Thank you. |
This issue was closed due to lack of activity after being marked stale for the past 7 days.
Hello,
I am seeing an issue where an initial computation, a basic matrix multiplication using TFJS, is extremely slow: the first run takes 800ms, while the second takes 5ms. I mocked up a simpler version for debugging and get ~90ms on the first run and 1ms on subsequent runs. The JS code is here:
document.getElementById("testOptim").addEventListener('click', function () {
  console.time('testOptim: ');
  const a = tf.randomNormal([15, 3]);
  const b = tf.randomNormal([15, 15]);
  const c = tf.matMul(b, a); // [15,15] x [15,3] -> [15,3]
  console.timeEnd('testOptim: ');
});
And I am using the latest tfjs module: <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@latest/dist/tf.min.js"></script>
Based on previously reported issues, I suspect the problem is that the initial compilation and caching of kernels is slow, but I couldn't find a good solution. Any directions or thoughts would be appreciated!