Scheduler remove lock contention #179

musitdev · 2024-04-05T16:39:43Z

I've added a check for unexecuted Txs at startup. Unexecuted Txs are now executed.
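A minimal sketch of what such a startup check could look like. It assumes a simplified in-memory `Tx` record with an `executed` flag. The `Tx` type and the `requeue_unexecuted` helper are hypothetical stand-ins for the node's database query and scheduler queue, not the code in this PR:

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified Tx record; the real node keeps Txs in its database.
#[derive(Debug, Clone, PartialEq)]
pub struct Tx {
    pub hash: String,
    pub executed: bool,
}

/// Push every stored Tx that has no execution result yet onto the
/// scheduler queue, so it gets executed after a restart.
/// Returns how many Txs were re-queued.
pub fn requeue_unexecuted(stored: &[Tx], queue: &mut VecDeque<Tx>) -> usize {
    let mut count = 0;
    for tx in stored.iter().filter(|tx| !tx.executed) {
        queue.push_back(tx.clone());
        count += 1;
    }
    count
}

fn main() {
    let stored = vec![
        Tx { hash: "a".into(), executed: true },
        Tx { hash: "b".into(), executed: false },
        Tx { hash: "c".into(), executed: false },
    ];
    let mut queue = VecDeque::new();
    let n = requeue_unexecuted(&stored, &mut queue);
    println!("re-queued {n} unexecuted Txs");
    for tx in &queue {
        println!("queued {}", tx.hash);
    }
}
```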

I've added zombie VM detection. It uses client.is_alive(), like the current scheduler.

From my tests, zombie VMs always return true to this call and keep running indefinitely, using all the CPU. The VM logs show that the kernel has crashed, yet the VM still appears to be running.
They are stopped after MAX_VM_RUN_TIME, but that takes a while.

So the current implementation halts a crashed VM only after MAX_VM_RUN_TIME has elapsed. I didn't manage to get a VM that crashes and does not appear alive.
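To illustrate why a zombie survives until the timeout, here is a hypothetical sketch of the check described above. The `VmClient` trait, `check_vm`, and the 600-second value of `MAX_VM_RUN_TIME` are assumptions for illustration, not the node's actual code:

```rust
use std::time::{Duration, Instant};

// Hypothetical value; the real MAX_VM_RUN_TIME comes from the node's config.
const MAX_VM_RUN_TIME: Duration = Duration::from_secs(600);

#[derive(Debug, PartialEq)]
pub enum VmAction {
    KeepRunning,
    Stop,
}

/// Hypothetical stand-in for the VM client; only is_alive() matters here.
pub trait VmClient {
    fn is_alive(&self) -> bool;
}

/// Decide whether a VM should be stopped.
pub fn check_vm(client: &dyn VmClient, started: Instant, now: Instant) -> VmAction {
    if !client.is_alive() {
        return VmAction::Stop;
    }
    // A zombie still answers is_alive() == true, so the run-time limit
    // is the only thing that eventually stops it.
    if now.saturating_duration_since(started) >= MAX_VM_RUN_TIME {
        VmAction::Stop
    } else {
        VmAction::KeepRunning
    }
}

/// Simulates the observed behavior: crashed kernel, but is_alive() is true.
pub struct ZombieVm;

impl VmClient for ZombieVm {
    fn is_alive(&self) -> bool {
        true
    }
}

fn main() {
    let started = Instant::now();
    let zombie = ZombieVm;
    // One minute in: the zombie keeps running and burning CPU.
    println!("{:?}", check_vm(&zombie, started, started + Duration::from_secs(60)));
    // Only once MAX_VM_RUN_TIME has elapsed is it stopped.
    println!("{:?}", check_vm(&zombie, started, started + MAX_VM_RUN_TIME));
}
```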

I think we can implement two things for better zombie detection:

1. Detect that the VM hasn't called get_task() within a certain time. I haven't tested this, but the VMs seem to crash early.
2. Add activity detection through the file system. In the shim SDK, we can create a task that watches a file and recreates it if it is missing. Then, by removing the file, we can tell whether the VM is still active by checking whether the file gets recreated.
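Both proposals could be sketched roughly as follows. Everything here is hypothetical: `GetTaskWatchdog`, the marker-file path, and the helper names are assumptions, and `guest_recreate_marker` only stands in for the task the shim SDK would run inside the VM:

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::time::{Duration, Instant};

/// Proposal 1 (hypothetical): remember when the VM last called get_task().
pub struct GetTaskWatchdog {
    last_call: Instant,
    timeout: Duration,
}

impl GetTaskWatchdog {
    pub fn new(now: Instant, timeout: Duration) -> Self {
        Self { last_call: now, timeout }
    }

    /// Would be called from the get_task() handler.
    pub fn record_call(&mut self, now: Instant) {
        self.last_call = now;
    }

    /// True if the VM has been silent for longer than `timeout`.
    pub fn is_stalled(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_call) > self.timeout
    }
}

/// Proposal 2, guest side (hypothetical shim task): recreate the marker if missing.
pub fn guest_recreate_marker(marker: &Path) -> io::Result<()> {
    if !marker.exists() {
        fs::write(marker, b"")?;
    }
    Ok(())
}

/// Proposal 2, host side: report whether the guest recreated the marker
/// since the last check, and delete it again to arm the next round.
pub fn heartbeat_check(marker: &Path) -> io::Result<bool> {
    match fs::remove_file(marker) {
        Ok(()) => Ok(true), // file was recreated: VM is active
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(false), // not recreated: possible zombie
        Err(e) => Err(e),
    }
}

fn main() {
    let start = Instant::now();
    let watchdog = GetTaskWatchdog::new(start, Duration::from_secs(30));
    println!("stalled after 45s: {}", watchdog.is_stalled(start + Duration::from_secs(45)));

    // Hypothetical marker location; a real setup would use a path shared with the VM.
    let marker = std::env::temp_dir().join("vm_heartbeat_marker");
    guest_recreate_marker(&marker).expect("write marker");
    println!("VM active: {}", heartbeat_check(&marker).expect("check"));
    // No guest ran in between, so the marker was not recreated.
    println!("VM active: {}", heartbeat_check(&marker).expect("check"));
}
```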

From my tests, the number of zombie VMs depends on the node's CPU usage.

This is a rough implementation for testing. I think we can separate the task-scheduling part from the VM-management part to simplify the loop and the VM lifecycle (start/stop/zombie) management.
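One way to separate the two parts without shared locks is to give VM management its own loop and talk to it over channels. This is a minimal sketch using `std::sync::mpsc`; `VmCommand`, `VmEvent`, `spawn_vm_manager`, `run_tasks`, and the fake VM that starts and stops instantly are all assumptions, not the PR's design:

```rust
use std::sync::mpsc;
use std::thread;

/// Commands from the scheduling side to the VM manager (hypothetical).
#[derive(Debug)]
pub enum VmCommand {
    Start { task_id: u64 },
    Shutdown,
}

/// Events from the VM manager back to the scheduling side (hypothetical).
#[derive(Debug, PartialEq)]
pub enum VmEvent {
    Started { task_id: u64 },
    Stopped { task_id: u64 },
}

/// VM-management loop: it alone owns VM state, so no lock is needed.
pub fn spawn_vm_manager(
    cmds: mpsc::Receiver<VmCommand>,
    events: mpsc::Sender<VmEvent>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for cmd in cmds {
            match cmd {
                VmCommand::Start { task_id } => {
                    // A real manager would boot the VM here and watch it for zombies.
                    let _ = events.send(VmEvent::Started { task_id });
                    let _ = events.send(VmEvent::Stopped { task_id });
                }
                VmCommand::Shutdown => break,
            }
        }
    })
}

/// Scheduling side: decides what to run and collects the resulting events.
pub fn run_tasks(task_ids: &[u64]) -> Vec<VmEvent> {
    let (cmd_tx, cmd_rx) = mpsc::channel();
    let (evt_tx, evt_rx) = mpsc::channel();
    let manager = spawn_vm_manager(cmd_rx, evt_tx);
    for &id in task_ids {
        cmd_tx.send(VmCommand::Start { task_id: id }).expect("manager alive");
    }
    cmd_tx.send(VmCommand::Shutdown).expect("manager alive");
    manager.join().expect("manager panicked");
    // The event sender was moved into the manager thread and dropped when it
    // exited, so this iterator terminates after draining the buffered events.
    evt_rx.iter().collect()
}

fn main() {
    for event in run_tasks(&[1, 2, 3]) {
        println!("{event:?}");
    }
}
```

Because only the manager thread touches VM state, neither side needs a mutex, and zombie detection could live entirely inside the manager loop.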

musitdev added 3 commits March 24, 2024 18:12

refactor scheduler to remove locks

3a5c7ef

add a pending task in scheduler to absorb back pressure

f30bc5b

add unexecuted Tx verification at start up and zombies VM detection

4fd3224
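The pending-task commit could work along these lines: when all VM slots are busy, new tasks wait in a queue instead of blocking the sender. This is a hypothetical sketch; `Scheduler`, `max_running`, and the FIFO order are assumptions, not the code in these commits:

```rust
use std::collections::VecDeque;

/// Hypothetical scheduler that parks tasks in a pending queue when all
/// VM slots are busy, instead of blocking or rejecting the caller.
pub struct Scheduler {
    max_running: usize,
    running: usize,
    pending: VecDeque<u64>,
}

impl Scheduler {
    pub fn new(max_running: usize) -> Self {
        Self { max_running, running: 0, pending: VecDeque::new() }
    }

    /// Accept a task. Returns Some(id) if it started immediately,
    /// None if it was parked in the pending queue.
    pub fn submit(&mut self, task_id: u64) -> Option<u64> {
        if self.running < self.max_running {
            self.running += 1;
            Some(task_id)
        } else {
            self.pending.push_back(task_id);
            None
        }
    }

    /// A VM slot was freed: start the oldest pending task, if any.
    pub fn on_task_done(&mut self) -> Option<u64> {
        self.running = self.running.saturating_sub(1);
        let next = self.pending.pop_front()?;
        self.running += 1;
        Some(next)
    }

    pub fn pending_len(&self) -> usize {
        self.pending.len()
    }
}

fn main() {
    let mut scheduler = Scheduler::new(2);
    for id in 1..=4 {
        match scheduler.submit(id) {
            Some(started) => println!("started task {started}"),
            None => println!("task {id} pending"),
        }
    }
    while let Some(next) = scheduler.on_task_done() {
        println!("slot freed, started task {next}");
    }
}
```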

musitdev requested a review from tuommaki April 5, 2024 16:39
