Hi, I computed the cluster utilisation rate (number of running GPUs / total number of GPUs) from each job's start time, end time, and number of GPUs. The maximum utilisation rate of the Kalos cluster is only around 70%, and there are many periods in which less than 40%, or even less than 20%, of the cluster's GPUs are in use. This is surprising, and it is not the case for Seren. I also found that the Seren data has ~800k job records, while Kalos has only ~60k. Does this mean that not all jobs are recorded for Kalos, and that the missing records explain the apparent severe under-utilisation?
I would sincerely appreciate it if you could help clarify this. Thank you also for sharing this fantastic dataset.
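For reference, the utilisation calculation described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used: the function name `utilisation_timeline`, the `(start, end, gpus)` tuple layout, and the toy records are assumptions made for illustration and do not reflect the dataset's actual column names or values.

```python
from collections import defaultdict


def utilisation_timeline(jobs, total_gpus):
    """Return [(time, utilisation)] step points, one per distinct event time.

    `jobs` is an iterable of (start, end, gpus) tuples; this layout is an
    assumption for illustration, not the trace's real schema.
    """
    delta = defaultdict(int)  # net change in running GPUs at each timestamp
    for start, end, gpus in jobs:
        if end <= start or gpus <= 0:
            continue  # skip zero-length or non-GPU records
        delta[start] += gpus  # GPUs become busy when the job starts
        delta[end] -= gpus    # and are released when it ends

    running = 0
    timeline = []
    for t in sorted(delta):
        running += delta[t]
        timeline.append((t, running / total_gpus))
    return timeline


# Toy records (start_s, end_s, gpu_count); values are made up.
jobs = [(0, 100, 8), (50, 150, 4), (100, 200, 4)]
timeline = utilisation_timeline(jobs, total_gpus=16)
peak = max(u for _, u in timeline)  # 0.75 for this toy input
```

Each returned point gives the utilisation that holds from that timestamp until the next one, so the peak is simply the maximum over the points. If some jobs are missing from the trace, this calculation would under-count running GPUs, which is the concern raised above.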