You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The errors we report in case of OOM and Segmentation-Fault are now much better, but I was wondering is there a way to make them more "user-friendly"?
Currently, at least for the MPI case, we report the mpiexec output, which is great, but could there be a way to report a cleaner error in addition to this, that could clearly identify this as a OOM error (or a seg-fault if possible)?
Is there something that packages (like Bodo) could do to make this experience better/easier?
What's the best way to automate restart of engines in this case? Ideally, if enabled, in cases where the engines crash, if we could clean up the processes, display a message (e.g. "engines crashed due to OOM, restarting engines..."), and then restart the engines, that would be useful.