[rqd] Add frame recovery logic for docker mode
Whenever rqd restarts, it loses track of every frame it launched that has not yet
finished. This change adds a configurable option to back up frame states to a file,
which is then used to recover the frame cache state and try to re-bind to the running
frames.

This first version works only in docker mode.
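
As a rough illustration of the back-up-and-recover idea described in this message, the sketch below writes a list of frame states to a file and reads them back. It is not the code from this commit: `FrameState`, `save_backup`, `load_backup`, and the JSON encoding are all hypothetical choices made for the example, and the real change may serialize frames differently.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class FrameState:
    """Hypothetical stand-in for the per-frame data rqd would persist."""
    frame_id: str
    pid: int


def save_backup(path, frames):
    """Write all frame states to `path`, replacing any previous backup atomically."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over the
    # target so a crash mid-write never leaves a half-written backup behind.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        json.dump([asdict(frame) for frame in frames], handle)
    os.replace(tmp_path, path)


def load_backup(path):
    """Return the frame states stored at `path`, or an empty list if none exist."""
    try:
        with open(path) as handle:
            return [FrameState(**entry) for entry in json.load(handle)]
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or corrupt backup is treated as "nothing to recover".
        return []
```

After a restart, a caller could pass the result of `load_backup` to whatever logic attempts to re-bind to the still-running frames; that step depends on the container runtime and is left out here.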
DiegoTavares committed Dec 10, 2024
1 parent 291b694 commit d7ac144
Showing 6 changed files with 400 additions and 73 deletions.
1 change: 1 addition & 0 deletions proto/rqd.proto
@@ -115,6 +115,7 @@ message RunFrame {
    int32 num_gpus = 23;
    report.ChildrenProcStats children = 24;
    string os = 25;
    int32 pid = 26;
}

message RunFrameSeq {
5 changes: 5 additions & 0 deletions rqd/rqd/rqconstants.py
@@ -162,6 +162,11 @@
DOCKER_MOUNTS = []
DOCKER_SHELL_PATH = "/bin/sh"

# Backup running frames cache. Backup cache is turned off if this path is set to
# None or ""
BACKUP_CACHE_PATH = "/tmp/opencue/running_frames_backup.dat"
BACKUP_CACHE_TIME_TO_LIVE_SECONDS = 60

try:
    if os.path.isfile(CONFIG_FILE):
        # Hostname can come from here: rqutil.getHostname()
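
The two new constants suggest a simple gate before any recovery is attempted. The sketch below is one possible reading, not the commit's implementation: the helper `backup_is_usable` is hypothetical, and it assumes that `BACKUP_CACHE_TIME_TO_LIVE_SECONDS` means "ignore a backup file last modified longer ago than this".

```python
import os
import time

# Values copied from the diff above.
BACKUP_CACHE_PATH = "/tmp/opencue/running_frames_backup.dat"
BACKUP_CACHE_TIME_TO_LIVE_SECONDS = 60


def backup_is_usable(path, ttl_seconds, now=None):
    """Hypothetical helper: decide whether a backup file should be trusted.

    Returns False when the feature is disabled (path is None or ""), when the
    file cannot be read, or when it was last modified more than `ttl_seconds`
    ago (an assumed interpretation of the time-to-live setting).
    """
    if not path:
        return False
    try:
        modified = os.path.getmtime(path)
    except OSError:
        return False
    current = time.time() if now is None else now
    return (current - modified) <= ttl_seconds
```

A caller would check `backup_is_usable(BACKUP_CACHE_PATH, BACKUP_CACHE_TIME_TO_LIVE_SECONDS)` once at startup and skip recovery when it returns `False`. The `now` parameter exists only to make the age check easy to test.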