Conversation
WardLT
left a comment
There was a problem hiding this comment.
It'll need that event to block/unlock the runner due to weights.
Another thing you'd want are tests. Might as well start testing these agents in isolation now using a faked Auditor that'll pass whichever signal you tell it to (I can show you an example)
cascade/agents/agents.py
Outdated
| # await self.queue.put(spec) | ||
|
|
||
| @action | ||
| async def receive_weights(self, weights: bytes) -> None: |
There was a problem hiding this comment.
I might suggest having "receive weights" also receive a version number, in the off chance a worker somehow misses a version (no idea how, but remain paranoid about messages dropping).
You could also store a hash of the weights if you feel really fancy.
There was a problem hiding this comment.
yes this is a good idea
cascade/agents/agents.py
Outdated
| self.attempt += 1 | ||
| # wait for new weights | ||
| # not sure how best to do this | ||
| # await_event_async from academy? |
There was a problem hiding this comment.
I would do an event. It'll take a few parts:
- Each have its own
weights_updatedEvent as an attribute - This loop will clear the event if an audit fails and then await it being set
- The
receive_weightsfunction sets the Event whenever it finishes updating the model weights.
There was a problem hiding this comment.
easy, i can implement this pattern before we meet
WardLT
left a comment
There was a problem hiding this comment.
Seems right to me. We'll talk tests today
Refactor so that we have one
DynamicsRunneragent per trajectory. TheDynamicsRunneragent owns the state for that trajectory. It will pass this around as needed as metadata to other agents. This will greatly reduce the need for agents to share state information and simplify the code, making scalable distributed runs more accessible.