List of refactoring and code improvement opportunities #114

Open
rishi-s8 opened this issue Oct 18, 2024 · 1 comment

@rishi-s8
Collaborator

I am listing a few things that would improve the performance and consistency of the code:

  1. Use torch functions and tensors for as much as possible, including model averaging, and reduce the use of native Python data types (lists, dicts, Python floats) as much as possible.
  2. Migrate functions that use numpy and numpy arrays to torch tensors.
  3. Ideally, use an append-only log for metrics such as accuracy and loss: create a CSV file at the start and append one line per round, instead of keeping the whole log in memory.
  4. As mentioned in Improve GRPC broadcast implementation #65, gRPC all_gather and receives from multiple nodes are sequential and block until the messages arrive in order. A better approach might be to interleave the waiting across nodes: when the condition for one node is not satisfied (it is not in the current round yet, or it is too busy), move on to another node and come back to this one later.
  5. Ideally, we should not poll another node's current round through repeated messages. Instead, the request for the round could carry a condition, and the polled node would respond only once that condition is satisfied.
  6. Whether a receive is synchronous should be chosen for each receive call, not determined by the node's state.

Feel free to add things to this list as a comment on this issue.
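For items 1 and 2, a minimal sketch of what tensor-only model averaging could look like, assuming each node shares its model as a `state_dict()` whose entries are tensors with identical keys and shapes. `average_state_dicts` is a hypothetical name, not a function in this repository; any numpy arrays would first be converted with `torch.from_numpy`.

```python
import torch


def average_state_dicts(state_dicts):
    """Element-wise mean of several state_dicts with identical keys and shapes.

    Hypothetical helper: assumes every entry is a tensor and that all models
    share the same architecture.
    """
    if not state_dicts:
        raise ValueError("need at least one state_dict")
    averaged = {}
    for key in state_dicts[0]:
        # Stack the same parameter from every model along a new leading
        # dimension and reduce it in one torch call, with no Python-level
        # loop over individual elements.
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        # Cast back to the original dtype. Integer buffers (for example
        # BatchNorm's num_batches_tracked) are averaged in float and truncated.
        averaged[key] = stacked.mean(dim=0).to(state_dicts[0][key].dtype)
    return averaged
```

Stacking and reducing keeps the arithmetic inside torch (and on the GPU, if the tensors already live there) rather than looping over Python floats.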
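For item 3, a sketch of an append-only metrics log using only the standard library `csv` module. The class name `AppendOnlyCSVLog` and the column names are illustrative assumptions, not existing code.

```python
import csv
import os


class AppendOnlyCSVLog:
    """Hypothetical append-only metrics log: header written once, then one row per round."""

    def __init__(self, path, fieldnames):
        self.path = path
        self.fieldnames = list(fieldnames)
        # Write the header only if the file is new or empty, so a restarted
        # run keeps appending to the existing log.
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            with open(path, "w", newline="") as f:
                csv.DictWriter(f, fieldnames=self.fieldnames).writeheader()

    def append(self, row):
        # Open in append mode for each row: nothing accumulates in memory,
        # and every finished round is on disk even if the process dies later.
        with open(self.path, "a", newline="") as f:
            csv.DictWriter(f, fieldnames=self.fieldnames).writerow(row)
```

Each round would then call something like `log.append({"round": r, "loss": loss, "accuracy": acc})`. Reopening the file per row keeps the sketch simple and crash-safe; keeping one handle open and calling `flush()` after each row would avoid the repeated open/close cost.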
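For item 4, a sketch of interleaved receiving in plain Python. It assumes, hypothetically, that messages from each peer land in a per-peer `queue.Queue` of `(round, payload)` tuples; in the real code these would come from the gRPC receive path. Instead of blocking on peers in a fixed order, it checks every pending peer without blocking, skips the ones that are not ready, and revisits them on the next pass.

```python
import queue
import time


def gather_interleaved(inboxes, current_round, timeout=10.0, poll_interval=0.01):
    """Collect one message for `current_round` from every peer, visiting peers round-robin.

    Hypothetical sketch: `inboxes` maps a peer id to a queue.Queue of
    (round, payload) tuples standing in for the real gRPC receive path.
    """
    pending = set(inboxes)
    received = {}
    deadline = time.monotonic() + timeout
    while pending:
        progressed = False
        for peer in list(pending):
            try:
                msg_round, payload = inboxes[peer].get_nowait()
            except queue.Empty:
                continue  # peer not ready: move on and come back on the next pass
            progressed = True
            if msg_round == current_round:
                received[peer] = payload
                pending.discard(peer)
            # Messages for other rounds are dropped in this sketch; real code
            # would probably buffer messages from peers that are ahead.
        if pending and not progressed:
            if time.monotonic() >= deadline:
                raise TimeoutError(
                    f"round {current_round}: no message from {sorted(pending)}"
                )
            time.sleep(poll_interval)
    return received
```

With this loop, a slow peer only delays its own entry in `received`, not the messages already waiting from other peers. A production version would more likely build on gRPC's asyncio API (`grpc.aio`) than on sleep-based polling.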
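For item 5, one way for the polled node to answer only once a condition holds is a `threading.Condition` guarding the node's round counter. The class `RoundTracker`, and the idea of a `GetRound(min_round)`-style RPC whose handler calls `wait_for_round`, are hypothetical; the point is that the requester sends a single request and the reply comes back only when the node reaches the requested round.

```python
import threading


class RoundTracker:
    """Hypothetical helper: answers "tell me when you reach round r" without client-side polling."""

    def __init__(self):
        self._round = 0
        self._cond = threading.Condition()

    def advance(self):
        # Called by the node itself when it finishes a round; wakes every waiting request.
        with self._cond:
            self._round += 1
            self._cond.notify_all()
            return self._round

    def wait_for_round(self, target, timeout=None):
        # Blocks the request handler until the node's round is >= target.
        # Returns the current round, or None if the timeout expires first.
        with self._cond:
            ok = self._cond.wait_for(lambda: self._round >= target, timeout=timeout)
            return self._round if ok else None
```

Because `wait_for` rechecks the predicate after every `notify_all`, a request for round 5 that arrives while the node is in round 3 is simply held until round 5, instead of being answered "not yet" and retried.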
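For item 6, a sketch of a receive call whose blocking behavior is a per-call argument, modelled on `queue.Queue.get(block, timeout)`. `Inbox` is a hypothetical wrapper, not an existing class in the codebase.

```python
import queue


class Inbox:
    """Hypothetical receive API where blocking is chosen per call, not by node state."""

    def __init__(self):
        self._q = queue.Queue()

    def deliver(self, msg):
        # Called by whatever transport receives messages for this node.
        self._q.put(msg)

    def receive(self, block=True, timeout=None):
        # block=True waits (up to `timeout` seconds if given);
        # block=False returns immediately. Returns None if nothing arrived.
        try:
            return self._q.get(block=block, timeout=timeout)
        except queue.Empty:
            return None
```

The same node can then block where it genuinely needs a peer's message (`inbox.receive(block=True, timeout=30)`) and just check for optional messages elsewhere (`inbox.receive(block=False)`).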

@rishi-s8
Collaborator Author

  1. Trap KeyboardInterrupt and kill all processes.
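A sketch for trapping `KeyboardInterrupt`, assuming the nodes are launched as child processes with `subprocess.Popen` (if the project uses `multiprocessing` instead, the same pattern applies with `Process.terminate()` and `join()`). The helper names `run_nodes` and `stop_all` are hypothetical.

```python
import subprocess
import sys


def stop_all(procs, grace=5.0):
    """Terminate every child still running; kill any that outlive `grace` seconds."""
    for p in procs:
        if p.poll() is None:
            p.terminate()
    for p in procs:
        try:
            p.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            p.kill()
            p.wait()


def run_nodes(commands):
    """Launch one child per command; make sure Ctrl+C takes all of them down."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        for p in procs:
            p.wait()
    except KeyboardInterrupt:
        print("KeyboardInterrupt received, stopping all node processes", file=sys.stderr)
        raise  # re-raised after the finally block below has run
    finally:
        # Runs on Ctrl+C, on any other exception, and on normal exit.
        stop_all(procs)
    return [p.returncode for p in procs]
```

Putting the cleanup in `finally` means it also runs if some other exception escapes, and re-raising keeps the interrupt visible to the caller. When Ctrl+C is pressed in a terminal, children in the foreground process group usually receive SIGINT as well; `stop_all` handles any that ignore it.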
