
Tutorial: computer vision #216

@justheuristic

Let's add a tutorial for training ViT/ResNet-50 with DecentralizedSGD.

The intent is to use the DecentralizedSGD optimizer with the VISSL library to train SwAV.

Here's a basic tutorial for training SimCLR in VISSL: https://colab.research.google.com/drive/1Rt3Plt3ph84i1A-eolLFafybwjrBFxYe?usp=sharing

The engineering is up to you, but it appears that the two hardest tasks will be to

The main request for DecentralizedSGD is to implement a training regime in which it can run averaging continuously, always with the latest parameters. The current implementation falls short in two ways. First, DecentralizedSGD may spend up to half of its time looking for groups, so by the time it actually averages model parameters, it averages an older snapshot taken before the averager began looking for a group. Second, when the averager finally writes the result back into the model, it overwrites, and thus discards, any local changes to the model parameters made during averaging.
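Both problems can be reproduced in a toy, single-process model in which each parameter is a plain float. This is a hypothetical sketch, not hivemind code: naive_round and its arguments are illustrative names, and local optimizer steps are modeled as a list of increments applied while the averaging round is in progress.

```python
def naive_round(weight, peer_weights, local_updates):
    """Toy model of one averaging round in the current regime (hypothetical).

    weight: this peer's scalar parameter when the round starts.
    peer_weights: parameters contributed by the other group members.
    local_updates: optimizer steps applied locally while the averager
        is busy looking for a group and running AllReduce.
    """
    # The snapshot to be averaged is taken *before* matchmaking starts.
    snapshot = weight

    # Training keeps going while the averager looks for a group.
    for update in local_updates:
        weight += update

    # AllReduce averages the stale snapshot, not the current weight.
    group = [snapshot] + list(peer_weights)
    averaged = sum(group) / len(group)

    # The result overwrites the parameter, discarding local_updates.
    return averaged


# This peer starts at 1.0 and makes two local steps of +0.25; its partner holds 3.0.
print(naive_round(1.0, [3.0], [0.25, 0.25]))  # 2.0
```

The round returns 2.0 regardless of the local steps: the average is computed from the stale snapshot 1.0 rather than the current value 1.5, and the 0.5 of local progress is simply overwritten.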

Here are a few ideas on how to improve DecentralizedSGD:

  • after an averaging round, instead of overwriting the parameters, it would be better to compute weight = weight + averaged_weight - weight_before_averaging_step. This prevents DecentralizedSGD from discarding local updates made concurrently with averager.step.
  • in DecentralizedAverager, we can implement a callback that allows the user to update the model parameters right before the beginning of AllReduce (i.e. after the group is formed). This should significantly reduce the staleness of averaged parameters.
  • in DecentralizedSGD, we can modify the code that calls averager.step to allow concurrent matchmaking and AllReduce. In other words, once the averager has found one group, it should immediately start looking for the next group while still running AllReduce.
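The three ideas above can be sketched in plain Python with scalar parameters (no hivemind or torch). Everything here is hypothetical: averaging_round, refresh_before_allreduce, run_pipelined, find_group and all_reduce are illustrative names, not DecentralizedAverager's actual API. In hivemind, the callback and the pipelining would have to be wired into the averager itself.

```python
from concurrent.futures import ThreadPoolExecutor


def averaging_round(weight, peer_weights, updates_during_matchmaking,
                    updates_during_allreduce, refresh_before_allreduce=True):
    """Toy averaging round with ideas 1 and 2 applied (hypothetical).

    updates_during_matchmaking: local steps made while looking for a group.
    updates_during_allreduce: local steps made while AllReduce is running.
    """
    # Without idea 2, the averaged value is a snapshot from before matchmaking.
    snapshot = weight
    for update in updates_during_matchmaking:
        weight += update

    # Idea 2: once the group is formed, refresh the value that will be
    # averaged (in hivemind this would be a user callback invoked right
    # before AllReduce).
    if refresh_before_allreduce:
        snapshot = weight

    for update in updates_during_allreduce:
        weight += update

    group = [snapshot] + list(peer_weights)
    averaged = sum(group) / len(group)

    # Idea 1: apply the result as a delta instead of overwriting, so local
    # steps made since the snapshot are preserved:
    # weight = weight + averaged_weight - weight_before_averaging_step
    return weight + averaged - snapshot


def run_pipelined(num_rounds, find_group, all_reduce):
    """Idea 3: look for the next group while AllReduce for the current one runs.

    find_group(round_idx) -> group and all_reduce(group) -> result are
    hypothetical stand-ins for matchmaking and AllReduce.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as matchmaker:
        pending_group = matchmaker.submit(find_group, 0)
        for round_idx in range(num_rounds):
            group = pending_group.result()  # wait until this round's group is formed
            if round_idx + 1 < num_rounds:
                # Start matchmaking for the next round in the background...
                pending_group = matchmaker.submit(find_group, round_idx + 1)
            # ...while AllReduce for the current group runs in this thread.
            results.append(all_reduce(group))
    return results


print(averaging_round(1.0, [3.0], [0.25], [0.25]))  # 2.375
print(averaging_round(1.0, [3.0], [0.25], [0.25], refresh_before_allreduce=False))  # 2.5
print(run_pipelined(3, find_group=lambda i: [i], all_reduce=sum))  # [0, 1, 2]
```

In both averaging_round calls the local steps are no longer simply discarded: with the refresh, the step made during matchmaking is averaged in and the step made during AllReduce is re-applied on top; without it, both steps are re-applied on top of an average of the stale snapshot. In run_pipelined, a single matchmaking worker keeps at most one group search in flight, and that search overlaps only with the AllReduce of the previously found group.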

Implementing this as an example will require the following steps:

  • create a root folder, e.g. ./hivemind/examples/swav, containing:
    • a modified training runner that uses DecentralizedSGD
    • a basic README similar to this or this, which should:
      • describe what it does (and how)
      • list additional requirements
      • include a full how-to-run guide

Labels: enhancement, help wanted
