
Failed to converge when changing num_users and frac #11

Open
jkup64 opened this issue Apr 21, 2022 · 1 comment

Comments


jkup64 commented Apr 21, 2022

Description

When I change num_users to 10 and frac to 0.3 with --iid, which means 3 clients are chosen each round, I find the model first gets better and then gets worse.
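For context, the per-round client count comes from frac * num_users. A minimal sketch of the usual FedAvg-style sampling (variable names here are illustrative, not necessarily those in main_fed.py):

```python
import numpy as np

num_users, frac = 10, 0.3
m = max(int(frac * num_users), 1)  # 3 clients participate per round

rng = np.random.default_rng(0)
# sample 3 distinct client indices out of 10, like the [5 6 0] in the log below
idxs_users = rng.choice(num_users, m, replace=False)
```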

Reproduce

$ python main_fed.py --dataset mnist --model mlp --num_classes 10 --epochs 1000 --lr 0.05 --num_users 10 --shard_per_user 2 --frac 0.3 --local_ep 1 --local_bs 8 --results_save run1 --iid

Output

device: cuda:0
MLP(
  (layer_input): Linear(in_features=784, out_features=512, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.5, inplace=False)
  (layer_hidden1): Linear(in_features=512, out_features=256, bias=True)
  (layer_hidden2): Linear(in_features=256, out_features=256, bias=True)
  (layer_hidden3): Linear(in_features=256, out_features=128, bias=True)
  (layer_out): Linear(in_features=128, out_features=10, bias=True)
  (softmax): Softmax(dim=1)
)
Round 0, lr: 0.050000, [5 6 0]
Round   0, Average loss 2.038, Test loss 1.794, Test accuracy: 67.63
Round 1, lr: 0.050000, [6 4 5]
Round   1, Average loss 1.748, Test loss 1.611, Test accuracy: 85.05
Round 2, lr: 0.050000, [7 9 4]
Round   2, Average loss 1.761, Test loss 1.717, Test accuracy: 74.39
Round 3, lr: 0.050000, [7 4 9]
Round   3, Average loss 1.856, Test loss 1.843, Test accuracy: 61.74
Round 4, lr: 0.050000, [9 2 5]
Round   4, Average loss 1.948, Test loss 1.863, Test accuracy: 59.83
Round 5, lr: 0.050000, [2 6 7]
Round   5, Average loss 2.039, Test loss 1.990, Test accuracy: 47.11
Round 6, lr: 0.050000, [0 7 2]
Round   6, Average loss 2.025, Test loss 1.997, Test accuracy: 46.39
Round 7, lr: 0.050000, [4 3 2]
Round   7, Average loss 2.017, Test loss 2.104, Test accuracy: 35.68
Round 8, lr: 0.050000, [2 9 1]
Round   8, Average loss 2.128, Test loss 2.113, Test accuracy: 34.82
Round 9, lr: 0.050000, [2 7 5]
Round   9, Average loss 2.127, Test loss 2.190, Test accuracy: 27.09
Round 10, lr: 0.050000, [1 9 7]
Round  10, Average loss 2.194, Test loss 2.239, Test accuracy: 22.21
Round 11, lr: 0.050000, [0 2 3]
Round  11, Average loss 2.236, Test loss 2.186, Test accuracy: 27.53
Round 12, lr: 0.050000, [3 9 5]
Round  12, Average loss 2.188, Test loss 2.108, Test accuracy: 35.29
Round 13, lr: 0.050000, [3 6 5]
Round  13, Average loss 2.172, Test loss 2.237, Test accuracy: 22.45
Round 14, lr: 0.050000, [9 8 4]
Round  14, Average loss 2.258, Test loss 2.175, Test accuracy: 28.61
Round 15, lr: 0.050000, [2 7 1]
Round  15, Average loss 2.178, Test loss 2.161, Test accuracy: 29.99
Round 16, lr: 0.050000, [9 6 4]
Round  16, Average loss 2.192, Test loss 2.280, Test accuracy: 18.10
Round 17, lr: 0.050000, [2 4 0]
Round  17, Average loss 2.284, Test loss 2.125, Test accuracy: 33.60
Round 18, lr: 0.050000, [4 1 0]
Round  18, Average loss 2.226, Test loss 2.352, Test accuracy: 10.94
Round 19, lr: 0.050000, [6 0 7]
Round  19, Average loss 2.355, Test loss 2.352, Test accuracy: 10.94
Round 20, lr: 0.050000, [1 8 6]
Round  20, Average loss 2.351, Test loss 2.339, Test accuracy: 12.24
Round 21, lr: 0.050000, [1 2 3]
Round  21, Average loss 2.338, Test loss 2.339, Test accuracy: 12.24
Round 22, lr: 0.050000, [9 3 1]
Round  22, Average loss 2.340, Test loss 2.339, Test accuracy: 12.24
Round 23, lr: 0.050000, [4 2 0]
Round  23, Average loss 2.337, Test loss 2.339, Test accuracy: 12.24
Round 24, lr: 0.050000, [8 1 5]
@jkup64 jkup64 changed the title Failed to converge when changing num_users and frac, when i.i.d Failed to converge when changing num_users and frac, or whether set --i.i.d or not Apr 21, 2022
@jkup64 jkup64 changed the title Failed to converge when changing num_users and frac, or whether set --i.i.d or not Failed to converge when changing num_users and frac, or set --i.i.d Apr 21, 2022
@jkup64 jkup64 changed the title Failed to converge when changing num_users and frac, or set --i.i.d Failed to converge when changing num_users and frac Apr 21, 2022

jkup64 commented Apr 21, 2022

You can solve this problem simply by setting lr_decay = 0.95 and replacing

w_local, loss = local.train(net=net_local.to(args.device))

with

w_local, loss = local.train(net=net_local.to(args.device), lr=lr)

or by choosing a more powerful optimizer than SGD.
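In other words, the server should shrink the learning rate each communication round and pass it down to the local trainer. A minimal sketch of that decay schedule (function and argument names here are hypothetical, not the repo's actual API):

```python
def decayed_lr(base_lr: float, lr_decay: float, round_idx: int) -> float:
    """Exponentially decay the learning rate once per communication round."""
    return base_lr * (lr_decay ** round_idx)

# With the run above (--lr 0.05) and lr_decay = 0.95, the first few rounds use:
lrs = [decayed_lr(0.05, 0.95, t) for t in range(3)]  # 0.05, 0.0475, 0.045125
```

The decayed value would then be forwarded as the lr argument in the replaced local.train(...) call, so later rounds take smaller steps and the averaged model stops oscillating.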
