ValueError: number sections must be larger than 0 #358
Comments
Hello! Thanks for raising the issue. I'll take a look and get back to you on this.
Hi @SeolhwaLee. Are you using a different sampler that is resulting in an empty batch? If so, is that intentional?
@karthikprasad I used RandomSampler, roughly as in the sketch below.
The empty batch is not intentional. Maybe the code needs some logic to handle an empty batch at that point?
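(The original snippet was not preserved in this thread. A minimal sketch of the setup being described, assuming a standard PyTorch DataLoader with an explicit RandomSampler handed to make_private; the dataset, model, and optimizer are placeholders:)

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
from opacus import PrivacyEngine

# Placeholder data/model/optimizer for illustration only.
dataset = TensorDataset(torch.randn(128, 16), torch.randint(0, 2, (128,)))
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Explicit RandomSampler, as mentioned above.
train_loader = DataLoader(dataset, batch_size=8, sampler=RandomSampler(dataset))

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)
# Note: by default make_private re-wraps the loader for Poisson sampling,
# so the logical batch size becomes random and can occasionally be zero.
```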
Hi @SeolhwaLee, since you are explicitly using RandomSampler, I suspect you are trying out something different? Could you share your notebook and some context?
Hello @SeolhwaLee, has your issue been resolved?
I don't have time to investigate this at the moment, but I will do so ASAP. I will comment here if I have any progress.
Hi, I'm having the same issue. Have you found a way to solve it @SeolhwaLee? |
@lucacorbucci, would you mind sharing fully reproducible code in a Colab? That would help us tremendously in debugging the issue.
Hi @karthikprasad, I will try, but I don't know if it is possible to reproduce it on Colab. I am trying to run a federated learning algorithm: I have several clients training a model, and each of them uses Opacus.
Hi @lucacorbucci, I am using Opacus in the same way as you are. My question is: do you feed the client models to the make_private function after each round of federation, or only once at the start of the whole process? I am trying to use the make_private_with_epsilon function and am worried about the privacy accounting. I have to specify the number of epochs there, and I am wondering whether I should set epochs = federation rounds * local epochs or epochs = local epochs.
I also had this doubt in the past, and I don't know if there is a correct answer. In a cross-device scenario with millions of nodes, I'd go with epochs = local epochs, because you don't know in advance how many federation rounds you will perform. In a cross-silo setting, it may be possible to know the number of federation rounds in advance; in that case you could use epochs = federation rounds * local epochs. Have you considered using make_private and passing the noise as a parameter? In that case you don't need to specify the number of epochs in advance.
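(To make the two options concrete, here is a hedged sketch of both calls; the model, data, and round/epoch counts are placeholders, not values from this thread:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Placeholder model/data for illustration only.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(128, 16), torch.randint(0, 2, (128,)))
train_loader = DataLoader(dataset, batch_size=8)

privacy_engine = PrivacyEngine()
federation_rounds, local_epochs = 50, 2  # hypothetical schedule

fix_epsilon_up_front = True
if fix_epsilon_up_front:
    # Option A: fix the privacy budget. The accountant needs the total number
    # of passes over the local data, so with a schedule known in advance this
    # would be federation_rounds * local_epochs.
    model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        target_epsilon=8.0,
        target_delta=1e-5,
        epochs=federation_rounds * local_epochs,
        max_grad_norm=1.0,
    )
else:
    # Option B: fix the noise instead; no epoch count is needed in advance,
    # and epsilon can be read from the accountant at any point.
    model, optimizer, train_loader = privacy_engine.make_private(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        noise_multiplier=1.1,
        max_grad_norm=1.0,
    )
    eps = privacy_engine.get_epsilon(delta=1e-5)
```

In the fixed-noise case the accountant simply accumulates the steps actually taken across all rounds, which is why no schedule needs to be known in advance.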
I have. I am actually working on this as part of a research project, so in my case it makes more sense to fix the value of epsilon and check model performance at those values. I have not actually experimented with either case yet, because there is a BatchNorm layer in my code whose per-sample gradients are not being populated, so I am getting a "Per sample gradient is not initialized. Not updated in backward pass?" error. I know for a fact that the BatchNorm is being used, because its gradient is populated but, for some reason, not its per-sample gradient. Any idea about this?
BatchNorm is not DP-friendly; have you used the ModuleValidator? https://opacus.ai/tutorials/guide_to_module_validator
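(For reference, a minimal sketch of that workflow; the model here is a stand-in with a BatchNorm layer:)

```python
import torch
from opacus.validators import ModuleValidator

# Stand-in model containing a BatchNorm layer, which is not DP-friendly.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.BatchNorm2d(8),
    torch.nn.ReLU(),
)

# List incompatible modules without raising an exception.
print(ModuleValidator.validate(model, strict=False))

# Replace incompatible modules (BatchNorm becomes GroupNorm by default),
# then confirm the model passes validation.
model = ModuleValidator.fix(model)
assert ModuleValidator.validate(model, strict=False) == []
```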
Absolutely, I used ModuleValidator before anything else. Even after that, this one particular LayerNorm is causing me issues. What I don't understand is under what circumstances a layer's grad can be populated but not its per-sample grad. I opened an issue about it a few minutes ago.
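(One way to observe the symptom described above is to compare the two attributes after a backward pass; this sketch assumes the model has already been wrapped by Opacus, which stores per-sample gradients in p.grad_sample:)

```python
# Run after loss.backward() on an Opacus-wrapped model.
for name, p in model.named_parameters():
    if p.requires_grad:
        has_grad = p.grad is not None
        has_grad_sample = getattr(p, "grad_sample", None) is not None
        if has_grad and not has_grad_sample:
            print(f"{name}: grad is populated but grad_sample is missing")
```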
Hi, I've seen the issue you opened but I don't have a solution. |
Hi,
I tried to apply Opacus to my model, but I got the error in the title when I ran code along the lines of the sketch below.
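(The original code and traceback were not preserved in this thread. Below is a minimal sketch, with placeholder names and data, of the tutorial-style setup in which this error can surface once Poisson sampling produces an empty logical batch:)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine
from opacus.utils.batch_memory_manager import BatchMemoryManager

# Placeholder model/data standing in for the tutorial's text classifier.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
dataset = TensorDataset(torch.randn(256, 16), torch.randint(0, 2, (256,)))
train_loader = DataLoader(dataset, batch_size=64)

privacy_engine = PrivacyEngine()
model, optimizer, train_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    noise_multiplier=1.0,
    max_grad_norm=1.0,
)

# The tutorial caps the physical batch size with BatchMemoryManager. With
# Poisson sampling a logical batch is occasionally empty, and splitting an
# empty batch into zero chunks is the likely source of
# "ValueError: number sections must be larger than 0".
with BatchMemoryManager(
    data_loader=train_loader,
    max_physical_batch_size=16,
    optimizer=optimizer,
) as memory_safe_loader:
    for x, y in memory_safe_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
```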
One hacky approach of mine worked without error when I added code (along the lines of the sketch below) in
opacus/utils/batch_memory_manager.py
to avoid this error. But the generated results were quite weird, so I suspect this hack may be affecting model performance.
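(The actual patch was not preserved in the thread. As an illustration only, and not the author's code: the splitting step inside batch_memory_manager.py computes the number of chunks from the logical batch length, and a guard of the following shape avoids asking np.array_split for zero sections:)

```python
import math
import numpy as np

def split_batch(batch_idxs, max_batch_size):
    # Hypothetical stand-in for the splitting step in
    # opacus/utils/batch_memory_manager.py. np.array_split raises
    # "ValueError: number sections must be larger than 0" when given
    # zero sections, which happens for an empty Poisson-sampled batch.
    if len(batch_idxs) == 0:
        return []  # skip empty logical batches instead of splitting them
    num_chunks = math.ceil(len(batch_idxs) / max_batch_size)
    return np.array_split(batch_idxs, num_chunks)
```

Note that silently dropping empty batches means those steps neither add noise nor advance the optimizer, so it is plausible that such a guard interacts with the privacy accounting or the training dynamics, which may be related to the odd results mentioned above.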
Could you give me any feedback on whether this hack is okay?
Thank you in advance!
P.S. I used this tutorial: https://opacus.ai/tutorials/building_text_classifier (but with a different dataset and model).