
ValueError: number sections must be larger than 0 #358

Open · SeolhwaLee opened this issue Feb 15, 2022 · 15 comments
Labels
bug Something isn't working


@SeolhwaLee

Hi,

I tried to apply Opacus to my model, but I got the error below when I ran my code.

Traceback (most recent call last):
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 772, in array_split
    Nsections = len(indices_or_sections) + 1
TypeError: object of type 'int' has no len()

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main_dp.py", line 462, in <module>
    main(args)
  File "main_dp.py", line 166, in main
    for step, batch in enumerate(tqdm(memory_safe_data_loader)):
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/tqdm/std.py", line 1133, in __iter__
    for obj in iterable:
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 560, in _next_data
    index = self._next_index()  # may raise StopIteration
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 512, in _next_index
    return next(self._sampler_iter)  # may raise StopIteration
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/opacus/utils/batch_memory_manager.py", line 48, in __iter__
    batch_idxs, math.ceil(len(batch_idxs) / self.max_batch_size)
  File "<__array_function__ internals>", line 6, in array_split
  File "/home/seol/miniconda3/envs/pytorch_p37/lib/python3.7/site-packages/numpy/lib/shape_base.py", line 778, in array_split
    raise ValueError('number sections must be larger than 0.')
ValueError: number sections must be larger than 0.
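For context, the failing call can be reproduced in isolation: with Poisson sampling a logical batch can be empty, and `np.array_split` refuses zero sections. A minimal sketch (my own repro, not Opacus code):

```python
import math

import numpy as np

# An empty logical batch, which Poisson sampling can legitimately produce.
batch_idxs = []
max_batch_size = 32

# Mirrors the expression in batch_memory_manager.py's __iter__:
n_sections = math.ceil(len(batch_idxs) / max_batch_size)  # == 0

try:
    np.array_split(np.array(batch_idxs), n_sections)
except ValueError as err:
    print(err)  # number sections must be larger than 0.
```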

One hack worked without error: I added the code below in opacus/utils/batch_memory_manager.py to skip empty batches.

def __iter__(self):
    for batch_idxs in self.samplers:
        if not bool(batch_idxs):
            continue

However, the generated output was quite strange, so I suspect this hack may be affecting model performance.

Could you give me feedback on whether this hack is okay?

Thank you in advance!

P.S. I used this tutorial https://opacus.ai/tutorials/building_text_classifier (But different dataset and model)
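For reference, the split that the memory manager performs, with the empty-batch guard from the hack above folded in, can be sketched in plain Python (a standalone sketch of the idea, not the actual Opacus implementation):

```python
import math

import numpy as np

def split_logical_batch(batch_idxs, max_batch_size):
    """Split one logical batch into physical batches of at most
    max_batch_size indices, skipping empty logical batches entirely."""
    if not batch_idxs:
        # Guard: np.array_split(x, 0) raises ValueError.
        return []
    n_sections = math.ceil(len(batch_idxs) / max_batch_size)
    return [s.tolist() for s in np.array_split(np.asarray(batch_idxs), n_sections)]

print(split_logical_batch([0, 1, 2, 3, 4], 2))  # [[0, 1], [2, 3], [4]]
print(split_logical_batch([], 2))               # []
```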

@karthikprasad
Contributor

Hello! Thanks for raising the issue. I'll take a look and get back to you on this.

@karthikprasad
Contributor

Hi @SeolhwaLee. Are you using a different sampler that is resulting in an empty batch? If yes, is that intentional?

@SeolhwaLee
Author

@karthikprasad I used RandomSampler like this.

train_sampler = RandomSampler(train_dataset)
train_dataloader = DataLoader(
    dataset=train_dataset,
    sampler=train_sampler,
    collate_fn=train_dataset.collate_fn,
    batch_size=args.batch_size,
)

The empty batch is not intentional. Maybe this code needs some logic for handling empty batches?

@karthikprasad
Contributor

karthikprasad commented Feb 21, 2022

Hi @SeolhwaLee ,
The tutorial you linked does not use RandomSampler. When you call make_private() and pass your data loader, Opacus internally replaces it with a DPDataLoader, which uses a uniform batch sampler (Poisson sampling) to sample according to the sampled Gaussian mechanism (SGM).
Also note that DPDataLoader automatically handles empty batches.

Since you are explicitly using RandomSampler, I suspect you are trying something different? Could you share your code notebook and some context?
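To see why empty batches are expected under this kind of sampling, here is a toy model of Poisson batch sampling (my own sketch, not Opacus' actual sampler implementation):

```python
import random

def poisson_batches(n_examples, sample_rate, n_batches, seed=0):
    """Each example joins each batch independently with probability
    sample_rate, so a batch can legitimately come out empty."""
    rng = random.Random(seed)
    for _ in range(n_batches):
        yield [i for i in range(n_examples) if rng.random() < sample_rate]

batches = list(poisson_batches(n_examples=100, sample_rate=0.005, n_batches=50))
sizes = [len(b) for b in batches]
# With these settings, a batch is empty with probability ~0.6,
# so empty batches show up almost immediately.
print(min(sizes))  # 0
```

Any downstream consumer of such batches must therefore either skip empty ones or handle them, which is what DPDataLoader does.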

@ashkan-software
Contributor

Hello @SeolhwaLee,

Has your issue been resolved?

@SeolhwaLee
Author

Hi @ashkan-software

I haven't had time to investigate this recently, but I will get to it as soon as possible.

I will comment here if I make any progress.

@ffuuugor ffuuugor added the bug Something isn't working label Mar 11, 2022
@lucacorbucci

Hi, I'm having the same issue. Have you found a way to solve it @SeolhwaLee?

@karthikprasad
Contributor

@lucacorbucci , would you mind sharing your fully reproducible code on a colab? That would help us tremendously to debug the issue.

@lucacorbucci

Hi @karthikprasad, I will try, but I don't know if it is possible to reproduce it on Colab. I am running a federated learning algorithm: several clients each train a model, and each of them uses Opacus.
The repo with the code I've written is not public yet, but I can share it with you when I publish it, together with a script to reproduce the bug.

@anirban-nath

Hi @lucacorbucci I am using Opacus in the same way as you are. My question is: do you feed the client models to the make_private function after each round of federation or only once at the start of the whole process?

I am trying to use the make_private_with_epsilon function and am worried about the privacy accounting process. I have to specify the number of epochs there and I am wondering if I should give epochs = federation rounds * local epochs or epochs = local epochs.

@lucacorbucci

I am trying to use the make_private_with_epsilon function and am worried about the privacy accounting process. I have to specify the number of epochs there and I am wondering if I should give epochs = federation rounds * local epochs or epochs = local epochs.

I also had this doubt in the past, and I don't know if there is a definitive answer. In a cross-device scenario with millions of nodes, I'd go with epochs = local epochs, because you don't know in advance how many federation rounds you will perform. In a cross-silo scenario, on the other hand, it may be possible to know the number of federation rounds in advance; in that case you could use epochs = federation rounds * local epochs.
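To make the accounting difference concrete, with made-up numbers (the values here are purely illustrative):

```python
# Hypothetical cross-silo setup where rounds are known in advance.
federation_rounds = 10
local_epochs = 3

# Each client passes over its local data local_epochs times per round,
# so over the whole federation the accountant should be told about
# every pass, not just the passes within one round:
total_epochs = federation_rounds * local_epochs
print(total_epochs)  # 30
```

With epochs = local epochs you would instead be accounting for only 3 passes, which understates the privacy spent across the full run.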

Have you considered using make_private and passing the noise multiplier as a parameter? In that case you don't need to specify the number of epochs in advance.

@anirban-nath

Have you considered the use of the function make_private passing the noise as a parameter? In this case, you don't need to specify the number of epochs in advance

I have. I am actually working on this as part of a research project, so in my case it makes more sense to fix the value of epsilon and check model performance at those values. I have not yet experimented with either case, though, because of this issue: there is a BatchNorm function in my code whose per-sample gradients are not being populated, so I am getting a "Per sample gradient is not initialized. Not updated in backward pass?" error.

I know for a fact that the BatchNorm layer is being used, because its gradient is being populated; for some reason its per-sample gradient is not. Any ideas about this?

@lucacorbucci

BatchNorm function in my code whose per_sample gradients are not being populated so I am getting a "Per sample gradient is not initialized. Not updated in backward pass?" error.

BatchNorm is not DP friendly, have you used the ModuleValidator? https://opacus.ai/tutorials/guide_to_module_validator
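For illustration, what ModuleValidator.fix effectively does for BatchNorm layers (replacing them with GroupNorm, which normalizes each sample independently and therefore admits per-sample gradients) can be sketched with plain PyTorch. This is my own simplified stand-in, not the real Opacus implementation:

```python
import torch.nn as nn

def replace_batchnorm(module: nn.Module) -> None:
    """Recursively swap BatchNorm2d layers for GroupNorm. BatchNorm mixes
    statistics across samples in a batch, which breaks per-sample
    gradient computation; GroupNorm does not."""
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(
                module,
                name,
                nn.GroupNorm(num_groups=min(32, child.num_features),
                             num_channels=child.num_features),
            )
        else:
            replace_batchnorm(child)

model = nn.Sequential(nn.Conv2d(3, 16, 3), nn.BatchNorm2d(16), nn.ReLU())
replace_batchnorm(model)
print(any(isinstance(m, nn.BatchNorm2d) for m in model.modules()))  # False
```

In practice you should just call ModuleValidator.fix(model), which handles more layer types than this sketch does.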

@anirban-nath

BatchNorm is not DP friendly, have you used the ModuleValidator? https://opacus.ai/tutorials/guide_to_module_validator

Absolutely, I used ModuleValidator before anything else. Even after that, this one particular LayerNorm is causing me issues. What I don't understand is under what circumstances it can happen that a layer's grad is populated but its per_sample grad is not. I opened an issue about it a few minutes ago.

@lucacorbucci

Hi, I've seen the issue you opened, but I don't have a solution.
A possible workaround could be to use Opacus with functorch. I'm not 100% sure it will work, but here they said: "With functorch, Opacus can now handle almost all input models, removing previous limitation where we could only handle certain standard layers."
I think it is worth a try.
