
issues about create_data #5

Open
sunnyHelen opened this issue May 10, 2022 · 21 comments

@sunnyHelen

Hi, thanks for sharing your great work. I ran into some issues while creating data with create_data.py.
First:
create reduced point cloud for training set
[ ] 0/3712, elapsed: 0s, ETA:Traceback (most recent call last):
File "tools/create_data.py", line 247, in <module>
out_dir=args.out_dir)
File "tools/create_data.py", line 24, in kitti_data_prep
kitti.create_reduced_point_cloud(root_path, info_prefix)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 374, in create_reduced_point_cloud
_create_reduced_point_cloud(data_path, train_info_path, save_path)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/kitti_converter.py", line 314, in _create_reduced_point_cloud
count=-1).reshape([-1, num_features])
ValueError: cannot reshape array of size 461536 into shape (6)
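The ValueError is plain arithmetic: a KITTI velodyne scan stores each point as four float32 values (x, y, z, reflectance), so the flat array read from the .bin file has a length divisible by 4 but not necessarily by 6 (461536 = 4 × 115384, while 461536 / 6 leaves a remainder of 4). A minimal sketch of a loader that checks this before reshaping, assuming the standard KITTI .bin layout; `can_reshape` and `load_kitti_velodyne` are hypothetical helpers, not functions from SimIPU or mmdet3d:

```python
import numpy as np


def can_reshape(num_values: int, num_features: int) -> bool:
    """True if a flat array of num_values floats splits evenly into rows."""
    return num_values % num_features == 0


def load_kitti_velodyne(path: str, num_features: int = 4) -> np.ndarray:
    """Load a KITTI velodyne .bin scan as an (N, num_features) float32 array.

    Assumes the standard KITTI layout: x, y, z, reflectance per point.
    """
    raw = np.fromfile(path, dtype=np.float32, count=-1)
    if not can_reshape(raw.size, num_features):
        raise ValueError(
            f"{raw.size} values cannot be split into rows of {num_features}; "
            "num_features probably does not match this dataset")
    return raw.reshape(-1, num_features)
```

With `num_features=6` (a value that fits other datasets, not KITTI) the check fails for this scan, whereas `num_features=4` yields 115384 points.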

Should num_features be set to 4 and front_camera_id to 2 in this line?

I assume doing this solves that problem, but then I ran into another one at this step:
Create GT Database of KittiDataset
[ ] 0/3712, elapsed: 0s, ETA:Traceback (most recent call last):
File "tools/create_data.py", line 247, in <module>
out_dir=args.out_dir)
File "tools/create_data.py", line 44, in kitti_data_prep
with_bbox=True) # for moca
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

Can you help me figure out how to solve these issues?

@zhyever
Owner

zhyever commented May 11, 2022

You should set front_camera_id as 0 for KITTI.

:D Since the released code only supports pre-training on KITTI, data preparation is similar to standard mmdet3d. So you can use standard mmdet3d (the correct version is given in README.md) to run create_data.py and then link the prepared data into the SimIPU repo.
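Linking the prepared data can be done with a symbolic link. A minimal sketch, assuming the mmdet3d convention that the KITTI data lives under `data/kitti` relative to the repo root (check the SimIPU configs for the actual `data_root`); `link_kitti` is a hypothetical helper and the demo paths are placeholders:

```python
import tempfile
from pathlib import Path


def link_kitti(prepared_kitti: Path, repo_root: Path) -> Path:
    """Symlink an already prepared KITTI folder to <repo_root>/data/kitti."""
    target = repo_root / "data" / "kitti"
    target.parent.mkdir(parents=True, exist_ok=True)
    if target.exists() or target.is_symlink():
        raise FileExistsError(f"{target} already exists; remove it first")
    target.symlink_to(prepared_kitti.resolve(), target_is_directory=True)
    return target


if __name__ == "__main__":
    # Demo with throwaway directories standing in for the two repos.
    with tempfile.TemporaryDirectory() as tmp:
        kitti = Path(tmp, "mmdetection3d", "data", "kitti")
        kitti.mkdir(parents=True)
        link = link_kitti(kitti, Path(tmp, "SimIPU"))
        print(link, "->", link.resolve())
```

A symlink keeps a single copy of the (large) KITTI data on disk, so both repos see the same files.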

@sunnyHelen
Author

Thank you for your quick reply.
When I create the GT database of KittiDataset, I get:
[ ] 0/3712, elapsed: 0s, ETA:Traceback (most recent call last):
File "tools/create_data.py", line 247, in <module>
out_dir=args.out_dir)
File "tools/create_data.py", line 44, in kitti_data_prep
with_bbox=True) # for moca
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

P0 = np.array(example['P0']).reshape(4, 4)

It seems there is no P0 key. Also, this file differs from the mmdet3d version in several places. How should I create the data properly?

@zhyever
Owner

zhyever commented May 18, 2022

Sorry I missed your messages; I have been busy recently. My last answer was wrong: you should set front_camera_id=2.

Actually, I recommend cloning mmdet3d and using the official code to generate the KITTI dataset. You can then link the mmdet3d-generated KITTI data directly into the SimIPU repo.

@sunnyHelen
Author

Got it. Thanks for your reply.

@sunnyHelen
Author

But I ran into a problem when I tried camera-LiDAR fusion-based 3D object detection on the KITTI dataset.
I followed your instructions:
bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

But data loading fails. Does it seem related to the data labels? Could you please help me?

Original Traceback (most recent call last):
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
return self.dataset[idx % self._ori_len]
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
data = self.prepare_train_data(idx)
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
example = self.pipeline(input_dict)
File "/mnt/lustre/chen/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
data = t(data)
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
img=img)
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
avoid_coll_boxes_2d)
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

@zhyever
Owner

zhyever commented May 23, 2022

Oh, this issue is caused by the box2d_camera key in db_sampler. In tools/create_data.py, you can find the call to create_groundtruth_database, which generates the sampled objects used for data augmentation. Since we chose MoCa as our baseline method, this GT-database generation function has been heavily modified.

Hence, if you created the KITTI dataset with the official mmdet3d codebase, you should run the create_groundtruth_database function in SimIPU (or MoCa), commenting out the other lines in the kitti_data_prep function, to create the sampled-object dataset. If you created the sampled-object dataset with our code and still hit these bugs, please report it to me and I will check. I ran the code before pushing this repo to GitHub, so it should have worked.
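Since the MoCa-style sampler reads `box2d_camera` from every sampled database entry, a quick way to tell which GT database you are using is to scan `kitti_dbinfos_train.pkl` for that field. A minimal sketch, assuming the usual mmdet3d layout of that file (a dict mapping class names to lists of per-object dicts); `missing_box2d` and `check_dbinfos` are hypothetical helpers:

```python
import pickle
from collections import Counter


def missing_box2d(db_infos: dict) -> Counter:
    """Count, per class, database entries that lack a 'box2d_camera' field."""
    return Counter(
        cls
        for cls, entries in db_infos.items()
        for entry in entries
        if "box2d_camera" not in entry
    )


def check_dbinfos(pkl_path: str) -> Counter:
    """Load a dbinfos pickle and report entries the 2D-aware sampler cannot use."""
    with open(pkl_path, "rb") as f:
        db_infos = pickle.load(f)
    return missing_box2d(db_infos)
```

For example, `check_dbinfos("data/kitti/kitti_dbinfos_train.pkl")` (path assumed). An empty Counter suggests the database came from the SimIPU/MoCa version of create_groundtruth_database; non-zero counts point to a database built by stock mmdet3d.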

@sunnyHelen
Author

Thanks a lot. I previously used the official mmdet3d to create the data labels. I'll follow your instructions and run the create_groundtruth_database function.

@sunnyHelen
Author

Hi. I ran the create_groundtruth_database function, but it seems we are back to the previous problem:

[ ] 0/3712, elapsed: 0s, ETA:Traceback (most recent call last):
File "tools/create_data.py", line 247, in <module>
out_dir=args.out_dir)
File "tools/create_data.py", line 44, in kitti_data_prep
with_bbox=True) # for moca
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/tools/data_converter/create_gt_database.py", line 275, in create_groundtruth_database
P0 = np.array(example['P0']).reshape(4, 4)
KeyError: 'P0'

@zhyever
Owner

zhyever commented May 29, 2022

Let me explain where the problems come from. We first ran experiments on the KITTI dataset, where the images come from camera 2 (the left color camera). So, when creating the KITTI data, every PX should be P2, i.e., the parameters of camera 2. Later, we ran experiments on Waymo, where the images come from the front camera, which has index 0. Hence, we hacked the code to generate the related data with P0.

However, when I pushed the code that only supports KITTI, I forgot to change the data-related code back to the KITTI version. That is why you get KeyError: 'P0'. For KITTI, just use P2. :D
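In other words, the fix is to read the camera-2 projection matrix wherever the code reads `P0`. A minimal sketch of selecting the matrix by dataset and projecting points with it, assuming (as in mmdet3d's KITTI infos) that the matrix is stored as a 4x4 array and that the points are already in rectified camera coordinates; `CALIB_KEY`, `projection_matrix`, and `project_points` are hypothetical names, and the mapping only encodes what is described above:

```python
import numpy as np

# Hypothetical mapping from the explanation above: KITTI images come from
# camera 2, the Waymo experiments used the front camera (index 0).
CALIB_KEY = {"kitti": "P2", "waymo": "P0"}


def projection_matrix(example: dict, dataset: str = "kitti") -> np.ndarray:
    """Return the 4x4 projection matrix for the camera used by `dataset`."""
    key = CALIB_KEY[dataset]
    if key not in example:
        raise KeyError(f"{key!r} not found; available keys: {sorted(example)}")
    return np.asarray(example[key], dtype=np.float64).reshape(4, 4)


def project_points(points_cam: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Project (N, 3) rectified camera points to (N, 2) pixel coordinates."""
    homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])  # (N, 4)
    uvw = homo @ P.T  # each row is P @ [x, y, z, 1]
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth
```

Raising a `KeyError` that lists the available keys makes a P0/P2 mismatch obvious immediately, instead of surfacing deep inside `create_groundtruth_database`.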

@sunnyHelen
Author

Hi, thanks for your help. I successfully created the labels after changing P0 to P2.
But the error still occurs when I run:
bash tools/dist_train.sh project_cl/configs/kitti_det3d/moca_r50_kitti.py 8 --work-dir work_dir/

Original Traceback (most recent call last):
File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
data = fetcher.fetch(index)
File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/cache/chenzhuo1/anaconda3/envs/simipu/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
data = [self.dataset[idx] for idx in possibly_batched_index]
File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/dataset_wrappers.py", line 151, in __getitem__
return self.dataset[idx % self._ori_len]
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/custom_3d.py", line 387, in __getitem__
data = self.prepare_train_data(idx)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 122, in prepare_train_data
example = self.pipeline(input_dict)
File "/mnt/lustre/chenzhuo1/hzha/mmdetection/mmdet/datasets/pipelines/compose.py", line 40, in __call__
data = t(data)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/transforms_3d.py", line 185, in __call__
img=img)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 388, in sample_all
avoid_coll_boxes_2d)
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in sample_class_v2
sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
File "/mnt/lustre/chenzhuo1/hzha/SimIPU/mmdet3d/datasets/pipelines/dbsampler.py", line 546, in <listcomp>
sp_boxes_2d = np.stack([i['box2d_camera'] for i in sampled],
KeyError: 'box2d_camera'

@zhyever
Owner

zhyever commented May 29, 2022

I will check everything from scratch ASAP and update this repo. By the way, this problem only affects MoCa training (our downstream 3D detection task). Even though the gt_sampler does not work, you can still run SimIPU, since our pre-training method does not need any GT information.

@sunnyHelen
Author

Yes, I've tried the pre-training code, and it works fine. Thanks for your help.

@bhavyagoyal

Hi @zhyever, I am running into the same error (KeyError: 'box2d_camera') in the downstream evaluation on the KITTI dataset. The pre-training step works without issues. Let me know if there is an update. Thanks for the help!

@sunnyHelen
Copy link
Author

Hi, is there any update on solving this problem?

@zhyever
Owner

zhyever commented Jun 7, 2022

Sorry for the late reply.

Download the pkl and the zipped gt_database.

Rename the pkl file to kitti_dbinfos_train.pkl and put it under your data folder. Unzip the .zip file, rename the folder to kitti_gt_database, and put it under your data folder.
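The same steps as a script. A minimal sketch, assuming the archive contains exactly one top-level folder and that your data folder is the KITTI `data_root` used by the config (for example `data/kitti`); the function name and the downloaded file names are hypothetical:

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_gt_database(pkl_file, zip_file, data_root) -> Path:
    """Place a downloaded dbinfos pickle and zipped GT database under data_root."""
    data_root = Path(data_root)
    data_root.mkdir(parents=True, exist_ok=True)
    dest = data_root / "kitti_gt_database"
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; move it away first")

    # Step 1: copy the pickle as kitti_dbinfos_train.pkl.
    shutil.copy(pkl_file, data_root / "kitti_dbinfos_train.pkl")

    # Step 2: unzip into a scratch directory, then rename the extracted folder.
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_file) as zf:
            zf.extractall(tmp)
        folders = [p for p in Path(tmp).iterdir() if p.is_dir()]
        if len(folders) != 1:
            raise ValueError(
                f"expected one top-level folder in {zip_file}, found {len(folders)}")
        shutil.move(str(folders[0]), str(dest))
    return dest
```

Usage would look like `install_gt_database("downloaded.pkl", "downloaded.zip", "data/kitti")`, with the file names replaced by whatever you actually downloaded.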

The result should look like this:
[Screenshot: resulting data folder layout]

Then, run the training script again.

@sunnyHelen
Author

Thanks a lot for your reply. It seems the data problem is solved, but there is still a problem during training:

Traceback (most recent call last):
File "tools/train.py", line 222, in <module>
main()
File "tools/train.py", line 218, in main
meta=meta)
File "/mnt/lustre/chen/hzha/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
meta=meta)
File "/mnt/lustre/chen/hzha/mmdetection/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/mnt/cache/chen/anaconda3/envs/simipu/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 42, in train_step
and self.reducer._rebuild_buckets()):
RuntimeError: Expected to have finished reduction in the prior iteration before starting a new one. This error indicates that your module has parameters that were not used in producing loss. You can enable unused parameter detection by passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, and by making sure all forward function outputs participate in calculating loss.
If you already have done the above, then the distributed data parallel module wasn't able to locate the output tensors in the return value of your module's forward function. Please include the loss function and the structure of the return value of forward of your module when reporting this issue (e.g. list, dict, iterable).
Parameter indices which did not receive grad for rank 0: 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 ...
In addition, you can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print out information about which particular parameters did not receive gradient on this rank as part of this error

@sunnyHelen
Author

I tried passing the keyword argument find_unused_parameters=True to torch.nn.parallel.DistributedDataParallel, but it doesn't work.

@zhyever
Owner

zhyever commented Jun 7, 2022

Set this flag in your config file instead of passing it on the command line.

You can add a line of find_unused_parameters=True in your config file.
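In the config file this is a single top-level assignment. As far as I know, mmdet 2.x's `train_detector` reads it with `cfg.get('find_unused_parameters', False)` and forwards it to `MMDistributedDataParallel`, which would explain why it has to live in the config rather than being passed by hand. A sketch, assuming the config used above:

```python
# project_cl/configs/kitti_det3d/moca_r50_kitti.py
# (top level, next to the model / data / optimizer settings)

# Let DDP tolerate parameters that receive no gradient in an iteration.
find_unused_parameters = True
```

Enabling this makes DDP traverse the autograd graph after each forward pass to find unused parameters, which costs some speed, so it is worth leaving it off for models that use all of their parameters.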

@sunnyHelen
Author

Yes. It works! Many thanks for your help.

@bhavyagoyal

bhavyagoyal commented Jun 7, 2022

Thanks @zhyever. Fine-tuning on KITTI 3D detection works now, but there seems to be an error during evaluation (after 30 epochs). Here is the log:

  File "tools/train.py", line 222, in <module>
    main()
  File "tools/train.py", line 218, in main
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/apis/train.py", line 34, in train_model
    meta=meta)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 279, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdetection/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/datasets/kitti_dataset.py", line 412, in evaluate
    eval_types=eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 709, in kitti_eval
    eval_types)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 613, in do_eval
    min_overlaps)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 479, in eval_class
    rets = calculate_iou_partly(dt_annos, gt_annos, metric, num_parts)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 382, in calculate_iou_partly
    dt_boxes).astype(np.float64)
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/eval.py", line 116, in bev_box_overlap
    from .rotate_iou import rotate_iou_gpu_eval
  File "/home/bhavya.goyal/Documents/SimIPU/mmdet3d/core/evaluation/kitti_utils/rotate_iou.py", line 292, in <module>
    criterion=-1):
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/decorators.py", line 101, in kernel_jit
    kernel.bind()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 548, in bind
    self._func.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 426, in get
    ptx = self.ptx.get()
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/compiler.py", line 397, in get
    **self._extra_options)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 496, in llvm_to_ptx
    ptx = cu.compile(**opts)
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 233, in compile
    self._try_error(err, 'Failed to compile\n')
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 251, in _try_error
    self.driver.check_error(err, "%s\n%s" % (msg, self.get_log()))
  File "/home/bhavya.goyal/miniconda3/envs/simipuenv2/lib/python3.7/site-packages/numba/cuda/cudadrv/nvvm.py", line 141, in check_error
    raise exc
numba.cuda.cudadrv.error.NvvmError: Failed to compile

<unnamed> (66, 23): parse expected comma after load's type
NVVM_ERROR_COMPILATION

@zhyever
Owner

zhyever commented Jun 8, 2022

That's related to the build of mmdet3d (in this repo, SimIPU). Refer to this issue for more information.
