Feature: PoC for the bounding box reflection in bbox_shift_scale_rotate
#1125
About this PR
I implemented bounding box reflection for `bbox_shift_scale_rotate` for my own use case. I think it will be useful to other users, so I opened this PR.
My goal is to integrate bbox reflection for border modes such as cv2.BORDER_REFLECT and cv2.BORDER_WRAP into existing transforms like `ShiftScaleRotate`. That turned out not to be an easy task.
So in this PR I provide only a functional version (`bboxes_shift_scale_rotate_reflect`) as a first step. I think it is enough to show how the approach works and what challenges remain. If you think this implementation is promising enough to merge, I would like to move on to the next step.
I'm not sure this PR can be merged, but I hope the implementation and analysis inspire someone.
Demo
This is a demo: an input image (left), the result of `bbox_shift_scale_rotate` (center), and the result of `bboxes_shift_scale_rotate_reflect` (right) proposed in this PR.
The full runnable code is here:
About implementation
This implementation is very straightforward.
A summary is here:
This works, but it is not efficient because it creates many bbox copies that are eventually discarded.
I searched for a more efficient algorithm for this task but could not find one, so I decided to keep this method.
To mitigate the performance cost, I implemented the functionality on vectorized bboxes.
This is why this PR adds several `bboxes_xyz` functions, which are vectorized versions of the existing `bbox_xyz` counterparts.
This PR contains many changes, but I tried not to modify any existing code, to avoid unintended problems.
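The idea described above (make reflected copies, transform them, drop the ones that end up outside the image) could be sketched roughly as follows. This is not the code from this PR: the function names `reflect_tiles` and `scale_shift_and_filter` are hypothetical, bboxes are assumed to be normalized `(x_min, y_min, x_max, y_max)` rows, and rotation is left out to keep the sketch short.

```python
import numpy as np

def reflect_tiles(bboxes, k):
    """Return mirrored copies of normalized bboxes for tiles -k..k on both axes.

    bboxes: (N, 4) array-like of (x_min, y_min, x_max, y_max) in [0, 1].
    Hypothetical helper, not part of the library.
    """
    bboxes = np.asarray(bboxes, dtype=float)
    out = []
    for i in range(-k, k + 1):
        for j in range(-k, k + 1):
            x0, y0, x1, y1 = bboxes.T
            if i % 2:  # odd tile index: mirrored horizontally
                x0, x1 = 1 - x1, 1 - x0
            if j % 2:  # odd tile index: mirrored vertically
                y0, y1 = 1 - y1, 1 - y0
            out.append(np.stack([x0 + i, y0 + j, x1 + i, y1 + j], axis=1))
    return np.concatenate(out)

def scale_shift_and_filter(bboxes, scale, dx, dy):
    """Scale around the image center, shift, then drop boxes outside the frame."""
    center = np.array([0.5, 0.5, 0.5, 0.5])
    shift = np.array([dx, dy, dx, dy])
    moved = (bboxes - center) * scale + center + shift
    clipped = np.clip(moved, 0.0, 1.0)
    # Boxes fully outside the frame collapse to zero width or height.
    keep = (clipped[:, 2] > clipped[:, 0]) & (clipped[:, 3] > clipped[:, 1])
    return clipped[keep]

tiles = reflect_tiles([[0.1, 0.2, 0.3, 0.4]], k=1)
visible = scale_shift_and_filter(tiles, scale=0.5, dx=0.0, dy=0.0)
print(len(tiles), len(visible))  # 9 copies are created, 4 remain visible
```

Even in this tiny case more than half of the copies are thrown away, which is the inefficiency mentioned above. Doing the work on one `(N, 4)` array keeps the per-copy overhead low.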
About performance
The standard output of the above example is here.
The time of 385 µs is not bad compared to the image transform time of 3.39 ms.
But it is slower than the original `shift_scale_rotate` despite vectorization.
The benefit of vectorization appears as the number of bboxes grows.
To see this, I ran the following code:
The results are here (the parameters are the same as the previous example):
The vectorized version overtakes the original one as the number of bboxes increases. Also note that `bboxes_shift_scale_rotate_reflect` processes several times more bboxes than `bbox_shift_scale_rotate` does, because `bbox_shift_scale_rotate` does not process any reflected bboxes.
Known issues and some notes
I list some known issues and notes that I found through the implementation.
1. Computational cost depends on parameters.
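For example, suppose a very small scale factor such as scale=0.01 is used. The image then shrinks to 1/100 of its size along each axis, so about 100 x 100 reflected copies are needed to fill the frame, and the number of bboxes is multiplied by roughly 10^4.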
This can cause performance issues, so users should be careful about small scale factors.
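The factor of about 10^4 can be estimated with a few lines of arithmetic. This is only a rough sketch: `tiles_per_axis` and `copy_multiplier` are hypothetical names, and the estimate assumes scaling around the image center with no shift or rotation.

```python
import math

def tiles_per_axis(scale):
    """Odd number of image tiles along one axis needed to cover the frame.

    The frame spans 1/scale image widths; the central tile covers one of
    them and the remainder is split evenly between both sides.
    """
    k = math.ceil((1 / scale - 1) / 2)
    return 2 * k + 1

def copy_multiplier(scale):
    """Approximate factor by which the number of bboxes grows."""
    return tiles_per_axis(scale) ** 2

for s in (1.0, 0.5, 0.1, 0.01):
    print(s, copy_multiplier(s))
```

A guard that rejects or warns about scale factors below some threshold would be one way to protect users from this blow-up.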
2. Extra care for tracking the bboxes and labels is needed
Albumentations allows label information to be passed as a separate target by using `label_fields`.
Since bbox reflection changes the number of bboxes, it becomes difficult to track the relation between bboxes and `label_fields`.
I thought some extra care for `label_fields` would be needed to integrate this functionality into the Albumentations pipeline.
--> The `label_fields` are automatically concatenated to the bboxes, so no extra work is needed.
3. Extra care for mismatches between numpy ndarray and tuple bboxes is needed
Albumentations allows label information to be attached to a bbox as additional elements, and the label can be a string.
So each bbox should be split into the pure bbox coordinates and the label so that `np.array(bboxes)` does not create a `str` ndarray.
To solve this issue, I introduced the `to_ndarray_bboxes(bboxes)` and `to_tuple_bboxes(bboxes, labels)` functions.
4. This implementation does not care about the difference between `BORDER_REFLECT` and `BORDER_REFLECT_101`
Since I am not sure whether this significantly affects the results, this implementation ignores the difference between `BORDER_REFLECT` and `BORDER_REFLECT_101` to avoid extra complications.
I may need to handle it, but at the moment I don't have any idea how to implement BORDER_REFLECT_101 precisely.
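For reference, the difference between the two modes at the pixel-index level can be sketched as below. OpenCV documents the patterns as `fedcba|abcdefgh|hgfedcb` for BORDER_REFLECT and `gfedcb|abcdefgh|gfedcba` for BORDER_REFLECT_101. The helper names are hypothetical, and this only illustrates the index mapping; it does not solve the bbox case.

```python
def reflect_index(i, n):
    """Source index for BORDER_REFLECT: the edge pixel is repeated."""
    period = 2 * n
    i %= period
    return i if i < n else period - 1 - i

def reflect101_index(i, n):
    """Source index for BORDER_REFLECT_101: the edge pixel is not repeated."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i %= period
    return i if i < n else period - i

n = 4  # a row of four pixels with indices 0..3
print([reflect_index(i, n) for i in range(-2, 6)])
print([reflect101_index(i, n) for i in range(-2, 6)])
```

In other words, BORDER_REFLECT mirrors about the image edge with period 2n, while BORDER_REFLECT_101 mirrors about the center of the edge pixel with period 2(n-1). Mirroring normalized bbox coordinates about 0 and 1 corresponds to the first case, so a precise BORDER_REFLECT_101 version would probably need the mirror axis moved inward by half a pixel.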
5. A well-established algorithm is wanted
This implementation is easy to understand, but a more efficient and established algorithm would be better.
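Regarding note 3, the split between coordinates and labels might look like the following minimal sketch. The names `split_bboxes` and `join_bboxes` are hypothetical stand-ins that play the same roles as `to_ndarray_bboxes` and `to_tuple_bboxes`; the actual functions in this PR may differ in signature and behavior.

```python
import numpy as np

def split_bboxes(bboxes):
    """Split tuple bboxes into a float (N, 4) array and a list of label tuples."""
    coords = np.array([b[:4] for b in bboxes], dtype=float).reshape(-1, 4)
    labels = [tuple(b[4:]) for b in bboxes]
    return coords, labels

def join_bboxes(coords, labels):
    """Rebuild tuple bboxes from a coordinate array and the matching labels."""
    return [
        tuple(float(v) for v in c) + tuple(l)
        for c, l in zip(coords, labels)
    ]

bboxes = [(0.1, 0.2, 0.3, 0.4, "cat"), (0.5, 0.5, 0.9, 0.9, "dog", 7)]

# Converting directly mixes floats and strings into a string array.
print(np.array(bboxes[:1]).dtype.kind)  # 'U' (unicode string)

coords, labels = split_bboxes(bboxes)
print(coords.dtype, labels)
print(join_bboxes(coords, labels) == bboxes)
```

With reflection, each row of the coordinate array is copied several times, so the label list has to be repeated in the same order before the two are joined again.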