Improve tag stability, counter loss and flipping #19

Open
qbonnard opened this issue Feb 26, 2014 · 18 comments

@qbonnard
Member

I haven't investigated yet, but the Z-axis seems to flip on estimate3d-gui, especially when the camera is almost perpendicular to the tag.

This issue is a reminder to investigate more ;)

@qbonnard qbonnard added the bug label Feb 26, 2014
@severin-lemaignan
Member

I've noticed it as well. This is definitely a bug. If I'm correct, the Z axis should, on the contrary, be very stable.

@qbonnard
Member Author

I suspect it has to do with the order of the corners... Guilty until proven innocent.

@severin-lemaignan
Member

Sounds like a good suspect, indeed.

@ayberkozgur
Member

Sorry to give bad news guys, but I think I've figured it out. It's probably the same phenomenon as the famous upside-down optical illusion (or whatever it's called). I think this picture summarizes the situation perfectly: http://www.mindmotivations.com/images/optical-illusion1.jpg The order of the corners is the same in both up and down cases.

I guess the only way to solve this problem with a single tag is to determine the perspective of the tag well enough to distinguish whether the "up" corner is closer to us than the "down" corner. Currently the only way to do this is to check whether the line between the upper-left and upper-right corners is longer or shorter than the line between the lower-left and lower-right corners. This becomes more and more noise-prone as the tag gets flatter in the perspective, i.e. as all lines get shorter, hence the flipping we observe. Note that the flipping diminishes and then stops as you move the camera closer to the tag, and doesn't happen at all when the tag is "looking towards" the camera, i.e. not too flat.
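The edge-length comparison described above can be sketched as follows. This is a minimal illustration, not Chilitags code: the function names and the `min_ratio` noise margin are assumptions, and the corners are assumed to be given in image (pixel) coordinates.

```python
import math


def edge_length(p, q):
    """Euclidean distance between two image points (x, y), in pixels."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def nearer_edge(top_left, top_right, bottom_left, bottom_right, min_ratio=1.05):
    """Guess which edge of a square tag is nearer to the camera.

    Under perspective projection the nearer of two equal, parallel edges
    appears longer. When the two lengths differ by less than min_ratio
    (a hypothetical margin that would need tuning), the answer is reported
    as "ambiguous" instead of risking a flip.
    """
    top = edge_length(top_left, top_right)
    bottom = edge_length(bottom_left, bottom_right)
    if min(top, bottom) == 0.0:
        return "ambiguous"  # degenerate detection
    ratio = max(top, bottom) / min(top, bottom)
    if ratio < min_ratio:
        return "ambiguous"
    return "top" if top > bottom else "bottom"
```

As the tag gets flatter or further away, both edges shrink and pixel noise pushes the ratio towards 1, which is exactly the regime where this check stops being reliable.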

In my application, I'm planning to solve this issue by using multiple tags whose positions relative to each other are fixed and known beforehand, all referenced to the camera, plus outlier detection and elimination.

@ayberkozgur
Member

There is no bug, as we're currently doing nothing wrong. This will be more of an enhancement if we manage to solve this some other way, so I'm changing the labels.

@severin-lemaignan
Member

Hum, interesting.

Considering the following tag:


           A
             ,'._
            /    `._
           /        `._
         ,'            `._
        /                 `.
       /                    `-.  D
      /                        ;-
    ,'                        /
   _                        ,'
 B  `._                    /
       `.                ,'
         `-.           ,'
            `-.       /
               `-.  ,'
                  `` C

What about computing the cross product of AB and AD and checking whether it (i.e. the angle at A) is smaller or larger than the cross product of CB and CD, to tell which of A and C is closer to us?

@ayberkozgur
Member

I drew some geometric diagrams to convince myself. If my reasoning is right, this is due to slight misdetection of the corner locations on the screen (e.g. due to pixel resolution, lighting, etc.), which would give the same result no matter which method of calculation we use. Further, this should result in calculating the following cross product to find the +Z axis: (B_world_coordinate - A_world_coordinate)x(D_world_coordinate - A_world_coordinate). This really amounts to getting the A, B, D world coordinates right, which should already be done by taking that cross product.

But, if there is misdetection only on the ABD triangle and the BDC triangle is clean (e.g due to C being closer or getting more light on it somehow), we can use the (B_world_coordinate - C_world_coordinate)x(D_world_coordinate - C_world_coordinate) cross product instead. And, if there is misdetection in the ABD triangle, A should look closer to us than B and D. In addition, BDC is clean, so C also appears closer to us than B and D. So, it might just be the case that A appears even closer than C, making our ABxAD vs. CBxCD check useless.

I currently see two "solutions":

  • Take all 4 possible cross products and vote/weighted average according to some confidence metric/take one or more and discard others according to some metric
  • Check the order of closeness of the corners to the camera; if it is A > C > B ~ D (the above situation; ~ denotes that the order doesn't matter), report an error or do not report the tag at all

If it is the case that the corners are such that C > B ~ D > A where the actual order is A > B ~ D > C, we're pretty much out of luck.
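The first option (taking all four corner cross products and voting) could look like the sketch below. It assumes the corners are 3D points in the camera frame, ordered A, B, C, D around the tag as in the drawing, and that the sign of the z component of the normal is what distinguishes the two flipped poses; the helper names are hypothetical. Note that with this ordering the product at C has to be taken as CD x CB for it to point the same way as AB x AD on a flat tag.

```python
def sub(p, q):
    """Component-wise difference p - q of two 3D points."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def cross(u, v):
    """Cross product of two 3D vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def corner_normals(a, b, c, d):
    """One normal estimate per corner, all with the same winding.

    At each corner the product is (next - here) x (previous - here), so a
    perfectly flat tag yields four vectors pointing the same way.
    """
    corners = [a, b, c, d]
    normals = []
    for i in range(4):
        here = corners[i]
        nxt = corners[(i + 1) % 4]
        prev = corners[(i - 1) % 4]
        normals.append(cross(sub(nxt, here), sub(prev, here)))
    return normals


def normal_votes(a, b, c, d):
    """Sum of the z signs of the four normals, in the range [-4, 4].

    +4 or -4 means all corners agree; values near 0 mean the corners
    disagree (e.g. bent paper or misdetected corners).
    """
    votes = 0
    for n in corner_normals(a, b, c, d):
        if n[2] > 0:
            votes += 1
        elif n[2] < 0:
            votes -= 1
    return votes
```

A caller could accept the pose only when the absolute vote count is 3 or 4; weighting each vote by a confidence metric, as suggested above, is left out of this sketch.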

@ayberkozgur
Member

Please note that this whole issue is also caused by the actual bending of the paper. On second thought, the major culprit is probably the bending of the paper, which means that the above method (voting 4 corners) could actually work.

@severin-lemaignan
Member

After discussion, the current proposal is:

  • single marker: return it only if we are confident about the perspective
  • more than one tag: select the Z direction of each tag such that the camera position is consistent between all tags (consensus on the camera position)
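The multi-tag consensus could be sketched as a brute-force search, assuming each tag yields two candidate camera positions (one per possible Z direction) expressed in a common frame. The function name and the input layout are assumptions made for illustration; nothing like this exists in Chilitags as far as this thread shows.

```python
import itertools
import math


def pick_consistent(candidates):
    """Choose one camera-position candidate per tag so that they agree.

    candidates: list with one (position_0, position_1) pair per tag, each
    position being an (x, y, z) tuple in a shared reference frame.
    Returns a list of indices (0 or 1), one per tag, minimizing the sum of
    pairwise distances between the chosen positions. The search is
    exhaustive (2**n combinations), so it only suits a handful of tags.
    """
    best_choice, best_cost = None, math.inf
    for choice in itertools.product((0, 1), repeat=len(candidates)):
        positions = [pair[i] for pair, i in zip(candidates, choice)]
        cost = sum(
            math.dist(p, q) for p, q in itertools.combinations(positions, 2)
        )
        if cost < best_cost:
            best_choice, best_cost = choice, cost
    return list(best_choice)
```

With a single tag every choice has zero cost, so the search cannot resolve the ambiguity; that case still needs the "only if we are confident" rule.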

@ayberkozgur
Member

Suits me well.

@ayberkozgur
Member

FYI, my initial experiments on the Cellulo side suggest that median filtering on a time window is very robust against flipping.

@qbonnard
Member Author

Very interesting... That could very well replace the average filtering, and "fix" the z-flipping issue more simply than the proposal above. The advantage is that it works also with a single tag, the disadvantage is that it needs a few frames... which is OK, because the tag flipping means that there are several frames already.

So you just take a median of the last values for each component of the transformation matrix, or is it a bit fancier? How big is your time window?

@ayberkozgur
Member

It is a bit fancier :) Here is what I do:

Get the translations (3-vector) and rotations (quaternion) in their own windows and calculate their respective medians. The median in more than one dimension is defined as the "geometric median" (the point in space whose sum of Euclidean distances to the window points is the least). The catch is that the geometric median has been proven to have neither an explicit formula nor an exact algorithm in general, but finding it comes down to minimizing a convex function. I calculate it using the Weiszfeld-Ostresh algorithm, which is iterative and is basically a case of gradient descent, which might be off-putting in terms of performance. I've had runs that converged in 5 iterations and runs that converged in 20 iterations. It can be tuned by setting the initial point to the mean and playing with the step size.
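For the translation part, the iteration can be sketched as below. This is the plain Weiszfeld update started from the mean, not the Ostresh variant with a tunable step size and not the linked Java implementation; the function name, `max_iter`, `tol` and `eps` are illustrative assumptions.

```python
import math


def geometric_median(points, max_iter=100, tol=1e-9, eps=1e-12):
    """Approximate the geometric median of a list of equal-length tuples.

    Each step replaces the estimate by the average of the points weighted
    by the inverse of their distance to the current estimate. Distances
    are clamped to eps to avoid dividing by zero; as a known limitation,
    an estimate that lands exactly on a sample can stay stuck there.
    """
    dim = len(points[0])
    # Start from the arithmetic mean of the window.
    estimate = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(max_iter):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            w = 1.0 / max(math.dist(p, estimate), eps)
            weight_sum += w
            for k in range(dim):
                weighted[k] += w * p[k]
        new_estimate = [x / weight_sum for x in weighted]
        shift = math.dist(new_estimate, estimate)
        estimate = new_estimate
        if shift < tol:
            break  # converged
    return tuple(estimate)
```

With nine identical samples and one far outlier, the mean is dragged towards the outlier while this estimate settles on the majority value, which is the property that makes it attractive against isolated flipped frames.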

This works well for 3-vectors, but for quaternions you need to express them as points on a Riemannian manifold for it to work. This is a bit beyond my mathematical knowledge, but it turns out to again be the minimization of a convex function. Quaternions only need special treatment, such as different distance measures and different maps.

Once you have both medians, stick them into a transform matrix and you're good to go. You can find my implementations here (there are also references to the papers I got the algorithms from):
https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Quaternion.java
https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Vector3.java
https://github.com/ayberkozgur/libgdx/blob/master/gdx/src/com/badlogic/gdx/math/Matrix4.java

@ayberkozgur
Member

By the way, I used a 10 sample window, but I think it can be lowered a bit more.

And, this can also be applied to the scale (3-vector) of a transform matrix. Scale doesn't make sense in the Chilitags world but I just wanted to put it out there.
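A 10-sample time window can be kept with a bounded deque. The sketch below uses the simpler per-component median raised in the question earlier in the thread rather than the geometric median, and handles only the translation; the class name is hypothetical.

```python
from collections import deque
from statistics import median


class TranslationMedianFilter:
    """Sliding-window, per-component median of a 3D translation."""

    def __init__(self, window_size=10):
        # Oldest samples are dropped automatically once the window is full.
        self.window = deque(maxlen=window_size)

    def update(self, translation):
        """Add a new (x, y, z) sample and return the filtered translation."""
        self.window.append(tuple(translation))
        # zip(*window) regroups the samples into one sequence per axis.
        return tuple(median(axis) for axis in zip(*self.window))
```

An isolated flipped frame then has no effect on the output as long as fewer than half of the samples in the window are flipped.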

@ayberkozgur
Member

Time for this issue to rise from the dead:

During the NCCR meetings, I had the chance to talk to a guy from ETHZ's Agile and Dexterous Robotics Lab who implemented a similar tag-based application (based on another tag library). They are using a Kalman filter on the tag pose + IMU data when available in order to counter flipping issues as well as the loss of the tag due to blurry camera image. He said that they are getting very good results from this. I think @severin-lemaignan also mentioned trying a Kalman filter at some point. He said the code was open source too. We should definitely look at this at some point, it will be very cheap to calculate.

The exact same thing goes for chilitrack: chili-epfl/chilitrack#4

Also changing the name to reflect the issue better.
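To give an idea of the cost, a Kalman filter on a single pose component reduces to a few arithmetic operations per frame. The sketch below is a generic scalar filter with a constant-value model; it is not the ETHZ implementation, it does not fuse IMU data, and the class name and variance values are assumptions that would need tuning.

```python
class ScalarKalman:
    """Scalar Kalman filter assuming the tracked value is roughly constant."""

    def __init__(self, initial, initial_var=1.0, process_var=1e-3,
                 measurement_var=1e-1):
        self.x = initial          # current estimate
        self.p = initial_var      # variance of the estimate
        self.q = process_var      # how much the value may drift per frame
        self.r = measurement_var  # how noisy each detection is

    def predict(self):
        """Frame without detection: keep the estimate, grow its variance."""
        self.p += self.q
        return self.x

    def update(self, measurement):
        """Frame with detection: predict, then blend in the measurement."""
        self.predict()
        gain = self.p / (self.p + self.r)
        self.x += gain * (measurement - self.x)
        self.p *= 1.0 - gain
        return self.x
```

Calling `predict()` on frames where the tag is lost keeps a pose available, and a single flipped measurement only moves the estimate by a fraction of the jump. A full pose filter would need one state per component plus a proper treatment of rotations.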

@ayberkozgur ayberkozgur changed the title The Z-Axis seems to flip on estimate3g-gui Improve tag stability, loss and counter flipping Oct 27, 2014
@ayberkozgur ayberkozgur changed the title Improve tag stability, loss and counter flipping Improve tag stability, counter loss and flipping Oct 27, 2014
@valtron

valtron commented May 20, 2018

Related issue in OpenCV: opencv/opencv#8813

And here's a workaround.

@qbonnard
Member Author

Thanks for the tip :)

@andraspalffy

Any tips on how to fix this after my homogeneous transform has been created? I have a strong constraint that my pattern faces in a certain direction, so detecting the "flipping" is not a problem.

I would like to transform this:
image

to this:
image

My biggest problem is that translation is also affected by the flipping.
