Saturday, August 27, 2011


STEREO VISION: WHY IS CAMERA CALIBRATION NEEDED?

As you might well know, if you have two different views of the same scene then you can estimate the 3D coordinates of any point in the scene by finding the position of that point in the left image and in the right image and then applying some trigonometry.
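For the simplest case of two identical, perfectly parallel cameras, that trigonometry reduces to similar triangles: depth = focal length × baseline / disparity. The sketch below only illustrates that relation; the function name and all the numbers (focal length in pixels, baseline in meters, pixel positions) are made-up assumptions, not values from the camera used in this post.

```python
def depth_from_disparity(focal_px, baseline_m, x_left_px, x_right_px):
    """Depth (in meters) of a point seen by two ideal parallel cameras.

    focal_px   : focal length expressed in pixels (assumed equal for both cameras)
    baseline_m : distance between the two camera centers, in meters
    x_left_px  : column of the point in the left image
    x_right_px : column of the same point in the right image
    """
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: 700 px focal length, 12 cm baseline, 35 px disparity
z = depth_from_disparity(700.0, 0.12, 400.0, 365.0)
print(z)
```

Note how the depth depends only on the horizontal shift between the two views, which is why finding the matching point reliably matters so much.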

Let's assume that we have no previous information about the relation between the cameras. We can find a point of interest in the left image, but we don't know where that point of interest will appear in the right image. So, what do we do? We have no option but to scan the whole right image looking for our point of interest.



Now I can hear you say: "But that would be soooo slooooow!!". Yep, you are absolutely right, that brute force approach is really slow. But if we know the relation between both cameras, then we can calculate something called epipolar lines.

What is so special about these lines? Well, the magic of these lines is that a point in the left image will always have its corresponding point in the right image lying on its corresponding epipolar line! So now, instead of having to scan the whole right image to find our matching point, we only have to look along a single line :)

But wait!! There is more!! If the cameras are perfectly parallel and aligned, something very special happens... the epipolar lines become parallel and horizontal. This means that the match for a point in the left image will appear on the exact same row in the right image! Isn't that awesome?
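To make the "same row" idea concrete, here is a minimal sketch of a one-dimensional search: it compares a small window around the left pixel against windows on the same row of the right image and keeps the best match. The function name, the sum-of-absolute-differences cost, and the toy pixel rows are assumptions chosen for illustration; real stereo matchers are far more elaborate.

```python
def match_along_row(left_row, right_row, x_left, half_window=1, max_disparity=8):
    """Find the column in right_row that best matches the window around x_left.

    Both rows are assumed to come from the same row of a rectified pair,
    so the match can only be shifted horizontally (to the left).
    """
    def window(row, x):
        return row[x - half_window : x + half_window + 1]

    target = window(left_row, x_left)
    best_x, best_cost = None, float("inf")
    for d in range(max_disparity + 1):
        x = x_left - d
        if x - half_window < 0:
            break  # the window would fall outside the image
        # Sum of absolute differences between the two windows
        cost = sum(abs(a - b) for a, b in zip(target, window(right_row, x)))
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x


# Toy rows: the same bright blob appears 3 pixels further left in the right image
left_row  = [10, 10, 10, 10, 10, 50, 90, 50, 10, 10, 10, 10]
right_row = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10, 10, 10]
x_right = match_along_row(left_row, right_row, x_left=6)
print(x_right, "disparity:", 6 - x_right)
```

The search touches only a handful of pixels on one row instead of the whole right image.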

That is one of the main reasons for wanting to calibrate your stereo camera. Another good reason is that the lenses of the cameras introduce some distortion. That distortion makes straight lines in the real world appear curved in the image... and you don't like that, do you?
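A rough way to see why straight lines bend is a simple radial distortion model, where each point is pushed towards or away from the image center by a factor that grows with its distance from the center. The sketch below assumes normalized image coordinates and a made-up coefficient k1; it is not the distortion of any particular lens.

```python
def distort(x, y, k1, k2=0.0):
    """Apply a radial distortion to a point in normalized image coordinates."""
    r2 = x * x + y * y                      # squared distance to the image center
    factor = 1.0 + k1 * r2 + k2 * r2 * r2   # radial scaling factor
    return x * factor, y * factor


# Three points on the straight vertical line x = 0.5
line = [(0.5, -0.5), (0.5, 0.0), (0.5, 0.5)]
k1 = -0.2  # hypothetical coefficient, chosen only for illustration
distorted = [distort(x, y, k1) for x, y in line]
xs = [p[0] for p in distorted]
print(xs)  # the x values are no longer all equal: the line has become a curve
```

Calibration estimates coefficients like k1 so that this effect can be undone before matching.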

Let me show you a couple of images just to clarify.


The image above shows a pair of images taken with an uncalibrated stereo camera. The cameras are more or less parallel, but they are not perfectly aligned. Do you see the red point? See how it is not on the same row in the left and right images?

Now, almost all the methods used to calculate a dense disparity map rely on calibrated images, so if we try to use these images to calculate the dense disparity map we will get really poor results. You can check it in the next picture:


But now, if we apply the magic of calibration:


The image above has been rectified and undistorted. Notice the black borders around the image: they are the result of removing the distortion and aligning the images so that the epipolar lines are parallel and appear on the same row in both images. See the green point? Do you see how it appears on the same row in both images?

Now, if we use this to calculate the dense disparity map:


There it is, much better results!!

To sum up, if you want to get the best out of stereo vision:

  1. Make sure that your cameras are as parallel as possible.
  2. Calibrate the stereo camera. See this post for instructions: http://blog.martinperis.com/2011/01/opencv-stereo-camera-calibration.html
  3. Tune the parameters of your stereo matching algorithm. See this post to get an example: http://blog.martinperis.com/2011/08/opencv-stereo-matching.html
  4. Have fun with it!
So you know, questions are welcome and any comments will be appreciated ;)

19 comments:

Mike said...

Hi,

I'm Mike. Nice work! :)

I saw that some people have coloured depth maps and not just shades of grey/black/white. If I want to have coloured depth maps how can I achieve this?

Thanks,
Mike.

Martin Peris said...

Hi Mike!

Thanks for your comment, you can see the answer to your question here: http://blog.martinperis.com/2011/09/opencv-pseudocolor-and-chroma-depth.html

By the way, thanks for the idea for a new post ;)

Best regards,
Martin

Sathya said...

Hi Martin,
Your article was so informative and helped me to understand a lot. My rectified image is cropped too much. Is that because of improper lens alignment? My disparity map has too much noise. What may be the reason for that?

SathyaKumar P

Martin Peris said...

Hi Sathya!

I am glad that you found this useful :)
Regarding your question, if your rectified image is cropped too much it could mean that the lens is introducing a lot of distortion (cheap lens mounted on cheap camera), but the most likely reason could be that your stereo rig is not perfectly parallel. If your cameras are not completely parallel then the rectification process will produce images with very large black borders.

I hope this could help.

Best regards,
Martin

2bORnot2b said...

Hi Martin,
I am an avid reader of your blog, it seems very practical :D and that's why I like the way you discuss a topic.

I am having a problem with one of the assignments I have got. Could you please tell me how I can calibrate my images for the stereo case if I use an asymmetric camera pair, e.g. a Sony Nex camera + a cell phone camera instead of two identical cameras? The cameras differ in resolution, FOV and color. Could you please give me some hints or some web examples?

Thanks in advance,
Kamrul

Martin Peris said...

Hi Kamrul,

Thanks a lot for your comment :)
In the case of an asymmetric stereo camera that you describe, I have never tried it but you should still be able to use the regular calibration method described here as long as the calibration pattern (the chessboard) completely appears inside the field of view of both cameras.

Maybe you should re-scale the images so they have the same size; then the algorithm should automatically calculate the rectification for each camera. In the case of a Sony Nex camera + cell phone camera, you can expect the image from the Sony Nex camera to be heavily cropped (as it should have a broader field of view than the cell phone camera) or the image from the cell phone to be heavily expanded.

The best way to find it out is just going ahead and trying it :) Good luck!

Best regards,
Martin.

anju said...

Hai Martin,
I'm doing a project on stereo vision. Can you please give the specifications of the cams that you used, or suggest which cam I should use?
Which, according to you, provides a better toolbox for this project: Matlab or OpenCV?

Thanking you
Anju

Martin Peris said...

Hi Anju!

In these experiments I used a Videre Design stereo camera. Usually they are quite expensive, but you can build a home-made stereo camera cheaply using two webcams.

Regarding whether to use Matlab or OpenCV... I have no experience with Matlab so I don't know if it provides a good toolbox. But if you are familiar with C/C++, I would recommend you to use OpenCV as it provides a lot of stereo-vision-related functionality.

Best regards,
Martin.

anju said...

Hai
Thank you so much for your response.

I’m new to OpenCV. I have tried camera calibration by simply copying the program, but I don’t actually know how it works. On what basis are the chessboard corners identified?

“object_points is the physical position of the corners (in 3D space). image_points is the location of the corners in the image (in 2 dimensions).” How do I measure them?

Martin Peris said...

Hi Anju!

object_points, as you say, is the physical position of the corners in 3D space, but those coordinates are relative to the origin of coordinates of the calibration object (the chessboard in this case). In the case of a chessboard, the Z coordinate will be always 0 (because it is a flat surface).

So object_points can be calculated knowing the size of the squares in the chessboard. For example, if the size is 10mm then the coordinates of the corners are:
(0,0,0),(10,0,0),(20,0,0)...
(0,10,0),(10,10,0),....
....
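
The corner coordinates listed above can be generated with a couple of loops. This is only a sketch: the function name is made up, and cols and rows are assumed to be the number of inner corners per row and per column of the chessboard.

```python
def chessboard_object_points(cols, rows, square_size):
    """3D coordinates of the chessboard corners, row by row, with Z = 0.

    cols, rows  : number of inner corners per row / per column (assumed)
    square_size : side length of one square, in the unit you want results in
    """
    return [(c * square_size, r * square_size, 0)
            for r in range(rows)
            for c in range(cols)]


# A tiny 3x2 grid of corners with 10 mm squares
object_points = chessboard_object_points(cols=3, rows=2, square_size=10)
print(object_points)
```

The same list is reused for every calibration image, because the coordinates are relative to the chessboard itself and not to the camera.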

I would recommend you to read Chapter 12 of the book "Learning OpenCV" for a deeper understanding of what is under the hood.

Best regards,
Martin.

anju said...

Hai

In camera calibration, the intrinsic and extrinsic parameters are estimated using the projection matrix obtained from the image and object points. At whatever distance you keep the chessboard pattern, the object points/co-ordinates are the same, but the image points will vary. In that case, the projection matrix found for each object position will be different, and the intrinsic and extrinsic parameters will also vary. Then how do we finally find the intrinsic and extrinsic parameters based on all those images?

srujan said...

Hi Kamrul, I have tried with one web cam and one usb cam. The results are quite weird, and the rectified image cannot be understood if the cameras are not aligned parallel. But after cropping the images I am able to get results which are convincing. If the two cameras are not at the same height, you can't even understand what you get because of the cropping.
Thanks a lot to Martin for helping. I have got good results at last after 7 continuous days of work.

Best regards
Srujan

Anonymous said...

hi,

good explanation with example.

can you tell me if calibration is required for non-stereo [monocular?] camera ?

how about video frames from single camera? guess the video stabilization s/w runs something more complex - doesn't it?

thanks,
gary

Sai Swaraj Bangari said...

Hi Martin,

I'm Swaraj, Great work!

I was thinking of a stereo setup with cameras that have autofocus. How could we get good results in that case? And how bad would it be if the same code provided by you were run on them?

Thanks,
Swaraj

Anonymous said...

Hi Martin,
I'm Ahmad. Your work is really inspiring! Actually I took up STEREO VISION for my semester project. I am doing it in MATLAB, but I'm stuck with rectification. I used the MATLAB CAMERA CALIBRATION toolkit (using a checkerboard) and I have all my camera parameters, but I don't know how to use them for rectification (calculation of the new PPM). Can you please help me out in this regard??
Thanks
Ahmad

Anonymous said...

Anju, I am Gopakumar, and will try to answer your last question.
Each image has a different underlying extrinsic parameter matrix and the same intrinsic parameter matrix. So using all those images you solve for the camera intrinsic parameters. Then, by relating the underlying homography of each individual image to the computed intrinsic parameter matrix, you find the extrinsic parameter matrix of that image.

Suppose you have 3 differently oriented/placed images of the same planar checkerboard pattern, and the size of one square is known.
If they were imaged with the same camera, the underlying intrinsic parameter matrix (containing the focal length expressed in units of pixel width and pixel height, and the pixel coordinates of the image center) is going to be the same for all images, but the extrinsic parameter matrices are going to be different. Each extrinsic matrix reflects how much translation and rotation is required to take the coordinate axes of that image's world coordinate system (i.e. the planar pattern) to the camera's coordinate axes. Note that each pattern position gets its own world coordinate system, because we place the origin (0, 0, 0) on the pattern and assume Z = 0 for all the points on it.
If we can find at least four point correspondences in a single image of the checkerboard, we can form 8 equations (one from the x value and one from the y value of each correspondence) and find the underlying homography matrix (8 degrees of freedom). To find correspondences you can use some Hough-transform-based corner detection and see at what (x, y) pixel positions the corners are, to get the image points. As you know the size of each square and have assumed Z = 0, you have your world coordinates (X, Y, Z).
From one planar homography you get two equations, using the orthonormality of the rotation vectors (in the extrinsic parameter matrix). As we have 3 different homographies, we can form six equations and find the intrinsic parameter matrix A. (In fact we find inverse(A.A'), from which we get A.A', and since A.A' is symmetric and A is upper triangular, A can be recovered from it.) Then use this intrinsic parameter matrix, together with each individual homography, to find the extrinsic parameter matrix of each of the images.
If you have more than 3 differently oriented images and/or you can find more than 4 corresponding points in each image, you can use a least-squares technique (Moore-Penrose pseudoinverse) to get a good initial approximation to the parameter matrices, and then use some good optimisation algorithm (like Levenberg-Marquardt) to find a more accurate result.
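
For reference, the two equations obtained from each homography can be written out as below. This is a sketch of the standard planar-calibration constraints, under the assumption that H = [h_1 h_2 h_3] is the homography of one image, A is the intrinsic matrix, r_1 and r_2 are the first two columns of the rotation matrix, t is the translation, and \lambda is an arbitrary scale factor (A^{T} is what is written A' above).

```latex
\begin{align}
  H = \begin{bmatrix} h_1 & h_2 & h_3 \end{bmatrix}
    &= \lambda \, A \begin{bmatrix} r_1 & r_2 & t \end{bmatrix} \\
  r_1^{T} r_2 = 0
    \;&\Rightarrow\; h_1^{T} B \, h_2 = 0 \\
  \lVert r_1 \rVert = \lVert r_2 \rVert
    \;&\Rightarrow\; h_1^{T} B \, h_1 = h_2^{T} B \, h_2 \\
  \text{where}\quad B &= A^{-T} A^{-1} = \left( A A^{T} \right)^{-1}
\end{align}
```

B is symmetric, so it has 6 distinct entries defined up to scale, which is why three homographies (six equations) are enough to solve for it.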

之涵pros said...

Hi Martin:
I'm a Windows user. After installing GTK and compiling the main.c in your stereoBMTunner program, it executed successfully, but when I change the position of the slider, the program only changes the value and doesn't call the computeStereoBM function. It doesn't even enter on_adjustment1_value_changed() when I set a breakpoint there.
I was wondering whether there are any special steps I should take to run this program on Windows.
I will be very glad to have your reply.
Thanks.

Anish Sukumar said...

Hi Martin
I am Anish. I have a question: can I find the 3D coordinates of a point on an object, with respect to the camera, from a 2D image, so that I can easily navigate a robot end effector to that point?

赵常凯 said...

Hi Martin
I am sort of confused about why we need to apply calibration to a single camera (e.g. in AR, is that only for matching the image view and the 3D viewport of OpenGL?)
