Homography for tensorflow

Jul 08, 2017

Homography could be used for data augmentation. Here I explain the math to myself and give an implementation in tensorflow

Homography


In the pinhole model, the points in space are projected on the image plane (of the idealized camera sensor) by using a linear transformation if we care to use homogeneous coordinates. We would need the 3x4 camera matrix C, which is obtained by putting the camera intrinsic and extrinsic together. The formula for projection will be

(u~v~w~)=(C11C12C13C14C21C22C23C24C31C32C33C34)(XYZ1) \left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & C_{34} \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)

To get the un-homogeneous coordinates u,vu, v on the image plane, we simply rescale everything by w~\tilde{w}

u=u~w~,v=v~w~ u = \frac{\tilde{u}}{\tilde{w}}, v = \frac{\tilde{v}}{\tilde{w}}

We can use this fact to rewrite the projection as

(λu~λv~λw~)=λ(C11C12C13C14C21C22C23C24C31C32C331)(XYZ1) \left( \begin{array}{c} \lambda\tilde{u} \\ \lambda\tilde{v} \\ \lambda\tilde{w} \end{array} \right) = \lambda\left( \begin{array}{cccc} C_{11} & C_{12} & C_{13} & C_{14} \\ C_{21} & C_{22} & C_{23} & C_{24} \\ C_{31} & C_{32} & C_{33} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ Z \\ 1 \end{array} \right)

Here we chose λ\lambda to have the last parameter of the camera matrix as 1. Notice we will end up again with the same u,vu, v (λ\lambda cancels out) as you can see

u=λu~λw~,v=λv~λw~ u = \frac{\lambda\tilde{u}}{\lambda\tilde{w}}, v = \frac{\lambda\tilde{v}}{\lambda\tilde{w}}

Now consider a plane in the world space, and imagine we aligned the reference system in a way Z=0 for all points on this plane.

plane

For all the point laying on the plane, we can use a simplified version of the projection. The third column of the camera matrix will be going to be irrelevant (multiplied by Z=0 in fact). So we can simplify and write

(u~v~w~)=(C11C12C13C21C22C23C31C321)(XY1) \left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & 1 \end{array} \right) \left( \begin{array}{c} X \\ Y \\ 1 \end{array} \right)

The projection for a given plane in the world space to the image plane of the camera is called homography and can be expressed by a 3x3 matrix, with 8 degrees of freedom (the last parameter can be always one, due to scale invariance).

Now imagine a second camera, looking at the points on the same designated plane

two cameras

We could do something like the following: from the image plane of the first camera we could apply an inverse projection to get the points on the world plane, then project them on the second camera image plane by using its own camera matrix. We could, in other words, combine the two transformations and obtain a single homography H which will project co-planar points (in the world space) from one image to the other. Mind that this change of prospective will work only for points on that same plane, as the general case would need a 4x4 matrix M.

The mapping equation for the homography will be

(u~v~w~)=(H11H12H13H21H22H23H31H321)(uv1) \left( \begin{array}{c} \tilde{u} \\ \tilde{v} \\ \tilde{w} \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} u' \\ v' \\ 1 \end{array} \right)

Homography calculation

Even if H is the result of combining two camera matrices, you don’t need to know them to calculate H. There are 8 degrees of freedom, so you would need just 8 equations for the 8 unknowns. Those equations could be written by using 4 couples of corresponding points in the given images planes.

How do we obtain those equations? So, we have x1 Hx2\mathbb{x_1} \sim\ H\mathbb{x_2} for points on a plane in world space. Written explicitly:

(x~2y~2z~2)=(H11H12H13H21H22H23H31H321)(x1y11) \left( \begin{array}{c} \tilde{x}_2 \\ \tilde{y}_2 \\ \tilde{z}_2 \end{array} \right) = \left( \begin{array}{ccc} H_{11} & H_{12} & H_{13} \\ H_{21} & H_{22} & H_{23} \\ H_{31} & H_{32} & 1 \end{array} \right) \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \end{array} \right)

Taking into account the fact that non homogeneous coordinates are

x2=x~2z~2,y2=y~2z~2 x_2=\frac{\tilde{x}_2}{\tilde{z}_2}, y_2=\frac{\tilde{y}_2}{\tilde{z}_2}

then we can write

x2=H11x1+H12y1+H13H31x1+H32y1+1 x_2 = \frac{H_{11} x_1 + H_{12}y_1 + H_{13}}{H_{31}x_1 + H_{32}y_1 + 1} y2=H21x1+H22y1+H23H31x1+H32y1+1 y_2 = \frac{H_{21} x_1 + H_{22}y_1 + H_{23}}{H_{31}x_1 + H_{32}y_1 + 1}

Rearranging the above we get x2(H31x1+H32y1+1)=H11x1+H12y1+H13y2(H31x1+H32y1+1)=H21x1+H22y1+H23 \begin{array}{c} x_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{11}x_1 + H_{12}y_1 + H_{13} \\ y_2(H_{31}x_1 + H_{32}y_1 + 1) = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}

And then x1x2H31+y1x2H32+x2=H11x1+H12y1+H13x1y2H31+y1y2H32+y2=H21x1+H22y1+H23 \begin{array}{c} x_1x_2H_{31} + y_1x_2H_{32} + x_2 = H_{11}x_1 + H_{12}y_1 + H_{13} \\ x_1y_2H_{31} + y_1y_2H_{32} + y_2 = H_{21}x_1 + H_{22}y_1 + H_{23} \end{array}

And then x1x2H31y1x2H32+x1H11+y1H12+H13=x2x1y2H31y1y2H32+x1H21+y1H22+H23=y2 \begin{array}{c} -x_1x_2H_{31} - y_1x_2H_{32} + x_1H_{11} + y_1H_{12} + H_{13} = x_2 \\ -x_1y_2H_{31} - y_1y_2H_{32} + x1H_{21} + y_1H_{22} + H_{23} = y_2 \end{array}

Now, if we define the following three vectors ax=(x1y11000x1x2y1x2),ay=(000x1y11x1y2y1y2),h=(H11H12H13H21H22H23H31H32) \mathbf{a_x} = \left( \begin{array}{c} x_1 \\ y_1 \\ 1 \\ 0 \\ 0 \\ 0 \\ -x_1 x_2 \\ -y_1 x_2 \end{array} \right) , \mathbf{a_y} = \left( \begin{array}{c} 0 \\ 0 \\ 0 \\ x_1 \\ y_1 \\ 1 \\ -x_1 y_2 \\ -y_1 y_2 \end{array} \right) , \mathbf{h} = \left( \begin{array}{c} H_{11} \\ H_{12} \\ H_{13} \\ H_{21} \\ H_{22} \\ H_{23} \\ H_{31} \\ H_{32} \end{array} \right)

Then the above equations become simply axTh=x2ayTh=y2 \begin{array}{c} \mathbf{a_x}^T\mathbf{h} = x_2 \\ \mathbf{a_y}^T\mathbf{h} = y_2 \end{array}

If we stack at least 8 equations, using 4 couples of corresponding points, then we can form the following linear system of equation

(ax1Tay1TaxNTayNT)h=(x2y2xNyN) \left( \begin{array}{c} \mathbf{a_{x_1}}^T \\ \mathbf{a_{y_1}}^T \\ \vdots \\ \mathbf{a_{x_N}}^T \\ \mathbf{a_{y_N}}^T \end{array} \right) \mathbf{h} = \left( \begin{array}{c} x_2 \\ y_2 \\ \vdots \\ x_N \\ y_N \end{array} \right)

Well, we are basically done, as we can use tf.matrix_solve_ls to solve the linear system. Hereafter the code

 1 def ax(p, q):   
 2     return [ p[0], p[1], -1, 0, 0, 0, -p[0] * q[0], -p[1] * q[0] ]
 3     
 4 def ay(p, q):
 5     return [ 0, 0, 0, p[0], p[1], 1, -p[0] * q[1], -p[1] * q[1] ]
 6 
 7 def homography(x1s, x2s):
 8 
 9     p = []
10     
11     # we build matrix A by using only 4 point correspondence. The linear
12     # system is solved with the least square method, so here 
13     # we could even pass more correspondence 
14     p.append(ax(x1s[0], x2s[0]))
15     p.append(ay(x1s[0], x2s[0]))
16 
17     p.append(ax(x1s[1], x2s[1]))
18     p.append(ay(x1s[1], x2s[1]))   
19     
20     p.append(ax(x1s[2], x2s[2]))
21     p.append(ay(x1s[2], x2s[2]))
22    
23     p.append(ax(x1s[3], x2s[3]))
24     p.append(ay(x1s[3], x2s[3]))
25     
26     # A is 8x8
27     A = tf.stack(p, axis=0)  
28     
29     m = [[x2s[0][0], x2s[0][1], x2s[1][0], x2s[1][1], x2s[2][0], x2s[2][1], x2s[3][0], x2s[3][1]]]
30 
31     # P is 8x1
32     P = tf.transpose(tf.stack(m, axis=0))  
33       
34     # here we solve the linear system
35     # we transpose the result for convenience
36     return tf.transpose(tf.matrix_solve_ls(A, P, fast=True))

I created also a jupyter notebook on github as an example.

That’s all folks.