{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# From Pixels to Meters"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Inverse perspective mapping"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Having detected which pixel coordinates $(u,v)$ are part of a lane boundary, we now want to know which 3 dimensional points $(X_c,Y_c,Z_c)^T$ correspond to these pixel coordinates $(u,v)$. First let us have a look at this sketch of the image formation process again:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{figure} tikz/camera_projection/CameraProjection.svg\n",
"---\n",
"name: camera_projection_again\n",
"width: 67%\n",
"align: center\n",
"---\n",
"Camera projection. Sketch adapted from [stackexchange](https://tex.stackexchange.com/a/323778/56455).\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Remember: Given a 3 dimensional point in the camera reference frame $(X_c,Y_c,Z_c)^T$, we can obtain the pixel coordinates $(u,v)$ via"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \n",
" \\lambda \\begin{pmatrix} u \\\\ v \\\\ 1 \\end{pmatrix} = \\mathbf{K} \\begin{pmatrix} X_c \\\\ Y_c \\\\ Z_c \\end{pmatrix} \n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But what we need to do now is solve the inverse problem. We have $(u,v)$ given, and need to find $(X_c,Y_c,Z_c)^T$. To do that, we multiply the above equation with $\\mathbf{K}^{-1}$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$ \n",
" \\begin{pmatrix} X_c \\\\ Y_c \\\\ Z_c \\end{pmatrix} = \\lambda \\mathbf{K}^{-1} \\begin{pmatrix} u \\\\ v \\\\ 1 \\end{pmatrix} \n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our problem is that we do not know the value of $\\lambda$. This means that the 3d point $(X_c,Y_c,Z_c)^T$ corresponding to pixel coordinates $(u,v)$ is somewhere on the line defined by "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
" \\mathbf{r}(\\lambda) = \\lambda \\mathbf{K}^{-1} \\begin{pmatrix} u \\\\ v \\\\ 1 \\end{pmatrix}, ~ \\lambda \\in \\mathbb{R}_{>0} \n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"But which $\\lambda$ yields the point that was captured in our image? In general, this question cannot be answered. But here, we can exploit our knowledge that $\\mathbf{r}(\\lambda)$ should lie on the road! It corresponds to a point on the lane boundary after all. We will assume that the road is planar. A plane can be characterized by a normal vector $\\mathbf{n}$ and some point lying on the plane $\\mathbf{r}_0$:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
" \\textrm{Point } \\mathbf{r} \\textrm{ lies in the plane} ~ \\Leftrightarrow ~ \\mathbf{n}^T (\\mathbf{r} - \\mathbf{r}_0) = 0\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{figure} images/surface.png\n",
"---\n",
"name: surface-fig\n",
"width: 67%\n",
"align: center\n",
"---\n",
"Equation for a planar surface\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the road reference frame the normal vector is just $\\mathbf{n} = (0,1,0)^T$. Since the optical axis of the camera is not parallel to the road, the normal vector in the camera reference frame is $\\mathbf{n_c} = \\mathbf{R_{cr}} (0,1,0)^T$, where the rotation matrix $\\mathbf{R_{cr}}$ describes how the camera is oriented with respect to the road: It rotates vectors from the road frame into the camera frame. The remaining missing piece is some point $\\mathbf{r}_0$ on the plane. In the camera reference frame, the camera is at position $(0,0,0)^T$. If we denote the height of the camera above the road by $h$, then we can construct a point on the road by moving from $(0,0,0)^T$ in the direction of the road normal vector $\\mathbf{n_c}$ by a distance of $h$: Hence, we pick $\\mathbf{r}_0 = h \\mathbf{n_c}$, and our equation for the plane becomes $0= \\mathbf{n_c}^T (\\mathbf{r} - \\mathbf{r}_0) = \\mathbf{n_c} ^T \\mathbf{r} - h$ or $h=\\mathbf{n_c}^T\\mathbf{r}$."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{figure} images/ipm.png\n",
"---\n",
"name: ipm-fig\n",
"width: 100%\n",
"align: center\n",
"---\n",
"Finding the correct $\\lambda$\n",
"```\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can compute the point where the line $\\mathbf{r}(\\lambda) = \\lambda \\mathbf{K}^{-1} (u,v,1)^T$ hits the road, by plugging $\\mathbf{r}(\\lambda)$ into the equation of the plane $h=\\mathbf{n_c}^T\\mathbf{r}$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
" h = \\mathbf{n_c}^T \\lambda \\mathbf{K}^{-1} (u,v,1)^T ~ \\Leftrightarrow~ \\lambda = \\frac{h}{ \\mathbf{n_c}^T \\mathbf{K}^{-1} (u,v,1)^T}\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now plug this value of $\\lambda$ into $\\mathbf{r}(\\lambda)$ to obtain the desired mapping from pixel coordinates $(u,v)$ to 3 dimensional coordinates in the camera reference frame"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"$$\n",
" \\begin{pmatrix} X_c \\\\ Y_c \\\\Z_c \\end{pmatrix} = \\frac{h}{ \\mathbf{n_c}^T \\mathbf{K}^{-1} (u,v,1)^T} \\mathbf{K}^{-1} \\begin{pmatrix} u \\\\ v \\\\ 1 \\end{pmatrix} \n",
"$$ (eq-inverse-perspective-mapping)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This equation is only true if the image shows the road at pixel coordinates $(u,v)$. It may look a bit ugly, but it is actually pretty easy to implement with python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise: Inverse perspective mapping \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this exercise you will implement Eq. {eq}`eq-inverse-perspective-mapping` as well as the coordinate transformation between the camera reference frame and the road reference frame. For the latter part, you might look back into [](./CameraBasics.ipynb). Note that you should have successfully completed the exercise in [](./CameraBasics.ipynb) before doing this exercise.\n",
"\n",
"To start working on the exercise, open `code/tests/lane_detection/inverse_perspective_mapping.ipynb` and follow the instructions in that notebook."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Fitting the polynomial"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our aim is to obtain polynomials $y_l(x)$ and $y_r(x)$ describing the left and right lane boundaries in the road reference frame (based on ISO 8855)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{figure} tikz/iso8850/iso8850_crop.png\n",
"---\n",
"align: center\n",
"width: 80%\n",
"name: model_iso8850_again\n",
"---\n",
"Our aim is to find $y_l(x)$ and $y_r(x)$\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-cell"
]
},
"outputs": [],
"source": [
"from IPython import display\n",
"display.set_matplotlib_formats('svg')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From [](./Segmentation.ipynb) we know that our semantic segmentation model will take the camera image as input and will return a tensor `output` of shape (H,W,3). In particular `prob_left = output[v,u,1]` will be the probability that the pixel $(u,v)$ is part of the left lane boundary. I saved the tensor `output[v,u,1]` that my neural net computed for some example image in a npy file. Let's have a look"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"prob_left = np.load(\"../../data/prob_left.npy\")\n",
"plt.imshow(prob_left, cmap=\"gray\")\n",
"plt.xlabel(\"$u$\");\n",
"plt.ylabel(\"$v$\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The image above shows `prob_left[v,u]` for each `(u,v)`. Now imagine that instead of triples `(u,v,prob_left[v,u])` we would have triples `(x,y,prob_left(x,y))`, where $(x,y)$ are coordinates on the road like in {numref}`model_iso8850_again`. If we had these triples we could filter them for all `(x,y,prob_left[x,y])` where `prob_left[x,y]` is large. We would obtain a list of points $(x_i,y_i)$ which are part of the left lane boundary and we could use these points to fit our polynomial $y_l(x)$! \n",
"But going from `(u,v,prob_left[v,u])` to `(x,y,prob_left[x,y])` is actually not that hard, since you implemented the function `uv_to_roadXYZ_roadframe_iso8855` in the last exercise. This function converts $(u,v)$ into $(x,y,z)$ (note that $z=0$ for road pixels)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That means we can start and write some code to collect the triples `(x,y,prob_left[x,y])`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"sys.path.append('../../code')\n",
"from solutions.lane_detection.camera_geometry import CameraGeometry\n",
"cg = CameraGeometry()\n",
"\n",
"xyp = []\n",
"for v in range(cg.image_height):\n",
" for u in range(cg.image_width):\n",
" X,Y,Z= cg.uv_to_roadXYZ_roadframe_iso8855(u,v)\n",
" xyp.append(np.array([X,Y,prob_left[v,u]]))\n",
"xyp = np.array(xyp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```{margin}\n",
"I mention `flatten` here because it is well known. But in our case, it is actually [better](https://stackoverflow.com/questions/28930465/what-is-the-difference-between-flatten-and-ravel-functions-in-numpy) to use [`np.ravel()`](https://numpy.org/doc/stable/reference/generated/numpy.ravel.html).\n",
"```\n",
"\n",
"This double `for` loop is quite slow, but don't worry. The first two columns of the `xyp` array are independent of `prob_left`, and hence can be precomputed. The last column can be computed without a `for` loop: `xyp[:,2]==prob_left.flatten()`. You will work on the precomputation in the exercise.\n",
"\n",
"To restrict ourselves to triples `(x,y,prob_left[x,y])` with large `prob_left[x,y]` we can create a mask. Then, we can insert the masked `x` and `y` values into the [`numpy.polyfit()`](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html) function, to finally obtain our desired polynomial $y(x)=c_0+c_1 x+ c_2 x^2 +c_3 x^3$. The [`numpy.polyfit()`](https://numpy.org/doc/stable/reference/generated/numpy.polyfit.html) performs a least squares fit. But it can even do a [weighted least squares fit](https://en.wikipedia.org/wiki/Weighted_least_squares) if we pass an array of weights. We will just pass the probability values as weights, since `(x,y)` points with high probability should be weighted more. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x_arr, y_arr, p_arr = xyp[:,0], xyp[:,1], xyp[:,2]\n",
"mask = p_arr > 0.3\n",
"coeffs = np.polyfit(x_arr[mask], y_arr[mask], deg=3, w=p_arr[mask])\n",
"polynomial = np.poly1d(coeffs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's plot our polynomial:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.arange(0,60,0.1)\n",
"y = polynomial(x)\n",
"plt.plot(x,y)\n",
"plt.xlabel(\"x (m)\"); plt.ylabel(\"y (m)\"); plt.axis(\"equal\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Looks good!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": []
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Encapsulate the pipeline into a class"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You have seen both steps of our lane detection pipeline now: The lane boundary segmentation, and the polynomial fitting. For future usage, it is convenient to encapsulate the whole pipeline into one class. In the following exercise, you will implement such a `LaneDetector` class. For now, let's have a look at the sample solution for the `LaneDetector` in action.\n",
"First, we load an image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"img_fn = \"images/carla_scene.png\"\n",
"img = cv2.imread(img_fn)\n",
"img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
"plt.imshow(img);"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we import the `LaneDetector` class and create an instance of it. For that we specfiy the path to a model that we have stored with pytorch's `save` function."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"tags": [
"remove-output"
]
},
"outputs": [],
"source": [
"from solutions.lane_detection.lane_detector import LaneDetector\n",
"model_path =\"../../code/solutions/lane_detection/best_model_multi_dice_loss.pth\"\n",
"ld = LaneDetector(model_path=model_path)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"From now on we can get the lane boundary polynomial for any image (that is not too different from the training set) by passing it to the `ld` instance."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"poly_left, poly_right = ld(img)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"On Google Colab this call takes around 45 ms. This is not quite good enough for real time applications, where you would expect 10-30 ms or less, but it is close. The bottleneck of this sample solution is the neural network. Maybe you implemented a more efficient one? If you want to make the system faster, you could also consider feeding lower resolution images into the network - both during training and inference. This would trade off accuracy for speed. If you try it out let me know how well it works ;)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's have a look at the polynomials that `ld` has computed"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = np.linspace(0,60)\n",
"yl = poly_left(x)\n",
"yr = poly_right(x)\n",
"plt.plot(x, yl, label=\"yl\")\n",
"plt.plot(x, yr, label=\"yr\")\n",
"plt.xlabel(\"x (m)\"); plt.ylabel(\"y (m)\");\n",
"plt.legend(); plt.axis(\"equal\");\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This looks quite reasonable. In the next exercise, you will create a similar plot and compare it to ground truth data from the simulator."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Exercise: Putting everything together"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For the final exercise, you will implement polynomial fitting, and then encapsulate the whole pipeline into the `LaneDetector` class. To start, go to `code/tests/lane_detection/lane_detector.ipynb` and follow the instructions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {},
"nbformat": 4,
"nbformat_minor": 4
}