Lane Boundary Segmentation

For our lane-detection pipeline, we want to train a neural network that takes an image and estimates, for each pixel, the probability that it belongs to the left lane boundary, the probability that it belongs to the right lane boundary, and the probability that it belongs to neither. This problem is called semantic segmentation.

Prerequisites

For this section, I assume the following

  1. You know what a neural network is and have trained one yourself before

  2. You know the concept of semantic segmentation

If you do not fulfill prerequisite 1, I recommend the following free resource

CS231n: Convolutional Neural Networks for Visual Recognition

For this excellent Stanford course, you can find all the learning material online. The course notes are not finished, but you can read the slides by clicking on “detailed syllabus”. You probably want to use the version from 2017, because that one includes lecture videos. However, for the exercises, you should use the 2020 version (very similar to 2017), since you can do your programming in Google Colab. Google Colab lets you use GPUs (expensive hardware necessary for deep learning) for free on Google servers. And even if you do not want to use Colab, the 2020 course has better instructions on working locally (including anaconda). For the exercises in which you can choose between TensorFlow and PyTorch, I recommend using PyTorch. If you are eager to return to this course as quickly as possible, you can stop CS231n once you have learned about semantic segmentation.

Even if you fulfill prerequisite 2, please read this very nice blog post about semantic segmentation by Jeremy Jordan (which is heavily based on CS231n). Be sure that you understand the section about dice loss.

We will use dice loss for two reasons

  • Dice loss gives good results even if there is class imbalance: The classes in our problem are “none”, “left boundary”, and “right boundary”. Since lane boundaries are pretty thin, most of the pixels in our data set will be labeled “none”. This means our data set does exhibit class imbalance. A loss function like cross-entropy will not work that well, because the model can get a very low loss just by guessing “none” for each pixel. This is not possible when using dice loss.

  • The dice loss is not only a good loss function, but we can also use it as a metric since its value is very intuitive.
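If you want to see the idea in code, here is a minimal sketch of a soft dice loss for a single class (this is one common formulation; the function name and the smoothing constant eps are my own choices, not from the blog post):

import torch

def soft_dice_loss(probs, target, eps=1.0):
    # probs:  predicted probabilities for one class, shape (N, H, W)
    # target: binary ground truth for that class, shape (N, H, W)
    intersection = (probs * target).sum()
    # 1 minus the dice coefficient; eps smooths the empty-mask case.
    return 1.0 - (2.0 * intersection + eps) / (probs.sum() + target.sum() + eps)

probs = torch.zeros(1, 512, 1024)     # a model that predicts “none” everywhere
target = torch.zeros(1, 512, 1024)
target[0, 300:, 300] = 1.0            # a thin fake boundary
print(soft_dice_loss(probs, target))  # close to 1: guessing “none” does not pay off

Since the intersection term only grows where the predicted probabilities overlap the (thin) boundary mask, an all-“none” prediction yields a loss near the maximum instead of a low loss.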

Finally, you need to have access to a GPU in order to do the exercise. But owning a GPU is not a prerequisite: you can use Google Colab, which allows you to run your Python code on Google servers. To get access to a GPU on Colab, you should click on “Runtime”, then “Change runtime type”, and finally select “GPU” as “Hardware accelerator”. For more details on how to work with Colab, see the appendix.
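Once the runtime is set, you can verify that a GPU is actually available, for example with PyTorch (which is preinstalled on Colab):

import torch

print(torch.cuda.is_available())      # should print True on a GPU runtime
print(torch.cuda.get_device_name(0))  # prints the GPU model (varies)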

Exercise: Train a neural net for lane boundary segmentation

The lane segmentation model should take an image of shape (512,1024,3) as an input. Here, 512 is the image height, 1024 is the image width and 3 is for the three color channels red, green, and blue. We train the model with input images and corresponding labels of shape (512,1024), where label[v,u] can have the value 0,1, or 2, meaning pixel \((u,v)\) is “no boundary”, “left boundary”, or “right boundary”.

The output of the model shall be a tensor output of shape (512,1024,3).

  • The number output[v,u,0] gives the probability that the pixel \((u,v)\) is not part of any lane boundary.

  • The number output[v,u,1] gives the probability that the pixel \((u,v)\) is part of the left lane boundary.

  • The number output[v,u,2] gives the probability that the pixel \((u,v)\) is part of the right lane boundary.
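To make these shapes concrete, here is a small sketch (the dummy label values are made up for illustration):

import numpy as np

# Dummy label image of shape (512, 1024) with values 0, 1, or 2.
label = np.zeros((512, 1024), dtype=np.int64)
label[300:, 300] = 1  # a fake “left boundary” column
label[300:, 700] = 2  # a fake “right boundary” column

# An ideal model output would put probability one on the correct class:
output = np.zeros((512, 1024, 3), dtype=np.float32)
for c in range(3):
    output[:, :, c] = (label == c)

# For every pixel (u,v), the three probabilities sum to one:
assert np.allclose(output.sum(axis=-1), 1.0)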

Gathering training data

We can collect training data using the Carla simulator. I wrote a script collect_data.py that

  • creates a vehicle on the Carla map

  • attaches an rgb camera sensor to the vehicle

  • moves the vehicle to different positions and

    1. stores an image from the camera sensor

    2. stores world coordinates of the lane boundaries obtained from Carla’s high definition map

    3. stores a transformation matrix \(T_{cw}\) that maps world coordinates to coordinates in the camera reference frame (a short usage sketch follows below)

    4. stores a label image that is created from the lane boundary coordinates and the transformation matrix, as shown in the exercise of the previous section

Note that from the four data items (image, lane boundaries, trafo matrix, label image), only the image and the label image are necessary for training our deep learning model.
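To illustrate item 3: assuming \(T_{cw}\) is stored as a 4x4 homogeneous transformation matrix, it maps a world point into the camera reference frame like this (the numbers are made up):

import numpy as np

T_cw = np.eye(4)  # stand-in for a stored 4x4 transformation matrix

p_world = np.array([10.0, 2.0, 0.0, 1.0])  # world point in homogeneous coordinates
p_cam = T_cw @ p_world                     # the same point in the camera reference frame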

All data is collected on the “Town04” Carla map since this is the only map with usable highways (“Town06” has highways which are either perfectly straight or have a 90-degree turn). For simplicity’s sake, we are building a system just for the highway. Hence, only parts of the map with low road curvature are used, which excludes urban roads.

One part of the map was arbitrarily chosen as the “validation zone”. All data that is created in this zone has the string “validation_set” added to its filename.
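Since the split is encoded in the filenames, separating training and validation data can be as simple as this sketch (the directory name is an assumption; adapt it to where you store the data):

from pathlib import Path

files = sorted(Path("data").glob("*.png"))
val_files = [f for f in files if "validation_set" in f.name]
train_files = [f for f in files if "validation_set" not in f.name]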

Now you will want to get some training data onto your machine! I recommend just downloading the training data that I created for you using the collect_data.py script. But if you really want to, you can also collect data yourself.

Download data

Just go ahead and open the starter code in code/exercises/lane_detection/lane_segmentation.ipynb. It contains a Python utility function that downloads the data for you.

Collect data yourself

First, you need to run the Carla simulator. Regarding the installation of Carla, see the appendix. Then run

cd Algorithms-for-Automated-Driving
conda activate aad 
python -m code.solutions.lane_detection.collect_data

Now you need to wait a few seconds while the script tells the Carla simulator to load the “Town04” map. A window will open that shows different scenes as well as augmented-reality lane boundaries. Each scene that you see will be saved to your hard drive. Wait until you have collected enough data, then click the close button. Finally, open the starter code in code/exercises/lane_detection/lane_segmentation.ipynb and follow the instructions.

Note

I do not advise reading the actual code inside collect_data.py, since I wrote it mainly for functionality, not for education. If you are really curious, you can of course read it, but first you should

  • have finished the exercise of the previous section

  • have learned about Carla by studying the documentation and running some official Python example clients

Building a model

To create and train a model, you can choose any deep learning framework you like. Regarding model performance:

Expected performance

You should achieve a dice loss of \(0.2\) or less on the validation data set!

If you want some guidance, I recommend using Segmentation Models PyTorch (smp). You can modify the example on the smp GitHub repo to work for lane segmentation. You should start with the code in code/exercises/lane_detection/lane_segmentation.ipynb (and leave it in its directory to make sure the utility imports keep working). Then copy what you need from the smp example notebook cell by cell. For each cell, read what it does and think about whether it needs modifications.

This exercise is probably the hardest in this book. If you want, you can get some hints: first try the limited hints below, and consult the detailed hints only if you are still stuck.

Limited hints

  • The smp example notebook is for binary segmentation, but we have 3 classes (background, left_boundary, right_boundary). Hence, some modifications will be necessary.

  • I would disable the “horizontal flip” image augmentation, because it exchanges left and right, which is something we want to distinguish!

  • I recommend using dice loss for training. You cannot use the library function smp.utils.losses.DiceLoss() directly, since it is for binary segmentation. However, you can view our multiclass segmentation problem as two binary segmentation problems. You can compute the dice loss for each using smp.utils.losses.DiceLoss() and then take the average. Write your own MultiClassDiceLoss Python class based on this idea. Note that you should use smp.utils.base.Loss as a base class (a sketch is given after the detailed hints).

  • For this data set you can get very good results in around 5 epochs. So you do not need 40 like in the example.

  • I recommend the “FPN” architecture with the “efficientnet-b0” encoder.

Detailed hints

The smp example notebook is for binary segmentation, but we have 3 classes (background, left_boundary, right_boundary). Hence, some modifications will be necessary. Here are detailed instructions for the modifications, specific to each section of the smp example notebook

  • DataLoader: In the Dataset class, you need to change

    • CLASSES: We only have the three classes “background”, “left_boundary”, and “right_boundary”.

    • __init__ function: In the smp example notebook the input images and label images have the same name. In our case, if the input image is called something.png, the label is called something_label.png.

    • __getitem__: You can completely skip the two lines after “extract certain classes from mask (e.g. cars)”; the mask then keeps its integer class values, which will be useful for the dice loss implementation later on.

    • visualize: You can just pass mask instead of mask.squeeze(-1) to the visualize function.

  • Augmentations: Disable the “horizontal flip” image augmentation, because it exchanges left and right, which is something we want to distinguish! The function get_validation_augmentation() can return None, since our image shape is already divisible by 32.

  • Create model and train: I recommend changing

    • ENCODER: ‘efficientnet-b0’ performs well and is quite fast. But of course you can try others (see smp README)

    • CLASSES: Just use Dataset.CLASSES

    • ACTIVATION: Choose ‘softmax2d’, since we are doing multiclass segmentation.

    • loss: I recommend using dice loss for training. You cannot use the library function smp.utils.losses.DiceLoss() directly, since it is for binary segmentation. It expects a prediction tensor of shape (batch_size, W, H) and a ground-truth tensor of shape (batch_size, W, H) (ground-truth tensor is another term for label tensor). However, our multiclass prediction has shape (batch_size, 3, W, H) (where 3 is the number of classes), and our ground-truth tensor from the DataLoader has shape (batch_size, W, H). Write your own MultiClassDiceLoss Python class which uses smp.utils.base.Loss as a base class. The MultiClassDiceLoss should have two members, BinaryDiceLossLeft and BinaryDiceLossRight, which are of type smp.utils.losses.DiceLoss. Implement a function forward(self, y_pr, y_gt) for MultiClassDiceLoss, which computes a loss_left and a loss_right and returns 0.5*(loss_left+loss_right). You compute loss_left by passing the correct data into self.BinaryDiceLossLeft.forward() (see the sketch after this list).

    • metrics: You can set metrics=[], since our loss function already is a good metric here.

    • epochs: For this data set you can get very good results in around 5 epochs. So you do not need 40 like in the example.
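To tie the detailed hints together, here is a minimal sketch of the model creation and the MultiClassDiceLoss described above. It assumes the smp.utils API used in the smp example notebook; treat it as a starting point, not a finished solution:

import segmentation_models_pytorch as smp

# Model creation following the hints above.
model = smp.FPN(
    encoder_name="efficientnet-b0",
    encoder_weights="imagenet",
    classes=3,               # background, left_boundary, right_boundary
    activation="softmax2d",  # per-pixel multiclass probabilities
)

class MultiClassDiceLoss(smp.utils.base.Loss):
    # Average of two binary dice losses (left and right boundary).
    # Assumes y_pr of shape (batch_size, 3, H, W) with softmax probabilities,
    # and y_gt of shape (batch_size, H, W) with values 0 (none), 1 (left), 2 (right).

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.BinaryDiceLossLeft = smp.utils.losses.DiceLoss()
        self.BinaryDiceLossRight = smp.utils.losses.DiceLoss()

    def forward(self, y_pr, y_gt):
        # Channel 1 holds the “left boundary” probability, channel 2 “right”.
        loss_left = self.BinaryDiceLossLeft.forward(y_pr[:, 1, :, :], (y_gt == 1).float())
        loss_right = self.BinaryDiceLossRight.forward(y_pr[:, 2, :, :], (y_gt == 2).float())
        return 0.5 * (loss_left + loss_right)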

Store your model

You will need your trained model for an upcoming exercise. Hence, please save your trained model to disk. In PyTorch, you do this via torch.save, as shown in the smp example notebook.
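For example (the filename is arbitrary):

import torch

torch.save(model, "./best_model.pth")  # save the whole model object

# Later, e.g. in the upcoming exercise:
model = torch.load("./best_model.pth")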

Optional: Working on Kaggle

The training data I prepared for you can also be found on Kaggle. If you like, you can create your model online with a Kaggle notebook. Kaggle also offers free GPU access. Consider publishing your notebook on Kaggle once you are happy with your solution. I would love to see it 😃.