Lane Boundary Segmentation

For our lane-detection pipeline, we want to train a neural network that takes an image and estimates, for each pixel, the probability that it belongs to the left lane boundary, the probability that it belongs to the right lane boundary, and the probability that it belongs to neither. This problem is called semantic segmentation.

Prerequisites

For this section, I assume the following:

  1. You know what a neural network is and have trained one yourself before

  2. You know the concept of semantic segmentation

If you do not fulfill prerequisite 1, I recommend checking out one of the following free resources:

CS231n: Convolutional Neural Networks for Visual Recognition

For this excellent Stanford course, you can find all the learning material online. The course notes are not finished, but the ones that do exist are really good! Note that you can see the slides for all lectures when you click on “detailed syllabus”. You probably want to use the version from 2017 because that one includes lecture videos. However, for the exercises, you should use the 2020 version (very similar to 2017), since you can do your programming in Google Colab. Google Colab lets you use GPUs (expensive hardware necessary for deep learning) for free on Google servers. And even if you do not want to use Colab, the 2020 course has better instructions on working locally (including Anaconda). For the exercises in which you can choose between TensorFlow and PyTorch, I recommend using PyTorch. If you are eager to return to this course as quickly as you can, you can stop CS231n once you have learned about semantic segmentation.

Practical Deep Learning for Coders using fastai

If your background is more in coding and less in math/science, then I recommend this course. You can find video lectures here, and a book written in Jupyter notebooks here (there is also a printed version if you like it). I recommend doing the exercises using Google Colab. The fastai course is taught using the fastai library, which helps you train PyTorch models with very few lines of code. Even if you choose not to look into the fastai course, I recommend checking out the fastai library, since it makes training models really easy. Maybe start by just reading the computer vision tutorial.

Regarding prerequisite 2, I recommend this very nice blog post about semantic segmentation by Jeremy Jordan (which is heavily based on CS231n).

Finally, you need access to a GPU in order to do the exercise. But owning a GPU is not a prerequisite. You can use Google Colab, which allows you to run your Python code on Google servers. To get access to a GPU on Colab, click on “Runtime”, then “Change runtime type”, and finally select “GPU” as “Hardware accelerator”. For more details on how to work with Colab, see the appendix.

Exercise: Train a neural net for lane boundary segmentation

The lane segmentation model should take an image of shape (512,1024,3) as an input. Here, 512 is the image height, 1024 is the image width, and 3 is for the three color channels red, green, and blue. We train the model with input images and corresponding labels of shape (512,1024), where label[v,u] can have the value 0, 1, or 2, meaning pixel \((u,v)\) is “no boundary”, “left boundary”, or “right boundary”, respectively.

The output of the model shall be a tensor output of shape (512,1024,3).

  • The number output[v,u,0] gives the probability that the pixel \((u,v)\) is not part of any lane boundary.

  • The number output[v,u,1] gives the probability that the pixel \((u,v)\) is part of the left lane boundary.

  • The number output[v,u,2] gives the probability that the pixel \((u,v)\) is part of the right lane boundary.
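To make the output format concrete, here is a minimal NumPy sketch. It assumes that the last layer of the network produces raw scores (logits) of shape (512,1024,3) and that a softmax over the last axis turns them into probabilities; whether the softmax lives inside the model or inside the loss depends on your framework. Random numbers stand in for a real network output, and the function name is only illustrative.

```python
import numpy as np

def softmax_last_axis(logits):
    """Convert raw scores of shape (H, W, C) into per-pixel probabilities."""
    # Subtract the per-pixel maximum for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Random scores stand in for a real network output (assumption for illustration).
rng = np.random.default_rng(0)
logits = rng.normal(size=(512, 1024, 3))

output = softmax_last_axis(logits)         # shape (512, 1024, 3)
predicted_label = output.argmax(axis=-1)   # shape (512, 1024), values 0, 1, or 2

v, u = 100, 200  # row (v) and column (u) of an arbitrary pixel
print(output[v, u], predicted_label[v, u])
```

For every pixel, the three numbers output[v,u,:] sum to one, and taking the argmax over the last axis gives a predicted label image in the same format as the training labels.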

Gathering training data

We can collect training data using the Carla simulator. I wrote a script collect_data.py that

  • creates a vehicle on the Carla map

  • attaches an RGB camera sensor to the vehicle

  • moves the vehicle to different positions and

    1. stores an image from the camera sensor

    2. stores world coordinates of the lane boundaries obtained from Carla’s high definition map

    3. stores a transformation matrix \(T_{cw}\) that maps world coordinates to coordinates in the camera reference frame

    4. stores a label image that is created from the lane boundary coordinates and the transformation matrix as shown in the exercise of the previous section

Note that of the four data items (image, lane boundaries, transformation matrix, label image), only the image and the label image are necessary for training our deep learning model.
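How the stored items relate to each other can be sketched roughly as follows. This is not the code from collect_data.py. It assumes a pinhole camera with a hypothetical intrinsic matrix K, a camera frame with x pointing right, y pointing down, and z pointing forward, and boundary points given as an (N,3) array in world coordinates. All function names and numeric values are made up for illustration.

```python
import numpy as np

def project_to_pixels(points_world, T_cw, K):
    """Project (N, 3) world points to (M, 2) pixel coordinates (u, v), M <= N."""
    ones = np.ones((points_world.shape[0], 1))
    points_h = np.hstack([points_world, ones])       # homogeneous, shape (N, 4)
    points_cam = (T_cw @ points_h.T).T[:, :3]        # world -> camera frame
    points_cam = points_cam[points_cam[:, 2] > 0]    # drop points behind the camera
    uvw = (K @ points_cam.T).T                       # pinhole projection
    return uvw[:, :2] / uvw[:, 2:3]

def draw_points(label, pixels, class_id):
    """Write class_id into label[v, u] for every projected point inside the image."""
    height, width = label.shape
    uv = np.round(pixels).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < width) & (uv[:, 1] >= 0) & (uv[:, 1] < height)
    label[uv[inside, 1], uv[inside, 0]] = class_id
    return label

# Illustrative values only: hypothetical intrinsics, camera placed at the world origin.
K = np.array([[500.0, 0.0, 512.0],
              [0.0, 500.0, 256.0],
              [0.0, 0.0, 1.0]])
T_cw = np.eye(4)

# Two straight boundaries, 1.5 m to the left and right, 1 m below the camera.
z = np.linspace(5.0, 50.0, 200)
left = np.stack([np.full_like(z, -1.5), np.ones_like(z), z], axis=1)
right = np.stack([np.full_like(z, 1.5), np.ones_like(z), z], axis=1)

label = np.zeros((512, 1024), dtype=np.uint8)
draw_points(label, project_to_pixels(left, T_cw, K), class_id=1)
draw_points(label, project_to_pixels(right, T_cw, K), class_id=2)
print(np.unique(label))
```

The sketch only marks individual projected points. A real label image would connect neighboring points with lines of some thickness, which is left out here to keep the example short.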

All data is collected on the “Town04” Carla map since this is the only map with usable highways (“Town06” has highways which are either perfectly straight or have a 90-degree turn). For simplicity’s sake, we are building a system just for the highway. Hence, only parts of the map with low road curvature are used, which excludes urban roads.

One part of the map was arbitrarily chosen as the “validation zone”. All data that is created in this zone has the string “validation_set” added to its filename.
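Because the validation zone is encoded in the filename, a dataset can be split without any extra metadata. The following is a minimal sketch; only the “validation_set” marker comes from the text above, while the file names are hypothetical.

```python
from pathlib import Path

def is_validation_file(path):
    """Return True if the file was recorded in the validation zone."""
    return "validation_set" in Path(path).name

# Hypothetical file names; only the "validation_set" marker is taken from the text.
files = [
    "frame_0001.png",
    "frame_0002_validation_set.png",
    "frame_0003.png",
]

train_files = [f for f in files if not is_validation_file(f)]
valid_files = [f for f in files if is_validation_file(f)]
print(train_files, valid_files)
```

Splitting by map zone rather than at random keeps near-duplicate frames of the same road segment from ending up in both the training and the validation set.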

Now you will want to get some training data onto your machine! I recommend that you just download the training data that I created for you using the collect_data.py script. But if you really want to, you can also collect data yourself.

To download the data, just go ahead and open the starter code in code/exercises/lane_detection/lane_segmentation.ipynb. It contains a Python utility function that downloads the data for you.

To collect data yourself, you first need to run the Carla simulator. Regarding the installation of Carla, see the appendix. Then run

cd Algorithms-for-Automated-Driving
conda activate aad 
python -m code.solutions.lane_detection.collect_data

Now you need to wait a few seconds because the script tells the Carla simulator to load the “Town04” map. A window will open that shows different scenes as well as augmented-reality lane boundaries. Each scene that you see will be saved to your hard drive. Wait until you have collected enough data, then click the close button. Finally, open the starter code in code/exercises/lane_detection/lane_segmentation.ipynb and follow the instructions.

Note

I do not advise you to read the actual code inside collect_data, since I mainly wrote it for functionality, and not for education. If you are really curious, you can of course read it, but first you should

  • have finished the exercise of the previous section

  • have learned about Carla by studying the documentation and running some official Python example clients

Building a model

To create and train a model, you can choose any deep learning framework you like.

If you want some guidance, I recommend using fastai. You can use the example for semantic segmentation from the fastai documentation, slightly modify it for the dataset at hand, and it should just work! If you get stuck, the following hints may help.

I recommend reading the whole tutorial section on semantic segmentation in the fastai docs. I would then copy the code from the tutorial that uses the DataBlock API. You will need to modify this code a little bit:

  • You need to modify the codes. You can just define codes = np.array(['back', 'left','right'], dtype=str)

  • get_items = get_image_files: This will not work for our dataset since the get_image_files function loads images from all subfolders (see documentation). We do not want to load images from the label folders! You can create a new function based on get_image_files by specifying the “folders” argument (see documentation).

  • label_func needs to be defined so that it works for the given dataset

  • splitter: Here you should use FuncSplitter() to select only those files as validation files that have the string “valid” in their name.

  • batch_tfms: To begin with, just set this to None. The example from the documentation will not work since it contains image flips, which exchange left and right. This is problematic, since we do want to distinguish left and right. If you want, you can study the documentation to find out how to do image augmentations without horizontal flips. You can also read this part of the documentation, where you can learn how to integrate the albumentations library with fastai.

  • When you create the unet_learner, you should ask it to compute some metrics for you: learn = unet_learner(dls, resnet34, metrics=[DiceMulti()]). The dice metric is pretty useful for this example, and your model should achieve a dice metric of at least 0.9.

Instead of creating a unet_learner, you can import MobileV3Small from the fastseg library. This model is much faster. Once you have defined your model, you just create a regular Learner: learn = Learner(dls, model, metrics=[DiceMulti()])
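The hints above could be combined into something like the following sketch. Treat it as a starting point under stated assumptions and not as the reference solution: the folder names (“train”, “val”, “train_label”, “val_label”) and the “_label.png” naming scheme are hypothetical, so check the downloaded data and adapt IMAGE_FOLDERS and label_func accordingly. The fastai imports sit inside build_learner so that the two helper functions can be tried out without fastai installed.

```python
from functools import partial
from pathlib import Path

import numpy as np

# Hypothetical layout: images in "train" and "val", masks in "train_label" and
# "val_label", each mask named "<image stem>_label.png". Adapt to the real data.
IMAGE_FOLDERS = ["train", "val"]

def label_func(image_path):
    """Map an image path to the path of its label image (hypothetical naming)."""
    image_path = Path(image_path)
    label_dir = image_path.parent.parent / (image_path.parent.name + "_label")
    return label_dir / (image_path.stem + "_label.png")

def is_validation_file(image_path):
    """True for files from the validation zone (name contains 'valid')."""
    return "valid" in Path(image_path).name

def build_learner(data_dir, batch_size=8):
    """Create a U-Net learner for the three classes back/left/right."""
    # Imported here so the helpers above work without fastai installed.
    from fastai.vision.all import (DataBlock, DiceMulti, FuncSplitter, ImageBlock,
                                   MaskBlock, get_image_files, resnet34, unet_learner)

    codes = np.array(["back", "left", "right"], dtype=str)
    block = DataBlock(
        blocks=(ImageBlock, MaskBlock(codes)),
        # Restrict the search to the image folders so label images are not loaded as inputs.
        get_items=partial(get_image_files, folders=IMAGE_FOLDERS),
        get_y=label_func,
        splitter=FuncSplitter(is_validation_file),
        batch_tfms=None,  # no augmentation, so left and right are never swapped
    )
    dls = block.dataloaders(Path(data_dir), bs=batch_size)
    return unet_learner(dls, resnet34, metrics=[DiceMulti()])
```

With the data in place, learn = build_learner("path/to/data") followed by a call such as learn.fine_tune(5) would start training; the path, the batch size, and the number of epochs are placeholders that you will need to tune.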

Store your model

You will need your trained model for an upcoming exercise. Hence, please save your trained model to disk. In PyTorch, you do this via torch.save. For fastai, you can do torch.save(learn.model, './fastai_model.pth')
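A minimal save-and-restore round trip might look like the sketch below. A tiny stand-in network is used in place of the trained model (with fastai this would be learn.model). The sketch assumes PyTorch 1.13 or newer, where torch.load accepts the weights_only argument; recent versions default to weights_only=True, which refuses to unpickle a whole model object, so it is set to False explicitly here.

```python
import torch
from torch import nn

# A tiny stand-in for the trained network (with fastai: learn.model).
model = nn.Sequential(nn.Conv2d(3, 3, kernel_size=1))
model.eval()

model_path = "./fastai_model.pth"

# Save the whole model object, as described in the text.
torch.save(model, model_path)

# Load it again. Only do this with files you trust, since this unpickles Python objects.
restored = torch.load(model_path, weights_only=False)
restored.eval()

# Both models should produce the same output for the same input.
with torch.no_grad():
    x = torch.rand(1, 3, 8, 16)
    max_difference = (model(x) - restored(x)).abs().max().item()
print(max_difference)
```

Note that a model saved this way can only be loaded in an environment where the classes it is built from can be imported, so the upcoming exercise needs the same libraries installed.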

Optional: Working on Kaggle

The training data I prepared for you can also be found on Kaggle. If you like, you can create your model online with a Kaggle notebook. They also offer free GPU access. Consider publishing your notebook on Kaggle once you are happy with your solution. I would love to see it 😃.