Lane Boundary Segmentation#
For our lane-detection pipeline, we want to train a neural network that takes an image and estimates, for each pixel, the probability that it belongs to the left lane boundary, the probability that it belongs to the right lane boundary, and the probability that it belongs to neither. This problem is called semantic segmentation.
Prerequisites#
For this section, I assume the following:

1. You know what a neural network is and have trained one yourself before.
2. You know the concept of semantic segmentation.
If you do not fulfill prerequisite 1, I recommend checking out one of the following free resources:
- CS231n: Convolutional Neural Networks for Visual Recognition#
For this excellent Stanford course, you can find all the learning material online. The course notes are not finished, but the ones that do exist are really good! Note that you can see the slides for all lectures when you click on "detailed syllabus". You probably want to use the version from 2017 because that one includes lecture videos. However, for the exercises, you should use the 2020 version (very similar to 2017), since it lets you do your programming in Google Colab. Google Colab lets you use GPUs (expensive hardware necessary for deep learning) for free on Google servers. Even if you do not want to use Colab, the 2020 course has better instructions on working locally (including anaconda). For the exercises in which you can choose between tensorflow and pytorch, I recommend using pytorch. If you are really eager to return to this course as quickly as possible, you can stop CS231n once you have learned about semantic segmentation.
- Practical Deep Learning for Coders using fastai#
If your background is more in coding and less in math/science, then I recommend this course. You can find the video lectures here, and a book written in jupyter notebooks here (there is also a printed version if you prefer). I would recommend doing the exercises using Google Colab. The fastai course is taught using the fastai library, which helps you to train pytorch models with very few lines of code. Even if you choose not to take the fastai course, I recommend checking out the fastai library, since it makes training models really easy. Maybe start by just reading the computer vision tutorial first.
Regarding prerequisite 2, I recommend this very nice blog post about semantic segmentation by Jeremy Jordan (which is heavily based on CS231n).
Finally, you need access to a GPU in order to do the exercise. But owning a GPU is not a prerequisite: you can use Google Colab, which allows you to run your python code on Google servers. To get access to a GPU on Colab, click on "Runtime", then "Change runtime type", and finally select "GPU" as "Hardware accelerator". For more details on how to work with Colab, see the appendix.
Exercise: Train a neural net for lane boundary segmentation#
The lane segmentation model should take an image of shape (512,1024,3) as an input. Here, 512 is the image height, 1024 is the image width and 3 is for the three color channels red, green, and blue.
We train the model with input images and corresponding labels of shape (512,1024), where `label[v,u]` can have the value 0, 1, or 2, meaning pixel \((u,v)\) is "no boundary", "left boundary", or "right boundary".
The output of the model shall be a tensor `output` of shape (512,1024,3):

- `output[v,u,0]` gives the probability that the pixel \((u,v)\) is not part of any lane boundary.
- `output[v,u,1]` gives the probability that the pixel \((u,v)\) is part of the left lane boundary.
- `output[v,u,2]` gives the probability that the pixel \((u,v)\) is part of the right lane boundary.
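To make this convention concrete, here is a minimal sketch of the shapes involved, using pytorch and a dummy "network" output. The variable names are purely illustrative and not part of the exercise code; also note that pytorch itself stores images channels-first, while the (512,1024,3) convention above is channels-last.

```python
import torch

# A randomly initialized stand-in for the segmentation network. Pytorch (and
# therefore fastai) works with channels-first tensors, so a batch of images has
# shape (batch, 3, 512, 1024) and the raw network output (logits) has the same
# shape, with one channel per class.
images = torch.rand(1, 3, 512, 1024)   # one RGB input image, values in [0, 1]
logits = torch.randn(1, 3, 512, 1024)  # pretend network output

# Softmax over the channel dimension turns the logits into per-pixel
# probabilities for "no boundary", "left boundary", "right boundary".
probs = torch.softmax(logits, dim=1)
assert torch.allclose(probs.sum(dim=1), torch.ones(1, 512, 1024))

# The corresponding label image assigns class 0, 1, or 2 to every pixel.
label = torch.randint(low=0, high=3, size=(512, 1024))
```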
Gathering training data#
We can collect training data using the Carla simulator. I wrote a script `collect_data.py` that

- creates a vehicle on the Carla map
- attaches an rgb camera sensor to the vehicle
- moves the vehicle to different positions and
  - stores an image from the camera sensor
  - stores world coordinates of the lane boundaries obtained from Carla’s high definition map
  - stores a transformation matrix \(T_{cw}\) that maps world coordinates to coordinates in the camera reference frame
  - stores a label image that is created from the lane boundary coordinates and the transformation matrix, as shown in the exercise of the previous section
Note that from the four data items (image, lane boundaries, trafo matrix, label image), only the image and the label image are necessary for training our deep learning model.
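As a quick reminder of how such a transformation matrix is applied (the 4x4 homogeneous-coordinate convention below is an assumption based on the previous section, so check it against your own solution there):

```python
import numpy as np

# T_cw maps world coordinates to the camera reference frame. In the stored
# data it comes from Carla; here we just use a placeholder identity matrix.
T_cw = np.eye(4)

p_world = np.array([10.0, 2.0, 0.5, 1.0])  # lane boundary point (x, y, z, 1) in the world frame
p_camera = T_cw @ p_world                  # the same point in camera coordinates
```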
All data is collected on the “Town04” Carla map since this is the only map with usable highways (“Town06” has highways which are either perfectly straight or have a 90-degree turn). For simplicity’s sake, we are building a system just for the highway. Hence, only parts of the map with low road curvature are used, which excludes urban roads.
One part of the map was arbitrarily chosen as the “validation zone”. All data that is created in this zone has the string “validation_set” added to its filename.
Now you will want to get some training data onto your machine! I recommend just downloading some training data that I created for you using the `collect_data.py` script. But if you really want to, you can also collect data yourself.

If you choose to download the data, just go ahead and open the starter code in `code/exercises/lane_detection/lane_segmentation.ipynb`. It contains a python utility function that downloads the data for you.

If you prefer to collect data yourself, you first need to run the Carla simulator (regarding the installation of Carla, see the appendix). Then run
```bash
cd Algorithms-for-Automated-Driving
conda activate aad
python -m code.solutions.lane_detection.collect_data
```
Now you need to wait a few seconds while the script tells the Carla simulator to load the "Town04" map. A window will open that shows different scenes as well as augmented-reality lane boundaries. Each scene that you see will be saved to your hard drive. Wait a while until you have collected enough data, then click the close button. Finally, open the starter code in `code/exercises/lane_detection/lane_segmentation.ipynb` and follow the instructions.
Note

I do not advise you to read the actual code inside `collect_data`, since I mainly wrote it for functionality and not for education. If you are really curious, you can of course read it, but first you should

- have finished the exercise of the previous section
- have learned about Carla by studying the documentation and running some official python example clients
Building a model#
To create and train a model, you can choose any deep learning framework you like.
If you want some guidance, I recommend using fastai. You can use the example for semantic segmentation from the fastai documentation, slightly modify it for the dataset at hand, and it should just work! If you want, you can get some hints:
Ok, no hints for you. If you get stuck, try looking at the “Limited hints”, or the “Detailed hints”.
I would recommend reading the whole tutorial section on semantic segmentation in the fastai docs. Then copy the code from the tutorial that uses the datablock API. You will need to modify this code a little bit (a sketch of the resulting pipeline follows the list below):
- You need to modify the `codes`. You can just define `codes = np.array(['back', 'left', 'right'], dtype=str)`.
- `get_items = get_image_files`: This will not work for our dataset, since the `get_image_files` function loads images from all subfolders (see documentation). We do not want to load images from the label folders! You can create a new function based on `get_image_files` by specifying the "folders" argument (see documentation).
- `label_func` needs to be defined so that it works for the given dataset.
- `splitter`: Here you should use `FuncSplitter()` to select only those files as validation files which have the string `valid` inside their name.
- `batch_tfms`: For the beginning, just set this to `None`. The example from the documentation will not work, since it contains image flips that exchange left and right. This is problematic, since we do want to distinguish left and right. If you want, you can study the documentation to find out how to do image augmentations without left-right flips. You can also read this part of the documentation, where you can learn how to integrate the albumentations library with fastai.
- When you create the `unet_learner`, you should ask it to compute some metrics for you: `learn = unet_learner(dls, resnet34, metrics=[DiceMulti()])`. The dice metric is pretty useful for this example, and your model should achieve a dice metric of at least `0.9`.
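To tie these hints together, here is a minimal sketch of what the resulting datablock code could look like. The folder name `'images'`, the label path built in `label_func`, and the validation-file check are assumptions about how the dataset is stored on disk, so adapt them to your actual folder layout.

```python
import numpy as np
from fastai.vision.all import (
    DataBlock, ImageBlock, MaskBlock, FuncSplitter,
    get_image_files, unet_learner, resnet34, DiceMulti,
)

codes = np.array(['back', 'left', 'right'], dtype=str)

def get_images(path):
    # only search the image folders, not the label folders (assumed folder name)
    return get_image_files(path, folders=['images'])

def label_func(fn):
    # assumption: each label image lives in a parallel 'labels' folder
    return fn.parent.parent / 'labels' / fn.name

def is_validation(fn):
    # files collected in the validation zone have 'validation_set' in their name
    return 'validation_set' in fn.name

dblock = DataBlock(
    blocks=(ImageBlock, MaskBlock(codes)),
    get_items=get_images,
    get_y=label_func,
    splitter=FuncSplitter(is_validation),
    batch_tfms=None,  # no flips: they would swap left and right boundaries
)

dls = dblock.dataloaders('path/to/dataset', bs=4)  # adjust path and batch size
learn = unet_learner(dls, resnet34, metrics=[DiceMulti()])
learn.fine_tune(5)
```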
Instead of creating a `unet_learner`, you can import `MobileV3Small` from the fastseg library. This model is much faster. Once you have defined your model, you just create a regular `Learner`: `learn = Learner(dls, model, metrics=[DiceMulti()])`.
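A minimal sketch of this alternative, assuming the fastseg package is installed; the exact constructor arguments (such as `num_classes`) may differ between fastseg versions, so check the library's documentation:

```python
from fastai.vision.all import Learner, DiceMulti
from fastseg import MobileV3Small  # lightweight MobileNetV3-based segmentation model

model = MobileV3Small(num_classes=3)                # back, left, right
learn = Learner(dls, model, metrics=[DiceMulti()])  # dls as built in the sketch above
learn.fit_one_cycle(10, lr_max=1e-3)
```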
Store your model

You will need your trained model for an upcoming exercise. Hence, please save your trained model to disk. In pytorch you do this via `torch.save`. For fastai you can do `torch.save(learn.model, './fastai_model.pth')`.
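For example (the file name is just a suggestion), you could save the model after training and load it again later like this:

```python
import torch

# after training: save the entire model object (architecture + weights)
torch.save(learn.model, './fastai_model.pth')

# in the upcoming exercise: load it again and switch to evaluation mode
model = torch.load('./fastai_model.pth')
model.eval()
```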
Optional: Working on kaggle
The training data I prepared for you can also be found on kaggle. If you like, you can create your model online with a kaggle notebook. Kaggle also offers free GPU access. Consider publishing your notebook on kaggle once you are happy with your solution. I would love to see it 😃.