Pixel-Level Semantic Labeling Task
The first Cityscapes task involves predicting a per-pixel semantic labeling of the image without considering higher-level object instance or boundary information.
To assess performance, we rely on the standard Jaccard Index, commonly known as the PASCAL VOC intersection-over-union metric IoU = TP ⁄ (TP+FP+FN) , where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels, respectively, determined over the whole test set. Owing to the two semantic granularities, i.e. classes and categories, we report two separate mean performance scores: IoUcategory and IoUclass. In either case, pixels labeled as void do not contribute to the score.
It is well-known that the global IoU measure is biased toward object instances that cover a large image area. In street scenes with their strong scale variation this can be problematic. Specifically for traffic participants, which are the key classes in our scenario, we aim to evaluate how well the individual instances in the scene are represented in the labeling. To address this, we additionally evaluate the semantic labeling using an instance-level intersection-over-union metric iIoU = iTP ⁄ (iTP+FP+iFN). Again iTP, FP, and iFN denote the numbers of true positive, false positive, and false negative pixels, respectively. However, in contrast to the standard IoU measure, iTP and iFN are computed by weighting the contribution of each pixel by the ratio of the class’ average instance size to the size of the respective ground truth instance. It is important to note here that unlike the instance-level task below, we assume that the methods only yield a standard per-pixel semantic class labeling as output. Therefore, the false positive pixels are not associated with any instance and thus do not require normalization. The final scores, iIoUcategory and iIoUclass, are obtained as the means for the two semantic granularities.
Instance-Level Semantic Labeling Task
In the second Cityscapes task we focus on simultaneously detecting objects and segmenting them. This is an extension to both traditional object detection, since per-instance segments must be provided, and pixel-level semantic labeling, since each instance is treated as a separate label. Therefore, algorithms are required to deliver a set of detections of traffic participants in the scene, each associated with a confidence score and a per-instance segmentation mask.
To assess instance-level performance, we compute the average precision on the region level (AP ) for each class and average it across a range of overlap thresholds to avoid a bias towards a specific value. Specifically, we follow  and use 10 different overlaps ranging from 0.5 to 0.95 in steps of 0.05. The overlap is computed at the region level, making it equivalent to the IoU of a single instance. We penalize multiple predictions of the same ground truth instance as false positives. To obtain a single, easy to compare compound score, we report the mean average precision AP, obtained by also averaging over the class label set. As minor scores, we add AP50% for an overlap value of 50 %, as well as AP100m and AP50m where the evaluation is restricted to objects within 100 m and 50 m distance, respectively.
This challenge relies on the cityscapes dataset. To load train data you may here.
Here is the example of a code submission with a custom model written in tensorflow. It contains three files:
- .hdf5 pre-trained NN
In metadata you do not need to change anything. It needs for ingestion program at server to process your submission.
The model.py script takes pre-trained model, predicts on test data which is given as input in metadata and writes predictions into output.
There are 500 images in test dataset in total. The model locally you may train on gpu, but to submit you need to change it to cpu.
You should submit files in your repository on github. The info how to integrate github with codalab is here.
Also, you likely need to research cityscapes-dataset repository for pre-processing scripts that will help you to work with data.