Workplan

The project will have the following specific objectives:

O1. Interpreting sensory information using CNNs. This objective aims to find the CNN topology best suited for scene segmentation. First, several publicly available CNNs will be trained, tested and evaluated on the specific scenario of traffic image segmentation. Their limitations and drawbacks will be analyzed, and their hyper-parameters and/or topology will be fine-tuned to achieve the best performance in this scenario.

To construct the training set, we will merge images from several publicly available datasets (e.g. KITTI, the Daimler datasets) with annotated images from the SCABOR [1] framework.
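Evaluating the candidate CNNs on traffic image segmentation requires a common metric; a standard choice is the mean intersection-over-union (IoU) between predicted and annotated class maps. A minimal sketch of such an evaluation metric (the function name and NumPy implementation are ours, not part of the framework):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union between two segmentation maps.

    pred, target: integer class maps of the same shape.
    Classes absent from both maps are skipped so they do not
    inflate or deflate the average.
    """
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:  # class absent from both maps
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```

The same metric can then be used to compare the publicly available networks before and after fine-tuning on the merged training set.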

O2. Combining CNNs with model-based probabilistic methods. The proposed framework will efficiently combine the paradigms of deep learning and model-based probabilistic methods in order to exploit their advantages and overcome their individual weaknesses. As detailed in section C1, CNNs have outperformed other methods in various perception tasks, but they have no natural way of integrating prior knowledge, and for higher-level inference probabilistic methods are more flexible and better suited.
The proposed framework will use CNNs to understand the scene, and their “perception” will be used as input by probabilistic inference methods (Kalman filters, particle filters).
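The coupling in O2 can be illustrated with a minimal constant-velocity Kalman filter in which the CNN's per-frame position estimate serves as the measurement. All matrices below (transition, observation, noise) are illustrative assumptions, not the project's actual models:

```python
import numpy as np

# Constant-velocity Kalman filter for one coordinate of a tracked object.
# The CNN's per-frame position estimate is the measurement z.
F = np.array([[1.0, 1.0],   # state transition: position += velocity
              [0.0, 1.0]])
H = np.array([[1.0, 0.0]])  # we observe position only
Q = np.eye(2) * 1e-3        # process noise (assumed)
R = np.array([[0.5]])       # measurement noise, i.e. CNN localisation error

def kalman_step(x, P, z):
    """One predict/update cycle; x is the state, P its covariance."""
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the CNN measurement z
    y = z - H @ x
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P
```

In the full framework the measurement would be a detection box or segment centroid produced by the CNN, and a particle filter could replace the Kalman filter when the motion model is non-linear.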

O3. Using the probabilistic estimate to improve the training of the neural network. This objective aims at establishing an efficient information exchange between the CNN and the probabilistic inference module.
More specifically, the convolutional neural network should “learn” from the examples it previously misclassified by updating the weights and biases of the network. We will investigate two approaches: (1) retrain only the last fully connected layer, treating the earlier layers of the network as a fixed feature extractor, and (2) retrain the last layer and also update the weights of all or several previous layers by back-propagation.
We will also investigate the possibility of improving the probabilistic methods using high-confidence CNN outputs.
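Approach (1) above can be sketched in a few lines: the CNN body is frozen and only a softmax classifier (the last fully connected layer) is retrained by gradient descent on the previously misclassified examples. The shapes, learning rate and training loop below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def retrain_last_layer(feats, labels, num_classes, lr=0.1, epochs=200):
    """Retrain only the final softmax layer on fixed CNN features.

    feats: (N, D) features from the frozen CNN body;
    labels: (N,) corrected class indices for misclassified examples.
    """
    n, d = feats.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        probs = softmax(feats @ W + b)
        grad = probs - onehot              # cross-entropy gradient
        W -= lr * feats.T @ grad / n       # update only the last layer
        b -= lr * grad.mean(axis=0)
    return W, b
```

Approach (2) differs only in that the gradient is back-propagated into the convolutional layers as well, at the cost of more computation and a higher risk of overfitting on the small set of corrected examples.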

O4. Sensor fusion with convolutional neural networks (metadata-enhanced CNNs) and probabilistic estimators: advanced driver-assistance systems (ADAS) use data from multiple sensors – sensor fusion – to reduce uncertainty and optimize system performance. In addition, several types of metadata can be used to improve the results: the camera parameters, the driving scenario (city/rural/highway, daytime/nighttime), etc. Fusion strategies can be divided into early fusion and late fusion. Early fusion (fusion in the feature space) combines information extracted from multiple data streams into a unified representation. By contrast, late fusion (semantic-level fusion) uses multiple classifiers trained on features in different representation domains and combines their outputs using simple decision rules or other classifiers.
The proposed framework will integrate both types of fusion: early fusion by adding metadata and sensor information in the segmentation stage performed by the CNN and late fusion, by using sensor information in the tracking stage.
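The two fusion strategies can be contrasted with a minimal sketch: early fusion concatenates the image features with an encoded metadata vector before classification, while late fusion combines per-classifier class scores with a simple decision rule (averaging, here). The function names and the metadata encoding are illustrative assumptions:

```python
import numpy as np

def early_fusion(image_feats, metadata):
    """Fusion in the feature space: build one unified representation
    (CNN features + encoded metadata, e.g. scenario flags, camera
    parameters) to feed a single classifier."""
    return np.concatenate([image_feats, metadata])

def late_fusion(scores_per_classifier):
    """Semantic-level fusion: combine the class scores of independently
    trained classifiers with a simple decision rule (here, the mean)."""
    return np.mean(scores_per_classifier, axis=0)
```

More elaborate late-fusion rules (weighted voting, a meta-classifier trained on the stacked scores) fit the same interface.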
The efficiency of the proposed framework will be assessed by integrating it into a demonstrator application for driving assistance; the application will tackle several tasks: lane detection, traffic sign recognition, and obstacle detection and classification (vehicles, pedestrians, cyclists).

O5: Dissemination: articles describing the most important technological and theoretical achievements of the project will be written and submitted to journals with high impact factor and visibility (IEEE Transactions on Image Processing, Sensors, etc.) and to high-quality conferences (IEEE Conference on Computer Vision and Pattern Recognition, European Conference on Computer Vision, etc.).

[1] – Nedevschi, Sergiu, et al. “A sensor for urban driving assistance systems based on dense stereovision.” 2007 IEEE Intelligent Vehicles Symposium. IEEE, 2007.