Project Objectives
This project aims to significantly improve stereovision-based perception of complex dynamic environments. Its general objective is an original system for multi-scale, multi-modal perception and representation of structured and unstructured environments, based on the fusion of dense stereo, dense optical flow and ego motion information. The effort will follow two main directions: one towards significant contributions in the extraction of low level features (O1), and the other towards using these high quality features for a higher level perception of the environment (O2). An additional objective (O3) addresses the visibility and impact of our results through high quality dissemination.
O1. New methods for extraction of high density and high accuracy low level dynamic 3D features from stereo sequences – this objective aims to make significant contributions to stereovision and optical flow computation from image sequences, and to combine their results in order to extract accurate 3D features and their 3D motion vectors.
- SO.1.1. Real-time high accuracy dense stereovision. The objective is to increase the reconstruction density and improve the reconstruction accuracy while enabling real-time execution. The strong interest in stereo reconstruction is confirmed by the Middlebury online evaluation of stereo algorithms set up by Scharstein and Szeliski [1], which currently ranks the disparity estimation accuracy of 107 different stereo reconstruction algorithms. More than half of the submissions result from research performed in the last two years. Research will focus on improving the mathematical model of semi-global matching (SGM) in order to reach these goals. In addition, novel mathematical models for disparity refinement will be created. To improve sub-pixel accuracy [4, 5], we will devise sub-pixel estimation methods tailored to each stereo reconstruction algorithm by mathematically modeling the distribution of matching costs. Further improvements to the accuracy of the computed disparity maps will come from formulating a novel scene geometry modeling paradigm based on segmentation and surface fitting [3, 6-8].
- SO.1.2. Real-time extraction of highly accurate optical flow. This project aims to improve the accuracy, density, reliability and speed of optical flow computation, in order to better discriminate, measure and track independently moving objects in the environment. Optical flow estimation is a topical problem: in the Middlebury benchmark [1], 90% of the optical flow methods were developed in the last three years. Although progress is visible, problems remain that leave the current state of the art methods insufficiently accurate, reliable and fast. The proposal addresses exactly these problems. Optical flow estimation is essentially a mathematical problem, and the quality of the estimation depends strongly on the underlying mathematical model. The L1 norm is a better choice for discrete signals [12]. Traditional combined local-global (CLG) methods use the L2 norm and are therefore less accurate [11], [16]. The proposed model is a CLG method using the robust L1 norm. The non-smoothness of the L1 norm makes the resulting minimization problem mathematically very difficult. The objectives therefore also include a complete mathematical solution for this class of problems, together with an associated parallel numerical scheme aimed at real-time performance.
- SO.1.3. Stereovision-based high accuracy ego motion estimation and extraction of accurate dense 3D motion vectors. High accuracy ego motion estimation is essential whenever the dynamic relation between the observer and the environment must be established. As a first research direction, the selection of static features is a key factor in estimating the motion accurately. Thus, one research goal is to eliminate from the very beginning the point features that are likely to be situated on moving objects. A temporal filtering approach will be investigated in order to track the status of each feature over time. A second goal is to implement a module that handles illumination changes and recovers from the errors they cause. The third goal is to implement the ego motion algorithms on the GPU in order to reach a processing rate of more than 100 frames per second.
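The static feature selection described in SO.1.3 can be illustrated with a minimal sketch. It is not the method to be developed in the project: it estimates the rigid motion between two sets of matched 3D points with the standard SVD-based (Kabsch) least-squares solution, and uses a plain RANSAC loop to reject points that are likely to lie on moving objects. The function names, the residual threshold and the iteration count are illustrative assumptions, and the temporal filtering, illumination handling and GPU implementation are not covered.

```python
import numpy as np


def estimate_rigid_motion(prev_pts, curr_pts):
    """Least-squares R, t such that curr ~= R @ prev + t (Kabsch / SVD solution)."""
    prev_c = prev_pts.mean(axis=0)
    curr_c = curr_pts.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (prev_pts - prev_c).T @ (curr_pts - curr_c)
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1) instead of a reflection
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = curr_c - R @ prev_c
    return R, t


def ransac_ego_motion(prev_pts, curr_pts, threshold=0.05, iterations=200, seed=0):
    """Rigid motion of the static scene plus a mask of the features judged static.

    threshold (same unit as the points) and iterations are assumed values.
    """
    rng = np.random.default_rng(seed)
    n = len(prev_pts)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iterations):
        # Three non-collinear correspondences determine a rigid motion
        sample = rng.choice(n, size=3, replace=False)
        R, t = estimate_rigid_motion(prev_pts[sample], curr_pts[sample])
        residuals = np.linalg.norm(curr_pts - (prev_pts @ R.T + t), axis=1)
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("not enough consistent features to estimate the motion")
    # Final refit on all features consistent with the dominant (static) motion
    R, t = estimate_rigid_motion(prev_pts[best_inliers], curr_pts[best_inliers])
    return R, t, best_inliers
```

The returned transform maps the previous-frame coordinates of static points to their current-frame coordinates, so the motion of the observer is the inverse of this transform. The inlier mask acts as a per-frame static/moving label for each feature, which is the kind of status a temporal filter could track over time.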
O2. New methods for high level perception based on high density and high accuracy low level dynamic 3D features – the increased accuracy and density of the dynamic 3D features will be exploited in new solutions for higher level perception of the environment, featuring new models for dynamic world representation, and new probabilistic perception and tracking techniques.
- SO.2.1. Multi-sensor multi-cue temporal information fusion. The low level processing methods, which can be seen as individual sensors, will provide the following cues: 3D position for image points, 3D motion vectors, and ego position and orientation. These cues will be accumulated and fused over time in order to increase the quality and density of the measurement data for the perception and tracking methods.
- SO.2.2. Multi-model and multi-scale environment perception and tracking. The project aims to design a novel and accurate model of the dynamic environment, able to represent the complexity of the 3D structures and their dynamic evolution, and novel tracking techniques for this model based on the processed sensor data. Our solution will combine the probabilistic mechanisms of dynamic occupancy grids [29, 30] with a representational power similar to that of the recently introduced multi-level surface maps [33] and multi-scale Gaussian maps [34]. Our final aim is a dynamic multi-scale multi-layer grid tracking solution for the complex 3D environment.
- SO.2.3. Proof of concept on demonstrators. The original methods developed in this project will be implemented as real-time algorithms and deployed on mobile platforms for testing and demonstration in real life scenarios. The demonstrators will also serve for data acquisition, supporting offline testing, evaluation and improvement.
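The temporal accumulation of measurements mentioned in SO.2.1 and the probabilistic grid mechanisms mentioned in SO.2.2 can be illustrated with a minimal sketch of a classical static occupancy grid [28], written in the standard log-odds form. It is only a simplified starting point under stated assumptions: it has no cell velocities, no multiple scales or layers, and no ego motion compensation, all of which the proposed solution would require. The class name, the sensor model probabilities `p_hit` and `p_miss`, and the clamping value are illustrative assumptions.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability."""
    return np.log(p / (1.0 - p))


class OccupancyGrid:
    """Static 2D occupancy grid fused over time in log-odds form (illustrative)."""

    def __init__(self, shape, p_hit=0.7, p_miss=0.4, clamp=10.0):
        # Log-odds 0 corresponds to probability 0.5, i.e. an unknown cell
        self.log_odds = np.zeros(shape)
        self.l_hit = logit(p_hit)    # evidence added when a 3D point falls in the cell
        self.l_miss = logit(p_miss)  # evidence added when the cell is seen empty
        self.clamp = clamp           # bound that keeps cells able to change state

    def update(self, hit_mask, observed_mask):
        """Fuse one frame; cells outside observed_mask keep their previous belief."""
        delta = np.where(hit_mask, self.l_hit, self.l_miss)
        self.log_odds += np.where(observed_mask, delta, 0.0)
        np.clip(self.log_odds, -self.clamp, self.clamp, out=self.log_odds)

    def probability(self):
        """Occupancy probability of every cell."""
        return 1.0 / (1.0 + np.exp(-self.log_odds))
```

Each call to `update` adds the evidence of one frame, so repeated consistent measurements push a cell towards occupied or free, while unobserved cells stay at 0.5. Working in log-odds turns the Bayesian update into a simple addition, which suits a per-cell parallel implementation.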
O3. Result dissemination – The intermediate and final results will be described in scientific papers that will be submitted to relevant publications in the related fields. The dissemination will also include presentations at relevant conferences in the field, and some of the results will also be included in teaching materials for doctoral and master of science curricula.
References
[1] D. Scharstein, R. Szeliski, "A Taxonomy and Evaluation of Dense Two-Frame Stereo Correspondence Algorithms," IJCV, vol. 47, pp. 7-42, 2002.
[2] M. Z. Brown et al., "Advances in Computational Stereo," IEEE TPAMI, vol. 25, 2003.
[3] H. Hirschmuller, "Stereo Vision in Structured Environments by Consistent Semi-Global Matching," CVPR, vol. 2, pp. 2386-2393, 2006.
[4] S. K. Gehrig et al., "Improving Stereo Sub-Pixel Accuracy for Long Range Stereo," ICCV 2007.
[5] Haller et al., "Statistical method for sub-pixel interpolation function estimation," ITSC 2010.
[6] M. Bleyer et al., "A Stereo Approach that Handles the Matting Problem via Image Warping," CVPR, pp. 501-508, 2009.
[7] G. Gale et al., "A region-based randomized voting scheme for stereo matching," 6th Conf. on Advances in visual computing, pp. 182-191, 2010.
[8] M. Humenberger et al., "A Census-Based Stereo Vision Algorithm Using Modified Semi-Global Matching and Plane-Fitting to Improve Matching Quality," CVPR, 2010.
[9] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, 1981.
[10] B. D. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," IJCAI 1981.
[11] Bruhn et al., "Lucas/Kanade Meets Horn/Schunck: Combining Local and Global Optic Flow Methods," IJCV 2005.
[12] A. Chambolle, "An algorithm for total variation minimization and applications," JMIV 2004.
[13] C. Zach et al., "A duality based approach for realtime TV-L1 optical flow," DAGM 2007.
[14] M. Werlberger et al., "Anisotropic Huber-L1 optical flow," BMVC 2009.
[15] L. I. Rudin et al., "Nonlinear total variation based noise removal algorithms," Physica D 1992.
[16] R. Fezzani et al., "Clarifying the implementation of warping in the combined local global method for optic flow computation," EUSIPCO 2010.
[17] A. Talukder, L. Matthies, "Real-time detection of moving objects from moving vehicle using dense stereo and optical flow," IROS 2004.
[18] C. Golban et al., "Linear vs. non linear minimization in stereo visual odometry," IV 2011.
[19] A. Howard, "Real-time stereo visual odometry for autonomous ground vehicles," IROS 2008.
[20] C. Golban et al., "Vision based three-dimensional vehicle motion detection by minimizing nonlinear functions," ICCP 2010.
[21] S. A. Rodríguez et al., "An Experiment of a 3D Real-Time Robust Visual Odometry for Intelligent Vehicles," ITSC 2009.
[22] S. Nedevschi et al., "Improving accuracy for Ego vehicle motion estimation using epipolar geometry," ITSC 2009.
[23] C. F. Olson, "Stereo Ego-motion Improvements for Robust Rover Navigation," ICRA 2001.
[24] U. Franke, C. Rabe, H. Badino, S. Gehrig, "6D-Vision: Fusion of Stereo and Motion for Robust Environment Perception," Pattern Recognition, DAGM Symposium 2005, pp. 216-223, 2005.
[25] S. Lacroix et al., "Autonomous rover navigation on unknown terrains: Functions and integration," International Journal of Robotics Research, 21(10-11):917-942, 2002.
[26] P. Pfaff, W. Burgard, "An efficient extension of elevation maps for outdoor terrain mapping," in Proc. of FSR 2005, pp. 165-176.
[27] F. Oniga, S. Nedevschi, "Processing Dense Stereo Data Using Elevation Maps: Road Surface, Traffic Isle, and Obstacle Detection," IEEE Trans. Veh. Technol., vol. 59, no. 3, 2010.
[28] A. Elfes, "Using Occupancy Grids for Mobile Robot Perception and Navigation," Computer, vol. 22, no. 6, pp. 46-57, June 1989.
[29] C. Coue et al., "Bayesian Occupancy Filtering for Multitarget Tracking: An Automotive Application," International Journal of Robotics Research, 25(1):19, 2006.
[30] C. Chen et al., "Dynamic environment modeling with gridmap: a multiple-object tracking application," in Proc. of ICARCV 2006, pp. 1-6.
[31] H. Badino et al., "Free Space Computation Using Stochastic Occupancy Grids and Dynamic Programming," Workshop on Dynamical Vision, ICCV, 2007.
[32] S. Pietzsch et al., "Results of a Precrash Application based on Laser Scanner and Short Range Radars," IEEE Trans. Intell. Transp. Syst., vol. 10, no. 4, pp. 584-593, 2009.
[33] R. Triebel, "Multi-level surface maps for outdoor terrain mapping and loop closing," IROS 2006.
[34] M. Yguel et al., "Error-Driven Refinement of Multi-scale Gaussian Maps: Application to 3-D Multi-scale map building, compression and merging," in Proc. of ISRR 2009.
[35] R. Danescu, F. Oniga, S. Nedevschi, "Particle Grid Tracking System for Stereovision Based Environment Perception," in Proc. of IEEE IV 2010.