The aim of this research is to propose a model and a set of methods for multispectral and multimodal perception of complex environments for autonomous driving. It will use a redundant sensor system consisting of enhanced dense stereovision sensors composed of camera/lens pairs with multispectral sensitivity in both the visible and near-infrared spectrum, monocular far-infrared (thermal) cameras, and multilayer LiDARs. The fusion of the multisensory information will provide the redundancy and complementary characteristics needed to improve the perception system’s reliability and accuracy compared with single-sensor approaches. The proposed system will handle varied lighting conditions (day, dusk, dawn, night) and complex scenarios prone to false detections in single-sensor approaches. First, we will propose a set of semi-automatic extrinsic calibration methods for each sensor in order to align the multisensory data. Then, pairwise registration between the 2D and 3D sensory data, a form of low-level sensor fusion, will provide a 3D spatio-temporal and appearance-based representation characterized by increased density, redundancy, information content, and aggregation power. Based on this representation, enhanced obstacle detection and classification algorithms, starting from state-of-the-art machine learning and deep convolutional neural networks, will be developed to obtain an accurate scene description with increased confidence for both static and dynamic entities of the environment. The original methods developed in this project will be implemented as real-time algorithms deployed on a mobile platform for testing and demonstration in real-life scenarios. The demonstrators will also serve for data acquisition, testing, evaluation, and improvement.
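To illustrate the kind of low-level 2D/3D registration referred to above, the following Python/NumPy sketch projects LiDAR points into a camera image using extrinsic parameters (rotation R, translation t) and an intrinsic matrix K. All names, values, and the pinhole-camera assumption are illustrative placeholders, not the project's actual calibration or fusion pipeline.

```python
import numpy as np

def project_lidar_to_image(points_lidar, R, t, K, image_size):
    """Project 3D LiDAR points into the image plane of a calibrated camera.

    points_lidar : (N, 3) points in the LiDAR frame (meters).
    R, t         : extrinsic rotation (3x3) and translation (3,) mapping
                   LiDAR coordinates into the camera frame.
    K            : camera intrinsic matrix (3x3), pinhole model.
    image_size   : (width, height) in pixels, used to discard points
                   that fall outside the image.
    Returns pixel coordinates (M, 2) and the corresponding depths (M,).
    """
    # Transform points from the LiDAR frame into the camera frame.
    points_cam = points_lidar @ R.T + t

    # Keep only points in front of the camera (positive depth).
    points_cam = points_cam[points_cam[:, 2] > 0.1]

    # Perspective projection: u = K * X / Z.
    proj = points_cam @ K.T
    pixels = proj[:, :2] / proj[:, 2:3]

    # Discard projections outside the image boundaries.
    w, h = image_size
    inside = (
        (pixels[:, 0] >= 0) & (pixels[:, 0] < w) &
        (pixels[:, 1] >= 0) & (pixels[:, 1] < h)
    )
    return pixels[inside], points_cam[inside, 2]


if __name__ == "__main__":
    # Hypothetical calibration values, for illustration only.
    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0, 0.0, 1.0]])
    R = np.eye(3)                    # assume aligned sensor axes
    t = np.array([0.0, -0.2, 0.0])   # e.g., LiDAR mounted 20 cm above the camera
    points = np.random.uniform([-10, -2, 2], [10, 2, 50], size=(1000, 3))
    px, depth = project_lidar_to_image(points, R, t, K, (1280, 720))
    print(f"{len(px)} of {len(points)} points project inside the image")
```

In the envisaged system, analogous projections between the visible/NIR stereo cameras, the thermal cameras, and the LiDAR, once their extrinsic parameters are calibrated, would associate depth, appearance, and thermal measurements with the same scene points, yielding the dense, redundant representation described above.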