Project Description and Objectives
Visual understanding of human behavior is a field of active research in image processing, pattern recognition, artificial intelligence, and data mining. It has promising applications in areas such as visual surveillance, human performance analysis, computer-human interfaces, content-based image retrieval and storage, and virtual reality. The aim of this project is to develop a simple scheme for understanding human behavior in a set of predefined situations, such as the actions of pedestrians in traffic or the actions of patients in hospitals. These actions include running, walking, standing, turning around, sitting on a chair, lying down, and getting up. In our approach, human behavior is defined as a sequence of actions with a well-known meaning. Humans are regarded as actors that perform these actions, and the chaining of several actions results in a behavior. We also distinguish cooperative behavior, which is formed by joint actions in which two or more persons interact. Examples of singular actions, which characterize a single person, are walking, running, standing, sitting down, and lying down. Cooperative actions include having a conversation, crossing the street at a pedestrian crossing, and fighting.
The tasks of this project involve:
- Set up an environment for gathering the visual information that results from the 2D analysis of images.
- Define a set of scenarios for which we will apply behavior recognition algorithms.
- Study of the attitudes that people have in the defined scenarios.
- Develop a methodology for extracting 2D image models that contain persons seen in the defined situations (e.g. pedestrians walking, running, standing).
- Study object recognition methods in general and apply them to recognize the different attitudes characteristic of a person.
- Spatio-temporal analysis for determining which objects are static and which are dynamic in the scene.
- Develop learning models for classifying persons' attitudes.
- Define semantic relations and concepts.
- Design semantic models that use visual data for extracting semantic concepts and relations.
- Implement probabilistic models like Support Vector Machines or Maximal Margin Markov Networks for structural learning of semantic concepts.
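The spatio-temporal task in the list above (separating static from dynamic objects) can be illustrated with simple frame differencing. This is only a minimal sketch, not the method the project commits to. The function names, the intensity threshold, the bounding-box format (top, left, bottom, right), and the minimum fraction of changed pixels are all assumptions made for illustration.

```python
from typing import List, Tuple

Frame = List[List[int]]          # grayscale image: rows of 0-255 intensities
Box = Tuple[int, int, int, int]  # assumed format: (top, left, bottom, right), end-exclusive


def motion_mask(prev: Frame, curr: Frame, threshold: int = 25) -> List[List[bool]]:
    """Mark a pixel as changed when its intensity differs by more than threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def dynamic_fraction(mask: List[List[bool]], box: Box) -> float:
    """Fraction of changed pixels inside the bounding box."""
    top, left, bottom, right = box
    cells = [mask[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(cells) / len(cells)


def is_dynamic(prev: Frame, curr: Frame, box: Box,
               threshold: int = 25, min_fraction: float = 0.2) -> bool:
    """Label the object in box as dynamic when enough of its pixels changed."""
    mask = motion_mask(prev, curr, threshold)
    return dynamic_fraction(mask, box) >= min_fraction
```

Differencing two consecutive frames is sensitive to noise and lighting changes, so a practical system would more likely compare each frame against a background model accumulated over many frames.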
Project lifetime and stages
The project is divided into two main segments: Visual analysis and Semantic understanding. The two areas cover the expertise of the two partners.
The initial development plan contains the following stages:
- Study the methods for extracting and representing information from 2D images. During this stage, the Romanian partner will implement several algorithms for extracting visual features relevant to the objects to be recognized. We will consider features such as histograms of oriented gradients, Gabor filters, Haar wavelets, directional derivatives, and color information.
- Define a set of scenarios for which the recognition should be done (UTC + ISJ). During this stage the two partners will establish some scenarios in which the human behavior will be recognized. We will take video sequences that represent these scenarios. We will analyze the actions that persons perform in these scenarios and we will extract some semantic concepts.
- Extract pedestrian actions (UTC + ISJ). The Romanian partner will perform operations such as image segmentation and visual feature extraction, whose output will be used by learning algorithms. The experience of the Slovenian partner with classification methods, especially support vector machines, will be combined with the visual recognition algorithms used by the Romanian research team. Another element of this stage is the spatio-temporal analysis that determines how the attitudes and actions of people evolve in time. For this we can use Markov Models, with which the Slovenian partner has rich experience.
- Recognize pedestrian actions (UTC+ISJ). We will define a set of actions to be recognized. Each action has a label attached so that it can be given as input to the semantic program Cyc, implemented by the Slovenian partner.
- Prepare the micro-theory (ISJ). During this stage we will define, in OpenCyc, the semantic concepts and relations that can be used to semantically model the scenes covered by the image dataset used in the project.
- Manually annotate a subset of images showing various scenes. These images will be used for automatic training of probabilistic models and for evaluating both the rule-based and the probabilistic approaches (UTC+ISJ).
- Develop and evaluate probabilistic models such as Support Vector Machines or Maximal Margin Markov Networks (for structural learning). (ISJ)
- Test and verify the resulting product. It will be applied to traffic scenes to recognize pedestrians walking on the pavement or crossing the street and persons running, and to recognize actions performed at ATMs. (UTC+ISJ)
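As an illustration of one of the visual features named in the first stage, the sketch below computes a magnitude-weighted histogram of gradient orientations for a single image cell. This is the basic building block of a histogram-of-oriented-gradients descriptor. It is a simplified sketch under stated assumptions: unsigned orientations in the range 0 to 180 degrees, 9 bins, central differences, and no block normalization or bin interpolation. The function name is hypothetical.

```python
import math
from typing import List


def orientation_histogram(cell: List[List[float]], bins: int = 9) -> List[float]:
    """Normalized, magnitude-weighted histogram of unsigned gradient orientations."""
    hist = [0.0] * bins
    rows, cols = len(cell), len(cell[0])
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbours.
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold into [0, 180]
            hist[int(angle // bin_width) % bins] += magnitude
    total = sum(hist)
    # A flat cell has no gradients; return the all-zero histogram unchanged.
    return [h / total for h in hist] if total > 0 else hist
```

Concatenating such histograms over a grid of cells gives a feature vector that could be passed to a classifier such as a support vector machine.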
These stages will result in algorithms that detect and recognize the objects of interest (persons), track them throughout the video sequence, and generate labels associated with actions. Behavior recognition will be done by analysing persons' actions. People may perform singular or cooperative actions, and each carries a meaning.
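Since a behavior is defined here as a sequence of actions, the step from per-frame action labels to a behavior can be sketched as follows. This is a hypothetical illustration only. The behavior names and their action patterns in `BEHAVIORS` are invented for the example, and simple pattern matching stands in for the semantic reasoning that the project assigns to Cyc.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Tuple

# Hypothetical behavior definitions: each behavior is an ordered sequence of actions.
BEHAVIORS: Dict[str, Tuple[str, ...]] = {
    "taking a seat": ("walking", "standing", "sitting down"),
    "getting out of bed": ("lying down", "getting up", "walking"),
}


def collapse(frame_labels: Sequence[str]) -> List[str]:
    """Merge each run of identical per-frame labels into a single action."""
    return [label for label, _ in groupby(frame_labels)]


def contains(actions: Sequence[str], pattern: Sequence[str]) -> bool:
    """True when pattern occurs as a contiguous run inside actions."""
    n = len(pattern)
    return any(
        tuple(actions[i:i + n]) == tuple(pattern)
        for i in range(len(actions) - n + 1)
    )


def recognize(frame_labels: Sequence[str],
              behaviors: Dict[str, Tuple[str, ...]] = BEHAVIORS) -> List[str]:
    """Return the names of all behaviors whose action pattern appears in the sequence."""
    actions = collapse(frame_labels)
    return [name for name, pattern in behaviors.items() if contains(actions, pattern)]
```

For example, a track labeled "walking" for several frames, then "standing", then "sitting down" collapses to three actions and matches the assumed "taking a seat" pattern.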