One important issue in the performance of motion segmentation techniques is the selection of the low-level features employed to characterize motion. For example, in segmentation using active models, low-level features are employed to define the external energy or image potential. Temporal derivatives work well in scenes with a static background (Paragios and Deriche, 2000), but not when the background moves. Optical flow estimation presents different problems depending on the estimation technique (Barron et al., 1994). For example, differential methods are not very robust to noise, occlusions or a small number of frames, and matching techniques generally consider only rigid transformations.
If segmentation is accomplished using active models, another important issue is how to initialize the model in each frame. A typical solution is to define the initial state of the model by user interaction in the first frame and, in subsequent frames, to initialize the model with the segmentation of the previous one. In (Paragios and Deriche, 2000), instead of initializing the first frame manually, the first two frames are initialized by motion detection based on the temporal derivative, but this requires that the object move on a static background. One of the main drawbacks of initialization with the previous segmentation appears with total occlusions, when the object disappears from the image for a number of frames: there is no initial state when the object reappears. The case of a small number of frames is also problematic. The object can be very distant from its previous position, so that the model might not be able to converge from the initial state to the new position.
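As an illustration of motion detection based on the temporal derivative, the following minimal sketch thresholds the absolute difference between two consecutive frames. It is not the procedure of Paragios and Deriche; the function name `temporal_derivative_mask`, the threshold value and the synthetic frames are assumptions made for the example.

```python
import numpy as np

def temporal_derivative_mask(prev_frame, next_frame, threshold):
    """Flag pixels whose absolute frame-to-frame change exceeds `threshold`."""
    diff = np.abs(next_frame.astype(float) - prev_frame.astype(float))
    return diff > threshold

# Synthetic frames (assumed data): a bright 4x4 square moves 3 pixels to the
# right over a static, uniform background.
frame0 = np.zeros((16, 16))
frame1 = np.zeros((16, 16))
frame0[6:10, 2:6] = 1.0
frame1[6:10, 5:9] = 1.0

mask = temporal_derivative_mask(frame0, frame1, threshold=0.5)
changed_pixels = int(mask.sum())
```

The mask flags the pixels the square left and the pixels it newly covers, but not the column where both positions overlap. With a moving background, background pixels would also change between frames and be flagged, which is why this kind of detection needs a static background.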
When real-time tracking is not required, a good approach to deal with these problems is frequency analysis. Frequency patterns are robust to noise, occlusions and a small number of frames. Furthermore, band-pass filtered versions of the sequence can be used to separate components with different directions and velocities and, hence, to isolate different motion patterns.
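The idea that velocity can be selected in the frequency domain can be sketched with a toy example in one spatial dimension plus time. A pattern that translates at constant velocity v concentrates its spectral energy on the line k_t + v k_x = 0, so a mask over that line picks it out of a mixture. The sketch below uses an ideal binary mask rather than the band-pass filters of the method, and circular shifts with integer velocities so that the discrete spectrum is exact; the velocities, names and random profiles are all assumptions.

```python
import numpy as np

N = 32  # spatial samples and number of frames (equal, so the DFT grid is exact)

def translating_sequence(profile, velocity):
    """Rows are frames: frame t is `profile` circularly shifted by velocity*t pixels."""
    return np.stack([np.roll(profile, velocity * t) for t in range(len(profile))])

def motion_line_mask(n, velocity):
    """Boolean mask of DFT bins with k_t + velocity * k_x == 0 (mod n)."""
    k_t, k_x = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return (k_t + velocity * k_x) % n == 0

rng = np.random.default_rng(0)
profile_a = rng.standard_normal(N)
profile_b = rng.standard_normal(N)
profile_a -= profile_a.mean()  # zero mean, so the two patterns share no DC energy
profile_b -= profile_b.mean()

seq_a = translating_sequence(profile_a, 3)    # assumed velocity: 3 pixels/frame
seq_b = translating_sequence(profile_b, -2)   # assumed velocity: -2 pixels/frame
mixed = seq_a + seq_b

# Keep only the frequency bins on the motion line of pattern A.
spectrum = np.fft.fft2(mixed)  # axes: (temporal frequency, spatial frequency)
recovered_a = np.fft.ifft2(spectrum * motion_line_mask(N, 3)).real

max_error = np.abs(recovered_a - seq_a).max()
```

In this idealized setting the two motion lines intersect only at the zero-frequency bin, so pattern A is recovered up to floating-point error. Real sequences have non-integer velocities, finite support and noise, which is one reason to use smooth band-pass filters instead of a binary mask.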
Here, we propose a non-causal method for the detection and segmentation of motion patterns in video sequences, based on the use of composite frequency features to guide active model segmentation. In particular, we have applied our method for data-driven synthesis of composite-feature detectors (Dosil et al. 2006a, 2006b, 2005a, 2005b, 2005c, 2004c) to extract the most relevant spatio-temporal visual patterns. Such composite features are a combination of log-Gabor band-pass features tuned to different scales and orientations. The particular combination of log-Gabor features for a specific image is inferred from the input data by cluster analysis, based on a criterion of alignment between energy maxima.
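For readers unfamiliar with log-Gabor filters, the sketch below evaluates the standard radial log-Gabor transfer function, a Gaussian on a logarithmic frequency axis, at several centre frequencies (scales). It covers only the radial component; the orientation selectivity and the exact parameterization of the spatio-temporal filters used in the method are not reproduced here. The function name, the centre frequencies and the bandwidth ratio 0.65 are assumptions.

```python
import numpy as np

def log_gabor_radial(freqs, f0, sigma_ratio):
    """Standard radial log-Gabor transfer function.

    f0 is the centre frequency; sigma_ratio (between 0 and 1) is the ratio
    sigma/f0 that sets the bandwidth. The gain at zero frequency is 0.
    """
    freqs = np.asarray(freqs, dtype=float)
    gain = np.zeros_like(freqs)
    positive = freqs > 0
    log_ratio = np.log(freqs[positive] / f0)
    gain[positive] = np.exp(-log_ratio ** 2 / (2 * np.log(sigma_ratio) ** 2))
    return gain

# A small bank over three assumed scales, sampled on normalized frequencies.
freqs = np.linspace(0.0, 0.5, 501)
bank = {f0: log_gabor_radial(freqs, f0, 0.65) for f0 in (0.05, 0.1, 0.2)}
```

Each filter peaks with unit gain at its centre frequency, has no DC component, and is symmetric on a logarithmic frequency axis, so the gain one octave above the centre equals the gain one octave below it.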
Composite features are used here both in the definition of the external energy of a geodesic active model (Weickert and Kuhne, 2003) and for the initialization of the model in each frame. The process can be divided into two stages: isolation of visual patterns, and segmentation of a given motion pattern using a geodesic active model.
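To give an idea of how a feature map can act as external energy, the sketch below computes a stopping function of the kind commonly used in geodesic active contours, g = 1 / (1 + (|grad E| / contrast)^2), from a single frame of feature energy E. This is a generic choice and an assumption, not necessarily the potential used in the method; the function name, the contrast value and the synthetic disc are also assumptions.

```python
import numpy as np

def stopping_function(feature_energy, contrast):
    """g = 1 / (1 + (|grad E| / contrast)^2).

    g is close to 1 where the feature energy is flat, so the contour moves
    freely, and close to 0 at strong energy edges, where the contour stops.
    """
    gy, gx = np.gradient(feature_energy.astype(float))
    grad_mag_sq = gx ** 2 + gy ** 2
    return 1.0 / (1.0 + grad_mag_sq / contrast ** 2)

# Assumed feature-energy frame: a disc of high energy on a zero background.
yy, xx = np.mgrid[0:64, 0:64]
energy = ((yy - 32) ** 2 + (xx - 32) ** 2 <= 15 ** 2).astype(float)

g = stopping_function(energy, contrast=0.1)
```

Here `g` equals 1 both inside the disc and far from it, and drops close to 0 along the disc boundary, which is where an evolving contour would be attracted and halted.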
For a particular image, we obtain a certain number of composite features that correspond to visually relevant static and/or moving patterns. In the particular example that follows we obtain two moving patterns with different velocities. A detailed explanation of the decomposition method is given in the works of Dosil et al. cited above.
We select the visual pattern associated with the target object and proceed with segmentation, using that information to define the initial state and the image potential at each frame. The final segmentation is the result of stacking the segmentations of the individual frames.
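The per-frame structure of this stage can be sketched as a loop that segments each frame from the selected feature volume and stacks the resulting masks along the time axis. The active-model step is replaced here by a plain threshold, which is only a stand-in; `segment_frame`, `segment_sequence`, the threshold level and the synthetic volume are assumptions.

```python
import numpy as np

def segment_frame(feature_frame, level):
    """Stand-in for the active-model step: a plain threshold on the feature energy."""
    return feature_frame >= level

def segment_sequence(feature_volume, level):
    """Segment each frame on its own, then stack the masks along the time axis."""
    masks = [segment_frame(frame, level) for frame in feature_volume]
    return np.stack(masks, axis=0)

# Assumed feature volume (frames, rows, cols): a 3x3 blob moving one pixel per frame.
volume = np.zeros((5, 12, 12))
for t in range(5):
    volume[t, 4:7, 2 + t:5 + t] = 1.0

segmentation = segment_sequence(volume, level=0.5)
```

No frame in this loop depends on the result of the previous one, since each frame is initialized from the feature volume itself. The result is a boolean array with the same shape as the input volume.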