Video Segmentation by Tracing Discontinuities in a Trajectory Embedding

Katerina Fragkiadaki 1, Geng Zhang 2, Jianbo Shi 1

1 CIS, UPenn; 2 Xi'an University

Abstract
We want to segment and track objects that occlude each other while moving through a crowded scene. We propose a tracking framework that mediates grouping cues from two tracking granularities: (coarse) detection tracklets and (fine) point trajectories. We track objects in the joint detection tracklet and trajectory space, exploiting reliable detections when objects are visible and adapting to their changing visibility masks with trajectory clusters during partial occlusions. Each granularity contributes its own grouping cues: trajectories with similar long-term motion and disparity attract each other, and detection tracklets that overlap in time repulse each other. Tracking is formulated as selection-clustering in the joint detection and trajectory space. We resolve contradictions between the grouping cues of the two granularities in a RANSAC-clustering framework, where sampled detection tracklets change the motion/disparity trajectory affinities, inducing appropriate repulsions between trajectories claimed by repulsive detection tracklets.

Steered co-clustering of detection tracklets and trajectories

Detection tracklets and point trajectories are complementary for tracking and segmentation:

  • Detections capture objects when they are mostly visible. They may be sparse in time, may miss partially occluded or deformed objects, or contain false positives.
  • Point trajectories are dense in space and time. Their affinities integrate long-range motion and 3D disparity information, which is useful for segmentation. However, these affinities may leak across similarly moving objects.


Two-Granularity joint graph
We establish:
1. Affinities between detection tracklets according to appearance similarity and motion smoothness.
2. Repulsions between detection tracklets that overlap in time.
3. Affinities between trajectories according to their long-term motion/disparity similarity during their time overlap.
4. Detection-trajectory associations according to how persistently over time the trajectory overlaps the detection mask.
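The four edge types listed above can be sketched as follows. This is a minimal illustration under assumptions of our own, not the paper's implementation: each trajectory is stored as a start frame plus a T×3 array of (x, y, disparity), each detection tracklet as a start frame plus one (x0, y0, x1, y1) box per frame; trajectory affinity is exp(-d/σ) of the maximum velocity difference over shared frames (a common choice in trajectory clustering), repulsion is plain temporal overlap, and association is the fraction of a tracklet's frames in which the trajectory lies inside the box. The appearance/smoothness affinities between tracklets (item 1) are omitted, and all function names and `sigma` are hypothetical.

```python
import numpy as np

# Hypothetical data layouts (assumptions, not from the paper):
#   trajectory         = (start_frame, array (T, 3) with columns x, y, disparity)
#   detection tracklet = (start_frame, list of (x0, y0, x1, y1) boxes, one per frame)


def trajectory_affinity(ta, tb, sigma=2.0):
    """exp(-d / sigma), d = max velocity difference over the frames both trajectories share."""
    (sa, pa), (sb, pb) = ta, tb
    lo, hi = max(sa, sb), min(sa + len(pa), sb + len(pb))  # shared frames [lo, hi)
    if hi - lo < 2:  # velocities need at least two shared frames
        return 0.0
    va = np.diff(pa[lo - sa:hi - sa], axis=0)
    vb = np.diff(pb[lo - sb:hi - sb], axis=0)
    d = np.linalg.norm(va - vb, axis=1).max()
    return float(np.exp(-d / sigma))


def tracklets_repulse(da, db):
    """Two detection tracklets that overlap in time cannot be the same object."""
    (sa, ba), (sb, bb) = da, db
    return max(sa, sb) < min(sa + len(ba), sb + len(bb))


def association(det, traj):
    """Fraction of the tracklet's frames in which the trajectory point lies inside its box."""
    (sd, boxes), (st, pts) = det, traj
    inside = 0
    for f, (x0, y0, x1, y1) in enumerate(boxes, start=sd):
        k = f - st  # row of `pts` for frame f
        if 0 <= k < len(pts):
            x, y = pts[k, 0], pts[k, 1]
            if x0 <= x <= x1 and y0 <= y <= y1:
                inside += 1
    return inside / len(boxes)


def joint_graph(dets, trajs):
    """Trajectory affinities W, tracklet repulsions R, associations A (tracklets x trajectories)."""
    nt, nd = len(trajs), len(dets)
    W = np.zeros((nt, nt))
    for i in range(nt):
        for j in range(i + 1, nt):
            W[i, j] = W[j, i] = trajectory_affinity(trajs[i], trajs[j])
    R = np.zeros((nd, nd))
    for i in range(nd):
        for j in range(i + 1, nd):
            R[i, j] = R[j, i] = float(tracklets_repulse(dets[i], dets[j]))
    A = np.array([[association(d, t) for t in trajs] for d in dets])
    return W, R, A


# Two pedestrians walking side by side, each covered by its own detection tracklet.
t1 = (0, np.array([[0, 0, 5], [1, 0, 5], [2, 0, 5]], dtype=float))
t2 = (0, np.array([[10, 0, 5], [11, 0, 5], [12, 0, 5]], dtype=float))
d1 = (0, [(-1, -1, 3, 1)] * 3)
d2 = (1, [(9, -1, 13, 1)] * 2)
W, R, A = joint_graph([d1, d2], [t1, t2])
```

In this toy example the two trajectories move identically, so their affinity is maximal even though they belong to different people; the detection tracklets are what tell them apart.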
Object tracking is formulated as co-clustering in the resulting joint graph of detection tracklets and point trajectories.
However, the joint graph suffers from 1) false-alarm detection tracklets that erroneously claim trajectories, and 2) contradictions between trajectory affinities and detection tracklet repulsions in cases of accidental motion similarity; both confuse the co-clustering.
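The second failure mode can be made concrete by flagging trajectory pairs that attract each other although two mutually repulsive detection tracklets claim them. The sketch assumes a trajectory affinity matrix `W`, a symmetric tracklet repulsion matrix `R`, and a tracklet-to-trajectory association matrix `A`; the thresholds `w_min` and `a_min` and the function name are hypothetical. The product `C.T @ R @ C` counts, for each trajectory pair, the repulsive tracklet pairs that claim them.

```python
import numpy as np


def contradictions(W, R, A, w_min=0.5, a_min=0.5):
    """Hypothetical helper: trajectory pairs (i, j) with W[i, j] >= w_min that are
    claimed by two detection tracklets which repulse each other."""
    C = (A >= a_min).astype(float)               # C[p, i] = 1 if tracklet p claims trajectory i
    conflict = (C.T @ R @ C) > 0                 # some repulsive pair (p, q) claims i and j
    bad = np.triu(conflict & (W >= w_min), k=1)  # keep each pair once, i < j
    return [(int(i), int(j)) for i, j in np.argwhere(bad)]


# Side-by-side pedestrians: their trajectories move identically (W = 1), yet they
# are claimed by two detection tracklets that overlap in time (R = 1).
W = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([[0.0, 1.0], [1.0, 0.0]])
A = np.eye(2)
print(contradictions(W, R, A))  # [(0, 1)]
```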


Steering Cut
We iteratively sample detection tracklets according to their confidence, and steer the trajectory affinities and associations to comply with the repulsions between the selected detection tracklets.
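One sampling-and-steering step can be sketched under loud assumptions: tracklet confidences are non-negative scores, a tracklet "claims" a trajectory when their association reaches a hypothetical threshold `a_min`, and steering simply cuts (zeroes) the affinity between trajectories claimed by two selected tracklets that repulse each other. This is one plausible reading, not the paper's exact rule; the RANSAC-style loop over many samples, the scoring of the resulting clusterings, and the steering of associations are left out. Function names are hypothetical.

```python
import itertools

import numpy as np


def sample_tracklets(conf, k, rng):
    """Hypothetical helper: draw k distinct tracklets with probability proportional to confidence."""
    p = np.asarray(conf, dtype=float)
    return rng.choice(len(p), size=k, replace=False, p=p / p.sum())


def steer(W, R, A, selected, a_min=0.5):
    """Hypothetical helper: copy of W in which trajectories claimed by two mutually
    repulsive selected tracklets no longer attract each other."""
    Ws = W.copy()
    claimed = A >= a_min
    for p, q in itertools.combinations(selected, 2):
        if R[p, q] > 0:
            ip, iq = np.flatnonzero(claimed[p]), np.flatnonzero(claimed[q])
            Ws[np.ix_(ip, iq)] = 0.0  # cut attraction across the repulsive pair
            Ws[np.ix_(iq, ip)] = 0.0  # keep Ws symmetric
    return Ws


rng = np.random.default_rng(0)
W = np.array([[0.0, 1.0], [1.0, 0.0]])
R = np.array([[0.0, 1.0], [1.0, 0.0]])
A = np.eye(2)
selected = sample_tracklets([0.9, 0.8], k=2, rng=rng)
Ws = steer(W, R, A, selected)  # the side-by-side trajectories no longer attract
```

Sampling by confidence makes high-scoring detection tracklets more likely to steer the graph, while false alarms are selected less often, so repeated samples tend to agree on the reliable repulsions.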


Clustering the steered graph yields the space-time object clusters.
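As a stand-in for this clustering step, here is a minimal normalized spectral clustering of the trajectory nodes: embed them with the top-k eigenvectors of D^-1/2 W D^-1/2, normalize the rows, then run k-means. This generic technique is substituted for illustration only; the paper's exact clustering procedure, how the number of clusters `k` is chosen, and the inclusion of detection nodes in the co-clustering are not reproduced. The function name is hypothetical.

```python
import numpy as np


def spectral_clusters(W, k, iters=50):
    """Hypothetical helper: normalized spectral clustering of a symmetric,
    non-negative affinity matrix W into k clusters."""
    d = np.maximum(W.sum(axis=1), 1e-12)        # degrees (guard isolated nodes)
    s = 1.0 / np.sqrt(d)
    M = s[:, None] * W * s[None, :]             # D^-1/2 W D^-1/2
    _, vecs = np.linalg.eigh(M)                 # eigenvalues in ascending order
    X = vecs[:, -k:]                            # top-k eigenvectors as embedding
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # k-means on the embedded rows, farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                C[c] = X[labels == c].mean(axis=0)
    return labels


# Two groups of three trajectories with no affinity between the groups
block = np.ones((3, 3)) - np.eye(3)
W = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
labels = spectral_clusters(W, k=2)
```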

Results-Code

UrbanStreet dataset release

The UrbanStreet dataset used in the paper can be downloaded here [188M]. It contains 18 stereo sequences of pedestrians captured from a stereo rig mounted on a car driving through the streets of Philadelphia during rush hour. Image resolution is 516x1024. Groundtruth is provided as pedestrian segmentation masks for the left view only. All targets larger than 100 pixels are labelled every 4 frames (0.6 seconds) in each sequence. Groundtruth label samples are shown in the video below.
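The labelling protocol above can be expressed as two small helpers. The on-disk format of the released masks is not described on this page, so this sketch assumes each labelled frame has already been loaded as a 2D integer label map of shape (516, 1024) (rows × columns), with 0 for background and one ID per pedestrian; it also assumes that "larger than 100 pixels" refers to mask area and that labelling starts at frame 0. The helper names are hypothetical.

```python
import numpy as np


def labelled_frames(num_frames, step=4, first=0):
    """Hypothetical helper: frame indices carrying groundtruth, assuming one
    labelled frame every `step` frames starting at `first`."""
    return list(range(first, num_frames, step))


def large_targets(label_map, min_pixels=100):
    """Hypothetical helper: IDs of targets whose mask covers more than
    `min_pixels` pixels (0 = background)."""
    ids, counts = np.unique(label_map, return_counts=True)
    return [int(i) for i, c in zip(ids, counts) if i != 0 and c > min_pixels]


# A synthetic 516x1024 label map: one 11x10 pedestrian mask and one 5x5 mask
label_map = np.zeros((516, 1024), dtype=int)
label_map[100:111, 200:210] = 1   # 110 pixels: kept
label_map[300:305, 400:405] = 2   # 25 pixels: dropped
print(labelled_frames(10))        # [0, 4, 8]
print(large_targets(label_map))   # [1]
```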


Please run script_showlabel.m to visualize all labelled frames.

Paper

Katerina Fragkiadaki, Weiyu Zhang, Geng Zhang and Jianbo Shi. Two Granularity Tracking: Mediating Trajectory and Detection Graphs for Tracking under Occlusions. In ECCV, 2012.


Last update: Dec, 2012.