Papers Published
Abstract:
A framework for unsupervised group activity analysis from a single video is presented here. Our working hypothesis is that human actions lie on a union of
low-dimensional subspaces, and thus can be efficiently modeled as sparse linear
combinations of atoms from a learned dictionary representing the action's
primitives. Contrary to prior art, and with the primary goal of spatio-temporal
action grouping, in this work only a single video segment is available for both unsupervised learning and analysis, without any prior training information.
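The sparse-modeling hypothesis can be illustrated with a minimal Python sketch; this is not the authors' implementation, and the feature matrix, dictionary size, and scikit-learn solver below are illustrative assumptions on synthetic data.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Hypothetical stand-in for one individual's spatio-temporal feature vectors
# (rows = samples, columns = feature dimensions); in the paper these would be
# simple features extracted from the video at a single spatio-temporal scale.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 64))

# Learn a small dictionary whose atoms play the role of action "primitives";
# each feature vector is then coded as a sparse combination of these atoms.
learner = DictionaryLearning(n_components=16, max_iter=100,
                             transform_algorithm="lasso_lars",
                             transform_alpha=0.1, random_state=0)
codes = learner.fit_transform(features)   # sparse coefficients, mostly zeros
atoms = learner.components_               # learned dictionary atoms (16 x 64)

# A good sparse code reconstructs the features from few active atoms.
recon = codes @ atoms
print("mean squared reconstruction error:", np.mean((features - recon) ** 2))
```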
After extracting simple features at a single spatio-temporal scale, we learn a
dictionary for each individual in the video during each short time lapse. These
dictionaries allow us to compare the individuals' actions by producing an
affinity matrix that contains sufficient discriminative information about the actions in the scene, leading to grouping with simple and efficient tools.
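One way such a dictionary-based affinity can work is sketched below on made-up data: coding one individual's features with another individual's dictionary yields a cross reconstruction error, and a low error is taken as evidence of similar actions. The error-to-affinity mapping and the use of spectral clustering here are illustrative choices, not necessarily those of the paper.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import DictionaryLearning, sparse_encode

rng = np.random.default_rng(1)

# Toy scene: 6 individuals performing one of two synthetic "actions", each
# action generating features near its own low-dimensional subspace.
bases = [rng.standard_normal((4, 32)) for _ in range(2)]
feats = [rng.standard_normal((100, 4)) @ bases[i % 2] for i in range(6)]

# One dictionary per individual (per short time lapse in the paper).
dicts = [DictionaryLearning(n_components=8, max_iter=50, random_state=0)
         .fit(f).components_ for f in feats]

def cross_error(f_i, d_j):
    """Mean error of coding individual i's features with individual j's dictionary."""
    codes = sparse_encode(f_i, d_j, algorithm="lasso_lars", alpha=0.1)
    return np.mean((f_i - codes @ d_j) ** 2)

n = len(feats)
err = np.array([[cross_error(feats[i], dicts[j]) for j in range(n)]
                for i in range(n)])
# Symmetrize and map errors to similarities; low cross error -> high affinity.
affinity = np.exp(-(err + err.T) / (err.mean() + 1e-12))

labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
print(labels)  # individuals sharing a label are grouped as one activity
```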
With diverse publicly available real videos, we demonstrate the effectiveness of the proposed framework and its robustness to cluttered backgrounds, changes in human appearance, and action variability.