The object detection is done by a spatiotemporal separable
convolution (section 2). Hence the cost scales with the number of
frames in the video and the number of pixels in an image frame.
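To illustrate why separability keeps this step cheap, the following is a minimal sketch of separable spatiotemporal filtering, not the detector itself. The function names and the example kernels are assumptions made for illustration; the actual filters are those defined in section 2. Three 1-D passes cost the sum of the kernel lengths per voxel, whereas the equivalent full 3-D kernel costs their product.

```python
import numpy as np

def separable_filter(video, k_t, k_y, k_x):
    """Filter a (frames, height, width) volume with three 1-D kernels in turn.

    Each kernel must be no longer than the axis it is applied to.
    """
    out = np.asarray(video, dtype=float)
    for axis, kernel in enumerate((k_t, k_y, k_x)):
        # One 1-D convolution along a single axis; output shape is unchanged.
        out = np.apply_along_axis(np.convolve, axis, out, kernel, mode="same")
    return out

def multiplies_per_voxel(k_t, k_y, k_x):
    """Multiplications per output voxel: (separable passes, full 3-D kernel)."""
    sizes = (len(k_t), len(k_y), len(k_x))
    return sum(sizes), sizes[0] * sizes[1] * sizes[2]

# Hypothetical kernels: smoothing in space, a difference in time.
k_t = np.array([1.0, 0.0, -1.0])
k_y = np.array([1.0, 2.0, 1.0])
k_x = np.array([1.0, 2.0, 1.0])

video = np.zeros((5, 5, 5))
video[2, 2, 2] = 1.0                       # a single impulse
response = separable_filter(video, k_t, k_y, k_x)
print(multiplies_per_voxel(k_t, k_y, k_x))
```

With the kernel sizes fixed, the per-voxel work is constant, so the total work grows in proportion to the number of frames times the number of pixels per frame.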
The complexity of the K-means algorithm depends on the number of
prototype vectors, the size of the histograms and the number of
iterations. Building
the co-occurrence and the similarity matrix has a cost of
.
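For the K-means step discussed above, a minimal sketch of plain Lloyd iterations shows where the three quantities enter. The names `kmeans` and `distance_flops`, and the symbols `n` (number of histograms being clustered), `K`, `M` and `iters`, are assumptions introduced here for illustration; the initialization and stopping rule are likewise illustrative choices.

```python
import numpy as np

def kmeans(X, K, iters=10, seed=0):
    """Plain Lloyd iterations on the rows of X; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from K distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()
    for _ in range(iters):
        # n x K squared distances, each one a sum over M histogram bins.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for k in range(K):
            members = X[labels == k]
            if len(members):               # keep the old center if a cluster empties
                centers[k] = members.mean(axis=0)
    return centers, labels

def distance_flops(n, K, M, iters):
    """Multiply-adds spent on distance computations over all iterations."""
    return n * K * M * iters

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]])
centers, labels = kmeans(X, K=2)
print(labels, distance_flops(n=len(X), K=2, M=X.shape[1], iters=10))
```

The assignment step dominates each iteration, which is why the work per iteration is proportional to the number of prototypes times the histogram size for every vector being clustered.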
Finally, finding the second eigenvector of a symmetric sparse
matrix takes
time, where
.
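As an illustration of why the eigenvector step is cheap for a sparse matrix, here is a minimal sketch that uses only matrix-vector products. It is an assumption-laden stand-in, not the solver used for the reported timings: it assumes a non-negative symmetric similarity matrix `S` with positive row sums, normalizes it as D^{-1/2} S D^{-1/2}, and applies power iteration with the known leading eigenvector deflated. The function name and parameters are hypothetical.

```python
import numpy as np

def second_eigenvector(S, iters=500, seed=0):
    """Second-largest eigenvector of D^{-1/2} S D^{-1/2} by deflated power iteration."""
    d = S.sum(axis=1)
    inv_sqrt_d = 1.0 / np.sqrt(d)
    # For non-negative S the leading eigenvector is D^{1/2} 1 (eigenvalue 1).
    top = np.sqrt(d) / np.linalg.norm(np.sqrt(d))

    def matvec(v):
        # Apply 0.5 * (N + I), whose eigenvalues all lie in [0, 1].
        return 0.5 * (inv_sqrt_d * (S @ (inv_sqrt_d * v)) + v)

    v = np.random.default_rng(seed).standard_normal(S.shape[0])
    for _ in range(iters):
        v = v - top * (top @ v)            # remove the leading component
        v = matvec(v)
        v = v / np.linalg.norm(v)
    v = v - top * (top @ v)
    return v / np.linalg.norm(v)

# Two tightly connected groups with weak links between them.
S = np.full((6, 6), 0.01)
S[:3, :3] = 1.0
S[3:, 3:] = 1.0
v = second_eigenvector(S)
groups = v > 0                             # the sign splits the two groups
print(groups)
```

Each iteration costs one product with `S`, which for a sparse matrix is proportional to the number of non-zero entries rather than to the square of the matrix size.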
For example, the running times for the 20-hour road video are:
8 hours 40 minutes (object detection), 1 hour 36 minutes
(K-means) and 3 seconds (eigensolver) on a 2.4 GHz Pentium IV.
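The reported timings can be put in proportion with a few lines of arithmetic. The sketch below only adds up the figures quoted above and compares the total with the length of the video; the helper name `to_seconds` is introduced here for illustration.

```python
def to_seconds(hours=0, minutes=0, seconds=0):
    """Convert an h/m/s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Stage timings as reported for the 20-hour road video.
stages = {
    "object detection": to_seconds(hours=8, minutes=40),
    "K-means": to_seconds(hours=1, minutes=36),
    "eigensolver": to_seconds(seconds=3),
}
video = to_seconds(hours=20)
total = sum(stages.values())

h, rest = divmod(total, 3600)
m, s = divmod(rest, 60)
print(f"total: {h} h {m} min {s} s")                      # total: 10 h 16 min 3 s
print(f"fraction of video length: {total / video:.2f}")   # 0.51
for name, secs in stages.items():
    print(f"{name}: {100 * secs / total:.1f}% of total")
```

On these figures the whole pipeline takes about half the duration of the video, and object detection accounts for most of that time.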