Next: Conclusion
Up: Detecting Unusual Activity in
Previous: Computational running time
Experiments
In order to demonstrate the algorithm, we conducted the following tests on
various test data. Table 1 gives a short summary of the different tests.
In the following experiments we used a fixed number of prototypes for
vector quantization and a fixed segment length; the scale parameter is
set relative to the largest value in the pairwise distance matrix.
Table 1: The test videos used in our experiments

  Title       | Duration  | Type of test
  ------------|-----------|--------------------
  Road        | 19h50min  | surveillance
  Poker game  | 30min     | cheating detection
  Hospital    | 12h       | patient monitoring
  Webcam      | 3h        | analyzing the crowd
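As a rough illustration of the preprocessing step, the sketch below quantizes frame feature vectors into a small codebook of prototypes with plain k-means and sets the kernel width as a fraction of the largest pairwise distance. The function names, prototype count, and the 0.1 fraction are illustrative assumptions, not the paper's actual values.

```python
import numpy as np

def quantize_features(features, n_prototypes=50, n_iters=20, seed=0):
    """Vector quantization by plain k-means: map each frame feature
    vector to the index of its nearest prototype (codebook entry).
    Prototype and iteration counts are illustrative placeholders."""
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Initialize prototypes from randomly chosen feature vectors.
    idx = rng.choice(len(features), size=n_prototypes, replace=False)
    protos = features[idx].copy()
    labels = np.zeros(len(features), dtype=int)
    for _ in range(n_iters):
        # Assign every feature vector to its nearest prototype.
        d = ((features[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Move each prototype to the mean of its assigned vectors.
        for k in range(n_prototypes):
            members = features[labels == k]
            if len(members):
                protos[k] = members.mean(axis=0)
    return protos, labels

def kernel_width(dist_matrix, fraction=0.1):
    """Set the scale parameter relative to the largest pairwise
    distance; the 0.1 fraction is an assumed placeholder."""
    return fraction * np.max(dist_matrix)
```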
The first test set is a video shot in the dining room of a hospital.
After removing the motionless frames, we tested our embedding algorithm
on the remaining frames to see whether it provides a good separation
between different events. We observed that the unusual activities are
embedded far from the usual ones, as can be seen in figure 7.
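The separation in figure 7 comes from a spectral embedding of the segments. Below is a minimal sketch, assuming a Gaussian affinity over pairwise segment distances and an eigendecomposition of the normalized affinity; the exact normalization and function names are our assumptions, not necessarily the paper's.

```python
import numpy as np

def embed_segments(dist, sigma, dim=2):
    """Spectral embedding of video segments: convert pairwise segment
    distances into Gaussian affinities, normalize, and use the leading
    eigenvectors as low-dimensional coordinates."""
    W = np.exp(-(dist ** 2) / (2.0 * sigma ** 2))   # affinity matrix
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = D_inv_sqrt @ W @ D_inv_sqrt                 # normalized affinity
    vals, vecs = np.linalg.eigh(L)                  # ascending eigenvalues
    return vecs[:, -dim:]                           # top-`dim` eigenvectors
```

Segments with similar prototype statistics receive similar coordinates, so rare activities land far from the dense cluster of usual ones.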
Figure 7:
Four unusual activities discovered, corresponding
to four remote clusters in the embedding space. A: a patient eating
alone at the near table; B: a man in a wheelchair slowly going in
and out of the room while everyone else is eating; C: a patient
shaking; D: a nurse feeding a patient one-on-one with no one else around;
E: the 2-D embedding of the video segments.
To quantify the ``goodness'' of the embedding provided in the previous
experiment, we used another video, from a surveillance camera overlooking a
road adjacent to a fenced facility. We tested our system on a
continuous video running from 16:32 until 12:22 the next day, containing both
daytime and nighttime footage.
We applied our embedding algorithm and classified the embedded segments
into two groups: usual and unusual. To measure performance, we
hand-labeled every sequence containing motion as usual or
unusual and compared our results to this ground truth. The
promising results of this experiment are shown in figure 8.
Although this surveillance sequence is somewhat limited in the type
of actions it contains, we
would like to point out that even without motion
features, i.e. using only spatial histograms, we were able to detect events
such as cars making U-turns, cars backing up, and people walking on and off
the road.
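Comparing detections against the hand-labeled ground truth amounts to sweeping a threshold over the segments' unusualness scores. The sketch below shows how a precision-recall curve like the one in figure 8(F) can be computed; the function and variable names are ours.

```python
import numpy as np

def precision_recall_curve(scores, labels):
    """Sweep a detection threshold over unusualness scores and return
    the (precision, recall) values at each operating point.
    `labels` holds the hand-labeled ground truth (1 = unusual)."""
    order = np.argsort(scores)[::-1]      # most unusual first
    labels = np.asarray(labels)[order]
    tp = np.cumsum(labels)                # true positives so far
    fp = np.cumsum(1 - labels)            # false positives so far
    precision = tp / (tp + fp)
    recall = tp / labels.sum()
    return precision, recall
```

Each index along the returned arrays corresponds to one threshold, i.e. one candidate operating condition such as the starred point in figure 8(F).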
Figure 8:
Results for the 20-hour-long road surveillance video. Usual events
consist of cars moving along the road. Correctly detected unusual
events include: (A) cars pulling off the road, (B) cars stopping and
backing up, and (C) cars making U-turns and people walking on the road.
Undetected unusual events include (D) cars stopping at
the far end, due to the coarseness of the spatial features.
False positives mainly include birds flying by and
(E) direct sunlight shining into the camera. The precision-recall curve of the
results is shown in (F); the star indicates the operating
condition achieving the precision/recall trade-off for the results shown
in (A)-(E).
The next experiment was aimed at measuring performance
in a more complex setting: we recorded a 30-minute-long poker game
sequence in which two players were asked to cheat creatively.
Every second of the video was hand-labelled
with one of the activity labels. The video contains a wide variety of natural
actions: in addition to playing cards and cheating, the players were
drinking water, talking, gesturing with their hands, and scratching. Many of the
cheating actions are among the detected unusual events. To demonstrate that we
can detect a specific type of cheating, we find those unusual events
corresponding to a prototype feature chosen by us. The results of
detecting two cheating types are shown in figure 9.
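One way to read ``unusual events corresponding to a prototype feature chosen by us'' is to keep only those detected unusual segments whose prototype histogram is dominated by the chosen prototype (e.g. one capturing the elbow movement). The sketch below is our interpretation, not the paper's exact selection rule; all names are hypothetical.

```python
import numpy as np

def events_matching_prototype(histograms, unusual_idx, target_proto):
    """From the detected unusual segments, keep those whose prototype
    histogram peaks at a user-chosen prototype index."""
    matched = []
    for i in unusual_idx:
        # Dominant prototype of segment i must be the chosen one.
        if np.argmax(histograms[i]) == target_proto:
            matched.append(i)
    return matched
```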
Figure 9:
``Elbow'' cheating detection. A1, B1, C1: examples of
detected cheating, where the ``near player'' reaches to his left elbow to
hide a card; D1: undetected cheating, where the ``near player'' reaches
to his elbow but doesn't hide anything; E1: false positive, where the
``near player'' makes a different movement with his hand.
``Under'' cheating detection. A2, B2, C2: examples of detected
events, where two players exchange cards under the table; D2:
undetected cheating, where the exchange is mostly occluded; E2: false
positive, where the near player is drinking and, due to the camera angle, his
hand is in a similar position. F1, F2: ROC curves of the two events;
the red stars indicate the operating conditions for the results shown
here.
To show that the algorithm can also be used for categorizing usual events,
we took a 3-hour-long
video from the Berkeley Sproul Plaza webcam (http://www.berkeley.edu/webcams/sproul.html).
The embedding of the video segments and the event-category
representatives are shown in figure 10 (left).
The automatic categorization of events can potentially
allow us to develop a statistical model of activities in an unsupervised fashion.
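The transition model mentioned for figure 10 (right) can be estimated by counting transitions between consecutive event labels and row-normalizing the counts. A minimal sketch under that assumption (the first-order Markov form and the function name are ours):

```python
import numpy as np

def transition_matrix(event_seq, n_events):
    """Estimate a first-order Markov transition model between event
    categories from the labeled sequence of video segments."""
    counts = np.zeros((n_events, n_events))
    for a, b in zip(event_seq[:-1], event_seq[1:]):
        counts[a, b] += 1            # count each observed transition
    row = counts.sum(axis=1, keepdims=True)
    row[row == 0] = 1.0              # avoid division by zero
    return counts / row              # rows sum to 1 (where observed)
```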
Figure 10:
(left) The embedding of the webcam video shows that the videos are best organized by
two independent event types in the scene. The horizontal axis (A-D) represents crowd movement along the building:
many people walking (A), and few or no people walking (D).
On the vertical axis (B-F), events of people walking in or out of Sproul Hall are detected, organized according to the orientation in which people entered or left: (B) along the bottom of the image frame; (F) diagonally from the lower left corner. (E) and (C) are compound events: (E) is a combination of events (F) and (D), and (C) is a combination of (B) and (D). (right) Given the classification of the video into distinct events, a transition model is estimated.
Mirko Visontai
2004-05-13