next up previous
Next: Conclusion Up: Detecting Unusual Activity in Previous: Computational running time


Experiments

To demonstrate the algorithm we conducted the following tests on various test data sets; Table 1 gives a short summary of the different tests. In the following experiments we used $K=500$ prototypes for vector quantization, a segment length of $T=4\,s$, and $\beta = 1/M$, where $M$ is the largest value in $S_p$.
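As a sketch of how these parameters fit together, assuming features are assigned to their nearest prototype (the function and variable names here are illustrative, not from the paper):

```python
import numpy as np

def quantize(features, prototypes):
    # vector quantization: map each feature vector to the index of
    # its nearest prototype (K = prototypes.shape[0], e.g. K = 500)
    d = ((features[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# beta = 1/M, where M is the largest value in the similarity matrix S_p
# (S_p here is a tiny made-up example)
S_p = np.array([[0.0, 2.0],
                [2.0, 5.0]])
beta = 1.0 / S_p.max()
```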


Table 1: The test videos used in our experiments

Title        Duration   Type of test
Road         19h50min   surveillance
Poker game   30min      cheating detection
Hospital     12h        patient monitoring
Webcam       3h         analyzing the crowd


The first test set is a video shot in the dining room of a hospital. After removing the motionless frames, we were left with $169\,880$ frames. We tested our embedding algorithm to see whether it provides a good separation between different events. We observed that the unusual activities are embedded far from the usual ones, as can be seen in figure 7.
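One simple way to flag such remote clusters, given only the embedding coordinates, is to rank segments by their distance from the embedding centroid; this is an illustrative heuristic, not the paper's exact criterion:

```python
import numpy as np

def farthest_segments(embedding, k):
    # rank embedded segments by distance from the centroid and
    # return the indices of the k most remote ones
    centroid = embedding.mean(axis=0)
    dist = np.linalg.norm(embedding - centroid, axis=1)
    return np.argsort(dist)[-k:]
```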

Figure 7: Four unusual activities discovered, corresponding to four remote clusters in the embedding space. A: a patient eating alone at the near table; B: a man in a wheelchair slowly going in and out of the room while everyone else is eating; C: a patient shaking; D: a nurse feeding a patient one-on-one with no one else around. E: the 2-D embedding of the video segments.
[Figure 7 images: panels A, B, C, D (example frames) and E (2-D embedding)]

To quantify the ``goodness'' of the embedding provided in the previous experiment we used another video, from a surveillance camera overlooking a road adjacent to a fenced facility. We tested our system on a continuous video from 16:32 until 12:22 the next day, containing both daytime and nighttime footage ($1\,063\,802$ image frames in total). We applied our embedding algorithm and classified the embedded segments into two groups, usual and unusual. To measure performance we hand-labeled every sequence that contained motion as usual or unusual, and compared our results to this ground truth. The promising results of this experiment are shown in figure 8. Though this surveillance sequence is somewhat limited in the types of actions it contains (in particular, it has just $23$ unusual sequences), we would like to point out that even without motion features, i.e. with spatial histograms only, we were able to detect events such as cars making U-turns, cars backing up, and people walking on and off the road.
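The precision and recall behind figure 8 can be computed from the hand-labeled ground truth in the usual way; a minimal sketch (function and variable names are ours):

```python
def precision_recall(predicted, truth):
    # predicted, truth: parallel lists of booleans, one per
    # motion-containing sequence (True = unusual)
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```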

Figure 8: Results for the 20-hour-long road surveillance video. Usual events consist of cars moving along the road. Correctly detected unusual events include: (A) cars pulling off the road, (B) cars stopping and backing up, (C) cars making U-turns, and people walking on the road. Undetected unusual events include: (D) cars stopping at the far end, due to the coarseness of the spatial features. False positives consist mainly of birds flying by and direct sunlight into the camera (E). The Precision-Recall curve of the results is shown in (F). The star indicates the operating point achieving the precision/recall trade-off shown in (A)-(E).
[Figure 8 images: panels (A)-(E) example frames and (F) Precision-Recall curve]

The next experiment aimed to measure performance in a more complex setting: we recorded a $30$-minute-long poker game sequence, in which two players were asked to cheat creatively. The video contains $17\,902$ frames, and every $4$-second segment was hand-labeled with one of $27$ activity labels. The video contains a wide variety of natural actions: in addition to playing cards and cheating, the players were drinking water, talking, gesturing with their hands, and scratching. Many of the cheating events are among the detected unusual events. To demonstrate that we can detect a specific type of cheating, we retrieve the unusual events corresponding to a prototype feature of our choice. The results of detecting two cheating types are shown in figure 9.
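Selecting the unusual events that correspond to a chosen prototype feature can be sketched as follows, assuming each segment is represented by a histogram over the $K$ prototypes (the threshold and all names here are illustrative):

```python
import numpy as np

def events_for_prototype(histograms, unusual, proto, min_mass=0.2):
    # among the segments flagged as unusual, keep those whose
    # prototype histogram places at least min_mass on prototype
    # index `proto`
    return [i for i in unusual if histograms[i][proto] >= min_mass]
```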

Figure 9: ``Elbow'' cheating detection. A1, B1, C1: examples of detected cheatings, in which the ``near player'' reaches to his left elbow to hide a card. D1: a non-detected cheating; the ``near player'' reaches to his elbow but does not hide anything. E1: a false positive; the ``near player'' makes a different movement with his hand. ``Under'' cheating detection. A2, B2, C2: examples of detected events, in which two players exchange cards under the table. D2: a non-detected cheating; the exchange is mostly occluded. E2: a false positive; the near player is drinking and, due to the camera angle, his hand is in a similar position. F1, F2: ROC curves for the two events; the red stars indicate the operating point for the results shown here.
[Figure 9 images: panels A1-E1 (``elbow'' cheating), A2-E2 (``under'' cheating), and F1, F2 (ROC curves)]

To show that the algorithm can also be used for categorizing usual events, we took a 3-hour-long video from the Berkeley Sproul Plaza webcam (http://www.berkeley.edu/webcams/sproul.html), which contained $28\,208$ frames. The embedding of the video segments and the event-category representatives are shown in figure 10 (left). The automatic categorization of events can potentially allow us to develop a statistical model of activities in an unsupervised fashion.

Figure 10: (left) The embedding of the webcam video shows that the video segments are best organized by two independent event types in the scene. The horizontal axis (A-D) represents crowd movement along the building: many people walking (A), and few or no people walking (D). The vertical axis (B-F) captures events of people walking into or out of Sproul Hall, organized according to the orientation in which people entered or left: (B) along the bottom of the image frame; (F) diagonally from the lower left corner. (E) and (C) are compound events: (E) is a combination of events (F) and (D), and (C) is a combination of (B) and (D). (right) Given the classification of the video into distinct events, a transition model is estimated.

[Figure 10 images: (left) embedding with event-category examples; (right) estimated transition model]
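A transition model of the kind shown in figure 10 (right) can be estimated by counting transitions between consecutive segment labels and row-normalizing the counts; a minimal sketch under that assumption:

```python
import numpy as np

def transition_matrix(labels, n_states):
    # count transitions between consecutive event labels, then
    # row-normalize to obtain transition probabilities
    T = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        T[a, b] += 1.0
    rows = T.sum(axis=1, keepdims=True)
    # rows with no outgoing transitions stay all-zero
    return np.divide(T, rows, out=np.zeros_like(T), where=rows > 0)
```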


Mirko Visontai 2004-05-13