Real-time processing is essential in dynamic and unpredictable environments, such as searching for earthquake victims or for people in a burning building. Visual sensing must rapidly focus attention on important activity in the environment, and any room or corridor should be searched quickly to detect people and fire. We therefore employ a camera with a panoramic lens to detect and track multiple moving objects over a full 360-degree view in real time.
The vision functionalities that have been implemented, or that are to be incorporated (marked by *), are listed below.
Cylindrical Image Unwarping - off-line camera calibration and on-line geometric transformation to remove the circular distortion of the panoramic lens image (see the remapping sketch after this list).
Motion-Based Object Detection - detection from a stationary or moving robot (although ego-motion greatly increases the processing complexity); the background is updated continuously and moving objects are segmented from it (a background-maintenance sketch follows this list).
Multiple Moving Object Tracking - after detection, each moving object is tracked in 2D and 3D; the dynamics of each object are identified and analyzed, with motion, texture, and shape cues used during tracking (a data-association sketch also follows this list).
3D Mapping* - stereo and motion processing used to construct the 3D locations of moving objects, significant events, obstacles to motion, and room boundaries, as required by the goal-oriented task.
Object Identification* - finding people, fire, and room entrances and exits, using appearance-based methods (static color or gray-level patterns) and/or temporal-appearance-based methods (motion patterns obtained from tracking).
Human Detection* - humans can be detected even when stationary, using focus-of-attention from previously observed motion, appearance-based matching, face detection, sound localization, etc.
Graphical User Interface (GUI) - contours and tracks of moving objects superimposed on a cylindrical video mosaic representation, for system testing and as an interface to remote tele-presence visualization. We have implemented a prototype version of the GUI.
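As a concrete illustration of the unwarping step, the following is a minimal sketch of the on-line geometric transformation: it remaps the annular image produced by the panoramic lens onto a cylindrical strip. It is not the system's actual implementation; the center and radii are assumed to come from the off-line calibration, and all names and values are illustrative.

```python
import cv2
import numpy as np

def unwarp_panorama(frame, cx, cy, r_min, r_max, out_w=1024):
    """Remap the annular panoramic image to a cylindrical strip.

    cx, cy       -- image center of the lens (from off-line calibration)
    r_min, r_max -- inner/outer radii of the useful annulus (assumed)
    """
    out_h = int(r_max - r_min)
    # For each output pixel, compute its source pixel in the annulus:
    # columns map to viewing angle, rows map to radius.
    theta = np.linspace(0.0, 2.0 * np.pi, out_w, endpoint=False)
    radius = np.linspace(r_max, r_min, out_h)
    rr, tt = np.meshgrid(radius, theta, indexing="ij")
    map_x = (cx + rr * np.cos(tt)).astype(np.float32)
    map_y = (cy + rr * np.sin(tt)).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
```

Because the maps depend only on the calibration, they can be precomputed once, reducing the per-frame cost to a single lookup pass, which is what makes the transform feasible in real time.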
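The motion-based detection item can likewise be illustrated with a simple adaptive background-subtraction loop. This sketch assumes a stationary robot; the learning rate, threshold, and input file name are placeholders, and the system's actual segmentation is more elaborate.

```python
import cv2
import numpy as np

ALPHA = 0.05   # background learning rate (illustrative value)
THRESH = 25    # foreground threshold on the absolute difference

cap = cv2.VideoCapture("panorama.avi")  # hypothetical input sequence
ok, frame = cap.read()
background = np.float32(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))

while ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Moving pixels deviate strongly from the background model.
    diff = cv2.absdiff(gray, cv2.convertScaleAbs(background))
    _, fg = cv2.threshold(diff, THRESH, 255, cv2.THRESH_BINARY)
    # Refresh the background only where no motion was found, so gradual
    # illumination changes are absorbed without erasing moving objects.
    cv2.accumulateWeighted(gray, background, ALPHA,
                           mask=cv2.bitwise_not(fg))
    ok, frame = cap.read()
```

The masked update is the key design choice: it lets the model track slow environmental and illumination changes while keeping moving people out of the background.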
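For the tracking item, the data-association step can be sketched as greedy nearest-neighbor matching of detected centroids. The system combines motion, texture, and shape cues, so this distance-only version is a simplification, with assumed names and thresholds.

```python
import numpy as np

class Track:
    """One moving object: an id plus its recent centroid history."""
    def __init__(self, track_id, centroid):
        self.id = track_id
        self.history = [centroid]   # trail used to draw the dynamic track

def associate(tracks, detections, max_dist=40.0):
    """Greedily match each track to its nearest unclaimed detection."""
    unmatched = list(detections)
    for t in tracks:
        if not unmatched:
            break
        dists = [np.linalg.norm(np.subtract(t.history[-1], c))
                 for c in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:   # gate: reject implausibly large jumps
            t.history.append(unmatched.pop(j))
    return unmatched              # leftover detections start new tracks
```

Augmenting the distance term with texture and shape cues is what allows two tracks to be kept apart through the overlap and occlusion events shown in Fig. 1(c).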
The four moving objects are shown in the unwarped cylindrical image of Fig. 1(b), a more natural panoramic representation for user interpretation. Each of the four people was completely extracted from the complex background, as depicted by the bounding rectangle, direction, and distance to each object. The system tracks each object through the image sequence, even when two people overlap and occlude each other. The dynamic track of each person over the last 30 frames, represented as small circles and an icon (elliptic head and body), is shown in a different color in Fig. 1(c). The final object image is depicted at the end of the corresponding track. Notice that the humans reversed direction, and that overlap and occlusion were handled successfully (see the blue and green sequences). The system can also detect self-motion, changes in the environment or illumination, and sensor failure, refreshing the background accordingly.