Computer Sciences Seminar
Tuesday, April 9
12:30 PM, NAC 8/206

Parallel-Perspective Stereo Mosaics from Real-World Video

Zhigang Zhu
University of Massachusetts at Amherst

Abstract
Image-based modeling is an emerging area that has attracted considerable attention in the computer vision, multimedia, and computer graphics communities. Recently, there have been attempts in a variety of applications to add 3D information to image-based representations. The applications include geo-referenced scene/object modeling for surveillance, security, and photogrammetry; image-based rendering for educational, architectural, and entertainment purposes; and content-based video coding for high-ratio compression and 3D presentation. The key issues in image-based 3D scene modeling from real video are (1) compact representations that have large fields of view and high depth resolution and (2) efficient and robust algorithms to transform a video sequence into such representations. Unlike previous work on generating stereo mosaics from a rotating camera, we attack the more difficult problem of creating stereo mosaics from a translating camera whose actual motion is complex, with six degrees of freedom (DOF), as experienced in an airplane or car during surveillance tasks or everyday navigation.

This talk summarizes our recent efforts in generating parallel-perspective stereo mosaics of large-scale 3D scenes with a single video camera mounted on a ground or airborne vehicle. Inspired by a technique used in classical Chinese paintings, we propose to use multi-perspective geometry to represent large-scale 3D scenes. Using a parallel-perspective representation, a pair of geometrically registered stereo mosaics can be constructed under rather general motion before any 3D information is explicitly recovered. A PRISM (parallel ray interpolation for stereo mosaicing) algorithm is proposed to make seamless stereo mosaics from a rather sparse video sequence with obvious motion parallax. The epipolar geometry of parallel-perspective stereo mosaics generated under constrained 6 DOF motion is formulated; it yields optimal baselines and an easy search for correspondences. We characterize the depth error of parallel-perspective stereo mosaics built from real video, showing that the depth error is a linear function of the absolute depth rather than the commonly cited constant depth resolution. This is still better than the second-order dependence on depth in two-view perspective stereo vision.
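To make the contrast between linear and second-order depth error concrete, here is a minimal Python sketch. It is not the analysis presented in the talk. The two-view formula is the standard first-order result for perspective stereo (from Z = f B / d); the linear model and its constant `k`, the function names, and all numeric values are assumptions chosen only for illustration.

```python
def two_view_depth_error(depth, focal_px, baseline, disparity_error_px=1.0):
    """First-order depth error for two-view perspective stereo.

    With Z = f * B / d, a disparity error dd gives |dZ| ~= Z**2 / (f * B) * dd,
    so the error grows with the square of the depth.
    """
    return depth ** 2 / (focal_px * baseline) * disparity_error_px


def linear_depth_error(depth, k):
    """Hypothetical linear error model dZ = k * Z (k is an assumed constant)."""
    return k * depth


if __name__ == "__main__":
    # Illustrative values only: focal length in pixels, baseline and depth in meters.
    for depth in (100.0, 200.0, 400.0):
        quadratic = two_view_depth_error(depth, focal_px=1000.0, baseline=10.0)
        linear = linear_depth_error(depth, k=0.01)
        print(f"depth={depth:6.1f}  two-view error={quadratic:6.2f}  linear error={linear:6.2f}")
```

Under these assumed values, doubling the depth quadruples the two-view error but only doubles the error of the linear model, which is the qualitative difference the abstract refers to.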

I will further discuss several directions for future research: stereo mosaics under arbitrary motion, occlusion analysis for seamless mosaicing, and a generalized layered representation. Finally, as an important application of the stereo mosaicing technique in remote education (e-learning), I will illustrate an ongoing 3D Virtual Campus and Classroom project featuring a virtual campus tour, automatic camera management, and immersive teacher-student interaction.

Bio
Zhigang Zhu received his B.S., M.S., and Ph.D. degrees in computer science from Tsinghua University, Beijing, in 1988, 1991, and 1997, respectively. In 1991 he became a faculty member in the Department of Computer Science and Technology at Tsinghua University, and he has been an Associate Professor since 1996. From 1997 to 1999 he was Director of the Information Processing and Application Division in the same department at Tsinghua. Since September 1998 he has been affiliated with the Department of Computer Science at the University of Massachusetts at Amherst, where he is now a Senior Research Fellow. His research interests include 3D computer vision, Human-Computer Interaction (HCI), virtual/augmented reality, video representation, and various applications in education, environment, robotics, surveillance, and transportation. His recent research activities include: an integrated visual navigation approach with panoramic, omnidirectional, and stereo vision sensors; new 3D layered representations for image-based rendering and robot navigation; a novel algorithm and system of stereo mosaics for airborne environment monitoring; a virtual stereo vision system using panoramic sensors for distributed and cooperative robots; and an online 3D Virtual Classroom using image-based modeling/rendering, human tracking, and multi-modal information extraction for advanced e-learning. At Tsinghua University, he conducted more than seven NSF, High Tech, and Advanced Research projects as PI or Co-PI. His Ph.D. thesis, On Environment Modeling for Visual Navigation, received a special award in 1999 as one of the top 100 dissertations in China over the preceding three years, and a book based on the thesis has been published by China Higher Education Press. He is currently involved in several DARPA, NSF, and industrial projects at UMass, focusing on scene modeling, human tracking, and advanced HCI for various applications.