Publication Abstracts

Contents (in reverse chronological order)

RGB-D Camera Calibration and Trajectory Estimation with Applications to Indoor Mapping
PhotoSketch: A Photocentric Urban 3D Modeling System
Chronic Hypertension Increases Aortic Endothelial Hydraulic Conductivity by Upregulating Endothelial Aquaporin-1 Expression
Multiscale 3D Feature Extraction and Matching with an Application to 3D Face Recognition
Image Warping for Retargeting Garments Among Arbitrary Poses
Page Turning Solutions for Musicians: A Survey
Multiscale 3D Feature Extraction and Matching
Lightweight 3D Modeling of Urban Buildings from Range Data
Integrating Automated Range Registration with Multiview Geometry for the Photorealistic Modeling of Large-Scale Scenes
Think Globally, Cluster Locally: A Unified Framework for Range Segmentation
LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System
LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System
Remote Voice Acquisition in Multimodal Surveillance
Multiview Geometry for Texture Mapping 2D Images Onto 3D Range Data
3D Modeling Using Planar Segments and Mesh Elements
Dynamic 3D Urban Scene Modeling Using Multiple Pushbroom Mosaics
Content-Based 3D Mosaics for Dynamic Urban 3D Scenes
Image Registration Using Log-Polar Mappings for Recovery of Large-Scale Similarity and Projective Transformations
Content-Based 3D Mosaic Representation for Video of Dynamic 3D Scenes
Integrating LDV Audio and IR Video for Remote Multimodal Surveillance
3D and Moving Target Extraction from Dynamic Pushbroom Stereo Mosaics
Sampling, Reconstruction, and Antialiasing
Rendering Traditional Mosaics
An Energy Minimization Framework for Monotonic Cubic Spline Interpolation
One-Dimensional Resampling with Inverse and Forward Mapping Functions
Robust Image Registration Using Log-Polar Transform
Image Registration for Perspective Deformation Recovery
Monotonic Cubic Spline Interpolation
Image Morphing: A Survey
Polymorph: Morphing Among Multiple Images
Nonuniform Image Reconstruction Using Multilevel Surface Interpolation
Scattered Data Interpolation Using Multilevel B-splines
Image Metamorphosis With Scattered Feature Constraints
Recent Advances in Image Morphing
Restoration of Images Scanned in the Presence of Vibrations
Inverting Input Scanner Vibration Errors
Image Metamorphosis Using Snakes and Free-Form Deformations
Characterization of Vibration-Induced Image Defects in Input Scanners
Restoration of Images Degraded By Input Scanner Vibrations
Fast Convolution With Packed Lookup Tables
A Two-Pass Mesh Warping Implementation of Morphing
A Fast Algorithm for Digital Image Scaling
Local Image Reconstruction and Subpixel Restoration Algorithms
Correcting Chromatic Aberration Using Image Warping
Separable Image Warping With Spatial Lookup Tables
Skeleton-Based Image Warping
Image Warping Among Arbitrary Planar Shapes
A Syntactic Omni-font Character Recognition System
An Algorithm for the Segmentation of Bilevel Images
Restoration of Degraded Binary Images Using Stochastic Relaxation with Annealing

RGB-D Camera Calibration and Trajectory Estimation with Applications to Indoor Mapping

Autonomous Robots, vol. 44, no. 5, pp. 1485-1503, November 2020.

In this paper, we present a system for estimating the trajectory of a moving RGB-D camera, with applications to building maps of large indoor environments. Unlike most current research, we propose a ‘feature model’-based RGB-D visual odometry system for a computationally constrained mobile platform, where the ‘feature model’ is persistent and dynamically updated from new observations using a Kalman filter. We first propose a mixture-of-Gaussians model for estimating random depth noise, which is used to describe the spatial uncertainty of the feature point cloud. We also introduce a general depth calibration method to remove systematic errors in the depth readings of the RGB-D camera. We provide comprehensive theoretical and experimental analysis to demonstrate that our model-based iterative-closest-point (ICP) algorithm can achieve much higher localization accuracy than conventional ICP. The visual odometry runs at 30 Hz or higher on VGA images, in a single thread on a desktop CPU, with no GPU acceleration required. Finally, we examine the problem of place recognition from RGB-D images in order to form a pose-graph SLAM approach for refining the trajectory and closing loops. We evaluate the effectiveness of the system using publicly available datasets with ground-truth data. The entire system is freely available online as open-source software.
[Download ar20.pdf (4344599 bytes)]
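
One Kalman-filter update of the persistent feature model can be sketched for a single 3D feature point. This is a hypothetical helper, not the released system: it assumes the new observation has already been transformed into the model frame and that its 3x3 covariance is supplied by a depth-noise model (the paper uses a mixture of Gaussians for that; here the covariance is simply an input). Because the state is the point itself, the measurement matrix is the identity.

```python
import numpy as np

def fuse_feature(mean, cov, obs, obs_cov):
    """Kalman update of one persistent 3D feature point (hypothetical helper).

    mean, cov: current model estimate (3-vector, 3x3 covariance).
    obs, obs_cov: new observation in the model frame and its 3x3 covariance.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    obs = np.asarray(obs, dtype=float)
    obs_cov = np.asarray(obs_cov, dtype=float)
    # Kalman gain K = P (P + R)^-1 for an identity measurement model.
    gain = cov @ np.linalg.inv(cov + obs_cov)
    new_mean = mean + gain @ (obs - mean)
    # Posterior covariance (I - K) P shrinks as observations accumulate.
    new_cov = (np.eye(3) - gain) @ cov
    return new_mean, new_cov
```

With equal uncertainties the fused point is the midpoint and the covariance halves; a noisier observation (larger `obs_cov`) pulls the estimate less.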

PhotoSketch: A Photocentric Urban 3D Modeling System

Visual Computer, vol. 34, no. 5, pp. 605-616, May 2018.

Online mapping services from Google, Apple, and Microsoft are exceedingly popular applications for exploring cities in 3D. Their explosive growth provides impetus for photorealistic 3D modeling of urban scenes. Although classical approaches such as multiview stereo and laser range scanning are traditional sources for detailed 3D models of existing structures, they generate heavyweight models that are not appropriate for the streaming data that these navigation applications leverage. Instead, lightweight models, as produced by interactive image-based tools, are better suited for this domain. The contribution of this work is that it merges the benefits of multiview geometry, an intuitive sketching interface, and dynamic texture mapping to produce lightweight photorealistic 3D models of buildings. We present experimental results from urban scenes using our PhotoSketch system.
[Download vc18.pdf (4155317 bytes)]

Chronic Hypertension Increases Aortic Endothelial Hydraulic Conductivity by Upregulating Endothelial Aquaporin-1 Expression

American J. Physiology Heart and Circulatory Physiology, vol. 313, no. 5, pp. H1063-H1073, July 2017.

Numerous studies have examined the role of aquaporins in osmotic water transport in various systems, but virtually none have focused on the role of aquaporins in hydrostatically driven water transport involving mammalian cells, save for our laboratory's recent study of aortic endothelial cells. Here, we investigated aquaporin-1 expression and function in the aortic endothelium in two high-renin rat models of hypertension: the spontaneously hypertensive rat, a genetically altered Wistar-Kyoto variant, and Sprague-Dawley rats made hypertensive by two-kidney, one-clip Goldblatt surgery. We measured aquaporin-1 expression in aortic endothelial cells from whole rat aortas by quantitative immunohistochemistry, and we measured function via the pressure-driven hydraulic conductivities of excised rat aortas with both intact and denuded endothelia on the same vessel. We used these conductivities to calculate the effective intimal hydraulic conductivity, which combines endothelial and subendothelial components. We observed well-correlated enhancements in aquaporin-1 expression and function in both hypertensive rat models, as well as in aortas from normotensive rats whose expression was upregulated by 2 h of forskolin treatment. Upregulated aquaporin-1 expression and function may be a response to hypertension that critically determines conduit artery vessel wall viability and long-term susceptibility to atherosclerosis.
[Download ajpheart17.pdf (635430 bytes)]
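
The abstract does not state how the effective intimal conductivity is computed from the intact and denuded measurements. A common approach, assumed here, treats the intima and the remaining wall as hydraulic resistances in series, so that 1/Lp_intact = 1/Lp_intima + 1/Lp_denuded. The helper name is hypothetical, and any consistent units (e.g., cm/s/mmHg) work.

```python
def intimal_conductivity(lp_intact, lp_denuded):
    """Effective intimal hydraulic conductivity under an assumed
    series-resistance model: 1/Lp_intact = 1/Lp_intima + 1/Lp_denuded."""
    # Removing the endothelium removes a resistance, so a denuded vessel
    # must conduct more water than the intact one.
    if lp_intact <= 0 or lp_denuded <= lp_intact:
        raise ValueError("expected 0 < lp_intact < lp_denuded")
    return 1.0 / (1.0 / lp_intact - 1.0 / lp_denuded)
```

Because the intimal term is a difference of reciprocals, it becomes sensitive to measurement noise when the intact and denuded values are close.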

Multiscale 3D Feature Extraction and Matching with an Application to 3D Face Recognition

Graphical Models, vol. 75, pp. 157-176, July 2013.

We present a new multiscale surface representation for 3D shape matching that is based on scale-space theory. The representation, Curvature Scale-Space 3D (CS3), is well-suited for measuring dissimilarity between (partial) surfaces having unknown position, orientation, and scale. The CS3 representation is obtained by evolving the surface curvatures according to the heat equation. This evolution process yields a stack of increasingly smoothed surface curvatures that is useful for keypoint extraction and descriptor computations. We augment this information with an associated scale parameter at each stack level to define our multiscale CS3 surface representation. The scale parameter is necessary for automatic scale selection, which has proven to be successful in 2D scale-invariant shape matching applications. We show that our keypoint and descriptor computation approach outperforms many of the leading methods. The main advantages of our representation are its computational efficiency, lower memory requirements, and ease of implementation.
[Download gmod13.pdf (2960968 bytes)]
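
The curvature evolution behind CS3 can be sketched as explicit heat diffusion of per-vertex curvature over the mesh connectivity. This is a simplified illustration under stated assumptions: curvature values are precomputed, every vertex has at least one neighbour, the uniform (umbrella) graph Laplacian stands in for whatever discretization the paper uses, and the scale parameter is taken to be the diffusion time.

```python
import numpy as np

def curvature_scale_space(curvature, neighbors, num_levels, dt=0.25):
    """Stack of progressively smoothed per-vertex curvature values.

    curvature: per-vertex values; neighbors: list of neighbour-index lists.
    Each level applies one explicit Euler step of dk/dt = Laplacian(k)
    using a uniform graph Laplacian (mean of neighbours minus self).
    Returns (stack, scales), where scales holds the diffusion time per level.
    """
    k = np.asarray(curvature, dtype=float)
    stack, scales = [k.copy()], [0.0]
    for level in range(1, num_levels):
        lap = np.array([np.mean(k[nbrs]) - k[i] for i, nbrs in enumerate(neighbors)])
        k = k + dt * lap
        stack.append(k.copy())
        scales.append(level * dt)
    return np.stack(stack), np.array(scales)
```

For this Laplacian, keeping `dt` below 1 keeps the explicit scheme stable; smaller values sample scale more finely at the cost of more levels.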

Image Warping for Retargeting Garments Among Arbitrary Poses

Visual Computer, vol. 29, no. 6, pp. 525-534, June 2013. DOI: 10.1007/s00371-013-0816-2.
Presented at Computer Graphics International, June 2013.

We address the problem of warping 2D images of garments onto target mannequins of arbitrary poses. The motivation for this work is to enable an online shopper to drag and drop selected articles of clothing onto a single mannequin to configure and visualize outfits. Such a capability requires each garment to be available in a pose that is consistent with the target mannequin. A 2D deformation system is proposed, which enables a designer to quickly deform images of clothing onto a target shape with both fine and coarse controls over the deformation. This system has retargeted thousands of images for retailers to establish virtual dressing rooms for their online customers.
[Download vc13.pdf (1539393 bytes)]

Page Turning Solutions for Musicians: A Survey

WORK: A Journal of Prevention, Assessment, and Rehabilitation, vol. 41, no. 1, pp. 37-52, January 2012.

Musicians have long been hampered by the challenge of turning sheet music while their hands are occupied playing an instrument. The sight of a human page turner assisting a pianist during a performance, for instance, is not uncommon. The need for a page-turning solution is no less acute during practice sessions, which account for the vast majority of playing time. Despite widespread appreciation of the problem, there have been virtually no robust and affordable products to assist the musician. Recent progress in assistive technology and electronic reading devices offers promising solutions to this long-standing problem. The objective of this paper is to survey the technology landscape and assess the benefits and drawbacks of page-turning solutions for musicians. A full range of mechanical and digital page-turning products is reviewed.
[Download work12.pdf (2863525 bytes)]

Multiscale 3D Feature Extraction and Matching

Proc. 3D Data Imaging, Modeling, Processing, Visualization, and Transmission (3DIMPVT11), Hangzhou, China, May 2011.

Partial 3D shape matching refers to the process of computing a similarity measure between partial regions of 3D objects. This remains a difficult challenge without a priori knowledge of the scale of the input objects, as well as their rotation and translation. This paper focuses on the problem of partial shape matching among 3D objects of unknown scale. We consider the problem of face detection on arbitrary 3D surfaces and introduce a multiscale surface representation for feature extraction and matching. This work is motivated by the scale-space theory for images. Scale-space based techniques have proven very successful for dealing with noise and scale changes in matching applications for 2D images. However, efficient and practical scale-space representations for 3D surfaces are lacking. Our proposed scale-space representation is defined in terms of the evolution of surface curvatures according to the heat equation. This representation is shown to be insensitive to noise, computationally efficient, and capable of automatic scale selection. Examples in face detection and surface registration are given.
[Download 3dimpvt11a.pdf (1764745 bytes)]

Lightweight 3D Modeling of Urban Buildings from Range Data

Proc. 3D Data Imaging, Modeling, Processing, Visualization, and Transmission (3DIMPVT11), Hangzhou, China, May 2011, pp. 124-131.

Laser range scanners are widely used to acquire accurate scene measurements. The massive point clouds they generate, however, present challenges to efficient modeling and visualization. State-of-the-art techniques for generating 3D models from voluminous range data are well known to demand large computational and storage resources. In this paper, attention is directed to the modeling of urban buildings directly from range data. We present an efficient modeling algorithm that exploits a priori knowledge that buildings can be modeled from cross-sectional contours using extrusion and tapering operations. Inspired by this simple workflow, we identify key cross-sectional slices within the point cloud. These slices capture changes across the building facade along the principal axes. Standard image processing algorithms are used to remove noise, fill missing data, and vectorize the projected points into planar contours. Applying extrusion and tapering operations to these contours permits us to achieve dramatic geometry compression, making the resulting models suitable for web-based applications such as Google Earth or Microsoft Virtual Earth. This work has applications in architecture, urban design, virtual city touring, and online gaming. We present experimental results on synthetic and real urban building datasets to validate the proposed algorithm.
[Download 3dimpvt11b.pdf (4768294 bytes)]
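
The extrusion and tapering operations can be illustrated on a single cross-sectional contour. In this sketch (a hypothetical helper), the contour is assumed to be a counterclockwise polygon, tapering scales the top cap about the vertex average, and faces are returned as vertex-index tuples; the paper's slice selection and contour vectorization are not shown.

```python
def extrude_taper(contour, height, taper=1.0):
    """Extrude a closed counterclockwise 2D contour [(x, y), ...] along +z.

    The top cap is scaled about the vertex average by `taper`
    (1.0 gives a straight prism). Returns (vertices, faces).
    """
    n = len(contour)
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    bottom = [(x, y, 0.0) for x, y in contour]
    top = [(cx + taper * (x - cx), cy + taper * (y - cy), float(height))
           for x, y in contour]
    vertices = bottom + top
    # One outward-facing quad per contour edge: bottom i, i+1, then top i+1, i.
    faces = [(i, (i + 1) % n, n + (i + 1) % n, n + i) for i in range(n)]
    faces.append(tuple(range(n - 1, -1, -1)))  # bottom cap, reversed to face down
    faces.append(tuple(range(n, 2 * n)))       # top cap, facing up
    return vertices, faces
```

A square contour becomes 8 vertices and 6 faces no matter how many range points supported it, which is the kind of geometry compression the abstract describes.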

Integrating Automated Range Registration with Multiview Geometry for the Photorealistic Modeling of Large-Scale Scenes

Intl. Journal of Computer Vision, vol. 78, no. 2-3, pp. 237-260, July 2008.

The photorealistic modeling of large-scale scenes, such as urban structures, requires a fusion of range sensing technology and traditional digital photography. This paper presents a system that integrates automated 3D-to-3D and 2D-to-3D registration techniques with multiview geometry for the photorealistic modeling of urban scenes. The 3D range scans are registered using our automated 3D-to-3D registration method that matches 3D features (linear or circular) in the range images. A subset of the 2D photographs is then aligned with the 3D model using our automated 2D-to-3D registration algorithm that matches linear features between the range scans and the photographs. Finally, the 2D photographs are used to generate a second 3D model of the scene that consists of a sparse 3D point cloud, produced by applying a multiview geometry (structure-from-motion) algorithm directly on a sequence of 2D photographs. The last part of this paper introduces a novel algorithm for automatically recovering the rotation, scale, and translation that best aligns the dense and sparse models. This alignment is necessary to enable the photographs to be optimally texture mapped onto the dense model. The contribution of this work is that it merges the benefits of multiview geometry with automated registration of 3D range scans to produce photorealistic models with minimal human interaction. We present results from experiments in large-scale urban scenes.
[Download ijcv08.pdf (4740673 bytes)]
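
Once point correspondences between the sparse structure-from-motion model and the dense range model are available, the rotation, scale, and translation can be recovered in closed form. The sketch below uses Umeyama's standard least-squares method as a stand-in; it is not the paper's algorithm, which must also establish those correspondences automatically.

```python
import numpy as np

def similarity_align(src, dst):
    """Least-squares similarity transform with dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points. Follows Umeyama's
    closed-form SVD solution and returns (s, R, t).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d
    cov = xd.T @ xs / len(src)           # cross-covariance of centred points
    U, S, Vt = np.linalg.svd(cov)
    D = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        D[2, 2] = -1.0                   # force a proper rotation, not a reflection
    R = U @ D @ Vt
    var_s = (xs ** 2).sum() / len(src)   # variance of the source points
    s = np.trace(np.diag(S) @ D) / var_s
    t = mu_d - s * R @ mu_s
    return s, R, t
```

In a full pipeline, a robust wrapper such as RANSAC would typically guard this solve against incorrect correspondences.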

Think Globally, Cluster Locally: A Unified Framework for Range Segmentation

4th Intl. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT08), Georgia Institute of Technology, Atlanta, GA, June 2008.

Modern range scanners can capture the geometry of large urban scenes on an unprecedented scale. While the volume of data is overwhelming, urban scenes can be approximated well by parametric surfaces such as planes. A piecewise planar representation can reduce the size of the data dramatically. Furthermore, it is ideal for rendering and other high-level applications. We present a segmentation algorithm that extracts a piecewise planar function from a large range image. Many existing algorithms for large datasets apply planar criteria locally to achieve efficient segmentations. Our novel framework combines local and global approximants to guarantee truly planar components in the output. To demonstrate the effectiveness of our approach, we present an evaluation method for piecewise planar segmentation results based on the minimum description length principle. We compare our method to region growing on simulated and actual data. Finally, we present results on large-scale range images acquired at New York's Grand Central Terminal.
[Download 3dpvt08.pdf (1148341 bytes)]
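
A global planarity check, as opposed to a purely local criterion, can be approximated by fitting one plane to an entire candidate segment and inspecting the residual. The total-least-squares fit below is an illustrative assumption; the paper's own evaluation relies on the minimum description length principle, which is not reproduced here.

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane n . x = d for an (N, 3) point set.

    Returns the unit normal n, offset d, and RMS point-to-plane distance.
    """
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the normal.
    _, _, Vt = np.linalg.svd(pts - centroid)
    normal = Vt[-1]
    d = normal @ centroid
    rms = np.sqrt(np.mean((pts @ normal - d) ** 2))
    return normal, d, rms
```

A segment could then be accepted as planar when `rms` falls below a sensor-dependent tolerance; note that the sign of the normal (and hence of `d`) is arbitrary.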

LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System

Chapter 4 in Multimodal Surveillance: Sensors, Algorithms and Systems, Z. Zhu and T. S. Huang (eds.), ISBN-10: 1596931841, Artech House, July 2007, pp. 59-90.

LDV Sensing and Processing for Remote Hearing in a Multimodal Surveillance System

Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 2007.

Recent improvements in laser Doppler vibrometry (LDV), day/night infrared (IR), and electro-optical (EO) imaging technology have made it possible to build a long-range multimodal surveillance system. This multimodal capability would greatly improve security force performance through clandestine listening to targets that are probing or penetrating a perimeter defense. This system could also provide the feeds for advanced face and voice recognition systems. The study of the capabilities of these three types of sensors is critical to such surveillance tasks. IR and EO cameras have been studied and widely used in human and vehicle detection in traffic and surveillance applications. Laser Doppler vibrometers can effectively detect vibration within two hundred meters with a sensitivity on the order of 1 μm/s. These instruments have been used to measure the vibrations of civil structures such as high-rise buildings, bridges, and towers at distances of up to 200 m. However, literature on remote acoustic detection using emerging LDVs is rare. Therefore, we mainly focus on the experimental study of LDV-based voice detection in the context of a multimodal surveillance system. This paper presents an overall picture of our technical approach: the integration of laser Doppler vibrometry and IR/color imaging for multimodal surveillance.
[Download cvpr07.pdf (95620 bytes)]

Remote Voice Acquisition in Multimodal Surveillance

Proc. IEEE Conf. on Multimedia and Expo (ICME 2006), pp. 1649-1652, Toronto, Canada, July 2006.

Multimodal surveillance systems using visible/IR cameras and other sensors are widely deployed today for security purposes, particularly when subjects are at a large distance. However, audio, an important data source, has not been well explored. One reason is that audio detection using microphones requires installation close to the monitored subjects. In this paper, we investigate a novel "optical" sensor, the laser Doppler vibrometer (LDV), for capturing voice signals at very long range to realize a truly remote multimodal surveillance system. Speech enhancement approaches are studied based on the characteristics of LDV audio. Experimental results show that remote voice detection via an LDV is promising when appropriate targets close to human subjects in the environment are chosen.
[Download icme06.pdf (447656 bytes)]

Multiview Geometry for Texture Mapping 2D Images Onto 3D Range Data

Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2293-2300, June 2006.

The photorealistic modeling of large-scale scenes, such as urban structures, requires a fusion of range sensing technology and traditional digital photography. This paper presents a system that integrates multiview geometry and automated 3D registration techniques for texture mapping 2D images onto 3D range data. The 3D range scans and the 2D photographs are respectively used to generate a pair of 3D models of the scene. The first model consists of a dense 3D point cloud, produced by using a 3D-to-3D registration method that matches 3D lines in the range images. The second model consists of a sparse 3D point cloud, produced by applying a multiview geometry (structure-from-motion) algorithm directly on a sequence of 2D photographs. This paper introduces a novel algorithm for automatically recovering the rotation, scale, and translation that best aligns the dense and sparse models. This alignment is necessary to enable the photographs to be optimally texture mapped onto the dense model. The contribution of this work is that it merges the benefits of multiview geometry with automated registration of 3D range scans to produce photorealistic models with minimal human interaction. We present results from experiments in large-scale urban scenes.
[Download cvpr06.pdf (236121 bytes)]

3D Modeling Using Planar Segments and Mesh Elements

3rd Intl. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT06), University of North Carolina, Chapel Hill, June 2006.

Range sensing technology allows the photorealistic modeling of large-scale scenes, such as urban structures. The generated 3D representations, after automated registration, are useful for urban planning, historical preservation, or virtual reality applications. One major issue in 3D modeling of complex large-scale scenes is that the final result is a dense, complicated mesh. Significant, in some cases manual, postprocessing (mesh simplification, hole filling) is required to make this representation usable by graphics or CAD applications. This paper presents a 3D modeling approach that models large planar areas of the scene with planar primitives (extracted via a segmentation preprocess) and non-planar areas with mesh primitives. As a result, the final model is significantly compressed. In addition, lines of intersection between neighboring planes are modeled explicitly. These steps bring the model closer to graphics/CAD applications. We present results from experiments with complex range scans from urban structures and from the interior of a large-scale landmark urban building (Grand Central Terminal, NYC).
[Download 3dpvt06a.pdf (781883 bytes)]
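
Modeling the line where two neighboring planar segments meet reduces to intersecting two planes. A minimal sketch, assuming each plane is stored as a normal n and offset d with n . x = d (a hypothetical representation; the helper name is also illustrative):

```python
import numpy as np

def plane_intersection(n1, d1, n2, d2, eps=1e-9):
    """Intersect planes n1 . x = d1 and n2 . x = d2.

    Returns (point, unit_direction) for the line, or None if the planes
    are (nearly) parallel. The point is the one closest to the origin.
    """
    n1 = np.asarray(n1, dtype=float)
    n2 = np.asarray(n2, dtype=float)
    direction = np.cross(n1, n2)
    denom = direction @ direction
    if denom < eps:
        return None
    # ((d1 n2 - d2 n1) x u) / |u|^2 lies on both planes and is orthogonal to u.
    point = np.cross(d1 * n2 - d2 * n1, direction) / denom
    return point, direction / np.sqrt(denom)
```

In practice the infinite line would then be clipped to the extent where the two segments actually share a boundary.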

Dynamic 3D Urban Scene Modeling Using Multiple Pushbroom Mosaics

3rd Intl. Symposium on 3D Data Processing, Visualization, and Transmission (3DPVT06), University of North Carolina, Chapel Hill, June 2006.

In this paper, a unified, segmentation-based approach is proposed to deal with both stereo reconstruction and moving object detection problems using multiple stereo mosaics. Each set of parallel-perspective (pushbroom) stereo mosaics is generated from a video sequence captured by a single video camera. First, a color-segmentation approach is used to extract the so-called natural matching primitives from a reference view of a pair of stereo mosaics to facilitate the 3D reconstruction of both textureless urban scenes and man-made moving targets (e.g., vehicles). Multiple pairs of stereo mosaics are used to improve the accuracy and robustness of 3D recovery and occlusion handling. Moving targets are detected by inspecting their 3D anomalies: they either violate the epipolar geometry of the pushbroom stereo or exhibit abnormal 3D structure. Experimental results on both simulated and real video sequences are provided to show the effectiveness of our approach.
[Download 3dpvt06b.pdf (3262183 bytes)]

Content-Based 3D Mosaics for Dynamic Urban 3D Scenes

Proc. SPIE: Defense and Security Symposium, Orlando, Florida, April 2006.

We propose a content-based 3D mosaic (CB3M) representation for long video sequences of 3D and dynamic scenes captured by a camera on a mobile platform. The camera has a dominant direction of motion (as on an airplane or ground vehicle), but 6 DOF motion is allowed. In the first step, a set of parallel-perspective (pushbroom) mosaics with varying viewing directions is generated to capture both the 3D and dynamic aspects of the scene under the camera coverage. In the second step, a segmentation-based stereo matching algorithm is applied to extract parametric representations of the color, structure, and motion of the dynamic and/or 3D objects in urban scenes where many planar surfaces exist. Multiple pairs of stereo mosaics are used to facilitate reliable stereo matching, occlusion handling, accurate 3D reconstruction, and robust moving target detection. We use the fact that all static objects obey the epipolar geometry of pushbroom stereo, whereas an independently moving object either violates the epipolar geometry, if its motion is not in the direction of sensor motion, or exhibits unusual 3D structure. The CB3M is a highly compressed visual representation for a very long video sequence of a dynamic 3D scene. More importantly, the CB3M representation captures object content in terms of both 3D structure and motion. Experimental results are given for the CB3M construction for both simulated and real video sequences to show the accuracy and effectiveness of the representation.
[Download spie06.pdf (2221849 bytes)]
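
The moving-target test described in these pushbroom-mosaic abstracts can be illustrated on individual matches. This hypothetical helper assumes an idealized, rectified mosaic pair in which static points shift only along the mosaic's y axis (taken here as the dominant motion direction) and in which the range of disparities for static scene depths is known in advance; a real pipeline would reason over segmented regions rather than single points.

```python
def classify_matches(matches, tol=1.0, disparity_range=(0.0, 50.0)):
    """Label matches between two pushbroom stereo mosaics (hypothetical helper).

    matches: list of ((x1, y1), (x2, y2)) pixel coordinates.
    tol: allowed displacement across the assumed epipolar direction (pixels).
    disparity_range: (lo, hi) y-disparities plausible for static scene depths.
    """
    lo, hi = disparity_range
    labels = []
    for (x1, y1), (x2, y2) in matches:
        if abs(x2 - x1) > tol:
            labels.append("epipolar_violation")  # motion not along the flight direction
        elif not lo <= (y2 - y1) <= hi:
            labels.append("3d_anomaly")          # implausible depth for a static point
        else:
            labels.append("static")
    return labels
```

A target moving exactly along the flight direction passes the first test, which is why the second, depth-plausibility test is needed.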

Image Registration Using Log-Polar Mappings for Recovery of Large-Scale Similarity and Projective Transformations

IEEE Trans. Image Processing, vol. 14, no. 10, pp. 1422-1434, October 2005.

This paper describes a novel technique to recover large similarity transformations (rotation/scale/translation) and moderate perspective deformations among image pairs. We introduce a hybrid algorithm that features log-polar mappings and nonlinear least squares optimization. The use of log-polar techniques in the spatial domain is introduced as a preprocessing module to recover large scale changes (e.g., at least four-fold) and arbitrary rotations. Although log-polar techniques are used in the Fourier-Mellin transform to accommodate rotation and scale in the frequency domain, their use in registering images subjected to very large scale changes has not yet been exploited in the spatial domain. In this paper, we demonstrate the superior performance of the log-polar transform in featureless image registration in the spatial domain. We achieve subpixel accuracy through the use of nonlinear least squares optimization. The registration process yields the eight parameters of the perspective transformation that best aligns the two input images. Extensive testing was performed on uncalibrated real images and an array of 10,000 image pairs with known transformations derived from the Corel Stock Photo Library of royalty-free photographic images.
[Download tip05.pdf (2405299 bytes)]
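
The log-polar mapping at the core of this approach can be sketched with nearest-neighbour sampling. This covers only the resampling step, under assumed grid sizes and a hypothetical helper name; the paper's search over the mapped images and its nonlinear least squares refinement are not shown.

```python
import numpy as np

def log_polar(image, center, num_r, num_theta):
    """Nearest-neighbour log-polar resampling of a 2D grayscale image.

    center is (row, col). Output rows sample log-radius from radius 1 up to
    the largest radius that stays inside the image; columns sample angle
    uniformly in [0, 2*pi).
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    cy, cx = center
    max_r = min(cy, cx, h - 1 - cy, w - 1 - cx)
    if max_r < 1:
        raise ValueError("center too close to the image border")
    log_r = np.linspace(0.0, np.log(max_r), num_r)
    theta = np.linspace(0.0, 2.0 * np.pi, num_theta, endpoint=False)
    radius = np.exp(log_r)[:, None]                 # shape (num_r, 1)
    rows = np.rint(cy + radius * np.sin(theta)).astype(int)
    cols = np.rint(cx + radius * np.cos(theta)).astype(int)
    return img[rows, cols]                          # shape (num_r, num_theta)
```

Scaling the input about the center by a factor s shifts content by log(s) / (log(max_r) / (num_r - 1)) rows, and rotating it by φ radians shifts it circularly by num_theta · φ / (2π) columns, which is what turns rotation and scale recovery into a translation search.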

Content-Based 3D Mosaic Representation for Video of Dynamic 3D Scenes

IEEE/AIPR Workshop on Multi-Modal Imaging, Washington, DC, October 2005.

We propose a content-based three-dimensional (3D) mosaic representation for long video sequences of 3D and dynamic scenes captured by a camera on a mobile platform. The camera has a dominant direction of motion (as on an airplane or ground vehicle), but 6 degrees-of-freedom (DOF) motion is allowed. In the first step, a pair of generalized parallel-perspective (pushbroom) stereo mosaics is generated that captures both the 3D and dynamic aspects of the scene under the camera coverage. In the second step, a segmentation-based stereo matching algorithm is applied to extract parametric representations of the color, structure, and motion of the dynamic and/or 3D objects in urban scenes where many planar surfaces exist. Based on these results, the content-based 3D mosaic (CB3M) representation is created, which is a highly compressed visual representation for very long video sequences of dynamic 3D scenes. Experimental results are given.
[Download aipr05.pdf (2571197 bytes)]

Integrating LDV Audio and IR Video for Remote Multimodal Surveillance

IEEE Workshop on Object Tracking and Classification In and Beyond the Visible Spectrum, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 2005.

This paper describes a multimodal surveillance system for human signature detection. The system consists of three types of sensors: infrared (IR) cameras, pan/tilt/zoom (PTZ) color cameras, and laser Doppler vibrometers (LDVs). The LDV is explored as a new non-contact remote voice detector. We have found that voice energy vibrates most objects and that the vibrations can be detected by an LDV. Since signals captured by the LDV are very noisy, we have designed algorithms with Gaussian bandpass filtering and adaptive volume scaling to enhance the LDV voice signals. The enhanced voice signals are intelligible from targets without retro-reflective finishes at short or medium distances (100 m). By using retro-reflective tape, the distance can be extended to as far as 300 m. However, manually searching for and focusing the laser beam on a target that offers both vibration and reflection is very difficult at medium and large distances. Therefore, IR imaging for target selection and localization is also discussed. Future work remains in automatic LDV targeting and intelligent refocusing for long-range LDV listening.
[Download otcbvs05.pdf (726400 bytes)]
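
The two enhancement steps named above, Gaussian bandpass filtering and adaptive volume scaling, can be sketched as follows. Every parameter here (a 1 kHz center, 500 Hz width, 20 ms frames, target RMS, gain cap) is a hypothetical choice for illustration, not a value from the paper, and filtering the whole signal in one FFT assumes an offline setting.

```python
import numpy as np

def enhance_ldv(signal, fs, center_hz=1000.0, width_hz=500.0,
                frame_ms=20.0, target_rms=0.1, max_gain=20.0):
    """Gaussian bandpass filtering followed by frame-wise volume scaling.

    All parameter defaults are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    # Gaussian-shaped pass band centred on center_hz with std dev width_hz.
    response = np.exp(-0.5 * ((freqs - center_hz) / width_hz) ** 2)
    y = np.fft.irfft(np.fft.rfft(x) * response, n=len(x))
    frame = max(1, int(round(frame_ms * fs / 1000.0)))
    out = np.empty_like(y)
    for start in range(0, len(y), frame):
        seg = y[start:start + frame]
        rms = np.sqrt(np.mean(seg ** 2))
        # Bring each frame toward target_rms, capping the boost so that
        # near-silent frames are not amplified into loud noise.
        gain = max_gain if rms == 0 else min(target_rms / rms, max_gain)
        out[start:start + frame] = seg * gain
    return out
```

The gain cap is the main design knob: without it, frames containing only sensor noise would be raised to the same loudness as speech.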

3D and Moving Target Extraction from Dynamic Pushbroom Stereo Mosaics

IEEE Workshop on Advanced 3D Imaging for Safety and Security, in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 2005.

In this paper, we propose a dynamic pushbroom stereo mosaic approach for representing and extracting 3D structures and independent moving targets from urban 3D scenes. Our goal is to acquire panoramic mosaic maps with motion tracking information for 3D (moving) targets using a light aerial vehicle equipped with a video camera flying over an unknown area for urban surveillance. In dynamic pushbroom stereo mosaics, independent moving targets can be easily identified in the matching process of stereo mosaics by detecting the "out-of-place" regions that violate epipolar constraints and/or give 3D anomalies. We propose a segmentation-based stereo matching approach with natural matching primitives to estimate the 3D structure of the scene, particularly the ground structures (e.g., roads) on which humans or vehicles move, and then to identify moving targets and to measure their 3D structures and movements.
[Download a3diss05.pdf (1431382 bytes)]

Sampling, Reconstruction, and Antialiasing

CRC Handbook of Computer Science and Engineering, CRC Press, 1997, 2004.

This chapter reviews the principal ideas of sampling theory, reconstruction, and antialiasing. Sampling theory is central to the study of sampled-data systems, e.g., digital image transformations. It lays a firm mathematical foundation for the analysis of sampled signals, offering invaluable insight into the problems and solutions of sampling. It does so by providing an elegant mathematical formulation describing the relationship between a continuous signal and its samples. We use it to resolve the problems of image reconstruction and aliasing. Reconstruction is an interpolation procedure applied to the sampled data. It permits us to evaluate the discrete signal at any desired position, not just the integer lattice upon which the sampled signal is given. This is useful when implementing geometric transformations, or warps, on the image. Aliasing refers to the presence of unreproducibly high frequencies in the image and the resulting artifacts that arise upon undersampling. Together with defining theoretical limits on the continuous reconstruction of discrete input, sampling theory yields the guidelines for numerically measuring the quality of various proposed filtering techniques. This proves most useful in formally describing reconstruction, aliasing, and the filtering necessary to combat the artifacts that may appear at the output.
[Download crc04.pdf (5827433 bytes)]
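
As a concrete instance of reconstruction, evaluating a discrete signal at an arbitrary real position, the sketch below uses the cubic convolution kernel (Keys, a = -0.5). This is one widely used reconstruction kernel chosen for illustration; the chapter's own selection of filters is not implied, and clamping indices at the borders is an assumed choice.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel: 1 at 0, 0 at other integers, support [-2, 2]."""
    x = abs(x)
    if x < 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0

def reconstruct(samples, x, a=-0.5):
    """Evaluate unit-spaced samples at real position x by cubic convolution."""
    n = len(samples)
    base = math.floor(x)
    total = 0.0
    # Only the four samples within distance 2 of x contribute.
    for k in range(base - 1, base + 3):
        idx = min(max(k, 0), n - 1)  # clamp indices at the borders
        total += samples[idx] * cubic_kernel(x - k, a)
    return total
```

Because the kernel is 1 at zero and 0 at every other integer, the reconstruction passes through the original samples; between them it blends four neighbours.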

Rendering Traditional Mosaics

Visual Computer, vol. 19, pp. 67-78, 2003.

This paper discusses the principles of traditional mosaics, and describes a technique for implementing a digital mosaicing system. The goal of this work is to transform digital images into traditional mosaic-like renderings. We achieve this effect by recovering freeform feature curves from the image and laying rows of tiles along these curves. Composition rules are applied to merge these tiles into an intricate jigsaw that conforms to classical mosaic styles. Mosaic rendering offers the user flexibility over every aspect of this craft, including tile arrangement, shapes, and colors. The result is a system that makes this wonderful craft more flexible and widely accessible than previously possible.
[Download vc03.pdf (1282445 bytes)]
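One step named above is laying rows of tiles along a feature curve. A minimal sketch of that step walks a polyline and emits a tile position and orientation at fixed arc-length spacing. The function name, the polyline input, and the constant spacing are all assumptions. The sketch does not cover tile shapes, feature-curve extraction, or the composition rules that merge tiles.

```python
import math

def place_tiles(polyline, spacing):
    """Return (x, y, angle) tile placements every `spacing` units of arc length
    along a polyline given as a list of (x, y) vertices."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    tiles = []
    offset = 0.0                      # arc length into the current segment of the next tile
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        if length == 0:
            continue                  # skip repeated vertices
        angle = math.atan2(y1 - y0, x1 - x0)   # align the tile with the curve tangent
        d = offset
        while d <= length:
            t = d / length
            tiles.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0), angle))
            d += spacing
        offset = d - length           # carry the remainder into the next segment
    return tiles
```

The leftover distance is carried from one segment into the next. This keeps the spacing uniform across corners and avoids placing two tiles at the same vertex.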

An Energy Minimization Framework for Monotonic Cubic Spline Interpolation

Journal of Computational and Applied Mathematics, vol. 143, no. 2, pp. 145-188, 2002.

This paper describes the use of cubic splines for interpolating monotonic data sets. Interpolating cubic splines are popular for fitting data because they use low-order polynomials and have $C^2$ continuity, a property that permits them to satisfy a desirable smoothness constraint. Unfortunately, that same constraint often violates another desirable property: monotonicity. It is possible for a set of monotonically increasing (or decreasing) data points to yield a curve that is not monotonic, i.e., the spline may oscillate. In such cases, it is necessary to sacrifice some smoothness in order to preserve monotonicity.

The goal of this work is to determine the smoothest possible curve that passes through its control points while simultaneously satisfying the monotonicity constraint. We first describe a set of conditions that form the basis of the monotonic cubic spline interpolation algorithm presented in this paper. The conditions are simplified and consolidated to yield a fast method for determining monotonicity. This result is applied within an energy minimization framework to yield linear and nonlinear optimization-based methods. We consider various energy measures for the optimization objective functions. Comparisons among the different techniques are given, and superior monotonic $C^2$ cubic spline interpolation results are presented. Extensions to shape preserving splines and data smoothing are described.
[Download jcam02.pdf (370852 bytes)]
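The oscillation problem can be made concrete for a single cubic Hermite segment. The sketch below tests monotonicity using the classical Fritsch-Carlson conditions (SIAM J. Numer. Anal., 1980), which are stated in terms of the end slopes divided by the secant slope. This is a standard test. It is not necessarily the simplified conditions derived in the paper.

```python
def hermite_is_monotone(y0, y1, m0, m1, h=1.0):
    """True if the cubic Hermite segment from (0, y0) to (h, y1) with end slopes
    m0, m1 is monotone (Fritsch-Carlson necessary and sufficient conditions)."""
    delta = (y1 - y0) / h            # secant slope
    if delta == 0:
        return m0 == 0 and m1 == 0   # flat data: only the flat cubic is monotone
    a, b = m0 / delta, m1 / delta    # end slopes relative to the secant
    if a < 0 or b < 0:
        return False                 # an end slope points the wrong way
    if a + b - 2 <= 0 or 2 * a + b - 3 <= 0 or a + 2 * b - 3 <= 0:
        return True                  # the derivative's minimum on the interval is at an endpoint
    phi = a - (2 * a + b - 3) ** 2 / (3 * (a + b - 2))
    return phi >= 0                  # interior minimum of the derivative is nonnegative
```

Take a unit rise as an example. End slopes of 3 still give a monotone segment, but end slopes of 4 overshoot. A $C^2$ spline whose slopes fall outside this region is exactly the case in which some smoothness must be given up.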

One-Dimensional Resampling with Inverse and Forward Mapping Functions

Journal of Graphics Tools, vol. 5, no. 3, pp. 11-33, 2001.

Separable resampling algorithms significantly reduce the complexity of image warping. Fant presented a separable algorithm that is well suited for hardware implementation [IEEE CG&A, 1986]. That method, however, is inherently serial, and applies only when the inverse mapping is given. Wolberg presented another algorithm that is less suited for hardware implementation, and applies only when the forward mapping is given [Digital Image Warping, 1990]. This paper demonstrates the equivalence of the two algorithms in the sense that they produce identical output scanlines. We derive a variation of Fant's algorithm that applies when the forward mapping is given, and a variation of Wolberg's algorithm that applies when the inverse mapping is given. Integrated hardware implementations that perform 1-D resampling under either forward or inverse mappings are presented for both algorithms based on their software descriptions. The Fant algorithm has the advantage of being simple when implemented in hardware, while the Wolberg algorithm has the advantage of being parallelizable and facilitates a faster software implementation. The Wolberg algorithm also has the advantage of decoupling the roundoff errors made among intervals since it does not accrue errors through the incremental calculations required by the Fant algorithm.
[Download jgt01.pdf (209030 bytes)]
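The sketch below illustrates forward-mapped 1-D resampling with box-filter area sampling. It assumes the forward mapping is supplied as the output coordinate of every input pixel boundary, so `xmap` has one more entry than the scanline. Each input pixel's value is spread over the output pixels that its mapped interval overlaps, weighted by the overlap length. Every input interval is handled independently and nothing is accumulated incrementally, which is the property the abstract ties to parallelism and isolated roundoff. This is a simplified illustration, not a transcription of either published algorithm.

```python
import math

def resample_forward(src, xmap):
    """Resample one scanline under a forward mapping.

    src  : input pixel values, one per unit-width input pixel
    xmap : len(src) + 1 nondecreasing output coordinates; input pixel i
           covers the output interval [xmap[i], xmap[i+1]]
    Returns output pixels 0 .. ceil(xmap[-1]) - 1, each the overlap-weighted
    sum of the mapped input intervals that cover it.
    """
    n_out = max(0, int(math.ceil(xmap[-1])))
    out = [0.0] * n_out
    for i, value in enumerate(src):
        x0, x1 = xmap[i], xmap[i + 1]
        if x1 <= x0:
            continue                        # input pixel maps to zero width
        j = max(0, int(math.floor(x0)))
        while j < n_out and j < x1:
            overlap = min(x1, j + 1) - max(x0, j)
            if overlap > 0:
                out[j] += value * overlap   # contribution proportional to coverage
            j += 1
    return out
```

Output pixels have unit width, so where the mapped intervals tile the output without gaps, each output value is the average intensity over that pixel. Magnification therefore replicates values, and minification averages them.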

Robust Image Registration Using Log-Polar Transform

Proc. IEEE Intl. Conf. on Image Processing, Vancouver, Canada, September 2000.

This paper describes a hierarchical image registration algorithm for affine motion recovery. The algorithm estimates the affine transformation parameters necessary to register any two digital images misaligned due to rotation, scale, shear, and translation. The parameters are computed iteratively in a coarse-to-fine hierarchical framework using a variation of the Levenberg-Marquardt nonlinear least squares optimization method. This approach yields a robust solution that precisely registers images with subpixel accuracy. A log-polar registration module is introduced to accommodate arbitrary rotation angles and a wide range of scale changes. This serves to furnish a good initial estimate for the optimization-based affine registration stage. We demonstrate the hybrid algorithm on pairs of digital images subjected to large affine motion.
[Download icip00.pdf (169773 bytes)]
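Log-polar coordinates accommodate arbitrary rotations and large scale changes because a rotation-plus-scale about the center becomes a pure translation in (log r, theta). The sketch below checks this property point by point. The function names and the choice of center are assumptions. A registration module would still need to search for that translation between resampled images, for example by correlation, and that step is not shown.

```python
import math

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """(log r, theta) of point (x, y) about center (cx, cy); r must be > 0."""
    dx, dy = x - cx, y - cy
    return math.log(math.hypot(dx, dy)), math.atan2(dy, dx)

def similarity(x, y, scale, angle, cx=0.0, cy=0.0):
    """Scale by `scale` and rotate by `angle` (radians) about (cx, cy)."""
    dx, dy = x - cx, y - cy
    c, s = math.cos(angle), math.sin(angle)
    return cx + scale * (c * dx - s * dy), cy + scale * (s * dx + c * dy)

def log_polar_shift(x, y, scale, angle, cx=0.0, cy=0.0):
    """Shift in (log r, theta) that the similarity induces at the point (x, y)."""
    r0, t0 = to_log_polar(x, y, cx, cy)
    r1, t1 = to_log_polar(*similarity(x, y, scale, angle, cx, cy), cx, cy)
    dt = (t1 - t0 + math.pi) % (2 * math.pi) - math.pi   # wrap to [-pi, pi)
    return r1 - r0, dt
```

The shift is (log scale, angle) for every point, independent of position. Once that translation has been found, the scale is recovered as exp of the first component and the rotation as the second.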

Image Registration for Perspective Deformation Recovery

SPIE Conf. on Automatic Target Recognition X, Orlando, Florida, April 2000.

This paper describes a hierarchical image registration algorithm to infer the perspective transformation that best matches a pair of images. This work estimates the perspective parameters by approximating the transformation to be piecewise affine. We demonstrate the process by subdividing a reference image into tiles and applying affine registration to match them in the target image. The affine parameters are computed iteratively in a coarse-to-fine hierarchical framework using a variation of the Levenberg-Marquardt nonlinear least squares optimization method. This approach yields a robust solution that precisely registers image tiles with subpixel accuracy. The corresponding image tiles are used to estimate a global perspective transformation. We demonstrate this approach on pairs of digital images subjected to large perspective deformation.
[Download spie00.ps.gz (1595678 bytes)]
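Once tile correspondences are known, a single perspective (projective) transformation can be fit to them. One common method is a linear least-squares fit of the eight homography parameters with h33 fixed to 1. The NumPy sketch below takes that approach and assumes the tile centers and their registered positions are available as point lists. The abstract does not say which estimator the paper uses.

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares 3x3 homography H (with H[2, 2] = 1) mapping src[i] -> dst[i].
    Needs at least four correspondences in general position."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    h, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map an (n, 2) array of points through H, dividing out the projective scale."""
    pts = np.asarray(pts, float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]
```

Fitting to many tile correspondences instead of only four lets errors in individual tiles average out.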

Monotonic Cubic Spline Interpolation

Proc. Computer Graphics Intl. '99, Canmore, Canada, June 1999.

The goal of this work is to determine the smoothest possible curve that passes through its control points while simultaneously satisfying the monotonicity constraint. We first describe a set of conditions that form the basis of the monotonic cubic spline interpolation algorithm presented in this paper. The conditions are simplified and consolidated to yield a fast method for determining monotonicity. This result is applied within an energy minimization framework to yield linear and nonlinear optimization-based methods. We consider various energy measures for the optimization objective functions. Comparisons among the different techniques are given, and superior monotonic cubic spline interpolation results are presented.
[Download cgi99.ps (282295 bytes)]

Image Morphing: A Survey

Visual Computer, vol. 14, pp. 360-372, 1998.

Image morphing has been the subject of much attention in recent years. It has proven to be a powerful visual effects tool in film and television, depicting the fluid transformation of one digital image into another. This paper surveys the growth of this field and describes recent advances in image morphing in terms of three areas: feature specification, warp generation methods, and transition control. These areas relate to the ease of use and quality of results. We describe the role of radial basis functions, thin plate splines, energy minimization, and multilevel free-form deformations in advancing the state-of-the-art in image morphing. Recent work on a generalized framework for morphing among multiple images is described.
[Download vc98.pdf (640853 bytes)]

Polymorph: Morphing Among Multiple Images

IEEE Computer Graphics and Applications, vol. 18, no. 1, pp. 58-71, January-February 1998.

This paper presents polymorph, a novel algorithm for morphing among multiple images. Traditional image morphing generates a sequence of images depicting an evolution from one image into another. We extend this approach to permit morphed images to be derived from more than two images at once.

We formulate each input image to be a vertex of a simplex. An inbetween, or morphed, image is considered to be a point in the simplex. It is generated by computing a linear combination of the input images, with the weights derived from the barycentric coordinates of the point. To reduce run-time computation and memory overhead, we define a central image and use it as an intermediate node between the input images and the inbetween image. Preprocessing is introduced to resolve conflicting positions of selected features in input images when they are blended to generate a nonuniform inbetween image. We present warp propagation to efficiently derive warp functions among input images. Blending functions are effectively obtained by constructing surfaces that interpolate user-specified blending rates.

The polymorph algorithm furnishes a powerful tool for image composition which effectively integrates geometric manipulations and color blending. The algorithm is demonstrated with examples that seamlessly blend and manipulate facial features derived from various input images.
[Download cga98.pdf (1241464 bytes)]
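The blending step described above can be sketched for three input images, where the simplex is a triangle and the barycentric coordinates of a point inside it serve as the blending weights. The sketch assumes the images are already warped into alignment and stored as flat lists of intensities. Warp propagation, the central image, and nonuniform blending functions are not shown.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates (wa, wb, wc) of point p in triangle abc."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    wa = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    wb = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return wa, wb, 1.0 - wa - wb

def blend(images, weights):
    """Weighted sum of equally sized, already-aligned images (flat lists)."""
    return [sum(w * img[k] for w, img in zip(weights, images))
            for k in range(len(images[0]))]
```

A point at a vertex of the triangle reproduces that input image. A point inside the triangle mixes all three images in proportion to its barycentric weights.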

Nonuniform Image Reconstruction Using Multilevel Surface Interpolation

Proc. IEEE Intl. Conf. on Image Processing, Santa Barbara, California, October 1997.

This paper describes a fast algorithm for nonuniform image reconstruction. A multiresolution approach is formulated to compute a $C^2$-continuous surface through a set of irregularly spaced samples. The algorithm makes use of a coarse-to-fine hierarchy of control lattices to generate a sequence of surfaces whose sum approaches the desired interpolating surface. Experimental results demonstrate that high fidelity reconstruction is possible from a selected set of sparse and irregular samples.
[Download icip97.ps (873343 bytes)]

Scattered Data Interpolation Using Multilevel B-splines

IEEE Trans. Visualization and Computer Graphics, vol. 3, no. 3, pp. 228-244, July-September 1997.

This paper describes a fast algorithm for scattered data interpolation and approximation. Multilevel B-splines are introduced to compute a $C^2$-continuous surface through a set of irregularly spaced points. The algorithm makes use of a coarse-to-fine hierarchy of control lattices to generate a sequence of bicubic B-spline functions whose sum approaches the desired interpolation function. Large performance gains are realized by using B-spline refinement to reduce the sum of these functions into one equivalent B-spline function. Experimental results demonstrate that high fidelity reconstruction is possible from a selected set of sparse and irregular samples.
[Download tvcg97.pdf (1101623 bytes)]
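The coarse-to-fine idea can be sketched in one dimension. At each level, a B-spline approximation step lets every data point propose values for the four control points that influence it, and each control point then takes a weighted average of those proposals. The next level, with a lattice twice as fine, fits whatever residual remains. This is a simplified 1-D sketch that assumes data on [0, 1] and a starting lattice of one cell. The paper works with 2-D bicubic lattices and also merges the levels into one equivalent B-spline by refinement, which is omitted here.

```python
import math

def cubic_bspline_weights(s):
    """Uniform cubic B-spline basis B0..B3 at local parameter s in [0, 1]."""
    return [(1 - s) ** 3 / 6.0,
            (3 * s**3 - 6 * s**2 + 4) / 6.0,
            (-3 * s**3 + 3 * s**2 + 3 * s + 1) / 6.0,
            s**3 / 6.0]

def _locate(x, m):
    """Cell index i (0..m-1) and local parameter s for x in [0, 1] on m cells."""
    u = x * m
    i = min(int(math.floor(u)), m - 1)
    return i, u - i

def ba_fit(xs, zs, m):
    """One B-spline approximation level: m cells, m + 3 control points."""
    delta, omega = [0.0] * (m + 3), [0.0] * (m + 3)
    for x, z in zip(xs, zs):
        i, s = _locate(x, m)
        w = cubic_bspline_weights(s)
        w2 = sum(wk * wk for wk in w)
        for k in range(4):
            phi = w[k] * z / w2              # value this point alone would ask for
            delta[i + k] += w[k] ** 2 * phi
            omega[i + k] += w[k] ** 2
    return [d / o if o > 0 else 0.0 for d, o in zip(delta, omega)]

def ba_eval(ctrl, m, x):
    i, s = _locate(x, m)
    return sum(wk * ctrl[i + k] for k, wk in enumerate(cubic_bspline_weights(s)))

def mba_fit(xs, zs, levels):
    """Coarse-to-fine: each level fits the residual left by the coarser levels."""
    lattices, residual, m = [], list(zs), 1
    for _ in range(levels):
        ctrl = ba_fit(xs, residual, m)
        lattices.append((m, ctrl))
        residual = [r - ba_eval(ctrl, m, x) for x, r in zip(xs, residual)]
        m *= 2
    return lattices

def mba_eval(lattices, x):
    return sum(ba_eval(ctrl, m, x) for m, ctrl in lattices)
```

Coarse levels supply the smooth overall shape. Once a level is fine enough that no two points share a control point, the sum of levels interpolates the data exactly.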

Image Metamorphosis With Scattered Feature Constraints

IEEE Trans. Visualization and Computer Graphics, vol. 2, no. 4, pp. 337-354, December 1996.

This paper describes an image metamorphosis technique to handle scattered feature constraints specified with points, polylines, and splines. Solutions to the following three problems are presented: feature specification, warp generation, and transition control. We demonstrate the use of snakes to reduce the burden of feature specification. Next, we propose the use of multilevel free-form deformations (MFFD) to compute $C^2$-continuous and one-to-one mapping functions among the specified features. The resulting technique, based on B-spline approximation, is simpler and faster than previous warp generation methods. Furthermore, it produces smooth image transformations without undesirable ripples and foldovers. Finally, we simplify the MFFD algorithm to derive transition functions to control geometry and color blending. Implementation details are furnished and comparisons among various metamorphosis techniques are presented.
[Download tvcg96.pdf (2884596 bytes)]

Recent Advances in Image Morphing

Proc. Computer Graphics Intl. '96, Pohang, Korea, June 1996.

Image morphing has been the subject of much attention in recent years. It has proven to be a powerful visual effects tool in film and television, depicting the fluid transformation of one digital image into another. This paper reviews the growth of this field and describes recent advances in image morphing in terms of three areas: feature specification, warp generation methods, and transition control. These areas relate to the ease of use and quality of results. We will describe the role of radial basis functions, thin plate splines, energy minimization, and multilevel free-form deformations in advancing the state-of-the-art in image morphing. Recent work on a generalized framework for morphing among multiple images will be described.
[Download cgi96.pdf (426407 bytes)]

Restoration of Images Scanned in the Presence of Vibrations

Journal of Electronic Imaging, vol. 5, no. 1, pp. 50-65, January 1996.

Images scanned in the presence of mechanical vibration are subject to artifacts such as brightness fluctuation and geometric warping. The goal of the present study is to characterize these distortions and develop a restoration algorithm to invert them, hence producing an output digital image consistent with a scanner operating under ideal uniform motion conditions. The image restoration algorithm described in this paper makes use of the instantaneous velocity of the linear sensor array to reconstruct an underlying piecewise constant or piecewise linear model of the image irradiance profile. That reconstructed image is then suitable for resampling under ideal scanning conditions to produce the restored output digital image. We demonstrate the algorithm on simulated scanned imagery with typical operating parameters.
[Download jei96.ps (222812 bytes)]
[Download jei96_imgs.tar (1593344 bytes)]

Inverting Input Scanner Vibration Errors /
Restoration of Images Degraded By Input Scanner Vibrations

Proc. IEEE Intl. Conf. on Image Processing, Washington, D.C., Oct. 1995.
SPIE Conf. on Document Processing II, Proc. SPIE 2422, pp. 358-369, San Jose, CA, Feb. 1995.

Images scanned in the presence of mechanical vibrations are subject to artifacts such as brightness fluctuation and geometric warping. The goal of this work is to develop an algorithm to invert these distortions and produce an output digital image consistent with a scanner operating under ideal uniform motion conditions. The image restoration algorithm described in this paper applies to typical office scanners that employ a moving linear sensor array (LSA) or moving optics. The velocity of the components is generally not constant in time. Dynamic errors are introduced by gears, timing belts, motors, and structural vibrations.

In this work, we make use of the instantaneous LSA velocity to reconstruct an underlying piecewise constant or piecewise linear model of the image irradiance function. The control points for the underlying model are obtained by solving a system of equations derived to relate the observed area samples with the instantaneous LSA velocity and a spatially-varying sampling kernel. An efficient solution exists for the narrow band diagonal matrix that results. The control points computed with this method fully define the underlying irradiance function. That function is then suitable for resampling under ideal scanning conditions to produce a restored image.
[Download icip95.ps (67310 bytes)]
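The abstract does not give the equations themselves, so the coefficients below are made up. The sketch only shows why a narrow-band system is cheap to solve. Suppose, hypothetically, that each observed sample couples a control point with its immediate neighbors. The matrix is then tridiagonal, and forward elimination followed by back substitution (the Thomas algorithm) solves it in O(n) time rather than the O(n^3) of general elimination.

```python
def solve_tridiagonal(sub, diag, sup, rhs):
    """Solve a tridiagonal system in O(n) (Thomas algorithm).
    sub[i] multiplies x[i-1] (sub[0] ignored); sup[i] multiplies x[i+1]
    (sup[-1] ignored). Assumes no pivoting is needed, e.g. diagonal dominance."""
    n = len(rhs)
    c = [0.0] * n                      # modified super-diagonal
    d = [0.0] * n                      # modified right-hand side
    c[0] = sup[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):              # forward elimination
        denom = diag[i] - sub[i] * c[i - 1]
        c[i] = sup[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - sub[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```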

Image Metamorphosis Using Snakes and Free-Form Deformations

Proc. Siggraph '95, pp. 439-448, Los Angeles, CA, August 1995.

This paper presents new solutions to the following three problems in image morphing: feature specification, warp generation, and transition control. To reduce the burden of feature specification, we first adopt a computer vision technique called snakes. We next propose the use of multilevel free-form deformations (MFFD) to achieve $C^2$-continuous and one-to-one warps among feature point pairs. The resulting technique, based on B-spline approximation, is simpler and faster than previous warp generation methods. Finally, we simplify the MFFD method to construct $C^2$-continuous surfaces for deriving transition functions to control geometry and color blending.
[Download sig95.ps.gz (294427 bytes)]

Characterization of Vibration-Induced Image Defects in Input Scanners

SPIE Conf. on Document Processing II, Proc. SPIE 2422, pp. 350-357, San Jose, CA, Feb. 1995.

Typical office scanners employ a moving linear-sensor array or moving optics. The velocity of the components is generally not constant in time. It may be modulated directly (at one or more frequencies) by dynamic errors of gears, timing belts, and motors, and indirectly by structural vibrations induced by gears, fans, etc. Nonuniform velocity is known to cause undesirable brightness fluctuation and warping in the sampled image. The present paper characterizes the image defects induced by nonuniform velocity. A companion paper utilizes the degradation information to develop an algorithm to restore the degraded image.
[Download spie95a.ps (78906 bytes)]

Fast Convolution With Packed Lookup Tables

Graphics Gems IV, Ed. by P. Heckbert, Academic Press, 1994.

Convolution plays a central role in many image processing applications, including image resizing, blurring, and sharpening. In all such cases, each output sample is computed to be a weighted sum of several input pixels. This is a computationally expensive operation that is subject to optimization. In this gem, we describe a novel algorithm to accelerate convolution for those applications that require the same set of filter kernel values to be applied throughout the image. The algorithm exploits some nice properties of the convolution summation for this special, but common, case to minimize the number of pixel fetches and multiply/add operations. Computational savings are realized by precomputing and packing all necessary products into lookup table fields that are then subjected to simple integer (fixed-point) shift/add operations.
[Download ggIV94.pdf (141590 bytes)]
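The packing idea can be sketched for a 3-tap kernel with nonnegative integer (fixed-point) weights. In this sketch, LUT[v] holds the three products v*w2, v*w1, and v*w0 in separate bit fields. A running accumulator is shifted down one field per pixel before the next entry is added. After each step the lowest field holds a finished output sample, so every pixel is fetched once and each output costs one lookup, one shift, one add, and one mask. The field width and layout are assumptions, and this is not necessarily the gem's exact scheme. A real fixed-point kernel would also shift the result to normalize it.

```python
FIELD = 16                      # bits per packed field; must hold 255 * (w0 + w1 + w2)
MASK = (1 << FIELD) - 1

def build_packed_lut(w0, w1, w2):
    """LUT[v] packs v*w2, v*w1, v*w0 into fields 0, 1, 2 (nonnegative int weights)."""
    return [(v * w2) | ((v * w1) << FIELD) | ((v * w0) << (2 * FIELD))
            for v in range(256)]

def convolve3_packed(x, lut):
    """y[i] = w0*x[i-1] + w1*x[i] + w2*x[i+1] for 8-bit x, zero-padded at both ends."""
    acc, y = 0, []
    for j in range(len(x) + 1):
        v = x[j] if j < len(x) else 0   # trailing zero pad
        acc = (acc >> FIELD) + lut[v]   # drop the finished field, add new products
        if j >= 1:
            y.append(acc & MASK)        # field 0 now holds the finished y[j-1]
    return y

def convolve3_direct(x, w0, w1, w2):
    """Reference implementation with explicit multiplies."""
    pad = [0] + list(x) + [0]
    return [w0 * pad[i] + w1 * pad[i + 1] + w2 * pad[i + 2] for i in range(len(x))]
```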

A Two-Pass Mesh Warping Implementation of Morphing

Dr. Dobb's Journal, no. 202, July 1993.

The files in this directory contain the code necessary to implement a morph (metamorphosis) sequence. The process is based on a mesh warping algorithm first introduced for a special effect sequence in the movie "Willow" in 1988 [Smythe 90]. The algorithm, described in [Smythe 90] and [Wolberg 90], has since been used in several films and commercials. The self-contained code given here is adapted from a program listing in [Wolberg 90].

The mesh warping algorithm is used to deform one image into another. The input includes a source image I1 and two meshes, M1 and M2. Mesh M1 is used to select landmark positions in I1, and M2 identifies their corresponding positions in the output image. In this manner, arbitrary points in I1 can be "pulled" to new positions. Although the use of a (parametric) mesh might seem to place unnecessary constraints on the positions of these points, a large class of useful transformations is possible. It is important, though, that the mesh not self-intersect in order to prevent the image from folding upon itself. The benefit of using a mesh derives from the simplicity in interpolating the new positions of intermediate points (between the mesh points). A bilinear or bicubic function can be used. We use a Catmull-Rom cubic spline to implement bicubic interpolation here.

There are two executables that the user can compile: warp and morph. They are created by typing "make warp" and "make morph", respectively. In "warp", I1 is simply deformed based on the correspondence points given in meshes M1 and M2. In "morph", a second image I2 is used to designate the target image. Not only is I1 deformed, but it simultaneously undergoes a cross-dissolve with a warped version of I2 to create the illusion of a metamorphosis. The user must specify the number of frames to generate in this transformation. The basic idea is that each frame in the transformation uses an interpolated mesh M3 as the set of target positions for the input mesh points. M3 is computed by performing linear interpolation between respective points in M1 and M2. The "warp" program actually plays an important role here since both I1 and I2 are each warped using M3 as the target mesh. Thus, I1 is warped using meshes M1 and M3. In addition, I2 is warped using meshes M2 and M3. Now that the landmarks of the source and target images are aligned, they are cross-dissolved to generate a morph frame.
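The per-frame logic described above can be sketched with a few small helpers. These helpers are hypothetical and are not functions from meshwarp.c. Meshes and images are nested lists, and each mesh coordinate (x or y) is assumed to be stored as its own 2-D array. The warp itself, which resamples an image from one mesh to another, is omitted. The Catmull-Rom helper shows the kind of cubic used to interpolate positions between mesh points.

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom cubic through p1 (t=0) and p2 (t=1); p0 and p3 set the tangents."""
    return 0.5 * (2 * p1 + (-p0 + p2) * t
                  + (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t
                  + (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t)

def lerp_mesh(m1, m2, t):
    """Intermediate mesh M3: linear interpolation of corresponding mesh coordinates."""
    return [[(1 - t) * a + t * b for a, b in zip(r1, r2)] for r1, r2 in zip(m1, m2)]

def cross_dissolve(w1, w2, t):
    """Blend two images that have already been warped onto the same mesh M3."""
    return [[(1 - t) * a + t * b for a, b in zip(r1, r2)] for r1, r2 in zip(w1, w2)]

def frame_times(nframes):
    """Interpolation parameter per frame: 0 for the first frame, 1 for the last."""
    return [k / (nframes - 1) if nframes > 1 else 0.0 for k in range(nframes)]
```

For each t in frame_times(N), a morph frame would be built in three steps. First compute M3 = lerp_mesh(M1, M2, t) for both coordinate arrays. Then warp I1 from M1 to M3 and I2 from M2 to M3. Finally, cross_dissolve the two warped images with weight t.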

FILES:
Makefile:	dependency rules for creating "warp" and "morph"
meshwarp.h:	header file
warp.c:		main function for "warp"
morph.c:	main function for "morph"
meshwarp.c:	workhorse mesh warping code
util.c:		image I/O and memory allocation functions
catmullrom.c:	Catmull-Rom cubic spline interpolation
face.bw:	source image
cat.bw:		target image
face.XY:	source mesh
cat.XY:		target mesh

RUNNING THE PROGRAMS:
WARP:
After you type "make warp", an executable file called "warp" will be created. You can invoke it by typing: warp face.bw face.XY cat.XY out.bw

You may notice that the output has a distorted grid-like pattern on it. This is not an artifact of the algorithm, but rather it is due to the grid pattern that appears in the input after scanning it from a magazine.

MORPH:
After you type "make morph", an executable file called "morph" will be created. You can invoke it by typing: morph face.bw cat.bw face.XY cat.XY 10 out

This will create a 10-frame animation stored in files out_000.bw, out_001.bw, out_002.bw, ... out_009.bw

COMMENTS:
This code works on grayscale images only. Extending the program to handle 3 RGB color channels is straightforward.

The code is missing a program to help the user create and edit meshes. A good mesh editor is a critical component of any mesh warping program. That code is not given here because it falls outside of the scope of this presentation. Instead, sample meshes face.XY and cat.XY are provided.

The reader should be aware that such an interface should allow the user to control the cross-dissolve schedule at each mesh point, as well as its position. This permits the intensities in different regions of the image to interpolate at different rates.

REFERENCES:

[Smythe 90]: Smythe, Douglas B., "A Two-Pass Mesh Warping Algorithm for Object Transformation and Image Interpolation," ILM Technical Memo #1030, Computer Graphics Department, Lucasfilm Ltd., 1990.
[Wolberg 90]: Wolberg, George, Digital Image Warping, IEEE Computer Society Press, Los Alamitos, CA, 1990.

[Download dobbs93.tar.gz (105951 bytes)]

A Fast Algorithm for Digital Image Scaling

Proc. Computer Graphics Intl. '93, Lausanne, Switzerland, June, 1993.

This paper describes a fast algorithm for scaling digital images. Large performance gains are realized by reducing the number of convolution operations, and optimizing the evaluation of those that remain. We achieve this by decomposing the overall scale transformation into a cascade of smaller scale operations. As an image is progressively scaled towards the desired resolution, a multi-stage filter with kernels of varying size is applied. We show that this results in a significant reduction in the number of convolution operations. Furthermore, by constraining the manner in which the transformation is decomposed, we are able to derive optimal kernels and implement efficient convolvers. The convolvers are optimized in the sense that they require no multiplication; only lookup table and addition operations are necessary. This accelerates convolution and greatly extends the range of filters that may be feasibly applied for image scaling. The algorithm readily lends itself to efficient software and hardware implementation.
[Download cgi93.ps (406608 bytes)]
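The abstract does not say how the overall scale change is decomposed. Purely as an illustration, the sketch below assumes a power-of-two cascade. The image is minified by repeated 2:1 stages until the remaining factor lies in (0.5, 1]. Each 2:1 stage applies a [1, 2, 1]/4 smoothing filter computed with shifts and adds only, with no multiplications. Both the kernel and the decomposition rule are assumptions and are not the paper's optimal kernels.

```python
def plan_cascade(scale):
    """Split a minification factor 0 < scale <= 1 into a number of 2:1 stages
    plus a residual factor in (0.5, 1]."""
    if not 0 < scale <= 1:
        raise ValueError("scale must be in (0, 1]")
    halvings = 0
    while scale <= 0.5:
        scale *= 2
        halvings += 1
    return halvings, scale

def decimate2(row):
    """2:1 decimation of integer pixels with a [1, 2, 1] / 4 low-pass filter,
    using shifts and adds only; borders replicate the edge pixel."""
    n, out = len(row), []
    for i in range(0, n, 2):
        left = row[i - 1] if i > 0 else row[i]
        right = row[i + 1] if i + 1 < n else row[i]
        out.append((left + (row[i] << 1) + right + 2) >> 2)   # +2 rounds to nearest
    return out
```

For example, a factor of 0.15 becomes two 2:1 stages followed by a final resampling by 0.6, so the general-purpose resampler only ever handles a mild scale change.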

Local Image Reconstruction and Subpixel Restoration Algorithms

CVGIP: Graphical Models and Image Processing, vol. 55, no. 1, pp. 63-77, January 1993.

This paper introduces a new class of reconstruction algorithms that are fundamentally different from traditional approaches. We deviate from the standard practice that treats images as point samples. In this work, image values are treated as area samples generated by nonoverlapping integrators. This is consistent with the image formation process, particularly for CCD and CID cameras. We show that superior results are obtained by formulating reconstruction as a two-stage process: image restoration followed by application of the point spread function (PSF) of the imaging sensor. By coupling the PSF to the reconstruction process, we satisfy a more intuitive fidelity measure of accuracy that is based on the physical limitations of the sensor. Efficient local techniques for image restoration are derived to invert the effects of the PSF and estimate the underlying image that passed through the sensor.

The reconstruction algorithms derived herein are local methods that compare favorably to cubic convolution, a well-known local technique, and they even rival global algorithms such as interpolating cubic splines. Evaluations are made by comparing their passband and stopband performances in the frequency domain, as well as by direct inspection of the resulting images in the spatial domain. A secondary advantage of the algorithms derived with this approach is that they satisfy an imaging-consistency property. This means that they exactly reconstruct the image for some function in the given class of functions. Their error can be shown to be at most twice that of the "optimal" algorithm for a wide range of optimality constraints.
[Download cvgip93.ps (139657 bytes; without figures)]

Correcting Chromatic Aberration Using Image Warping

Proc. IEEE Conf. on Computer Vision and Pattern Recognition, June 1992.

Chromatic aberration is due to refraction affecting each color channel differently. This paper addresses the use of image warping to reduce the impact of these aberrations in vision applications. The warp is determined using edge displacements which are fit with cubic splines. A new image reconstruction algorithm is used for nonlinear resampling. The main contribution of this work is to analyze the quality of the warping approach by comparing it with active lens control. Two different imaging systems are tested.

Separable Image Warping With Spatial Lookup Tables

Computer Graphics (Proc. Siggraph '89), vol. 23, no. 3, pp. 369-378, Boston, MA, July 1989.

Image warping refers to the 2-D resampling of a source image onto a target image. In the general case, this requires costly 2-D filtering operations. Simplifications are possible when the warp can be expressed as a cascade of orthogonal 1-D transformations. In these cases, separable transformations have been introduced to realize large performance gains. The central ideas in this area were formulated in the 2-pass algorithm by Catmull and Smith. Although that method applies over an important class of transformations, there are intrinsic problems which limit its usefulness.

The goal of this work is to extend the 2-pass approach to handle arbitrary spatial mapping functions. We address the difficulties intrinsic to 2-pass scanline algorithms: bottlenecking, foldovers, and the lack of closed-form inverse solutions. These problems are shown to be resolved in a general, efficient, separable technique, with graceful degradation for transformations of increasing complexity.
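The classic 2-pass idea can be sketched with a rotation by theta, split in the Catmull-Smith manner into two 1-D passes. The horizontal pass moves each point along its row while y stays fixed. The vertical pass then moves each point along its column while the new x stays fixed. The sketch maps coordinates only. It does not resample pixels or use the paper's spatial lookup tables. Bottlenecking appears in the divide by cos(theta): near 90 degrees the first pass squeezes every row toward zero width.

```python
import math

def pass1(x, y, theta):
    """Horizontal pass: x' = x cos(theta) - y sin(theta); y is unchanged."""
    return x * math.cos(theta) - y * math.sin(theta), y

def pass2(u, y, theta):
    """Vertical pass: y' = (u sin(theta) + y) / cos(theta); u is unchanged.
    Uses x = (u + y sin(theta)) / cos(theta), the inverse of the first pass."""
    c = math.cos(theta)
    if abs(c) < 1e-12:
        raise ValueError("bottleneck: cos(theta) ~ 0 collapses the first pass")
    return u, (u * math.sin(theta) + y) / c

def rotate_two_pass(x, y, theta):
    u, y1 = pass1(x, y, theta)
    return pass2(u, y1, theta)
```

The composition of the two passes reproduces the ordinary rotation (x cos(theta) - y sin(theta), x sin(theta) + y cos(theta)) whenever cos(theta) is not zero.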

Skeleton-Based Image Warping /
Image Warping Among Arbitrary Planar Shapes

Proc. Computer Graphics Intl. '88, Geneva, Switzerland, June, 1988.

Image warping refers to the 2D resampling of a source image onto a target image. Despite the variety of techniques proposed, a large class of image warping problems remains inadequately solved: mapping between two images which are delimited by arbitrary, closed, planar curves, e.g., hand-drawn curves. This paper describes a novel algorithm to perform image warping among arbitrary planar shapes whose boundary correspondences are known. A generalized polar coordinate parameterization is introduced to facilitate an efficient mapping procedure. Images are treated as collections of interior layers, extracted via a thinning process. Mapping these layers between the source and target images generates the 2D resampling grid that defines the warping. The thinning operation extends the standard polar coordinate representation to deal with arbitrary shapes.

A Syntactic Omni-font Character Recognition System

Intl. Journal of Pattern Recognition and Artificial Intelligence, vol. 1, no. 3 & 4, December 1987, pp. 303-322.
Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 168-173, June 1986.

This paper introduces a syntactic omni-font character recognition system. The "omni-font" attribute reflects the wide range of fonts that fall within the class of characters that can be recognized. This includes hand-printed characters as well.

A structural pattern matching approach is employed. Essentially, a set of loosely constrained rules specify pattern components and their interrelationships. The robustness of the system is derived from the orthogonal set of pattern descriptors, location functions, and the manner in which they are combined to exploit the topological structure of characters.

By virtue of the new pattern description language, PDL, developed in this paper, the user may easily write rules to define new patterns for the system to recognize. The system also features scale-invariance and user-definable sensitivity to tilt orientation.

An Algorithm for the Segmentation of Bilevel Images

Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pp. 570-575, Miami, FL, June 1986.

When bilevel images (such as printed text) are digitized, the result is a gray scale image due to the averaging function of the scanner. This paper presents a method for recovering the binary image. It is based on the observation that the convolution of a step function h(t) with a bell-shaped function (such as the point spread function of digitizers) is convex where h(t) is high and concave where h(t) is low. A first pass assigns the value "black" or "white" to pixels that are in regions where the intensity image is clearly concave or convex. Subsequent iterations move the remaining pixels to these two values according to their neighbors. Examples of implementation are shown and a memory management scheme is described that makes the algorithm feasible even when the size of the image exceeds the available memory.

Restoration of Degraded Binary Images Using Stochastic Relaxation with Annealing

Pattern Recognition Letters, vol. 3, no. 6, pp. 375-388, December 1985.

This paper investigates the application of variations of Stochastic Relaxation with Annealing (SRA) as proposed by Geman and Geman [1] to the Bayesian restoration of binary images corrupted by white noise. After a general review we present some prior models and show examples of the application. It appears that a proper selection of the prior model is critical for the success of the method. We obtained better results on artificial images which fitted the model closely than on real images for which there was no precise model.
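A hedged sketch of SRA on a binary image follows. It runs a Gibbs sampler at a decreasing temperature over pixels in {-1, +1}, combining an Ising-type prior (agreement with the 4 neighbors, weight beta) with a data term (weight eta) that favors the observed noisy value. The parameter values and the geometric cooling schedule are assumptions chosen for illustration. For example, eta = 1.1 is close to 0.5 ln((1 - p) / p) for a flip rate p = 0.1. Geman and Geman analyze a logarithmic schedule, and the paper compares several prior models.

```python
import math
import random

def anneal_restore(noisy, beta=1.0, eta=1.1, sweeps=60, t_start=4.0, t_end=0.05, seed=0):
    """Restore a binary image (nested lists of -1/+1) by Gibbs sampling with annealing.

    Energy (lower is better): -beta * sum over 4-neighbor pairs of x_i * x_j
                              -eta  * sum over pixels of x_i * y_i   (y = noisy image)
    """
    rng = random.Random(seed)
    h, w = len(noisy), len(noisy[0])
    x = [row[:] for row in noisy]                    # start from the observation
    for sweep in range(sweeps):
        # geometric cooling from t_start down to t_end (an illustrative schedule)
        temp = t_start * (t_end / t_start) ** (sweep / max(1, sweeps - 1))
        for i in range(h):
            for j in range(w):
                nb = sum(x[a][b] for a, b in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                         if 0 <= a < h and 0 <= b < w)
                field = beta * nb + eta * noisy[i][j]   # local field on pixel (i, j)
                # conditional probability of +1 given the neighbors, at temperature temp
                p_plus = 1.0 / (1.0 + math.exp(-2.0 * field / temp))
                x[i][j] = 1 if rng.random() < p_plus else -1
    return x
```

At high temperature the sampler can escape poor configurations. As the temperature falls, each pixel settles toward the sign of its local field, which removes isolated flipped pixels that disagree with their neighbors.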