Remote Hearing: a Novel Use of the LDV Sensor


A laser Doppler vibrometer (LDV) can serve as a non-contact, remote, high-resolution voice detector: speech makes nearby objects vibrate, and those vibrations carry the voice signal. After enhancement with Gaussian bandpass filtering, Wiener filtering, and adaptive volume scaling, the LDV voice signals were mostly intelligible from targets without retro-reflective finishes at short or medium distances (< 100 m). With retro-reflective tape, the distance could be as far as 300 meters.
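The Gaussian bandpass step can be sketched as a Gaussian-shaped gain applied in the frequency domain. This is only a minimal illustration, not the processing actually used for the clips on this page: the function name `gaussian_bandpass` and the `center_hz` and `sigma_hz` values are assumptions chosen to cover a typical speech band.

```python
import numpy as np

def gaussian_bandpass(x, fs, center_hz=1650.0, sigma_hz=500.0):
    """Weight the spectrum of x with a Gaussian centered on the speech band.

    center_hz and sigma_hz are illustrative assumptions, not values
    taken from the experiments described here.
    """
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)        # bin frequencies in Hz
    gain = np.exp(-0.5 * ((freqs - center_hz) / sigma_hz) ** 2)
    return np.fft.irfft(np.fft.rfft(x) * gain, n=len(x))

# Example: a strong 20 Hz vibration plus a weaker 1500 Hz tone.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1500 * t)
drift = 5.0 * np.sin(2 * np.pi * 20 * t)
filtered = gaussian_bandpass(tone + drift, fs)
```

With these assumed parameters the 20 Hz component is attenuated by more than two orders of magnitude, while the 1500 Hz tone passes almost unchanged.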

Here are a few audio clips (in mp3 or wav format) captured by the LDV, before and after processing.

Experiment 1.

The waveforms of the original signal and of the results of fixed scaling and adaptive scaling, after suitable filtering. The short audio clip says “I am whispering… (noise) … OK … Hello (noise)”. It was captured by the LDV (OFV-505) from a metal cake box carried by a person about 30 meters from the LDV. The surface of the target was covered with a piece of retro-reflective tape.

LDV Basics

(a) Original LDV signal
(b) Fixed scaling (x1) after band-pass filtering
(c) Fixed scaling (x8) after band-pass filtering
(d) Adaptive scaling after band-pass filtering
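The difference between fixed and adaptive scaling in panels (b) to (d) can be illustrated with a frame-wise gain. The sketch below is one plausible form of adaptive volume scaling, not necessarily the one used for the figure: the function name `adaptive_scale` and the `frame`, `target_peak`, and `max_gain` parameters are hypothetical.

```python
import numpy as np

def adaptive_scale(x, frame=400, target_peak=0.8, max_gain=100.0):
    """Scale each frame toward target_peak, with a cap on the gain.

    All parameter values are illustrative assumptions.
    """
    x = np.asarray(x, dtype=float)
    if len(x) == 0:
        return x
    centers, gains = [], []
    for start in range(0, len(x), frame):
        seg = x[start:start + frame]
        peak = np.max(np.abs(seg))
        # Quiet frames get a larger gain, limited so pure noise is not blown up.
        gains.append(min(target_peak / max(peak, 1e-12), max_gain))
        centers.append(start + len(seg) / 2.0)
    # Interpolate between frame centers so the gain changes smoothly.
    # Near a quiet-to-loud transition the output can briefly exceed target_peak.
    gain = np.interp(np.arange(len(x)), centers, gains)
    return x * gain

# Example: a whisper-level passage followed by a much louder one.
n = np.arange(8000)
envelope = np.where(n < 4000, 0.01, 0.5)
signal = envelope * np.sin(np.pi * n / 20.0)
scaled = adaptive_scale(signal)
```

A fixed x8 gain would leave the quiet passage at a peak of 0.08, whereas the adaptive version brings both passages to roughly the same level.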

Experiment 2. Long-range LDV listening experiment. A metal cake box (left) is used as the target, with a piece of 3M traffic retro-reflective tape attached. The laser spot can be clearly seen. Thanks to the retro-tape finish, the signal return of the LDV is insensitive to the incident angle of the laser beam. Both normal-volume speech and whispers were successfully detected. The size of the laser spot grew from less than 1 mm to about 5-10 mm as the range increased from 30 to 300 meters. The noise level also increased from 2 mV to 10 mV, out of the 20 V total range of the analog LDV signal. The 260-meter measurement was obtained with the target behind trees and bushes. At longer ranges, the laser spot is more difficult to localize and focus and the signal return becomes weaker, so the noise level rises. Within 120 meters, the LDV voice is clearly intelligible; at 260 meters, many parts of the speech can still be identified, though with some difficulty. At all distances, signal processing plays a significant role in making the speech intelligible: without processing, the audio signal is buried in large-amplitude low-frequency vibration and high-frequency speckle noise.
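To put the quoted noise levels in perspective, they can be expressed in decibels relative to the 20 V full range. The short calculation below uses only the figures given above; the helper name `level_re_full_scale_db` is introduced here for illustration.

```python
import math

def level_re_full_scale_db(noise_volts, full_scale_volts=20.0):
    """Noise amplitude relative to the full signal range, in dB."""
    return 20.0 * math.log10(noise_volts / full_scale_volts)

print(f"{level_re_full_scale_db(0.002):.1f} dB")  # 2 mV noise: -80.0 dB
print(f"{level_re_full_scale_db(0.010):.1f} dB")  # 10 mV noise: -66.0 dB
```

The fivefold increase in noise amplitude therefore corresponds to a rise of about 14 dB in the noise floor.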

Long Range Hearing Experiment


Table 1. Long-range LDV listening via retro-tape on a cake box and Gaussian band-pass filtering

                   30 m           120 m          260 m
Original audio     (audio clip)   (audio clip)   (audio clip)
Processed audio    (audio clip)   (audio clip)   (audio clip)

Experiment 3.
LDV voice enhancement comparison (please click the spectrograms to hear the corresponding audio clips). The LDV audio signal was captured from 100 feet away by aiming the laser beam at a metal cake box (without retro-reflective finish), and the clean signal was captured with a wireless microphone connected to a laptop placed next to the target (i.e., the metal box).

(a) Original LDV signal    (b) Gaussian only

(c) Wiener only    (d) Wiener + Gaussian

(e) Wiener + Hann    (f) Clean signal

The spectrograms of (a) the original LDV signal, (b) the Gaussian bandpass filtered signal, (c) the Wiener filtered signal, (d) the Wiener filtered + Gaussian bandpass filtered signal, (e) the Wiener filtered + Hann bandpass filtered signal, and (f) the clean signal. All correspond to the utterance "Hello, Hello".
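For readers unfamiliar with Wiener filtering in this setting, a common short-time form is sketched below: the noise spectrum is estimated from a few leading frames and each frequency bin is scaled by the gain (P - N) / P. This is a generic textbook variant, not the filter used to produce the spectrograms above; the function name `wiener_denoise` and the `frame`, `noise_frames`, and `floor` parameters are assumptions, as is the premise that the recording starts with noise only.

```python
import numpy as np

def wiener_denoise(x, frame=256, noise_frames=5, floor=0.05):
    """Short-time Wiener-style gain with overlap-add reconstruction.

    Assumes the first noise_frames frames contain noise only.
    """
    x = np.asarray(x, dtype=float)
    hop = frame // 2
    n = np.arange(frame)
    # Periodic Hann window: shifted copies at 50% overlap sum to one.
    win = 0.5 - 0.5 * np.cos(2.0 * np.pi * n / frame)
    starts = range(0, len(x) - frame + 1, hop)
    spectra = [np.fft.rfft(win * x[s:s + frame]) for s in starts]
    noise_psd = np.mean([np.abs(S) ** 2 for S in spectra[:noise_frames]], axis=0)
    y = np.zeros_like(x)  # samples past the last full frame stay zero
    for s, S in zip(starts, spectra):
        psd = np.abs(S) ** 2
        # Wiener gain (psd - noise) / psd, floored to limit musical noise.
        gain = np.maximum(1.0 - noise_psd / np.maximum(psd, 1e-12), floor)
        y[s:s + frame] += np.fft.irfft(gain * S, n=frame)
    return y

# Example: 0.2 s of noise, then a 440 Hz tone in the same noise.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
clean = np.where(t >= 0.2, np.sin(2 * np.pi * 440 * t), 0.0)
noisy = clean + 0.1 * rng.standard_normal(fs)
denoised = wiener_denoise(noisy)
```

Combining this with a bandpass filter, in either order, corresponds to the "Wiener + Gaussian" and "Wiener + Hann" panels.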

Experiment 4. Comparison of the SNR values of LDV audio signals enhanced by various methods: Gaussian bandpass only, Wiener filter only, and the combined approach. Two combination strategies were tested, bandpass filter followed by Wiener filter (BW) and Wiener filter followed by bandpass filter (WB); the results are shown in the last column of Table 2. Three types of reflecting surface were tested: the small empty metal cake box with retro-tape, the metal box surface itself (without retro-tape), and the surface of the wood hose box. All were 100 feet away from the sensor head.

Table 2: The segmental SNRs (dB) - click on numbers to hear the audio clips

                     BW / WB *
Box with tape        92.1 / 83.4
Box without tape     85.7 / 85.2
Hose box w/o tape    80.6 / 78.4

*BW: Gaussian bandpass followed by Wiener filter; WB: Wiener filter followed by Gaussian bandpass
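A segmental SNR averages per-frame SNR values instead of computing one ratio over the whole recording. The sketch below uses a common textbook definition against a time-aligned clean reference. The exact definition, frame length, and any clamping behind Table 2 are not stated here, so the function name `segmental_snr_db` and the `frame` parameter are assumptions, and the numbers it produces need not match the table.

```python
import numpy as np

def segmental_snr_db(clean, enhanced, frame=256):
    """Mean over frames of 10*log10(signal energy / error energy).

    Assumes clean and enhanced are time-aligned and equally scaled.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n = min(len(clean), len(enhanced)) // frame * frame  # whole frames only
    sig = np.sum(clean[:n].reshape(-1, frame) ** 2, axis=1)
    err = np.sum((clean[:n] - enhanced[:n]).reshape(-1, frame) ** 2, axis=1)
    keep = sig > 0                                       # skip silent frames
    ratio = sig[keep] / np.maximum(err[keep], 1e-12)
    return float(np.mean(10.0 * np.log10(ratio)))

# Example: an "enhanced" signal whose error is 10% of the clean amplitude.
t = np.arange(8192) / 8000.0
reference = np.sin(2 * np.pi * 300 * t)
snr = segmental_snr_db(reference, 1.1 * reference)
```

In this example every frame has an amplitude error of 10%, so the segmental SNR is 20 dB.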