Image filtering with an approximate Gabor wavelet or Morlet wavelet using a cascade of sparse convolution kernels

Kragen Javier Sitaker, 2019-08-31 (updated 2019-09-08) (28 minutes)

On the bus, thinking about vision, there occurred to me a simple way to convolve an image with a fairly precisely approximated (real) Gabor filter or Morlet wavelet, in fixed point, with no multipliers, and a cascade of a small number of sparse FIR and IIR filters. This is probably known, but not to me.

Gabor filters

The general form of a Gabor wavelet in two dimensions has five parameters: an angle θ, a wavelength λ, a phase ψ, a radius σ, and an aspect ratio γ.

WIKIPEDIA SAY

g(x, y; λ, θ, ψ, σ, γ) = exp(−(x2 + γ2y2)/(2σ2)) exp(i(2πx′/λ + ψ))

where

x′ = x cos θ + y sin θ
y′ = -x sin θ + y cos θ

Here the first exp gives you the Gaussian envelope and the second exp gives you the oscillation. As you might or might not guess, its Fourier transform is also a Gabor wavelet.

You convolve this thing with an image and it detects edges at its given angle at or near its given frequency, in the area selected by its Gaussian window, which is nearly zero when you reach displacements of several times σ, or σ/γ2 in the y′ direction.

Sparse approximations

As mentioned in Sparse filters, I’m looking for ways to get good approximations of convolution filters using networks of small, sparse kernels, to reduce the total amount of computation. In particular I’m interested in Hogenauer filters, which is discussed in Some speculative thoughts on DSP algorithms, Recurrent comb cascade, and Cheap frequency detection, The Bleep ultrasonic modem for local data communication, and similar things. In Using the Goertzel algorithm, the Minsky algorithm, PLLs, and prefix sums for frequency detection I describe how you can use precise resonators that are not only non-FIR but non-BIBO to very efficiently compute precise finite infinite response filters, and I’ll use that here.

I think a sparse Gabor-filter approximation is particularly interesting because of the filter’s importance and because of its relatively great computational cost. (But maybe someone already knows this algorithm.)

Overview

The filter is a composition or pipeline of five major stages, each composed of a cascade of small, sparse linear filters. Because of the convenient properties of convolution, the order of all of the individual pieces can be reordered as desired, and in particular if you’re computing several different Gabor filters over the same image with some parameters in common, it may be desirable to move stages that are in common between them to the beginning of the processing pipeline.

The stages are a resonating feedback comb to create the oscillations, an oscillation window to confine them to a Gaussian along the direction of oscillation, a low-pass filter along the direction of oscillation to remove frequencies higher than the desired frequency, an antialiasing filter to deal with pixels a fraction of a sample off the oscillation axis, and a transverse window blur to spread the impulse response along a Gaussian at right angles to the direction of resonation.

In what follows, I will suppose that the angle θ can be adequately approximated as a ratio of small integers, as examples of which I will use 3 and 4: we are looking for waves whose phase varies not at all in the direction (x+3, y-4), and varies fastest in the direction (x+4, y+3). Below I will refer to this (4, 3) displacement as the “stride”. And I will suppose that the wavelength λ we’re looking for can be adequately approximated as an integer multiple of twice that displacement; that is, at some integer n, (x+8n, y+6n) has a phase precisely 2π advanced from the phase at (x, y). There are some tricks to handle waves of other wavelengths, but they are not as well developed.

The resonating feedback comb and oscillation window

To window the resonance over a given distance, we first use a feedback comb filter whose impulse response is an infinite non-decaying oscillation:

y[i, j] = x[i, j] - y[i-4n, j-3n]

That is, the pixel (i, j) of the output y is the corresponding pixel x[i, j] of the input, minus a previous output pixel positioned exactly half an oscillation away along the line of oscillation, at (i-4n, j-3n). You can easily verify that the impulse response of this filter is an alternating sequence of positive and negative impulses leading away from the impulse in the desired direction of oscillation.

To tame this wildly unstable filter, we simply use a window over some finite number m of oscillations, a feedforward comb filter; here our x input is the y output of the previous filter:

y[i, j] = x[i, j] - x[i - 8nm, j - 6nm]

When using exact arithmetic, the composition of these two filters has a finite impulse response; in its impulse response, the alternating positive and negative impulses generated by the first filter continue on for m oscillations, then are canceled completely and precisely by the subtraction.

This alternating impulse train contains our desired spatial frequency as well as all of its odd harmonics, and the second filter just brutally windows it with a rectangular window after some integer number of oscillations. The first, feedback, comb filter plays the role of the integrator in a Hogenauer downsampling CIC filter, while the second, feedforward, comb plays the role of the comb in the Hogenauer filter.

Unlike the integrator in a Hogenauer filter, the response of the feedback comb at DC is identically 0, and its amplification factor for frequencies that don’t precisely align with its resonant frequency is finite; it only fails to act BIBO if the input data actually does contain nonzero energy at precisely the resonant frequency (necessarily over infinite space, at least in one direction). So you might be able to get reasonable results using floating point, but there are no guarantees there.

That window doesn’t look very Gaussian yet, so to fix that, we repeat the process two or three more times, for a total of six to eight atomic kernels each consisting of a single subtraction. The window will have the most Gaussian shape if m remains the same in all of the stages, but this trades off against undesirable quantization in the available window sizes.

This is one of the places where you have a little flexibility to go beyond the limitations of the basic method; you can use resonators in slightly different directions and frequencies to get an intermediate overall direction and frequency of oscillation. But the windowing functions must be matched to those precise strides to prevent the overall system impulse response from becoming infinite.

The oscillation-axis low-pass filter

The above is not yet satisfactory for two reasons. First, we still have the odd harmonics to deal with — the third, fifth, seventh, and so on — which are exactly the same amplitude as the fundamental, since it’s an impulse train. Second, aside from those harmonics, looked at precisely along the oscillation axis, the two stages described above give us a very nice Gabor wavelet, but at any other angle it looks like an impulse — all frequencies pass unchanged.

For these two purposes, we need to do some low-pass filtering, some of which is transverse to the oscillation axis and is what gives us the roundness or ellipticity of our two-dimensional Gaussian window, and some of which is parallel to it. And in particular, if it’s possible, we may want to use another feedforward comb filter to stab the third harmonic in the heart precisely, because it’s going to be the most troublesome harmonic to handle with general-purpose low-pass filtering, since it’s only 1.58 octaves above the fundamental, and a filter with its first precise zero at that harmonic will suppress it completely, as well as most of its spectral leakage. The fifth harmonic is 2.3 octaves above the fundamental, so a generic low-pass filter can separate it from the fundamental pretty easily.

Our fundamental has a period (8n, 6n), so our third harmonic has a period (⅓8n, ⅓6n). We can either subtract pixels at this displacement or add pixels at half this displacement (⅓4n, ⅓3n) to completely suppress the third harmonic. If these offsets are not precise, for example because 4n isn’t divisible by 3, the suppression won’t be precise either, and a lot more of the third harmonic will survive to be dealt with by the other more generic low-pass filtering. (But we’ll get a little bit of bonus transverse low-pass filtering.)

So the simplest form of our heart-stabbing filter would look like this:

y[i, j] = x[i, j] + x[i - ⅓4n, j - ⅓3n]

That taken care of, we can proceed to the oscillation-axis low-pass filter, which is mostly important if n is larger than 2 or 3. Suppose n = 5; now a period of the full oscillation is (40, 30). We can use an orthodox Hogenauer filter along the axis of oscillation to suppress harmonics higher than the third; first an integrator (calculating a prefix sum along each strided diagonal, also known as a sum table, scan, or integral image):

y[i, j] = x[i, j] + y[i - 4, j - 3]

and then a feedforward comb of, for example, two strides:

y[i, j] = x[i, j] - x[i - 8, j - 6]

This amounts to a rectangular window, whose frequency response is a sinc; its first null is where a full oscillation fits precisely into the window, which in this case would be a period of (12, 9). It has a 6 dB per octave frequency rolloff.

We probably want a better rolloff than that; if we repeat it two more times, we get 18 dB per octave, which attenuates the fifth harmonic by almost 42 dB, and higher harmonics by more.

The usual CIC-filter concerns about passband flatness don’t apply here, since we are only trying to select a single frequency and frequencies very close to it.

At higher spatial frequencies, as long as the stride is a pair of integer number of pixels, the higher harmonics disappear because they alias harmlessly back down into the lower harmonics.

So the oscillation-axis low pass filters end up being seven more filter stages each consisting of a single addition or subtraction. But, as we see in the next section, we will reduce this to five.

The antialiasing filter

So far, pixels have only ever been combined with other pixels at a multiple of the basic stride (4, 3) from them. This means that every pixel in the first three lines of an image is in a separate, noninteracting signal, so far; even very high-frequency components will survive and may be aliased down. We’d like to sort of “fill in” the other pixels, at least along the axis of oscillation, rather than skipping over them completely as if they belonged to an entirely different image.

There are a variety of different ways this can be achieved. For example, we could use a couple of simple feedforward combs to get reasonably good fill-in without widening the line of the OTF much:

y[i, j] = x[i, j] + x[i-1, j-1]
y[i, j] = x[i, j] + x[i-2, j-1]

But let’s not do that; we can perhaps get a bit more mileage out of the antialiasing filter in the case where the oscillation wavelength and both dimensions of the Gaussian window and are much larger than this; we could use, for example, a simple Gaussian blur, which also takes some of the load off the low-pass filter along the oscillation axis. Again, this can be done as an orthodox Hogenauer or CIC or box filter, but this time on the usual pixel rows and columns:

y[i, j] = x[i, j] + y[i-1, j]
y[i, j] = x[i, j] - x[i-8, j]
y[i, j] = x[i, j] + y[i, j-1]
y[i, j] = x[i, j] - x[i, j-6]

The cascade of those four filters has an impulse response of an 8x6 constant-1 rectangle, and in particular it has the same low-pass effect along the axis of oscillation as the -x[i - 8, j - 6] filter proposed in the section above, as well as providing antialiasing fill-in along the axis of oscillation and some amount of transverse window. If you were to iterate this filter two more times you would have a second-order approximation to a squished Gaussian, but let’s not — let’s just run it once more and be satisfied, and reduce the high-pass filtering from the previous section by one filtering stage.

So this stage is a cascade of eight tiny kernels, each consisting of a single addition or subtraction.

Since this, in effect, reduces the resolution of the image, it might be wise to do it early on in the pipeline and then decimate the image so that later stages can run much faster, operating on a reduced-resolution image.

The transverse window blur

So at this point our impulse response is a fairly precise sinusoidal oscillation along the correct axis, with a fairly precise Gaussian envelope along that axis, and some sort of relatively crude smooth falloff about 10 pixels to either side of that axis. Now we want to widen out that transverse axis into a Gaussian envelope, either round or elliptical; the existing falloff may help us a bit, but we still need to widen it out considerably.

We can do this with, again, a strided CIC filter, consisting of a cascade of an integrator and a feedforward comb, but this time along the transverse axis, using a stride rotated 90 degrees:

y[i, j] = x[i, j] + y[i - 3, j + 4]
y[i, j] = x[i, j] - x[i - 3p, j + 4p]

Here p gives the number of strides in the width of (one stage of) our window, as nm gave the dimension of the Gaussian window in the perpendicular direction.

Because of the existing falloff, we may be able to get away with one more stage of this CIC filter at this point, but we’ll probably need two more.

The direction of blurring can be chosen as either (-3, 4) or (3, -4); I chose the first here in order to use previous scan lines rather than previous columns, in the interest of making pipelining possible (see below.)

So implementing the transverse window requires another six stages, each consisting of an addition or subtraction.

Summary

The overall Gabor filter, then, requires a cascade of around 8 + 5 + 8 + 6 = 26 stages, each performing a single integer addition or subtraction, followed by some final scaling by a constant. This is quite small compared to the millions of pixels in the support of the approximate Gabor wavelet that is the system’s finite impulse response, although downsampling the image on its way into the pipeline would reduce this disparity somewhat.

Python smoke test of the algorithm

I did try it in IPython tonight (notebook viewer), and got a pretty round-looking kernel with n = 5, m = 3, p = 30, and the (4, 3) stride suggested above. In plots it looks just fine, but of course that’s not strong evidence. I haven’t calculated its error (that would require figuring out the diameters of the Gaussians), but I estimate that in essentially this form it should be able to deliver worst-case errors of less than 1% (-40 dB) and average-case errors that are smaller still.

The code in the notebook boils down to something like this very crude code, with all but the last plot stripped out:

from numpy import zeros, dtype
from matplotlib import imshow, colorbar

n = 5
m = 3
p = 30

impulse = zeros((700, 700), dtype=dtype('float16'))
impulse[300, 100] = 1.0
ir1 = impulse.copy()
for i in range(4*n, len(ir1)):
    ir1[i, 3*n:] -= ir1[i-4*n, :-3*n]
fr1 = ir1.copy()
for i in range(8*n*m, len(fr1)):
    fr1[i, 6*n*m:] -= ir1[i-8*n*m, :-6*n*m]
ir2 = fr1.copy()
for i in range(4*n, len(ir2)):
    ir2[i, 3*n:] -= ir2[i-4*n, :-3*n]
fr2 = ir2.copy()
for i in range(8*n*m, len(fr2)):
    fr2[i, 6*n*m:] -= ir2[i-8*n*m, :-6*n*m]
ir3 = fr2.copy()
for i in range(4*n, len(ir3)):
    ir3[i, 3*n:] -= ir3[i-4*n, :-3*n]
fr3 = ir3.copy()
for i in range(8*n*m, len(fr3)):
    fr3[i, 6*n*m:] -= ir3[i-8*n*m, :-6*n*m]
hs = fr3.copy()
for i in range(int(4*n/3), len(hs)):
    hs[i, 3*n/3:] += fr3[i-int(4*n/3), :-3*n/3]
ai1 = hs.copy()
for i in range(4, len(ai1)):
    ai1[i, 3:] += ai1[i-4, :-3]
ff1 = ai1.copy()
ff1[8:, 6:] -= ai1[:-8, :-6]
ai2 = ff1.copy()
for i in range(4, len(ai2)):
    ai2[i, 3:] += ai2[i-4, :-3]
ff2 = ai2.copy()
ff2[8:, 6:] -= ai2[:-8, :-6]
bf1i = ff2.copy()
for j in range(1, len(bf1i[0])):
    bf1i[:, j] += bf1i[:, j-1]
bf1c = bf1i.copy()
bf1c[:, 8:] -= bf1i[:, :-8]
bf2i = bf1c.copy()
for j in range(1, len(bf2i[0])):
    bf2i[:, j] += bf2i[:, j-1]
bf2c = bf2i.copy()
bf2c[:, 8:] -= bf2i[:, :-8]
bf3i = bf2c.copy()
for i in range(1, len(bf3i)):
    bf3i[i] += bf3i[i-1]
bf3c = bf3i.copy()
bf3c[6:] -= bf3i[:-6]
bf4i = bf3c.copy()
for i in range(1, len(bf4i)):
    bf4i[i] += bf4i[i-1]
bf4c = bf4i.copy()
bf4c[6:] -= bf4i[:-6]
tvi1 = bf4c.copy()
for j in range(4, len(tvi1[0])):
    tvi1[:-3, j] += tvi1[3:, j-4]
tvc1 = tvi1.copy()
tvc1[:-3*p, 4*p:] -= tvi1[3*p:, :-4*p]
tvi2 = tvc1 / 256
for j in range(4, len(tvi2[0])):
    tvi2[:-3, j] += tvi2[3:, j-4]
tvc2 = tvi2.copy()
tvc2[:-3*p, 4*p:] -= tvi2[3*p:, :-4*p]
tvi3 = tvc2 / 256
for j in range(4, len(tvi3[0])):
    tvi3[:-3, j] += tvi3[3:, j-4]
tvc3 = tvi3.copy()
tvc3[:-3*p, 4*p:] -= tvi3[3*p:, :-4*p]
imshow(tvc3[200:500, 300:600], origin='lower'); colorbar()

Reducing memory usage

As is standard practice, you can pipeline these stages (see Evaluating DSP operations in minimal buffer space by pipelining) so that you don’t need 26 modified copies of the entire image floating around in memory (some at increased precision). But doing this in the straightforward way, scan line by scan line, you still need a pretty big buffer to do this in, because some of the 26 stages need to look pretty far back into the past. If we suppose n = 5, m = 3, and p = 60, for example, the number of scan lines of memory needed is as follows:

stage scan lines y[i, j] = x[i, j] +
resonator 1 15 -y[i - 4n, j - 3n]
oscillation window 1 90 -x[i - 8nm, j - 6nm]
resonator 2 15
oscillation window 2 90
resonator 3 15
oscillation window 3 90
resonator 4 15
oscillation window 4 90
heart-stabbing filter5 x[i - ⅓4n, j - ⅓3n]
axis LPF integrator 13 y[i - 4, j - 3]
axis LPF comb 1 6 -x[i - 8, j - 6]
axis LPF integrator 23
axis LPF comb 2 6
antialias filter 1 7 (4 kernels)y[i - 1, j]; -x[i - 8, j]; y[i, j - 1]; -x[i, j - 6]
antialias filter 2 7 (4 kernels)
transverse integrator 14 y[i - 3, j + 4]
transverse comb 1 240 -x[i - 3p, j + 4p]
transverse integrator 24
transverse comb 2 240
transverse integrator 34
transverse comb 3 240

Whew. That’s 1189 scan lines of memory in total, plus some fractional scan lines I’m not considering. Is there any way to reduce this?

(Well, of course there’s decimation. But I mean aside from decimation.)

I thought about tiling. It doesn’t help, because you just switch from having to buffer previous scan lines to having to buffer previous rows of tiles. In fact it hurts a little because you can’t discard fractional tiles. Maybe there’s still a way for it to work but I don’t see it.

I thought about maintaining resonator state in a different way, using per-pixel Minsky or Goertzel resonators, spatially shifted by varying integer amounts per scan line, rather than a feedback comb (which is basically a Karplus-Strong oscillator). This might help but it only saves you the memory needed by the resonator, which is relatively small compared to that needed to window the oscillations. And it makes the assertion about the precise cancellation of windowing more dubious, since I don’t think there’s a precise way to calculate Minsky or Goertzel resonation. (See Using the Goertzel algorithm, the Minsky algorithm, PLLs, and prefix sums for frequency detection.)

I thought about using more stages for the transverse window. That will give you a more precise Gaussian but uses more memory, not less.

If you try to use exponential blur for the transverse blur instead of an honest Gaussian blur, you completely lose not only finite impulse response but also zero-phase behavior. To get zero-phase behavior back you need to do a second pass backwards, which requires keeping the whole result image in memory instead of just part of it.

Multiple Gabor sharing

So, if you’re going to do a bunch of Gabor convolutions on the same image, what parts of the pipeline should you share between them?

The most computationally intensive (heh) parts of the pipeline are those dependent on the frequency and the angle: the resonating feedback comb, the oscillation-axis low-pass filter, and (more loosely) the antialiasing filter. These depend on λ and θ, but not σ or γ. It might make sense to move these stages, as well as the transverse blur’s integrators, to the beginning of the pipeline so that they can be shared between different window sizes and shapes, running the oscillation window and the transverse blur window later, which are only six of the 26 stages. This probably is not practical to do in floating-point because of the large magnitudes needed to feed the Hogenauer-style cascade of six windowing combs.

The antialiasing filter is applicable over an octave or so of frequencies at any angle (although in the above example it’s a bit longer in one direction than the other), and the transverse-blur integrators are applicable for any frequency at a given angle. They can’t both go first, though; putting the antialiasing filter first is probably better because it allows you to decimate the image immediately.

Dealing with invalid pixels ("NA")

Sometimes we can identify certain pixels as containing invalid data — for example, they don’t work on the sensor, or they’re suffering salt-and-pepper noise in this frame, or they’re saturated (perhaps due to a lens flare). Some numerical-computation environments have special facilities for such situations; Octave and R have “NA”, while Numpy has “masked arrays” which support most of the same operations as ordinary arrays.

Dealing with such invalid pixels is a particularly tricky problem for FFT-based convolution algorithms, since the structure of the FFT doesn’t have a reasonable way to incorporate validity information. R’s fft function, for example, will simply return an array of NA values if asked to transform an array with a single NA value in it. Octave, by contrast, returns an array whose values are NA only in the phase or phases affected by the NA value — so they may have a NA real magnitude, a NA imaginary magnitude, or both. In either case, you can’t use FFT convolution on a signal containing even a single invalid pixel; the signal comes back entirely NA.

By contrast, a direct implementation of convolution has three straightforward ways to handle NA values: it can propagate them to the affected neighborhood, turning each single NA pixel into a giant NA hole in the result; it can omit the NA pixels from the weighted sum, increasing the weights on the other pixels to compensate, unless all pixels with nonzero weight are NA; or it can switch between these two strategies at some threshold of invalidity, such as 50%. (Both Octave’s conv and R’s filter(method=’convolution’) take the first approach.)

This kind of sparse filtering using two-input kernels could take any of these three approaches, but in the recursive case (integrators) any of them would lead rapidly to disaster. Probably in that case the least bad approach is to treat NA pixels as 0 in recursive filters.

Conclusions

Well, really, beginnings... but it seems like this approach to approximating convolution with a Gabor wavelet probably works and probably is efficient (although my IPython notebook prototype certainly is not). It provides obvious ways to set five of the six parameters, but not ψ, the phase offset. This is a little alarming because without the phase offset there’s no way to get the complex Gabor wavelet. It would be surprising if there turned out to be no way to do this, but none is obvious to me at the moment.

(The issue is that you want to phase-shift the oscillation by a quarter cycle, but without moving the Gaussian window along with it. Maybe the solution could be something as simple as a differentiator in the direction of oscillation, perhaps by a half cycle, and a compensating constant factor.)

Among the possible applications are approximating an arbitrary convolution kernel as a sum of Gabor wavelets. One possible approach to this is analyzing it in different directions and at different scales using the Gabor transform or the one-dimensional wavelet transform with the Morlet wavelet, choosing a sparse subset of these basis functions that could possibly approximate the target kernel well, then optimizing the weights of that sparse subset further, under a loss function that drives small weights toward zero.

Moreover, there’s no particular reason to limit yourself entirely to Gabor wavelets when optimizing that approximation; you can include other kernels that can be computed efficiently using such sparse cascades, such as separable kernels (see The miraculous low-rank SVD approximate convolution algorithm), flat and especially polygonal kernels (see Real-time bokeh algorithms, and other convolution tricks), and, using the techniques used above to propagate the Gabor’s oscillation along a line, line segments.

Another possible application is the design of transmission-line RF filters, including stripline/microstrip transmission lines built into printed circuit boards, and waveguide filters — though I’m not sure how much that will buy you, given the need for matching networks to interconnect the signal paths. An open-ended transmission line spur is in some sense a unity-positive-weight feedforward comb filter whose lag is twice the length of the line, while a closed-ended one amounts to a unity-negative-weight feedforward comb filter, similarly.

Topics