Using the Goertzel algorithm, the Minsky algorithm, PLLs, and prefix sums for frequency detection

Kragen Javier Sitaker, 2019-06-16 (updated 2019-07-05) (39 minutes)

Windowing the results of the Goertzel algorithm, or the Minsky circle algorithm applied as a frequency detector, over different-sized windows, can provide frequency detection with variable levels of precision. Some kinds of PLLs can be used in the same way, and you can adapt the Goertzel algorithm or the Minsky algorithm into a PLL. The Hogenauer-filter approach can provide very nice windows very cheaply.

The Minsky algorithm applied as a frequency detector in the most obvious way turns out to be precisely equivalent to the Goertzel algorithm applied to the backward differences of the input signal.

This is a continuation of the exploration I began in Cheap frequency detection, but this note stands on its own, which is to say that it duplicates a lot of the material in that other note.

The Goertzel algorithm and windowing it with prefix sums

The Goertzel algorithm accumulates a sequence of values with a resonant frequency: sₙ = (2 cos ω) sₙ₋₁ - sₙ₋₂ + xₙ. You can verify that, if xₙ = 0, this holds true of the sequence sₙ = a sin (ωn + φ) for any a and φ. First, consider the case a = 1, φ = 0, and suppose that it’s true of sⱼ for all j < n; does it then hold true at n?

sₙ = (2 cos ω)(sin (ω(n - 1))) - sin (ω(n - 2))

The formula above is a little more symmetrical if we rewrite it with m = n - 1:

sₘ₊₁ = (2 cos ω)(sin (ωm)) - sin (ω(m - 1))

If you don’t remember angle-sum formulas from high-school trigonometry, you can trivially rederive them from Euler’s formula eⁱᵗ = cos t + i sin t; if t = h + j, this becomes eⁱʰeⁱʲ = (cos h + i sin h)(cos j + i sin j) = (cos h cos j - sin h sin j + i (cos h sin j + sin h cos j)), so cos (h + j) = cos h cos j - sin h sin j, and sin (h + j) = cos h sin j + sin h cos j.

Here, we’re interested in these identities:

sin (ω(m + 1)) = sin (ωm + ω) = sin (ωm) cos ω + cos (ωm) sin ω
sin (ω(m - 1)) = sin (ωm - ω) = sin (ωm) cos ω - cos (ωm) sin ω

The second one gives us

sₘ₊₁ = (2 cos ω)(sin (ωm)) - sin (ωm) cos ω + cos (ωm) sin ω
= (2 cos ω)(sin (ωm)) - cos ω sin (ωm) + cos (ωm) sin ω
= cos ω sin (ωm) + cos (ωm) sin ω

which was established above as the value of sin (ω(m + 1)), so sₘ₊₁ = sin (ω(m + 1)).

To establish that this is true for all values of a and φ, we can simply note that this is a linear time-invariant recurrence:

sₙ = (2 cos ω) sₙ₋₁ - sₙ₋₂ + xₙ

Neither m nor n enters into it except as an index, and the output on the left is linear in all the inputs on the right.

This is the unique way to get a sine wave of a fixed angular frequency ω as such a linear recurrence on the two previous terms, sin (ωn) = sₙ = asₙ₋₁ + bsₙ₋₂; that is, if you solve that equation for a and b, the only possible values for a and b are 2 cos ω and -1.
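
Here's a quick numpy sketch of that claim (an illustrative check of my own, with an arbitrary frequency, amplitude, and phase): it verifies that a sin (ωn + φ) satisfies the recurrence with coefficients 2 cos ω and -1 when x is zero.

import numpy as np

w, a, phi = 0.7, 3.2, 1.1          # arbitrary angular frequency, amplitude, and phase
n = np.arange(1000)
s = a * np.sin(w * n + phi)        # candidate solution with x = 0
# check s[n] == 2 cos(w) s[n-1] - s[n-2] for all n >= 2
assert np.allclose(s[2:], 2 * np.cos(w) * s[1:-1] - s[:-2])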

I think it’s somewhat trickier to show that, if s₀ = s₁ = 0, sₙ and sₙ₋₁ form a linear encoding of Σⱼxⱼcis (ωj) for j ∈ [2, n]. But it’s true; unless I’m confusing something, Σⱼxⱼcis (ω(n-j)) = yₙ = sₙ - exp(-i ω) sₙ₋₁. sₙ is, of course, a linear function of any such earlier pair sₖ, sₖ₋₁ and of the xⱼ since that point, since it’s a linear function of xₙ, sₙ₋₁, and sₙ₋₂, which are themselves such linear functions.

The impulse response of this system makes it clear — an impulse in x will kick off a proportional sinusoidal oscillation in s.

So at any point along the accumulation of this sequence, we can extract the dot product between the complex exponential cis (ωj) and the x samples so far. In effect, the Goertzel algorithm computes the prefix sum of a particular frequency component of our signal, just in a slightly obtusely-encoded form.

(Note that this is precisely one of the components of the Fourier transform of the signal over that interval.)
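
Here's a numpy sketch checking that numerically (my own check; the state starts at zero, which is the same idea with slightly different indexing): after running the recurrence over some noise, y decoded from the last two states matches the directly-computed sum Σⱼxⱼcis (ω(n-j)).

import numpy as np

w = 2 * np.pi * 0.123                  # arbitrary frequency, radians per sample
x = np.random.randn(500)               # arbitrary real input

s1 = s2 = 0.0                          # s[n-1] and s[n-2], initially zero
for xn in x:
    s = xn + 2 * np.cos(w) * s1 - s2   # the Goertzel recurrence
    s2, s1 = s1, s

y = s1 - np.exp(-1j * w) * s2          # decode the last two states
n = len(x) - 1
direct = sum(x[j] * np.exp(1j * w * (n - j)) for j in range(len(x)))
assert np.isclose(y, direct)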

Using sₖ as a prefix sum to rectangularly window a frequency component

This means that, if we want to know how much of a particular frequency was present during a given time interval, and we’ve saved off the sₖ values for that time interval, we can just do the yₖ calculation for the two points above, perhaps apply a rotation to bring them into phase (if they aren’t separated by an integer number of cycles), and take the difference. (This assumes that we aren’t getting rounding errors, but since we’re essentially computing a sum table here, our s values will grow without bound if the signal x contains a nonzero frequency component at the frequency of interest. As with Hogenauer filters, it might be wise to do the calculation in purely integer math to avoid this.)

This subtraction amounts to temporally windowing the signal with a rectangular window. So it convolves the spectral response of the filter with the Fourier transform of that rectangular window, which is a sinc. So, by using different widths of window on the same sₖ signal, we can get different levels of frequency selectivity. And in fact we can do this with only a single multiply-subtract and addition per sample (since the factor 2 cos ω is fixed as long as we don’t change the frequency), even though we’re pulling out a number of different window widths from that single resonator.
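
Here's a sketch of that windowing trick (illustrative window endpoints of my own choosing, in floating point rather than the integer math suggested above): it saves the sₖ sequence, decodes y at the two ends of the window, derotates each by its own sample index to bring them into phase, and checks the difference against the directly-computed windowed sum.

import numpy as np

w = 2 * np.pi * 0.07
x = np.random.randn(2000)

s = np.zeros(len(x))                   # saved resonator states
s1 = s2 = 0.0
for n, xn in enumerate(x):
    s[n] = xn + 2 * np.cos(w) * s1 - s2
    s2, s1 = s1, s[n]

def z(k):                              # derotated readout at sample k: sum of x[j] cis(-w j) for j <= k
    return np.exp(-1j * w * k) * (s[k] - np.exp(-1j * w) * s[k - 1])

a, b = 700, 1500                       # rectangular window covering samples a+1 .. b
windowed = z(b) - z(a)
direct = sum(x[j] * np.exp(-1j * w * j) for j in range(a + 1, b + 1))
assert np.isclose(windowed, direct)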

Goertzel without multiplies and A440 even temperament

The multiply-subtract (2 cos ω)sₙ₋₁ - sₙ₋₂ can, for some angles, be done with a couple of subtracts and bit shifts: (sₙ₋₁ << 1) - (sₙ₋₁ >> p) - sₙ₋₂. This works when 2 cos ω = 2 - (1 >> p) and thus ω = cos⁻¹ (1 - (1 >> (p + 1))). This gives the following periods, in samples, for shift lengths p from 0 to 19 bits:

>>> 360/(numpy.arccos(1-2**-(1 + numpy.arange(20.0)))*180/numpy.pi)
array([    6.        ,     8.69363162,    12.43307536,    17.67813872,
          25.06699928,    35.49668062,    50.23272124,    71.06297418,
         100.51459792,   142.16068241,   201.05374803,   284.33872284,
         402.11976897,   568.68612356,   804.245674  ,  1137.37658591,
        1608.49441598,  2274.75534121,  3216.99036595,  4549.51176711])

For the particular case of 44.1-ksps audio, these work out to the following frequencies in Hz:

>>> 44100/(360/(numpy.arccos(1-2**-(1 + numpy.arange(20.0)))*180/numpy.pi))
array([ 7350.        ,  5072.67870839,  3546.99048555,  2494.60651377,
        1759.28516646,  1242.3696873 ,   877.91381613,   620.57633403,
         438.74224154,   310.21235444,   219.34433171,   155.09670846,
         109.6688186 ,    77.54717088,    54.83399094,    38.77343753,
          27.41694317,    19.38670028,    13.70846505,     9.69334783])

These are far closer to A440 musical note frequencies than we would have any right to expect; 438.7 Hz is about 5 cents flat of A₄ (A above middle C), and 877.9 is only about 4.1 cents flat of A₅, and the frequencies that are not As are even-tempered D♯/E♭ notes. The error crosses over from sharp to flat in between 2495 and 1759 Hz, close to where the human ear is most sensitive.
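
Here's a small sketch of the shift-and-subtract update in all-integer arithmetic (my own check, with an arbitrary kick amplitude): it rings the p = 8 resonator with zero input and measures its frequency by counting zero crossings over one second, which should come out near the 438.7 Hz almost-A440 entry above.

import numpy as np

p = 8                                     # shift amount: 2 cos w = 2 - 2**-p
fs = 44100
w = np.arccos(1 - 2.0 ** -(p + 1))        # predicted: about 438.74 Hz at 44.1 ksps

s1, s2 = 1 << 16, 0                       # kick the resonator, then run it with x = 0
crossings, prev = 0, s1
for _ in range(fs):                       # one second of samples
    s = (s1 << 1) - (s1 >> p) - s2        # multiply-free Goertzel update
    s2, s1 = s1, s
    if (s > 0) != (prev > 0):
        crossings += 1
    prev = s
print("predicted %.2f Hz, measured about %.1f Hz" % (fs * w / (2 * np.pi), crossings / 2.0))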

Better windows through Nth-order prefix sums

Usually we don’t think of sinc as being a very good filter frequency response, because well into the stopbands, you keep getting these response peaks where an odd number of half-waves fit into your window. The Hogenauer-filter approach to solving this problem is to use N levels of sum tables to get an Nth-order approximation of a Gaussian window, which in theory has the optimal tradeoff between frequency precision and temporal precision — the minimal joint uncertainty, according to the uncertainty principle. If we consider the yₖ given above as representing values in a complex-valued sum table, we could compute running sums (prefix sums) of them to get Nth-order prefix sums, which we can then difference N times in the usual CIC-filter way to get a given frequency component with that window.

But calculating the yₖ is considerably more expensive than calculating the sₖ: yₖ = sₖ - exp(-i ω) sₖ₋₁, so while calculating sₖ required a single real multiply, addition, and subtraction (presuming all real xⱼ), calculating yₖ requires a complex–real multiply and then a complex–real subtraction: two real multiplies and a real subtraction, which is twice the multiplies. Normally we sweep this under the rug by presuming that we only calculate y values occasionally, so this expense doesn’t matter, but to compute the prefix sum (+\y, in APL notation) we would need to do those multiplications for every sample, and then it would matter. Also, computing complex prefix sums would involve twice as many real additions as computing real prefix sums.

But in fact we can avoid this hassle; Σₖyₖ = Σ(sₖ - exp(-i ω) sₖ₋₁) = Σₖsₖ - exp(-i ω)Σₖsₖ₋₁, so we can do that whole computation using a sum table for s which we use twice, instead of computing separate real and imaginary Nth-order sums. Then we can compute a y value windowed with an Nth-order approximation to a Gaussian window using N real subtractions, a complex–real multiply, and a real subtraction. The same Nth-order sum table will serve for any window size. XXX is this right? Or do we need to either do N subtractions per sample or N(N-1)/2 subtractions per output? Or maybe decimate by the window width M and then do the N subtractions once every M samples?
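
Here's a numerical check of the first-order (single box window) case of that identity, a sketch of my own with arbitrary parameters: it builds one real prefix-sum table S over the sₖ and uses it twice, once plain and once rotated, and gets the same answer as summing the yₖ directly.

import numpy as np

w = 2 * np.pi * 0.05
x = np.random.randn(3000)

s = np.zeros(len(x))
s1 = s2 = 0.0
for n, xn in enumerate(x):
    s[n] = xn + 2 * np.cos(w) * s1 - s2
    s2, s1 = s1, s[n]

S = np.concatenate(([0.0], np.cumsum(s)))    # S[m] = s[0] + ... + s[m-1]

a, b = 1000, 2500                            # box window over y[a+1] .. y[b]
boxed = (S[b + 1] - S[a + 1]) - np.exp(-1j * w) * (S[b] - S[a])
direct = sum(s[k] - np.exp(-1j * w) * s[k - 1] for k in range(a + 1, b + 1))
assert np.isclose(boxed, direct)

(As the next section says, doing this for real would want wraparound integer arithmetic rather than floats.)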

Roundoff

However, this surely requires special measures to avoid roundoff for the Hogenauer-filter part of the computation; the values in a third-order sum table, over an interval where the signal is roughly constant, will grow proportional to the cube of the constant value. The standard approach is to implement Hogenauer filters using integer arithmetic with wraparound, thus entirely eliminating rounding error. Maybe an alternative exists, representing the sum tables using the tree constructed in the standard parallel prefix-sum algorithm, so that the values being added at each tree node are of comparable size and roundoff lossage is insignificant; but I suspect you need to traverse the tree at subtraction time, adding a logarithmic slowdown, and I’m not sure how this generalizes to second-order and higher sum tables.

Even without sum tables, it’s a frequently noted observation that the Goertzel algorithm is prone to numerical error.

Frequency detection

So you could imagine using this approach to narrow down the frequency range where a particular signal might be in your input: first use very short windows to see if there’s anything in the frequency range at all, then use longer windows at a larger number of frequencies (some of which might be the same frequency — just using the same prefix sums with a longer window width) to figure out where the signal or signals might be inside that window.

Alternatively, if your signal is sparse enough in frequency space, you can use a temporal window size long enough that only one significant frequency component will be within the corresponding frequency window, and then use its successive phases in successive overlapping windows to determine its real frequency, the way a phase vocoder does. So, for example, in The Bleep ultrasonic modem for local data communication, I want to detect tones at 17640 Hz and 19110 Hz. I could very reasonably use the Goertzel algorithm with a frequency window centered on 18375 Hz, whose beat frequency with either 17640 or 19110 Hz is 735 Hz; that means that every 1.36 milliseconds (60 samples at 44.1ksps) the phase relationship between the reference frequency and the real frequency will rotate through a full cycle, or 90° every 15 samples. So if I perform Goertzel for that frequency over nominally 15-sample windows, I should get a nice clear phase that I can unwrap. I haven’t tried this yet.

(18375 Hz is 2.6180 radians or 150° per sample, so the Goertzel recurrence becomes roughly sₙ = -1.732 sₙ₋₁ - sₙ₋₂.)

Alternatively, suppose I’m trying to detect the tune I’m whistling as in Whistle detection, in the band 600Hz to 1600Hz. Typically if I’m whistling it’ll be the loudest sound in that band, maybe 20 to 40 dB louder than anything else nearby. Suppose we have a whistle at 1477Hz and we’re trying to detect it by windowing a 1400Hz Goertzel filter; we’ll get a phase spinning around the circle at 77Hz. So if our window mostly averages over, say, 3 ms (4.2 cycles of the 1400Hz reference), the phase will rotate around almost one quadrant, attenuating the phasor’s average magnitude by roughly 10%, or about 1 dB. By unwrapping the phase (again, like a phase vocoder) we can determine the beat frequency and phase direction, and thus the real frequency. So windowing a single Goertzel filter about 500 times a second covers about 200Hz of bandwidth, so about five or ten of them should cover the whole band of interest.
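
Here's an untested sketch of that idea with made-up parameters: a synthetic 1477 Hz tone, a 1400 Hz Goertzel reference, rectangular windows of about 3 ms read out of the saved sₖ values, and the frequency recovered from the phase advance between two successive windows.

import numpy as np

fs = 44100.0
f_true, f_ref = 1477.0, 1400.0         # hypothetical whistle and reference frequencies
w = 2 * np.pi * f_ref / fs
t = np.arange(int(0.2 * fs))
x = np.cos(2 * np.pi * f_true / fs * t)

s = np.zeros(len(x))
s1 = s2 = 0.0
for n, xn in enumerate(x):
    s[n] = xn + 2 * np.cos(w) * s1 - s2
    s2, s1 = s1, s[n]

def z(k):                              # derotated one-bin prefix sum at sample k
    return np.exp(-1j * w * k) * (s[k] - np.exp(-1j * w) * s[k - 1])

N = hop = 132                          # ~3 ms windows, hopping by one window length
a = 2000
W1 = z(a + N) - z(a)                   # rectangular window over samples a+1 .. a+N
W2 = z(a + hop + N) - z(a + hop)
dphi = np.angle(W2 * np.conj(W1))      # phase advance between the two windows
f_est = f_ref + dphi * fs / (2 * np.pi * hop)
print("estimated %.1f Hz (true %.1f)" % (f_est, f_true))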

As long as the window is pretty close to Gaussian, there’s a relatively precise reciprocal relationship between the bandwidth and necessary number of windows per second for each filter. Using a window that’s half as long requires us to use twice as many windows per second, but also covers twice the bandwidth in a single filter, so we need half as many filter center frequencies. In the limit, where our windowed filter kernel is just an impulse, we’re just trying to “unwrap the phase” of individual samples (or perhaps pairs of samples, since yₖ uses sₖ₋₁ as well). At the other extreme, with a very large number of very narrow frequency bands, each with windows tens or hundreds of milliseconds long, we only need check each band every hundred or more milliseconds, but we start losing temporal precision.

But at that point, wouldn’t it be faster to window a downconverted signal and do a Fast Fourier Transform? Then we only have to do O(lg N) work per sample to get all N frequency bands, instead of the O(N) work per sample that N separate resonators would take.

(Perhaps the best way to understand the CIC-windowed Goertzel filter in this context is precisely that it’s providing a windowed downconverted signal, which we can then Fourier-transform if we like?)

In one line of C

Here’s a one-line obfuscated C program from Cheap frequency detection that generates a sine wave using Goertzel’s algorithm:

main(s,t,u){for(t=32;u=s,1+putchar(128+(s-=t-s+s/8));t=u);}

If you compile it to an executable called goertzel on a Linux machine with ALSA or an ALSA emulation, you can run it with the command ./goertzel | aplay.

(arecord and aplay default to 8000 unsigned 8-bit samples per second, which is convenient since it means we can use getchar() and putchar() to read and write samples.)

The Minsky algorithm

One way to think of the Goertzel algorithm’s oscillator is by linearly extrapolating to the current sample, then subtracting a second-order correction to make it curve back towards zero at the desired frequency.

sₙ = (2 cos ω) sₙ₋₁ - sₙ₋₂
= sₙ₋₁ + (sₙ₋₁ - sₙ₋₂) - 2(1 - cos ω)sₙ₋₁

In essence, it’s calculating the desired second derivative by multiplying a small negative number, -2(1 - cos ω), by the latest sample. While this is in theory exactly correct, it’s easy to see that it could give rise to significant roundoff errors for sufficiently low frequencies. In the limit, where cos ω rounds to 1, it will stop oscillating entirely and merely linearly extrapolate. For angular velocities ω approaching 0, 1 - cos ω = ½ω² + O(ω⁴), which is why changing this correction by factors of 2 in the above note about multiplication-free Goertzel resulted in changing the frequency by roughly half an octave.

(However, don’t be misled into thinking that this means Goertzel is merely a quadratic approximation that will fail at high frequencies. As shown earlier, if performed with exact arithmetic, it’s exact.)

The Minsky algorithm is similar to the Goertzel algorithm, but while the Goertzel algorithm effectively calculates the current derivative of the oscillation from the difference between the last two samples, the Minsky algorithm reifies the first derivative as a separate variable:

sₙ = sₙ₋₁ + εcₙ₋₁ + xₙ
cₙ = cₙ₋₁ - εsₙ

XXX why does LaTeX give you a lunate epsilon ϵ by default, reserving normal epsilon for \varepsilon? Should I be using ϵ here?

This requires two real multiplications per input sample instead of Goertzel’s one multiplication, but I hypothesize that it should have a smaller roundoff error; ε ≈ sin ω, which means that for small angular velocities, it’s proportional to the angular velocity rather than (as in Goertzel) the square of the angular velocity. So sometimes you can get more accurate results with Minsky.

ε is not precisely sin ω; more precisely, ω = 2 sin⁻¹(½ε), so ε = 2 sin(½ω), as explained in Cheap frequency detection and Minskys and Trinskys and roughly in HAKMEM. For small angles, the approximation is very close.
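
A quick check of that relationship (a sketch of my own, with an arbitrary target frequency): ring the Minsky oscillator with ε = 2 sin(½ω) and no input, and measure its frequency by counting zero crossings.

import numpy as np

w_target = 0.3                         # radians per sample
eps = 2 * np.sin(w_target / 2)

s, c = 0.0, 1.0                        # start it ringing, x = 0 throughout
N, crossings, prev = 100000, 0, 0.0
for _ in range(N):
    s = s + eps * c                    # s_n = s_(n-1) + eps c_(n-1)
    c = c - eps * s                    # c_n = c_(n-1) - eps s_n  (uses the new s)
    if (s > 0) != (prev > 0):
        crossings += 1
    prev = s
print("target %.6f rad/sample, measured about %.6f" % (w_target, np.pi * crossings / N))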

One reason for roundoff error in Goertzel is that the energy of the resonator s alternates between being in the (square of the) velocity — that is, the difference from one sample to the next — and being in the (square of the) displacement — that is, the value of the latest sample. When the frequency is in the neighborhood of one radian per sample, this doesn’t cause much error, but for very low frequencies, the velocities, measured from one sample to the next, can be very small compared to the sample values. For one milliradian per sample, for example, at angles π/4 + nπ/2, the velocity and the displacement are both at 1/√2 of their maximum value, but the velocity there is 1000 times smaller than the displacement. So if you have eight decimal digits of precision, like a 32-bit IEEE 754 float, your velocity only has about five decimal digits of that precision, which means that every half cycle your signal has been rounded to five decimal digits. The Minsky algorithm doesn’t have this problem in this form, since s and c are of similar magnitude, but it might have a similar problem in that the increments being added to s and c are potentially very small compared to their magnitude.

Both Minsky and Goertzel are stable, in the limited sense that they don’t go to infinity over an infinite interval when the input is zero, if computed without roundoff. (They are not BIBO stable, of course.) The argument that Minsky is stable is that we can rewrite the above system as follows when xₙ = 0:

sₙ = sₙ₋₁ + εcₙ₋₁
cₙ = (1 - ε²)cₙ₋₁ - εsₙ₋₁

In Unicode-art matrix form:

⎡ sₙ ⎤ = ⎡  1   ε   ⎤ ⎡ sₙ₋₁ ⎤
⎣ cₙ ⎦   ⎣ -ε  1-ε² ⎦ ⎣ cₙ₋₁ ⎦

The determinant of that matrix is precisely 1. (XXX but its L norm is 1+ε; what’s up with that?)
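
One way to see why that norm doesn't cause growth (my own aside): the step matrix has a complex-conjugate pair of eigenvalues on the unit circle whenever |ε| < 2, so even though it isn't a contraction, the state just traces out a (possibly skewed) ellipse. A quick numpy check, assuming ε = 0.3:

import numpy as np

eps = 0.3
M = np.array([[1.0, eps],
              [-eps, 1 - eps * eps]])     # one Minsky step with x = 0
print(np.linalg.det(M))                   # 1.0: the step is area-preserving
print(np.abs(np.linalg.eigvals(M)))       # both 1.0 for |eps| < 2, so no growth
print(np.arccos(1 - eps * eps / 2))       # rotation angle per step, = 2 asin(eps/2)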

In a sense, sₙ and cₙ are temporally offset from one another by half a sample, and we are thus doing leapfrog integration of the harmonic-oscillator ODE ṡ = εc, ċ = -εs. For slow oscillators, this is a small difference, so sₙ² + cₙ² is a fairly precise estimate of the energy in the system, but for fast oscillators it can be a large one. I think Minskys and Trinskys has calculated the exact expression for the correction, but I can’t remember.

(It’s easy enough to add an exponential decay to either Goertzel or Minsky, which results in the system state at any given time being an exponentially-windowed filter of the frequency of interest, but here I’m focusing on the prefix-sum-based approaches because I think they can produce nicer windows almost as cheaply.)

The Minsky algorithm in one line of C

Here’s a one-liner obfuscated C Minsky-algorithm audio oscillator I wrote in 2017:

main(x,y){for(y=100;1+putchar(x+128);x-=y/4,y+=x/4);}

You can pipe it to aplay just as with the above Goertzel program. The tone is agreeable.

Varying ε, and efficiency

The canonical form of Minsky’s algorithm above uses the same ε in both steps, but as is done in Minskys and Trinskys, you can scale c up and down relative to s by using a separate δ:

sₙ = sₙ₋₁ + εcₙ₋₁ + xₙ
cₙ = cₙ₋₁ - δsₙ

(Here I have the ε and δ backwards from the use in Cheap frequency detection.)

If we’re only interested in how s behaves, and c₀ = 0, then it turns out that only the product δε matters; if we vary δ while holding δε constant, the cₙ values get scaled up or down by δ, but the sequence of sₙ remains constant, barring roundoff errors. In particular, as explained in Cheap frequency detection, we can choose δ = 1 or ε = 1. Say we choose δ = 1; to get the same angular velocity of ω radians per sample, ε = 4 sin²(½ω). So our new equations are:

sₙ = sₙ₋₁ + (4 sin²(½ω))cₙ₋₁ + xₙ
cₙ = cₙ₋₁ - sₙ

So now we need one subtraction, two additions, and a multiplication per sample. The Goertzel algorithm requires one subtraction, one addition, and a multiplication. So our multiplication penalty has disappeared, but I think the rounding-error advantage has disappeared with it: c can now have peak values much larger or smaller than s, so we can lose precision when energy moves from one to the other.

If we instead choose ε = 1, we get this equivalent version:

sₙ = sₙ₋₁ + cₙ₋₁ + xₙ
cₙ = cₙ₋₁ - (4 sin²(½ω))sₙ

We can reverse the order of the updates, as is usually done, which shifts the indices of c by 1:

cₙ = cₙ₋₁ - (4 sin²(½ω))sₙ₋₁
sₙ = sₙ₋₁ + cₙ + xₙ

We can expand this out a bit:

cₙ = cₙ₋₁ - (4 sin²(½ω))sₙ₋₁
sₙ = (1 - (4 sin²(½ω)))sₙ₋₁ + cₙ₋₁ + xₙ

This is looking suspiciously familiar! What is 4 sin²(½ω), anyway? Well, sin(½ω) = ±√(½(1 - cos ω)), so it ends up being 2 - 2 cos ω. Which means that this is actually:

sₙ = (-1 + 2 cos ω)sₙ₋₁ + cₙ₋₁ + xₙ

If we want to express this purely in terms of s, we need to eliminate cₙ₋₁. Since sₙ = sₙ₋₁ + cₙ + xₙ, sₙ₋₁ = sₙ₋₂ + cₙ₋₁ + xₙ₋₁, so cₙ₋₁ = sₙ₋₁ - sₙ₋₂ - xₙ₋₁. So we have

sₙ = (-1 + 2 cos ω)sₙ₋₁ + sₙ₋₁ - sₙ₋₂ - xₙ₋₁ + xₙ
sₙ = (2 cos ω)sₙ₋₁ - sₙ₋₂ + xₙ - xₙ₋₁

We have the not entirely unexpected result that using Minsky’s algorithm in this way as a signal filter is equivalent to using Goertzel’s algorithm on the backward differences of the input signal, which in this context can be understood as a high-pass prefilter introducing a 90° phase shift.

Since the tricks we were playing with scaling c up and down don’t affect the sequence of values in s, this result applies to using the vanilla form of Minsky’s algorithm, too.
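
Here's a numerical confirmation of that equivalence (a sketch of my own, with an arbitrary frequency and noise input), using the ε = 1 form with c updated before s as above:

import numpy as np

w = 0.7
K = 4 * np.sin(w / 2) ** 2                # = 2 - 2 cos w
x = np.random.randn(1000)

s = c = 0.0                               # Minsky, eps = 1, delta = K
minsky = []
for xn in x:
    c = c - K * s
    s = s + c + xn
    minsky.append(s)

d = np.diff(np.concatenate(([0.0], x)))   # backward differences, taking x[-1] = 0
g1 = g2 = 0.0                             # Goertzel run on those differences
goertzel = []
for dn in d:
    g = dn + 2 * np.cos(w) * g1 - g2
    g2, g1 = g1, g
    goertzel.append(g)

assert np.allclose(minsky, goertzel)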

This means that if we want to precisely reproduce the results of Goertzel’s algorithm using Minsky’s algorithm in the straightforward way, we need to run Minsky’s algorithm on the prefix sum of the signal. This may be a practical problem in situations where using a prefix sum may result in roundoff errors. Perhaps adding the input samples into c instead, while still reading the output from s (updated after c) would solve that problem? XXX investigate further.

A PLL in one line of C

In 2012, after struggling with software PLLs (“SPLLs”) for a while, I wrote this obfuscated one-line first-order PLL in C:

main(a,b){for(;;)putchar(b+=16+(a+=(b&256?1:-1)*getchar()-a/512)/1024);}

If you compile it to an executable called tinypll on a Linux machine with ALSA or an ALSA emulation, you can run it with the command arecord | ./tinypll | aplay; it tries to emit a tone that tracks the pitch of your voice, but one octave higher. This works better with headphones, so that the microphone isn’t picking up the tone it’s emitting.

This isn’t a very good PLL, but I think it’s an excellent way to show the anatomy of a PLL. It contains two continuously-varying state variables, a and b. b, or rather the low 9 bits of b, is the phase accumulator of an oscillator, while a is the low-pass-filtered error from the phase detector.

When the program gets an input sample with getchar(), it feeds it into a “chopper”:

(b&256 ? 1 : -1) * getchar()

Since the low 9 bits of b are the phase accumulator for the oscillator, its bit 8 tells us which half of the oscillation we’re in. And, as the oscillator alternates between the halves, it either inverts the input signal, or it doesn’t.

How does that chopper compute a phase error? Well, if we suppose that the input signal is an oscillation whose frequency is close to that of the PLL’s oscillator, then the sum of that chopped signal over a whole cycle tells us the relative phase. If the two oscillations are perfectly in phase, then we’re inverting precisely the negative part of the input signal, and so we get a large positive number. If they’re perfectly out of phase, then we’re inverting precisely its positive part, so we get a large negative number. And if they’re in perfect quadrature, then the samples we inverted are half negative and half positive (and cancel each other out), and so are the samples we didn’t invert, so we get zero. If there’s a small phase error from quadrature, say so that we’re inverting a little more of the negative signal than the positive one, we’ll get a small negative number. And any DC bias cancels out.

So a sum over a whole oscillation of this chopped signal gives us some kind of indication of how far away we are from quadrature, and close to quadrature it’s linear. So naturally the next thing we do is to add that phase-error sample into our phase-error accumulator:

a += … - a/512

To sum over precisely a whole oscillation would require keeping in memory an array of values to run a rectangular window over, so here we just use an exponential filter with a time constant of 512 samples, which is 64 milliseconds. (This is guaranteed to be several cycles long over the frequency range we can reach here.) Note that this also means that the magnitude of a is about 512 times bigger than the chopped input samples getting fed into it — if our average phase error is about 3 per sample, say, then a reaches a steady state at a ≈ 1536.

So, then we want to drive our oscillator at a frequency that depends on this filtered phase error:

b += 16 + (a …)/1024

If our input signal is always 0, a will decay to 0, and this simply reduces to b += 16, which means it will reach 512 and its low 9 bits will wrap around every 32 samples, which is 250 Hz. So that’s the “natural frequency” of the chopping. But if a is in the range, say, 1024 to 2047, then this is b += 17 and it pitches up to 265 Hz. On most systems C division rounds towards 0, so if a is slightly negative, it doesn’t lower the pitch, but once it gets to -1024, the pitch drops to 234 Hz.

So if you’re screeching at your computer at 234 Hz, the chopper will initially drift in and out of phase with your screech 16 times a second, but if it ever manages to accumulate a phase error of -1024 in a, it enters a steady state where it’s chopping your signal just enough out of quadrature to maintain that -1024. If it falls behind your screeching, a will decay and the chopping will speed up until a is growing again, and if it gets ahead of your screeching, a will grow to an even more negative value and its chopping will slow down.

This sounds ridiculously crude, but there’s a certain amount of noise on a from the regular increases and decreases of the phase error accumulator four times per cycle (about every 8 samples), plus actual signal noise, so it works better than you would think. Not that well, but better than you’d think.

Finally, it emits an output tone:

for (;;) putchar(b…);

This is the low 8 bits of the 9-bit phase accumulator in b, so it produces a sawtooth wave at twice the chopping frequency, so, in the neighborhood of 500 Hz. Sawtooth waves sound really harsh, but because they’re so harmonics-rich, they make it easier to hear pitch changes than smoother waveforms would.
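
For reference, here's a rough unobfuscated transliteration of the same loop in Python (my own sketch, keeping the C program's truncate-toward-zero division but starting both state variables at zero; save it as, say, tinypll.py and run arecord | python3 tinypll.py | aplay):

import sys

def trunc(p, q):                       # C-style integer division: round toward zero
    return -(-p // q) if p < 0 else p // q

a = b = 0                              # filtered phase error; oscillator phase accumulator
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    byte = inp.read(1)
    if not byte:
        break
    x = byte[0]                        # unsigned 8-bit sample at 8000 sps
    chopped = x if b & 256 else -x     # chop by bit 8 of the phase accumulator
    a += chopped - trunc(a, 512)       # exponential filter, time constant 512 samples
    b += 16 + trunc(a, 1024)           # 250 Hz natural frequency, pulled by the phase error
    out.write(bytes([b & 255]))        # low 8 bits: sawtooth at twice the chopping frequency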

Notes on SPLLs in general, and using real sinusoids with them

The ramp up and down of the phase error accumulator at twice the frequency being detected is called “VCO control line ripple” or “reference spurious” in analog PLLs. This is often a thing you want to minimize, though in this case the noise it adds probably improves system performance through stochastic resonance. You could notch-filter this frequency out with a simple feedforward comb filter by the simple expedient of averaging two samples from the phase error signal half an oscillation apart (adjusting this, ideally, to the current oscillation frequency); or, since doing this requires a table of past phase-error samples anyway, use a prefix sum to calculate a simple moving average, a box filter, of that width, and dispense with the exponential. Box filters have a better noise-suppression/tracking-latency tradeoff than exponentials anyway.

I think there’s an even simpler solution, though: in a digital system, it probably isn’t necessary to update the phase error after every input sample anyway; you can do that processing at a lower sample rate, which filters out fast oscillations by construction.

Although the overall system isn’t linear, it has a lot of linear pieces in it. This phase detector, for example, produces an output signal that’s proportional to the amplitude of the waveform it’s locking onto. If the waveform is near a local extremum where it gets chopped from negative to positive, then the phase-detector output is also locally linear in the phase error. The chopper frequency shift is linear in the filtered phase error signal, and the exponential filter on that signal is linear and time invariant.

Each of these components — the phase detector, the feedback-loop low-pass filter, the variable-frequency oscillator, and the feedback path from the oscillator to the phase detector — has lots of possible realizations. This particular phase detector is called a “type I” phase detector, but there are also “type II” phase detectors. And there are higher-order PLLs, which I don’t understand at all; I’m pretty much just sticking to basic PLLs here because that’s all I understand.

Chopping with a square wave means your PLL is sensitive not only to its nominal frequency but also its odd harmonics. There are cases where this doesn’t matter, but in particular if you’re doing this on a sampled signal, the odd harmonics can alias down to other strange frequencies. In my 18375-Hz example above, for example, the third harmonic would be 55125 Hz — probably heavily attenuated by the antialiasing filter on your sound card, but at 44.1ksps, it aliases down to 11025 Hz. The fifth harmonic is even worse: it aliases to 3675 Hz. (Subsequent harmonics repeat this cycle backwards and forwards.)

And, indeed, that’s what you see if you analyze the output of this program:

#include <stdio.h>
int i;
int main(){for(;;)putchar(i++*18375*16/44100&8?0:128);}

Feed it to head -c 44100 | sox -r 44100 -t raw -u1 - sqwv.wav, hit ^C, run audacity sqwv.wav, and plot the spectrum, and there are three sharp peaks at 18375 Hz, 11025 Hz, and 3675 Hz, just as predicted. And if you were multiplying that oscillation pointwise by an input signal to try to do phase detection, you’d be detecting all three of those frequencies, too.

So you may actually prefer to multiply by a real sine wave at the frequency of oscillation, which of course is what the Goertzel and Minsky resonators do; they also both have the advantage that they can give you a phase readout fairly directly, rather than using these subtle arguments about halves of chopped waves canceling each other out. But to use them for this application, you need to be able to adjust their resonant frequency.

A thing to notice is that, in this single-line SPLL, there’s nothing left over that tells you if you actually have a signal or just silence. a will be 0 if you’re perfectly locked onto a strong 250-Hz signal or if the signal is all zeroes. a might be 2048 because it’s locked onto a really strong signal around 281 Hz, or a weak one, or because it’s randomly being buffeted by strong random noise. To get the signal amplitude, you can chop the input signal with a second chopper in quadrature with the one you use for phase detection; this chopper will be in phase with the input signal instead of in quadrature with it, and so it amounts to synchronous rectification of the input signal. (The Goertzel or Minsky approach does this implicitly.)

This in-phase-chopped signal gives you the amplitude of the oscillation at the frequency of interest (and, as noted above, potentially some other frequencies too). Earlier I had suggested using a box filter in the feedback path to eliminate phase-error ripple, using a prefix sum calculated over some segment of the quadrature-chopped signal. By computing an analogous prefix sum of the in-phase-chopped signal, you can enable the same kind of after-the-fact frequency selectivity described in the earlier sections about fixed-frequency Minsky and Goertzel resonators: by subtracting two nearby samples on the I-chopped and Q-chopped prefix sums, you can detect signals over a wider bandwidth, or by subtracting prefix-sum samples with a large lag between them, you can detect them over a very narrow bandwidth. For better or worse, your detection band chirps along with your PLL oscillator.
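
Here's a sketch of that I/Q-chopping idea in isolation, with a fixed reference phase rather than a PLL-driven one, and made-up signal parameters: chop the input with two square waves in quadrature, keep prefix sums of both chopped signals, and recover amplitude and phase from a window spanning a whole number of reference cycles.

import numpy as np

fs, f_ref = 8000.0, 250.0              # fixed reference; in the real PLL this phase would track
A, theta = 0.37, 1.2                   # hypothetical tone amplitude and phase
n = np.arange(int(fs))                 # one second of samples
phase = 2 * np.pi * f_ref / fs * n
x = A * np.cos(phase + theta)

ci = np.cumsum(np.sign(np.cos(phase)) * x)   # prefix sum of the in-phase-chopped signal
cq = np.cumsum(np.sign(np.sin(phase)) * x)   # prefix sum of the quadrature-chopped signal

a, b = 800, 4000                       # 3200 samples = exactly 100 reference cycles
I = (ci[b] - ci[a]) / (b - a)
Q = (cq[b] - cq[a]) / (b - a)
print("amplitude about", np.hypot(I, Q) * np.pi / 2)   # square-wave correlation scales by 2/pi
print("phase about", np.arctan2(-Q, I))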

There is presumably no PLL equivalent to the phase-vocoder-like use of the fixed-frequency oscillators that I mentioned above, since the whole point of using a PLL is that it finds the frequency of the signal of interest and follows it.

Altering the frequency has some subtle effects which may introduce numerical errors here. As mentioned earlier, for fast Minsky oscillators, the half-sample offset between the two state variables becomes significant; altering the frequency will alter that relationship, and might rob energy from the system or add energy to it. Similarly, with Goertzel, lowering the frequency can add energy to the system, but won’t always — it depends on what part of the oscillation s is in. (I don’t know if it can also rob energy from it.) Presumably we can work out how to correct these errors.

Related work; contributions?

I ran across a 2015 paper by Sridharan, Chitti Babu, MuthuKannana, and Krithika entitled “Modelling of Sliding Goertzel DFT (SGDFT)…” which seems to have some things in common with the above. They’re using a Goertzel oscillator in their PLL, but it isn’t the oscillator; it’s in the feedback path between the oscillator and the phase detector. They’re also using a moving-average filter (a box filter). Moreover, they’re using a PI controller in the loop to set the oscillator frequency in order to drive the phase error to zero (since their application is power grid synchronization), rather than the proportional control I used in the example above. As far as I can tell, by “SGDFT” they just mean the Goertzel algorithm.

I haven’t found anybody talking about the relationship between the Goertzel algorithm and the Minsky algorithm, about using prefix sums to get variable tradeoffs between precision in the frequency domain and in the time domain without having to redo the multiplications, about using the Goertzel or Minsky algorithm as a combination phase-detector and oscillator in a PLL, or about the fortuitously nearly-A440 Goertzel frequencies we get with just a bit shift. I’ve never seen anybody talk about doing the Goertzel algorithm in wrapping integer math to avoid roundoff error (though, admittedly, it probably only makes sense in the context of Hogenauer-style windowing). I’ve never seen a discussion of using the Goertzel algorithm to detect a frequency near the target frequency and precisely identify it by unwrapping the phase. But it’s possible that that’s just because I’m just not that familiar with the literature. Or that the ideas are wrong.

However, most of the PLL and Minsky-algorithm and Goertzel-algorithm stuff is very well known. As far as I can tell, the relationship between the Minsky algorithm and the Goertzel algorithm hasn’t been published, but quite possibly it’s just too obvious to those skilled in the art to be considered novel.
