Bar-time Dynamics II – Making waves

Of Fourier transforms and spectro-temporal dynamics
In my previous blog post, Bar-Time Dynamics, I took an audio file of a song and applied my methods of laser analysis – spatio-temporal dynamics – to it. The result was a visual representation of the song structure, and with a nifty phase-flip it also revealed the differences in the way the centre and side channels were engineered.
But when we think of sound, we think of sound frequencies; in fact, we actually hear them. For audio engineers, it is also helpful to think of audio in this way. The simplest way to move from the time perspective to the frequency perspective is via Fourier Transforms. Let’s see how they work.
This is a computer-generated sine wave.

We can use the Fast Fourier Transform (FFT) algorithm to find the frequency of this sine wave. The FFT of the sine wave looks like this –

The location of the peak tells us that the frequency of the sine wave is 50 Hz. The height of the peak tells us that the amplitude of the sine wave is 1.8.
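For readers who want to try this themselves, here is a minimal sketch of the same measurement in Python with NumPy. The sampling rate and duration are assumptions picked for illustration (the original signal's settings are not stated); only the 50 Hz frequency and the amplitude of 1.8 come from the text. The factor of 2/N is one common way to scale the FFT so that the peak height reads directly as the sine amplitude.

```python
import numpy as np

fs = 1000            # sampling rate in Hz (assumed for illustration)
duration = 1.0       # signal length in seconds (assumed for illustration)
f0, a0 = 50.0, 1.8   # frequency and amplitude quoted in the text

# Computer-generated sine wave
t = np.arange(int(fs * duration)) / fs
signal = a0 * np.sin(2 * np.pi * f0 * t)

# FFT of a real signal: only the non-negative frequencies are needed
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

# Scale so that a sine of amplitude A produces a peak of height A
amplitude = 2 * np.abs(spectrum) / signal.size

peak = np.argmax(amplitude)
peak_freq = freqs[peak]      # location of the peak -> frequency
peak_amp = amplitude[peak]   # height of the peak -> amplitude
print(peak_freq, peak_amp)
```

The clean read-out relies on the tone completing a whole number of cycles within the chosen duration; otherwise the peak is smeared across neighbouring bins and its height drops.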
The distinctive shape of this peak is due to the finite length of the signal taken: the longer the duration of the signal, the narrower the peak. The ripples can be minimized if we vignette our signal, i.e. fade it smoothly to zero at both ends (usually called windowing).
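Both effects can be checked numerically. The sketch below is an illustration under assumed settings (a 1 kHz sampling rate, a 50.5 Hz tone chosen to sit between FFT bins, and a Hann window as the "vignette"); none of these values come from the original analysis, and the helper names are made up for this example.

```python
import numpy as np

fs = 1000.0   # sampling rate in Hz (assumed)
f0 = 50.5     # test tone in Hz, deliberately between FFT bins (assumed)

def normalised_spectrum(duration, use_window):
    """Return (freqs, magnitude) of a sine burst, magnitude scaled to peak 1."""
    n = int(fs * duration)
    t = np.arange(n) / fs
    x = np.sin(2 * np.pi * f0 * t)
    if use_window:
        x = x * np.hanning(n)      # Hann taper: smooth fade-in and fade-out
    nfft = 16 * n                  # zero-padding only interpolates the curve
    mag = np.abs(np.fft.rfft(x, nfft))
    freqs = np.fft.rfftfreq(nfft, d=1 / fs)
    return freqs, mag / mag.max()

def peak_width(freqs, mag):
    """Full width of the main peak at half its maximum height, in Hz."""
    above = freqs[mag >= 0.5]
    return above.max() - above.min()

def ripple_level(freqs, mag, guard=20.0):
    """Largest relative magnitude more than `guard` Hz away from the tone."""
    return mag[np.abs(freqs - f0) > guard].max()

width_short = peak_width(*normalised_spectrum(0.25, use_window=False))
width_long = peak_width(*normalised_spectrum(1.0, use_window=False))
ripple_plain = ripple_level(*normalised_spectrum(1.0, use_window=False))
ripple_hann = ripple_level(*normalised_spectrum(1.0, use_window=True))

print("peak width, 0.25 s vs 1 s:", width_short, width_long)
print("ripple level, plain vs Hann:", ripple_plain, ripple_hann)
```

The longer signal gives the narrower peak, and the windowed signal shows much weaker ripples away from the peak. The price of windowing is a somewhat wider main peak, which is why the choice of window is usually a trade-off.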
What happens if we have a slightly more complicated signal, say two sine waves added together?

Here the FFT reveals the frequencies of the two waves making up the signal.
This illustrates an important characteristic of Fourier Transforms – if we have a linear superposition (i.e. simple, straightforward addition) of waves, the Fourier Transform will give us their individual frequencies. Thus, the FFT of an audio signal gives its audio frequency spectrum.
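The superposition property is easy to verify in code. In this sketch the two frequencies (50 Hz and 120 Hz), their amplitudes, and the sampling settings are all assumptions chosen for illustration; the text does not say which two waves were used in the figure.

```python
import numpy as np

fs = 1000                      # sampling rate in Hz (assumed)
t = np.arange(fs) / fs         # one second of samples

# Two sine waves with assumed frequencies and amplitudes
wave_a = 1.0 * np.sin(2 * np.pi * 50 * t)
wave_b = 0.5 * np.sin(2 * np.pi * 120 * t)
combined = wave_a + wave_b     # linear superposition

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
amplitude = 2 * np.abs(np.fft.rfft(combined)) / t.size

# The two tallest peaks sit at the two input frequencies
top_two = np.sort(freqs[np.argsort(amplitude)[-2:]])
print(top_two)

# Linearity: the FFT of the sum equals the sum of the FFTs
is_linear = np.allclose(np.fft.rfft(combined),
                        np.fft.rfft(wave_a) + np.fft.rfft(wave_b))
print(is_linear)
```

The second check is the reason the first one works: because the transform is linear, each wave contributes its own peak independently of the other.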
Let’s now apply the FFT to ARR’s Sevandhu Pochu Nenje.
The FFT of the intro synth section looks like this.

Just like in our computer-generated example, the FFT shows that the synth is made of two frequencies.
If you remember, I had mentioned that the bar-time dynamics gave the impression that the synth frequency was commensurate with the BPM. It turns out that it indeed is. The higher frequency is its 72,000th harmonic (the 72,000.00000000001th harmonic, actually)!
Now that I think of it, it actually makes sense to ensure that the synth wobble falls in beat with the track tempo. The trick is in ensuring your track tempo is commensurate with the scale of your song. Nice.
Now let’s move on to the kick+rimshot section.

The top row gives the raw audio signal over half a bar. You don’t need an FFT to see that there is a low-frequency component coming from the drum kick. You will also see a fairly strong high-frequency peak around 18 kHz. This is from a rimshot that coincides with the drum hit.
The FFT also shows a small peak around 750 Hz. This is coming from the small ripples riding atop the drum kick – those are the male high-end vocals.
Let’s see what happens when the female vocals enter –

The previous half-bar signal is also shown for comparison. Most of the low-frequency components are retained, as the kicks remain unchanged. The 18 kHz rimshot is also preserved. But now you have so much more going on across the whole audible spectrum (<22 kHz). The human voice ranges from about 100 Hz (~C3) to about 1 kHz (~C6), with harmonics going as high as 8 kHz (~C9). You can see a bit of that above, but the frequency spread also arises from other elements in the music.
A skilled audio engineer can now pick apart the different instruments and their frequency spreads, and equalise them to give an overall pleasing, clean sound that will sound good on all speakers, big and small. That is the pinnacle of the art form that is ‘EQ-ing’.
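To relate note names to frequencies, one standard conversion can be sketched as follows. It assumes twelve-tone equal temperament, concert pitch A4 = 440 Hz, and scientific pitch notation (middle C is C4); the function name `note_to_hz` is made up for this example, and the song itself may of course be tuned differently.

```python
# Semitone offsets of the natural notes within an octave, counted from C
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_hz(name, octave, a4=440.0):
    """Equal-tempered frequency of a natural note, e.g. note_to_hz("C", 6)."""
    # MIDI-style note number: C4 -> 60, A4 -> 69
    number = 12 * (octave + 1) + NOTE_OFFSETS[name]
    # Each semitone multiplies the frequency by 2 ** (1 / 12)
    return a4 * 2 ** ((number - 69) / 12)

c3 = note_to_hz("C", 3)
c6 = note_to_hz("C", 6)
c9 = note_to_hz("C", 9)
print(round(c3, 1), round(c6, 1), round(c9, 1))
```

Under these assumptions C3 comes out near 131 Hz, C6 near 1047 Hz and C9 near 8.4 kHz, so the note labels above are best read as rough landmarks rather than exact values.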
But we are not going to do that here.
What we will do next is to move over to our spatio-temporal approaches and see what that springs on us.
Specifically, we will now plot the spectro-temporal evolution. Here, it is shown for the chorus part starting a bit after the 2:00 minute mark, for both the left and right channels. Let’s focus on the vocal frequencies for now.

Each row corresponds to a bar, where the colours are indicative of the strengths of the different frequencies. We see a lot of variation around 0.9 kHz, which is the female vocals part. The left and right channels have fairly similar dynamics. Nothing fancy.
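The bar-by-bar layout can be sketched as follows. This is not the code used for the figures; it is a minimal illustration on a synthetic signal, with an assumed sampling rate, tempo and time signature, and with one made-up tone per bar so that the result is easy to check.

```python
import numpy as np

fs = 8000                      # sampling rate in Hz (assumed)
bpm = 120                      # tempo in beats per minute (assumed)
beats_per_bar = 4              # 4/4 time (assumed)
samples_per_bar = int(round(fs * 60 / bpm * beats_per_bar))

# Synthetic stand-in for a track: one steady tone per bar (made-up values)
bar_tones = [440.0, 660.0, 880.0, 660.0]
t_bar = np.arange(samples_per_bar) / fs
signal = np.concatenate([np.sin(2 * np.pi * f * t_bar) for f in bar_tones])

# Cut the signal into whole bars: one row per bar
n_bars = signal.size // samples_per_bar
bars = signal[: n_bars * samples_per_bar].reshape(n_bars, samples_per_bar)

# FFT along each row: rows are bars, columns are frequencies
spectra = np.abs(np.fft.rfft(bars, axis=1))
freqs = np.fft.rfftfreq(samples_per_bar, d=1 / fs)

# Strongest frequency in each bar
dominant = freqs[np.argmax(spectra, axis=1)]
print(dominant)
```

Displaying `spectra` as a colour image, with `freqs` on the horizontal axis and the bar number on the vertical axis, gives the kind of picture described above. For a real track the number of samples per bar is rarely a whole number, so the rounding step would need more care than it gets here.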
Let’s now apply our L+R and L-R trick to get the centre and side mixes. If you recall, the L+R will give you an impression of what you would hear on both channels, simultaneously. The L-R operation cancels whatever is identical in the two channels, so what survives is whatever differs between them – and this would reveal any signals that are relatively delayed. Delaying copies of sounds on left and right channels is a common trick used by sound engineers to give an impression of sound-stage (the Haas effect).
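A toy example makes the behaviour of the two mixes concrete. Everything in the sketch below is assumed for illustration: a 440 Hz tone placed identically in both channels stands in for a centred vocal, and a 1 kHz tone that reaches the right channel 1.25 ms late stands in for a Haas-style widened sound.

```python
import numpy as np

fs = 8000                        # sampling rate in Hz (assumed)
t = np.arange(fs) / fs           # one second of samples
delay = 10 / fs                  # 1.25 ms inter-channel delay (assumed)

# Centred element: identical in both channels
centre_tone = np.sin(2 * np.pi * 440 * t)

# Widened element: the right channel carries a delayed copy
left = centre_tone + np.sin(2 * np.pi * 1000 * t)
right = centre_tone + np.sin(2 * np.pi * 1000 * (t - delay))

mid = left + right               # centre mix (L+R)
side = left - right              # side mix (L-R)

def amplitude_spectrum(x):
    """FFT magnitude scaled so a sine of amplitude A gives a peak of A."""
    return 2 * np.abs(np.fft.rfft(x)) / x.size

freqs = np.fft.rfftfreq(t.size, d=1 / fs)
mid_amp = amplitude_spectrum(mid)
side_amp = amplitude_spectrum(side)

i_centre = np.argmin(np.abs(freqs - 440))
i_wide = np.argmin(np.abs(freqs - 1000))
print("centre tone in mid / side:", mid_amp[i_centre], side_amp[i_centre])
print("delayed tone in mid / side:", mid_amp[i_wide], side_amp[i_wide])
```

The centred tone doubles in the L+R mix and vanishes from the L-R mix, while the delayed tone shows up in both. How much of a delayed sound lands in each mix depends on its frequency and on the delay, since the two copies add or cancel according to their phase difference.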
When we do this to our time signals, and then apply the FFT – things get interesting.

The centre mix still has most of the female vocals. The side mix, however, seems to catch the peaks of the vocal frequencies. It is possible to do this – software plugins allow you to set amplitude threshold gates that pick the peaks, and only process those gated signals to generate the delayed L and R channel signals.
It gets more interesting with the synth sweeps towards the end (4:00 minutes), which are impressively massive.

It is quite tree-like – the branches are the synth frequency ‘sweeps’ that show how the notes change continuously over bars. The vertical dashes are parts where the synth is held at the same note for four bars.
Here are the centre and side mixes.

Look at the side mix – rich in harmonics, and making full use of the time-delay Haas effect. So if you ever wondered how the synths sound so massive – here you go.
I have just scratched the surface of what we can do with FFTs. You also have the inverse Fourier and Fast Fourier Transforms, which allow us to generate new signals by clumping together a bunch of sine waves – i.e. the inverse of what we did above. This forms the basis of sound synthesis, and of many of our beloved sounds of the 80s. And I haven’t even started talking about phase. Or about other signals – biomedical, lasers, telecoms, weather, stock trends, images – where you can apply FFTs.
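As a small taste of the inverse direction, the sketch below builds a spectrum by hand and turns it into a waveform with the inverse FFT. The recipe is an assumption made for illustration (five harmonics of 110 Hz with amplitudes falling off as 1/h, which gives a roughly sawtooth-like tone); it is not a reconstruction of any particular synth.

```python
import numpy as np

fs = 8000                  # sampling rate in Hz (assumed)
n = fs                     # one second of samples, so bins are 1 Hz apart
f1 = 110.0                 # fundamental frequency in Hz (assumed)
harmonics = range(1, 6)    # first five harmonics (assumed)

# Write the recipe directly into the frequency domain
spectrum = np.zeros(n // 2 + 1, dtype=complex)
for h in harmonics:
    k = int(round(h * f1 * n / fs))          # FFT bin of this harmonic
    # A sine of amplitude A lives in bin k as -1j * A * n / 2
    spectrum[k] = -1j * (1.0 / h) * n / 2

# Inverse FFT: from spectrum back to a time signal
synth = np.fft.irfft(spectrum, n=n)

# Cross-check against adding the sine waves one by one in the time domain
t = np.arange(n) / fs
direct = sum((1.0 / h) * np.sin(2 * np.pi * h * f1 * t) for h in harmonics)
matches = np.allclose(synth, direct)
print(matches)
```

Changing the amplitudes in the loop changes the timbre without touching the pitch, which is the basic idea behind additive synthesis.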
In short, Fourier transforms are cool.
The blog above used a legitimately procured electronic version of the song, Sevandhu Pochu Nenjae, purely for non-commercial educational purposes.

Bar-time Dynamics II – Making waves by Srikanth Sugavanam is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://www.srikanthsugavanam.com/in-academia/bar-time-dynamics/.