Roja was released twenty-five years ago. Let that sink in.
Twenty-five years ago, the world witnessed Mani Ratnam and A. R. Rahman coming together for the first time. And the world hasn’t been the same since.
There is no question about the musical brilliance of A. R. Rahman (ARR, as he is known), or about the storytelling genius of Mani Ratnam. They remain pioneers in their respective domains, with a working chemistry that stokes jealousy. Sparks fly when Mani Ratnam and A. R. Rahman come together, and the world waits in eager anticipation. In the past two decades, they have given us timeless classics – Thiruda Thiruda, Bombay, Dil Se (Uyire), Guru, Indira, Kannathil Mutthamittal, Alai Paayuthe (Saathiya), Ayutha Ezhuthu (Yuva). If there is to be an equivalent of the Songbook for contemporary Indian cinema, I would strongly argue that we need search no further than the Mani Ratnam/Rahman anthology.
Chekka Chivantha Vaanam saw them coming together again last year. Needless to say, it didn’t disappoint. The musical score spanned genres from Indian classical to trap, which added a strong, guttural sub-text to the movie (the movie is about the internal conflicts of a South Indian mafia family).
For the audio engineering aficionado, it is a case study in the state of the art. There is generous use of deep kicks, growling basses and ultra-massive synths, along with headspace tech that is just shy of witchcraft.
The case in point here is the track Sevandhu Pochu Nenje.
Admittedly it is quite reminiscent of Ayutha Ezhuthu’s Jana Gana Mana, with a steady pulsating pace-setting rhythm. The kick is killer, and the audio mixing is flawless.
By the way, Ayutha Ezhuthu was released thirteen years ago. So from this point onwards, I will stop measuring time in Mani Ratnam/Rahman music release dates, as they don’t make sense. And they make me feel old.
As I was listening to the track’s assertive four-on-the-floor kick while running some MATLAB code, I thought – well, what would the spatio-temporal dynamics of this ARR song look like?
So, making the most of the extra hour today, I ended up with some pretty pictures. And, maybe, just maybe, a peep under the hood of how the Maestro works.
Here’s what the track looks like.
What you see here are the left and right audio channels of the song. The green markers indicate the different prominent sections. This is essentially the signal that is converted to a voltage signal to be fed into your headphones or speakers. The song running time is about 5:10 minutes, set to a tempo of about 91 beats per minute. The song is sampled at the standard 44.1 kHz.
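As a back-of-the-envelope check on these numbers, the length of one bar follows directly from the tempo and the sample rate. The sketch below is illustrative only and is not the code behind the figures. It assumes four beats per bar (suggested by the four-on-the-floor kick, but still an assumption) and treats the approximate 91 bpm and 5:10 figures as exact. The function and constant names are made up for this example.

```python
# Rough bar-length arithmetic using the approximate figures quoted above.
SAMPLE_RATE_HZ = 44_100      # standard sample rate
TEMPO_BPM = 91               # approximate tempo, treated as exact here
BEATS_PER_BAR = 4            # assumption: 4/4 time
SONG_SECONDS = 5 * 60 + 10   # about 5:10


def bar_seconds(tempo_bpm, beats_per_bar=BEATS_PER_BAR):
    """Duration of one bar in seconds."""
    return beats_per_bar * 60.0 / tempo_bpm


def samples_per_bar(tempo_bpm, sample_rate_hz=SAMPLE_RATE_HZ,
                    beats_per_bar=BEATS_PER_BAR):
    """Number of samples in one bar (generally not a whole number)."""
    return sample_rate_hz * bar_seconds(tempo_bpm, beats_per_bar)


bar_len_s = bar_seconds(TEMPO_BPM)
bar_len_samples = samples_per_bar(TEMPO_BPM)
n_bars = SONG_SECONDS / bar_len_s

print(f"{bar_len_s:.3f} s per bar")
print(f"{bar_len_samples:.0f} samples per bar")
print(f"{n_bars:.1f} bars in the song")
```

Under these assumptions a bar lasts a little over 2.6 seconds, which is roughly 116,000 samples, and the song holds somewhere around 117 bars. Note that the number of samples per bar is not a whole number, which matters when cutting the recording into bars.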
Here’s a close-up of one bar of the song, just when the drums enter.
The low frequency modulation at the beginning of the bar is the kick (four-on-the-floor, starts on the first beat). Note how the hit/strike of the kick that occurs first is more compressed, followed by a trailing low frequency rumble. The tinnier rimshot is bang on count 3.
Now I proceed to apply the spatio-temporal methodology I use routinely to study lasers. Basically, as the song is periodic (at 91 bpm), I can segment the recording into bars, and stack these one atop the other. The video below shows what I mean.
Technically, for audio signals there is no ‘spatial’ co-ordinate. So, henceforth, I call the result of this segmentation/stacking process ‘bar-time’ dynamics. 😉
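The segment-and-stack step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual analysis code: `stack_bars` is a hypothetical helper, the channel is a plain list of samples, and the bar length is known in advance. Because the bar length in samples is usually fractional, each bar here starts at the sample nearest to a multiple of the bar length, so that rounding errors do not pile up over the song.

```python
def stack_bars(samples, samples_per_bar):
    """Cut a 1-D sequence into consecutive bars and stack them as rows.

    samples_per_bar may be fractional. Bar k starts at the sample nearest
    to k * samples_per_bar. A trailing incomplete bar is dropped.
    """
    if samples_per_bar < 1:
        raise ValueError("samples_per_bar must be at least 1")
    width = int(samples_per_bar)  # samples kept per row
    rows = []
    k = 0
    while True:
        start = round(k * samples_per_bar)
        if start + width > len(samples):
            break
        rows.append(list(samples[start:start + width]))
        k += 1
    return rows


# Toy signal: a "kick" (value 1.0) at the start of every 5-sample bar.
toy = [1.0, 0.0, 0.0, 0.0, 0.0] * 4
bars = stack_bars(toy, 5)
for row in bars:
    print(row)
```

In the toy output, every row starts with the kick, so the kicks line up as a vertical stripe. Anything that repeats at the same position in every bar shows up in the same way.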
Here are the bar-time dynamics I got after applying the spatio-temporal methodology to the left and right channels of the song.
In the above you are seeing a two-dimensional representation of the song. Time flows from the top row to the bottom. Each row is one bar of the song, and houses one kick. The colours are an indication of the loudness (quantified in logarithmic units). The different sections of the song are indicated on the left.
The electronic nature of the composition is very evident from the regularity of the sound elements (no surprises here). The kicks, for instance, can be instantly recognised, and their strong similarity indicates that an electronic sample was most likely used for them. Even the rimshots show up as a clear vertical line. It is very clear from this representation when the different sections of the song start, so we can start looking at them up close.
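The exact logarithmic quantity behind the colours is not spelled out here, so the following is only one plausible choice: amplitude in decibels, 20·log10|x|, with a floor so that silent samples do not produce minus infinity. The helper name `to_decibels` and the -80 dB floor are assumptions made for this sketch.

```python
import math


def to_decibels(rows, floor_db=-80.0):
    """Map stacked samples to 20*log10(|x|), clipped from below at floor_db."""
    floor_amp = 10.0 ** (floor_db / 20.0)  # smallest amplitude we distinguish
    return [
        [20.0 * math.log10(max(abs(x), floor_amp)) for x in row]
        for row in rows
    ]


# Two tiny "bars" of three samples each, with full scale at amplitude 1.0.
bars = [[1.0, 0.5, 0.0], [-1.0, 0.1, 0.001]]
loudness_db = to_decibels(bars)
for row in loudness_db:
    print([round(v, 1) for v in row])
```

The resulting rows can be handed to any image-plotting routine, with one row of pixels per bar, to get a picture of the kind described above.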
Here is the intro –
The intro comprises some granular synths and bells. The regularity is self-evident. There is also some panning happening here, as indicated by the alternating dark patches in the left and right channels. The samples are periodic over bars, i.e. they occur at the same instants along the bar over successive bars – which can be attributed to the grid-based electronic composition highlighted above. However, it seems as if the notes played (i.e. their frequencies) are commensurate with the bpm itself. Does this lead to an overall pleasing listening experience?
Here are the kicks –
And here is when the female vocals enter –
Note how the kicks seem to get distorted. This is because the female vocals are riding atop the kicks. Linear superposition is at play – so our ears are still able to resolve the overlapping kicks and the female vocals (which is amazing in itself, but that’s another story).
Now let’s come to the synths. When it comes to enhancing head-space, audio engineers have a nifty set of tricks – the Haas effect for example. Without getting into too much detail, this entails generating a copy of the sound you want to ‘widen’, and playing the original and the copy on the two channels with a delay of 10-35 ms between them. This minor delay has the dazzling effect of a wider sound.
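To make the idea concrete, here is a minimal sketch of a Haas-style widener: the dry signal goes to one channel and a delayed copy to the other. It is a toy illustration, not how this track was mixed. The function name `haas_widen`, the choice of which channel gets the delay, and the 20 ms default are all assumptions.

```python
def haas_widen(mono, sample_rate_hz=44_100, delay_ms=20.0):
    """Return (left, right): left is the dry signal, right a delayed copy."""
    delay = round(sample_rate_hz * delay_ms / 1000.0)  # delay in samples
    left = list(mono) + [0.0] * delay    # pad the tail so lengths match
    right = [0.0] * delay + list(mono)   # same sound, arriving later
    return left, right


# Toy example at a 1 kHz sample rate: a single click, delayed by 3 ms.
click = [1.0] + [0.0] * 9
left, right = haas_widen(click, sample_rate_hz=1000, delay_ms=3.0)
print(left)
print(right)
```

At 44.1 kHz, a 20 ms delay corresponds to 882 samples, so in the bar-time picture such a copy would sit only a sliver to the right of the original.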
So, do we have something akin to a Haas effect there? We can check that very quickly.
We have two raw audio matrices, one for the left (say, L), and one for the right (say, R). So let’s carry out two operations – L + R, and L-R.
Here’s what they look like –
The titles of the figure are a give-away. The addition operation tells us which signals are present on both left and right channels. The kicks are dominant, and so are the rimshots. The vocals are also present on both channels. This is a classic approach.
Things get interesting in the subtraction operation. The minus sign turns the peaks of the right channel into troughs, so anything that is identical on both channels cancels out. This is why the kicks vanish completely when we subtract. In mathematical terms, the minus sign introduces a ‘pi’ phase shift in the right channel. This is not the same thing as a time shift, but it does mean that only the differences between the channels survive: a sound that reaches one channel slightly later than the other no longer lines up with its copy, so the two do not cancel. In other words, we see signatures of time-shifted delays in the difference.
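The sum and difference operations can be tried out on a toy stereo bar. In this sketch, which uses made-up sample values and a hypothetical helper `sum_and_difference`, the kick is identical on both channels while the synth appears on the right one sample later than on the left.

```python
def sum_and_difference(left, right):
    """Element-wise L + R and L - R for two equal-length channels."""
    if len(left) != len(right):
        raise ValueError("channels must have the same length")
    total = [a + b for a, b in zip(left, right)]
    diff = [a - b for a, b in zip(left, right)]
    return total, diff


# Toy bar of 8 samples.
kick = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]         # same on both channels
synth_dry = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]    # left channel copy
synth_late = [0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0]   # right copy, one sample late

left = [k + s for k, s in zip(kick, synth_dry)]
right = [k + s for k, s in zip(kick, synth_late)]

total, diff = sum_and_difference(left, right)
print(total)  # kick doubled, both synth copies present
print(diff)   # kick cancelled, synth copies survive with opposite signs
```

The kick, being the same on both channels, doubles in the sum and disappears from the difference. The two synth copies do not overlap in time, so both survive the subtraction.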
Here is just the synth section –
See how the kicks completely disappear in the side (L-R) channel, leaving behind a thick coat of noise. This out-of-phase content gives the synth its wideness. Clearly some form of delay has been used, the exact nature of which requires further analysis.
The bar-time dynamics is proving to be a nifty tool for quick visual inspection of song structure and mixing approaches. Of course, it has its limitations. For instance, it does not tell us much about the frequency distribution. However, it has neatly set the stage for it.
Coming up next – Audio frequency resolved Bar-time dynamics. Keep your eyes peeled for some sound analysis.
The blog post above used a legitimately procured electronic version of the song, Sevandhu Pochu Nenje, purely for non-commercial educational purposes.
The original methodology was proposed and published in the Nature Communications publication, https://www.nature.com/articles/ncomms8004, which is licensed under a Creative Commons CC-BY license.
Bar-time Dynamics by Srikanth Sugavanam is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Based on a work at https://www.nature.com/articles/ncomms8004.