Tuesday, July 4, 2023

Dynamic Range in Recordings is not a well defined concept

Dynamic Range in audio equipment seems relatively easy to understand.  It is the difference between the peak level and the noise.  

OR, it's the difference between the peak level and the lowest resolvable signal, which can sometimes be 20dB or so BELOW the noise level.  (Signals can be resolved from noise using spectrum and/or gestalt analysis, and the human auditory process does both.)  This is a little less well defined.  How capable is the spectrum analysis?  And the gestalt analysis is not very well defined at all.  Gestalt analysis is how we can identify and discriminate different sound sources by aspects of their quality other than frequency, such as their rhythm or randomness or apparent location.

Either way, dynamic range in audio equipment is now specified as being measured in the presence of signal, thereby requiring the signal to be filtered out of the output before analysis.  This is not hard to do now that we have ready access to things like FFT.
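
To make the FFT point concrete, here is a minimal Python sketch of measuring noise in the presence of a test tone.  Everything in it is an assumption for illustration and not how any particular analyzer works: the tiny radix-2 `fft` helper, the 1024-sample frame, the tone sitting exactly on bin 64 (so there is no leakage to deal with), and the made-up noise level.  A real bench measurement would need windowing and would also have to decide what to do about harmonics.

```python
import cmath
import math
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def snr_db_with_tone(samples, tone_bin):
    """Power in the tone's bin versus power in every other bin, in dB."""
    spectrum = fft(samples)
    half = len(samples) // 2
    # Bins 1..half-1 only: skip DC and the mirrored upper half.
    power = [abs(spectrum[k]) ** 2 for k in range(1, half)]
    signal = power[tone_bin - 1]
    noise = sum(power) - signal  # "filter out" the tone by excluding its bin
    return 10 * math.log10(signal / noise)

# Hypothetical test frame: full-scale tone plus a small amount of Gaussian noise.
n = 1024
tone_bin = 64
rng = random.Random(0)
frame = [math.sin(2 * math.pi * tone_bin * i / n) + rng.gauss(0.0, 0.001)
         for i in range(n)]
snr = snr_db_with_tone(frame, tone_bin)
print(f"noise measured under the tone: {snr:.1f} dB below it")
```

With these made-up numbers the result should land somewhere around 57dB, which is just the ratio of the tone power (0.5) to the noise power I put in (0.000001).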

Sometimes this kind of Dynamic Range is called Signal to Noise Ratio in the context where we are looking at the noise of a particular component instead of an entire system.  And also perhaps when we are not measuring it in the presence of signal, which was the traditional way of measuring it (and what I usually do on my test bench).

But where it really gets thorny is when we are talking about the Dynamic Range in audio recordings.  It bugs the heck out of me when people just don't get how poorly defined this concept is (even though we now have 'Standards' and tools for measuring it) and how much, therefore, it depends on assumptions, heuristics, algorithms, psychoacoustic research, and the like.

For example, you could rightly claim that nearly every DDD recording has at least 96dB dynamic range, if you defined dynamic range as the difference between the peak level (which is almost always near the maximum level, defined as 0dB, on digital recordings) and the lowest level in the recording (such as in a fade out near the end of a recording).  
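
The 96dB figure is just the arithmetic of 16-bit samples: the ratio between full scale and a single quantization step.  A quick Python check (the function name `pcm_range_db` is mine, purely for illustration):

```python
import math

def pcm_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

print(f"{pcm_range_db(16):.1f} dB")  # 96.3 dB
print(f"{pcm_range_db(24):.1f} dB")  # 144.5 dB
```

So by the naive peak-versus-quietest definition, the number mostly tells you the word length of the format, not anything about the music.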

Also, since sound is wave-like, and audio is based on alternating currents, there will be zero crossings in every small section of recorded audio.  So it's always going down to zero somewhere and actually quite frequently!

Even if you removed silence-before-the-first-downbeat, silence or mechanical fade outs, and zero crossings, you may often find tiny near-silence gaps even within highly compressed program material.  Even ruling out the two considerations above (fade outs and zero crossings) we can just look at the low level of those relatively small gaps (for example 250ms, enough for 250 cycles at 1000 Hz and about where we hear gaps as gaps and not just clicks) and say "see, there is plenty of dynamic range between these 250ms intervals and the peak levels."
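
Here is a rough Python sketch of that "quietest 250ms gap versus the peak" definition, run on a synthetic signal.  The function name, the 8000 Hz sample rate, and the test signal (a loud 1000 Hz tone with one 250ms near-silent gap in the middle) are all assumptions I made up for the example, not anything taken from a real tool.

```python
import math

def gap_range_db(samples, sample_rate, window_s=0.25):
    """Peak level versus the RMS of the quietest non-overlapping window, in dB."""
    n = int(round(window_s * sample_rate))
    peak = max(abs(x) for x in samples)
    rms_values = []
    for start in range(0, len(samples) - n + 1, n):
        block = samples[start:start + n]
        rms_values.append(math.sqrt(sum(x * x for x in block) / n))
    quietest = min(rms_values)
    if quietest == 0:
        return math.inf  # a truly silent window makes the range unbounded
    return 20 * math.log10(peak / quietest)

# Hypothetical program material: 1 s loud, 250 ms nearly silent, 1 s loud.
sr = 8000
signal = []
for i in range(int(2.25 * sr)):
    t = i / sr
    amp = 0.001 if 1.0 <= t < 1.25 else 0.9
    signal.append(amp * math.sin(2 * math.pi * 1000 * t))

gap_db = gap_range_db(signal, sr)
print(f"peak vs quietest 250 ms window: {gap_db:.1f} dB")
```

One short gap is enough to give this otherwise constant-level signal a "dynamic range" of roughly 60dB, which is exactly why this definition says so little about how compressed something sounds.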

What the whole family of standards, methodologies, and tools for measuring "Dynamic Range" in recordings is really about is determining how much compression has ruined particular recordings, or alternatively how much compression can be used without making things sound bad.

It's a matter of heuristics.  They do this by comparing some quantity of peak levels to some quantity of less-than-peak levels.

But the quantities involved are not something that is intuitively obvious, but rather related to what is found to be 'interesting' in that it correlates to things sounding over-compressed.

So it is that some Dynamic Range methodologies use histograms and others use percentiles, things that are not intuitively obvious to self appointed audio gurus.

For example, the famous DR scale is based on a comparison between the peak level (fine, that's intuitive enough) and the average of all levels that are in the 20th percentile of highest levels or above (rather counter-intuitive).  IOW, it's comparing loud vs loudest, whereas you might think that dynamic range should compare loudest and softest (but then we are back to the 96dB of digital recording systems, etc).  But the presumption seems to be that "almost all of the recording is at or below this level" and it should have at least one much higher peak to be "dynamic."  It seems to me there are other ways of being dynamic.
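
To show the shape of a "loud vs loudest" measure, here is a simplified Python sketch in the spirit of the description above.  It is NOT the actual DR algorithm, whose details I am not reproducing here.  The function name `dr_like_db`, the choice to average the loudest 20% of blocks by mean square, and both test signals are my own assumptions.

```python
import math

def dr_like_db(samples, sample_rate, block_s=3.0, top_fraction=0.2):
    """Peak level versus the RMS of the loudest blocks, in dB (illustrative only)."""
    n = int(round(block_s * sample_rate))
    # Mean-square level of each full, non-overlapping block.
    block_ms = [sum(x * x for x in samples[i:i + n]) / n
                for i in range(0, len(samples) - n + 1, n)]
    if not block_ms:
        raise ValueError("signal is shorter than one block")
    block_ms.sort(reverse=True)
    keep = max(1, int(len(block_ms) * top_fraction))  # loudest 20% of blocks
    loud_rms = math.sqrt(sum(block_ms[:keep]) / keep)
    peak = max(abs(x) for x in samples)
    if loud_rms == 0:
        return math.inf
    return 20 * math.log10(peak / loud_rms)

def tone(amplitudes, sr=1000, freq=50):
    """50 Hz sine whose amplitude is given sample by sample (hypothetical test data)."""
    return [a * math.sin(2 * math.pi * freq * i / sr) for i, a in enumerate(amplitudes)]

sr = 1000
steady = tone([1.0] * (30 * sr))                      # constant level throughout
bursty = tone([1.0] * 100 + [0.1] * (30 * sr - 100))  # one 0.1 s burst, then quiet

steady_db = dr_like_db(steady, sr)
bursty_db = dr_like_db(bursty, sr)
print(f"steady tone: {steady_db:.1f} dB, one short burst: {bursty_db:.1f} dB")
```

The steady tone scores about 3dB (just the crest factor of a sine wave), while a single tenth-of-a-second burst pushes the score way up.  That is the "at least one much higher peak" presumption in action.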

Why not the 50th percentile, or the 80th percentile???  Apparently because choosing the 20th percentile makes the tool identify recordings that sound overcompressed as compared to the genre of recordings they come from (which score differently even if no compression were used).

Furthermore, the DR tool analysis uses short non-overlapping segments of 3 seconds each.  You could get significantly different results if you defined those segments as shorter, longer, or overlapping.  You could argue for the merit of 250 msec intervals, which could produce a radically different result.  Likewise for 10 second intervals.
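
A small Python experiment shows how much the block length alone can move the number.  Again, this is a simplified "peak vs loudest 20% of blocks" measure of my own, not the real DR tool, and the test signal (a half-second loud burst every 10 seconds over a quiet background) is made up.

```python
import math

def peak_vs_loudest_db(samples, sample_rate, block_s, top_fraction=0.2):
    """Peak versus RMS of the loudest blocks, for a given block length (illustrative)."""
    n = int(round(block_s * sample_rate))
    block_ms = sorted(
        (sum(x * x for x in samples[i:i + n]) / n
         for i in range(0, len(samples) - n + 1, n)),
        reverse=True,
    )
    keep = max(1, int(len(block_ms) * top_fraction))
    loud_rms = math.sqrt(sum(block_ms[:keep]) / keep)
    peak = max(abs(x) for x in samples)
    return 20 * math.log10(peak / loud_rms)

# Hypothetical 60 s signal: 50 Hz tone, loud for 0.5 s out of every 10 s.
sr = 1000
signal = []
for i in range(60 * sr):
    amp = 1.0 if (i % (10 * sr)) < (sr // 2) else 0.1
    signal.append(amp * math.sin(2 * math.pi * 50 * i / sr))

results = {block_s: peak_vs_loudest_db(signal, sr, block_s)
           for block_s in (0.25, 3.0, 10.0)}
for block_s, db in results.items():
    print(f"{block_s:>5} s blocks: {db:.1f} dB")
```

Same recording, same percentile, and the score climbs by several dB as the blocks get longer, because longer blocks dilute each burst with more of the quiet background.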

You might also think that something like Equal Loudness spectral curves (such as the famous Fletcher-Munson) ought to be applied...and indeed some tools use that...but the DR tool does not.

I was inspired to research and write about this after reading a poster (Tank) at Hoffman's discussion site talk about the difference in "dynamic range" between the 30th and 40th anniversary reissues of classic King Crimson albums.  I very very much like the 40th anniversary series, released on high resolution DVD-Audio, but this poster claimed the dynamic range on many albums in the 30th anniversary series was higher (and he liked them better).  (I have not listened to the 30th anniversary series, and it's not "high resolution" so I might not bother.)

It really bugged me that this simpleton poster pushed back against a recording engineer (Plan 9) who was arguing with him, claiming that the dynamic range issues were easy to understand and the engineer was just full of BS.  The poster showed tiny graphs of different songs and said "See!"  But anyone who has done any amount of audio editing knows that things can look very different at different levels of zooming in.  

Sadly the engineer didn't do a great job spelling out his case, and after being denounced by Tank he just shut up.  But he did mention micro-dynamics and macro-dynamics and as far as I can tell, these are real things and not just BS.  Right now I couldn't define either one and I still need to read and understand more in this area.

[I previously wrote similar but less complete comments about the R128 used by Roon just a few posts back.  R128 does make slightly more sense to me than DR, perhaps just because I misunderstood DR.  It still seems to me highly arbitrary, though Roon's system works pretty well in practice for keeping a constant "level."  As I noted then, it is not immediately obvious to me how that is done from the R128 dynamic range rating, which is about range and not level.]

