Saturday, July 15, 2023

Thinking about HDCD again

Using DVD-5000 to decode HDCD played on PD-75 as digital transport

 I've been testing a pair of the DVD-5000's I originally bought for use as living room DACs.  They are both for sale on ebay as I write this.  I am reminded of the fact they are very good, and according to objectivist standards should not be audibly different from the very best DACs.  They also have the ultimate R2R chip made by Burr Brown, the PCM 1704, which was discontinued in 2012.  By objectivist standards, it should be no different than a good Sigma Delta chip.  But it is different in objective ways, for what that's worth.  I love the DVD-9000 for HDCD decoding but don't use it for anything else.  I would have been tempted to use DVD-9000 for midrange DAC until I discovered that I needed 3 DACs of exactly the same design for my tri-amplified system.  Even if I adjusted the delay times for one particular sampling rate, when I changed sampling rates those adjustments would no longer be valid, because the latency of nearly every device varies with sampling rate.

Faced with that problem, and the (at the time) skyrocketing price of DVD-9000's, I opted to get 3 DVD-5000's instead.  Those would be basically the same thing, I predicted.

But for probably no good reason, I just never fell in love with the DVD-5000's the way I did for the DVD-9000.  And it was much more convenient to have multiple DACs with level controls, and smaller.  So I ended up with 3 Emotiva Stealth DC-1's, which are quite fine also (technically they measure much better than the DVD-5000's, but once again that difference should not be audible).

I still use DVD-9000 for decoding HDCD's (because my Oppo BDP-205 doesn't do that).  So I tried using the DVD-5000 (which doesn't actually play discs, so I connected it as a DAC to the optical digital output of a Pioneer PD-75) and I found it does indeed do very well with HDCD's, just like my DVD-9000.

But it is different.  When the DVD-5000 plays an HDCD, the output level never exceeds the output level from an ordinary CD.  The DVD-9000 actually plays the level expanded portions of an HDCD up to 6dB louder than an ordinary CD.  In contrast, the DVD-5000 does what every other HDCD player I've ever tested (other than the DVD-9000) does: it lowers the average level just so the most expanded peaks of HDCD reach the maximum CD level, but no more than that.

When my brother-in-law George first heard the HDCD on a DVD-9000, he was (unusually I'd say, because he never matches levels by measuring) shocked and appalled that HDCD boosted levels 6dB above normal CD levels.  He felt that was unfair.  HDCD was all just a cheat, he insisted.

Well, as I've started to explain in many other previous essays, it's much more complicated than George and almost everyone thinks, because of inter sample overs.

(And of course, George was also wrong that HDCD players boost the level beyond that of CD's.  Though for the longest time I wondered if earlier HDCD players did do that boosting, and it changed when players stopped using the PMI chips.  I had it reversed.  The DVD-5000 uses PMI chips and doesn't do boosting beyond CD levels.  The DVD-9000 uses a software implementation--like all later players--and does boost the level.  So it does not necessarily have anything to do with using the chip, it was just a particular approach the designers of the DVD-9000 took, which possibly was available to the users of the chip as well, but I haven't seen one that does.)

The ISO's on regular digital recordings can match those of even peak level expanded HDCDs.  And what's more, HDCD's don't seem to have such big ISO's.  So it comes out about as a wash, the HDCD is just giving a kind of engineered peak, and regular CD's are giving us extrapolated peaks.  That dynamic range in the analog output was there and needed anyway, HDCD's (when boosted) are just using that dynamic range for real music dynamics, rather than extrapolations from what may be just high frequency ringing.

Now I also know, that SACDs may in fact be the worst offender of all.  They can have the highest peak output above the nominal 2V level.

But I'm wondering if it was the boosted HDCD level (compared to other players) that led me to fall in love with the DVD-9000 in the first place.

(I think HDCD done as intended (and with the filter control*) would have been a great idea...as an open standard, and would have enabled a way to bypass the loudness wars, using the HDCD encoding to provide the uncompressed version, while the compressed version is what would be heard without decoding.  Now we're simply stuck with HDCD as necessary for reproduction of a significant number of fabulous sounding recordings, which are better heard when available in high resolution.)

(* The filter control somehow seems less necessary as HDCD's don't seem to have the transients that provoke excessive ISO's.  Somehow they achieved that effect without a post-filter change.  It will be necessary to do more investigation.)

Friday, July 7, 2023

What's going on with Poulenc in DSD

I've always loved the Linn recording of the Poulenc Concerto for Organ.  There is no question, it's very dynamic (as a whole album, much less so in individual tracks).  The SACD layer has a marvelous sound.  I have not listened to the PCM layer in a while.

It has by far the highest peaks above the nominal 0dB RMS level of any disc I have encountered (+7.5dB).
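To put a figure like +7.5dB in perspective, here is the dB-to-voltage arithmetic as a small sketch.  The 2 V reference is only an assumption for illustration (a common nominal output level for disc players), not a measurement of this disc or of any particular player.

```python
def db_to_ratio(db):
    """Convert a level difference in dB to a voltage (amplitude) ratio."""
    return 10 ** (db / 20)

# Assumption for illustration only: the nominal level corresponds to 2 V.
NOMINAL_VOLTS = 2.0

ratio = db_to_ratio(7.5)
print(f"+7.5 dB is a voltage ratio of {ratio:.2f}")
print(f"relative to {NOMINAL_VOLTS} V that would be {NOMINAL_VOLTS * ratio:.2f} V")
```

In other words, a peak 7.5dB over nominal asks the analog stage for well over twice the nominal voltage.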

I have been accounting for these peaks as Inter Sample Overs.  That might not be the correct description of what is happening on SACD's.

On PCM recordings, the highest peaks seem to be associated with the leading edges that produce the most pre- or post- ringing.  This makes sense, as they basically represent very high level high frequency content which 'propels' an interpolation of the signal to go way beyond the normal boundaries.  Only high frequency content can do that.  Even when reaching 0dB, low frequencies just reach the top slowly, over very many samples, and don't change enough from one sample to the next to cause an ISO.  (That was why I measured 0dB with 880 Hz, a low enough frequency not to produce ISOs.)
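The frequency dependence can be illustrated with a minimal sketch.  It is not a model of any real DAC filter: it uses ideal band-limited interpolation (DFT zero-padding) on steady test tones, and the tone frequencies, phases, block length and 8x oversampling factor are all my own assumptions.  A steady tone is the simplest case, not the transient case described above, but it shows the same mechanism: when the samples straddle the crest of a high frequency wave, the reconstructed waveform goes well above the highest sample, while a low frequency tone barely moves.

```python
import cmath
import math

def oversample(x, factor):
    """Band-limited interpolation of a periodic real signal via DFT zero-padding."""
    n = len(x)
    m = n * factor
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    out = []
    for j in range(m):
        acc = 0.0
        for k in range(n):
            if 2 * k == n:
                # Nyquist bin: split evenly between +fs/2 and -fs/2
                acc += (spectrum[k] * math.cos(math.pi * n * j / m)).real
            else:
                freq = k if 2 * k < n else k - n  # signed bin index
                acc += (spectrum[k] * cmath.exp(2j * math.pi * freq * j / m)).real
        out.append(acc / n)
    return out

def overshoot_db(cycles, phase, n=64, factor=8):
    """dB by which the reconstructed peak exceeds the highest sample."""
    x = [math.sin(2 * math.pi * cycles * t / n + phase) for t in range(n)]
    sample_peak = max(abs(v) for v in x)
    true_peak = max(abs(v) for v in oversample(x, factor))
    return 20 * math.log10(true_peak / sample_peak)

# Tone at fs/4 with the samples straddling every crest
high = overshoot_db(cycles=16, phase=math.pi / 4)
# Tone at fs/64 (about 689 Hz at 44.1kHz), samples also straddling the crest
low = overshoot_db(cycles=1, phase=math.pi / 64)

print(f"fs/4 tone:  {high:.2f} dB over the highest sample")
print(f"fs/64 tone: {low:.2f} dB over the highest sample")
```

If the fs/4 tone were normalized so its samples just touch 0dB, the reconstructed waveform would peak about 3dB higher.  The low frequency tone overshoots by only about a hundredth of a dB.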

But on DSD...there is no pre- and post- ringing.  That is of course the beauty of it.  You get smooth looking curves that visibly look like the original wave forms.  (The eye is being deceived.  In actuality, those smooth looking curves are obscuring vast high frequency noise, which is often making the curves look smoother than the real thing.)

Here is the entire Poulenc album on the DSD layer, recorded at 96kHz.  Notice that the very highest peaks which give the album very high dynamic range only occur in the last track.  Just a couple other tracks have higher peaks at all.


Poulenc (entire album)

Now let's take a look at what may be the highest peak (there are actually a bunch of them as you zoom in) in the last track:



Mostly, it doesn't look that extraordinary.  There's no sharp edge, no pre- or post- ringing.  However, there is an interesting notch on the bottom channel at the very peak (near the cursor).  Given the scale, that represents very high frequency information.  It may be a telltale sign that some sort of limiting or other thing was done just here.  But the notch itself is tiny in comparison with the overall peak, which doesn't look out of place in its surroundings.



Tuesday, July 4, 2023

Dynamic Range in Recordings is not a well defined concept

Dynamic Range in audio equipment seems relatively easy to understand.  It is the difference between the peak level and the noise.  

OR, it's the difference between the peak level and the lowest resolvable signal, which can sometimes be 20dB or so BELOW the noise level.  (Signals can be resolved from noise using spectrum and/or gestalt analysis, and the human auditory process does both.)  This is a little less well defined.  How capable is the spectrum analysis?  And the gestalt analysis is not very well defined at all.  Gestalt analysis is how we can discriminate different sound sources by other aspects of their quality than frequency, such as by their rhythm or randomness or apparent location.

 Either way, dynamic range in audio equipment is now specified as being done in the presence of signal, thereby requiring the signal to be filtered out from the product before analysis.  This is not hard to do now that we have ready access to things like FFT.

Sometimes this kind of Dynamic Range is called Signal to Noise Ratio in the context where we are looking at the noise of a particular component instead of an entire system.  And also perhaps when we are not measuring it in the presence of signal, which was the traditional way of measuring it (and what I usually do on my test bench).

But where it really gets thorny is when we are talking about the Dynamic Range in audio recordings.  It bugs the heck out of me when people just don't get how poorly defined this concept is (even though we now have 'Standards' and tools for measuring it) and how much, therefore, it depends on assumptions, heuristics, algorithms, psychoacoustic research, and the like.

For example, you could rightly claim that nearly every DDD recording has at least 96dB dynamic range, if you defined dynamic range as the difference between the peak level (which is almost always near the maximum level, defined as 0dB, on digital recordings) and the lowest level in the recording (such as in a fade out near the end of a recording).  
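The 96dB figure is just the arithmetic of 16-bit words: the ratio between full scale and one quantization step.  A minimal sketch (the function name is mine, and it ignores dither and noise shaping, which change the practical picture):

```python
import math

def pcm_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

print(f"16-bit: {pcm_range_db(16):.1f} dB")
print(f"24-bit: {pcm_range_db(24):.1f} dB")
```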

Also, since sound is wave-like, and audio is based on alternating currents, there will be zero crossings in every small section of recorded audio.  So it's always going down to zero somewhere and actually quite frequently!

Even if you removed silence-before-the-first-downbeat, silence or mechanical fade outs, and zero crossings, you may often find tiny near-silence gaps even within highly compressed program material.  Even ruling out the two considerations above (fade outs and zero crossings) we can just look at the low level of those relatively small gaps (for example 250ms, enough for 250 cycles at 1000 Hz and about where we hear gaps as gaps and not just clicks) and say "see, there is plenty of dynamic range between these 250ms intervals and the peak levels."
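Here is a toy sketch of that argument.  The signal is entirely synthetic and hypothetical: a constant-level 1000 Hz "brick" with a single 250ms near-silent gap, at a low sample rate chosen only to keep the loop fast.  Measured as peak versus quietest 250ms window, this completely flat signal scores over 60dB of "dynamic range."

```python
import math

RATE = 8000          # Hz, kept low so the pure-Python loop stays fast (assumption)
WINDOW = RATE // 4   # 250 ms

def window_rms(samples, size):
    """RMS of consecutive non-overlapping windows."""
    return [
        math.sqrt(sum(v * v for v in samples[i:i + size]) / size)
        for i in range(0, len(samples) - size + 1, size)
    ]

def tone(amplitude, seconds, freq=1000):
    """A sine tone of the given amplitude and duration."""
    count = int(seconds * RATE)
    return [amplitude * math.sin(2 * math.pi * freq * t / RATE) for t in range(count)]

# Constant-level material with one 250 ms near-silent gap in it
signal = tone(0.9, 1.0) + tone(0.001, 0.25) + tone(0.9, 1.75)

peak = max(abs(v) for v in signal)
levels = window_rms(signal, WINDOW)

naive_range_db = 20 * math.log10(peak / min(levels))  # peak vs quietest window
loud_crest_db = 20 * math.log10(peak / max(levels))   # peak vs loudest window

print(f"peak vs quietest 250 ms window: {naive_range_db:.1f} dB")
print(f"peak vs loudest 250 ms window:  {loud_crest_db:.1f} dB")
```

The second number, about 3dB, is the crest factor of a plain sine wave, which is closer to what a listener would say about such a signal.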

What the whole family of standards, methodologies, and tools for measuring "Dynamic Range" in recordings is really about is determining how much compression has ruined particular recordings, or alternatively how much compression can be used without making things sound bad.

It's a matter of heuristics.  They do this by comparing some quantity of peak levels to some quantity of less-than-peak levels.

But the quantities involved are not something that is intuitively obvious, but rather related to what is found to be 'interesting' in that it correlates to things sounding over-compressed.

So it is that some Dynamic Range methodologies use histograms and others use percentiles, things that are not intuitively obvious to self appointed audio gurus.

For example, the famous DR scale is based on a comparison between the peak level (fine, that's intuitive enough) and the average of all levels that are in the 20th percentile of highest levels or above (rather counter-intuitive).  IOW, it's comparing loud vs loudest, whereas you might think that dynamic range should compare loudest and softest (but then we are back to the 96dB of digital recording systems, etc).  But the presumption seems to be that "almost all of the recording is at or below this level" and it should have at least one much higher peak to be "dynamic."  It seems to me there are other ways of being dynamic.

Why not the 50th percentile, or the 80th percentile???  Apparently because choosing the 20th percentile makes the tool identify recordings that sound overcompressed as compared to the genre of recordings they come from (which score differently even if no compression were used).

Furthermore, the DR tool analysis uses short non-overlapping segments of 3 seconds each.  You could get significantly different results if you defined those segments as shorter, longer, or overlapping.  You could argue for the merit of 250 msec intervals, which could produce a radically different result.  Likewise for 10 second intervals.
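To show how much the segment length alone can matter, here is a simplified sketch of the idea as I have described it: peak level versus the RMS of the loudest 20% of non-overlapping blocks.  It is not the actual DR tool algorithm, which differs in details I am not reproducing here.  The test signal is synthetic and hypothetical (short loud bursts every 3 seconds over quiet material), and the sample rate and tone frequency are chosen only for convenience.

```python
import math

RATE = 1000  # Hz; low rate keeps the pure-Python demo fast (assumption)

def tone(amplitude, count):
    """count samples of a tone at RATE/4, whose samples hit the exact crest."""
    return [amplitude * math.sin(math.pi * t / 2) for t in range(count)]

def dr_like(samples, block_seconds):
    """Peak vs. RMS of the loudest fifth of non-overlapping blocks, in dB."""
    size = int(block_seconds * RATE)
    mean_squares = sorted(
        (sum(v * v for v in samples[i:i + size]) / size
         for i in range(0, len(samples) - size + 1, size)),
        reverse=True,
    )
    keep = max(1, len(mean_squares) // 5)  # loudest 20% of blocks
    loud_rms = math.sqrt(sum(mean_squares[:keep]) / keep)
    peak = max(abs(v) for v in samples)
    return 20 * math.log10(peak / loud_rms)

# Ten repeats of: 0.25 s loud burst, then 2.75 s of quiet material
period = tone(0.9, 250) + tone(0.1, 2750)
signal = period * 10

dr_3s = dr_like(signal, 3.0)
dr_250ms = dr_like(signal, 0.25)

print(f"3 s blocks:    {dr_3s:.1f} dB")
print(f"250 ms blocks: {dr_250ms:.1f} dB")
```

On this signal the two block lengths disagree by more than 6dB, even though nothing about the audio has changed.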

You might also think that something like Equal Loudness spectral curves (such as the famous Fletcher-Munson) ought to be applied...and indeed some tools use that...but the DR tool does not.

I was inspired to research and write about this after reading a poster (Tank) at Hoffman's discussion site talk about the difference in "dynamic range" between the 30th and 40th anniversary reissues of classic King Crimson albums.  I very very much like the 40th anniversary series, released on high resolution DVD-Audio, but this poster claimed the dynamic range on many albums in the 30th anniversary series was higher (and he liked them better).  (I have not listened to the 30th anniversary series, and it's not "high resolution" so I might not bother.)

It really bugged me that this simpleton poster pushed back against a recording engineer (Plan 9) who was arguing with him, claiming that the dynamic range issues were easy to understand and the engineer was just full of BS.  The poster showed tiny graphs of different songs and said "See!"  But anyone who has done any amount of audio editing knows that things can look very different at different levels of zooming in.  

Sadly the engineer didn't do a great job spelling out his case, and after being denounced by Tank he just shut up.  But he did mention micro-dynamics and macro-dynamics and as far as I can tell, these are real things and not just BS.  Right now I couldn't define either one and I still need to read and understand more in this area.

[I previously wrote similar but less complete comments about the R128 used by Roon just a few posts back.  R128 does make slightly more sense to me than DR, perhaps just because I misunderstood DR.  It still seems to me highly arbitrary, though Roon's system works pretty well in practice for keeping a constant "level."  As I noted then, it is not immediately obvious to me how that is done from the R128 dynamic range rating, which is about range and not level.]