Friday, July 31, 2015

The Great Debate: Amir vs Arny

The Great Debate, Part 1110250, at AVSForum

The Great Debate, Part 1110251, at What is Best Forum (where Amir is a moderator).  Amir starts it by saying this thread was inspired by a thread at AVSForum, although he didn't link to it.*  The previous thread seemed to be the one…but I'm not sure because there were many others!  And, note that the threads I have linked are a hundred pages or more.  Are audiophiles obsessed with their arguments or what?  We are (yes me included) apparently argumentophiles also.

(*Sadly, typical of how Amir argues, without linking back to demonstrations of crucial points.  In contrast Arny's arguments are almost always backed up with a plethora of links, though not always of the highest quality and many are now dead, including his sadly long gone website pcabx.  For the service he has rendered to society, yes simply by being a gadfly and contrarian to the mainstream high end audio industry, I think Arny should have the resources of a titan of industry himself, but I fear it's not like that at all...)

Technically these debates are about whether the potential improvements from "high resolution digital audio" (meaning it has greater than 16 bit resolution and/or a greater sampling rate than 44.1kHz) are audible.  But it's also a debate about methods, results, and personalities in all aspects of The Great Debate between White Hats and Black Hats (the terms coined by Peter Aczel, a self declared White Hat who believes that most decent amplifiers sound the same, the classic White Hat position going back to the 1970's).  We've apparently even superseded the always shaky terms "Audio Objectivist" (having nothing to do with, say, Ayn Rand) and "Audio Subjectivist" with Amir calling himself the Objectivist--so where does that put Arny?  The object/subject difference hardly gets at it either, since Black Hats (and especially Grey Hats, like John Atkinson--I respect him and his self identification, and myself) may be quite into certain kinds of measurements, or at least technical phenomena, real things happening, that just don't happen to tickle the White Hat sense of importance.  And a hidden argument which is never even touched is absolutely fundamental--must everything pass DBT immediately, and what if not?  Should technical criteria (say, Jitter) be ignored simply because there is no current DBT proof of their importance?  I strongly think not.  However, any person not an Amir or higher titan of industry has to prioritize, and I've been shaming myself for spending several years now far more on digital issues even at the fringe of the Grey Hat regime, when I sort of intended to have been working on room acoustic issues instead.  So Jitter should not be ignored, and let's have John Atkinson's and even better jitter tests developed.  But generally, we should probably move on to more important things.
We should be skeptical of all existing Black Hat claims, but not necessarily of the existence of to-be-found issues discovered by Black Hats, even as White Hats do, though the White Hats often end up with the more minimal explanation.  So the world needs everyone, even as it needs everyone's mind to evolve.

Amir Majidimehr (principal of Madrona, formerly VP of Windows Digital Media at Microsoft) and Arny Krueger have frequently sparred about what equipment differences are actually audible.  And this goes way back.  Though I'm not sure how long Amir has been involved in these debates, Arny has been involved since at least 1976, and he has recounted a history of the development of the ABX test method around that time and the ABX comparator in 1982.  He was one of the principals of the ABX company.  The very creation of the ABX test method over 30 years ago was precisely to answer the question of what equipment differences are audible.  Way back in the mid 1970's as an audio society member Arny was involved in these debates, and he's still at it now on a multitude of websites.

Reading Arny (who I've never met in person) I am very impressed with his arguments and generally the ways he makes them.  I'm also very impressed with his patience and dedication.  I think I take his side, mostly.

Reading Amir, it is clear he is very smart and is a very technically qualified professional audio designer.  I believe he is honest and not a shill.  However I think he makes poor and sometimes ugly arguments (often to authority, and sometimes to his own authority, and often discrediting the authority of others) far more often than Arny does.  He also seems to me much more to be a tireless bully.  Nevertheless, Amir may be right about some things.

Sadly, in these arguments, there isn't really a suitable technical qualification.  Certainly being an electrical engineer, as such, doesn't necessarily make you an audio scientist, and The Great Debate is not engineering; it is Science.  Within Science "Audio Science" is too small to be particularly tractable.

The Audio Engineering Society (AES) is really an engineering organization, not a scientific one, but it does strive mightily (and perhaps too mightily) to retain respectability.  Therefore it is not surprising it cleaves tightly to Double Blind Testing results in its papers which are often written by academic scientists and not engineers.  Meanwhile, many audio engineers don't bother with DBT's.  Many have never done DBT's and never will, but nevertheless often are believed to speak with authority about such matters.

The truth is, right now, there is no authority.  And it is difficult to establish one given all the possible economic conflicts of interest, not to mention egos, etc.

Amir starts the second thread with a post showing a DBT result which confirms his ability to hear the benefit of high resolution.  At the beginning, Arny was not posting (and in fact Amir said that Arny had permanently retired from posting after Amir had posted some brilliant refutation--another unfortunate Amirism).  I have been unable to confirm that Arny ever quit posting anywhere, and in fact Arny posted to this exact thread some time later, as well as continuing to post at AVSForum and HydrogenAudio.

It happens I have seen at least one of Arny's arguments, which he has often made with regard to some DBT results.  He has argued that high frequency nonlinearity in amplifiers, speakers, or headphones can produce differences at audible frequencies, and that is what people are hearing.
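Here is a rough sketch of how that mechanism could work, not a model of any real amplifier or headphone.  It assumes two ultrasonic tones at 24 kHz and 27 kHz and a made-up nonlinearity (the input plus a small squared term, with strength `A2`); the sample rate and all other values are likewise assumptions picked for illustration.  It measures how much energy lands at the 3 kHz difference frequency, which is well inside the audible band.

```python
import cmath
import math

FS = 192_000             # sample rate in Hz (assumed; holds all distortion products)
N = 19_200               # 0.1 s of signal, so every tone completes whole cycles
F1, F2 = 24_000, 27_000  # two ultrasonic test tones (assumed values)
A2 = 0.1                 # strength of the hypothetical second-order term


def two_tone(n):
    """Sum of two ultrasonic cosines, each at amplitude 0.5."""
    t = n / FS
    return 0.5 * math.cos(2 * math.pi * F1 * t) + 0.5 * math.cos(2 * math.pi * F2 * t)


def distort(x, a2=A2):
    """Toy nonlinearity: the input plus a small squared term."""
    return x + a2 * x * x


def tone_amplitude(samples, freq):
    """Amplitude of one frequency component, from a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / FS) for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


clean = [two_tone(n) for n in range(N)]
bent = [distort(x) for x in clean]
diff_tone = F2 - F1  # 3000 Hz, the difference frequency

clean_level = tone_amplitude(clean, diff_tone)
bent_level = tone_amplitude(bent, diff_tone)
print(f"{diff_tone} Hz in clean signal:     {clean_level:.6f}")
print(f"{diff_tone} Hz in distorted signal: {bent_level:.6f}")
```

The clean signal has essentially nothing at 3 kHz, while the squared term adds a 3 kHz component of amplitude A2 x 0.25.  Whether real equipment is nonlinear enough for this to matter is exactly what the argument is about.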

As it turns out, at least in the first page or so of the second thread, Amir did not reveal the particular equipment he had gotten his positive test results with.  I had read up to the first point at which that question was asked, and an answer still had not been provided.

Now quite often subjectivist reviewers are quite clear about what equipment they have used to perform some test, and when they do so they go into great detail about every last cable involved, because of course they believe it is all of importance.

So it is more than a bit suspicious actually that Amir left out this detail.

I have not read all of either thread, though both look somewhat worthwhile for someone like me who remains very interested in The Debate, despite going on for hundreds of pages.

One of the high points of the second thread is where Amir presents a very respectable paper published by AES (Convention Paper 9174 presented at the 137th convention of AES in 2014, by Helen Jackson, Michael Capp, and J. Robert Stuart) recently proving the audible differences of different kinds of digital filters.  That result does cleave very much away from the "all digital sounds perfect" position of the White Hats, including Arny.

(Similar experiments in the past with positive results have been shot down on the basis of sampling artifacts caused by equipment configuration.  In an earlier blog post discussing that, Arny nails the best way of preparing test material.  It should start from the High Res material then be down sampled to the lower rate.  THEN it should be up sampled to the high rate again so as to avoid playback differences caused by switching sampling rates.)

Now one wouldn't think J. Robert Stuart (a Fellow of the Audio Engineering Society) would make such mistakes.  But it seems he may have padded this test in various ways, according to Arny and others at Hydrogen Audio.  It's clear from information available that Stuart used defective Rectangular Dither in this test.  He tries to justify this on the basis that "it is often used."  He knows it is not the best because he makes products that use the better method (some kind of triangular).  The question being addressed is not Rectangular dither.  The correct approach is to use the best reasonably available technology except for the item being tested.  So, sadly, this positive looking result is not acceptable and the Convention Paper now looks more like slanted advertising for Meridian.  AJ at Hydrogen Audio writes:

The BS test is a complete farce and fabrication of results in a desperate attempt to justify $$$ales of "Hi Rez" which of course nosed dived once people realized the scam, confirmed by M&Ms AES peer reviewed tests of actual audiophools, their hardware and purported "Hi Rez" media, the EXACT conditions they and the scam industry claimed to be able to "hear" differences.
The scam industry does not require the audiophool be "trained", the "Hi Rez" equipment/system to be certified, the room to have a specific noise floor, or the music content be cherry picked and doctored as in the BS test.
And he goes on…  But I believe he meant to say "the music content must NOT be cherry picked and doctored."  That may be required for some official standards.  But if we are talking about the limits of audibility, cherry picking sample music that best illustrates the differences between systems is fine, and even doctoring that music is fine (so long as this is described).  To say that we must prove limits of audible differences only on average hifi systems with average music is ludicrous.  I am fine with parts of the Meridian test methodology.  But the whole experiment sucks because it seems the control condition uses outdated technology that was not supposed to be under test.  It is that particular part that blew it.
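To make the dither complaint concrete, here is a minimal sketch of one commonly cited reason triangular dither is preferred over rectangular.  It is not the processing used in the paper.  It assumes a plain rounding quantizer with a step of 1 LSB, rectangular dither 1 LSB wide, and triangular dither 2 LSB wide; the function names, input levels, and trial count are all made up for illustration.  It measures how the noise power of the quantization error changes with the input level.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def quantize(x):
    """Round to the nearest integer step (1 step = 1 LSB)."""
    return math.floor(x + 0.5)


def rpdf():
    """Rectangular dither: one uniform value, 1 LSB wide."""
    return rng.random() - 0.5


def tpdf():
    """Triangular dither: sum of two uniform values, 2 LSB wide."""
    return (rng.random() - 0.5) + (rng.random() - 0.5)


def error_variance(level, dither, trials=20_000):
    """Variance of (quantized output - input) for a constant input level."""
    errors = [quantize(level + dither()) - level for _ in range(trials)]
    mean = sum(errors) / trials
    return sum((e - mean) ** 2 for e in errors) / trials


for level in (0.0, 0.25, 0.5):
    print(f"input {level:.2f} LSB: "
          f"RPDF noise {error_variance(level, rpdf):.3f}, "
          f"TPDF noise {error_variance(level, tpdf):.3f}")
```

With rectangular dither the noise power swings between zero and a quarter of an LSB squared depending on where the input sits between steps, so the noise floor moves with the signal.  With triangular dither it stays at about a quarter of an LSB squared at every level.  That is a difference in the control condition that has nothing to do with the filters supposedly under test.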

I will say emphatically that a person's achievements in audio engineering do not necessarily qualify that person or any person to make an authoritative statement about The Great Debate.  Even very qualified, experienced, and successful audio engineers are not necessarily up to speed on this.  It requires a skeptical stance toward many things, which is not engendered in our society, and a broad view rather than a focused specialist one.  Such skepticism is not about making money, and these days the incentive exists to show that everything is audible, because that sells more stuff.  Nor is it about "proving" anything except to the fickle and superstitious marketplace.  Successful audio engineers can therefore not be expected to have explored finding The Truth very deeply as part of being successful.  They only have their personal truths, which work for their game.

So even what some experienced engineer says is not evidence.  The only admissible evidence is from actual double blind testing done to the highest standard, and every aspect of that testing is open to deep criticism.  Will progress be made?  Who knows!  Heat death of the Sun or the collapse of human civilization may well occur first.

My position remains that the audible differences that Black Hats obsess about and White Hats dismiss are likely very small, if they exist at all.  That idea is tangentially supported by a very sophisticated exploration of the meaning of p values in testing I have been reading:
Getting a big p-value is not, by itself, very informative; even getting a small p-value has uncomfortable ambiguity. My advice would be to always supplement a p-value with a confidence set, which would help you tell apart "I can measure this parameter very precisely, and if it's not exactly 0 then it's at least very small" from "I have no idea what this parameter might be".
OK, this doesn't appear to apply to the Great Debate at all, because it's concerning the situation even when you have small p values, whereas the problem with most DBT's in The Great Debate is the continuing lack of small p values generally.  But turn it around, and you see even if we were getting small p values consistently it still wouldn't be The Proof many people want.  The fact that p values are related to effect size and sample size means it's not easy to tease these things apart.  The safest thing to assume when an effect that is expected isn't verified in DBT is that the effect is small, not that it does not exist.
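A small sketch of that advice applied to an ABX-style test, assuming each trial is an independent right-or-wrong guess with a 50% chance rate.  The scores below are invented, not taken from any of the threads.  The p value uses the exact one-sided binomial tail, and the interval is the Wilson score interval, which is my choice of confidence set, not something the quoted author specifies.

```python
import math


def abx_p_value(correct, trials):
    """One-sided exact binomial p-value: chance of >= `correct` hits by guessing."""
    return sum(math.comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials


def wilson_interval(correct, trials, z=1.96):
    """Approximate 95% Wilson score interval for the true proportion correct."""
    p_hat = correct / trials
    denom = 1 + z * z / trials
    centre = (p_hat + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


# Hypothetical scores: a short test, a longer one, and a very long one.
for correct, trials in [(10, 16), (100, 160), (560, 1000)]:
    p = abx_p_value(correct, trials)
    lo, hi = wilson_interval(correct, trials)
    print(f"{correct}/{trials}: p = {p:.4f}, 95% interval for hit rate = [{lo:.3f}, {hi:.3f}]")
```

A score of 10 out of 16 is nowhere near significant, but its interval runs from below 0.5 to above 0.8, so it cannot tell "no difference" apart from "a difference heard most of the time".  Only the long test pins the hit rate down, and there the interval sits a little above chance: a real effect, but a small one.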

But this is never what the blackest of Black Hats say.  They always trumpet their unproven differences as very important, because otherwise why would anyone spend the megabucks necessary for Black Hat tweaks such as cryogenically treated cables?

We cannot trust the most successful audio designers or retailers any more than we can the most successful lawyers.   We can have much more trust in the White Hats who have gained little and have nothing to sell.  Things are not proven until they agree also.  The whole history of high end audio is full of flimflam, lies, and half-truths.  Progress has been made, but more slowly because of that.






