
MOTU??! Urrgh… *Real* Sound Engineers Only Use Prism Converters, Dontcha Know? Hiffle, piffle, plip and wank.

Earlier this year I visited Steve Albini’s Chicago studio, Electrical Audio, with the goal of not only recording some drums with the man himself, but also scrutinising his mic techniques in order to learn more about how such an incredible drum sound is achieved. The results were as fantastic as you would expect, and I publicly documented my experiences via a video presentation, thanks in no small part to the assistance of an impressively bearded cameraman by the name of Kevin Clarke. You can see the resultant video here.


In order to record the session as flexibly as possible, and preserve all naked, ungrouped signals for my scrutiny later on, I knew that I had to split the signals coming from each microphone, with one batch going to Steve for recording to tape, and another identical but independent batch leading to my digital recording system. To make this work, Steve carried out an impressive feat of ad-hoc patching so that we could split the signals after the desk preamp. This way I would still maintain whatever Neotek goodness was being imparted to each signal. The only differing variables between the two simultaneous recordings were the recording mediums themselves: Steve recorded to RTM900 2” tape, 16 track, 15 IPS. I recorded digitally at 48 kHz, 24 bit via a pair of chained MOTU 828 mk2 interfaces.


Now, I didn’t particularly give much thought to what interface I would use for my end of the recording, given that I consider all interfaces to be much of a muchness. They all do the same thing, and they all sound pretty much as transparent as each other. Quibbling about spec sheets aside, the fact of the matter is that the analogue-to-digital converters inside all interfaces across the price spectrum these days are perfectly capable of capturing and reproducing music transparently, and any talk about the correlation between “sound quality” and price tag is, in my opinion, grounded in a whole host of psychological biases which influence our perception of “quality” to an impressive degree, even more so when you’ve actually paid over the odds for what is, at best, an imperceptibly subtle improvement. “Yeah, this shit sounds fucking sweet. Now get an awesome photo of it for the website.”


The association of price with quality is a well-documented phenomenon, and companies like Apple and Neumann are masters of its manipulation. Of course you’d pay £65 for a MacBook charger! That’s the price you pay for quality (or a flimsy piece of shit that breaks after a year). And of course you would pay £250 for a U87 cradle! Why, you’d be an unprofessional fool not to (despite the fact that any old £10 cradle would do an identical job, they just lack the correct microphone attachment). Companies have been selling us overpriced shit for years, and the justification rests largely on the credibility that a particular brand name has within its particular market.


Ah, but hang on… I’m not taking into account that, when it comes to audio devices, engineers the world over can really, really hear the difference! No, really! They really can! And that’s why they’ve got Prism converters in their studios! Because their ears are, like, super mega awesome. Better than your ears, and definitely better than mine! They sleep every night in an anechoic chamber in order to recalibrate their hearing for the day ahead, and the really top guys have bionic aural implants that allow them to hear all the way up to 40 kHz! They’re like dogs! They’re literally just like fucking dogs.


The fact that so many people claim to perceive an improvement in sound quality with respect to the price of their ADCs is, to say the least, unconvincing. There’s an interesting Caltech study that actually explores this phenomenon in more depth, which you can find on their website here. I’m going to lift the gist of the study from this website, which does a fine job of summarising it:


Researchers from CalTech and Stanford told subjects that they were drinking five different varieties of wine and informed them of the prices for each as they drank. But in reality, they only tasted three types, because two were offered twice: a $5 wine described as costing $5 and $45, and a $90 bottle presented as $90 and $10. (There was also a $35 wine with the accurate price given.) Not only did the subjects rate identical wines as tasting better when they were told they were pricier, but brain scans showed greater activity in a part of the brain known to be related to the experience of pleasure. In other words, the experiment may be evidence that we genuinely experience greater pleasure from an identical object when we think it costs more.


Admittedly these findings are not conclusive, but they nonetheless point to a phenomenon that our deepest intuitions surely corroborate: when we think something costs more, or when we have a significant vested interest, we are biased towards defending its greatness.


So anyway, back to the Albini session. Based on what I considered to be an obvious truth about the perceived sound quality of over-priced ADCs, I was happy to facilitate my recording with the closest, most convenient solution to hand: a couple of MOTU 828s that I could easily chuck in a suitcase and get to Chicago without much hassle. However, in doing that, and knowing that these devices would appear in the video, I absolutely knew that some twat would appear from the woodwork at some point and feel the need to start the tedious “MOTU converters aren’t good enough” debate, and sure enough, one plucky commenter on my YouTube channel decided to do just that.


And so the criticism raged about how “MOTU converters sound cloaked and muddy” (whatever that means), and no engineer worth his salt would use anything less than Prism Orpheus converters at £ several-hundred per channel to capture anything like the kind of magic that Steve was laying down on his Studer A820, and he should know, because he’s worked in, like, loads of studios n’ shit, and, like, all his engineer mates agree… and this one time, right, he transferred an album using MOTU converters, and then had to re-record the whole thing because MOTU converters are, like, just so shitty sounding and all cloaked and muddy and stuff.


Anecdotal accounts of this kind are unimpressive for reasons too numerous to mention. The only way to claim real knowledge of this sort, I would argue, is to have performed some pretty rigorous blind trials, ensuring that only the single variable under scrutiny is the thing in the signal chain to be altered. Or indeed, if you feel confident enough to make the claim that something sounds “cloaked and muddy”, that you have subjected this to some close scrutiny such that you can describe in less ambiguous terms the perceived character of signal degradation, and under what conditions it manifests.


So, as much as I recoil from the implication that I’m writing an entire blog post just to prove one person on YouTube wrong, I do think it presents an interesting opportunity to run a few tests and explore the topic further, not least so that my own perceptions may be less clouded by mere rhetoric of this kind.


Right then, let’s get to it. The first obvious comparison to run is between Steve’s tape recording and my simultaneous digital recording. This would essentially be a comparison between RTM900 2” tape and 48 kHz, 24 bit MOTU digital. Now, there are a few issues with this comparison which are worth laying bare at the outset, the most notable of which being that the multi-track transfer from tape to computer was itself made using the MOTU devices. This will obviously upset the angry YouTubers of this world, as the claim then becomes that the very act of running the tape recordings through MOTU converters has itself imparted intractable MOTU ugliness upon the signal, and therefore the comparison is of limited utility. And yes, I would agree that ultimate scientific rigour is sadly absent from such a comparison. However, that being said, if we are to assume that the 2” tape itself is doing something uniquely magical to the signal (warm, punchy, creamy, soupy… whatever bullshit adjective you want to throw in there), then we should expect to hear some difference between that recording and a purely digital recording. If the MOTU devices have done something nasty to the signal, then they should have inflicted such nastiness identically on both the original digital recording and on the tape transfers, and as such we should still be able to identify which one maintains a semblance of the analogue magic. At the very least we should be able to tell them apart. Of course, tape is identifiable by its hiss, so in order to make this comparison fair, I’ve artificially added some hiss to the digital recording.


Below this paragraph there are two files; one tape recording (digitally transferred), and one simultaneous digital recording. Both are multitrack mix-downs of a solo drum performance with matched processing – the analogue tape utilising Steve’s outboard (as described in the aforementioned video), and the digital using comparable in-the-box plugins. You are invited to download them and compare them. See if you can spot the difference. Which one has the analogue “magic”? If you email me at james [at] jamesmakesmusic.com I will happily tell you which one is which, although be warned that I don’t consider this a bullet-proof perception test, given that you still have a 50% chance of guessing it correctly. This is simply a little comparison to get us started:
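To put a number on that 50% caveat, here is a minimal Python sketch of how quickly luck runs out if the comparison is repeated blind several times. It assumes every trial is independent and that a guesser has an even chance each time; the function name `p_at_least` is my own invention for illustration.

```python
from math import comb

def p_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability of scoring `correct` or more out of `trials` by pure guessing."""
    return sum(
        comb(trials, k) * p_guess**k * (1 - p_guess) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# A single trial: a coin flip gets it right half the time.
print(p_at_least(1, 1))             # 0.5

# Ten blind trials: nine or more correct by luck alone is rare.
print(round(p_at_least(9, 10), 4))  # 0.0107
```

So one correct guess proves nothing, but nine out of ten under blind conditions would be hard to put down to chance.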


So let’s now move on to the real test, and that is a test of the claim that MOTU converters sound “cloaked and muddy”, and that I am clearly a fool to be using them. I’m going to take my cue here from the brilliant Ethan Winer, acoustician and life-long debunker of audiophile claptrap, who has himself conducted an identical test to the one I am about to perform, but with Focusrite and Soundblaster converters in his sights, rather than MOTU. You can find his tests on his website here. Essentially, the logic works like this:


The claim is that ADC X sounds crappy in some way (“cloaked and muddy” in this case). Therefore by performing a loopback recording using the ADC in question, over a number of generations this crappiness should become more and more pronounced. A loopback recording simply means playing an audio file out of a stereo output of the device, and physically patching that output back into two inputs and recording it in whatever recording software you use. The resultant recording is then used as the source for the next loopback recording. And that one for the next. And that one for the next. And so on. After several generations the claim should be very easy to verify, as the signal degradation, or “cloaked muddiness”, should be cumulatively imparted on the signal, such that the discrepancy between, say, generation 10 and the original file is absolutely obvious. Certainly, if the ADCs under scrutiny are as crappy as has been claimed, then even after a single generation we should hear an obvious difference between the result and the original file. However, if it transpires that it is a struggle to hear any difference, and actually in a blind trial it is not even clear which is the original file and which are the subsequent generations, then it’s pretty safe to say that people like our YouTube friend may well just be subject to the aforementioned psychological biases, and therefore, simply talking shit.
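For anyone who would rather measure than listen, one way to quantify how far a generation has drifted from the original is a simple null test: gain-match the two, subtract, and see how loud the leftover is. The Python below is only a sketch of that idea and not the procedure used for the files in this post. It assumes both recordings are already time-aligned, the same length, and decoded to floating-point samples; the names `rms` and `residual_db` are made up for illustration, and the demo signals are synthetic stand-ins, not converter measurements.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def residual_db(original, copy):
    """Level of what is left after subtracting the gain-matched original, in dB."""
    # Least-squares gain that best maps the original onto the copy.
    gain = sum(o * c for o, c in zip(original, copy)) / sum(o * o for o in original)
    matched = [gain * o for o in original]
    residual = [c - m for c, m in zip(copy, matched)]
    if rms(residual) == 0:
        return float("-inf")  # a perfect null
    return 20 * math.log10(rms(residual) / rms(matched))

# Synthetic stand-ins, NOT converter measurements: a 1 kHz tone at 48 kHz,
# and a "copy" with an extra component at one thousandth of the tone's level.
n, rate, freq = 480, 48_000, 1_000
tone = [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]
copy = [t + 0.001 * math.cos(2 * math.pi * freq * i / rate) for i, t in enumerate(tone)]
print(round(residual_db(tone, copy), 1))  # close to -60 dB
```

The lower the residual, the less the copy differs from the source. A pure level change nulls completely, which is why gain matching has to come first.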

Below this paragraph you will find several batches of files. I used four different pieces of music as my test subjects, ranging from grunge, through trip hop, to classical and jazz. For each genre I have posted four files: the original, generation 1, generation 5 and generation 10. I have aligned all recordings and gain matched as closely as possible. All recordings are carried out at 44.1 kHz, 16 bit, in order that any degradation manifests as obviously as possible. The goal for anyone who wants to partake in the challenge is to correctly identify each file. Once again, you can obtain the correct answers by emailing me at james [at] jamesmakesmusic.com. If you are confident about MOTU converters being unsuitable for use because of their defects in “sound quality”, then you should have no trouble correctly identifying each file. And to RasTatum – the man who made the claim about MOTU converters sounding “cloaked and muddy” in the first place (but who has since rather curiously deleted all his comments) – I eagerly await your contribution to this experiment.
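As a rough guide to how meaningful a correct answer would be, here is a small Python sketch of the odds of labelling the files correctly by pure guesswork. It assumes the four files in each batch are presented unlabelled and that guesses for each genre are independent; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations

labels = ("original", "generation 1", "generation 5", "generation 10")

# Every possible way of assigning the four labels to the four files.
orderings = list(permutations(labels))
print(len(orderings))   # 24

# Chance of a random guesser labelling one genre's batch perfectly.
p_one_genre = Fraction(1, len(orderings))
print(p_one_genre)      # 1/24

# Chance of doing so for all four genres in a row.
p_all_four = p_one_genre ** 4
print(p_all_four)       # 1/331776
```

In other words, unlike the two-file comparison above, a clean sweep here would be very difficult to attribute to luck.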


Good luck!


Nirvana – Smells Like Teen Spirit (1991)


Sneaker Pimps – 6 Underground (1996)


Royal Liverpool Philharmonic Orchestra – Beethoven’s Symphony #2 (1998)


Thelonious Monk & Gerry Mulligan – ‘Round Midnight (1957)



Finally, the last brief experiment I wished to perform was a test of the self-noise of MOTU converters, because whilst the sound quality may not be as badly affected as claimed, that still leaves room for the complaint that the devices themselves are noisy. So I performed another loopback test using a file of total silence as the source. I won’t bother actually posting the resultant audio files here, but I will tell you that after 20 generations I was seeing a noise floor of -72.2 dB, as you can see from the image below. I hope you’d agree that that demonstration renders concerns of this type absolutely negligible.
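To give that figure some scale, the Python sketch below converts between decibels relative to full scale and raw sample values. It assumes a 16-bit file and a full-scale reference of 32768; neither the bit depth of the silence test nor whether the reading is a peak or an average is stated above, so treat this as an illustration of the arithmetic only. The function names are hypothetical.

```python
import math

FULL_SCALE_16BIT = 32768  # assumed reference: magnitude of the largest 16-bit sample

def dbfs(amplitude: float, full_scale: float = FULL_SCALE_16BIT) -> float:
    """Level of a sample amplitude in dB relative to full scale."""
    return 20 * math.log10(abs(amplitude) / full_scale)

def amplitude_from_dbfs(level_db: float, full_scale: float = FULL_SCALE_16BIT) -> float:
    """Sample amplitude corresponding to a level in dB relative to full scale."""
    return full_scale * 10 ** (level_db / 20)

# A single step of a 16-bit converter sits at roughly -90 dB.
print(round(dbfs(1), 1))                     # -90.3

# Under the 16-bit assumption, -72.2 dB is about eight sample steps
# out of 32768.
print(round(amplitude_from_dbfs(-72.2), 1))  # 8.0
```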

So there you have it. Turns out you don’t actually need to spend thousands of moneys on ADCs just because someone tells you to.




The Bullshittery Of Audio Jargon

The topic of audio recording is vast and open-ended, and discussion about associated equipment in particular often gives rise to much heated debate with respect to perceived differences in the sonic performance between devices. It is not uncommon for hostile discussions to be waged pitting the minute characteristics of this device against that, with all parties using increasingly elaborate language to define their subjective auditory experience, yet in the process obfuscating any real scientific analysis in favour of regurgitating “buzz” words that, when examined, actually fail to reveal anything helpful about the nature of the device in question. “Warmth”, “openness”, “air”, “punch”, “creaminess”, “sheen”, “silkiness”, “purpleness”, “dogturdidness”; fluffy terminology of this nature can often be observed in industry magazines (some notable culprits being particularly guilty), where vast word salads are served up in an attempt to suitably bewilder the reader into believing some imposed perception about a given piece of equipment. Whether it is an industry effort to create brand association with generic “good sounding” Barnum statements, or simply sloppy journalism in which authority comes from using words that everyone is too confused to question, the amount of bullshit I witness people talking on a regular basis goes to show how successful this method is.

I find language of that nature problematic for several reasons, not least because it denies us, as students of audio recording practices, access to scientific truths with regard to our field, where discussion of imparted harmonic content via signal distortion is much more helpful than fogging the issue under a linguistic cloud of subjective terminology and thereby propagating marketing myths about the necessity of over-priced equipment. It is no doubt a valuable weapon across all levels of the audio equipment industry, each brand justifying the apparent necessity of its newest model by using words that no one really understands. It’s interesting how readily we accept this lack of clarity in the discussion of audio, and how easily everyone seems to be encouraged to jump on the bullshit bandwagon. Note how we don’t accept this terminology in discussion of equipment where the scientific validity of its specifications really matters – I’m sure no fMRI scanner was sold on the basis of the “punch” of the scan or the “warmth” of the images produced. We can more readily accept fuzzy jargon in that context as obviously ridiculous and unhelpful.

“Brilliance”, anyone?

One of my audio recording heroes is Ethan Winer: musician, acoustician, and owner of the acoustic treatment company RealTraps. Ethan is somewhat notorious for his efforts to debunk tenacious myths prevalent among recording enthusiasts, whilst grounding his discussions in empirical scientific analyses, thereby abstaining from and often criticising the use of ambiguous subjective terms. I highly recommend his book “The Audio Expert”, in which he talks about this very topic:

“Some of the worst examples of nonsensical audio terms I’ve seen arose from a discussion in a hi-fi audio forum. A fellow claimed that digital audio misses capturing certain aspects of music compared to analog tape and LP records. So I asked him to state some specific properties of sound that digital audio is unable to record. Among his list were tonal texture, transparency in the midrange, bloom and openness, substance, and the organic signature of instruments. I explained that these are not legitimate audio properties, but he remained convinced of his beliefs anyway. Perhaps my next book will be titled Scientists Are from Mars, Audiophiles Are from Venus.”

With this in mind then, allow me to demonstrate the principle of audio bullshit in action. As I came to undertake an investigation into the sonic differences between several different microphone preamps (post on that soon), I encountered a 2007 article from Sound On Sound in review of the Neve Portico 5012 Dual Microphone Preamp. As my curiosity led me to probe how such a device can justify a £1,400 price tag, one sentence in particular proved to be such an excellent demonstration of the ambiguity of industry terminology that I was inspired to finally write this blog post, hailing my discovery as a gold standard of audio bullshittery. Let’s have a look:

“The 5012 […] has a full bodied, solid sound that gives that slightly larger-than-life character that is the trademark of a really top-class preamp. It sounds clean and detailed in normal use, without that edgy crispness that can detract in some designs…

When the Silk mode is switched in, the sound becomes a little smoother, rounder, and sweeter still in the upper mids. The high end gains a little more air, and the bottom end becomes a tad richer and thicker.”

Terms like “larger-than-life” and “edgy crispness” are rampant when describing microphone preamps, analogue-to-digital converters and other studio essentials, yet they say nothing useful whatsoever about the actual, verifiable sonic characteristics of the device. Instead they simply propagate the usage of these vague terms, serving as flimsy justification for impressionable enthusiasts to feel anxious about the “below-par” consumer grade equipment they are currently using, and therefore encouraging them to part unnecessarily with not insignificant sums of money, thereby continuing the trend. That’s not to say of course that there is no value in “high-end” gear such as this; however, I would prefer that its usage could be justified in more certain terms than these floppy, nothing words that we all have to keep grappling with. In my experience it’s always worth pushing for clarification via language that is arrived at through scientific consensus so that we can all be on the same page in terms of our expectations. This is the best prophylactic available against the tech-heads who claim authority by asserting that their £X,000 device sounds “sweet”. Chances are, they’re talking bullshit.

Comb Filtering In Drum Overhead Microphones

Recording drums in a small room is a problem that any engineer not blessed with an infinite budget must deal with at some point. Among the difficulties inherent in this scenario is the problem of comb filtering in the audio signal due to the microphone’s proximity to a boundary, i.e. the ceiling or a nearby wall. For example, if a singer sang into an omni-directional microphone placed 1 metre from a reflective wall or surface, the sound of their voice would hit the mic but also carry on past it, hit the wall, rebounding back and re-entering the mic about 6 milliseconds after the direct signal.



This is exactly the right amount of time for the frequency components around 85-86Hz to come back close to 180° out of phase with the direct signal. There will not be total cancellation, since the rebounded signal will be weaker and because the sonic characteristics of the singer’s voice are constantly changing, but the effect may still be significant.



Rounding down to 85Hz, at 170Hz the reflection will come back in phase and reinforce the 170Hz components within the direct signal. At 255Hz it will be out of phase again, and at 425Hz and 595Hz, and at intervals of 170Hz all the way up the frequency spectrum. This is known as “comb filtering”, due to the regular series of peaks and notches across the spectrum. It sounds phasey and generally undesirable.
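The arithmetic behind those figures can be sketched in a few lines of Python. This assumes a speed of sound of 343 m/s (a typical room-temperature value, not stated above) and a single reflection travelling from the mic to the wall and straight back; the function name `comb_frequencies` is purely illustrative.

```python
SPEED_OF_SOUND = 343.0  # metres per second, assumed (air at roughly 20 °C)

def comb_frequencies(distance_to_wall_m: float, count: int = 4):
    """Return (delay in seconds, notch frequencies, peak frequencies) in Hz."""
    # The reflection travels past the mic to the wall and back again.
    extra_path = 2 * distance_to_wall_m
    delay = extra_path / SPEED_OF_SOUND
    # Notches fall where the delay equals an odd number of half cycles.
    notches = [(2 * k + 1) / (2 * delay) for k in range(count)]
    # Peaks fall where the delay equals a whole number of cycles.
    peaks = [(k + 1) / delay for k in range(count)]
    return delay, notches, peaks

delay, notches, peaks = comb_frequencies(1.0)
print(round(delay * 1000, 2))           # about 5.83 ms
print([round(f, 2) for f in notches])   # roughly 85.75, 257.25, 428.75, 600.25 Hz
print([round(f, 2) for f in peaks])     # roughly 171.5, 343, 514.5, 686 Hz
```

Halving the distance to the wall halves the delay and doubles every one of these frequencies, which is why the character of the effect changes so audibly as a mic moves towards or away from a boundary.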

This effect is demonstrated in this video, where a drum overhead microphone is moved towards a nearby boundary and back again. The comb filtering artefacts are clearly audible in the recorded signal. The first microphone – a Royer R121 ribbon mic – clearly suffers from this effect with great prominence given its bi-directional polar pattern, and thus greater susceptibility to rear reflections. The second mic – an Audio-Technica ATM450 – reveals itself to be less harshly affected due to its cardioid polar pattern. This then demonstrates the importance of microphone selection with regard to its placement within a recording environment, as well as the importance of placing the mic as far from boundaries as possible, or, when this is not feasible, treating nearby surfaces with good quality acoustic absorption in order to eliminate as many reflections as possible. A combination of absorption and diffusion is most effective.


Many thanks to my beautiful assistant, Bebe Bentley, for helping me with these tests. Check out her excellent work in film and moving image on her Vimeo page.