Six-time Grammy winner Frank Filipetti is an acclaimed music producer, engineer and mixer. Born in Bristol, Connecticut, and based in New York since 1971, Frank was one of the first engineers to embrace digital mixing. His credits include his first number one album, Foreigner’s Agent Provocateur (which included the single “I Want to Know What Love Is”), Kiss' Lick It Up, Carly Simon’s Coming Around Again, Hole’s Celebrity Skin, and Rod Stewart’s The Great American Songbook Volumes 1 and 2.
Filipetti is also a big proponent of immersive audio, having remixed classic albums by the likes of Billy Joel, Carly Simon, Meat Loaf, and James Taylor in 5.1 surround sound. More recently, he’s embraced Dolby Atmos and mixed much of George Michael’s back catalog in the format.
I had the chance to speak with Frank about his career, his approach to immersive mixing, and his thoughts on pressing topics in the world of professional audio such as streaming and high-resolution sound.
You’ve been an advocate of surround music since the early days of 5.1, with James Taylor’s Hourglass being one of the first multichannel SACD releases way back in the late ’90s. How were you introduced to the concept of mixing in surround, and what do you enjoy about it most?
I was doing a lot of work for Sony at the time, with James Taylor’s Hourglass being one such project. I was also very interested in their DSD project and was working with some of the folks over there on it. This was around the turn of the century.
At the time, I was mixing a lot of material in the DSD format. For those who aren’t familiar with it, that stands for Direct Stream Digital. Instead of the standard 16-bit or 24-bit PCM model, it uses something called delta modulation in place of bit depth. So it's actually 1-bit recording, sampling at over a million times per second. Later on, it went up to 2-bit and even higher. It was a wonderful format and sounded great, but didn't have great editing capabilities, which I think led to its demise.
I had mixed some James Taylor stuff in that format, and they brought up the idea of expanding the DSD format to include a 5.1 version as well. We decided to mix Hourglass in that format, and my go-to guy in New York, David Smith (who unfortunately passed away quite a few years ago), set me up with a DSD multi-channel workstation. I had a ball making the 5.1 mix.
To this day, I think you still can’t import, export, or edit DSD files in Pro Tools.
My good friend Gus Skinas is still involved with the DSD project, which is still going on. It is more difficult to edit and you can understand why.
In PCM, you’re taking a sample and the bit depth determines the level of that particular sample. Each sample is kind of a snapshot, but in the DSD world, the samples only indicate if the level is rising or falling. That’s why they call it one-bit, because each sample is either “plus” or “minus” relative to the last sample, so the samples are all related in a way that’s arbitrary. The metadata merely stating if the level has gone up or down in reference to the prior sample isn’t really enough information to drop a marker for an edit, but they’ve somehow figured out a way around this.
It’s a lovely-sounding system, but I don't think it will ever get that much traction because of these difficulties. The concept is very cool though, because if you think of it in terms of a graph, each sample being a plus or minus basically means that it’s tracing the analog waveform. So in many ways, it's a more-perfect digital representation of the analog waveform than PCM.
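To make the “plus or minus” idea concrete, here is a minimal Python sketch of the scheme Frank describes. It assumes plain delta modulation with a fixed step size, which is a simplification: commercial DSD uses delta-sigma modulation with noise shaping at 2.8224 MHz. The function names, the step size, and the test tone are my own illustrative choices.

```python
import math

STEP = 0.05  # assumed fixed step size, chosen only for illustration


def delta_encode(samples, step=STEP):
    """Turn level samples into 1-bit values: 1 = step up, 0 = step down."""
    bits = []
    estimate = 0.0  # running reconstruction that the decoder will also build
    for x in samples:
        bit = 1 if x >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits


def delta_decode(bits, step=STEP):
    """Rebuild the waveform by accumulating the up/down steps."""
    out = []
    estimate = 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out


# One cycle of a sine wave, heavily oversampled so each step can keep up.
n = 1000
signal = [0.8 * math.sin(2 * math.pi * i / n) for i in range(n)]

bits = delta_encode(signal)
recon = delta_decode(bits)

# Largest gap between the original and the staircase that traces it.
max_error = max(abs(a - b) for a, b in zip(signal, recon))
print(f"{len(bits)} one-bit samples, max error {max_error:.3f}")
```

No single bit carries a level on its own: the decoder only knows where the waveform is by adding up every bit that came before. That dependency is the editing difficulty Frank refers to.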
It’s ironic that many SACD players end up converting DSD back to PCM on the user end.
Oh yeah, absolutely. Like I said, it’s fascinating but not entirely ubiquitous.
There are so many different immersive mixing styles used by engineers. Some are more conservative and use the extra speakers primarily for back-of-the-hall ambience, while others are more experimental and place isolated instruments behind or above the listener. Can you talk about your decision-making process for utilizing the extra channels? Are you trying to create a ‘center of the band’ or ‘in the audience’ perspective? Where do you draw the line between ‘immersive’ and ‘gimmicky’?
I started mixing in 5.1 before it hit the market, so there weren’t really any reference points or examples to go off. I was working with Phil Ramone on a lot of projects at the time, and I think we actually produced the first commercially available multichannel music with a company called N2K. It was a Dave Grusin record, West Side Story (1997).
At that point in time, I was approaching it as “super-stereo.” I’d try to get the mix sounding just like stereo, but blown up and moved out to the sides. For James Taylor’s Live At The Beacon (1998), I’d put the band more towards the front and the audience to the rear.
Over the years, my technique has gotten a little more adventurous in the sense that I started placing more elements towards the back. I was kind of spurred on by my good friend Elliot Scheiner, who had no problem with putting anything in the rear speakers. Some people have a problem with that, but I love it.
When I’m mixing in Dolby Atmos or Sony 360 Reality Audio today, I like to use the whole palette. I don’t mind having drums in the back, or the vocals moved further out into the room. I’m looking to evoke emotion rather than go for exactly what was in the stereo mix. It’s important to recreate the vibe of the original stereo mix when you’re doing a catalog album, but I like being able to add some emotional context into the process.
Since the introduction of 5.1 surround in the late ‘90s, there’s been debate over how the center speaker should be utilized in a music-only context. Where do you stand on the center channel? I’ve noticed that your 5.1 mixes tend to engage the center quite aggressively, for dry vocals and other key elements.
Prior to doing these music-only DVD or SACD projects, I did a lot of film mixing. The center channel is there for dialogue, and it’s useful. If I’m looking for real intimacy, I find that the phantom center just does not do as good a job as the center channel.
It can be a double-edged sword though: The vocal coming from a single speaker solves the comb-filtering problem you get with a phantom center, but it can make the singer’s voice sound brighter or even harsher. With some judicious EQ, you can get it to work.
Once you get into more complex surround formats like 5.1.4 or 7.1.4, you’ll find that the comb filtering happens with all of the speakers. So you have to be careful about how you place things. I love the center channel though, and the way it can anchor the vocal. That doesn’t necessarily mean I’ll always use it for vocals, though. I just did three George Michael albums in Atmos and 360RA, and his vocals are all over the place. Sometimes they’re in the center, other times left and right with some center divergence. I’ll even move them out into the room on occasion, depending on the mix. I love having the ability to play around like that.
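The phantom-center comb filtering Frank mentions can be approximated with a few lines of Python. This sketch assumes the simplest possible model: one ear hears the same signal twice at equal level, with the second arrival slightly delayed. The 0.25 ms delay is an assumed, illustrative value rather than a measurement, and the function names are hypothetical.

```python
import math


def comb_gain_db(freq_hz, delay_s):
    """Level of a signal summed with an equal-level delayed copy of itself,
    in dB relative to the signal on its own."""
    magnitude = abs(2.0 * math.cos(math.pi * freq_hz * delay_s))
    # Floor the value so a perfect cancellation does not produce log10(0).
    return 20.0 * math.log10(max(magnitude, 1e-12))


def first_notch_hz(delay_s):
    """Lowest frequency at which the two arrivals cancel."""
    return 1.0 / (2.0 * delay_s)


delay = 0.00025  # assumed 0.25 ms difference between the two arrivals
notch = first_notch_hz(delay)

for freq in (100, 1000, notch, 4000):
    print(f"{freq:7.0f} Hz: {comb_gain_db(freq, delay):8.1f} dB")
```

Under that assumed delay, the first cancellation lands at 2 kHz, inside the vocal range. A vocal sent to a single center speaker has no second arrival to cancel against, which fits Frank's point that the hard center avoids the problem.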
In the early 2000s, you had the opportunity to remix a number of high-profile albums by artists such as Billy Joel, Carly Simon, and Meat Loaf in 5.1 surround for release on Super Audio CD or DVD-Audio. What was it like working on these projects? Were the artists usually involved in the remixing process?
Very rarely. I did Billy Joel’s The Stranger (1977) and 52nd Street (1978) with Phil Ramone, who had produced the original records. I felt more comfortable with experimentation and taking chances on those, since I was there with someone heavily involved in the original production.
As much as you want to play and be adventurous, you don’t want to muck up the original intent of the stereo mix. Some artists don’t love the idea of sounds being separated out into different speakers.
It’s understandable though: to the artists, these albums are like their children. Doing the remixes is like trying to take over someone else’s kids for a month and change the way their parents raised them. I always respect the artist, even if I disagree with their opinion. If I’m left to my own devices though, I try to stay as true to the original intent as possible while still allowing myself a certain degree of creativity.
There are a lot of rumors and hearsay in the audiophile community about 5.1 remixes that were completed and left unreleased during those SACD/DVD-A years, stemming mainly from press articles and product catalogs.
It’s because the format started dying. It was not a trivial matter to release something in 5.1, in the sense that you’d have to find the old multitrack tapes, pay for mixing time, and have the result mastered for disc replication. On Meat Loaf’s Bat Out Of Hell (1977) for example, we had to do a lot of forensic work because there was never a clearly labeled master.
There was a format war at the time between Sony’s SACD format with DSD and the DVD-Audio high-resolution PCM format. This created all kinds of issues and problems. Elliot Scheiner actually developed a 5.1 system for automobiles, but only the Acura brand used it.
We talked to many manufacturers, but they had a “let’s wait and see” kind of attitude to the whole thing. It started with a lot of fanfare, but quietly wound down to the point where the record companies just didn’t want to deal with it anymore. I remember hearing a Sony exec say something to the effect of “I don’t want to hear the word ‘surround’ anymore,” so it died off from a combination of there being no real marketing push and a lack of available new products.
I’m hopeful that this new big push towards immersive, with Dolby Atmos and Sony 360, is a lot more viable and will stand the test of time. When you listen on a great system, there’s nothing like it.
How does mixing in Dolby Atmos differ from 5.1? I understand that it requires a whole new way of thinking, as specific sounds can be assigned to “objects” located anywhere in a three-dimensional space.
As you said, it’s based on objects. The objects are basically tracks with embedded metadata indicating the position and level. For the mix to play back properly, it has to be rendered on the fly by a separate application outside Pro Tools.
When the Atmos mix file, the ADM file, is played back on another system, it passes through the Dolby processor again. So all the information appears not only in the right spot, but also at the right level. It will also scale up and down depending on the size of your system. It’s a very complex process.
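The object idea can be sketched in Python as well. This is a toy model and not Dolby's renderer or the ADM file format: it assumes an object is just a name, a left-to-right position, and a level, and it uses ordinary constant-power panning between the two nearest speakers. The class, the function, and the speaker layouts are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    name: str
    x: float     # assumed position scale: -1.0 = hard left, +1.0 = hard right
    gain: float  # linear level carried alongside the audio as metadata


def render(obj, layout):
    """Map one object onto whatever speakers exist at playback time.

    layout is a dict of speaker name -> x position. Returns a dict of
    speaker name -> gain, panning between the two nearest speakers.
    """
    speakers = sorted(layout.items(), key=lambda item: item[1])
    gains = {name: 0.0 for name, _ in speakers}
    # Keep the object inside the span covered by the speakers.
    x = min(max(obj.x, speakers[0][1]), speakers[-1][1])
    for (left, left_x), (right, right_x) in zip(speakers, speakers[1:]):
        if left_x <= x <= right_x:
            angle = (x - left_x) / (right_x - left_x) * math.pi / 2
            gains[left] = obj.gain * math.cos(angle)
            gains[right] = obj.gain * math.sin(angle)
            break
    return gains


vocal = AudioObject(name="vocal", x=0.0, gain=1.0)

# The same object, rendered for two different playback systems.
stereo = render(vocal, {"L": -1.0, "R": 1.0})
lcr = render(vocal, {"L": -1.0, "C": 0.0, "R": 1.0})

print("stereo:", {k: round(v, 3) for k, v in stereo.items()})
print("LCR:   ", {k: round(v, 3) for k, v in lcr.items()})
```

The mix stores where the vocal should be, not which speaker plays it. On the two-speaker layout the sketch splits the vocal equally between left and right, and on the three-speaker layout it sends the vocal to the center, which is a small-scale version of the scaling Frank describes.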
How adventurous are you with regard to the height channels? Will you place distinct vocals and instrumentation above the listener?
Absolutely. It’s like in the old days of mono transitioning to stereo, when a lot of mixers were afraid to pan things because listeners had their system set up wrong. When I started engineering in the ‘80s, I wasn’t afraid to fully utilize the stereo field. The same goes for 5.1, and the 7.1.4 format I’m working with now. I love the height channels and plenty of stuff gets sent up there.
With the introduction of Apple’s Spatial Audio format, it seems most listeners are experiencing a binaural approximation of Atmos over headphones rather than the full immersive experience on a home theater setup. Do you ever check how your Atmos mixes sound on headphones, or are they only meant to be heard on speakers?
A lot of people are going to be listening in binaural, so you do have to take that into account.
What most don't understand is that the binaural mix is not a perfect representation of the surround mix you’ll hear on speakers. It just doesn’t have the same width, depth, and height you’d get on a 7.1.4 system. It’s more like a good stereo mix, which is great because previous efforts to fold 5.1 down to stereo were rarely totally successful.
These days, Dolby and Sony have worked very hard to create an impression of surround in the binaural mix. It’s an interesting alternative to the stereo mix, but I don't think it comes up to the level of hearing it on a great system through speakers. That being said, most people will be hearing it this way and you’ll want to make sure it works. I’d like it to get better though.
You mentioned the Sony 360 Reality Audio system. To my knowledge, there are very few if any pieces of consumer hardware that incorporate the decoder for it. While there are plenty of 360RA-mixed albums on streaming platforms such as Tidal, I’m not sure how listeners are playing them back. Do you know if and when this will change?
Let's hope it’s soon, because I'm doing a lot of work with it. I don't totally keep track of all this stuff, but there are definitely discussions going on with folks at Amazon and Apple about expanding immersive capability. Amazon’s Echo speaker will soon feature a processor to handle the Sony format. It’ll likely happen soon, as there are a lot of new albums mixed in this format.
You mentioned earlier that the George Michael immersive albums are available in both Dolby Atmos and 360RA. Did you create a unique 7.1.4 mix for each format, or are they mostly identical?
It's hard enough to get your client or your artist to sit down and listen to one format, let alone two. If I get approval on the 360 playback, I try to make the Atmos as close as possible and vice versa. The good news is that’s completely doable. The systems are similar, but each has advantages and disadvantages.
With Dolby, I don’t love the placement of the height channels. According to their official specs, the heights circumscribe an area of the room that’s smaller than the bottom channels. When I build a room, it’s usually a big rectangle like Sony recommends for 360. On the other hand, Atmos seems to be much more ubiquitous and available. I like that it has a dedicated subwoofer channel, which 360RA does not. Once I get something that works, I kind of just gravitate from one to the other. There shouldn’t be any serious difference.
I didn’t know about Dolby’s recommended placement of the height speakers. I have mine positioned parallel to the corresponding floor channels, forming a large rectangle.
Yeah, you have to remember that Dolby Atmos was originally set up as a film mixing process. They gravitated toward music applications later on.
The concept in the film world is that you wouldn’t have height channels above the main speakers in front of the screen, so they brought them about two-thirds of the way forward where the side channels enter the picture. Like you, I have my height channels in the same configuration as my left front, right front, left rear, and right rear.
Another odd thing is that Dolby and Pro Tools don’t allow a 7.1.4 music bed. I don’t know why that is, as nearly every demonstration of these formats I’ve seen uses that configuration.
There are a lot of weird areas as you get deeper into it, but at the end of the day I prefer both systems to nothing at all. I hope Sony and Dolby continue to refine these systems as it’s a wonderful way to listen to music.
When it comes to new multichannel music content, streaming seems to have overtaken physical discs as the method of consumer delivery. What are your thoughts on the sound quality of immersive streaming? There is a certain irony to the fact that we’ve gone from these unimaginably high-resolution 5.1 disc formats of the early 2000s to what’s essentially 12 discrete channels of MP3 sound.
I haven't really heard what happens streaming-wise yet. I've been so busy trying to get these mixes done. Delivering separate Atmos, 360RA, and binaural versions takes quite a bit of time! If Sony and Dolby are able to get the streaming thing to really work, it’s nothing but a plus for all of us.
As a longtime surround music fan, I’ve been amazed at just how much new content there is up on Apple Music and Tidal. There’s more than I have time to listen to, which is unprecedented.
It’s great, but the obvious downside of that is not being able to own a physical product. I’m old-school in the sense that I like owning the artist’s work, a little piece of their creativity. My kids are more than happy with streaming. I do wish that Sony and Dolby would release more products on Blu-ray, but we’ll just have to wait and see what happens.
Universal Music Group is still pretty good about including Blu-rays with immersive mixes in their elaborate deluxe editions for bands such as The Beatles, The Band, or Kiss, but it’s seeming more and more like the exception rather than the norm. It’s particularly frustrating because you did an Atmos mix of George Michael’s Older (1996), which isn’t being included in the upcoming deluxe box set of that album.
I don't know what the plans are with that. If that's true, it's definitely disappointing. It can’t be an issue of rights and so forth, because all of that's already been taken care of. As for what goes into the marketing of these things and why, I’ve never understood it.
Perhaps those making the decisions don’t fully understand what the product is? I’ve yet to meet anyone who hasn’t been impressed by a demonstration of surround sound music, be it 5.1 or Dolby Atmos.
I remember in the DVD-Audio days, improperly configured systems were a huge issue. They’d have the center speaker out of phase or the rear channels reversed. It’s a complex undertaking, and no one ever wrote a clear manual on how to do it.
I had a conversation with Alan Parsons about the center channel some years ago. He told me that a lot of people’s setups didn’t even have a center speaker. So you’d have 5.1 mixes with all the vocals missing. We’re always going to have these issues, but the idea of making it more compatible, be it through object-based audio or binaural rendering, is very exciting for me.
When it comes to immersive streaming, I’ve found that the bandwidth is variable based on groups of speakers. The sound quality of the front channels may be significantly better than the rears or heights, but somehow it all hangs together.
Yeah, I’ve heard people complaining about the sound quality of MP3s for years. It’s not high quality audio by any means, but is it really that much worse than the singles we used to listen to over AM radio? The most important aspect of this for me is still the emotional experience, but I wish that more people were interested in high-resolution sound. We need to get back to that idea of listening to an album on a good stereo with the lights turned down.