Allow finer adjustments to playback speed, allow changes to horizontal scrolling performance
JKL shuttling is pretty familiar to anyone who uses Audition regularly. If you edit podcasts, you know how crucial it is for saving time. The problem is that, much like Audition’s terrible means of searching for silence (for dialogue editing, anyway), it’s far too rigid, and playback sounds incredibly unnatural. Right now, I have my shuttle increases set to half speed, which gives me roughly 1.3X playback. I’d like to use a higher speed, but anything greater than that is useless for spoken word: everyone sounds like a chipmunk and you can’t understand anything. Pro Tools lets you set playback speed exactly to your preference, since it allows you to enter an exact percentage. This is immensely helpful. There should also be a toggle for how the pitch of the output changes. I mean, every single podcast and audiobook player can do this, but some of the best implementations are Overcast and Libby (and even Audible), all of which do a good job of letting you smoothly adjust speed to your preference with minimal damage to the “legibility” of the spoken words.
Along these lines, I use a Kensington SlimBlade for moving around my DAW. There are some options for scrolling speed and direction, and I have my upper-right button set so that I can hold it to scroll horizontally; but what I really want is for Audition to let me fine-tune horizontal scrolling behavior even further. I want to be able to dial in the speed and inertia of scrolling at will, and even assign hotkeys to change that behavior on the fly as needed.
It seems like every DAW maker, with perhaps the odd exception of Ferrite on iPadOS (which itself still has a long way to go), was blindsided by the rise of spoken-word editing. With audiobooks and podcasts still growing rapidly, virtually every DAW remains ill-suited for the task. It’s like everything out there is thinking in terms of a 3-4 minute song and not long form. Movie scores are an exception, but even then it’s a very specialized task that’s ultimately broken up into much smaller bits that are later added to the final product, usually in a video editing environment. You can sort of do this with audiobooks, but even then a typical session is hours long. Podcasts are often 30-120 minutes, easily. We need a way to move around these editing environments much, much more quickly. We need playback that is much, much more adaptable to listening quickly. We need tools like strip silence that eliminate the unnecessary stuff recorded while another person is speaking, which helps tremendously not only with overall editing speed, but also with the application of effects. We need a method better than groups and locking to the timeline to prevent accidentally getting audio out of sync. We need excellent multicore performance so we don’t have to rely as aggressively on pre-rendering before mixing down, which can take an hour or more for the typical hourlong podcast. Leveraging just one core is just INSANE to me in 2020.
One of these days someone will come along and create a purpose-built podcasting DAW that not only thinks in the terms above, but also runs on multiple platforms (including mobile), has default layouts and preset workspaces better suited to long-form dialogue, and supports excellent plugins while still cutting out the cruft that comes with supporting music or soundtracks first. This would be a huge, huge product. It’ll think about automatically routing and recording remote audio during guest call-ins while also recording locally. It’ll think about syncing dialogue (it can use the remote track as a reference, even as guests send in their local audio). It’ll think about (as perhaps an even better alternative to strip silence) automatically creating separate clips of only the portions where people are talking, using the other already-synced tracks to gather intelligence about who is speaking when. And so much more.
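To make that last idea a bit more concrete, here is a rough Python sketch of what I mean by using the other synced tracks to decide who is speaking when. It is only an illustration, not how Audition or any other DAW actually works. It assumes the tracks are already in sync and that you already have a loudness value per short frame for each track; the function name `speech_clips`, the `threshold` and `min_frames` parameters, and the sample numbers are all made up for the example.

```python
from typing import Dict, List, Optional, Tuple


def speech_clips(
    levels: Dict[str, List[float]],
    threshold: float = 0.05,   # hypothetical loudness floor for "someone is talking"
    min_frames: int = 2,       # hypothetical minimum clip length, in frames
) -> Dict[str, List[Tuple[int, int]]]:
    """Return, per track, (start, end) frame ranges where that track is the speaker.

    A track counts as "speaking" in a frame only if it is the loudest of all
    tracks in that frame AND at or above the threshold, so mic bleed from the
    other person does not create clips. `end` is exclusive.
    """
    n_frames = min(len(v) for v in levels.values())
    clips: Dict[str, List[Tuple[int, int]]] = {name: [] for name in levels}
    starts: Dict[str, Optional[int]] = {name: None for name in levels}

    # Run one frame past the end so any clip still open gets closed.
    for i in range(n_frames + 1):
        loudest = None
        if i < n_frames:
            name, level = max(
                ((n, v[i]) for n, v in levels.items()), key=lambda p: p[1]
            )
            if level >= threshold:
                loudest = name

        for name in levels:
            active = name == loudest
            if active and starts[name] is None:
                starts[name] = i                      # clip opens
            elif not active and starts[name] is not None:
                if i - starts[name] >= min_frames:    # drop tiny blips
                    clips[name].append((starts[name], i))
                starts[name] = None                   # clip closes

    return clips


# Made-up per-frame loudness for two synced tracks.
example_levels = {
    "host":  [0.20, 0.30, 0.01, 0.01, 0.02, 0.40, 0.50],
    "guest": [0.01, 0.06, 0.30, 0.25, 0.01, 0.01, 0.00],
}
example_clips = speech_clips(example_levels)
```

With those made-up numbers, the host gets clips for frames 0-2 and 5-7 and the guest gets frames 2-4, while the quiet frame in between is dropped for both. A real tool would need smarter detection than "loudest track wins" (crosstalk, laughter, people talking over each other), but even something this simple would beat cutting it all by hand.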
The first DAW to really understand the needs of podcast and audiobook editing (which are in some ways much more lightweight, and in many ways more esoteric, than those of music or soundtracks) is going to OWN this space. They’ll have a huge, huge hit on their hands if they make a purpose-built tool.
I saw the Sensei demo recently where they automatically isolated ums and ahs. This is indeed a start and gives me hope for Audition, but at the same time Adobe moves so painfully slowly, and so much of the low-hanging fruit has already been ignored. And what’s the point of a demo if I can’t use it for, what, another 2-3 years? It doesn’t help.
What helps is simpler stuff you folks can implement now.