The News: At its Adobe MAX event in Japan, Adobe previewed Project Sound Lift, a new AI-powered technology designed to split recordings into separate tracks consisting of voices, non-speech sounds, and background noise. The technology is intended to let creators manipulate and optimize audio within video recordings. Adobe’s Enhance Speech technology, now available in Adobe applications such as Premiere Pro, is integrated within Project Sound Lift to further transform the way creators produce and control studio-quality audio content. More information on Project Sound Lift can be found on the Adobe website.

Adobe Unveils Project Sound Lift at Adobe MAX in Japan

Analyst Take: Adobe recently unveiled Project Sound Lift, an AI-powered technology that separates speech recordings in a video into distinct tracks of voices, non-speech sounds, and other background noise. Project Sound Lift, currently in preview, helps users manipulate audio recordings across a range of scenarios, leveraging AI to enhance, transform, and control speech and other sounds independently. Project Sound Lift also incorporates Adobe’s Enhance Speech technology, which uses AI to remove noise and improve the quality of dialogue clips.

Developed by speech AI researchers at Adobe Research, Project Sound Lift was announced at Adobe MAX in Japan as part of Adobe’s Sneaks showcase, where Adobe engineers and research scientists offer sneak peeks at prototype ideas and technologies, each showing future potential to become important elements of Adobe products that are trusted by millions of users across the world.

Other tools for cleaning up audio files already exist. Supertone’s CLEAR, for example, separates audio into three tracks consisting of ambient sounds, voice, and voice reverb. Adobe’s Project Sound Lift, however, can detect and split specific sounds, including speech, applause, laughter, music, and various other noises, into distinct tracks. Each track can be individually controlled to enhance the quality and content of the video.
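To make the idea of individually controlled tracks concrete, the following is a minimal sketch of how already-separated tracks might be recombined with a different volume for each one. It is not Adobe’s implementation or API. The function name `remix`, the track names, and the sample values are all hypothetical, and the separation step itself (the part performed by the AI model) is assumed to have happened already.

```python
from typing import Dict, List


def remix(tracks: Dict[str, List[float]], gains: Dict[str, float]) -> List[float]:
    """Mix equal-length tracks into one signal.

    Each track is scaled by its gain (1.0 if no gain is given), the scaled
    tracks are summed sample by sample, and the result is clamped to the
    range [-1.0, 1.0]. Track names here are illustrative assumptions.
    """
    lengths = {len(samples) for samples in tracks.values()}
    if len(lengths) > 1:
        raise ValueError("all tracks must have the same number of samples")
    n = lengths.pop() if lengths else 0

    mixed = [0.0] * n
    for name, samples in tracks.items():
        gain = gains.get(name, 1.0)
        for i, sample in enumerate(samples):
            mixed[i] += gain * sample

    # Clamp so the mix stays within the normalized amplitude range.
    return [max(-1.0, min(1.0, value)) for value in mixed]


# Hypothetical separated tracks: three samples each, normalized amplitudes.
tracks = {
    "speech": [0.5, -0.25, 0.5],
    "applause": [0.25, 0.25, -0.25],
    "ambient": [0.125, 0.125, 0.125],
}

# Keep speech as is, halve the applause, and mute the ambient track.
result = remix(tracks, {"speech": 1.0, "applause": 0.5, "ambient": 0.0})
print(result)  # [0.625, -0.125, 0.375]
```

In this sketch, muting a track is simply a gain of 0.0, and enhancing one would mean processing its samples before the mix. That is the practical appeal of having each sound on its own track.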

Ubiquitous, High-End Camera Phones Have Created a Mismatch Between Video and Audio Quality

The ubiquity of high-quality digital cameras embedded into consumer devices such as the iPhone and other smartphones and tablets has led to a technology mismatch for content creators, both consumers and those generating content for business use. These devices can capture still and moving images at a quality level that rivals or sometimes exceeds professional, standalone video equipment, in a convenient, portable, and unobtrusive form factor.

However, these cameras are not capable of capturing sound with a commensurate level of quality without using outboard gear, such as a shotgun, handheld, or lavalier microphone, which requires additional setup time and is generally more intrusive and less portable than simply using a smartphone. If this specialized sound-recording equipment is not used, live recordings pick up ambient noise, other speakers, and other unwanted sounds that can mar an otherwise acceptable recording.

AI-Driven Software to Level Up Video and Audio Quality

To address this quality disparity, the preview version of the software lets users import an audio file into the app and then choose which sound they want the tool to isolate. Project Sound Lift then uses AI to identify the disparate sounds within a video, separating distinct voices, background noise (including crowd noise, applause, wind noise, and other ambient noise), and other non-speech sounds. It then creates a separate track for each sound, which can be altered, enhanced, or otherwise manipulated. Although Adobe has yet to disclose timing for a beta or general release of Project Sound Lift and has not said which applications the technology will be embedded in, there is considerable potential for both corporate and consumer use.

According to Adobe, the technology can currently separate only predefined categories in a typical speech recording: one or several main speakers, ambient sounds (including music), and random event sounds such as applause, laughter, birds, dogs, or other noises. Notably, background music is separated into a single track together with ambient noises, and the technology cannot currently be used to separate distinct frequency ranges or signatures within a music track. This limitation benefits Adobe and its users, largely because it prevents users from stripping out elements of a copyrighted piece of music. As such, the technology can be safely used in enterprises that would be reluctant to offer access to technology that could potentially violate copyright laws.

Disclosure: The Futurum Group is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.

Analysis and opinions expressed herein are specific to the analyst individually, based on data and other information that might have been provided for validation, and are not those of The Futurum Group as a whole.

Other insights from The Futurum Group:

Adobe Firefly: Blazing a Generative AI Application Trail

Adobe Express With Firefly Beta Is Now Generally Available

Adobe Revenue for Q3 2023 Again Sets a Record at $4.89 Billion

Author Information

Keith Kirkpatrick

Keith Kirkpatrick is Research Director, Enterprise Software & Digital Workflows for The Futurum Group. Keith has over 25 years of experience in research, marketing, and consulting-based fields.

He has authored in-depth reports and market forecast studies covering artificial intelligence, biometrics, data analytics, robotics, high performance computing, and quantum computing, with a specific focus on the use of these technologies within large enterprise organizations and SMBs. He has also established strong working relationships with the international technology vendor community and is a frequent speaker at industry conferences and events.

In his career as a financial and technology journalist, he has written for national and trade publications, including BusinessWeek, CNBC.com, Investment Dealers’ Digest, The Red Herring, The Communications of the ACM, and Mobile Computing & Communications, among others.

He is a member of the Association of Independent Information Professionals (AIIP).

Keith holds dual Bachelor of Arts degrees in Magazine Journalism and Sociology from Syracuse University.
