Have you ever encountered distorted audio, such as a clipped recording? Conventional techniques have not been able to fully recover distorted audio, but AI can now help with this problem.
Recently, a research group published a detailed comparison of how well the latest AI techniques recover distorted audio. In this article, I’ll show how well our audio backbone techniques, Amis and Rukai, can eliminate distortion.
What is audio distortion?
Audio distortion is an alteration of a sound signal, usually caused by exceeding the maximum level an audio system can handle, either in the physical circuit or in the digital numeric range. It can range from subtle colouration (e.g., an exciter or saturator) to severe clipping of the waveform, which sounds harsh and unpleasant. With the rise of deep learning, researchers have developed algorithms that can recover distorted audio, for example by analyzing the signal for “clipped” portions and then using signal processing techniques to restore them toward their original waveform.
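To make the idea of clipping concrete, here is a minimal sketch using only the Python standard library. It assumes the simplest model of distortion, hard clipping at a fixed threshold; the function names, the 440 Hz test tone, and the 16 kHz sample rate are illustrative choices, not anything taken from Amis, Rukai, or the research work.

```python
import math

def sine(freq_hz, sample_rate, n_samples, amplitude=1.0):
    """Generate a sine tone as a list of float samples."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]

def hard_clip(samples, threshold=1.0):
    """Limit every sample to the range [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]

# A tone that is twice as loud as the system can represent (assumed limit: 1.0).
clean = sine(440.0, 16000, 160, amplitude=2.0)
clipped = hard_clip(clean, threshold=1.0)

# Fraction of samples that were flattened at the limit.
clipped_fraction = sum(1 for s in clipped if abs(s) >= 1.0) / len(clipped)
print(f"{clipped_fraction:.0%} of the samples sit at the clipping limit")
```

Everything above the threshold is flattened, so the peaks of the waveform are lost; that missing information is what a recovery method has to estimate.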
How can AI help recover distorted audio?
To recover distorted audio, signal processing techniques and machine learning algorithms can be used to filter the distortion out of the audio signal. Filtering means separating the original audio signal from any additional noise or distortion-like sound present in it. These algorithms learn from datasets of distorted audio signals paired with their clean versions how to reduce (filter) the distortion-like sound in a given signal. A deep learning model trained on these datasets learns which features are associated with distortion and then uses those features to identify and remove it, making it a workable option for audio recovery.
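The paired datasets mentioned above can be sketched as follows. This is only an illustration of how (distorted, clean) training pairs might be built, assuming hard clipping as the distortion and a hypothetical range of gains; it does not train a model, and it is not the data pipeline of any method named in this article.

```python
import math
import random

def clip(samples, threshold=1.0):
    """Hard-clip samples to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, s)) for s in samples]

def make_pair(clean, gain):
    """Return (distorted, clean): the model input and its training target."""
    distorted = clip([gain * s for s in clean])
    return distorted, clean

def make_dataset(clean_clips, pairs_per_clip=4, seed=0):
    """Build several distorted versions of every clean clip."""
    rng = random.Random(seed)
    dataset = []
    for clean in clean_clips:
        for _ in range(pairs_per_clip):
            gain = rng.uniform(2.0, 10.0)  # hypothetical distortion strengths
            dataset.append(make_pair(clean, gain))
    return dataset

# One short clean tone stands in for a real collection of recordings.
tone = [0.8 * math.sin(2 * math.pi * 220.0 * n / 16000) for n in range(320)]
dataset = make_dataset([tone])
print(len(dataset), "training pairs")
```

A model trained on such pairs sees the distorted clip as input and is penalized for the difference between its output and the clean clip.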
Rukai and Amis backbones work excellently at removing heavy distortion
Amis and Rukai are designed to detect and regenerate frequencies broken by the state of the microphone, recovering the original frequencies as far as possible. Note that Amis and Rukai are NOT “filtering techniques” at all.
Both Amis and Rukai use deep neural networks with multiple layers that detect the frequency components of distorted audio, capture the essential audio features, and regenerate the timbre as if it had been recorded with a normal microphone.
The backbone technique developed by CrowdUnmix is the core of the Rukai and Amis AI models; it generates the most essential frequencies from user-input audio data.
Originally, the backbone was also used for remixing music, but we found that it works extremely well for reducing audible distortion (clipping) too.
To ensure a fair comparison, we took the most difficult cases (input SDR = 1 dB) from their research work to demonstrate our techniques: https://joimort.github.io/distortionremoval/
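For readers unfamiliar with SDR (signal-to-distortion ratio), here is a minimal sketch of the plain energy-ratio definition, 10·log10(signal energy / error energy). This is an assumption for illustration: the research work may use a different SDR variant, and the function name and sample values are hypothetical.

```python
import math

def sdr_db(reference, estimate):
    """Signal-to-distortion ratio in dB between a clean reference and an estimate."""
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if error_energy == 0.0:
        return math.inf   # estimate is identical to the reference
    if signal_energy == 0.0:
        return -math.inf  # silent reference, nothing to recover
    return 10.0 * math.log10(signal_energy / error_energy)

reference = [1.0, 0.0, -1.0, 0.0]
estimate = [0.9 * r for r in reference]  # every sample is 10% too quiet
print(sdr_db(reference, estimate))       # about 20 dB
```

On this scale an input SDR of 1 dB means the distortion carries almost as much energy as the clean signal itself, which is why these cases are the hardest ones.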
Furthermore, we applied distortion again to the distorted samples, creating double distortion.
This is because the samples provided by the research contained frequencies only up to 8 kHz. Although the double distortion excessively damages the 0~8 kHz range, that damage creates unwanted but informative higher harmonics that the Rukai and Amis models can work with. After the Rukai and Amis analyses, the resulting backbone audio samples were resampled to 16 kHz for a fair comparison with the other methods, including Demucs, WaveUNet, UMX, and ASPADE.
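The effect of re-distorting a band-limited sample can be illustrated with a small standard-library sketch. It assumes hard clipping as the distortion, a 3 kHz test tone (inside the 0~8 kHz band) sampled at 48 kHz, and a gain of 1.5 per pass; none of these values come from the actual experiment. It measures how much energy appears at 9 kHz, the third harmonic, which lies above the original 8 kHz limit.

```python
import math

SAMPLE_RATE = 48000  # assumed working rate, high enough to hold new harmonics

def distort(samples, gain=1.5, threshold=1.0):
    """Amplify, then hard-clip to [-threshold, threshold]."""
    return [max(-threshold, min(threshold, gain * s)) for s in samples]

def tone_magnitude(samples, freq_hz, sample_rate=SAMPLE_RATE):
    """Amplitude of one frequency component (a single-bin DFT)."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * k / sample_rate)
             for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * k / sample_rate)
             for k, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / len(samples)

# A clean 3 kHz tone: all of its energy lies below 8 kHz.
clean = [math.sin(2 * math.pi * 3000.0 * n / SAMPLE_RATE) for n in range(480)]
once = distort(clean)   # single distortion
twice = distort(once)   # double distortion

for name, signal in [("clean", clean), ("once", once), ("twice", twice)]:
    print(f"{name:>5}: 9 kHz magnitude = {tone_magnitude(signal, 9000.0):.3f}")
```

In this toy setup the clean tone has essentially nothing at 9 kHz, while each distortion pass adds more energy there. Content of that kind is the "unwanted but informative" high-frequency material described above.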
Case\Method | Original | Distortion | DistortionX2 | Amis | Rukai | Demucs | WaveUNet | UMX | ASPADE |
Piano | |||||||||
Vocal+Guitar | |||||||||
Drum | |||||||||
Bass+Electro | |||||||||
Good news! Our backbone technique still works very well at reducing the distortion effect, even under heavy double distortion. Rukai and Amis were trained on different datasets, which is why they produce different backbone audio. Users can choose either model according to their preference.
Although no AI is perfect at recovering from distortion, this shows that AI-based audio processing techniques have great potential to reduce audio distortion. In my opinion, a good distortion-recovery technique should be able to restore a distorted “Hey, Siri!” closely enough to the speaker’s voice biometrics to get past the phone’s voice check.
Note 1: “Declipping” plugins can never recover distorted audio. Declipping is a technique for “avoiding” distortion when amplifying the volume of a clean sample. Whenever clipping happens, the original frequencies are lost and destroyed across the whole spectrum, and unwanted harsh frequencies are created.
Note 2: In practice, modern recording devices won’t produce such intense distortion. Here is a real case from our user feedback. He used Rukai AI to recover distorted piano recordings.