97

I extract audio clips from a video file for speech recognition. These videos come from mobile/other handmade devices and hence contain a lot of noise. I want to reduce the background noise of the audio so that the speech that I relay to my speech recognition engine is clear. I am using FFmpeg to do all of this stuff, but am stuck at the noise reduction phase.

So far I have tried the following filters:

ffmpeg-20140324-git-63dbba6-win64-static\bin>ffmpeg -i input.wav -filter_complex "highpass=f=400,lowpass=f=1800" out2.wav

ffmpeg -i input.wav -af "equalizer=f=1000:width_type=h:width=900:g=-10" output.wav

ffmpeg -i input.wav -af "bandreject=f=1200:width_type=h:width=900:g=-10" output.wav

But the results are very disappointing. My reasoning was that since speech falls within the 300-3000 Hz range, I could filter out all other frequencies to suppress any background noise. What am I missing?

Also, I read about Wiener filters that could be used for speech enhancements and found this but am not sure how to use it.

Franck Dernoncourt
Sudh

8 Answers

73

If you are looking to isolate audible speech, try combining a high-pass filter with a low-pass filter. For usable audio, I have found that filtering out 200 Hz and below, then filtering out 3000 Hz and above, does a pretty good job of keeping the voice intact.

ffmpeg -i <input_file> -af "highpass=f=200, lowpass=f=3000" <output_file>

In this example, the high-pass filter is applied first to cut the lower frequencies, then the low-pass filter cuts the higher frequencies. If needed, you can run your file through this more than once to further attenuate loud content that remains within the cut frequency ranges.
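Instead of re-encoding the file several times, the same filters can be repeated inside one filter chain, since each extra stage steepens the roll-off. The following is only a minimal sketch: the helper name `repeat_filter`, the file names, and the choice of two stages per filter are assumptions made for illustration, not tuned values.

```shell
#!/bin/sh
# repeat_filter is a hypothetical helper: it repeats one filter N times,
# comma-separated, so the stages run in series inside a single ffmpeg call.
repeat_filter() {
    filter=$1
    count=$2
    chain=""
    i=0
    while [ "$i" -lt "$count" ]; do
        chain="${chain:+$chain,}$filter"
        i=$((i + 1))
    done
    printf '%s' "$chain"
}

# Two high-pass stages at 200 Hz followed by two low-pass stages at 3000 Hz.
af="$(repeat_filter 'highpass=f=200' 2),$(repeat_filter 'lowpass=f=3000' 2)"

# input.wav and output.wav are placeholder names.
# Run ffmpeg only if it is installed and the input exists; otherwise just
# print the command that would be run.
if command -v ffmpeg >/dev/null 2>&1 && [ -f input.wav ]; then
    ffmpeg -i input.wav -af "$af" output.wav
else
    echo "ffmpeg -i input.wav -af \"$af\" output.wav"
fi
```

With these placeholder values the filter chain passed to -af is highpass=f=200,highpass=f=200,lowpass=f=3000,lowpass=f=3000.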

Franck Dernoncourt
av8r
62

FFmpeg now has 3 native filters to deal with background noise:

  • afftdn: Denoises audio samples with FFT
  • anlmdn: Reduces broadband noise in audio samples using a Non-Local Means algorithm
  • arnndn: Reduces noise from speech using Recurrent Neural Networks. Examples for model files to load can be found here.

Also, for some time now, one can use ladspa (look for noise-suppressor) and/or lv2 (look for speech denoiser) filters with FFmpeg.
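As a rough illustration, each of the three native filters might be tried on the same clip like this. The file names, the model file name `model.rnnn`, and all option values are assumptions chosen for the sketch; `nr` and `nf` are afftdn's noise reduction and noise floor (both in dB), `s` is anlmdn's denoising strength, and `m` is arnndn's model file. Good values depend heavily on the material.

```shell
#!/bin/sh
# Placeholder names; model.rnnn stands for whichever model file you downloaded.
in="input.wav"
model="model.rnnn"

# One candidate filter string per denoiser (values are illustrative only).
af_fft="afftdn=nr=12:nf=-40"   # FFT denoiser: reduction in dB, noise floor in dB
af_nlm="anlmdn=s=5"            # non-local means: denoising strength
af_rnn="arnndn=m=$model"       # RNN denoiser: requires a model file

for af in "$af_fft" "$af_nlm" "$af_rnn"; do
    # Name each output after the filter, e.g. out_afftdn.wav
    out="out_${af%%=*}.wav"
    if command -v ffmpeg >/dev/null 2>&1 && [ -f "$in" ]; then
        ffmpeg -i "$in" -af "$af" "$out"
    else
        # Dry run: print the command instead of executing it.
        echo "ffmpeg -i $in -af \"$af\" $out"
    fi
done
```

Writing one output per filter makes it easy to compare the results by ear before settling on one.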

slhck
25

Update: FFmpeg recently added afftdn which uses the noise threshold per-FFT-bin method described below, with various options for adapting / figuring out appropriate threshold values on the fly.

anlmdn (non-local means) is a technique that works well for video; I haven't tried the audio filter.

Either of these should be much better than highpass / lowpass, unless your only noise is a 60Hz hum or something. (Human speech can still sound ok in a pretty narrow bandpass, but there are much better ways to clean up a broadband noise background hiss.)


Original answer, from before FFmpeg had these filters: FFmpeg doesn't have any decent audio filters for noise reduction built in. Audacity has a fairly effective NR filter, but it's designed for 2-pass operation: first with a sample of just the noise, and then with the input.

The comments at the top of https://github.com/audacity/audacity/blob/master/src/effects/NoiseReduction.cpp explain how it works. Basically, it suppresses every FFT bin that's below the threshold, so it only lets signals through when they're louder than the noise floor in that frequency band. It can do amazing things without causing problems. It's like a band-pass filter that adapts to the signal. Since the energy of the noise is spread over the whole spectrum, only letting through a few narrow bands of it will reduce the total noise energy a LOT.

See also Audio noise reduction: how does audacity compare to other options? for more details of how it works, and that thresholding FFT bins in one way or another is the basis of typical commercial noise-reduction filters, too.

Porting that filter to FFmpeg would be a bit awkward. Maybe implementing it as a filter with 2 inputs, instead of a 2-pass filter, would work best. Since it only needs a few seconds to get a noise profile, it's not like it has to read through the whole file. And you SHOULDN'T feed it the whole audio stream as a noise sample, anyway. It needs to see a sample of JUST noise to set thresholds for each FFT bin.

So yeah, a 2nd input, rather than 2pass, would make sense. But that makes it a lot less easy to use than most FFmpeg filters. You'd need a bunch of voodoo with stream split / time-range extract. And of course you need manual intervention, unless you have a noise sample in a separate file that will be appropriate for multiple input files. (one noise sample from the same mic / setup should be fine for all clips from that setup.)

Franck Dernoncourt
Peter Cordes
23

I had a video with very bad background noise. I managed to fix it this way. First, I did two passes with the following command:

ffmpeg -i input.mp4 -af "afftdn=nf=-25" file1.mp4

ffmpeg -i file1.mp4 -af "afftdn=nf=-25" file2.mp4

Then, to make the speech clearer, I used:

ffmpeg -i file2.mp4 -af "highpass=f=200, lowpass=f=3000" file3.mp4

Finally, I increased the volume with:

ffmpeg -i file3.mp4 -af "volume=4" finaloutput.mp4

This gave me fairly good audio. That said, sound quality is subjective, and what sounds good to me may not sound good to others.
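The four separate commands can also be collapsed into one filter chain, which avoids re-encoding the audio (and the video) at every step. This is a sketch under assumptions: the file names are placeholders, and `-c:v copy` is an addition not in the commands above, used here to pass the video stream through untouched. Because the intermediate lossy encodes are skipped, the result may differ slightly from the four-step version.

```shell
#!/bin/sh
# Placeholder file names.
in="input.mp4"
out="finaloutput.mp4"

# Same stages as the four separate commands, in the same order:
# two afftdn passes, the speech band-pass, then the volume boost.
af="afftdn=nf=-25,afftdn=nf=-25,highpass=f=200,lowpass=f=3000,volume=4"

# -c:v copy keeps the video stream as-is, so only the audio is re-encoded.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$in" ]; then
    ffmpeg -i "$in" -af "$af" -c:v copy "$out"
else
    # Dry run: print the command instead of executing it.
    echo "ffmpeg -i $in -af \"$af\" -c:v copy $out"
fi
```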

Franck Dernoncourt
Matteo M.
9

To complete user564335's answer:

This: -af arnndn=m=cb.rnnn is probably the best noise filter I have used with ffmpeg (AI based).

Like this:

ffmpeg -i <input_file> -af arnndn=m=cb.rnnn <output_file>

No need for frequency band filters. Trained models (.rnnn files) are available here (you need to download and use one of the files).

The cb (conjoined-burgers) model is the one I found most impressive and versatile. I also found this filter pretty efficient (doesn't seem to use more CPU than the loudnorm filter for instance).


Also, since ffmpeg 5.0, there is a new noise filter: afwtdn.

If I remember correctly, I tried it, but it wasn't as effective as the trained neural network above, IMHO.

Totor
6

Combining lowpass and highpass filters with afftdn is quite impressive. I've managed to clean white noise from old VHS video with this configuration:

-af "highpass=200,lowpass=3000,afftdn"

Alex
2

For my heavy broadband noise, I tried Audacity's Noise Reduction and Noise Gate.

Now I am trying ffmpeg's anlmdn.

First, understand what NLM is (https://youtu.be/Va4Rwoy1v88?t=105).

Second, check the filter API (ffmpeg -h filter=anlmdn) (http://ffmpeg.org/ffmpeg-filters.html#anlmdn):

Filter anlmdn
  Reduce broadband noise from stream using Non-Local Means.
    slice threading supported
    Inputs:
       #0: default (audio)
    Outputs:
       #0: default (audio)
anlmdn AVOptions:
  s                 <float>      ..F.A....T. set denoising strength (from 1e-05 to 10) (default 1e-05)
  p                 <duration>   ..F.A....T. set patch duration (default 0.002)
  r                 <duration>   ..F.A....T. set research duration (default 0.006)
  o                 <int>        ..F.A....T. set output mode (from 0 to 2) (default o)
     i               0            ..F.A....T. input
     o               1            ..F.A....T. output
     n               2            ..F.A....T. noise
  m                 <float>      ..F.A....T. set smooth factor (from 1 to 15) (default 11)

This filter has support for timeline through the 'enable' option.

(Optional) Confirm the API by breaking it, passing out-of-range values to see what the filter accepts:

ffmpeg -i "input.ac3" -af "anlmdn=-99999999999999999:-99999999999999999:............" test.ac3

After playing with the filter on this very broadband noise sample and testing different arguments:

Think of radius (r, the research duration) as the time span sampled for a given frequency tuple. Think of smooth (m) as the span over which the frequencies are grouped.

A high smooth value handles noise that oscillates in bands better. A high radius handles very brief noises, such as clicks, better.

Play command: ffmpeg -i "sample.ac3" -af anlmdn=s=7:p=0.002:r=0.002:m=15 test.ac3
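The output mode option `o` listed in the help text above can help while tuning: `o=o` (the default) writes the denoised audio, while `o=n` writes only what the filter removes, so you can hear whether speech is being thrown away along with the noise. A minimal sketch, reusing the settings from the play command; the output file names are hypothetical.

```shell
#!/bin/sh
# sample.ac3 is the input from the play command; the outputs are placeholders.
in="sample.ac3"
base="anlmdn=s=7:p=0.002:r=0.002:m=15"

# Render the denoised output (o) and the removed noise (n) side by side.
for mode in o n; do
    af="$base:o=$mode"
    out="test_$mode.ac3"
    if command -v ffmpeg >/dev/null 2>&1 && [ -f "$in" ]; then
        ffmpeg -i "$in" -af "$af" "$out"
    else
        # Dry run: print the command instead of executing it.
        echo "ffmpeg -i $in -af \"$af\" $out"
    fi
done
```

If the noise-only file contains clearly audible speech, the strength is probably set too high for that material.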

dolpsdw
2

This is an old question but it's worth noting that ffmpeg has moved on a bit since it was asked.
Specifically, in version 5, the filter dialoguenhance was added.

8.78 dialoguenhance Enhance dialogue in stereo audio.

This filter accepts stereo input and produce surround (3.0) channels output. The newly produced front center channel have enhanced speech dialogue originally available in both stereo channels. This filter outputs front left and front right channels same as available in stereo input.

The filter accepts the following options:

original Set the original center factor to keep in front center channel output. Allowed range is from 0 to 1. Default value is 1.

enhance Set the dialogue enhance factor to put in front center channel output. Allowed range is from 0 to 3. Default value is 1.

voice Set the voice detection factor. Allowed range is from 2 to 32. Default value is 2.

The options can be a bit brutal, so running with the defaults is usually best.

Using it in combination with other filters, such as lowpass, highpass and afftdn, can produce impressive results.

For example:

ffmpeg -i interview.wav -af "highpass=f=300,asendcmd=0.0 afftdn sn start,asendcmd=1.5 afftdn sn stop,afftdn=nf=-20,dialoguenhance,lowpass=f=3000"  -f matroska - | ffplay -i -

Results in a test using ffplay where:

  • First everything below 300Hz is thrown away;
  • the Denoise with FFT filter is used with a noise profile of the first 1.5 seconds of the audio and a noise floor of -20dB;
  • then we enhance the dialogue;
  • finally remove frequencies above 3000Hz

As an aside, the reason this is using ffmpeg to pass the output to ffplay is because I don't have a version 5 copy of ffplay, thus the dialoguenhance filter would be missing, if I tried to use it directly.
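Once the preview sounds right, the same chain can be written to a file instead of being piped to ffplay. The sketch below rebuilds the filter string from the test command with the noise-sampling window pulled out into variables. The output file name and the variable names are assumptions, and the window only makes sense if that stretch of the recording contains noise and no speech.

```shell
#!/bin/sh
# interview.wav comes from the test command; the output name is a placeholder.
in="interview.wav"
out="interview_clean.wav"
noise_start="0.0"   # assumed start of a noise-only stretch, in seconds
noise_stop="1.5"    # assumed end of that stretch, in seconds

# Same stages as the test command: high-pass, noise profile capture,
# FFT denoise, dialogue enhancement, low-pass.
af="highpass=f=300"
af="$af,asendcmd=$noise_start afftdn sn start"
af="$af,asendcmd=$noise_stop afftdn sn stop"
af="$af,afftdn=nf=-20,dialoguenhance,lowpass=f=3000"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$in" ]; then
    ffmpeg -i "$in" -af "$af" "$out"
else
    # Dry run: print the command instead of executing it.
    echo "ffmpeg -i $in -af \"$af\" $out"
fi
```

Note that dialoguenhance produces 3.0 output, so the resulting file has three channels rather than two.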