How does FFmpeg control bitdepth and bitrate?

Question

When converting audio with FFmpeg, there are the -ar and -ac options, which control the sampling rate and the number of channels, respectively. There is also the -ab option, which controls the bitrate, but there is no option to control the bit depth.

Since…

[bitrate] = [number of channels] * [sampling rate] * [bitdepth]
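That formula gives the bit rate of uncompressed PCM audio. As a minimal sketch (the function name `pcm_bitrate` is just for illustration), here it is for CD-quality stereo:

```python
def pcm_bitrate(channels: int, sample_rate: int, bit_depth: int) -> int:
    """Bit rate in bit/s of uncompressed PCM: channels * sample rate * bit depth."""
    return channels * sample_rate * bit_depth

# CD audio: 2 channels, 44.1 kHz, 16 bit
print(pcm_bitrate(2, 44_100, 16))  # 1411200 bit/s, i.e. about 1411 kbit/s
```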

…does this mean ffmpeg calculates bitdepth from the other 3 parameters?

Another post says the bit depth is controlled by the format option. This confused me, because the above equation might no longer hold if the user sets all four parameters.

Could anyone please explain how bitdepth and bitrate work in FFmpeg?

flolilo · Accepted Answer · 2017-08-30T01:24:49.680

As for setting the bit depth: it depends on your source files and on your output format.

For example, take pcm_s16le and pcm_s24le: both produce PCM audio, but with a bit depth of 16 bits and 24 bits, respectively. (You can look up which sample formats an encoder supports with ffmpeg -h encoder=<YOUR_ENCODER>.)
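For the PCM encoders the bit depth is part of the encoder name, so the bit rate follows directly from the question's formula. A small sketch, assuming the common `pcm_<s|u|f><bits>[le|be][_planar]` naming pattern (the helper `pcm_bit_depth` is hypothetical, and names outside that pattern such as pcm_alaw are simply rejected):

```python
import re

def pcm_bit_depth(encoder: str) -> int:
    """Read the bit depth out of PCM encoder names like pcm_s16le or pcm_f32be."""
    # Assumed pattern: sign/float letter, bit count, optional endianness, optional _planar
    match = re.fullmatch(r"pcm_[suf](\d+)(?:le|be)?(?:_planar)?", encoder)
    if match is None:
        raise ValueError(f"not a recognised PCM encoder name: {encoder}")
    return int(match.group(1))

# Stereo at 48 kHz: the bit rate is fixed once the encoder (and thus the depth) is chosen
for name in ("pcm_s16le", "pcm_s24le"):
    depth = pcm_bit_depth(name)
    print(name, depth, 2 * 48_000 * depth)
```

With these inputs it prints 1536000 bit/s for pcm_s16le and 2304000 bit/s for pcm_s24le.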

If an encoder accepts several bit depths (such as libmp3lame), FFmpeg by default chooses one based on the source file: if you use -c:a libmp3lame and your input file has a bit depth of 16 bits, FFmpeg will use 16 bits. If you have a 32-bit file and want to encode it with a codec that can only hold 16 bits, FFmpeg will reduce the bit depth for you.

But you can also specify it yourself using -sample_fmt.
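If you drive FFmpeg from a script, the override is just an extra argument pair. Here is a minimal sketch that only builds the argument list; the helper name and file names are hypothetical, while -i, -c:a, -sample_fmt, -ar and -ac are the real FFmpeg options discussed here. The example asks the FLAC encoder for 16-bit output (s16):

```python
def build_ffmpeg_cmd(src, dst, codec, sample_fmt=None, sample_rate=None, channels=None):
    """Build an ffmpeg argument list; output options go before the output file."""
    cmd = ["ffmpeg", "-i", src, "-c:a", codec]
    if sample_fmt is not None:
        cmd += ["-sample_fmt", sample_fmt]  # bit depth / sample format
    if sample_rate is not None:
        cmd += ["-ar", str(sample_rate)]    # sampling rate
    if channels is not None:
        cmd += ["-ac", str(channels)]       # number of channels
    cmd.append(dst)
    return cmd

cmd = build_ffmpeg_cmd("input.wav", "output.flac", "flac", sample_fmt="s16")
print(" ".join(cmd))  # ffmpeg -i input.wav -c:a flac -sample_fmt s16 output.flac
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Which values -sample_fmt accepts depends on the encoder; ffmpeg -h encoder=<YOUR_ENCODER> lists them.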

As for your bit rate formula: FFmpeg, like virtually all other tools, treats the number of audio channels, the bit depth, and the sample rate as fixed values (meaning they don't change within a file), while the bit rate is the variable that correlates with the quality of the encoding. That, however, only applies to lossy codecs; that's why re-encoding a WAV file 200 times won't make a difference, while re-encoding the same file 20 times with a fairly decent MP3 encoder could already lead to unbearable results.
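To see why the formula cannot be solved for the bit depth in the lossy case, compare a 16-bit stereo 44.1 kHz source with two example MP3 targets. The "bits per sample" figure below is only a rough average for comparison (an assumption for illustration); an MP3 encoder does not store samples at a fixed depth at all:

```python
def pcm_bitrate(channels, sample_rate, bit_depth):
    """Uncompressed PCM bit rate in bit/s."""
    return channels * sample_rate * bit_depth

def bits_per_sample_budget(target_bitrate, channels, sample_rate):
    """Average number of bits a lossy encoder may spend per sample per channel."""
    return target_bitrate / (channels * sample_rate)

source = pcm_bitrate(2, 44_100, 16)  # 1,411,200 bit/s of 16-bit stereo audio
for target in (320_000, 64_000):
    ratio = source / target
    budget = bits_per_sample_budget(target, 2, 44_100)
    print(f"{target // 1000}k: {ratio:.2f}x smaller than PCM, {budget:.2f} bits per sample")
# 320k: 4.41x smaller than PCM, 3.63 bits per sample
# 64k: 22.05x smaller than PCM, 0.73 bits per sample
```

The input still has 16 bits per sample; the target bit rate only says how much of that information the encoder may keep.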

Lossy codecs add quality to your equation: e.g. in MP3, a value of 320k achieves very good quality, as the encoder does not have to drop much information to stay within the specified bit rate. If you used -b:a 64k, the encoder would have to drop a lot of information to reach that bit rate. Encoders drop as much as necessary to achieve the bit rate. Some very simplified, and therefore purely theoretical, examples of information that could be dropped:

If combining all frequencies from 8 to 16 kHz into one semi-complex sine wave would let the encoder achieve the bit rate, it would do so.
If deleting every signal below -32 dB would do the trick, it would do that.
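To put "-32 dB" into numbers: as an amplitude ratio it is 10^(-32/20), roughly 0.025 of full scale. The sketch below is a deliberately crude toy in the same spirit as the examples above, assuming float samples in the range -1.0 to 1.0; it zeroes individual samples below the threshold, which is NOT what a real MP3 encoder does (those decide what to discard per frequency band using a psychoacoustic model). The function names are hypothetical:

```python
def db_to_amplitude(db: float) -> float:
    """Convert a level in dB relative to full scale into a linear amplitude ratio."""
    return 10 ** (db / 20)

def toy_gate(samples, threshold_db=-32.0):
    """Toy illustration only: zero every sample quieter than the threshold."""
    threshold = db_to_amplitude(threshold_db)
    return [s if abs(s) >= threshold else 0.0 for s in samples]

print(round(db_to_amplitude(-32), 4))      # 0.0251
print(toy_gate([0.5, 0.01, -0.02, -0.3]))  # [0.5, 0.0, 0.0, -0.3]
```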

For a real-world explanation of what information gets dropped, I suggest you start by reading Wikipedia's article on audio data compression.
