Using Intel's QuickSync (on supported platforms):
This answer extends the answer above, with a few changes:
- For the vp9_qsv encoder wrapper, note that low-power mode is mandatory (for now). Omitting the private codec option -low_power 1 makes encoder initialization fail, whereupon the MFX runtime prints a log similar to:
[vp9_qsv @ 000001b156147b40] Selected ratecontrol mode is unsupported
[vp9_qsv @ 000001b156147b40] Low power mode is unsupported
[vp9_qsv @ 000001b156147b40] Current frame rate is unsupported
[vp9_qsv @ 000001b156147b40] Current picture structure is unsupported
[vp9_qsv @ 000001b156147b40] Current resolution is unsupported
[vp9_qsv @ 000001b156147b40] Current pixel format is unsupported
[vp9_qsv @ 000001b156147b40] some encoding parameters are not supported by the QSV runtime. Please double check the input parameters.
Error initializing output stream 0:0 -- Error while opening encoder for output stream #0:0 - maybe incorrect parameters such as bit_rate, rate, width or height
Conversion failed!
This is because the QSV MFX runtime must negotiate all requirements with the device driver (iHD, on Linux) before an MFX session can register successfully. To my knowledge, this wrapper will only work on Linux at the moment. This may change in the near future.
- All examples below show 1:N transcoding (i.e. one input used to provide multiple outputs). A complex filter chain is also in use, along with tee muxer slaves calling up the underlying segment muxers.
On Intel Icelake and above, you can use the vp9_qsv encoder wrapper with the following known limitations (for now), as tested on Linux:
(a). You must enable low_power mode because only the VDENC encode path is exposed by the iHD driver for now.
(b). Coding option1 and extra_data are not supported by MSDK.
(c). The IVF header will be inserted in MSDK by default, but it is not needed for FFmpeg, and remains disabled by default.
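Limitation (a) can be seen in isolation with a single-output encode. The following is only a minimal sketch: encode_vp9_qsv is a hypothetical helper name, and the device path, driver name and bitrate are taken from the examples in this answer, so adjust them for your system.

```shell
# Sketch: single-output VP9 encode through vp9_qsv with low-power mode enabled.
# encode_vp9_qsv is a hypothetical helper; /dev/dri/renderD128 and 2250k are
# assumptions copied from the multi-output examples in this answer.
encode_vp9_qsv() {
    src=$1
    dst=$2
    ffmpeg -nostdin -y \
        -init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD \
        -filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi \
        -i "$src" \
        -vf 'hwmap=derive_device=qsv,format=qsv' \
        -c:v vp9_qsv -low_power 1 -b:v 2250k -an \
        "$dst"
}

# Usage (needs an Icelake or newer iGPU and the iHD driver):
# encode_vp9_qsv input.ts output.webm
```

If this command fails with the log shown earlier, check that -low_power 1 is present before looking at anything else.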
See the examples below, taking a single input and producing multiple outputs via the tee muxer slaves calling up segment muxers:
- If you need to deinterlace, call up the vpp_qsv filter as shown:
ffmpeg -nostdin -y -fflags +genpts \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD \
-filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi \
-threads 4 -vsync 1 -async 1 \
-i 'http://server:port' \
-filter_complex "[0:v]hwmap=derive_device=qsv,format=qsv,vpp_qsv=deinterlace=2:async_depth=4,split=3[n0][n1][n2]; \
[n0]vpp_qsv=w=1152:h=648:async_depth=4[v0]; \
[n1]vpp_qsv=w=848:h=480:async_depth=4[v1]; \
[n2]vpp_qsv=w=640:h=360:async_depth=4[v2]" \
-b:v:0 2250k -maxrate:v:0 2250k -bufsize:v:0 360k -c:v:0 vp9_qsv -g:v:0 50 -r:v:0 25 -low_power:v:0 1 \
-b:v:1 1750k -maxrate:v:1 1750k -bufsize:v:1 280k -c:v:1 vp9_qsv -g:v:1 50 -r:v:1 25 -low_power:v:1 1 \
-b:v:2 1000k -maxrate:v:2 1000k -bufsize:v:2 160k -c:v:2 vp9_qsv -g:v:2 50 -r:v:2 25 -low_power:v:2 1 \
-c:a aac -b:a 128k -ar 48000 -ac 2 \
-flags -global_header -f tee -use_fifo 1 \
-map "[v0]" -map "[v1]" -map "[v2]" -map 0:a \
"[select=\'v:0,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path0/output%03d.mp4| \
[select=\'v:1,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path1/output%03d.mp4| \
[select=\'v:2,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path2/output%03d.mp4"
- Without deinterlacing:
ffmpeg -nostdin -y -fflags +genpts \
-init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD \
-filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi \
-threads 4 -vsync 1 -async 1 \
-i 'http://server:port' \
-filter_complex "[0:v]hwmap=derive_device=qsv,format=qsv,split=3[n0][n1][n2]; \
[n0]vpp_qsv=w=1152:h=648:async_depth=4[v0]; \
[n1]vpp_qsv=w=848:h=480:async_depth=4[v1]; \
[n2]vpp_qsv=w=640:h=360:async_depth=4[v2]" \
-b:v:0 2250k -maxrate:v:0 2250k -bufsize:v:0 2250k -c:v:0 vp9_qsv -g:v:0 50 -r:v:0 25 -low_power:v:0 1 \
-b:v:1 1750k -maxrate:v:1 1750k -bufsize:v:1 1750k -c:v:1 vp9_qsv -g:v:1 50 -r:v:1 25 -low_power:v:1 1 \
-b:v:2 1000k -maxrate:v:2 1000k -bufsize:v:2 1000k -c:v:2 vp9_qsv -g:v:2 50 -r:v:2 25 -low_power:v:2 1 \
-c:a aac -b:a 128k -ar 48000 -ac 2 \
-flags -global_header -f tee -use_fifo 1 \
-map "[v0]" -map "[v1]" -map "[v2]" -map 0:a \
"[select=\'v:0,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path0/output%03d.mp4| \
[select=\'v:1,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path1/output%03d.mp4| \
[select=\'v:2,a\':f=segment:segment_time=5:segment_format_options=movflags=+faststart]$output_path2/output%03d.mp4"
Note that we use the vpp_qsv filter with the async_depth option set to 4. This massively improves transcode performance over using scale_qsv and deinterlace_qsv. See this commit on FFmpeg's git.
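The per-output encoder options in the commands above repeat the same pattern for every rendition, which makes them easy to mistype. One way to generate them is sketched here; rendition_opts is a hypothetical helper, and it assumes the buffer size equals the bitrate (as in the second command) with the GOP and frame rate fixed at 50 and 25.

```shell
# rendition_opts INDEX KBPS: print the vp9_qsv options for output stream INDEX.
# Hypothetical helper; bufsize = bitrate, -g 50 and -r 25 are assumptions
# copied from the non-deinterlacing example in this answer.
rendition_opts() {
    idx=$1
    kbps=$2
    printf '%s\n' "-b:v:$idx ${kbps}k -maxrate:v:$idx ${kbps}k -bufsize:v:$idx ${kbps}k -c:v:$idx vp9_qsv -g:v:$idx 50 -r:v:$idx 25 -low_power:v:$idx 1"
}

# Usage: leave the substitutions unquoted so the shell splits them into words.
# ffmpeg ... $(rendition_opts 0 2250) $(rendition_opts 1 1750) $(rendition_opts 2 1000) ...
```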
Notes:
This will only work on Linux, running the current media-driver package for VAAPI H/W acceleration, which ffmpeg selects via -init_hw_device vaapi=va:/dev/dri/renderD128,driver=iHD -filter_hw_device va -hwaccel vaapi -hwaccel_output_format vaapi, bound to the DRI node /dev/dri/renderD128. That node is the default on single-GPU systems, but it may differ if more than one GPU is present. We use VAAPI for H/W acceleration as it's more resilient for decode acceleration. QuickSync decode is surprisingly fragile and will result in MFX errors on multiple input files.
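On a multi-GPU machine the render node is not necessarily renderD128, so it helps to list what is present before hard-coding a path. A small sketch, where list_render_nodes is a hypothetical helper and the optional directory argument exists only to make it easy to try out:

```shell
# list_render_nodes [DIR]: print the DRM render nodes under DIR, one per line.
# DIR defaults to /dev/dri; the helper name is hypothetical.
list_render_nodes() {
    dir=${1:-/dev/dri}
    for node in "$dir"/renderD*; do
        # When nothing matches, the glob stays unexpanded; skip it.
        [ -e "$node" ] && printf '%s\n' "$node"
    done
    return 0
}

# Usage:
# list_render_nodes
# then substitute the node that belongs to the Intel GPU into
# -init_hw_device vaapi=va:<node>,driver=iHD
```

Which node maps to the Intel GPU still has to be confirmed by hand; the helper only enumerates candidates.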
We also derive a QSV context via the hwmap filter (hwmap=derive_device=qsv), which is chained immediately to the format=qsv filter, specifying that we want QSV H/W frames to be fed to the adjacent vpp_qsv filter in the complex filter chain.
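The filter chain described here (hwmap, then format=qsv, then a split feeding one vpp_qsv scaler per output) can also be generated for any number of renditions. This is a sketch only: build_filtergraph is a hypothetical helper, and the [nN]/[vN] pad labels and async_depth=4 simply follow the commands above.

```shell
# build_filtergraph DEINT SIZE...: print a -filter_complex string.
# DEINT is 1 to deinterlace with vpp_qsv before the split, anything else to skip.
# Each SIZE is WIDTHxHEIGHT, e.g. 1152x648. Hypothetical helper name.
build_filtergraph() {
    deint=$1
    shift
    n=$#
    graph="[0:v]hwmap=derive_device=qsv,format=qsv"
    if [ "$deint" = 1 ]; then
        graph="${graph},vpp_qsv=deinterlace=2:async_depth=4"
    fi
    # split needs an explicit output count when there are more than two outputs.
    graph="${graph},split=${n}"
    i=0
    while [ "$i" -lt "$n" ]; do
        graph="${graph}[n${i}]"
        i=$((i + 1))
    done
    i=0
    for size in "$@"; do
        w=${size%x*}
        h=${size#*x}
        graph="${graph};[n${i}]vpp_qsv=w=${w}:h=${h}:async_depth=4[v${i}]"
        i=$((i + 1))
    done
    printf '%s\n' "$graph"
}

# Usage:
# ffmpeg ... -filter_complex "$(build_filtergraph 1 1152x648 848x480 640x360)" ...
```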
Warnings:
- With QuickSync, regardless of input formats and the QSV encoder wrapper in use, expect a slightly higher CPU overhead, especially on smaller processors such as the Intel Atom® x7-E3950 Processor. The same overhead is greatly mitigated on more capable desktop-grade processors and the Iris Pro capable CPUs.
- For H/W-based VP9 encoding, I'd still strongly recommend using the vp9_vaapi encoder wrapper instead, albeit with caveats (related to the use of B-frames and rate control modes). VAAPI tends to be more stable overall than QSV.
References:
- See the encoder options, including rate control methods supported:
ffmpeg -h encoder=vp9_qsv
- On the vpp_qsv filter usage, see:
ffmpeg -h filter=vpp_qsv
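Before reading the help output, it can be worth confirming that your FFmpeg build includes the wrapper at all. A sketch, assuming the usual layout of the ffmpeg -encoders listing, where the encoder name is the second whitespace-separated column; has_encoder is a hypothetical helper:

```shell
# has_encoder NAME: read an `ffmpeg -encoders` listing on stdin and succeed
# if NAME appears in the second column. Hypothetical helper name.
has_encoder() {
    awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage:
# ffmpeg -hide_banner -encoders 2>/dev/null | has_encoder vp9_qsv \
#     && echo "vp9_qsv is compiled in"
```

This only shows that the encoder was compiled in; whether the hardware and driver accept it is still decided when the encoder is opened.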
Warning:
Note that the SDK requires at least 2 threads to prevent deadlock, see this code block.
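If the thread count comes from a variable or from the machine's core count, a small guard keeps it from dropping below 2. This is only a sketch: qsv_threads is a hypothetical helper, and it also raises 0 (FFmpeg's automatic setting) to 2, which is an assumption of this example rather than a requirement stated by the SDK.

```shell
# qsv_threads [N]: print N, raised to 2 when it is lower (default 2).
# Hypothetical helper; treating 0 as "too low" is an assumption of this sketch.
qsv_threads() {
    n=${1:-2}
    if [ "$n" -lt 2 ]; then
        n=2
    fi
    printf '%s\n' "$n"
}

# Usage:
# ffmpeg ... -threads "$(qsv_threads "$(nproc)")" ...
```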