11

I am trying to encode a 10-bit H.265 video from a 8-bit H.264 source using ffmpeg with CUDA hardware acceleration.

Without hardware acceleration, a typical command would be ffmpeg -i input.mkv -pix_fmt yuv420p10le -c:v libx265 -crf 21 -x265-params profile=main10 out.mkv.

Using CUDA (on a Pascal 1050 Ti), I expect the corresponding command to be ffmpeg -i input.mkv -pix_fmt yuv420p10le -c:v hevc_nvenc -profile:v main10 -cq 21 out.mkv.

However, when I list the supported encoder settings using ffmpeg -h encoder=hevc_nvenc (output of command pasted below), even though it supports main10, no 10-bit pixel format is supported: Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda d3d11. Does the encoding work for my use case, or not?

ffmpeg version 4.3.1-2021-01-01-full_build-www.gyan.dev Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 10.2.0 (Rev5, Built by MSYS2 project)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-w32threads --disable-autodetect --enable-fontconfig --enable-iconv --enable-gnutls --enable-libxml2 --enable-gmp --enable-lzma --enable-libsnappy --enable-zlib --enable-libsrt --enable-libssh --enable-libzmq --enable-avisynth --enable-libbluray --enable-libcaca --enable-sdl2 --enable-libdav1d --enable-libzvbi --enable-librav1e --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxvid --enable-libaom --enable-libopenjpeg --enable-libvpx --enable-libass --enable-frei0r --enable-libfreetype --enable-libfribidi --enable-libvidstab --enable-libvmaf --enable-libzimg --enable-amf --enable-cuda-llvm --enable-cuvid --enable-ffnvcodec --enable-nvdec --enable-nvenc --enable-d3d11va --enable-dxva2 --enable-libmfx --enable-libcdio --enable-libgme --enable-libmodplug --enable-libopenmpt --enable-libopencore-amrwb --enable-libmp3lame --enable-libshine --enable-libtheora --enable-libtwolame --enable-libvo-amrwbenc --enable-libilbc --enable-libgsm --enable-libopencore-amrnb --enable-libopus --enable-libspeex --enable-libvorbis --enable-ladspa --enable-libbs2b --enable-libflite --enable-libmysofa --enable-librubberband --enable-libsoxr --enable-chromaprint
  libavutil      56. 51.100 / 56. 51.100
  libavcodec     58. 91.100 / 58. 91.100
  libavformat    58. 45.100 / 58. 45.100
  libavdevice    58. 10.100 / 58. 10.100
  libavfilter     7. 85.100 /  7. 85.100
  libswscale      5.  7.100 /  5.  7.100
  libswresample   3.  7.100 /  3.  7.100
  libpostproc    55.  7.100 / 55.  7.100
Encoder hevc_nvenc [NVIDIA NVENC hevc encoder]:
    General capabilities: delay hardware
    Threading capabilities: none
    Supported hardware devices: cuda cuda d3d11va d3d11va
    Supported pixel formats: yuv420p nv12 p010le yuv444p p016le yuv444p16le bgr0 rgb0 cuda d3d11
hevc_nvenc AVOptions:
  -preset            <int>        E..V...... Set the encoding preset (from 0 to 11) (default medium)
     default         0            E..V......
     slow            1            E..V...... hq 2 passes
     medium          2            E..V...... hq 1 pass
     fast            3            E..V...... hp 1 pass
     hp              4            E..V......
     hq              5            E..V......
     bd              6            E..V......
     ll              7            E..V...... low latency
     llhq            8            E..V...... low latency hq
     llhp            9            E..V...... low latency hp
     lossless        10           E..V...... lossless
     losslesshp      11           E..V...... lossless hp
  -profile           <int>        E..V...... Set the encoding profile (from 0 to 4) (default main)
     main            0            E..V......
     main10          1            E..V......
     rext            2            E..V......
  -level             <int>        E..V...... Set the encoding level restriction (from 0 to 186) (default auto)
     auto            0            E..V......
     1               30           E..V......
     1.0             30           E..V......
     2               60           E..V......
     2.0             60           E..V......
     2.1             63           E..V......
     3               90           E..V......
     3.0             90           E..V......
     3.1             93           E..V......
     4               120          E..V......
     4.0             120          E..V......
     4.1             123          E..V......
     5               150          E..V......
     5.0             150          E..V......
     5.1             153          E..V......
     5.2             156          E..V......
     6               180          E..V......
     6.0             180          E..V......
     6.1             183          E..V......
     6.2             186          E..V......
  -tier              <int>        E..V...... Set the encoding tier (from 0 to 1) (default main)
     main            0            E..V......
     high            1            E..V......
  -rc                <int>        E..V...... Override the preset rate-control (from -1 to INT_MAX) (default -1)
     constqp         0            E..V...... Constant QP mode
     vbr             1            E..V...... Variable bitrate mode
     cbr             2            E..V...... Constant bitrate mode
     vbr_minqp       8388612      E..V...... Variable bitrate mode with MinQP (deprecated)
     ll_2pass_quality 8388616      E..V...... Multi-pass optimized for image quality (deprecated)
     ll_2pass_size   8388624      E..V...... Multi-pass optimized for constant frame size (deprecated)
     vbr_2pass       8388640      E..V...... Multi-pass variable bitrate mode (deprecated)
     cbr_ld_hq       8            E..V...... Constant bitrate low delay high quality mode
     cbr_hq          16           E..V...... Constant bitrate high quality mode
     vbr_hq          32           E..V...... Variable bitrate high quality mode
  -rc-lookahead      <int>        E..V...... Number of frames to look ahead for rate-control (from 0 to INT_MAX) (default 0)
  -surfaces          <int>        E..V...... Number of concurrent surfaces (from 0 to 64) (default 0)
  -cbr               <boolean>    E..V...... Use cbr encoding mode (default false)
  -2pass             <boolean>    E..V...... Use 2pass encoding mode (default auto)
  -gpu               <int>        E..V...... Selects which NVENC capable GPU to use. First GPU is 0, second is 1, and so on. (from -2 to INT_MAX) (default any)
     any             -1           E..V...... Pick the first device available
     list            -2           E..V...... List the available devices
  -delay             <int>        E..V...... Delay frame output by the given amount of frames (from 0 to INT_MAX) (default INT_MAX)
  -no-scenecut       <boolean>    E..V...... When lookahead is enabled, set this to 1 to disable adaptive I-frame insertion at scene cuts (default false)
  -forced-idr        <boolean>    E..V...... If forcing keyframes, force them as IDR frames. (default false)
  -spatial_aq        <boolean>    E..V...... set to 1 to enable Spatial AQ (default false)
  -spatial-aq        <boolean>    E..V...... set to 1 to enable Spatial AQ (default false)
  -temporal_aq       <boolean>    E..V...... set to 1 to enable Temporal AQ (default false)
  -temporal-aq       <boolean>    E..V...... set to 1 to enable Temporal AQ (default false)
  -zerolatency       <boolean>    E..V...... Set 1 to indicate zero latency operation (no reordering delay) (default false)
  -nonref_p          <boolean>    E..V...... Set this to 1 to enable automatic insertion of non-reference P-frames (default false)
  -strict_gop        <boolean>    E..V...... Set 1 to minimize GOP-to-GOP rate fluctuations (default false)
  -aq-strength       <int>        E..V...... When Spatial AQ is enabled, this field is used to specify AQ strength. AQ strength scale is from 1 (low) - 15 (aggressive) (from 1 to 15) (default 8)
  -cq                <float>      E..V...... Set target quality level (0 to 51, 0 means automatic) for constant quality mode in VBR rate control (from 0 to 51) (default 0)
  -aud               <boolean>    E..V...... Use access unit delimiters (default false)
  -bluray-compat     <boolean>    E..V...... Bluray compatibility workarounds (default false)
  -init_qpP          <int>        E..V...... Initial QP value for P frame (from -1 to 51) (default -1)
  -init_qpB          <int>        E..V...... Initial QP value for B frame (from -1 to 51) (default -1)
  -init_qpI          <int>        E..V...... Initial QP value for I frame (from -1 to 51) (default -1)
  -qp                <int>        E..V...... Constant quantization parameter rate control method (from -1 to 51) (default -1)
  -weighted_pred     <int>        E..V...... Set 1 to enable weighted prediction (from 0 to 1) (default 0)
  -b_ref_mode        <int>        E..V...... Use B frames as references (from 0 to 2) (default disabled)
     disabled        0            E..V...... B frames will not be used for reference
     each            1            E..V...... Each B frame will be used for reference
     middle          2            E..V...... Only (number of B frames)/2 will be used for reference
  -dpb_size          <int>        E..V...... Specifies the DPB size used for encoding (0 means automatic) (from 0 to INT_MAX) (default 0)

ffmpeg log of a 2-minute encode of a UHD video:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'input.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf57.83.100
  Duration: 00:40:30.04, start: 0.000000, bitrate: 25161 kb/s
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 3840x2160 [SAR 1:1 DAR 16:9], 24993 kb/s, 23.98 fps, 23.98 tbr, 24k tbn, 47.95 tbc (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 164 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> hevc (hevc_nvenc))
  Stream #0:1 -> #0:1 (copy)
Press [q] to stop, [?] for help
Incompatible pixel format 'yuv420p10le' for codec 'hevc_nvenc', auto-selecting format 'p010le'
Output #0, matroska, to 'out.mkv':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
    Stream #0:0(und): Video: hevc (hevc_nvenc) (Main 10), p010le, 3840x2160 [SAR 1:1 DAR 16:9], q=-1--1, 23.98 fps, 1k tbn, 23.98 tbc (default)
    Metadata:
      handler_name    : VideoHandler
      encoder         : Lavc58.91.100 hevc_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/0 buffer size: 4000000 vbv_delay: N/A
    Stream #0:1(und): Audio: aac (LC) ([255][0][0][0] / 0x00FF), 48000 Hz, stereo, fltp, 164 kb/s (default)
    Metadata:
      handler_name    : SoundHandler

MediaInfo of the output.

General
Format                         : Matroska
Format version                 : Version 4 / Version 2
File size                      : 288 MiB
Duration                       : 2 min 0 s
Overall bit rate               : 20.2 Mb/s
Writing application            : Lavf58.45.100
Writing library                : Lavf58.45.100
ErrorDetectionType             : Per level 1

Video ID : 1 Format : HEVC Format/Info : High Efficiency Video Coding Format profile : Main 10@L5@Main Codec ID : V_MPEGH/ISO/HEVC Duration : 2 min 0 s Width : 3 840 pixels Height : 2 160 pixels Display aspect ratio : 16:9 Frame rate mode : Constant Frame rate : 23.976 (24000/1001) FPS Color space : YUV Chroma subsampling : 4:2:0 Bit depth : 10 bits Writing library : Lavc58.91.100 hevc_nvenc Default : Yes Forced : No DURATION : 00:02:00.037000000 HANDLER_NAME : VideoHandler

Audio ID : 2 Format : AAC Format/Info : Advanced Audio Codec Format profile : LC Codec ID : A_AAC Duration : 2 min 0 s Channel(s) : 2 channels Channel positions : Front: L R Sampling rate : 48.0 kHz Frame rate : 46.875 FPS (1024 SPF) Compression mode : Lossy Delay relative to video : -42 ms Default : Yes Forced : No DURATION : 00:02:00.000000000 HANDLER_NAME : SoundHandler

If I disable the fallback to p010le, the output would be Bit depth : 8 bits but Format profile : Main 10@L5@Main. What does it mean? The encoding process is 10-bit while the output is reduced to 8-bit?

Isabella
  • 273
  • 1
  • 3
  • 8

2 Answers2

9

...even though it supports main10, no 10-bit pixel format is supported:

Hardware HEVC encoder uses pixel formats p010le and p016le for 10-bit output where first one produces yuv 4:2:0 and the second one yuv 4:4:4.

If I disable the fallback to p010le, the output would be Bit depth : 8 bits but Format profile : Main 10@L5@Main. What does it mean?

The profile specifies minimum capabilities of a device to be able to play the video and vice-versa for encoder specifies maximum values that can be used to encode the video.

This means: if you specify that video is Main 10@L5@Main it can be played on any TV that supports 10bit format and is able to decode at least 25Mbps. However this does not tell encoder how to actually encode the video but rather tells it that the video can have maximum of 10bits per sample and the bitrate cannot exceed 25Mbps, which means if encoder creates 8bit video with 5Mbps, it still fulfills the given conditions and can mark the video as Main 10@L5@Main.

If you want to tell the encoder what color depth and bitrate it should use, you must specify it by other parameters (see below).


Here is a command I use to convert videos from AVC to HEVC 10-bit using Pascal encoder (GTX 10x0 cards):

ffmpeg -y -hide_banner -hwaccel nvdec -hwaccel_device 0 -vsync 0 -i "input.mp4" -c copy -c:v:0 hevc_nvenc -profile:v main10 -pix_fmt p010le -rc:v:0 vbr_hq -rc-lookahead 32 -cq 21 -qmin 1 -qmax 51 -b:v:0 10M -maxrate:v:0 20M -gpu 0 "output.mkv"

Similar command that can be used on newer Turing (GTX 20x0) and Ampere (RTX 30x0) encoder:

ffmpeg -y -hide_banner -vsync 0 -hwaccel cuda -hwaccel_output_format cuda  -hwaccel_device 0 -c:v:0 h264_cuvid -i "input.mp4" -vf "hwdownload,format=nv12" -c copy -c:v:0 hevc_nvenc -profile:v main10 -pix_fmt p010le -rc:v:0 vbr -tune hq -preset p5 -multipass 1 -bf 4 -b_ref_mode 1 -nonref_p 1 -rc-lookahead 75 -spatial-aq 1 -aq-strength 8 -temporal-aq 1 -cq 21 -qmin 1 -qmax 99 -b:v:0 10M -maxrate:v:0 20M -gpu 0 "output.mkv"

Params explanation:

  • -pix_fmt p010le converts 8bit input into 10bit; note that conversion is done by CPU so it makes the encoding slower but produces better quality video and in CRF also lower bitrate (smaller file). For CUDA decoder must be used with -vf "hwdownload,format=nv12" (or -vf "hwdownload,format=p010le" for 10 bit input video) to copy decoded frames from CUDA into CPU for conversion (NVDEC decoder sends frames into CPU automatically.) Specifying -profile main10 is required to allow 10bit encoding but does not accually affect how the encoder encodes the video - encoder itself does not change the bit depth of the input!
  • -rc:v:0 vbr_hq -cq 21 -qmin 1 -qmax 99 is needed to fully enable CRF mode. Increase qmin to lower bitrate peaks, lower qmax to prevent low-quality frames (recommended for encoding without AQ). On Turing and Ampere use -rc:v:0 vbr -tune hq instead of vbr_hqfor same result. BTW for HEVC is recommended quality -cq 28 (or -cq 30 with AQ enabled).
  • -b:v:0 10M -maxrate:v:0 20M specifies recommended and maximum bitrate supported by the target device. For main tier @L5 you can use max. 25M, for @L6 maximum is 60Mbps (for 30fps video). This is also needed for the hardware encoder to know how to calculate the QP value in CRF mode. I use 10M/20M for videos stored on NAS and played on TV over LAN.
  • present=slow enables 2-pass processing and other advanced optimizations; since hardware encoder is faster than software encoders, you can go with slow preset and still get a lot faster processing than from CPU on faster preset. On Ampere you must use -preset p5 -multipass 2 which equals to the slow preset (you can go up to p7 which equals to very slow but has almost no additional effect on file size in most cases; you can use -multipass 1 for 4-times faster first pass).
  • hwaccell enables hardware decoder and specify which device will decode the video (if you have SLI). Based on your CPU speed you can test which is best for you. NVDEC can decode any MPEG video but is slower; for faster CUDA you must specify if source is AVC, HEVC or AV1. For DivX, Xvid and non-MPEG input remove it completely to switch to software decoder using CPU.
  • -bf 4 -b_ref_mode 1 -nonref_p 1 enables improved B-frames processing on Turing and Ampere (note that it's not supported by h264_nvenc).
  • Alternatively you can use -bf 0 -weighted_pred 1 to use Weighted prediction instead of B-frames if your source has uneven lighting (flickering lights or lots of fade-ins/-outs) to get better quality and smaller file (however disabling B-frames increases file size for other sources with stable lighting).
  • -rc-lookahead 75 -spatial-aq 1 -aq-strength 8 -temporal-aq 1 enables Adaptive quantifier supported on Turing and Ampere. This improves video quality with same or lower bitrate in CRF mode. Change the rc-lookahead to either get faster speed or better quality. Increase aq-strength if you see artifacts in very dark colors.
  • using -gpu 0 you specify which device encode the video if you have SLI or on-board (Intel/AMD) card.
  • in addition with CUDA decoder you can add -resize WIDTHxHEIGHT and/or -crop TOPxBOTTOMxLEFTxRIGHT (before the -i parameter) to change the input using the hardware decoder. This is faster than using -vf scale and -vf crop which is done on CPU.
2

According to this reddit post by @Anton1699:

p010le is equivalent to yuv420p10le (it's 10-bit video with 4:2:0 subsampling = 15-bit per pixel).

I have yet to find a more authoritative source in documentation.

p010le is supported by nvenc. The output log also indicates that this works. As a result, I put this up as a tentative answer. Sample command:

ffmpeg -i input.mkv -pix_fmt p010le -c:v hevc_nvenc -profile:v main10 -cq 21 out.mkv

Isabella
  • 273
  • 1
  • 3
  • 8