This blog post contains instructions to encode a side-by-side stereo video into MV-HEVC using x265 and ffmpeg. It warns that the resulting file won't be recognized as spatial video on Apple Vision Pro because it lacks the appropriate metadata.
My question is: can the corresponding metadata be produced with ffmpeg 7.1, now that it supports MV-HEVC?
Investigation below
Following the instructions from the Medium post, I started from a side-by-side stereo half-equirectangular video [sample here]. From here on, I'll call this video input.mov.
ffmpeg -i input.mov -pix_fmt yuv420p -f rawvideo stereo_raw.yuv
x265 --input stereo_raw.yuv --num-views 2 --format 1 --input-res 4096x4096 \
--output mv_output.hevc --fps 59.94 --profile main \
--colorprim bt709 --transfer bt709 --colormatrix bt709 --bitrate 50000
ffmpeg -i mv_output.hevc -c copy -tag:v hvc1 output.mov
Interestingly, output.mov plays as expected (square fisheye image) in VLC, but it has a significant back-and-forth stutter in QuickTime. When loaded onto an Apple Vision Pro, it doesn't exhibit the expected 3D effect of spatial video; instead it shows the same stuttery playback found in QuickTime on desktop.
Using Mike Swanson's "Spatial" tool, I can add the VEXU metadata:
spatial metadata -i output.mov -o output_with_metadata.mov --args vexu.args
where vexu.args is
--set vexu:cameraBaseline=60.0
--set vexu:eyeViewsReversed=false
--set vexu:hasAdditionalViews=false
--set vexu:hasLeftEyeView=true
--set vexu:hasRightEyeView=true
--set vexu:heroEyeIndicator=left
--set vexu:horizontalDisparityAdjustment=0.0
--set vexu:horizontalFieldOfView=180.0
--set vexu:projectionKind=halfEquirectangular
Seemingly this metadata is necessary but not sufficient, as QuickTime/Vision Pro playback of output_with_metadata.mov is still wrong.
Digging deeper into the file metadata, here's the ffprobe JSON output of output_with_metadata.mov:
ffprobe -v quiet -print_format json -show_format -show_streams output_with_metadata.mov
{
"streams": [
{
"index": 0,
"codec_name": "hevc",
"codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
"profile": "Main",
"codec_type": "video",
"codec_tag_string": "hvc1",
"codec_tag": "0x31637668",
"width": 4096,
"height": 4096,
"coded_width": 4096,
"coded_height": 4096,
"closed_captions": 0,
"film_grain": 0,
"has_b_frames": 2,
"pix_fmt": "yuv420p",
"level": 180,
"color_range": "tv",
"color_space": "bt709",
"color_transfer": "bt709",
"color_primaries": "bt709",
"chroma_location": "left",
"refs": 1,
"view_ids_available": "0,1",
"view_pos_available": "",
"id": "0x1",
"r_frame_rate": "60000/1001",
"avg_frame_rate": "1000000/16683",
"time_base": "1/1200000",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 12316800,
"duration": "10.264000",
"bit_rate": "103235022",
"nb_frames": "620",
"extradata_size": 146,
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"handler_name": "VideoHandler",
"vendor_id": "FFMP"
},
"side_data_list": [
{
"side_data_type": "Stereo 3D",
"type": "unspecified",
"inverted": 0,
"view": "packed",
"primary_eye": "left",
"baseline": 60000,
"horizontal_disparity_adjustment": "0/10000",
"horizontal_field_of_view": "180000/1000"
},
{
"side_data_type": "Spherical Mapping",
"projection": "half equirectangular",
"yaw": 0,
"pitch": 0,
"roll": 0
}
]
}
],
"format": {
"filename": "output_with_metadata.mov",
"nb_streams": 1,
"nb_programs": 0,
"nb_stream_groups": 0,
"format_name": "mov,mp4,m4a,3gp,3g2,mj2",
"format_long_name": "QuickTime / MOV",
"start_time": "0.000000",
"duration": "10.264000",
"size": "133484177",
"bit_rate": "104040667",
"probe_score": 100,
"tags": {
"major_brand": "qt ",
"minor_version": "512",
"compatible_brands": "qt ",
"encoder": "Lavf61.7.100"
}
}
}
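The "Stereo 3D" side data above looks like the vexu.args values echoed back in scaled units. Here is a minimal Python sketch of that reading. It assumes, based on my understanding of FFmpeg's AVStereo3D documentation, that baseline is in micrometers and that the two rational fields are printed as "num/den" strings (field of view in degrees). The helper name decode_stereo3d is hypothetical.

```python
from fractions import Fraction


def decode_stereo3d(side_data):
    """Convert ffprobe's 'Stereo 3D' side data into friendlier units.

    Assumption: 'baseline' is in micrometers, and the rational fields
    are "num/den" strings as printed by ffprobe.
    """
    return {
        "baseline_mm": side_data["baseline"] / 1000,
        "disparity_adjustment": float(
            Fraction(side_data["horizontal_disparity_adjustment"])
        ),
        "horizontal_fov_deg": float(
            Fraction(side_data["horizontal_field_of_view"])
        ),
    }


# Values copied from the ffprobe output of output_with_metadata.mov.
print(decode_stereo3d({
    "baseline": 60000,
    "horizontal_disparity_adjustment": "0/10000",
    "horizontal_field_of_view": "180000/1000",
}))
```

Under those assumptions the decoded values match what was passed to the Spatial tool (60.0 mm baseline, 180 degree field of view), which suggests the vexu metadata itself was written as intended.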
It might help to contrast with the ffprobe output of a native spatial video captured on iPhone 15 Pro:
ffprobe -v quiet -print_format json -show_format -show_streams IMG_4931.MOV
{
"streams": [
{
"index": 0,
"codec_name": "hevc",
"codec_long_name": "H.265 / HEVC (High Efficiency Video Coding)",
"profile": "Main",
"codec_type": "video",
"codec_tag_string": "hvc1",
"codec_tag": "0x31637668",
"width": 1920,
"height": 1080,
"coded_width": 1920,
"coded_height": 1088,
"closed_captions": 0,
"film_grain": 0,
"has_b_frames": 2,
"pix_fmt": "yuv420p",
"level": 123,
"color_range": "tv",
"color_space": "bt709",
"color_transfer": "bt709",
"color_primaries": "bt709",
"chroma_location": "left",
"refs": 1,
"view_ids_available": "0,1",
"view_pos_available": "2,1",
"id": "0x1",
"r_frame_rate": "30/1",
"avg_frame_rate": "30/1",
"time_base": "1/600",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 1220,
"duration": "2.033333",
"bit_rate": "18337105",
"nb_frames": "61",
"extradata_size": 185,
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 1
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Video",
"vendor_id": "[0][0][0][0]",
"encoder": "HEVC"
},
"side_data_list": [
{
"side_data_type": "Stereo 3D",
"type": "unspecified",
"inverted": 0,
"view": "packed",
"primary_eye": "none",
"baseline": 19276,
"horizontal_disparity_adjustment": "200/10000",
"horizontal_field_of_view": "63400/1000"
},
{
"side_data_type": "Spherical Mapping",
"projection": "rectilinear",
"yaw": 0,
"pitch": 0,
"roll": 0
}
]
},
{
"index": 1,
"codec_name": "aac",
"codec_long_name": "AAC (Advanced Audio Coding)",
"profile": "LC",
"codec_type": "audio",
"codec_tag_string": "mp4a",
"codec_tag": "0x6134706d",
"sample_fmt": "fltp",
"sample_rate": "44100",
"channels": 2,
"channel_layout": "stereo",
"bits_per_sample": 0,
"initial_padding": 0,
"id": "0x2",
"r_frame_rate": "0/0",
"avg_frame_rate": "0/0",
"time_base": "1/44100",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 89670,
"duration": "2.033333",
"bit_rate": "168131",
"nb_frames": "90",
"extradata_size": 2,
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Audio",
"vendor_id": "[0][0][0][0]"
}
},
{
"index": 2,
"codec_type": "data",
"codec_tag_string": "mebx",
"codec_tag": "0x7862656d",
"id": "0x3",
"r_frame_rate": "0/0",
"avg_frame_rate": "0/0",
"time_base": "1/600",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 1220,
"duration": "2.033333",
"bit_rate": "39",
"nb_frames": "1",
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Metadata"
}
},
{
"index": 3,
"codec_type": "data",
"codec_tag_string": "mebx",
"codec_tag": "0x7862656d",
"id": "0x4",
"r_frame_rate": "0/0",
"avg_frame_rate": "0/0",
"time_base": "1/600",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 1220,
"duration": "2.033333",
"bit_rate": "31",
"nb_frames": "1",
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Metadata"
}
},
{
"index": 4,
"codec_type": "data",
"codec_tag_string": "mebx",
"codec_tag": "0x7862656d",
"id": "0x5",
"r_frame_rate": "0/0",
"avg_frame_rate": "0/0",
"time_base": "1/600",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 1220,
"duration": "2.033333",
"bit_rate": "34560",
"nb_frames": "61",
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Metadata"
}
},
{
"index": 5,
"codec_type": "data",
"codec_tag_string": "mebx",
"codec_tag": "0x7862656d",
"id": "0x6",
"r_frame_rate": "0/0",
"avg_frame_rate": "0/0",
"time_base": "1/600",
"start_pts": 0,
"start_time": "0.000000",
"duration_ts": 1220,
"duration": "2.033333",
"bit_rate": "173",
"nb_frames": "1",
"disposition": {
"default": 1,
"dub": 0,
"original": 0,
"comment": 0,
"lyrics": 0,
"karaoke": 0,
"forced": 0,
"hearing_impaired": 0,
"visual_impaired": 0,
"clean_effects": 0,
"attached_pic": 0,
"timed_thumbnails": 0,
"non_diegetic": 0,
"captions": 0,
"descriptions": 0,
"metadata": 0,
"dependent": 0,
"still_image": 0,
"multilayer": 0
},
"tags": {
"creation_time": "2024-10-08T22:48:51.000000Z",
"language": "und",
"handler_name": "Core Media Metadata"
}
}
],
"format": {
"filename": "IMG_4931.MOV",
"nb_streams": 6,
"nb_programs": 0,
"nb_stream_groups": 0,
"format_name": "mov,mp4,m4a,3gp,3g2,mj2",
"format_long_name": "QuickTime / MOV",
"start_time": "0.000000",
"duration": "2.033333",
"size": "4723222",
"bit_rate": "18583171",
"probe_score": 100,
"tags": {
"major_brand": "qt ",
"minor_version": "0",
"compatible_brands": "qt ",
"creation_time": "2024-10-08T22:48:51.000000Z",
"com.apple.quicktime.full-frame-rate-playback-intent": "0",
"com.apple.quicktime.spatial.format-version": "1.0",
"com.apple.quicktime.spatial.aggressors-seen": "0",
"com.apple.quicktime.make": "Apple",
"com.apple.quicktime.model": "iPhone 15 Pro",
"com.apple.quicktime.software": "18.0",
"com.apple.quicktime.creationdate": "2024-10-08T15:48:50-0700"
}
}
}
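Rather than comparing the two dumps by eye, the fields of interest can be diffed with a small script. This is only a sketch: the list of fields is my own selection, the helper names are hypothetical, and the two dictionaries are trimmed copies of the ffprobe outputs above.

```python
def get_path(obj, path):
    """Follow a sequence of keys/indices; return None if a step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return None
    return obj


# Where each field of interest lives inside the ffprobe JSON document.
FIELDS = {
    "view_ids_available": ("streams", 0, "view_ids_available"),
    "view_pos_available": ("streams", 0, "view_pos_available"),
    "multilayer": ("streams", 0, "disposition", "multilayer"),
    "spatial.format-version": (
        "format", "tags", "com.apple.quicktime.spatial.format-version",
    ),
}


def diff_probes(a, b):
    """Return {name: (value_in_a, value_in_b)} for the fields that differ."""
    out = {}
    for name, path in FIELDS.items():
        va, vb = get_path(a, path), get_path(b, path)
        if va != vb:
            out[name] = (va, vb)
    return out


# Trimmed copies of the two ffprobe outputs shown above.
x265_probe = {
    "streams": [{
        "view_ids_available": "0,1",
        "view_pos_available": "",
        "disposition": {"multilayer": 0},
    }],
    "format": {"tags": {"encoder": "Lavf61.7.100"}},
}
iphone_probe = {
    "streams": [{
        "view_ids_available": "0,1",
        "view_pos_available": "2,1",
        "disposition": {"multilayer": 1},
    }],
    "format": {"tags": {"com.apple.quicktime.spatial.format-version": "1.0"}},
}

for name, (ours, theirs) in diff_probes(x265_probe, iphone_probe).items():
    print(f"{name}: {ours!r} (x265/ffmpeg) vs {theirs!r} (iPhone)")
```

With real files, the ffprobe output can be redirected to a .json file and loaded with json.load instead of using the inline dictionaries.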
A few observations comparing the two outputs:
1. In stream[0]: "view_pos_available": "" in the ffmpeg/x265 video vs. "view_pos_available": "2,1" in the iPhone one (seems important, see description here).
2. In stream[0], under "disposition": "multilayer": 0 in the ffmpeg/x265 video vs. "multilayer": 1 in the iPhone one (seems important).
3. In "format", under "tags": "com.apple.quicktime.spatial.format-version": "1.0" and "com.apple.quicktime.spatial.aggressors-seen": "0" appear only in the iPhone video (doesn't seem important; I've seen working MV-HEVC .mov files that didn't have those).
I tried to set the flags from items 1 and 2 with the following command:
ffmpeg -i output_with_metadata.mov -c copy \
-disposition:v:0 +multilayer \
-metadata:s:v:0 view_pos_available="1,2" \
output_with_even_more_metadata.mov
A file is produced, but the values are not actually overwritten (ffprobe shows them unchanged), and QuickTime/Vision Pro still don't play it correctly. Additionally, the following warning appears:
Not writing 'lhvC' atom for multilayer stream.
Is this a clue for what's going on? Is the lhvC atom/box that's supposed to contain information about the second eye somehow missing in the MV-HEVC encoding?
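One way to test that hypothesis is to check which configuration boxes actually exist in each file. The Python sketch below walks the top-level boxes, then does a plain byte search for the fourccs inside moov. It is a heuristic rather than a full parser, so a hit is only a hint. I'm assuming, from my reading of Apple's stereo video format description, that a working file carries both hvcC and lhvC in the hvc1 sample entry. The function names are hypothetical.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (fourcc, payload_start, box_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, fourcc = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:  # a 64-bit size follows the fourcc
            if pos + 16 > end:
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing container
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed or truncated: stop rather than guess
        yield fourcc, pos + header, pos + size
        pos += size


def find_config_boxes(data, names=(b"hvcC", b"lhvC", b"vexu")):
    """Report which fourccs occur anywhere inside the top-level 'moov' box."""
    found = {name: False for name in names}
    for fourcc, payload_start, box_end in iter_boxes(data):
        if fourcc == b"moov":
            moov = data[payload_start:box_end]
            for name in names:
                found[name] = found[name] or name in moov
    return found


def box(fourcc, payload=b""):
    """Build a minimal box, used only for the synthetic demo below."""
    return struct.pack(">I4s", 8 + len(payload), fourcc) + payload


# Synthetic file: 'lhvC' appears only inside mdat, so it must not count.
sample = (
    box(b"ftyp", b"qt  ")
    + box(b"mdat", b"lhvC")
    + box(b"moov", box(b"hvcC"))
)
print(find_config_boxes(sample))
```

For a real file, pass the result of open("output_with_metadata.mov", "rb").read() (this loads the whole file into memory) and compare the result against IMG_4931.MOV.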
I'm not entirely sure whether my ffmpeg parameters are wrong, whether some support is missing in ffmpeg's muxing of MV-HEVC videos, or whether something in x265 is causing the issue.