I have two MP4 files, one containing only audio and one containing only video, that I want to combine into a single MP4 without re-encoding either stream. They have nearly the same duration (the audio is about 0.05 s longer than the video, which is close enough for my application). The command I am using to combine them is:
ffmpeg -i audio.mp4 -i video.mp4 -c:v copy -c:a copy test.mp4
When I play test.mp4 in VLC, the audio is badly out of sync: the video duration is correct, but the sound plays too early, and the drift grows as playback continues. However, when I play the same file in QuickTime, it is perfectly in sync.
What is going on here? Is there a way for me to ensure correct playback regardless of video player, without re-encoding the streams?
Here's the full output of the command:
% ffmpeg -i audio.mp4 -i video.mp4 -c:v copy -c:a copy test.mp4
ffmpeg version 5.0.1 Copyright (c) 2000-2022 the FFmpeg developers
built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.0.1_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
libavutil 57. 17.100 / 57. 17.100
libavcodec 59. 18.100 / 59. 18.100
libavformat 59. 16.100 / 59. 16.100
libavdevice 59. 4.100 / 59. 4.100
libavfilter 8. 24.100 / 8. 24.100
libswscale 6. 4.100 / 6. 4.100
libswresample 4. 3.100 / 4. 3.100
libpostproc 56. 3.100 / 56. 3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'audio.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41isomiso2
creation_time : 2022-07-24T22:23:15.000000Z
Duration: 00:00:27.17, start: 0.000000, bitrate: 129 kb/s
Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
creation_time : 2022-07-24T22:23:15.000000Z
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41isomiso2
creation_time : 2022-07-24T22:23:15.000000Z
Duration: 00:00:27.12, start: 0.000000, bitrate: 7662 kb/s
Stream #1:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1280x720, 7653 kb/s, SAR 1:1 DAR 16:9, 103.75 fps, 120 tbr, 12k tbn (default)
Metadata:
creation_time : 2022-07-24T22:23:15.000000Z
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Output #0, mp4, to 'test.mp4':
Metadata:
major_brand : mp42
minor_version : 0
compatible_brands: mp42mp41isomiso2
encoder : Lavf59.16.100
Stream #0:0(und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 7653 kb/s, 103.75 fps, 120 tbr, 12k tbn (default)
Metadata:
creation_time : 2022-07-24T22:23:15.000000Z
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
Metadata:
creation_time : 2022-07-24T22:23:15.000000Z
handler_name : SoundHandler
vendor_id : [0][0][0][0]
Stream mapping:
Stream #1:0 -> #0:0 (copy)
Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 29308, current: 29308; changing to 29309. This may result in incorrect timestamps in the output file.
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 111505, current: 111505; changing to 111506. This may result in incorrect timestamps in the output file.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x148704fe0] Invalid timestamps stream=0, pts=128503, dts=128504, size=6321
[mp4 @ 0x148708150] Invalid DTS: 128504 PTS: 128503 in output stream 0:0, replacing by guess
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 128504, current: 128504; changing to 128505. This may result in incorrect timestamps in the output file.
frame= 2814 fps=0.0 q=-1.0 Lsize= 25817kB time=00:00:27.16 bitrate=7784.7kbits/s speed=1.5e+03x
video:25340kB audio:424kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.201361%
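As a rough sanity check on the figures in the log above, here is a small sketch that compares the average frame rate implied by the frame count with the 120 tbr reported for the video stream. The variable names (frames, duration, tbr, avg_fps, cfr_seconds) are just my own labels, and the arithmetic is mine rather than anything ffmpeg prints. A gap between the two rates may suggest the video has a variable frame rate, but that is an assumption, not something the log confirms.

```shell
# Figures copied by hand from the ffmpeg log above.
frames=2814      # "frame= 2814" in the final progress line
duration=27.12   # Duration reported for video.mp4, in seconds
tbr=120          # "120 tbr" reported for the video stream

# Average frame rate implied by the frame count and the duration.
avg_fps=$(awk -v f="$frames" -v d="$duration" 'BEGIN { printf "%.2f", f / d }')

# How long the same frames would last if shown at a constant rate of tbr.
cfr_seconds=$(awk -v f="$frames" -v r="$tbr" 'BEGIN { printf "%.2f", f / r }')

echo "average fps: $avg_fps"
echo "duration at a constant $tbr fps: $cfr_seconds s"
```

The average comes out close to the 103.75 fps that ffmpeg reports, which is well below 120. If a player assumed a constant 120 fps, the video would be several seconds shorter than the audio, so the frames are presumably not evenly spaced in time.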