I have two files (one audio, one video, both mp4) that I want to combine into a single mp4 without re-encoding the streams. They have nearly the same duration (technically one is 0.05s longer than the other, but that's close enough for my application). The command I am using to combine them is:

ffmpeg -i audio.mp4 -i video.mp4 -c:v copy -c:a copy test.mp4

When I play test.mp4 in VLC, the audio is way out of sync. The video duration is correct, but the audio plays too early, and the drift gets worse as playback continues. However, if I play the same file in QuickTime, it's perfect!

What is going on here? Is there a way for me to ensure correct playback regardless of video player, without re-encoding the streams?

Here's the full output of the command:

% ffmpeg -i audio.mp4 -i video.mp4 -c:v copy -c:a copy test.mp4
ffmpeg version 5.0.1 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 13.1.6 (clang-1316.0.21.2.5)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.0.1_2 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 17.100 / 57. 17.100
  libavcodec     59. 18.100 / 59. 18.100
  libavformat    59. 16.100 / 59. 16.100
  libavdevice    59.  4.100 / 59.  4.100
  libavfilter     8. 24.100 /  8. 24.100
  libswscale      6.  4.100 /  6.  4.100
  libswresample   4.  3.100 /  4.  3.100
  libpostproc    56.  3.100 / 56.  3.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'audio.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41isomiso2
    creation_time   : 2022-07-24T22:23:15.000000Z
  Duration: 00:00:27.17, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0[0x1](und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2022-07-24T22:23:15.000000Z
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Input #1, mov,mp4,m4a,3gp,3g2,mj2, from 'video.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41isomiso2
    creation_time   : 2022-07-24T22:23:15.000000Z
  Duration: 00:00:27.12, start: 0.000000, bitrate: 7662 kb/s
  Stream #1:0[0x1](und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1280x720, 7653 kb/s, SAR 1:1 DAR 16:9, 103.75 fps, 120 tbr, 12k tbn (default)
    Metadata:
      creation_time   : 2022-07-24T22:23:15.000000Z
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
Output #0, mp4, to 'test.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: mp42mp41isomiso2
    encoder         : Lavf59.16.100
  Stream #0:0(und): Video: hevc (Main) (hvc1 / 0x31637668), yuv420p(tv), 1280x720 [SAR 1:1 DAR 16:9], q=2-31, 7653 kb/s, 103.75 fps, 120 tbr, 12k tbn (default)
    Metadata:
      creation_time   : 2022-07-24T22:23:15.000000Z
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2022-07-24T22:23:15.000000Z
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #1:0 -> #0:0 (copy)
  Stream #0:0 -> #0:1 (copy)
Press [q] to stop, [?] for help
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 29308, current: 29308; changing to 29309. This may result in incorrect timestamps in the output file.
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 111505, current: 111505; changing to 111506. This may result in incorrect timestamps in the output file.
[mov,mp4,m4a,3gp,3g2,mj2 @ 0x148704fe0] Invalid timestamps stream=0, pts=128503, dts=128504, size=6321
[mp4 @ 0x148708150] Invalid DTS: 128504 PTS: 128503 in output stream 0:0, replacing by guess
[mp4 @ 0x148708150] Non-monotonous DTS in output stream 0:0; previous: 128504, current: 128504; changing to 128505. This may result in incorrect timestamps in the output file.
frame= 2814 fps=0.0 q=-1.0 Lsize=   25817kB time=00:00:27.16 bitrate=7784.7kbits/s speed=1.5e+03x    
video:25340kB audio:424kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.201361%
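
To put the DTS warnings above in context, here is a quick sketch that converts the reported DTS values to seconds. It assumes the "12k tbn" shown for the video stream means 12000 ticks per second; the function name ticks_to_seconds is just something I made up for illustration.

```shell
# Convert DTS tick values from the warnings into seconds.
# Assumption: the video stream's "12k tbn" means 12000 ticks per second.
ticks_to_seconds() {
  awk -v tb=12000 '{ printf "%d ticks = %.3f s\n", $1, $1 / tb }'
}

# DTS values copied from the "Non-monotonous DTS" warnings in the log.
printf '%s\n' 29308 111505 128504 | ticks_to_seconds
```

If that assumption holds, each one-tick adjustment in the warnings is 1/12000 s (roughly 0.08 ms), which seems far too small to explain the drift I hear in VLC.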
Alex