
I'm learning how to split and combine videos with ffmpeg. Presently I'm practicing on a 799 KB MP4, one_through_ten-timestamped.mp4, where each frame has a burnt-in "frame number" (up to frame 40, the burnt-in number advances only every other frame).

The command I use is:

ffmpeg -f concat -i one_through_ten_edit_list.txt -c copy one_through_ten_edited.mp4

Where one_through_ten_edit_list.txt is:

file one_through_ten-timestamped.mp4
inpoint 00:00:01.500
outpoint 00:00:05.000
file one_through_ten-timestamped.mp4
inpoint 00:00:09.500
outpoint 00:00:13.000
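
For reference, an edit list in this layout can also be generated rather than typed by hand. This is only a sketch: the function name make_edit_list is hypothetical, and it assumes a source file name without spaces or quotes, since those would need escaping in a concat list.

```shell
#!/bin/sh
# Hypothetical helper: prints concat-demuxer entries for one source file.
# Usage: make_edit_list SOURCE IN1 OUT1 [IN2 OUT2 ...]
make_edit_list() {
    src=$1
    shift
    # Consume the remaining arguments as inpoint/outpoint pairs.
    while [ "$#" -ge 2 ]; do
        printf 'file %s\ninpoint %s\noutpoint %s\n' "$src" "$1" "$2"
        shift 2
    done
}

# Recreate the two-segment list used in this question.
make_edit_list one_through_ten-timestamped.mp4 \
    00:00:01.500 00:00:05.000 \
    00:00:09.500 00:00:13.000 > one_through_ten_edit_list.txt
```

The resulting file can then be passed to the ffmpeg command shown earlier.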

When I run this command, the output video has two issues:

  1. The output starts from the beginning of the source video instead of 1.5 s in.
  2. After the second segment is concatenated, the audio is delayed when played in QuickTime Player (it's fine in ffplay).

When I perform these same edits in MPEG Streamclip, the resulting video plays as expected. Here's some tabular output from ffprobe on one_through_ten-timestamped.mp4 which shows where the ffmpeg video and MPEG Streamclip differ in their concatenation points:

[Images omitted: ffprobe tables for the start of the first segment, the end of the first segment, the MPEG Streamclip start of the first segment, the start of the second segment, and the end of the second segment.]

Notice that ffmpeg and MPEG Streamclip handle the second segment roughly the same way, but ffmpeg doesn't seem to respect the inpoint in the first input video. Note that in my MP4, there's one packet per frame.
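
A per-packet listing of this kind can be produced with ffprobe. The sketch below is one way to check where the key frames fall relative to an in point; it is an assumption about what might be worth inspecting, not a confirmed diagnosis. With -c copy, ffmpeg cannot begin a segment partway through a group of pictures, and the concat demuxer documentation notes that for non-intra codecs extra packets before the in point usually end up in the output. The helper name last_keyframe_before is hypothetical, and the key-frame flag is assumed to start with K in ffprobe's CSV output.

```shell
#!/bin/sh
# Hypothetical helper: reads "pts_time,flags" lines on stdin and prints the
# latest key-frame timestamp that is not after the in point given as $1 (seconds).
last_keyframe_before() {
    awk -F, -v t="$1" '
        $2 ~ /^K/ && $1 + 0 <= t + 0 && (k == "" || $1 + 0 > k + 0) { k = $1 }
        END { if (k != "") print k }
    '
}

src=one_through_ten-timestamped.mp4

# Only run when ffprobe and the source file are actually available.
if command -v ffprobe >/dev/null 2>&1 && [ -f "$src" ]; then
    # One line per video packet: presentation time, then flags.
    ffprobe -v error -select_streams v:0 \
        -show_entries packet=pts_time,flags -of csv=p=0 "$src" |
        last_keyframe_before 1.5
fi
```

If the printed timestamp is well before 1.5, a stream copy of the first segment would be expected to include material from before the in point.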

ffmpeg has behaved this way from version 2.8 (perhaps before) up through 3.3.3. Am I missing a command-line or edit-list option? Surely others have been successfully using ffmpeg to split and combine videos (without recompressing).

watkipet

1 Answer


You can try adding -async 1:

ffmpeg -f concat -async 1 -safe 0 -i one_through_ten_edit_list.txt -c copy one_through_ten_edited.mp4

See also the reference documentation for async under Resampler Options:

async

For swr only, simple 1 parameter audio sync to timestamps using stretching, squeezing, filling and trimming. Setting this to 1 will enable filling and trimming, larger values represent the maximum amount in samples that the data may be stretched or squeezed for each second. Default value is 0, thus no compensation is applied to make the samples match the audio timestamps.
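
The quoted text describes a resampler (swr) option, so it presumably only takes effect when the audio is actually decoded and re-encoded; with -c copy the audio packets are passed through untouched. A sketch of a variant that copies the video but runs the audio through the aresample filter might look like this. The function name concat_resync_audio, the choice of AAC, and the output file name are assumptions for illustration, and this has not been verified against the file in the question.

```shell
#!/bin/sh
# Hypothetical wrapper: concatenate per the edit list, copy the video stream,
# and re-encode the audio so that aresample's async option can take effect.
concat_resync_audio() {
    list=$1
    out=$2
    ffmpeg -f concat -safe 0 -i "$list" \
        -c:v copy \
        -af aresample=async=1 -c:a aac \
        "$out"
}

# Only run when ffmpeg, the edit list, and the source file are all available.
if command -v ffmpeg >/dev/null 2>&1 &&
    [ -f one_through_ten_edit_list.txt ] &&
    [ -f one_through_ten-timestamped.mp4 ]; then
    concat_resync_audio one_through_ten_edit_list.txt one_through_ten_resynced.mp4
fi
```

This trades a lossless audio copy for an audio re-encode, which may or may not be acceptable depending on the use case.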

Kissaki