How to accurately split and combine videos with ffmpeg respecting inpoint/outpoint

Question

I'm learning how to split and combine videos with ffmpeg. Presently I'm practicing on a 799 KB MP4, one_through_ten-timestamped.mp4, where each frame has a burnt-in "frame number" (up to frame 40, the burnt-in number advances only every other frame).

The command I use is:

ffmpeg -f concat -i one_through_ten_edit_list.txt -c copy one_through_ten_edited.mp4

Where one_through_ten_edit_list.txt is:

file one_through_ten-timestamped.mp4
inpoint 00:00:01.500
outpoint 00:00:05.000
file one_through_ten-timestamped.mp4
inpoint 00:00:09.500
outpoint 00:00:13.000

When I run this command, the output video has several issues:

The output starts from the beginning of the source video instead of 1.5 s in.
After the point where the second input is concatenated, the audio is delayed when played in QuickTime Player (it's fine in ffplay).

When I perform these same edits in MPEG Streamclip, the resulting video plays as expected. Here's some tabular output from ffprobe on one_through_ten-timestamped.mp4 which shows where the ffmpeg video and MPEG Streamclip differ in their concatenation points:

Notice that both ffmpeg and MPEG Streamclip handle the second segment roughly the same; however, ffmpeg doesn't seem to respect the inpoint in the first input video. Note that in my MP4, there's one packet per frame.
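
A packet listing of this kind can be regenerated with ffprobe. The following is only a sketch, not necessarily the command used for the comparison above: it assumes the source file name from the question and prints one line per video packet with its pts_time, dts_time and flags, where a K in the flags column marks a keyframe. Since -c copy cannot decode frames, keyframe positions are the likely places where a stream-copied cut can start.

```shell
# Source file name taken from the question; adjust as needed.
src="one_through_ten-timestamped.mp4"

# Packet fields to print (ffprobe emits them as pts_time,dts_time,flags).
entries="packet=pts_time,dts_time,flags"

if command -v ffprobe >/dev/null 2>&1 && [ -f "$src" ]; then
    # One CSV line per packet of the first video stream.
    ffprobe -v error -select_streams v:0 \
        -show_entries "$entries" -of csv=p=0 "$src"
else
    echo "ffprobe or $src not found; nothing to list" >&2
fi
```

Running the same command on the ffmpeg output and on the MPEG Streamclip output makes it easier to see which packets each tool kept around 1.5 s and 9.5 s.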

ffmpeg has behaved this way from version 2.8 (perhaps before) up through 3.3.3. Am I missing a command-line or edit-list option? Surely others have been successfully using ffmpeg to split and combine videos (without recompressing).

score 0 · Answer 1 · edited Oct 07 '24 at 09:28

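You can try adding -async 1 (the command below also adds -safe 0, which tells the concat demuxer to accept any file name in the list):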

ffmpeg -f concat -async 1 -safe 0 -i one_through_ten_edit_list.txt -c copy one_through_ten_edited.mp4
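
One caveat: async is a resampler option, so it presumably only acts on audio that is actually decoded and re-encoded, and with -c copy the audio packets are passed through untouched. A hedged variant, assuming re-encoding the audio is acceptable, keeps the video as a stream copy and applies the same compensation through the aresample filter. The output file name here is hypothetical.

```shell
# Edit list from the question; the output name is made up for this sketch.
list="one_through_ten_edit_list.txt"
out="one_through_ten_edited_resync.mp4"

if command -v ffmpeg >/dev/null 2>&1 && [ -f "$list" ]; then
    # Video is still stream-copied. Audio is decoded, filled/trimmed to
    # match its timestamps (aresample=async=1), then re-encoded as AAC.
    ffmpeg -f concat -safe 0 -i "$list" \
        -c:v copy -c:a aac -af aresample=async=1 "$out"
else
    echo "ffmpeg or $list not found; nothing to do" >&2
fi
```

This only targets the audio delay; it does not change where the stream-copied video starts.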

See also the reference documentation for async under Resampler Options:

async

For swr only, simple 1 parameter audio sync to timestamps using stretching, squeezing, filling and trimming. Setting this to 1 will enable filling and trimming, larger values represent the maximum amount in samples that the data may be stretched or squeezed for each second. Default value is 0, thus no compensation is applied to make the samples match the audio timestamps.
