0

I would like to cut videos using ffmpeg for machine learning.

How can I ensure that if I cut (for example) 1 s of video at 25 fps, I get exactly 25 frames, with the audio and video still synchronized?

I have seen that ffmpeg looks for keyframes when cutting and picks the nearest one. This caused me trouble: it generated negative timestamps and padded the end of the cut video with duplicated frames.

I also understand that the metadata does not necessarily report the real frame rate and similar properties.

So what would be the pipeline to obtain precise cuts with the exact amount of frames aligned with the audio stream?

Thanks

1 Answer

-1

Depending on the codec and container of the video, this might need a few counterintuitive steps, because many codecs simply do not allow arbitrary in and out points.

  • First of all, you need to cut the video (ignoring the audio for a moment). This is best done using -ss inpoint together with -pix_fmt yuv420p -an -f yuv4mpegpipe -frames:v 25, piping the output into either x264 or another instance of ffmpeg. Because the video is decoded to raw frames and re-encoded, the cut does not depend on keyframe positions. This has proven to be a reliable way to cut the video stream frame-exactly.
  • For the audio, the easiest way is to convert it to raw PCM with -c:a pcm_s16le and -f s16le, then manipulate the result at the file level so that it contains the correct number of bytes starting at the correct offset.
  • A last ffmpeg pass can compress the audio if necessary, or just mux it with the video. Since raw PCM doesn't contain timestamps, there is no way for the audio to drift out of sync.
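The three steps above could be scripted roughly as follows. This is only a minimal sketch, not a tested production pipeline: the function names, the 48 kHz stereo defaults, the libx264 and aac encoder choices, and placing -ss before -i are my own assumptions, and it assumes the audio and video streams both start at time 0 in the source file.

```python
import subprocess


def video_cmds(src, start_s, n_frames, out_video):
    """Build the decode and encode commands for a frame-exact video cut."""
    # Decode to raw YUV4MPEG on stdout and stop after exactly n_frames frames.
    decode = ["ffmpeg", "-v", "error", "-ss", str(start_s), "-i", src,
              "-pix_fmt", "yuv420p", "-an", "-f", "yuv4mpegpipe",
              "-frames:v", str(n_frames), "-"]
    # Second ffmpeg instance re-encodes the raw frames (libx264 is an assumption).
    encode = ["ffmpeg", "-v", "error", "-y", "-f", "yuv4mpegpipe", "-i", "-",
              "-c:v", "libx264", out_video]
    return decode, encode


def audio_decode_cmd(src, raw_out, sample_rate=48000, channels=2):
    """Build the command that dumps the whole audio track as headerless s16le PCM."""
    return ["ffmpeg", "-v", "error", "-y", "-i", src, "-vn",
            "-c:a", "pcm_s16le", "-ar", str(sample_rate), "-ac", str(channels),
            "-f", "s16le", raw_out]


def pcm_byte_range(start_s, n_frames, fps, sample_rate=48000, channels=2,
                   bytes_per_sample=2):
    """Return (offset, length) in bytes of the PCM slice matching the video cut."""
    block = channels * bytes_per_sample          # bytes per sample across all channels
    first_sample = round(start_s * sample_rate)  # sample index of the in point
    n_samples = round(n_frames * sample_rate / fps)
    return first_sample * block, n_samples * block


def slice_pcm(raw_in, raw_out, offset, length):
    """Copy length bytes starting at offset; returns the number of bytes written."""
    with open(raw_in, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    with open(raw_out, "wb") as f:
        f.write(data)
    return len(data)


def mux_cmd(video, raw_audio, out, sample_rate=48000, channels=2):
    """Build the final pass: copy the video, compress the raw PCM (aac is an assumption)."""
    return ["ffmpeg", "-v", "error", "-y", "-i", video,
            "-f", "s16le", "-ar", str(sample_rate), "-ac", str(channels),
            "-i", raw_audio, "-c:v", "copy", "-c:a", "aac", out]


def run_pipe(decode, encode):
    """Run decode | encode and return both exit codes."""
    p1 = subprocess.Popen(decode, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(encode, stdin=p1.stdout)
    p1.stdout.close()  # let p1 see a broken pipe if p2 exits early
    p2.wait()
    p1.wait()
    return p1.returncode, p2.returncode
```

For a 1 s cut starting at 10 s, pcm_byte_range(10, 25, 25) gives an offset of 1,920,000 bytes and a length of 192,000 bytes (48,000 samples of 4 bytes each), which is exactly the audio belonging to 25 frames at 25 fps. The commands would then be run in order: run_pipe(*video_cmds(...)), subprocess.run(audio_decode_cmd(...)), slice_pcm(...), and finally subprocess.run(mux_cmd(...)).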

This has proven to be a reliable way to put video from doubtful sources (e.g. commercial clips from some advertiser) into a highly regulated linear stream (e.g. a TV channel) without introducing any artefacts.

Eugen Rieck
  • 20,637