The basic command for the thumbnail filter is:
ffmpeg -i in.mp4 -vf thumbnail=n=100 -vsync 0 -frame_pts 1 out%d.png
This will select one representative frame out of every 100 frames.
-vsync 0 preserves the source timestamps.
-frame_pts 1 encodes that timestamp in the output filename. So, if your video is 24 fps and the output filename is out322.png, then that frame was taken from timestamp 322/24 ≈ 13.42s into the video.
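If you want that arithmetic in code, a trivial sketch like the following recovers the timestamp from an output filename (the out%d.png pattern and the 24 fps rate are assumptions matching the example above):

#include <stdio.h>

int main(void)
{
    /* Illustrative values only: filename pattern and frame rate are assumed. */
    const char *name = "out322.png";
    double fps = 24.0;
    int frame_index;

    if (sscanf(name, "out%d.png", &frame_index) == 1)
        printf("%s -> %.2f s\n", name, frame_index / fps); /* out322.png -> 13.42 s */
    return 0;
}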
These are the relevant parts of the filter code.
The filter works on packed 8-bit RGB frames, i.e. the pixel data is laid out as
R1G1B1R2G2B2R3G3B3...
Each frame's histogram is calculated as follows:
// update current frame RGB histogram
for (j = 0; j < inlink->h; j++) {
    for (i = 0; i < inlink->w; i++) {
        hist[0*256 + p[i*3    ]]++;
        hist[1*256 + p[i*3 + 1]]++;
        hist[2*256 + p[i*3 + 2]]++;
    }
    p += frame->linesize[0];
}
For each pixel, three histogram bins are incremented, one per color component, with the bin index being that component's value offset into its section of the array: R counts occupy bins 0-255, G counts bins 256-511, and B counts bins 512-767, matching the packed layout shown above.
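As a standalone sketch of the same idea (not the filter's source; the function name and signature are made up for illustration), a 768-bin histogram for a packed RGB24 buffer could be built as follows. Note that a row's stride (frame->linesize[0] in the filter) can be larger than width*3 because of padding, which is why p is advanced by the linesize rather than by the row width:

#include <stdint.h>
#include <string.h>

#define HIST_SIZE (3 * 256)

/* Sketch: fill a 768-bin histogram (256 R bins, then 256 G bins, then
 * 256 B bins) from a packed RGB24 image with an arbitrary row stride. */
void rgb24_histogram(const uint8_t *data, int width, int height,
                     int stride, int hist[HIST_SIZE])
{
    memset(hist, 0, HIST_SIZE * sizeof(*hist));
    for (int y = 0; y < height; y++) {
        const uint8_t *p = data + (size_t)y * stride;
        for (int x = 0; x < width; x++) {
            hist[0*256 + p[x*3    ]]++; /* red   */
            hist[1*256 + p[x*3 + 1]]++; /* green */
            hist[2*256 + p[x*3 + 2]]++; /* blue  */
        }
    }
}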
Then, the average histogram of a batch of N frames is calculated.
// average histogram of the N frames
for (j = 0; j < FF_ARRAY_ELEMS(avg_hist); j++) {
    for (i = 0; i < nb_frames; i++)
        avg_hist[j] += (double)s->frames[i].histogram[j];
    avg_hist[j] /= nb_frames;
}
That is, for each of the 768 bins, the per-frame counts are averaged over the N frames in the batch.
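A self-contained version of this averaging step might look like the following sketch; the function name is made up for illustration, and the loop mirrors the filter code above:

#define HIST_SIZE (3 * 256)

/* Sketch: average nb_frames per-frame histograms, bin by bin, into avg_hist. */
void average_histograms(const int (*hists)[HIST_SIZE], int nb_frames,
                        double avg_hist[HIST_SIZE])
{
    for (int j = 0; j < HIST_SIZE; j++) {
        avg_hist[j] = 0;
        for (int i = 0; i < nb_frames; i++)
            avg_hist[j] += hists[i][j];
        avg_hist[j] /= nb_frames;
    }
}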
Then the 'best' frame is chosen:
// find the frame closer to the average using the sum of squared errors
for (i = 0; i < nb_frames; i++) {
    sq_err = frame_sum_square_err(s->frames[i].histogram, avg_hist);
    if (i == 0 || sq_err < min_sq_err)
        best_frame_idx = i, min_sq_err = sq_err;
}
where the sum of squared errors is computed as follows:
for (i = 0; i < HIST_SIZE; i++) {
    err = median[i] - (double)hist[i];
    sum_sq_err += err*err;
}
Here, HIST_SIZE = 3 x 256 = 768, i.e. one 256-bin sub-histogram per color channel.
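Putting the averaging and the selection together, a minimal standalone sketch could look like this. The names hist_sq_err and pick_best_frame are illustrative, not from the filter source, but the logic mirrors frame_sum_square_err and the selection loop quoted above:

#define HIST_SIZE (3 * 256)

/* Sketch of the error measure: sum of squared per-bin differences between
 * one frame's histogram and the batch-average histogram. */
double hist_sq_err(const int hist[HIST_SIZE], const double avg[HIST_SIZE])
{
    double sum_sq_err = 0;
    for (int i = 0; i < HIST_SIZE; i++) {
        double err = avg[i] - (double)hist[i];
        sum_sq_err += err * err;
    }
    return sum_sq_err;
}

/* Sketch: return the index of the frame whose histogram is closest to the
 * average, i.e. the batch's most "representative" frame. */
int pick_best_frame(const int (*hists)[HIST_SIZE], int nb_frames,
                    const double avg_hist[HIST_SIZE])
{
    int best_frame_idx = 0;
    double min_sq_err = hist_sq_err(hists[0], avg_hist);
    for (int i = 1; i < nb_frames; i++) {
        double sq_err = hist_sq_err(hists[i], avg_hist);
        if (sq_err < min_sq_err) {
            min_sq_err = sq_err;
            best_frame_idx = i;
        }
    }
    return best_frame_idx;
}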