The basic command for the thumbnail filter is:
ffmpeg -i in.mp4 -vf thumbnail=n=100 -vsync 0 -frame_pts 1 out%d.png
This will select one representative frame out of every 100 frames.
-vsync 0 preserves the source timestamps.
-frame_pts 1 encodes that timestamp in the output filename. So, if your video is 24 fps and the output filename is out322.png, then that frame was taken from timestamp 322/24 ≈ 13.42s into the video.
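If you want that arithmetic in code, a trivial sketch like the following recovers the timestamp from an output filename (the out%d.png pattern and the 24 fps rate are assumptions matching the example above):

#include <stdio.h>

int main(void)
{
    /* Illustrative values only: filename pattern and frame rate are assumed. */
    const char *name = "out322.png";
    double fps = 24.0;
    int frame_index;

    if (sscanf(name, "out%d.png", &frame_index) == 1)
        printf("%s -> %.2f s\n", name, frame_index / fps); /* out322.png -> 13.42 s */
    return 0;
}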
These are the relevant parts of the filter code.
The filter works on packed 8-bit RGB frames, i.e. the pixel data is laid out as
R1G1B1R2G2B2R3G3B3...
Each frame's histogram is calculated as follows:
// update current frame RGB histogram
for (j = 0; j < inlink->h; j++) {
    for (i = 0; i < inlink->w; i++) {
        hist[0*256 + p[i*3    ]]++;
        hist[1*256 + p[i*3 + 1]]++;
        hist[2*256 + p[i*3 + 2]]++;
    }
    p += frame->linesize[0];
}
For each pixel, three histogram bins are incremented, one per color component, with the bin index being that component's value offset into its section of the array: R counts occupy bins 0-255, G counts bins 256-511, and B counts bins 512-767, matching the packed layout shown above.
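As a standalone sketch of the same idea (not the filter's source; the function name and signature are made up for illustration), a 768-bin histogram for a packed RGB24 buffer could be built as follows. Note that a row's stride (frame->linesize[0] in the filter) can be larger than width*3 because of padding, which is why p is advanced by the linesize rather than by the row width:

#include <stdint.h>
#include <string.h>

#define HIST_SIZE (3 * 256)

/* Sketch: fill a 768-bin histogram (256 R bins, then 256 G bins, then
 * 256 B bins) from a packed RGB24 image with an arbitrary row stride. */
void rgb24_histogram(const uint8_t *data, int width, int height,
                     int stride, int hist[HIST_SIZE])
{
    memset(hist, 0, HIST_SIZE * sizeof(*hist));
    for (int y = 0; y < height; y++) {
        const uint8_t *p = data + (size_t)y * stride;
        for (int x = 0; x < width; x++) {
            hist[0*256 + p[x*3    ]]++; /* red   */
            hist[1*256 + p[x*3 + 1]]++; /* green */
            hist[2*256 + p[x*3 + 2]]++; /* blue  */
        }
    }
}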
Then, the average histogram of a batch of N frames is calculated.
// average histogram of the N frames
for (j = 0; j < FF_ARRAY_ELEMS(avg_hist); j++) {
    for (i = 0; i < nb_frames; i++)
        avg_hist[j] += (double)s->frames[i].histogram[j];
    avg_hist[j] /= nb_frames;
}
That is, for each of the 768 bins, the per-frame counts are averaged over the N frames in the batch.
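A self-contained version of this averaging step might look like the following sketch; the function name is made up for illustration, and the loop mirrors the filter code above:

#define HIST_SIZE (3 * 256)

/* Sketch: average nb_frames per-frame histograms, bin by bin, into avg_hist. */
void average_histograms(const int (*hists)[HIST_SIZE], int nb_frames,
                        double avg_hist[HIST_SIZE])
{
    for (int j = 0; j < HIST_SIZE; j++) {
        avg_hist[j] = 0;
        for (int i = 0; i < nb_frames; i++)
            avg_hist[j] += hists[i][j];
        avg_hist[j] /= nb_frames;
    }
}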
Then the 'best' frame is chosen:
// find the frame closer to the average using the sum of squared errors
for (i = 0; i < nb_frames; i++) {
    sq_err = frame_sum_square_err(s->frames[i].histogram, avg_hist);
    if (i == 0 || sq_err < min_sq_err)
        best_frame_idx = i, min_sq_err = sq_err;
}
where the sum of squared errors is computed as follows:
for (i = 0; i < HIST_SIZE; i++) {
    err = median[i] - (double)hist[i];
    sum_sq_err += err*err;
}
Here, HIST_SIZE = 3 x 256 = 768, i.e. one 256-bin sub-histogram per color channel.
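Putting the averaging and the selection together, a minimal standalone sketch could look like this. The names hist_sq_err and pick_best_frame are illustrative, not from the filter source, but the logic mirrors frame_sum_square_err and the selection loop quoted above:

#define HIST_SIZE (3 * 256)

/* Sketch of the error measure: sum of squared per-bin differences between
 * one frame's histogram and the batch-average histogram. */
double hist_sq_err(const int hist[HIST_SIZE], const double avg[HIST_SIZE])
{
    double sum_sq_err = 0;
    for (int i = 0; i < HIST_SIZE; i++) {
        double err = avg[i] - (double)hist[i];
        sum_sq_err += err * err;
    }
    return sum_sq_err;
}

/* Sketch: return the index of the frame whose histogram is closest to the
 * average, i.e. the batch's most "representative" frame. */
int pick_best_frame(const int (*hists)[HIST_SIZE], int nb_frames,
                    const double avg_hist[HIST_SIZE])
{
    int best_frame_idx = 0;
    double min_sq_err = hist_sq_err(hists[0], avg_hist);
    for (int i = 1; i < nb_frames; i++) {
        double sq_err = hist_sq_err(hists[i], avg_hist);
        if (sq_err < min_sq_err) {
            min_sq_err = sq_err;
            best_frame_idx = i;
        }
    }
    return best_frame_idx;
}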