
I have 3 million JPG files stored on a CentOS 6 Linux server.

I want to reduce the quality to 50% for files larger than 1 megabyte. I wrote this command but got an "argument list too long" error:

$ find -type f -name "*.jpg" -size +1M | xargs mogrify -quality 50 *.jpg
bash: /usr/bin/xargs: Argument list too long

How can I change the quality of millions of files?

slhck

4 Answers


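When using find and xargs, you don't need to name the files for mogrify yourself: xargs gets the list of files from find. The *.jpg at the end of your command is expanded by the shell into every matching file name before xargs even starts, and that huge argument list is what triggers the error. Drop it:

find -type f -name '*.jpg' -size +1M -print0 | xargs -0 -n100 mogrify -quality 50

-n100 passes at most 100 file names to each mogrify call. -print0 and -0 separate the names with NUL characters, so the pipe works even if file names contain whitespace. -print0 must come after the tests: find evaluates its expression left to right, so a leading -print0 would print every file regardless of -type, -name and -size.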
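To make the -print0/-0 and -n100 pairing concrete, here is a small Python sketch (not part of the original answer) that mimics it: it splits NUL-separated output the way xargs -0 does and groups the names into batches of a fixed size, as -n does. The helper names split_nul and batches are hypothetical and only for illustration.

```python
def split_nul(data):
    """Split NUL-separated output (as from find -print0) into file names."""
    # find -print0 ends every name with a NUL, so the last field is empty
    return [name for name in data.split(b'\0') if name]


def batches(items, size):
    """Yield lists of at most `size` items, the way xargs -n<size> groups arguments."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


# A space and even a newline survive, because only NUL separates the names
names = split_nul(b'a b.jpg\0c\nd.jpg\0e.jpg\0')
for batch in batches(names, 2):
    print(batch)
```

With newline-separated output instead, the name containing a newline would be split in two, which is why the NUL-based pair is the safe choice.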

You can also call mogrify directly from find, if your find supports the + terminator for -exec (GNU find does):

find -type f -name '*.jpg' -size +1M -exec mogrify -quality 50 {} +
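The + terminator avoids the error in the same way xargs without -n does: find appends as many file names to one mogrify command as fit under the system's argument-size limit, then starts another command. The Python sketch below models that packing in simplified form. It is not find's actual algorithm: it counts each argument as its length plus one NUL byte and ignores the environment and pointer overhead the real tools account for. The name pack_by_size, the half-of-ARG_MAX budget and the 32768 fallback are assumptions for illustration.

```python
import os


def pack_by_size(names, base_cmd, limit):
    """Yield command lines (base_cmd + names) that stay within `limit` bytes.

    Simplified model: each argument costs its length plus one NUL byte.
    A single name longer than the budget still gets a command of its own.
    """
    base_cost = sum(len(arg) + 1 for arg in base_cmd)
    batch, used = [], base_cost
    for name in names:
        cost = len(name) + 1
        # Start a new command when this name would overflow the budget
        if batch and used + cost > limit:
            yield base_cmd + batch
            batch, used = [], base_cost
        batch.append(name)
        used += cost
    if batch:
        yield base_cmd + batch


# Assumed budget: half of ARG_MAX as a safety margin; 32768 is an arbitrary fallback
try:
    LIMIT = os.sysconf('SC_ARG_MAX') // 2
except (AttributeError, ValueError, OSError):
    LIMIT = 32768

files = ['img{0:07d}.jpg'.format(i) for i in range(100000)]
commands = list(pack_by_size(files, ['mogrify', '-quality', '50'], LIMIT))
print(len(commands), 'mogrify commands for', len(files), 'files')
```

Every file still ends up in exactly one command; only the number of commands depends on the limit.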
choroba

xargs supports a -n argument to limit the number of arguments passed to each command it runs:

find -type f -name '*.jpg' -size +1M -print0 | xargs -0 -n1 mogrify -quality 50

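This launches one mogrify process per image. mogrify does accept several file names per call, though, so a larger value such as -n100 works as well and avoids starting a separate process for each of the millions of files.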
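To put that trade-off in numbers for the 3 million files in the question, this short sketch counts how many mogrify processes xargs starts for a given -n value (ceiling division), assuming no batch is cut short by the command-line size limit. It says nothing about the cost of each process, which depends on the machine.

```python
def invocations(n_files, per_call):
    """Number of commands xargs -n<per_call> runs for n_files names."""
    return -(-n_files // per_call)  # ceiling division without floats


for n in (1, 100, 1000):
    print('-n{0}: {1} mogrify processes'.format(n, invocations(3000000, n)))
```

For 3 million files, -n1 starts 3,000,000 mogrify processes, while -n100 starts 30,000.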


A cross-platform solution with Python and ImageMagick's convert: it converts all PDF files in the current directory into PNG files (you can change this to JPG if you prefer), running several conversions in parallel worker processes.

from __future__ import print_function
import glob
import multiprocessing
import os
import subprocess

def convert_to_png(pdf_filepath):
    '''
    Convert a PDF file to a PNG file with ImageMagick's convert
    '''
    # Same base name, .png extension
    png_filepath = os.path.splitext(pdf_filepath)[0] + '.png'
    print('pdf_filepath: {0}'.format(pdf_filepath))
    print('png_filepath: {0}'.format(png_filepath))
    # Pass the arguments as a list (no shell), so file names with spaces stay intact
    command = ['convert', '-background', 'white', '-alpha', 'off',
               '-geometry', '1600x1600', '-density', '200x200',
               '-quality', '100', '-resize', '800x',
               pdf_filepath, png_filepath]
    print(' '.join(command))
    subprocess.call(command)

def main():
    pdf_filepaths = glob.iglob(os.path.join('.', '*.pdf'))
    # 4 worker processes convert files in parallel
    pool = multiprocessing.Pool(processes=4)
    pool.map(convert_to_png, pdf_filepaths)
    pool.close()
    pool.join()
    print('done')

if __name__ == "__main__":
    main()
    # import cProfile; cProfile.run('main()')  # if you want to do some profiling

This requires ImageMagick and Ghostscript to be installed. It works on Linux, Mac OS X and Microsoft Windows.

If you prefer to write the file name onto each image, replace the command in convert_to_png() with:

command = ['convert', '-background', 'white', '-alpha', 'off',
           '-geometry', '1600x1600', '-density', '200x200',
           '-quality', '100',
           '-annotate', '+50+50', os.path.basename(pdf_filepath),
           '-resize', '800x', pdf_filepath, png_filepath]

(See the ImageMagick -annotate documentation.)
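The same multiprocessing pattern can be pointed at the original problem. The sketch below is an adaptation, not part of the answer above: it walks a directory tree, selects regular *.jpg files strictly larger than 1 MiB (mirroring find -type f -name '*.jpg' -size +1M), and runs mogrify -quality 50 on batches of files in four worker processes. The function names, the batch size of 100 and the process count are illustrative assumptions, and ImageMagick's mogrify must be on the PATH.

```python
from __future__ import print_function
import multiprocessing
import os
import stat
import subprocess

MIN_BYTES = 1024 * 1024   # find -size +1M: strictly larger than 1 MiB
BATCH_SIZE = 100          # files per mogrify call (assumed, like xargs -n100)


def find_large_jpgs(root, min_bytes=MIN_BYTES):
    """Yield paths of regular *.jpg files under root larger than min_bytes."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith('.jpg'):
                continue
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: like find -type f, symlinks are skipped
            if stat.S_ISREG(st.st_mode) and st.st_size > min_bytes:
                yield path


def chunked(paths, size):
    """Group an iterable of paths into lists of at most `size` items."""
    batch = []
    for path in paths:
        batch.append(path)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def mogrify_batch(batch):
    # Argument list, no shell: spaces in file names are safe
    return subprocess.call(['mogrify', '-quality', '50'] + batch)


def main(root):
    pool = multiprocessing.Pool(processes=4)
    # imap_unordered starts dispatching without first turning the generator
    # into a full list of millions of paths (Pool.map would do that)
    jobs = chunked(find_large_jpgs(root), BATCH_SIZE)
    for status in pool.imap_unordered(mogrify_batch, jobs):
        if status != 0:
            print('mogrify exited with status', status)
    pool.close()
    pool.join()
```

Call it as main('/path/to/images') (a placeholder path) from inside an if __name__ == "__main__": block, as in the script above.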

Franck Dernoncourt

As mentioned on Stack Overflow, you could also write the list of files to a text file and have mogrify read it with the @ prefix:

$ find -type f -name "*.jpg" -size +1M > my_jpegs.txt
$ mogrify -quality 50 @my_jpegs.txt
malat