
I'm running GPL Ghostscript version 9.27. I compressed some PDF files on Linux with the gs command and, I thought, succeeded.

But after checking some particular PDFs, I noticed color changes on some pages, in at least two PDFs. For example, some files now have pages where the words of entire paragraphs have turned red, while most pages stay like the original, with black text. I didn't know whether the affected text was vector, or raster (from a scan) with a vector overlay.

Moreover, another PDF that contains some black-and-white (grayscale) images still has its text (maybe vectors) in the same color (black) after compression, but all of its images have turned red and black. So here the problem seems different, I suppose, because it affects only the raster images. Maybe the file with the raster problem doesn't have the vector/text problem.

Here is an example. At the top is part of the file with the affected images, after (left) and before (right) compression; at the bottom left is a page from another file after compression, with text (as vectors) in red; at the bottom right is another page with text (as vectors) in normal black, like the original.

The command I used keeps the output PDF at the same version as the original (set with -dCompatibilityLevel). It looks like this:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=original_version -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH -sOutputFile=file_out file_in

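Ghostscript doesn't report any error. I tried removing the -dQUIET and the -dSAFER options, but nothing changed. (I had expected -dSAFER to stop gs from altering the PDF's content, but it only restricts which files and file operations the document's PostScript code can access; for example, it disables deleting and renaming files.)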
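In the command above, original_version is a placeholder for the input's own version. A minimal sketch for filling it in automatically, assuming the version in the file's "%PDF-x.y" header is the one to keep (a later /Version entry in the document catalog, if present, is not checked); the function name pdf_version is my own:

```shell
#!/bin/sh
# Sketch: read the "%PDF-x.y" header so -dCompatibilityLevel can be filled in
# instead of typing the original version by hand.
# Assumption: the header version is what we want; a /Version entry in the
# catalog (which can override the header) is NOT looked at here.

pdf_version() {
    # Print e.g. "1.4" for a file whose header is "%PDF-1.4"; print nothing
    # if no header is found in the first 1024 bytes.
    head -c 1024 "$1" | grep -a -o '%PDF-[0-9]\.[0-9]' | head -n 1 | cut -c 6-
}

# Usage, with the command from the question:
#   gs -sDEVICE=pdfwrite -dCompatibilityLevel="$(pdf_version file_in)" \
#      -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH \
#      -sOutputFile=file_out file_in
```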

So, how can I avoid these unexpected changes with Ghostscript? What causes this problem?


By running pdfimages -list <pdf_file> on the PDF with the image problem, I found that all of its images use the "Separation" color space (the color column shows sep). But the same command on various other PDFs that have only the text problem (or no problem at all) shows different image color spaces (such as rgb, cmyk, gray, etc.). I don't know much about this, but I suspect this difference is related to the gs problems.
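To compare many PDFs this way, a small filter can count the images per color space. This is a sketch that assumes the pdfimages -list layout I have seen from poppler-utils: a header line, a dashed line, then one row per image with the color in the 6th column; the function name image_colors is hypothetical.

```shell
# Sketch: summarize "pdfimages -list" output (read on stdin) as
# "<colorspace> <count>" lines, so files using Separation ("sep") stand out.
# Assumption: first two lines are header + dashes; color is field 6.

image_colors() {
    awk 'NR > 2 { n[$6]++ } END { for (c in n) print c, n[c] }' | sort
}

# Usage:
#   for f in *.pdf; do echo "== $f"; pdfimages -list "$f" | image_colors; done
```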


Thanks to Yorik for asking for some explanations and clarifications.

bonzo

1 Answer


The problem is probably caused by a bug in version 9.27, so if you can, update Ghostscript and skip to the P.S. below. Otherwise, if you are using this version, read on.

I finally found a solution to all the problems. I searched the PDF output section of the Ghostscript online manual for colors and found the ColorConversionStrategy switch, which (I believe) changes the color space. The choices are LeaveColorUnchanged, Gray, RGB, CMYK, or UseDeviceIndependentColor. Then I tried this command on the PDF whose images had changed:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=original_version -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH -sColorConversionStrategy=LeaveColorUnchanged -sOutputFile=file_out file_in

and the output now has the correct original colors. The other options gave these results (the original images were grayscale): with Gray, the images stay like the original; with RGB, they turn red and black; with CMYK, they turn white and cyan; with UseDeviceIndependentColor, they stay the same (but the manual says it is less compatible with PostScript).
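The comparison above can be repeated with a loop that writes one output per strategy. This is a sketch: it assumes the input name ends in .pdf, and the output naming scheme, the function name try_strategies, and the GS variable (set GS=echo for a dry run that only prints the arguments) are my own additions.

```shell
# Sketch: produce one output per ColorConversionStrategy value, to compare
# them side by side. Hypothetical names: try_strategies, GS.

GS=${GS:-gs}

try_strategies() {
    in=$1      # input PDF, assumed to end in .pdf
    level=$2   # -dCompatibilityLevel value, e.g. the input's own version
    base=${in%.pdf}
    for s in LeaveColorUnchanged Gray RGB CMYK UseDeviceIndependentColor; do
        "$GS" -sDEVICE=pdfwrite -dCompatibilityLevel="$level" \
            -dPDFSETTINGS=/ebook -dNOPAUSE -dQUIET -dSAFER -dBATCH \
            -sColorConversionStrategy="$s" \
            -sOutputFile="${base}_$s.pdf" "$in"
    done
}

# Usage: try_strategies file_in.pdf 1.4
#   -> file_in_LeaveColorUnchanged.pdf, file_in_Gray.pdf, file_in_RGB.pdf, ...
```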

I tried the same command on all the PDFs affected by the vector text color change, and it fixed the problem every time. Retrying with the RGB option brought the problem back.

So I think the best option in these cases is -sColorConversionStrategy=LeaveColorUnchanged.
I think it's good to include this option as a rule whenever you use the /screen or /ebook setting (at least in version 9.27; see the P.S. below).

The issue was that the /ebook setting automatically sets ColorConversionStrategy to RGB, which in my case causes the problem. Setting LeaveColorUnchanged explicitly instead causes fewer (or no) problems.
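To apply that rule without having to remember it, a wrapper can add the switch whenever the preset is /screen or /ebook. A minimal sketch; the function name gs_compress and the GS variable (GS=echo gives a dry run) are hypothetical, and the set of presets follows the rule stated above rather than a full check of every preset's defaults.

```shell
# Sketch: compress with a given preset and, for /screen and /ebook, keep the
# original colors explicitly. Hypothetical names: gs_compress, GS.

GS=${GS:-gs}

gs_compress() {
    preset=$1; level=$2; in=$3; out=$4
    case $preset in
        /screen|/ebook) color=-sColorConversionStrategy=LeaveColorUnchanged ;;
        *)              color= ;;
    esac
    # $color is left unquoted on purpose: when empty it adds no argument.
    "$GS" -sDEVICE=pdfwrite -dCompatibilityLevel="$level" \
        -dPDFSETTINGS="$preset" -dNOPAUSE -dQUIET -dSAFER -dBATCH $color \
        -sOutputFile="$out" "$in"
}

# Usage: gs_compress /ebook 1.4 file_in file_out
```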


I don't know the exact cause, but my first suspect is a problem in compressing images in the Separation color space.
Using pdfimages -list, I noticed that the "Separation" color space of the input PDF remains in the output (even though /ebook should have converted it).
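That check can be scripted as a filter that succeeds when any image is still in the Separation color space. A sketch, assuming the poppler pdfimages -list layout (header line, dashed line, color in the 6th column); has_separation is a hypothetical name.

```shell
# Sketch: exit 0 if "pdfimages -list" output on stdin lists at least one
# image whose color column is "sep", exit 1 otherwise.
# Assumption: header + dashed line first, color in field 6.

has_separation() {
    awk 'NR > 2 && $6 == "sep" { found = 1 } END { exit !found }'
}

# Usage:
#   if pdfimages -list file_out | has_separation; then
#       echo "file_out still contains Separation images"
#   fi
```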

P.S.: According to this answer (to a similar problem), which I only just discovered, there is probably a bug in Ghostscript version 9.27.
My workaround works for me, but the problem shouldn't have happened in the first place. Version 9.50 probably works correctly. The current release is 9.56.1, but I'm running a Debian-based system, so my version is not very up to date.
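If the same script runs on machines with different Ghostscript versions, the workaround could be added only for older versions. A sketch under a loud assumption: the 9.50 cutoff comes from the guess above that 9.27 is affected and 9.50 probably is not, not from a confirmed fix version; older_than_950 is a hypothetical name.

```shell
# Sketch: succeed if a Ghostscript version string (as printed by
# "gs --version", e.g. "9.27" or "9.56.1") is older than 9.50.
# Assumption: 9.50 as the cutoff is a guess, not a confirmed fix version.

older_than_950() {
    major=${1%%.*}
    minor=${1#*.}
    minor=${minor%%.*}
    [ "$major" -lt 9 ] || { [ "$major" -eq 9 ] && [ "$minor" -lt 50 ]; }
}

# Usage:
#   color=
#   if older_than_950 "$(gs --version)"; then
#       color=-sColorConversionStrategy=LeaveColorUnchanged
#   fi
#   gs -sDEVICE=pdfwrite ... $color -sOutputFile=file_out file_in
```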


Edit note: Sorry, I'm not an expert, so I thought selectable text could only be vector text, but it can also be a raster image overlaid with a text layer through OCR, and I think that was the case here (as I suppose color spaces apply only to raster images). So I edited my sentences a bit.
