
I'm developing a "paperless" workflow and plan to save all files in PDF/A-1b format.

I'm trying to develop a simple batch file for converting PDF files that I create or receive to PDF/A-1b. Starting from this answer, I have the following batch file:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceCMYK ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o %2 ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
    %1

In PDFA_def.ps, I've tried a few different ICC profiles, including one I found on my system:

C:/Windows/System32/spool/drivers/color/CalibratedDisplayProfile-5.icc

and sRGB_IEC61966-2-1_no_black_scaling.icc from color.org.

My test input file is a 1-page email printed from Microsoft Outlook 2010 using CutePDF 2.8 (which uses Ghostscript 8.15).

After converting with my batch file and Ghostscript 9.07, Adobe Reader thinks the output is PDF/A, but PDF/A-1b validation by pdf-tools.com fails with the message "The value of the key N is 4 but must be 3."

I have traced this back to the following construct in the PDF output file:

<</Filter/FlateDecode
/N 4/Length 2595>>stream

If I change /N 4 to /N 3, the "value of key N" message goes away. /N apparently represents the number of objects in the stream that follows this header. I don't know how to read the encoded stream, so I don't understand what it contains or why pdf-tools thinks it must contain only 3 objects.

A PDF/A printed using Bullzip, which also uses Ghostscript, also fails validation with the "key N is 4 but must be 3" message.

Does this have something to do with the color space? I'm out of my depth there. I think I'd be happy with a "plain" sRGB space. The Ghostscript docs say the PDF/A encoding must be CMYK. Adobe implies that either RGB or CMYK works for PDF/A. So I'm unclear about how to find an appropriate .icc profile.

Or maybe the validator is wrong and everything is fine?

Mark Berry

2 Answers


With the help of a Ghostscript developer in this bug report, I was able to solve the /N problem. Lessons learned:

  • The Ghostscript doc referenced in my question is out of date. The current doc, here, says that ProcessColorModel=DeviceRGB is okay.
  • ICC profiles describe a color space. Some valid color spaces are GRAY, RGB, and CMYK. You can check the color space of an ICC profile using the free ICC Profile Inspector.
  • In the section of the PDF file causing validation errors, /N represents the number of colorants.
  • The PDFA_def.ps file emits the /N value. The sample included with Ghostscript 9.07 only emits /N 1 (for ProcessColorModel=DeviceGray) or /N 4 (for any other ProcessColorModel).
  • My original test specified ProcessColorModel=DeviceCMYK which caused /N 4, but used an ICC profile describing an RGB color space. The validators correctly caught this discrepancy: I promised 4 colors but only described 3.
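The link between an ICC profile's color space and /N described in the list above can also be checked with a script before converting. The following is a minimal Python sketch, not part of the original solution. It assumes the standard ICC header layout, in which bytes 16 to 19 hold the data color space signature; the helper names icc_colorants and icc_colorants_from_file are hypothetical.

```python
# Map the ICC "data colour space" signature to the number of colorants (/N).
COLORANTS = {b"GRAY": 1, b"RGB ": 3, b"CMYK": 4}


def icc_colorants(header: bytes) -> int:
    """Return the colorant count, given at least the first 20 bytes of an ICC profile."""
    if len(header) < 20:
        raise ValueError("ICC header too short")
    signature = header[16:20]  # assumption: standard ICC header layout
    if signature not in COLORANTS:
        raise ValueError(f"unsupported colour space {signature!r}")
    return COLORANTS[signature]


def icc_colorants_from_file(path: str) -> int:
    """Read the 128-byte ICC header from a file and return its colorant count."""
    with open(path, "rb") as f:
        return icc_colorants(f.read(128))
```

If the value returned for the profile named in PDFA_def.ps differs from the /N implied by ProcessColorModel (1 for DeviceGray, 3 for DeviceRGB, 4 for DeviceCMYK), the validator can be expected to report the same kind of mismatch as in the question.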

Most ICC profiles that I found for displays and office printers describe an RGB color space. (CMYK seems more specific to high-end printing presses and certain kinds of paper.) For my purposes, RGB is preferable. The following batch file converts a PDF file to PDF/A-1b with an RGB color space:

gswin32c ^
   -dPDFA ^
   -dNOOUTERSAVE ^
   -sProcessColorModel=DeviceRGB ^
   -dUseCIEColor ^
   -sDEVICE=pdfwrite ^
   -o %2 ^
   -dPDFACompatibilityPolicy=1 ^
    "C:\Program Files (x86)\gs\gs9.07\mylib\PDFA_def.ps" ^
    %1

In PDFA_def.ps, specify an ICC profile that describes an RGB color space, and change the section for defining an ICC profile as follows:

% Define an ICC profile :

[/_objdef {icc_PDFA} /type /stream /OBJ pdfmark
[{icc_PDFA} <</N systemdict /ProcessColorModel get /DeviceGray eq {1} {systemdict /ProcessColorModel get /DeviceRGB eq {3} {4} ifelse} ifelse >> /PUT pdfmark
[{icc_PDFA} ICCProfile (r) file /PUT pdfmark

The long line includes a nested ifelse statement that will detect ProcessColorModel=DeviceRGB and emit the appropriate /N 3. The resulting file should pass validation at pdf-tools.com.
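For a rough local check before uploading to a validator, the /N value can be read back from the converted file. This is a hedged Python sketch, not a PDF parser: it assumes the stream dictionary is written uncompressed and without nested dictionaries, as in the excerpt shown in the question, and the function name stream_n_values is hypothetical. Other stream types can also carry an /N key, so treat the result as a hint only.

```python
import re

# Matches a stream dictionary (no nested dictionaries) that contains "/N <integer>".
_STREAM_N = re.compile(rb"<<[^<>]*?/N\s+(\d+)[^<>]*?>>\s*stream")


def stream_n_values(pdf_bytes: bytes) -> list:
    """Return every /N value found in simple stream dictionaries, in file order."""
    return [int(match.group(1)) for match in _STREAM_N.finditer(pdf_bytes)]
```

Reading the output file in binary mode and passing its contents to stream_n_values should yield a list containing 3 when an RGB profile and ProcessColorModel=DeviceRGB were used.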

Update: I've created a somewhat more capable batch program and published it in a blog post: Batch Convert PDF to PDF/A.

Mark Berry

I suggest first re-testing with the latest version of Ghostscript, 9.07, in case the problem has already been fixed.

If this doesn't help, it will take a real PDF guru to answer this problem. I suspect the problem has something to do with a conflict between the content of the .ps file and the parameters of the gswin32c command.

However, since the problematic file is generated by Ghostscript, you can post your question on the Ghostscript Bugzilla page (registration required), where the developers will answer it. If it is a bug in Ghostscript, it will most likely be fixed in the next version.

In addition to the problem description as in your post, you should attach an example input .ps file and the resulting .pdf file. Try to minimize their sizes.

In the past I have reported several suspected Ghostscript bugs there and always received helpful answers, and the real bugs I found were all fixed.

harrymc