Here is technical information about how Qoppa’s PDF library jPDFPreflight validates CharSet and CIDSet when checking against PDF/A-1 and PDF/A-2 standards. The PDF technical specs are very ambiguous on this subject.

We reviewed the PDF Association technote which reads on page 22: “Requirements on the use of CharSet and CIDSet entries in font dictionaries of subsetted fonts cause a lot of confusion and are not actually used in practice.” (Phew, this is confusing but really not so important after all…)

The technote quotes the requirement for PDF/A-2 and PDF/A-3 that the CIDSet or CharSet entries should “…list the character names of all glyphs present in the font program, regardless of whether a glyph in the font is referenced or used by the PDF or not…”

Below we describe our approach to validating CharSet and CIDSet and compare it with that of other vendors, including veraPDF and Adobe.

PDF/A-1 Verification

For subsetted Type1 simple fonts, CharSet:

  • veraPDF accepts any non-null entry (even empty CharSet entry)
  • Acrobat verifies that the set matches the font file
  • We are doing the same as Acrobat here
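The strict CharSet check described above can be sketched in plain Java. This is only an illustration, not jPDFPreflight's actual code: the class and method names are hypothetical, the font program's glyph names are assumed to be already extracted into a set, and `.notdef` is ignored on both sides on the assumption that it is not expected to be listed in CharSet.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class CharSetCheck {

    // Splits a CharSet string such as "/A/B/space" into its glyph names.
    static Set<String> parseCharSet(String charSet) {
        Set<String> names = new TreeSet<>();
        for (String name : charSet.split("/")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // True when CharSet lists exactly the glyphs found in the font program.
    // Assumption: .notdef is ignored on both sides.
    static boolean matchesFontProgram(String charSet, Set<String> fontGlyphNames) {
        Set<String> declared = parseCharSet(charSet);
        Set<String> actual = new TreeSet<>(fontGlyphNames);
        declared.remove(".notdef");
        actual.remove(".notdef");
        return declared.equals(actual);
    }

    public static void main(String[] args) {
        Set<String> glyphs = new TreeSet<>(Arrays.asList(".notdef", "A", "B", "space"));
        System.out.println(matchesFontProgram("/A/B/space", glyphs)); // true
        System.out.println(matchesFontProgram("/A/B", glyphs));       // false
        System.out.println(matchesFontProgram("", glyphs));           // false
    }
}
```

The last call shows where the two approaches diverge: an empty CharSet fails the strict comparison, whereas a validator that only requires a non-null entry would accept it.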

For CIDFonts, CIDSet:

  • veraPDF accepts any non-null entry just like for CharSet
  • Acrobat verifies against the set of CIDs actually used in content. This behavior is not justified by the spec but…
  • we are doing the same as Acrobat here
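To make the CIDSet comparison concrete, here is a minimal sketch. The CIDSet stream is a bit table indexed by CID, stored high-order bit first, so the most significant bit of the first byte stands for CID 0. The class and method names are hypothetical, the set of CIDs used in the content is assumed to be collected elsewhere, and the comparison is assumed to be plain set equality, which the description above does not spell out.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class CidSetCheck {

    // Decodes the CIDSet stream bytes: the bit for CID n sits in byte n / 8,
    // counted from the high-order bit.
    static Set<Integer> decodeCidSet(byte[] cidSetBytes) {
        Set<Integer> cids = new TreeSet<>();
        for (int i = 0; i < cidSetBytes.length; i++) {
            for (int bit = 0; bit < 8; bit++) {
                if ((cidSetBytes[i] & (0x80 >> bit)) != 0) {
                    cids.add(i * 8 + bit);
                }
            }
        }
        return cids;
    }

    // Assumption: the check is set equality against the CIDs used in content.
    static boolean matchesUsedCids(byte[] cidSetBytes, Set<Integer> usedCids) {
        return decodeCidSet(cidSetBytes).equals(usedCids);
    }

    public static void main(String[] args) {
        byte[] cidSet = {(byte) 0xA0, (byte) 0x01}; // bits set for CIDs 0, 2 and 15
        Set<Integer> used = new TreeSet<>(Arrays.asList(0, 2, 15));
        System.out.println(decodeCidSet(cidSet));          // [0, 2, 15]
        System.out.println(matchesUsedCids(cidSet, used)); // true
    }
}
```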

PDF/A-2 Verification

For subsetted Type1 simple fonts, CharSet:

  • Everybody ensures set equality with the font file here (provided the CharSet entry is present; omitting it is allowed in PDF/A-2)

For CIDFonts, CIDSet (if the CIDSet entry is present; omitting it is allowed):

  • Basically everybody ensures set equality with the font file. However, for TrueType CIDFonts with a CIDToGIDMap that is not “Identity”, veraPDF, Acrobat XI and Acrobat DC all differ in how they determine the font file’s correct CID set:
    1. veraPDF uses all CIDs from the CIDToGIDMap that map to a GID <= the max GID in the font file.
    2. Acrobat XI doesn’t seem to accept any CIDSet as correct for non-identity CIDToGIDMaps.
    3. Acrobat DC is similar to veraPDF in that all CIDs from the CIDToGIDMap that map to a GID <= the max GID are counted in the set. In addition, however, Acrobat DC counts as valid all CIDs in the range from the max CID in the CIDToGIDMap to the max GID from the font file. For example, if the CIDToGIDMap maps CIDs 0 to 1 but the font file’s max GID is 2000, then the CIDSet would go up to CID 2000. This, in our opinion, doesn’t make sense.
    4. We do the same as veraPDF, because we don’t think the Acrobat DC behavior makes sense.
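The difference between interpretations 1 and 3 is easier to see in code. The sketch below is illustrative only and reflects our reading of the observed behavior, not the actual implementation of either product. All names are hypothetical. It assumes the CIDToGIDMap stream holds two bytes per CID (a big-endian GID) and that the font file's max GID is already known.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
final class ExpectedCidSet {

    // Reads a CIDToGIDMap stream: two bytes per CID, high-order byte first.
    static int[] parseCidToGidMap(byte[] stream) {
        int[] gidByCid = new int[stream.length / 2];
        for (int cid = 0; cid < gidByCid.length; cid++) {
            gidByCid[cid] = ((stream[2 * cid] & 0xFF) << 8) | (stream[2 * cid + 1] & 0xFF);
        }
        return gidByCid;
    }

    // Interpretation 1 (veraPDF, as described above):
    // every CID in the map whose GID is <= the font file's max GID.
    static Set<Integer> veraPdfStyle(int[] gidByCid, int maxGid) {
        Set<Integer> cids = new TreeSet<>();
        for (int cid = 0; cid < gidByCid.length; cid++) {
            if (gidByCid[cid] <= maxGid) {
                cids.add(cid);
            }
        }
        return cids;
    }

    // Interpretation 3 (Acrobat DC, as described above):
    // the same set, plus every CID past the end of the map up to the max GID.
    static Set<Integer> acrobatDcStyle(int[] gidByCid, int maxGid) {
        Set<Integer> cids = veraPdfStyle(gidByCid, maxGid);
        for (int cid = gidByCid.length; cid <= maxGid; cid++) {
            cids.add(cid);
        }
        return cids;
    }

    public static void main(String[] args) {
        int[] map = parseCidToGidMap(new byte[] {0, 0, 0, 5}); // CID 0 -> GID 0, CID 1 -> GID 5
        int maxGid = 2000;
        System.out.println(veraPdfStyle(map, maxGid).size());   // 2
        System.out.println(acrobatDcStyle(map, maxGid).size()); // 2001
    }
}
```

With the example from item 3 (a map covering CIDs 0 to 1 and a max GID of 2000), the first interpretation expects a CIDSet of two CIDs, while the second expects CIDs 0 through 2000.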

PDF/A-2 Conversion

When converting to PDF/A-2, if we detect that the CharSet or CIDSet entries are incorrect, we simply remove them. This is accepted by all verification software.