Q: When I convert my document from PDF to HTML / SVG, some of the ‘e’ characters appear to have been replaced with an accented ‘é’.
A: We looked into your PDF document and noticed that two glyphs map to the same Unicode character (lowercase e):
GID 65 -> e (e char) -> U+0065
GID E9 -> é (accented e char) -> U+0065
SVG content uses Unicode, and jPDFWeb does output the correct lowercase e in the SVG / HTML file. However, jPDFWeb also recreates and embeds a font so that the SVG renders exactly like the original PDF. Since only one glyph can be mapped to a given Unicode character in that font, we have to choose one of the two glyphs. When the accented glyph is the one chosen, text in this font that is stored as U+0065 is drawn as é.
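To illustrate why one glyph has to lose, here is a minimal Python sketch (not jPDFWeb code). The table `pdf_glyphs`, the function `build_cmap`, and the first-entry-wins policy are all assumptions made for illustration; the rule jPDFWeb actually uses to pick a glyph is not described here.

```python
# Hypothetical glyph table mirroring the two entries listed above:
# glyph ID -> (glyph name, Unicode code point declared by the PDF)
pdf_glyphs = {
    0x65: ("e", 0x0065),
    0xE9: ("eacute", 0x0065),  # invalid: an accented e should map to U+00E9
}


def build_cmap(glyphs):
    """Build a Unicode -> glyph ID table that allows one glyph per code point.

    Assumed policy for this sketch: the first glyph seen keeps the code point,
    and any later glyph claiming the same code point is reported as dropped.
    """
    cmap = {}
    dropped = []
    for gid, (_name, codepoint) in glyphs.items():
        if codepoint in cmap:
            dropped.append(gid)
        else:
            cmap[codepoint] = gid
    return cmap, dropped


cmap, dropped = build_cmap(pdf_glyphs)
print({f"U+{cp:04X}": f"GID {gid:02X}" for cp, gid in cmap.items()})
print([f"GID {gid:02X}" for gid in dropped])
```

Whichever glyph ends up in the table is the one a viewer draws for every U+0065 in that font, so the other glyph's shape can no longer be reached from the text.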
Ways to confirm that the Unicode mapping in the PDF document is invalid:
• Using Adobe Acrobat, look into the document structure to inspect the font and Unicode mapping.
• Copy text containing the accented é character from Adobe Reader or PDF Studio into Word; you will get a non-accented e.
• Export the PDF to HTML in Adobe and note that the accented é character is exported as a non-accented e.
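The same check can be scripted if you can extract the font's ToUnicode CMap as text. The sketch below is a rough illustration under stated assumptions: `SAMPLE_TOUNICODE` is a made-up fragment that mirrors the two entries above, `duplicate_targets` is a hypothetical helper, and only `beginbfchar` / `endbfchar` sections are handled (`bfrange` sections are ignored).

```python
import re
from collections import defaultdict

# Made-up ToUnicode fragment for illustration; a real CMap contains more sections.
SAMPLE_TOUNICODE = """\
2 beginbfchar
<65> <0065>
<E9> <0065>
endbfchar
"""


def duplicate_targets(cmap_text):
    """Return {unicode_hex: [source codes]} for targets claimed by more than one code."""
    by_target = defaultdict(list)
    # Look only inside bfchar sections, where each line is "<source> <unicode>".
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            by_target[dst.upper()].append(src.upper())
    return {dst: srcs for dst, srcs in by_target.items() if len(srcs) > 1}


duplicates = duplicate_targets(SAMPLE_TOUNICODE)
print(duplicates)
```

Any entry in the result is a Unicode value claimed by more than one source code, which is the situation described for this document.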
One way to see that jPDFWeb is converting to the correct Unicode character:
• Remove the embedded font from the SVG and replace it with another font. You will see an ‘e’ character instead of the é.
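You can also look at the code points stored in the SVG text directly, independent of any font. This is a small sketch using Python's standard library; `SAMPLE_SVG` is a made-up stand-in for jPDFWeb output, and `text_code_points` is a hypothetical helper that only reads `<text>` elements in the SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

# Made-up SVG for illustration; real converter output also carries an embedded font.
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="0" y="10">cafe</text>'
    "</svg>"
)


def text_code_points(svg_source):
    """List the code points of all non-space characters inside <text> elements."""
    root = ET.fromstring(svg_source)
    chars = []
    for text_el in root.iter(SVG_NS + "text"):
        chars.extend("".join(text_el.itertext()))
    return [f"U+{ord(ch):04X}" for ch in chars if not ch.isspace()]


points = text_code_points(SAMPLE_SVG)
print(points)
```

If the list contains U+0065 and no U+00E9 where the page shows é, the text itself is a plain e and the accent comes only from the embedded font's glyph.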
The upstream cause of this bad font information could be the original font itself or the software that was used to create the PDF file with this font.