Feature Request - font debugging

Post by **mathog** » Fri Feb 04, 2011 5:46 pm

I have been on a quest lately to squeeze PDFs down in size. One way that can be achieved is by sticking with the base 14 fonts. Doing that typically reduces a 3-page text file from 0.5 MB to around 25 KB. However, sticking with the base 14 fonts is easier said than done (see below), and finding the buried exceptions can be a pain. I have other tools (in Python and Perl) for listing component sizes in PDFs (which fonts are used, how big they are)

ftp://saf.bio.caltech.edu/pub/software/ ... ary.tar.gz

and these tell me when there is a problem. But they cannot show me where that problem is. It would be a huge help if PDF-XChange Viewer had some way to search by font, or at least to highlight text by font in some way.

Also, if fonts are used in a PDF which are not embedded, there isn't much of a warning. At best the text shows up as a little square (from who knows which default font) instead of a letter, but these can be hard to find when there are just a couple of them scattered across many pages. If we could set a default substitution font, one could hopefully do something like override its font size or color, so that these characters could be made to stand out by being huge, or being red.

Here is an example where this arises. In a particular MS Word document I did "select all" and then changed all the fonts to Times New Roman. Then I printed that to a PDF using PDF Creator (which goes through Ghostscript), opened it in PDF-XChange Viewer, and looked at the document properties, fonts. In addition to Times (for the text) and Symbol (for some bullets in a list) there was also one Arial and one TimesNewRoman. After much hunting (select text, text properties, examine fonts) it turned out that the Arial was a space generated by MS Word's automatic numbering scheme. That is,

i. blah
ii. blah

uses Arial for the space between the period and the blah. There was apparently no way to change this in Word, as the font there was already set to Times New Roman. The TimesNewRoman font was traced down to a single alpha in that font. Changed the one letter to an "a" and then changed the font to Symbol. Regenerated PDF and that eliminated that font, but the TimesNewRoman was still there.

Mon Feb 07, 2011 2:11 pm

Hello mathog,

I cannot answer this myself, so I have asked the dev guys for some advice and will post a more detailed answer here later.
In the meantime, if you can supply us with a sample doc + pdf files for testing, that will be quite helpful!

Regards,
Stefan

Post by **mathog** » Mon Feb 07, 2011 5:16 pm

Attached are example files, source .doc (in a .zip) and the resulting PDF.

The first example illustrates how Word 2003 autonumber uses Arial spaces even when the font on the autonumber and the entered text are both Times New Roman. The space is just a space so when it is written to a PDF it becomes a Helvetica space.

The second issue is Word 2003's handling of the "greater than or equal to" symbol. Using the Arial font, it is supposed to map to Helvetica, as in the preceding example. However, using "insert symbol" for this symbol puts in Unicode value 2265. This does not automatically map to Helvetica but instead requires the inclusion of a subset of Arial. (It also has the strange side effect of making that one character show up as font "MS PGothic" in Word even though everything around it is Arial. Even stranger, select that symbol, change the Font to Arial, and it stays "MS PGothic" - the font cannot be changed as it would be for other characters.) On the line below that is a >= using character 179 in the Symbol font, which acts as expected and stays as Symbol without requiring inclusion of any descriptor for Symbol.
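For hunting down characters like this in the source text before printing, something along these lines might help. It is only a rough sketch: it reads "2265" as the hexadecimal code point U+2265, and it assumes that characters which cannot be encoded in Windows code page 1252 are the likely suspects. That assumption is mine and has not been checked against what Word actually does; `outside_cp1252` and `sample` are made-up names for illustration.

```python
import unicodedata

def outside_cp1252(text):
    """Return (index, character, Unicode name) for each character
    that cannot be encoded in Windows code page 1252."""
    hits = []
    for i, ch in enumerate(text):
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            hits.append((i, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits

# Hypothetical sample line containing U+2265 and a Greek alpha.
sample = "x \u2265 3 and \u03b1 < 1"
for index, ch, name in outside_cp1252(sample):
    print(f"offset {index}: U+{ord(ch):04X} {name}")
```

A hit only says that the character deserves a second look in the font list; it does not prove that an extra font will be embedded.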

List the fonts in the PDF and one sees:

Font (Object): Size (Description) PP: <page list>
1 ( 7): 401 (/IOGUVG+Times-Roman) PP: 1
2 ( 9): 372 (/JIEEHM+Helvetica) PP: 1
3 ( 11): 442 (/KPSHBO+Arial) PP: 1
4 ( 13): 213 (/SBZKDV+Symbol) PP: 1
5 ( 15): 509 (/UFQSLH+Cambria) PP: 1
6 ( 17): 525 (/HSAEUE+Calibri) PP: 1
7 ( 19): 262 (/RDZRPI+LucidaSans) PP: 1

The Arial entry is the result of the inclusion of the first >= symbol. Every other Arial character was mapped to Helvetica in the PDF.
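Picking the exceptions out of a listing like this can be automated. The following is a minimal sketch, not the actual tool from the link above: it assumes the row format shown here, strips the six-letter subset tag in front of the "+", and reports every font whose name is not one of the 14 standard PDF fonts. The names `non_base14`, `ROW`, and `LISTING` are made up for the example.

```python
import re

# The 14 standard PDF fonts.
BASE14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# One row of the listing, e.g. "3 ( 11): 442 (/KPSHBO+Arial) PP: 1"
ROW = re.compile(r"^\s*(\d+)\s*\(\s*(\d+)\):\s*(\d+)\s*\(/([^)]+)\)\s*PP:\s*(.*)$")

def non_base14(listing):
    """Return (font name, page list) for every row that is not a base 14 font."""
    found = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # header or blank line
        # Drop the subset tag: six uppercase letters followed by '+'.
        name = re.sub(r"^[A-Z]{6}\+", "", m.group(4))
        if name not in BASE14:
            found.append((name, m.group(5).strip()))
    return found

LISTING = """\
Font (Object): Size (Description) PP: <page list>
1 ( 7): 401 (/IOGUVG+Times-Roman) PP: 1
2 ( 9): 372 (/JIEEHM+Helvetica) PP: 1
3 ( 11): 442 (/KPSHBO+Arial) PP: 1
4 ( 13): 213 (/SBZKDV+Symbol) PP: 1
5 ( 15): 509 (/UFQSLH+Cambria) PP: 1
6 ( 17): 525 (/HSAEUE+Calibri) PP: 1
7 ( 19): 262 (/RDZRPI+LucidaSans) PP: 1
"""

for name, pages in non_base14(LISTING):
    print(f"{name}: page(s) {pages}")
```

This still only says which fonts and which pages are involved, not where on the page the offending character sits, which is the part that needs help from the viewer.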

The third example is just a variety of different fonts embedded in text. Here they are moderately easy to see, but that is not always the case: a single letter or two buried in text with an unintended font can be miserably hard to spot.

Thanks.

Mon Feb 21, 2011 10:41 am

Hello mathog,

I discussed this with the devs. They were not too enthusiastic about it and told me that this FR will have low priority; if it is considered at all, it will definitely be later, after the initial release of Ver3 of the Viewer.
We have, however, created a support ticket for this, "#874: FR: font debugging", so it will not be forgotten.

I will post back here in the upcoming months when the status of the ticket changes. For now it is stalled until after ver3 is ready.

Best,
Stefan
