XREF Table Corrupt

PDF-XChange Viewer SDK for Developers
(ActiveX and Simple DLL Versions)

Moderators: PDF-XChange Support, Daniel - PDF-XChange, Chris - PDF-XChange, Sean - PDF-XChange, Vasyl - PDF-XChange, Ivan - Tracker Software, Stefan - PDF-XChange

HJBrown
User
Posts: 38
Joined: Fri Feb 06, 2009 12:55 pm

XREF Table Corrupt

Post by HJBrown »

I have attached two PDF files which we received from two different customers of ours. These files became corrupt after saving in the Tracker Viewer SDK. Looking at the two files in WordPad, both have a truncated XREF table.
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: XREF Table Corrupt

Post by Lzcat - Tracker Supp »

Yes, both files are truncated.

And what can we say?

The Viewer does not truncate files, so there must be some other issue.

We will need a small sample which reproduces the problem, otherwise we won't be able to help.

I can recover both of these files, but I can't say why they are truncated.
Victor
Tracker Software
Project manager

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
HJBrown
User
Posts: 38
Joined: Fri Feb 06, 2009 12:55 pm

Re: XREF Table Corrupt

Post by HJBrown »

Unfortunately, this problem is impossible to reproduce. It occurs at random and only a very small percentage of the time.

Could you share with me how you are able to recover these files?
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: XREF Table Corrupt

Post by Lzcat - Tracker Supp »

OK, but this fix has to be made by hand.
You will need to know the basics of the PDF file format: what the xref table, trailer, dictionary, indirect object reference, and document catalog are.
You will also need a binary file editor. A text editor is only suitable if it edits files in binary mode and leaves the text you did not edit exactly as it was. For example, Notepad, WordPad, and Word are not suitable (they may convert newline characters, which is not acceptable because PDF is a binary format), but Notepad++ may be used. I use Hex Workshop.
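Before editing anything, it can help to confirm that a file really is cut off at the end. A minimal sketch in Python, assuming the usual layout in which a complete PDF ends with the startxref keyword, an offset, and the %%EOF marker. The function names and the 1024-byte window are illustrative choices, not part of the Viewer SDK. Note that the file is opened in binary mode ("rb"), for the same reason a binary editor is required.

```python
def looks_truncated(data: bytes, window: int = 1024) -> bool:
    """Return True if the end of the data lacks the usual PDF end markers.

    Assumption: a complete file has both 'startxref' and '%%EOF'
    somewhere in its last `window` bytes.
    """
    tail = data[-window:]
    return not (b"startxref" in tail and b"%%EOF" in tail)


def is_truncated_file(path: str) -> bool:
    """Read the file as raw bytes (no newline conversion) and check its tail."""
    with open(path, "rb") as f:
        return looks_truncated(f.read())
```

A check like this only looks at the end of the file; it says nothing about whether the xref table itself is correct.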
Both problem files have a truncated xref table, so the trailer dictionary is missing. All we need to do is recover the trailer dictionary. To do this, locate the document catalog (required) and the information dictionary (optional) and note their object numbers. Then append the following to the file:
1. One or more newline characters.
2. The keyword trailer, followed by a newline character.
3. A dictionary containing indirect references to the document catalog (and the info dictionary, if any).
After this, save the modified file; it can now be opened in the Viewer. The Viewer will report that the file is broken, but it will open the file and allow you to resave it.
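The three steps above can also be scripted instead of typed into a hex editor. The following is only a sketch in Python: build_trailer and append_trailer are hypothetical helper names, the generation numbers are assumed to be 0 unless you pass something else, and the result is written to a new file so the damaged original stays untouched.

```python
from typing import Optional


def build_trailer(root_obj: int, info_obj: Optional[int] = None,
                  root_gen: int = 0, info_gen: int = 0) -> bytes:
    """Build the bytes to append: newline, 'trailer', newline, dictionary."""
    entries = f"/Root {root_obj} {root_gen} R"
    if info_obj is not None:
        # The info dictionary is optional, so /Info is added only if known.
        entries += f"/Info {info_obj} {info_gen} R"
    return f"\ntrailer\n<<{entries}>>\n".encode("ascii")


def append_trailer(src: str, dst: str, root_obj: int,
                   info_obj: Optional[int] = None) -> None:
    """Copy src to dst in binary mode and append the rebuilt trailer."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(data + build_trailer(root_obj, info_obj))
```

The object numbers still have to be found by inspecting the file; the script only saves the manual typing.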
For example, for the file corrupt-1.pdf you will need to append the following:

Code:


trailer
<</Root 2 0 R/Info 1 0 R>>
This means that the document catalog is located in object number 2 and the information dictionary in object number 1. You can easily find the document root object by searching for the string /Catalog preceded by the string /Type. However, some PDF files may not contain this pair, so you will need another way to locate it. When a file is saved using the Viewer (AX), both the info dictionary and the document catalog are among the first objects.
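The search for /Type followed by /Catalog can be automated as well. Below is a rough Python sketch, not a real PDF parser: it assumes the catalog is stored as a plain, uncompressed object of the form "N G obj ... endobj", and it can be fooled by stream data that happens to contain the same keywords. find_catalog is a hypothetical name chosen for illustration.

```python
import re
from typing import Optional, Tuple

# "N G obj <body> endobj", matched on raw bytes so nothing is re-encoded.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# "/Type" followed by "/Catalog", with optional whitespace between them.
CATALOG_RE = re.compile(rb"/Type\s*/Catalog\b")


def find_catalog(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (object number, generation) of the first catalog-like object.

    Returns None if no such object is found, in which case the catalog
    has to be located some other way.
    """
    for match in OBJ_RE.finditer(data):
        if CATALOG_RE.search(match.group(3)):
            return int(match.group(1)), int(match.group(2))
    return None
```

The information dictionary has no /Type entry to search for, so it still has to be identified by eye, for example by looking among the first objects of a file saved by the Viewer.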
HTH
PS. It looks like the files were truncated after saving, so this may not be a Viewer problem but a problem with other software that monitors files. Or it may be your program, if you are using the IStream interface to read/save files.
Victor
Tracker Software
Project manager
