XREF Table Corrupt

Post by **HJBrown** » Thu Jul 02, 2009 7:13 pm

I have attached two PDF files that we received from two different customers of ours. These files became corrupt after being saved in the Trakker Viewer SDK. Looking at the two files in WordPad, I can see that both have a truncated XREF table.

Fri Jul 03, 2009 6:55 am

Yes, both files are truncated.

What can we say about this?

The Viewer does not truncate files, so some other issue must be involved.

We will need a small sample that reproduces the problem; otherwise we won't be able to help.

I can recover both of these files, but I can't say why they are truncated.

Post by **HJBrown** » Fri Jul 03, 2009 2:14 pm

Unfortunately, this problem is impossible to reproduce. It occurs at random and only in a very small percentage of cases.

Could you share with me how you are able to recover these files?

Sat Jul 04, 2009 7:44 am

OK, but this fix has to be made "by hand".
You will need to know the basics of the PDF file format: what the xref table, the trailer, a dictionary, an indirect object reference, and the document catalog are.
You will also need a binary file editor. A text editor will do only if it edits files in binary mode and leaves unedited text untouched. Notepad, WordPad, and Word, for example, are not suitable (they may convert newline characters, which is unacceptable because PDF is a binary format), but Notepad++ may be used. I use HexWorkshop.
Both problem files have a truncated xref table, so the trailer dictionary is missing. All we need to do is restore the trailer dictionary. To do this, locate the document catalog (required) and the information dictionary (optional) and note their object numbers. Then append the following to the file:
1. One or more newline characters.
2. The keyword trailer followed by a newline character.
3. A dictionary containing indirect references to the document catalog (and to the info dictionary, if any).
Then save the modified file; it can now be opened in the Viewer. The Viewer will report that the file is broken, but it will open the file and allow you to resave it.
For example, for the file corrupt-1.pdf you need to append the following:


trailer
<</Root 2 0 R/Info 1 0 R>>

This means that the document catalog is in object number 2 and the information dictionary is in object number 1. You can usually find the document catalog by searching for the string /Catalog preceded by the string /Type. However, some PDF files may not contain this pair, in which case you will need another way to locate it. When a file is saved using the Viewer (AX), both the info dictionary and the document catalog are among the first objects.
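If hand-editing is too slow, the search and the append could be scripted roughly as follows. This is only a sketch in Python and not part of the Viewer or its SDK. The names find_catalog, build_trailer and repair_copy are made up for illustration. It assumes the catalog is stored as a plain, uncompressed object containing /Type /Catalog, which, as said above, is not true for every file. The info dictionary reference is not detected automatically and has to be passed in by the caller.

```python
import re

# "N G obj ... endobj" with the object body captured (bytes patterns, binary-safe).
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)
# The /Type /Catalog pair, with or without whitespace between the two names.
CATALOG_RE = re.compile(rb"/Type\s*/Catalog\b")


def find_catalog(data):
    """Return (object number, generation) of the first object whose body
    contains /Type /Catalog, or None if no such object is found."""
    for match in OBJ_RE.finditer(data):
        if CATALOG_RE.search(match.group(3)):
            return int(match.group(1)), int(match.group(2))
    return None


def build_trailer(root, info=None):
    """Build the bytes to append: a newline, the trailer keyword, a newline,
    and a dictionary referencing the catalog (and the info dictionary, if given)."""
    entries = [b"/Root %d %d R" % root]
    if info is not None:
        entries.append(b"/Info %d %d R" % info)
    return b"\ntrailer\n<<" + b"".join(entries) + b">>\n"


def repair_copy(src_path, dst_path, info=None):
    """Write a copy of src_path with a minimal trailer appended.
    Files are opened in binary mode so newline bytes are never converted."""
    with open(src_path, "rb") as src:
        data = src.read()
    root = find_catalog(data)
    if root is None:
        raise ValueError("no /Type /Catalog object found")
    with open(dst_path, "wb") as dst:
        dst.write(data + build_trailer(root, info))
```

For corrupt-1.pdf, build_trailer((2, 0), (1, 0)) produces the same two lines shown above. The sketch writes to a separate output file so the damaged original stays untouched, and the result still has to be opened and resaved in the Viewer to get a proper xref table.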
HTH
PS. It looks like the files were truncated after saving, so this may not be a Viewer problem but one caused by other software that monitors files, or perhaps by your own program, if you use the IStream interface to read/save files.
