Processing scanned PDF documents

PDF-XChange Viewer SDK for Developer's
(ActiveX and Simple DLL Versions)


anti_g

Processing scanned PDF documents

Post by anti_g

Hello,

I wonder if there are some special system requirements for scanners which create PDF documents. We have an interesting phenomenon here: some scanned documents work fine with annotations and text functions and some are "rejected" by certain text functions. The function GetAllText seems to be especially "picky".
The command "Find" works sometimes even if GetAllText fails.
Attached are two sample PDFs: the one scanned with a Canon scanner works, while the other does not even allow highlighting.

It is clear to me that not all scanned documents can be processed correctly, but I need to know which ones can, why, and what the limitations are, so that I can inform our customers.

Thanks,
Anton.
scannedFiles.ZIP
Corwin - Tracker Sup

Re: Processing scanned PDF documents

Post by Corwin - Tracker Sup

Hi Anton,

Scanning by itself produces only an image of the page, with no selectable text.

I suspect that when you scanned Canon_DR_2580C.pdf, the scanner also ran an OCR (Optical Character Recognition) program, probably supplied with it, which added selectable text to the PDF.

At this time we do not offer an OCR solution. We have been working on our own OCR library for more than four years, but it is not yet available as a commercial release.
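To get at the "which ones" part of the question, a scanned PDF can be roughly classified by looking at its content streams. A page that only draws an image (for example `/Im0 Do`) gives text functions nothing to work on. OCR output, by contrast, adds text-showing operators (`Tj`/`TJ`), and many OCR tools place that text in the invisible text rendering mode `3 Tr` so it sits hidden behind the page image. The Python sketch below is a byte-level heuristic under stated assumptions, not part of the PDF-XChange Viewer SDK: `classify_pdf`, `stream_bodies` and the result labels are hypothetical names, and it assumes streams are either Flate-compressed or uncompressed (no object streams, encryption, or other filters).

```python
import re
import zlib

# Heuristic patterns for PDF content-stream operators (not a real PDF parser).
STREAM_RE = re.compile(rb"\bstream\r?\n(.*?)endstream", re.S)
SHOW_TEXT_RE = re.compile(rb"(?:\)|>)\s*Tj|\]\s*TJ")      # text-showing operators
INVISIBLE_RE = re.compile(rb"\b3\s+Tr\b")                 # rendering mode 3 = invisible text
DRAW_XOBJECT_RE = re.compile(rb"/[A-Za-z0-9_.]+\s+Do\b")  # draws an XObject, e.g. a page image


def stream_bodies(pdf):
    """Yield every stream body, inflated when it is zlib/Flate data."""
    for match in STREAM_RE.finditer(pdf):
        raw = match.group(1)
        try:
            # decompressobj tolerates the trailing EOL before "endstream"
            yield zlib.decompressobj().decompress(raw)
        except zlib.error:
            yield raw  # not zlib data (uncompressed, JPEG, ...): scan it as-is


def classify_pdf(pdf):
    """Hypothetical helper: guess whether a PDF has a usable text layer."""
    has_text = has_invisible = has_xobject = False
    for body in stream_bodies(pdf):
        has_text |= bool(SHOW_TEXT_RE.search(body))
        has_invisible |= bool(INVISIBLE_RE.search(body))
        has_xobject |= bool(DRAW_XOBJECT_RE.search(body))
    if has_text and has_invisible:
        return "image with hidden OCR text"
    if has_text:
        return "text"
    if has_xobject:
        return "image only (needs OCR)"
    return "unknown"
```

For example, `classify_pdf(open("Canon_DR_2580C.pdf", "rb").read())` would be expected to report a text layer if the Canon software did run OCR, while an "image only" result for the other sample would be consistent with GetAllText and highlighting having no text to work on. Because the sketch scans every stream, including fonts and images, treat its answer as a hint rather than a verdict.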