Processing scanned PDF documents

Post by **anti_g** » Tue Oct 20, 2009 4:05 pm

Hello,

I wonder whether there are special requirements for scanners that create PDF documents. We are seeing an interesting phenomenon: some scanned documents work fine with annotations and text functions, while others are "rejected" by certain text functions. The function GetAllText seems to be especially "picky".
The "Find" command sometimes works even when GetAllText fails.
Attached are two sample PDFs: the one scanned with a Canon scanner works, while the other does not even allow highlighting.

It is clear to me that not all scanned documents can be processed correctly, but I need to know which ones, why, and what the limitations are, so that I can inform our customers.

Thanks,
Anton.

scannedFiles.ZIP

Tue Oct 20, 2009 4:21 pm

Hi Anton,

Scanning actually produces an image of the page, with no selectable text.

I suspect that when you scanned Canon_DR_2580C.pdf, the scanning software also ran an OCR (Optical Character Recognition) program, which adds selectable text to those PDFs.

At this time we do not offer an OCR solution. We have been working on our own OCR library for more than four years, but it is not yet available as a commercial release.
