Image-only PDFs (scanned documents)

Post by **mcanti** » Tue May 03, 2011 6:15 am

Hello,

Is there an easy way to find out whether a given PDF is an image-only PDF, for example one that came from a scanner?

Edit: I found this info as a Knowledge Base item:

Things that indicate a PDF might be image based include:
if you know it came from a scanner
if you cannot select text using the "Select Tool"
if you get no results searching for a word that you know is in the document
if you zoom the document greatly and it gets pixelated

But how can I answer these questions programmatically?
And another question: when will the OCR functionality be available?

Best regards,
Cantemir

Post by **Vasyl - PDF-XChange** » Tue May 03, 2011 9:42 am

Hi, Cantemir.

if you know it came from a scanner

There is no way to detect this programmatically.

if you cannot select text using the "Select Tool"
if you get no results searching for a word that you know is in the document

You can detect whether the document contains selectable text like this:


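object dataOut;
// Ask the control for all text of the document identified by docId
DoDocumentVerb(docId, "", "GetAllText", NULL, out dataOut, 0);
// IS_NOT_EMPTY is a placeholder for your own "non-empty result" check
if (IS_NOT_EMPTY(dataOut))
{
     // document has selectable text
}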
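
As a rough illustration of what the IS_NOT_EMPTY check above might do, here is a minimal sketch in Python. It is not part of the SDK: the function names has_selectable_text and looks_image_only are hypothetical, and it assumes you have already converted the GetAllText result to a plain string. Treating whitespace-only output as "no text" is also an assumption, made because extracted text may consist only of line or page breaks.

```python
def has_selectable_text(extracted_text):
    """Return True if the extracted text has at least one non-whitespace character.

    extracted_text is assumed to be the GetAllText result converted to a
    string, or None if nothing was returned (hypothetical convention).
    """
    return bool(extracted_text and extracted_text.strip())


def looks_image_only(extracted_text):
    """Heuristic: a document with no extractable text may be image-only."""
    return not has_selectable_text(extracted_text)
```

Keep in mind this is only a heuristic. A scanned PDF that has already been through OCR carries a hidden text layer and will be reported as having text, and a page that contains only vector drawings will be reported as having none.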

if you zoom the document greatly and it gets pixelated

There is no way to detect this programmatically either.

In the new version, V3, you will have more access to the document structure/content for any analysis...

Best regards
