Image-only PDFs (scanned documents)

PDF-XChange Viewer SDK for Developer's
(ActiveX and Simple DLL Versions)

mcanti
User
Posts: 10
Joined: Fri Apr 08, 2011 7:54 am

Image-only PDFs (scanned documents)

Post by mcanti »

Hello,

Is there an easy way to find out whether a given PDF is an image-only PDF, for example to tell whether it came from a scanner?

Edit: I found this information in a Knowledge Base item:
Things that indicate a PDF might be image based include:
  • if you know it came from a scanner
  • if you cannot select text using the "Select Tool"
  • if you get no results searching for a word that you know is in the document
  • if you zoom the document greatly and it gets pixelated
But how can I answer these questions programmatically?
And another question: when will the OCR functionality be available?

Best regards,
Cantemir
Vasyl - PDF-XChange
Site Admin
Posts: 2448
Joined: Thu Jun 30, 2005 4:11 pm

Re: Image-only PDFs (scanned documents)

Post by Vasyl - PDF-XChange »

Hi, Cantemir.
> if you know it came from a scanner
There is no way to detect this.

> if you cannot select text using the "Select Tool"
> if you get no results searching for a word that you know is in the document
You can detect whether the document contains selectable text with:

Code:

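object dataOut;
// Ask the viewer for all text in the document; the result comes back in dataOut.
DoDocumentVerb(docId, "", "GetAllText", NULL, out dataOut, 0);
// IS_NOT_EMPTY stands for whatever emptiness check suits your language.
if (IS_NOT_EMPTY(dataOut))
{
     // document has selectable text
}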
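
The IS_NOT_EMPTY check in the snippet is left undefined. As a rough sketch only, written in Python and independent of the SDK, the emptiness test could ignore whitespace so that a result holding nothing but line breaks still counts as "no selectable text". The function names and the min_chars threshold are hypothetical and are not part of PDF-XChange; the input is assumed to be the text already obtained from "GetAllText" and converted to a string.

```python
from typing import Optional


def has_selectable_text(extracted: Optional[str], min_chars: int = 1) -> bool:
    """Return True if `extracted` holds at least `min_chars` non-whitespace characters.

    `extracted` is assumed to be the text returned by the viewer
    (hypothetical wiring; obtaining it is outside this sketch).
    """
    if not extracted:
        # None or an empty string: nothing was extracted at all.
        return False
    # Count only visible characters; spaces, tabs, line and page breaks are ignored.
    visible = sum(1 for ch in extracted if not ch.isspace())
    return visible >= min_chars


def looks_image_only(extracted: Optional[str]) -> bool:
    """Heuristic: no selectable text suggests an image-only (scanned) document."""
    return not has_selectable_text(extracted)
```

Note that this only tells you whether a text layer exists. A scanned document that has already been run through OCR will contain text and will not be reported as image-only.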

> if you zoom the document greatly and it gets pixelated
There is no way to detect this.

In the new version (V3) you will have more access to the document structure and content for this kind of analysis...

Best regards,
PDF-XChange Co. LTD (Project Developer)
