PDF not searchable (although it contains copiable text)

Victor Warner
User
Posts: 143
Joined: Wed Sep 23, 2009 8:15 am

Post by Victor Warner »

I have a PDF, obtained from a UK government website, in which I cannot find any text using Find.

There are some security measures attached to the file (page extraction is not allowed).

(For the sake of completeness, I also tried finding text using Adobe Acrobat 10 and Skim, an Apple Mac PDF viewer which allows adding notes etc. They were also unable to find text.)

Although this is not purely a PDF-XChange problem, it does affect the program.

Do the security measures applied simply mean that no PDF program can find text in it?

The file is attached.
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: PDF not searchable (although it contains copiable text)

Post by Stefan - PDF-XChange »

Hello Victor,

Yes - the problem is in the file, not in our Viewer - as you've already noticed the same problem happens with other PDF reading software.

The problem is not only with "finding" text but also with copy/pasting it. Try copying some text from this file and pasting it into e.g. Notepad, and you will see some gibberish characters.

So if you enter this gibberish in the viewer's search field and perform a search, it will be found inside the document.

Best,
Stefan
srn123
User
Posts: 28
Joined: Fri Jul 31, 2009 11:52 am

Re: PDF not searchable (although it contains copiable text)

Post by srn123 »

Is that because of the font encoding? The font used is embedded and is not substituted with a font that is available on the local computer. I have seen similar issues with other documents, but they typically involve TrueType fonts.
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: PDF not searchable (although it contains copiable text)

Post by Stefan - PDF-XChange »

Hi RamNarayan,

Text is stored inside a PDF file in a very specific way. Such text can be rendered perfectly well, but when the additional information required for its extraction is not included, it is completely unusable for anything other than viewing it on screen (or OCRing :) ).

Best,
Stefan
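
To illustrate the point made in the replies above, here is a minimal toy sketch in Python. It is not a PDF parser and says nothing certain about this particular file. It assumes one common cause of the symptom: an embedded subset font whose character codes are arbitrary, with no code-to-Unicode table (in PDF terms, a ToUnicode CMap) included. All names (GLYPH_SHAPES, TO_UNICODE, render, extract) are hypothetical and exist only for this illustration.

```python
# Toy model only: every name here is made up for illustration.

# The embedded font decides what each code LOOKS like when drawn.
# In a subset font the codes can be arbitrary; here 0x21 draws "T", etc.
GLYPH_SHAPES = {0x21: "T", 0x22: "a", 0x23: "x"}

# Optional extra table that says what each code MEANS as text.
# This plays the role of the "additional information required for extraction".
TO_UNICODE = {0x21: "T", 0x22: "a", 0x23: "x"}

# The page itself stores only the codes, not the characters.
PAGE_CODES = bytes([0x21, 0x22, 0x23])


def render(codes):
    """What the reader sees on screen: glyph shapes looked up in the font."""
    return "".join(GLYPH_SHAPES[c] for c in codes)


def extract(codes, to_unicode=None):
    """What Find and copy/paste see.

    With a mapping table the real text comes back. Without one, this sketch
    assumes the viewer falls back to reading the raw codes as Latin-1.
    """
    if to_unicode is not None:
        return "".join(to_unicode[c] for c in codes)
    return codes.decode("latin-1")


on_screen = render(PAGE_CODES)                  # looks fine when displayed
with_table = extract(PAGE_CODES, TO_UNICODE)    # searchable text
without_table = extract(PAGE_CODES)             # gibberish

print("on screen:     ", on_screen)
print("with table:    ", with_table)
print("without table: ", without_table)
print("Find 'Tax' works:    ", "Tax" in without_table)
print("Find gibberish works:", without_table in extract(PAGE_CODES))
```

In this model the page displays correctly in both cases, because rendering only needs the glyph shapes. Searching for the visible word fails when the table is missing, while searching for the pasted gibberish succeeds, which matches the behaviour described in the thread.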