OCR pdf text extract problem

The PDF-XChange Viewer for End Users
+++ FREE +++


quant
User
Posts: 151
Joined: Fri Jan 18, 2008 2:48 pm

OCR pdf text extract problem

Post by quant »

Could you please improve text selection/extraction for PDF files that are OCRed scanned pages? Please see the attached sample page. PDF-XChange Viewer sometimes renders a space as no space at all, and sometimes as a new line, so the highlighted or extracted text is unreadable. This is also a problem when searching, because the word that precedes the search term is not displayed in the search results.
There is no such problem in Adobe, Foxit, or some PDF indexing programs: all of them extract and highlight the text as expected.

Thank you.
Ivan - Tracker Software
Site Admin
Posts: 3603
Joined: Thu Jul 08, 2004 10:36 pm

Re: OCR pdf text extract problem

Post by Ivan - Tracker Software »

Thanks for the file.
Will work on improving text extraction.
PDF-XChange Co Ltd. (Project Director)

When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.