Could you please improve text selection and extraction for PDF files that are OCRed scanned pages? Please see the attached sample page. PDF-XChange sometimes renders a space as no space at all, and sometimes as a new line, so the highlighted or extracted text is unreadable. This is also a problem when searching, because the word that precedes the search term is not displayed in the search results.
There is no such problem in Adobe, Foxit, or some PDF indexing programs: all of them extract and highlight text as expected.
Thank you.
OCR pdf text extract problem
-
quant
- User
- Posts: 151
- Joined: Fri Jan 18, 2008 2:48 pm
OCR pdf text extract problem
-
Ivan - Tracker Software
- Site Admin
- Posts: 3603
- Joined: Thu Jul 08, 2004 10:36 pm
Re: OCR pdf text extract problem
Thanks for the file.
Will work on improving text extraction.
PDF-XChange Co Ltd. (Project Director)
When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.