OCR pdf text extract problem

Post by **quant** » Thu Aug 28, 2008 11:29 pm

Could you please improve text selection/extraction for PDF files made from OCRed scanned pages? Please see the attached sample page. PDF-XChange sometimes renders a space as no space at all, and sometimes as a line break, so the highlighted or extracted text is unreadable. This also affects search, because the word preceding the search term is not displayed in the search results.
Adobe, Foxit, and some PDF indexing programs have no such problem: all of them extract and highlight this text as expected.

Thank you.
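For context, PDFs usually store OCR text as positioned glyphs rather than as words and lines. An extractor therefore has to infer spaces and line breaks from the gaps between glyphs, and a misjudged threshold produces exactly the symptoms described above. The following is a minimal sketch of one common gap-based heuristic, not PDF-XChange's actual algorithm. The `Glyph` record, the `reconstruct_text` function, and the `space_ratio` and `line_tol` parameters (and their default values) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Glyph:
    """Hypothetical positioned glyph, as an extractor might see it."""
    char: str
    x: float      # left edge, in page units
    y: float      # baseline, in page units (PDF y grows upward)
    width: float  # advance width, in page units


def reconstruct_text(glyphs, space_ratio=0.3, line_tol=0.5):
    """Rebuild text from positioned glyphs (illustrative heuristic only).

    space_ratio: a horizontal gap wider than this fraction of the median
                 glyph width is emitted as a space (hypothetical default).
    line_tol:    glyphs whose baselines differ by at most this fraction of
                 the median glyph width are treated as the same line.
    """
    if not glyphs:
        return ""
    med_w = median(g.width for g in glyphs)

    # Group glyphs into lines: top of the page first, then left to right.
    lines = []
    for g in sorted(glyphs, key=lambda g: (-g.y, g.x)):
        if lines and abs(lines[-1][0].y - g.y) <= line_tol * med_w:
            lines[-1].append(g)
        else:
            lines.append([g])

    out = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        parts = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            # Gap between the end of the previous glyph and the start of this one.
            gap = cur.x - (prev.x + prev.width)
            if gap > space_ratio * med_w:
                parts.append(" ")
            parts.append(cur.char)
        out.append("".join(parts))
    return "\n".join(out)
```

With a threshold that is too high, real word gaps in OCR output are dropped ("no space"). With a baseline tolerance that is too tight, small vertical jitter from OCR splits one line into several ("new line").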

Mon Sep 01, 2008 2:48 pm

Thanks for the file.
Will work on improving text extraction.
