OCR - Optical Character Recognition

makahajo
User
Posts: 5
Joined: Mon Aug 03, 2009 5:16 am

OCR - Optical Character Recognition

Post by makahajo »

Hi,
I would like to apply OCR to my PDF documents in PDF-XChange PRO.

I have PDF-XChange PRO version 2.0 (build 42.6).

With some documents the "Select Text" tool does not work.

In my TIFF viewer (mspview.exe from Microsoft) there is a prompt asking me whether I want to apply OCR to the document.

I do not see an OCR command or option in PDF-XChange PRO.

My current workaround is to use FILE-EXPORT-EXPORT TO IMAGE with the TIFF option,
then open the exported file in mspview.exe and apply OCR there.
Obviously this is cumbersome.

Am I missing something? Any suggestions?

Is OCR available for PDF-XChange PRO?

Thanks in advance.
Bhikkhu Pesala
User
Posts: 1776
Joined: Tue May 29, 2007 9:29 am

Re: OCR - Optical Character Recognition

Post by Bhikkhu Pesala »

No, you're not missing anything. I'm surprised anyone would expect a PDF viewer to provide OCR capabilities. Does Adobe Reader 9 do that? The route that you're taking is probably the best, though JPG would probably be just as good as TIFF for OCR, and a lot smaller. PNG would be slightly smaller than TIFF.
Windows 10 Home 64-bit • AMD Ryzen 5 3400G, 8 Gb
Review: http://www.softerviews.org/PDF-XChange.html
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: OCR - Optical Character Recognition

Post by Lzcat - Tracker Supp »

Actually JPEG is not good for OCR, especially in high compression mode (which means very low quality in this case). It is a lossy format, and JPEG compression "destroys" high-contrast lines and curves, making small text unreadable. For OCR purposes a much better solution is to scan at twice the DPI as a black and white image and then use CCITT compression (most OCR programs prefer 300 DPI as a good compromise between file size, recognition time and quality).
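To put rough numbers on the file-size side of that trade-off, here is a minimal Python sketch. The function name `raw_scan_bytes` and the A4 page size are assumptions chosen for illustration; the figures are uncompressed bitmap sizes only, before CCITT or any other compression, and ignore row padding and file headers.

```python
def raw_scan_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size in bytes (no row padding, no file headers)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed example page: A4, roughly 8.27 x 11.69 inches.
A4 = (8.27, 11.69)

for label, dpi, bpp in [
    ("300 DPI, 24-bit colour", 300, 24),
    ("300 DPI, 1-bit black and white", 300, 1),
    ("600 DPI, 1-bit black and white", 600, 1),
]:
    size = raw_scan_bytes(A4[0], A4[1], dpi, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

Doubling the DPI quadruples the pixel count, while dropping from 24-bit colour to 1-bit black and white cuts the raw data by a factor of 24, so a bilevel scan at twice the resolution is still much smaller than a colour scan before compression is applied.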
Victor
Tracker Software
Project manager

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
Bhikkhu Pesala
User
Posts: 1776
Joined: Tue May 29, 2007 9:29 am

Re: OCR - Optical Character Recognition

Post by Bhikkhu Pesala »

I don't mind being corrected when I'm wrong, but in this case you are misinformed.

Tested with a PNG screen grab of your post, zoomed to 200% in Opera, using the Kadmos plugin for IrfanView.
The resulting screen grab is therefore the equivalent of 192 dpi.

Actually JPEG is not g00d for OCR, espedally with high compres~on mode (and very low quality in this case) it is lossy format, and jpeg comp~on "destroy" contrast lines and curves, making small text unreadable. Fol OCR purpo~s much better solution is to scan with twice higher DP| into black and white image and than use CC|lT compres~on (most OCR programs prefer 300 DP| as g00d compromise with file size, r~ognition time and quality).

Tested with a JPG saved at quality 75 from IrfanView

Actuatly JPEG is not good for OCR, especially with high compression mode (and vely low quality in this case) - it is lossy format, and jpeg compression ''destroy'' contrast lines and curves, making small text unreadable. For OCR purposes much better solution is to scan with twice higher DPI into black and white image and than use CC|lT compression (most OCR programs prefer 300 DP| as good compromise with fi|e size, recognition time and quality).

From the above results, my conclusion is that JPG format is actually better than lossless PNG.
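
A comparison like this can be made less subjective by scoring each OCR result against the known original text. The following is only a sketch: `ocr_similarity` is a hypothetical helper name, and `difflib`'s similarity ratio is used here as a rough stand-in for a proper character error rate. The sample strings are the opening words of the two results above.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_output: str) -> float:
    """Return a 0..1 similarity score; 1.0 means the OCR text matches exactly."""
    return SequenceMatcher(None, reference, ocr_output, autojunk=False).ratio()


# Opening words of the original post and of the two OCR results.
reference = "Actually JPEG is not good for OCR"
png_result = "Actually JPEG is not g00d for OCR"
jpg_result = "Actuatly JPEG is not good for OCR"

png_score = ocr_similarity(reference, png_result)
jpg_score = ocr_similarity(reference, jpg_result)

print(f"PNG: {png_score:.3f}")
print(f"JPG: {jpg_score:.3f}")
```

Scoring the full paragraphs rather than one phrase, and repeating the test at several zoom levels and JPG quality settings, would show whether the difference holds beyond a single screen grab.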
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: OCR - Optical Character Recognition

Post by Lzcat - Tracker Supp »

Can you provide both the JPEG and PNG files? And please say which OCR software you are using.
This is certainly off-topic, but the results look very strange to me, especially for text captured from the screen, which is an almost ideal case for OCR.