OCR results from images in pdf

ArminS
User
Posts: 9
Joined: Tue Apr 26, 2016 6:51 am

OCR results from images in pdf

Post by ArminS »

Hello :)

As you can see in the image, the OCR results are not very good here. Is the text too small? For the OCR run I selected English and German as the languages and chose high quality. Slightly larger text on a white background gave better results, but those were not good either.

(To make the text visible, I moved the new layer to the bottom and removed the text formatting.)

Image
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: OCR results from images in pdf

Post by Stefan - PDF-XChange »

Hello ArminS,

Indeed, there were some issues with the OCR engine in 317.0. Please update to 317.1, where this should be resolved.

Regards,
Stefan
ArminS
User
Posts: 9
Joined: Tue Apr 26, 2016 6:51 am

Re: OCR results from images in pdf

Post by ArminS »

Ok thanks.
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: OCR results from images in pdf

Post by Stefan - PDF-XChange »

:)
ArminS
User
Posts: 9
Joined: Tue Apr 26, 2016 6:51 am

Re: OCR results from images in pdf

Post by ArminS »

I finally tested the new version 317.1 of PDF-XChange. It is much better than before. The right picture still contains about 3 mistakes per line, and the 1:1-sized picture of the "About" gives really weird results.
The left example uses the settings: High quality, language English only.
The right example uses the settings: High quality, language German only.

Image
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm

Re: OCR results from images in pdf

Post by Will - Tracker Supp »

Hi ArminS,

Please try using Medium accuracy - as counter-intuitive as it is, Medium often produces better results.

Cheers,

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
ArminS
User
Posts: 9
Joined: Tue Apr 26, 2016 6:51 am

Re: OCR results from images in pdf

Post by ArminS »

Indeed, thanks. With Medium on the same purple image, the recognized text has almost no mistakes.
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: OCR results from images in pdf

Post by Stefan - PDF-XChange »

Glad to hear that, ArminS.

When you use "Medium", the OCR tool relies more on dictionaries; when you use "High", it tries to recognize each letter on its own. So for normal text, Medium does indeed give better results. For unusual strings (e.g. license keys or some ID numbers, i.e. letter and number combinations), High might be better.

Regards,
Stefan
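
The Medium/High distinction described above can be illustrated with a small sketch. This is not how the PDF-XChange OCR engine is implemented; it is a toy model under stated assumptions. The word list `DICTIONARY`, the functions `per_character` and `dictionary_assisted`, and the `cutoff` value are all hypothetical names chosen for illustration. The only real API used is `difflib.get_close_matches` from the Python standard library.

```python
from difflib import get_close_matches

# Hypothetical tiny word list; a real engine would use full language dictionaries.
DICTIONARY = ["license", "version", "settings", "quality", "language"]


def per_character(tokens):
    """Toy stand-in for a 'High'-style mode: keep every recognized character as-is."""
    return list(tokens)


def dictionary_assisted(tokens, cutoff=0.8):
    """Toy stand-in for a 'Medium'-style mode.

    Each raw token is snapped to the closest dictionary word when one is
    similar enough (similarity >= cutoff); otherwise it is left unchanged.
    """
    corrected = []
    for token in tokens:
        matches = get_close_matches(token.lower(), DICTIONARY, n=1, cutoff=cutoff)
        corrected.append(matches[0] if matches else token)
    return corrected


# Assumed raw OCR output: two words with '1' misread for 'l', plus a made-up key.
raw_tokens = ["1icense", "qua1ity", "X7K9-QL2M"]

print(per_character(raw_tokens))
print(dictionary_assisted(raw_tokens))
```

In this sketch the dictionary pass repairs the two ordinary words and leaves the key alone because nothing in the word list is close to it. Lowering `cutoff` would make the dictionary pass more aggressive, which is the situation where a license key or ID could be rewritten into a real word, and why letter-by-letter recognition can be the safer choice for such strings.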