OCR Poor results

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


Crookie
User
Posts: 5
Joined: Thu Jun 06, 2024 2:01 pm

OCR Poor results

Post by Crookie »

I am getting quite poor results from OCR.
I want to convert a scanned PDF document into an editable PDF without changing anything at all, just an editable reproduction, but the results are consistently inconsistent, with any handwritten notes being converted to garbage.
If anyone has any ideas I would be most grateful; I have 50 installs waiting on the back of this.
The top image is the original; the others are results I don't want.

Original.png
Fault.png
Fault2.png
Dimitar - PDF-XChange
Site Admin
Posts: 2268
Joined: Mon Jan 15, 2018 9:01 am

Re: OCR Poor results

Post by Dimitar - PDF-XChange »

Hello Crookie,

Welcome to our Forum.

The OCR tool is not designed to recognize handwriting, but if you could give us a copy of the original document we will see what can be adjusted to get better results.

Regards.
Crookie
User
Posts: 5
Joined: Thu Jun 06, 2024 2:01 pm

Re: OCR Poor results

Post by Crookie »

This isn't just one document; this is only a sample I've been given.
We will be talking about thousands, and it doesn't look like PDF-XChange is up to it.
Stefan - PDF-XChange
Site Admin
Posts: 19913
Joined: Mon Jan 12, 2009 8:07 am

Re: OCR Poor results

Post by Stefan - PDF-XChange »

Hello Crookie,

Unfortunately, the ABBYY FineReader engine that our Enhanced OCR uses is focused on other types of text, and handwriting recognition is not its strength. Tesseract (the engine behind our standard OCR) might handle such text slightly better, so please do give that one a try as well. Unfortunately, we cannot really improve those OCR engines on our end, so if you have thousands of handwritten documents to OCR, we might not be able to fully help!

Kind regards,
Stefan
Loki@99
User
Posts: 558
Joined: Sat Dec 16, 2023 11:09 am

Re: OCR Poor results

Post by Loki@99 »

Hi,

This one is not handwritten, but I also get poor OCR results with this file.
File_OCR.pdf
I assume the source is of quite poor quality, but Microsoft Snipping Tool gave very good results. The difference is not even close.

PDFXCE
Language : French
Accuracy : Auto
Output : Fine Page content
image.png

Text extracted from Microsoft Snipping Tool text actions
image(1).png
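
One rough way to put a number on a gap like this, assuming a hand-typed reference transcription of the page is available, is a character-level similarity ratio for each engine's output. This is only a minimal sketch using Python's standard library; the helper names and the sample strings are hypothetical and are not taken from the attached file.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Collapse runs of whitespace and lowercase, so layout differences are ignored."""
    return " ".join(text.split()).lower()


def similarity(reference: str, ocr_output: str) -> float:
    """Return a 0.0-1.0 similarity ratio between a reference text and an OCR result."""
    return SequenceMatcher(None, normalize(reference), normalize(ocr_output)).ratio()


# Hypothetical sample strings, for illustration only.
reference = "Facture numéro 2024-001"
engine_a = "Facture numéro 2024-001"   # a clean recognition result
engine_b = "Fac1ure num6ro 2O24-OO1"   # a result with typical character confusions

print(f"engine A: {similarity(reference, engine_a):.2f}")
print(f"engine B: {similarity(reference, engine_b):.2f}")
```

A ratio near 1.0 means the output is close to the reference. The score says nothing about layout or text placement in the PDF, so it is only a first check.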

Thanks for improving,
MedBooster
User
Posts: 1372
Joined: Mon Nov 15, 2021 8:38 pm

Re: OCR Poor results

Post by MedBooster »

Maybe you could add an option to switch between Tesseract and ABBYY in the standard menu?
image.png
Paul - PDF-XChange
Site Admin
Posts: 7370
Joined: Wed Mar 25, 2009 10:37 pm

Re: OCR Poor results

Post by Paul - PDF-XChange »

Hi, MedBooster

Regarding the sample we were sent, the issue is that the original really is poor, even for human eyes!

image.png

I am afraid that switching engines "on the fly", so to speak, has been rejected. The engine choice will remain in the settings as it is.
Best regards

Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com