OCR Poor results

Posted: Thu Jun 06, 2024 2:15 pm
by Crookie
I am getting quite poor results from OCR.
I want to convert a scanned PDF document into an editable PDF without changing anything at all, just an editable reproduction. However, the results are consistently inconsistent, and any handwritten notes are converted to garbage.
If anyone has any ideas I would be most grateful; I have 50 installs waiting on the back of this.
The top image is the original; the others are results I don't want.

Original.png
Fault.png
Fault2.png

Re: OCR Poor results

Posted: Thu Jun 06, 2024 2:21 pm
by Dimitar - PDF-XChange
Hello Crookie,

Welcome to our Forum.

The OCR tool is not designed to recognize handwriting, but if you could give us a copy of the original document we will see what can be adjusted to get better results.

Regards.

Re: OCR Poor results

Posted: Fri Jun 07, 2024 6:19 am
by Crookie
This isn't just one document; this is just a sample I've been given.
We will be talking thousands, and it doesn't look like PDF-XChange is up to it.

Re: OCR Poor results

Posted: Fri Jun 07, 2024 7:49 am
by Stefan - PDF-XChange
Hello Crookie,

Unfortunately, the ABBYY FineReader engine that our Enhanced OCR uses is really focused on other types of text, and handwriting recognition is not its strength. Tesseract (the engine behind our standard OCR) might handle such text slightly better, so please do give that one a try as well. Unfortunately, we cannot really improve those OCR engines on our end, so if you have thousands of handwritten documents to OCR, we might not be able to fully help!

Kind regards,
Stefan

Re: OCR Poor results

Posted: Mon Sep 09, 2024 8:03 am
by Loki@99
Hi,

Not handwritten, but I also have poor OCR results with this file.
File_OCR.pdf
I would assume that the source is of quite poor quality, but Microsoft Snipping Tool gave very good results. The two are not even close.

PDFXCE
Language : French
Accuracy : Auto
Output : Fine Page content
image.png

Text extracted from Microsoft Snipping Tool text actions
image(1).png

Thanks for improving,

Re: OCR Poor results

Posted: Mon Sep 09, 2024 5:36 pm
by MedBooster
Maybe you could add an option to switch between Tesseract and ABBYY in the standard menu?
image.png

Re: OCR Poor results

Posted: Mon Sep 09, 2024 6:15 pm
by Paul - PDF-XChange
Hi, MedBooster

Regarding the sample we were sent, the issue is that the original really is poor, even for human eyes!

image.png

I am afraid that switching engines "on the fly", so to speak, has been rejected. That option will remain in the settings as is.