Can you directly call many online OCRs, such as Baidu OCR.

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


softschool
User
Posts: 15
Joined: Thu Jan 19, 2023 11:45 am

Can you directly call many online OCRs, such as Baidu OCR.

Post by softschool »

Can PDF-XChange Editor directly call online OCR services, such as Baidu OCR, in place of its built-in OCR engine? For some AutoCAD-generated PDFs that use SHX fonts for English and Chinese text, recognition with the built-in OCR is terrible, and ABBYY FineReader PDF also does very poorly, whereas Baidu OCR does very well, with accuracy of almost 95%!

Sample PDF for testing:
https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link
Stefan - PDF-XChange
Site Admin
Posts: 19793
Joined: Mon Jan 12, 2009 8:07 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by Stefan - PDF-XChange »

Hello softschool,

No, I am afraid not. OCR is quite a resource-heavy process, and in our products it has to be performed on your machine.
If you need to use an online tool, you can use the Editor to export images of the original PDF pages (at a customizable resolution) and then pass those images to the online OCR service.
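Picking the export resolution is the main decision in that workflow. Large AutoCAD sheets turn into very large images at high DPI, and many cloud OCR services limit the image size they accept (check the provider's documentation for the actual limit). A minimal sketch of the arithmetic, assuming page sizes given in PDF points (1/72 inch). The 4096 px limit used in the example is a hypothetical value, not a documented Baidu limit:

```python
import math

POINTS_PER_INCH = 72  # PDF user-space units are 1/72 inch by default


def page_pixels(width_pt: float, height_pt: float, dpi: float) -> tuple:
    """Pixel size of a page rendered (exported) at the given DPI."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def max_dpi_for_limit(width_pt: float, height_pt: float, max_side_px: int) -> int:
    """Largest whole DPI whose longest rendered side stays within max_side_px.

    max_side_px is an assumed, provider-specific limit; look it up for the service you use.
    """
    longest_pt = max(width_pt, height_pt)
    return math.floor(max_side_px * POINTS_PER_INCH / longest_pt)
```

For example, an A1 sheet is roughly 2384 x 1684 pt, so at 300 DPI it exports to about 9933 x 7017 px, while a 4096 px cap allows only 123 DPI. Small SHX text may not survive that downscaling, in which case exporting at a higher DPI and splitting each page into tiles is one way around the cap.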

Have you tried the different settings in the Editor's OCR window? Are all of them producing bad results?
Do you have a sample file you could share?

Kind regards,
Stefan
softschool
User
Posts: 15
Joined: Thu Jan 19, 2023 11:45 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by softschool »

https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link

You can try this PDF document. Neither PDF-XChange nor ABBYY FineReader can recognize it well. I know Baidu OCR can, but it is not an application or a web page; it is just a cloud service (free or paid)!

Here is the introduction page (in Chinese):

Baidu OCR general text recognition (high precision version with location)
https://ai.baidu.com/ai-doc/OCR/tk3h7y2aq
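Since the service is only reachable through its API, here is a minimal sketch of sending one exported page image to it from Python. It assumes the token endpoint, the `accurate` (high precision, with location) endpoint, the form-encoded base64 `image` parameter and the `words_result` / `words` / `location` / `error_code` response fields as I understand Baidu's documentation; verify all of them against the linked page before relying on this. The API key and secret are your own credentials:

```python
import base64
import json
import urllib.parse
import urllib.request

# Assumed endpoints; verify against Baidu's OCR documentation before use.
TOKEN_URL = "https://aip.baidubce.com/oauth/2.0/token"
ACCURATE_URL = "https://aip.baidubce.com/rest/2.0/ocr/v1/accurate"  # high precision, with location


def get_access_token(api_key: str, secret_key: str) -> str:
    """Exchange an API key/secret pair for an access token (network call)."""
    query = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": api_key,
        "client_secret": secret_key,
    })
    req = urllib.request.Request(f"{TOKEN_URL}?{query}", method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["access_token"]


def build_ocr_request(image_bytes: bytes, token: str) -> urllib.request.Request:
    """Build a form-encoded POST carrying the base64-encoded page image."""
    body = urllib.parse.urlencode(
        {"image": base64.b64encode(image_bytes).decode("ascii")}
    ).encode("ascii")
    url = f"{ACCURATE_URL}?{urllib.parse.urlencode({'access_token': token})}"
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def parse_words(result: dict) -> list:
    """Return (text, location) pairs from an assumed `words_result` response."""
    if "error_code" in result:
        raise RuntimeError(f"OCR error {result['error_code']}: {result.get('error_msg')}")
    return [(item["words"], item.get("location", {}))
            for item in result.get("words_result", [])]


def ocr_image(path: str, token: str) -> list:
    """OCR one exported page image (network call)."""
    with open(path, "rb") as f:
        req = build_ocr_request(f.read(), token)
    with urllib.request.urlopen(req) as resp:
        return parse_words(json.load(resp))
```

Usage would be along the lines of `ocr_image("page-1.png", get_access_token(API_KEY, SECRET_KEY))`, where the file name and credentials are placeholders. Keeping request building and response parsing separate from the network calls makes those two parts easy to check offline.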
Stefan - PDF-XChange
Site Admin
Posts: 19793
Joined: Mon Jan 12, 2009 8:07 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by Stefan - PDF-XChange »

Hello softschool,

As you said, it is a cloud tool, so there is a server somewhere with enough processing power to OCR the content you upload to the service. We have no plans to include another OCR engine in our products for now, and I am sorry that the available engines are not recognizing your files correctly!

Kind regards,
Stefan
softschool
User
Posts: 15
Joined: Thu Jan 19, 2023 11:45 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by softschool »

https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link

This is a PDF document generated by AutoCAD, and because it uses SHX fonts, even ABBYY cannot recognize it. Could you contact ABBYY and ask them to add recognition of this font, to improve the results of your OCR engine?

There are a great many PDF documents in this AutoCAD format, and in the translation industry we often have to deal with them. They are very difficult to translate and typeset, and the most troublesome part is converting their content into real text with OCR. I hope you can provide a solution for this!
Paul - PDF-XChange
Site Admin
Posts: 7356
Joined: Wed Mar 25, 2009 10:37 pm

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by Paul - PDF-XChange »

Hi softschool,

That is indeed a heavy OCR job! I will pass this on to the team to see what they think we or ABBYY can do.

Best regards

Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
Jensen Head
User
Posts: 529
Joined: Mon Sep 13, 2021 8:12 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by Jensen Head »

I came across an interesting article, "OCR in 2024: Benchmarking Text Extraction/Capture Accuracy" [1]. Rather than start a new thread, I decided to post a link to it here, since this thread discusses alternative OCR engines and cloud OCR services.
[Image: overall accuracy chart from the article]
Category 1 – Web page screenshots that include texts: This category includes screenshots from random Wikipedia pages and Google search results with random queries.
Category 2 – Handwriting: This category includes random photos that include different handwriting styles.
Category 3 – Receipts, invoices, and scanned contracts: This category includes a random collection of receipts, handwritten invoices, and scanned insurance contracts collected from the internet.
I was impressed by the quality of Amazon Textract and the Google Cloud Platform Vision API for handwritten text recognition, a task for which even ABBYY FineReader 16 is poorly suited (only the Microsoft Azure Computer Vision API scored worse). Amazon's and Google's form recognition also does a better job than the latest version of ABBYY's engine.

[1] https://research.aimultiple.com/ocr-accuracy/
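Accuracy figures like the "almost 95%" quoted earlier in this thread or the benchmark numbers need a metric behind them. One common choice is character accuracy: 1 minus the character error rate, i.e. the edit distance between the OCR output and a ground-truth transcription divided by the transcription's length. A minimal sketch of that metric follows; it is not necessarily the method the article used:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def char_accuracy(reference: str, ocr_output: str) -> float:
    """1 - character error rate, clamped at 0 (CER can exceed 1 with many insertions)."""
    if not reference:
        return 1.0 if not ocr_output else 0.0
    cer = levenshtein(reference, ocr_output) / len(reference)
    return max(0.0, 1.0 - cer)
```

Working per character (rather than per word) suits mixed English and Chinese drawings, because Chinese text has no spaces to split words on.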
Stefan - PDF-XChange
Site Admin
Posts: 19793
Joined: Mon Jan 12, 2009 8:07 am

Re: Can you directly call many online OCRs, such as Baidu OCR.

Post by Stefan - PDF-XChange »

Hello Jensen Head,

Thanks for sharing that review with us.
Yes, ABBYY's engine is optimized more for recognizing typed text than handwritten text, so if you have lots of documents containing handwriting, you will probably need a specialized tool for that. Overall, given that most documents that need OCR are not handwritten, we are happy with how the OCR engines we use and offer to you are handling the tasks at hand!

Kind regards,
Stefan