
Can you directly call many online OCRs, such as Baidu OCR.

Posted: Sat Feb 18, 2023 6:29 pm
by softschool
Can PDF-XChange Editor call online OCR services such as Baidu OCR directly, in place of the OCR engine bundled with the software? For some PDFs generated by AutoCAD that use SHX fonts for English and Chinese text, the built-in OCR gives terrible results, and ABBYY FineReader PDF is also very poor, but Baidu OCR works very well, with an accuracy of almost 95%!

Sample PDF document for testing:
https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Mon Feb 20, 2023 4:07 pm
by Stefan - PDF-XChange
Hello softschool,

No - I am afraid that because OCR is quite a resource-heavy process, it has to be performed on your machine.
If you need to use an online tool - you can use the Editor to export images of the original PDF pages (with a customizable resolution), and then pass those to the online OCR.
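The export-then-upload workflow described above could be scripted roughly as follows. This is only a minimal sketch, assuming the pages were already exported from the Editor as PNG files into one folder. The endpoint URL, the `image` form field, and the `lines`/`text` response layout are hypothetical placeholders, not the API of Baidu OCR or any other real service; substitute whatever your provider documents.

```python
import base64
import json
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical endpoint: replace with the URL your OCR provider documents.
OCR_ENDPOINT = "https://ocr.example.invalid/recognize"


def build_request_body(image_bytes: bytes) -> bytes:
    """Form-encode a base64 image (assumed field name: "image")."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return urllib.parse.urlencode({"image": encoded}).encode("ascii")


def extract_text(response_json: str) -> str:
    """Join recognized lines from an assumed shape: {"lines": [{"text": ...}]}."""
    data = json.loads(response_json)
    return "\n".join(line["text"] for line in data.get("lines", []))


def ocr_exported_pages(folder: str, endpoint: str = OCR_ENDPOINT) -> dict[str, str]:
    """Send every exported PNG in `folder` to the service, keyed by file name."""
    results = {}
    # Files are processed in file-name order.
    for path in sorted(Path(folder).glob("*.png")):
        body = build_request_body(path.read_bytes())
        request = urllib.request.Request(endpoint, data=body)  # POST, since data is set
        with urllib.request.urlopen(request, timeout=60) as response:
            results[path.name] = extract_text(response.read().decode("utf-8"))
    return results
```

Calling `ocr_exported_pages("exported_pages")` would then return the recognized text per page image. Real services usually also require an API key or access token, which is left out here.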

Have you tried the different settings options in the Editor's OCR window - are all of them producing bad results?
Do you have a sample file you could share?

Kind regards,
Stefan

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Tue Feb 21, 2023 10:03 am
by softschool
https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link

You can try this PDF document. Neither PDF-XChange nor ABBYY FineReader can recognize it well. I know Baidu OCR can recognize it well, but it is not an application or a web page; it is just a free or paid cloud service!

This is the Chinese introduction page:

Baidu OCR general text recognition (high precision version with location)
https://ai.baidu.com/ai-doc/OCR/tk3h7y2aq

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Tue Feb 21, 2023 2:43 pm
by Stefan - PDF-XChange
Hello softschool,

As you said - it is a cloud tool - so there is a server somewhere that has enough processing power to OCR the content you upload to this service. We do not have plans to include another OCR engine in our products for now, and I am sorry if the ones available are not getting correct recognition of your files!

Kind regards,
Stefan

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Fri Feb 24, 2023 5:36 am
by softschool
https://drive.google.com/file/d/1525PS8Sth97vWBed4TWhx9ieTUA9dtD-/view?usp=share_link

This is a PDF document generated by AutoCAD. Because it uses SHX fonts, even ABBYY cannot recognize it. Can you contact ABBYY about adding recognition of these fonts to improve your OCR engine?

There are a great many PDF documents in this AutoCAD format, and those of us in the translation industry often have to deal with them. They are very difficult to translate and typeset, and the most troublesome part is using OCR to convert them into real text. I hope you can provide a solution for this!

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Fri Feb 24, 2023 3:46 pm
by Paul - PDF-XChange
Hi softschool,

That is indeed a heavy OCR job! I will pass this on to the team to see what they think we or ABBYY can do.

Warm regards,

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Sat Jun 01, 2024 7:06 pm
by Jensen Head
I came across an interesting article "OCR in 2024: Benchmarking Text Extraction/Capture Accuracy" [1], and instead of creating a new thread, I decided to post a link to it here in a thread that discusses alternative OCR engines and cloud OCR services.
[Image: Overall_1-612x304.png]
Category 1 – Web page screenshots that include texts: This category includes screenshots from random Wikipedia pages and Google search results with random queries.
Category 2 – Handwriting: This category includes random photos that include different handwriting styles.
Category 3 – Receipts, invoices, and scanned contracts: This category includes a random collection of receipts, handwritten invoices, and scanned insurance contracts collected from the internet.
I was impressed by the quality of Amazon Textract and the Google Cloud Platform Vision API on handwritten text, a task for which even ABBYY FineReader 16 is not well suited (only the Microsoft Azure Computer Vision API scores worse). The Amazon and Google services also do a better job on form recognition than the latest version of ABBYY's engine.

[1] https://research.aimultiple.com/ocr-accuracy/

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Mon Jun 03, 2024 8:31 am
by Stefan - PDF-XChange
Hello Jensen Head,

Thanks for sharing that review with us.
Yes, ABBYY's engine is optimized more for recognizing typed text than handwritten text, so if you have lots of documents with handwritten text in them, you will probably need a specialized tool for that. Overall - given that most documents that need OCR are not handwritten - we are happy with how the OCR engines we use and offer to you are handling the tasks at hand!

Kind regards,
Stefan

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Wed Jul 02, 2025 7:17 am
by Jensen Head
One more interesting approach used in competing products: a local application can connect to cloud recognition services that provide APIs for this purpose, such as OcrWebService (ocrwebservice.com) [1] and OCR Space (ocr.space) [2]. The user decides whether or not to use them in their work (just as users of Adobe products decide whether or not to use Adobe's paid cloud infrastructure, including its generative AI features).

[1] Free month of use upon first registration. Limit 25 pages per day.
[2] 25,000 free pages per month. Daily limit 500 pages.
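
With daily page limits like the ones in these footnotes, a large job has to be spread over several days. A minimal sketch of that arithmetic follows; the function name is illustrative, and the 500-page figure is simply the daily limit quoted in [2], which may change.

```python
def split_into_daily_batches(page_count: int, daily_limit: int) -> list[range]:
    """Split page indices 0..page_count-1 into batches of at most daily_limit pages."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return [
        range(start, min(start + daily_limit, page_count))
        for start in range(0, page_count, daily_limit)
    ]


# Example: a 1,200-page job under a 500-pages-per-day limit needs three days.
batches = split_into_daily_batches(1200, 500)
```

Each `range` in `batches` holds the page indices to submit on one day.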

Re: Can you directly call many online OCRs, such as Baidu OCR.

Posted: Wed Jul 02, 2025 4:32 pm
by Daniel - PDF-XChange
Hello Jensen Head,

Thank you for that. However, I am not really sure what you want us to do with this information.

Kind regards,