Very slow processing speed

Discussion of end-user OCR use in PDF-XChange Editor and Viewer

Loki@99
User
Posts: 558
Joined: Sat Dec 16, 2023 11:09 am

Post by Loki@99 »

Hi,

For some reason, OCR processing is very slow with this file: 40 minutes 15 seconds on my device, with only PDFXCE running and nothing in the background.
File sample_Slow OCR.pdf
(49.07 MiB)

Device specification
- PDFXCE 10.3.1 build 387
- CPU : Intel Core i5-1130G7
  • 4 cores / 8 threads
  • Base frequency : 1.80 GHz / Turbo frequency : 4.00 GHz
- RAM : 16 GB DDR4 at 3733 MHz
- SSD NVME Gen 3

I'm aware that OCR can be a heavy task and that my device isn't a high-end one, but it doesn't take nearly this long when processing other PDFs with a similar page layout.

PDFXCE OCR settings
OCR engine : Enhanced (FineReader)
image.png

I wonder if there is room for improvement on your side.

Thanks for investigating.

Daniel - PDF-XChange
Site Admin
Posts: 10910
Joined: Wed Jan 03, 2018 6:52 pm

Re: Very slow processing speed

Post by Daniel - PDF-XChange »

Hello, Loki@99

This is a very heavy document, with some fuzziness and a fair number of blemishes, and it is entirely image-based (thankfully, the images are not particularly high resolution). All of these factors require extra processing for the OCR to handle them properly, so in cases like this OCR can take considerably longer.
I will pass this along to the Dev team, but I expect this is a case where no quick fix will make a big difference in the next release; instead, processing speed will improve gradually over time.

I will say, though, that selecting "high" accuracy is not the correct choice in this case.
Auto should almost always be used, since it determines the correct mode on a region-by-region basis. Note that the accuracy setting describes the quality of the document, not of the OCR being performed. Since this is a fuzzy file, if you control the setting manually, you should actually use medium or even low accuracy. This avoids over-processing and over-analyzing the page content, which results in slower speeds and more "extraneous" characters in improper locations.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
