How to improve OCR performance
-
CFCF
- User
- Posts: 1
- Joined: Mon Dec 23, 2019 5:53 am
How to improve OCR performance
All,
I own a powerful 8-core (16 threads with hyper-threading) Win10 64-bit PC with 32 GB of RAM, and I'd like to put that power to use for OCR.
I've just upgraded my installation to PDF-XChange Editor Plus V8 Build 335.0 with the enhanced OCR plugin.
No matter what settings I choose in the OCR dialog or in Settings/Performance (16 threads), CPU consumption in the Win10 Task Manager doesn't rise beyond 35% during OCR.
OCR of larger PDFs should be perfect for parallelization, so I hope to find a way for the OCR plugin to make better use of my compute resources.
Thanks for your insights
Christoph
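As a rough aid for reading a Task Manager figure like the one in this post, the overall CPU percentage can be converted into an equivalent number of fully busy logical processors. The sketch below is plain arithmetic on the numbers reported here (35% on 16 logical processors). The function name is made up for illustration, and nothing in it reflects how the OCR plugin actually schedules its threads.

```python
def busy_logical_processors(total_cpu_percent: float, logical_processors: int) -> float:
    """Convert an overall CPU utilization figure into the equivalent
    number of logical processors running at 100%."""
    if not 0 <= total_cpu_percent <= 100:
        raise ValueError("total_cpu_percent must be between 0 and 100")
    if logical_processors < 1:
        raise ValueError("logical_processors must be at least 1")
    return total_cpu_percent / 100 * logical_processors


# Figures from the post: 35% overall load on 8 cores / 16 threads.
equivalent = busy_logical_processors(35, 16)
print(f"{equivalent:.1f} of 16 logical processors busy")
```

Read this way, a 35% ceiling on this machine corresponds to roughly five or six fully loaded threads, which is far below the 16 configured in Settings/Performance.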
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: How to improve OCR performance
Hello Christoph,
I am checking with colleagues from the dev team to see if the EOCR engine is affected by these settings, and if not - what can be done.
Season's greetings,
Stefan
-
Vasyl - PDF-XChange
- Site Admin
- Posts: 2488
- Joined: Thu Jun 30, 2005 4:11 pm
Re: How to improve OCR performance
Hi Christoph.
We found an issue that limits the number of threads that can be used for OCR on x64 systems. We will fix it in the upcoming build.
Sorry for the inconvenience and thanks for the report.
Cheers.
PDF-XChange Co. LTD (Project Developer)
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
-
Timur Born
- User
- Posts: 885
- Joined: Tue Jun 26, 2012 1:50 pm
Re: How to improve OCR performance
I only just noticed that I was still using build 334, which was limited in its number of OCR threads (at most 3 fully loaded threads). I just tested build 336 and am happy to say that it now makes full use of all my CPU cores. It creates more threads than there are CPU cores, which may or may not be intentional, but in the end it speeds up OCR considerably.
-
Timur Born
- User
- Posts: 885
- Joined: Tue Jun 26, 2012 1:50 pm
Re: How to improve OCR performance
Unfortunately, with "Fine Page Content", the "Rasterizing" and especially the "Applying results of recognition" stages seem to be mostly single-threaded and can therefore take a long time to complete.
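The effect of mostly single-threaded stages on the total run time can be estimated with Amdahl's law: if a fraction p of the work scales across n workers and the rest is serial, the best possible speedup is 1 / ((1 - p) + p / n). The sketch below only illustrates that formula. The parallel fractions in the loop are hypothetical, since no per-stage timings were measured in this thread.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of a job runs in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical splits between the multi-threaded recognition stage and
# the serial stages; these are NOT measurements of PDF-XChange Editor.
for p in (0.5, 0.8, 0.95):
    print(f"parallel fraction {p:.2f}: "
          f"at most {amdahl_speedup(p, 16):.2f}x on 16 threads")
```

If, say, only half of the job were parallel, even an unlimited number of threads could not make it more than twice as fast, which is why the serial stages matter so much on many-core CPUs.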
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: How to improve OCR performance
Hello Timur,
I will check with Vasyl whether any improvements can be made to both of those steps, and we will post any further news as soon as we get it!
Cheers,
Stefan
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: How to improve OCR performance
Hello Timur,
Our devs said that they will investigate what can be done for those two steps of the OCR process, and I've made a ticket for it:
#5101: OCR Performance optimisations for "Fine Page Content" and "Rasterizing" steps of the process
So we will post again here as soon as there is any further news.
Regards,
Stefan
-
DrStoertebecker
- User
- Posts: 1
- Joined: Mon Feb 15, 2021 8:50 am
Re: How to improve OCR performance
Same problem here:
My CPU is a 16-core Ryzen 9 5950X with 32 GB of RAM. I am running PDF-XChange Editor Plus (Version: 9.0 (Build 352.0) (Feb 4 2021; 17:55:44) 64bit) on Windows 10 Home (19041.1.amd64fre.vb_release.191206-1406).
When using OCR, multi-threading is pretty much non-existent. Running OCR on large files with several hundred pages sometimes takes over half an hour. CPU utilization idles at around 5% the whole time, with only one core (constantly changing) being used at around 30-80%.
My first instinct was that the software is not very good at distributing the pages of a single document over different threads. So I tried OCR on a large number of files simultaneously using batch processing in "PDF-Tools". Same problem: CPU utilization is around 5% and OCR takes forever.
I also tried changing multi-threading in the options from "automatic" to "16 cores", with no effect.
The weird thing is: every once in a while, with some files, OCR suddenly does use 16 cores/32 threads at around 95% core usage, and everything works extremely fast and smoothly. However, I have not been able to establish any rule behind this behaviour so far (such as a dependence on file size). It all seems quite random to me.
For the record: the problem is most annoying with OCR, because a job takes forever to finish. But I have the impression that multi-threading does not work very well in general. For instance, when I print a large document to PDF using the "PDF-XChange Standard" PDF printer, it also takes a very long time, and CPU utilization is mostly below 5% with only one core doing all the work.
I would be very grateful for a solution to this problem. Looking at my CPU and its extremely low utilization, I assume I could cut the time for many jobs by over 90% if multi-threading worked properly.
Thanks in advance!
Sincerely,
Chris
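Whether a 90% cut in run time is reachable depends on how much of a job can run in parallel at all. Under Amdahl's law, the run time on n workers relative to one worker is (1 - p) + p / n, where p is the parallel fraction; requiring that ratio to be at most 1 - r gives p >= r / (1 - 1/n). The sketch below just evaluates that bound for the hardware described in this post. It is a back-of-the-envelope estimate under that model, not a statement about how PDF-XChange Editor behaves.

```python
def required_parallel_fraction(time_reduction: float, workers: int) -> float:
    """Smallest parallel fraction p for which Amdahl's law permits the
    run time to shrink by `time_reduction` (0..1) on `workers` workers.

    A result above 1.0 means the target cannot be reached with that
    many workers, even if the whole job were parallel.
    """
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    if workers < 2:
        raise ValueError("workers must be at least 2")
    # T_n / T_1 = (1 - p) + p / n <= 1 - r  =>  p >= r / (1 - 1/n)
    return time_reduction / (1.0 - 1.0 / workers)


# 16 cores / 32 threads, target: 90% less run time.
for n in (16, 32):
    p = required_parallel_fraction(0.9, n)
    print(f"{n} workers: at least {p:.1%} of the job must be parallel")
```

Under this model, a 90% saving needs roughly 93-96% of the work to be parallel on this CPU, so any sizeable single-threaded stage would put that target out of reach.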
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: How to improve OCR performance
Hello DrStoertebecker,
At our last meeting with the devs, this subject was discussed, and they told me that we are currently looking at ways to allow multi-threading to work fully when performing compute-heavy tasks like OCR. Some things still need to be tested to ensure that this will not have negative impacts elsewhere, but we are definitely working on multi-threading and will release it as soon as possible (no specific ETA yet)!
Kind regards,
Stefan
-
Jensen Head
- User
- Posts: 874
- Joined: Mon Sep 13, 2021 8:12 am
Re: How to improve OCR performance
At the end of 2018, ABBYY released the ABBYY FineReader Engine Performance Guide [1], which contained, in particular:
Memory requirements
Parallel processing: 350 MB RAM x number of CPU cores + additional 450 MB RAM
Parallel processing of documents in Arabic, Chinese, Japanese or Korean: 850 MB x number of CPU cores + 750 MB RAM
How to Increase the Overall Processing Speed in FineReader Engine
There are several possibilities to improve the performance of your system:
- Fine-tune the image preprocessing settings to deliver the highest document quality for the processing step.
- During the processing step, use one of the predefined processing profiles optimized for speed and the appropriate recognition mode (balanced or fast).
- Specify the correct recognition languages. Incorrect language can significantly slow down document processing. The more recognition languages are selected, the slower the speed of processing.
- Use the appropriate object (FRDocument or BatchProcessor) and enable parallel processing.
- Specify appropriate parameters of analysis and recognition. For example, disable table detection and page orientation correction if images contain no tables and have correct page orientation.
- Omit the synthesis stage if the processed documents will be exported to TXT format or PDF Image Only format.
- Use the Fast PDF Export Profile when exporting the documents to the PDF format.
- Use the special object (ExportFileWriter), which is designed for the export of very large multi-page documents into PDF format.
In the articles "What affects OCR processing speed in SDK products" [2], "How to increase the performance of FineReader PDF on a multicore CPU" [3], and "ABBYY FineReader Server 14 Performance" [4], ABBYY again only praises multi-threaded recognition and does not touch on other hardware factors that affect OCR performance.
From this I can assume that, with large documents, sufficient RAM and a large number of CPU threads, OCR performance is not significantly affected by the type of RAM, its frequency, the system bus, or the speed of the drive.
However, even with the latest version of PDF-XChange, I often find myself waiting (tens of) minutes for OCR while, in the process manager, neither CPU load, nor RAM usage, nor drive activity exceeds 50%. I would be happy to upgrade my computer, but I don't see what exactly the hardware bottleneck is when using enhanced OCR.
[1] https://static1.abbyy.com/abbyycommedia/20728/abbyy-finereader-engine-performance-guide-en.pdf
[2] https://support.abbyy.com/hc/en-us/articles/360016579559-What-affects-OCR-processing-speed-in-SDK-products
[3] https://support.abbyy.com/hc/en-us/articles/360019087739-How-to-increase-the-performance-of-FineReader-PDF-on-a-multicore-CPU
[4] https://help.abbyy.com/en-us/finereaderserver/14/perf_guide/performancefrs/
Upd. I've been using build 10.3.0.386 almost since its release date. I didn't notice any significant difference in OCR speed.
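The memory requirement quoted from the ABBYY guide in this post is a simple linear formula, so it is easy to check whether a given machine has enough RAM for fully parallel recognition. The sketch below just evaluates the two quoted formulas. The function name and the boolean flag are made up for illustration, and the figures apply to the ABBYY FineReader Engine as quoted, which may or may not match what PDF-XChange's enhanced OCR plugin needs.

```python
def abbyy_parallel_ram_mb(cpu_cores: int, cjk_or_arabic: bool = False) -> int:
    """RAM in MB for parallel processing, per the quoted guide:
    350 MB x cores + 450 MB, or 850 MB x cores + 750 MB for
    Arabic, Chinese, Japanese or Korean documents."""
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    per_core_mb, base_mb = (850, 750) if cjk_or_arabic else (350, 450)
    return per_core_mb * cpu_cores + base_mb


for cores in (8, 16):
    print(f"{cores} cores: {abbyy_parallel_ram_mb(cores)} MB, "
          f"{abbyy_parallel_ram_mb(cores, cjk_or_arabic=True)} MB for CJK/Arabic")
```

Even the larger figure for 16 cores (14350 MB) fits into 32 GB of RAM, which is consistent with the observation that RAM does not look like the limiting factor.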
Last edited by Jensen Head on Sat Jun 01, 2024 2:42 pm, edited 2 times in total.
-
Vasyl - PDF-XChange
- Site Admin
- Posts: 2488
- Joined: Thu Jun 30, 2005 4:11 pm
Re: How to improve OCR performance
Hi guys.
Obviously, it would be great if we could reproduce on our side what you are seeing. Since this issue is very likely hardware- and document-related, please provide the test document at least, if you can.
Note: the current official build is 10.3.0.386. If you are using an older version, please upgrade and try again. The issue may be addressed in the latest build.
Cheers.
-
Jensen Head
- User
- Posts: 874
- Joined: Mon Sep 13, 2021 8:12 am
Re: How to improve OCR performance
Vasyl-Tracker Dev Team wrote: Tue May 28, 2024 4:14 am
provide the test document at least, if you can.
Test 4 OCR.pdf: https://drive.google.com/file/d/1L3vw0KvAWh2BHla6OFupbJr5nzDjgNyl/view (321 MB)
Intel Core i7-3770K.avi: https://drive.google.com/file/d/1yb1_JXKy4YSJmuAxU3ZAf62sMhW4U22O/view (36 MB, half an hour, accelerated to 1 minute 30 seconds)
As you can see, active CPU usage (43%) ends around 16:30, then drops to 7-12% and remains at this level for at least another 15 minutes.
I used the following recognition options:
Languages: Numbers, English
Accuracy: Auto
[ ] Detect skew of page content
[ ] Detect incorrect page rotation
[ ] Ignore text in graphics
[ ] Ignore company logos
[X] Ignore existing text on page
[X] Ignore comments on page
[X] Ignore form fields on page
Output Options
Type: Searchable Image
[ ] Fix content skew and incorrect page rotation
[ ] Draw lines for tables
[ ] Create a New Document
I sent you another example of this behavior by e-mail on May 4th, 2024 at 19:36, in a message with the subject “PDF-XChange Editor Enhanced OCR Crashs”. At that time I was using PDF-XChange Editor Plus 10.2.1, build 385 (Enhanced OCR).
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: How to improve OCR performance
Hello Jensen Head,
Thanks for the sample files provided! I will pass those on to Vasyl and ask him to take a look!
p.s. Our devs asked me to make a ticket and add your sample files there - so that they can investigate and work on such improvements.
The ticket is #6950.
Kind regards,
Stefan