Bulk OCR Existing Files in Folder  SOLVED

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools.


bqxmprij
User
Posts: 173
Joined: Tue Dec 18, 2012 3:51 am

Bulk OCR Existing Files in Folder

Post by bqxmprij »

I have a lot of PDF files in a folder and its subfolders that I need to review, and I want to OCR all of them. I don't understand the save options in PDF-Tools: the "Save Document" part of the OCR tool doesn't seem to offer a way to OCR each document and save it in place, without renaming it or creating a new document. How do I OCR an existing file, save it, and move on to the next one?
Ovg
User
Posts: 468
Joined: Tue Sep 05, 2017 4:56 pm

Re: Bulk OCR Existing Files in Folder

Post by Ovg »

20210502_183622.png
Last edited by Ovg on Sun May 02, 2021 3:41 pm, edited 1 time in total.
It's impossible to lead us astray for we don't care even to choose the way.
PDF-XChange PRO, 10.1.1 (Build 381) / W7 SP1 x64
bqxmprij
User
Posts: 173
Joined: Tue Dec 18, 2012 3:51 am

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij »

Ovg,

Thank you for your post. I agree. That is the window and the option in the bottom right. See how it will save a new file with an OCR name? I don't want that. I want PDF-Tools to open the file, OCR it, save it, and move on without creating new files or changing the file name.
Ovg
User
Posts: 468
Joined: Tue Sep 05, 2017 4:56 pm

Re: Bulk OCR Existing Files in Folder  SOLVED

Post by Ovg »

20210502_185334.png
bqxmprij
User
Posts: 173
Joined: Tue Dec 18, 2012 3:51 am

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij »

OVG, you are the best! For some reason I didn't think of just saving it with the same file name.

Now, I am wondering why some documents didn't OCR, but that is another issue.
Ovg
User
Posts: 468
Joined: Tue Sep 05, 2017 4:56 pm

Re: Bulk OCR Existing Files in Folder

Post by Ovg »

bqxmprij wrote: Sun May 02, 2021 7:52 pm Now, I am wondering why some documents didn't OCR, but that is another issue.

Hi, bqxmprij
Check OCR settings:

20210503_102850.png
Stefan - PDF-XChange
Site Admin
Posts: 19919
Joined: Mon Jan 12, 2009 8:07 am

Re: Bulk OCR Existing Files in Folder

Post by Stefan - PDF-XChange »

Hello Ovg,

Many thanks for the help! Indeed, that might be the reason why some files were skipped for bqxmprij.

@bqxmprij - please let us know if OVG's suggestion helped you sort everything out.

Kind regards,
Stefan
bqxmprij
User
Posts: 173
Joined: Tue Dec 18, 2012 3:51 am

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij »

Of the three options, I used "do not OCR but continue processing." I don't know why some were not OCR'd.

I think there are 3 types of documents:
1. Documents with full text (e.g., computer-generated PDFs) or any text.
2. Documents with no text (e.g., a scan).
3. Documents with some text plus some areas that could be OCR'd but don't have text yet.

I think the options only contemplate 1 and 2. How do you OCR a document in category 3? In other words, I think we need an option (or please point me to one) that reviews a document, OCRs the non-text areas that could be OCR'd, and ignores areas that already have text.
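
The three categories can be made concrete with a small Python sketch. This is only an illustration of the classification, not anything PDF-Tools exposes: the function name `classify_document` and the per-page flags are hypothetical, and the sketch assumes you already know, for each page, whether it has text and whether it has a scanned area without text.

```python
def classify_document(pages):
    """Classify a document into category 1, 2, or 3 from the list above.

    pages: one (has_text, has_scanned_area) tuple of booleans per page.
    These flags are an assumed input; how they are obtained is out of scope.
    """
    any_text = any(has_text for has_text, _ in pages)
    any_scan = any(has_scan for _, has_scan in pages)
    if any_text and any_scan:
        return 3  # some text, plus areas that could still be OCR'd
    if any_text:
        return 1  # text only
    return 2      # no text at all, e.g. a pure scan


# A two-page file with one born-digital page and one scanned page is mixed.
print(classify_document([(True, False), (False, True)]))
```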
Daniel - PDF-XChange
Site Admin
Posts: 11888
Joined: Wed Jan 03, 2018 6:52 pm

Re: Bulk OCR Existing Files in Folder

Post by Daniel - PDF-XChange »

Hi, bqxmprij

To accomplish that, you would need to use the "OCR document" option instead of the "do not OCR" option, which automatically skips any document containing any text-based content at all. Note that this means all files, even those already containing text, will be processed, so the tool will take extra time.
With the "OCR document" option enabled, click "more options" and select the options you need:
image.png
-The "skip pages" option will skip processing any page which contains any text-based content at all, so enabling this would likely result in you skipping some pages in category 3.
-The "Ignore existing text on page" option will instead process the entire page and skip areas where text already exists (meaning you will not get overlapping text). This process is the longest of the options presented to you, but it will also give the most complete result.
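
As a rough illustration of the behaviour described above, here is a toy model in Python. It is not PDF-Tools code: the function name `pages_to_ocr`, the option strings, and the per-page flags are all hypothetical, and it only models which pages get processed, not the area-level skipping done by "Ignore existing text on page".

```python
def pages_to_ocr(page_has_text, doc_option, skip_pages_with_text=False):
    """Return the indices of the pages that would be sent to OCR.

    page_has_text: one boolean per page (True if the page has any text).
    doc_option: "do_not_ocr" or "ocr_document" (hypothetical labels for the
        document-level choice when a file already contains text).
    skip_pages_with_text: models the "skip pages" checkbox.
    """
    # "do not OCR": a document with any text at all is skipped entirely.
    if doc_option == "do_not_ocr" and any(page_has_text):
        return []
    # "skip pages": only pages without any text are processed.
    if skip_pages_with_text:
        return [i for i, has_text in enumerate(page_has_text) if not has_text]
    # Otherwise every page is processed.
    return list(range(len(page_has_text)))


# Page 0 has some text, page 1 is a pure scan.
print(pages_to_ocr([True, False], "do_not_ocr"))
print(pages_to_ocr([True, False], "ocr_document", skip_pages_with_text=True))
print(pages_to_ocr([True, False], "ocr_document"))
```

In this model, a page that mixes text with an unrecognized scanned area counts as "has text", which is why the "skip pages" setting would leave it untouched.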

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
bqxmprij
User
Posts: 173
Joined: Tue Dec 18, 2012 3:51 am

Re: Bulk OCR Existing Files in Folder

Post by bqxmprij »

So, operator error.

Thank you!
Daniel - PDF-XChange
Site Admin
Posts: 11888
Joined: Wed Jan 03, 2018 6:52 pm

Bulk OCR Existing Files in Folder

Post by Daniel - PDF-XChange »

:)
Jensen Head
User
Posts: 699
Joined: Mon Sep 13, 2021 8:12 am

Re: Bulk OCR Existing Files in Folder

Post by Jensen Head »

TrackerSupp-Daniel wrote: Mon May 03, 2021 7:13 pm The "Ignore existing text on page" option will instead process the entire page and skip areas where text already exists (meaning you will not get overlapping text).
I would add that at the moment the "Ignore existing text on page" option does not take into account invisible text, i.e. text obtained using the "Output Options" / "Type: Searchable Image" setting. The application therefore treats the text in the images as not yet recognized and recognizes it again, duplicating the already existing text blocks. This may be undesirable for two reasons. First, when copying several paragraphs and pasting them into another application, you can end up with consecutively repeating pieces of text. You may not notice this (very bad), or you may spend time fixing a broken text fragment (bad). Second, after the file is indexed by some search engines, instead of the text "ignore existing text on page" in the preview in the web search results, you will get "iiggnnoorree eexxiissttiinngg tteexxtt oonn ppaaggee".

This problem was discussed in the topics "OCR option — ignore existing text on page" (#35211) and "Multiple OCR Runs —> Duplicate Text Objects" (#34214).
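
For anyone who wants to check extracted text for the doubled-character symptom shown above, here is a minimal Python sketch. It assumes you already have the text as a string (extracting it from the PDF is not covered), the names `undouble_word` and `looks_doubled` and the `threshold` value are hypothetical, and it detects only the character-interleaved form, not repeated paragraphs.

```python
def undouble_word(word):
    """Return the word with each doubled character collapsed, or None.

    A word counts as doubled only if every character appears exactly twice
    in a row, e.g. "tteexxtt" for "text".
    """
    if len(word) >= 2 and len(word) % 2 == 0 and word[0::2] == word[1::2]:
        return word[0::2]
    return None


def looks_doubled(text, threshold=0.8):
    """Return True if at least `threshold` of the words look doubled."""
    words = text.split()
    if not words:
        return False
    doubled = sum(1 for w in words if undouble_word(w) is not None)
    return doubled / len(words) >= threshold


sample = "iiggnnoorree eexxiissttiinngg tteexxtt oonn ppaaggee"
print(looks_doubled(sample))
print(" ".join(undouble_word(w) or w for w in sample.split()))
```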
Daniel - PDF-XChange
Site Admin
Posts: 11888
Joined: Wed Jan 03, 2018 6:52 pm

Re: Bulk OCR Existing Files in Folder

Post by Daniel - PDF-XChange »

Hello, Jensen Head

Are you running the current latest release (366.0)? That issue should have been fixed already. I will need to run some tests, but last I checked, it was properly ignoring all areas of the page that contain text content, visible or otherwise.

[Update: I ran that test in the current release, and it seems that the handling for this has changed. If there is invisible text, it is entirely replaced with the editable text. So no duplication occurs as you were worried about, but you are correct that it is not ignored; it is simply replaced. It seems only visible text areas are actually ignored.]

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

Jensen Head
User
Posts: 699
Joined: Mon Sep 13, 2021 8:12 am

Re: Bulk OCR Existing Files in Folder

Post by Jensen Head »

You're right, in version 366.0, recognizing a "Searchable Image" document with the "Ignore existing text on page" checkbox disabled does not result in duplicate text blocks. Thank you!
Dimitar - PDF-XChange
Site Admin
Posts: 2608
Joined: Mon Jan 15, 2018 9:01 am

Bulk OCR Existing Files in Folder

Post by Dimitar - PDF-XChange »

:)
Capt. Michael
User
Posts: 2
Joined: Fri Oct 03, 2025 7:44 am

Re: Bulk OCR Existing Files in Folder

Post by Capt. Michael »

I found this thread while looking for an equivalent of the "recognize text in multiple files using ocr" function in my other big-name PDF software, which lets me select dozens of files in a folder and perform OCR on all of them without combining them or creating a portfolio. The version I am evaluating on a 7-day trial, PDF-XChange Editor Plus v. 10.7.3.401, does not look like the pictures above. Can you show me how to perform OCR on many files or an entire folder of files with one command?
Willy Van Nuffel
User
Posts: 2782
Joined: Wed Jan 18, 2006 12:10 pm

Re: Bulk OCR Existing Files in Folder

Post by Willy Van Nuffel »

Hello,

For bulk operations on PDF-files, you will have to use PDF-XChange "PDF-Tools", not "PDF-XChange Editor".

See: https://www.pdf-xchange.com/product/pdf-tools

More information about how to OCR pages is available in the on-line manual.

See: https://help.pdf-xchange.com/pdfxt10/ocr-pages_t.html

Kind regards.
Dimitar - PDF-XChange
Site Admin
Posts: 2608
Joined: Mon Jan 15, 2018 9:01 am

Re: Bulk OCR Existing Files in Folder

Post by Dimitar - PDF-XChange »

Thanks for the input, Willy,

Indeed, the product needed for batch operations is PDF-Tools.
Capt. Michael
User
Posts: 2
Joined: Fri Oct 03, 2025 7:44 am

Re: Bulk OCR Existing Files in Folder

Post by Capt. Michael »

Thank you for the information on PDF-Tools. I looked at it, and it seems it comes with PDF-XChange Editor. Is it possible to get it with PDF-XChange Editor Plus? The primary reason I want this is bulk OCR text recognition in multiple files, so I want the superior text recognition of PDF-XChange Editor Plus and the multiple-file capabilities of PDF-Tools. Would I need to buy PDF-Tools and PDF-XChange Editor Plus separately, and if so, will they work together?
rakunavi
User
Posts: 1925
Joined: Sat Sep 11, 2021 5:04 am

Re: Bulk OCR Existing Files in Folder

Post by rakunavi »

Hello Capt. Michael,

PDF-XChange Editor PRO should meet your requirements.

Best regards,
rakunavi
TOP desires for PDFXCE
forum.pdf-xchange.com/viewtopic.php?t=39665 LassoTool
forum.pdf-xchange.com/viewtopic.php?t=38554 CmtGarbled
forum.pdf-xchange.com/viewtopic.php?t=37353 FulScrMultiMon
forum.pdf-xchange.com/viewtopic.php?t=41002 DisableTouchSelect
Dimitar - PDF-XChange
Site Admin
Posts: 2608
Joined: Mon Jan 15, 2018 9:01 am

Bulk OCR Existing Files in Folder

Post by Dimitar - PDF-XChange »

:)