Bulk OCR Existing Files in Folder SOLVED
- 
				bqxmprij
 - User
 - Posts: 173
 - Joined: Tue Dec 18, 2012 3:51 am
 
Bulk OCR Existing Files in Folder
I have a lot of PDF files to review in a folder and its subfolders, and I want to OCR everything in them. I don't understand the save option in PDF-Tools: the "Save Document" part of the OCR tool doesn't seem to offer a way to OCR each document and save it without renaming it or creating a new document. How do I OCR an existing file, save it in place, and move on to the next?
			
			
									
																
						- 
				Ovg
														 - User
 - Posts: 468
 - Joined: Tue Sep 05, 2017 4:56 pm
 
Re: Bulk OCR Existing Files in Folder
You do not have the required permissions to view the files attached to this post.
							
					Last edited by Ovg on Sun May 02, 2021 3:41 pm, edited 1 time in total.
									
			
													It's impossible to lead us astray for we don't care even to choose the way.
PDF-XChange PRO, 10.1.1 (Build 381) / W7 SP1 x64
			
- 
				bqxmprij
 - User
 - Posts: 173
 - Joined: Tue Dec 18, 2012 3:51 am
 
Re: Bulk OCR Existing Files in Folder
Ovg,
Thank you for your post. I agree. That is the window and the option in the bottom right. See how it will save a new file with an OCR name? I don't want that. I want PDF-Tools to open the file, OCR it, save it, and move on without creating new files or changing the file name.
			
			
									
																
- 
				Ovg
														 - User
 - Posts: 468
 - Joined: Tue Sep 05, 2017 4:56 pm
 
Re: Bulk OCR Existing Files in Folder SOLVED
You do not have the required permissions to view the files attached to this post.
			
- 
				bqxmprij
 - User
 - Posts: 173
 - Joined: Tue Dec 18, 2012 3:51 am
 
Re: Bulk OCR Existing Files in Folder
OVG, you are the best! For some reason I didn't think of just saving it with the same file name.
Now, I am wondering why some documents didn't OCR, but that is another issue.
			
			
									
																
- 
				Ovg
														 - User
 - Posts: 468
 - Joined: Tue Sep 05, 2017 4:56 pm
 
Re: Bulk OCR Existing Files in Folder
Hi, bqxmprij
Check OCR settings:
You do not have the required permissions to view the files attached to this post.
			
- 
				Stefan - PDF-XChange
														 - Site Admin
 - Posts: 19919
 - Joined: Mon Jan 12, 2009 8:07 am
 
Re: Bulk OCR Existing Files in Folder
Hello Ovg,
Many thanks for the help! Indeed, that might be the reason why some files were skipped for bqxmprij.
@bqxmprij - please let us know if OVG's suggestion helped you sort everything out.
Kind regards,
Stefan
			
			
									
																
- 
				bqxmprij
 - User
 - Posts: 173
 - Joined: Tue Dec 18, 2012 3:51 am
 
Re: Bulk OCR Existing Files in Folder
Of the three options, I used "do not OCR but continue processing." I don't know why some were not OCR'd. 
I think there are 3 types of documents:
1. Documents with full text (e.g., computer-generated PDFs) or any text.
2. Documents with no text (e.g., a scan).
3. Documents with some text plus some areas that could be OCR'd but don't yet have text.
I think the options only contemplate 1 and 2. How do you OCR a document in category 3? In other words, I think we need (or please point me to) an option that reviews a document, OCRs the non-text areas that could be OCR'd, and ignores areas that already have text.
			
			
									
																
- 
				Daniel - PDF-XChange
														 - Site Admin
 - Posts: 11888
 - Joined: Wed Jan 03, 2018 6:52 pm
 
Re: Bulk OCR Existing Files in Folder
Hi, bqxmprij
To accomplish that, you would need to use the "OCR document" option instead of the "do not OCR" option, which automatically skips any document containing any text-based content at all. (Yes, this does mean that all files, even those already containing text, will be processed, so the tool will take extra time.)
With the "OCR document" function enabled, click "more options" and check the options you need:
-The "skip pages" option will skip processing any page which contains any text-based content at all, so enabling this would likely result in you skipping some pages in category 3 documents.
-The "Ignore existing text on page" option will instead process the entire page and skip areas where text already exists (meaning you will not get overlapping text). This process is the longest of the options presented to you, but will also give the most complete result.
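The difference between these options can be modelled roughly as follows. This is only an illustrative Python sketch of the behaviour as described in this post, not PDF-Tools code: the names `Mode` and `pages_to_ocr` are hypothetical, and each page is reduced to a single flag saying whether it already contains any text.

```python
from enum import Enum


class Mode(Enum):
    # Hypothetical labels for the choices discussed in this thread.
    DO_NOT_OCR = "do not OCR"                      # skip documents with any text
    OCR_SKIP_PAGES = "skip pages"                  # OCR document, skip pages with text
    OCR_IGNORE_TEXT = "ignore existing text on page"  # OCR every page, skip text areas


def pages_to_ocr(page_has_text, mode):
    """Return the indices of the pages that would be sent to OCR.

    page_has_text: list of bools, one per page (True = page has some text).
    """
    all_pages = list(range(len(page_has_text)))
    if mode is Mode.DO_NOT_OCR:
        # One page with text is enough to skip the whole document.
        return [] if any(page_has_text) else all_pages
    if mode is Mode.OCR_SKIP_PAGES:
        # Only pages without any text are processed.
        return [i for i, has_text in enumerate(page_has_text) if not has_text]
    # OCR_IGNORE_TEXT: every page is processed; areas that already hold
    # text are skipped inside the page, which this model does not show.
    return all_pages


# A "category 3" document: pages 0 and 2 have some text, page 1 is a pure scan.
mixed = [True, False, True]
print(pages_to_ocr(mixed, Mode.DO_NOT_OCR))       # []
print(pages_to_ocr(mixed, Mode.OCR_SKIP_PAGES))   # [1]
print(pages_to_ocr(mixed, Mode.OCR_IGNORE_TEXT))  # [0, 1, 2]
```

In this model, only the last mode reaches the partially scanned pages of a mixed document, which is why it is the slowest and the most complete.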
Kind regards,
			
			
You do not have the required permissions to view the files attached to this post.
			
													Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
			
- 
				bqxmprij
 - User
 - Posts: 173
 - Joined: Tue Dec 18, 2012 3:51 am
 
Re: Bulk OCR Existing Files in Folder
So, operator error. 
Thank you!
			
			
									
																
- 
				Daniel - PDF-XChange
														 - Site Admin
 - Posts: 11888
 - Joined: Wed Jan 03, 2018 6:52 pm
 
Bulk OCR Existing Files in Folder
- 
				Jensen Head
														 - User
 - Posts: 699
 - Joined: Mon Sep 13, 2021 8:12 am
 
Re: Bulk OCR Existing Files in Folder
TrackerSupp-Daniel wrote: ↑Mon May 03, 2021 7:13 pm
The "Ignore existing text on page" option will instead process the entire page, and skip areas which text already exists (meaning you will not get overlapping text).
I would add that at the moment the "Ignore existing text on page" option does not take into account invisible text, i.e. text obtained using the "Output Options" / "Type: Searchable Image" setting. Thus, the application considers that the text in the images is not recognized and recognizes it again, duplicating the already existing text blocks. This may be undesirable for two reasons. First, when copying several paragraphs and pasting them into another application, you can end up with consecutively repeating pieces of text. You may not notice this (very bad), or you may spend time fixing a broken text fragment (bad). Second, after the document is indexed by some search engines, instead of the text "ignore existing text on page" in the preview in the web search results, you will get "iiggnnoorree eexxiissttiinngg tteexxtt oonn ppaaggee".
This problem was discussed in the topics "OCR option — ignore existing text on page" (#35211) and "Multiple OCR Runs —> Duplicate Text Objects" (#34214).
- 
				Daniel - PDF-XChange
														 - Site Admin
 - Posts: 11888
 - Joined: Wed Jan 03, 2018 6:52 pm
 
Re: Bulk OCR Existing Files in Folder
Hello, Jensen Head
Are you running the current latest release (366.0)? That issue should have been fixed already. I will need to run some tests, but last I checked, it was properly ignoring all areas of the page that contain text content, visible or otherwise.
[Update: I ran that test in the current release, and it seems that the handling for this was changed: if there is invisible text, it is entirely replaced with the editable text. So no duplication occurs as you were worried about, but you are correct that it is not ignored; it is simply replaced. It seems only visible text areas are actually ignored.]
Kind regards,
			
			
									
- 
				Jensen Head
														 - User
 - Posts: 699
 - Joined: Mon Sep 13, 2021 8:12 am
 
Re: Bulk OCR Existing Files in Folder
You're right, in version 366.0, recognizing a "Searchable Image" document with the "Ignore existing text on page" checkbox disabled does not result in duplicate text blocks. Thank you!
			
			
									
																
						- 
				Dimitar - PDF-XChange
														 - Site Admin
 - Posts: 2608
 - Joined: Mon Jan 15, 2018 9:01 am
 
- 
				Capt. Michael
 - User
 - Posts: 2
 - Joined: Fri Oct 03, 2025 7:44 am
 
Re: Bulk OCR Existing Files in Folder
I found this thread while looking for an equivalent of the "recognize text in multiple files using ocr" function in my other big-name PDF software, so that I can select dozens of files in a folder and perform OCR on all of them without combining them or creating a portfolio. The version of PDF-XChange Editor Plus I am evaluating (v. 10.7.3.401, 7-day evaluation) does not look like the pictures above. Can you show me how to perform OCR on many files, or an entire folder of files, with one command?
			
			
									
																
						- 
				Willy Van Nuffel
 - User
 - Posts: 2782
 - Joined: Wed Jan 18, 2006 12:10 pm
 
Re: Bulk OCR Existing Files in Folder
Hello,
For bulk operations on PDF-files, you will have to use PDF-XChange "PDF-Tools", not "PDF-XChange Editor".
See: https://www.pdf-xchange.com/product/pdf-tools
More information about how to OCR pages is available in the on-line manual.
See: https://help.pdf-xchange.com/pdfxt10/ocr-pages_t.html
Kind regards.
			
			
									
																
- 
				Dimitar - PDF-XChange
														 - Site Admin
 - Posts: 2608
 - Joined: Mon Jan 15, 2018 9:01 am
 
Re: Bulk OCR Existing Files in Folder
Thanks for the input, Willy,
Indeed the product needed for batch operations is PDF Tools.
			
			
									
																
- 
				Capt. Michael
 - User
 - Posts: 2
 - Joined: Fri Oct 03, 2025 7:44 am
 
Re: Bulk OCR Existing Files in Folder
Thank you for the information on PDF-Tools. I looked at it, and it seems it comes with PDF-XChange Editor. Is it possible to get it with PDF-XChange Editor Plus? The primary reason I want this is bulk OCR text recognition in multiple files, so I want the superior text recognition of the PDF-XChange Editor Plus version and the multiple-file capabilities of PDF-Tools. Would I need to buy PDF-Tools and PDF-XChange Editor Plus separately, and if so, will they work together?
			
			
									
																
						- 
				rakunavi
														 - User
 - Posts: 1925
 - Joined: Sat Sep 11, 2021 5:04 am
 
Re: Bulk OCR Existing Files in Folder
Hello Capt. Michael,
PDF-XChange Editor PRO should meet your requirements.
- https://www.pdf-xchange.com/product/pdf-xchange-pro
(PDF-XChange Editor Plus) + (PDF-Tools) + (PDF-XChange Printer Standard) = PDF-XChange Editor PRO
rakunavi
TOP desires for PDFXCE
forum.pdf-xchange.com/viewtopic.php?t=39665 LassoTool
forum.pdf-xchange.com/viewtopic.php?t=38554 CmtGarbled
forum.pdf-xchange.com/viewtopic.php?t=37353 FulScrMultiMon
forum.pdf-xchange.com/viewtopic.php?t=41002 DisableTouchSelect
			
- 
				Dimitar - PDF-XChange
														 - Site Admin
 - Posts: 2608
 - Joined: Mon Jan 15, 2018 9:01 am