I have a long-term offline archive of some 20,000 PDFs, and it is growing. Each month a few hundred new PDFs are added, a small number of them without OCR. I don't control how the PDFs are created; they are sent to me by donors.
While PDF Tools seems overly complicated and presents a wall of functions, I think I've worked out how to use it from a batch file to scan directories and add OCR to existing PDFs that do NOT have it.
Before I upgrade my existing license, could somebody please confirm that PDF Tools can:
1. be invoked entirely from the command line once set up (high accuracy, language choice, overwrite input, etc.)
2. check whether an existing PDF that is NOT password protected has OCR; if yes, exit; if no:
3. if edit protection is on, switch it off
4. run OCR against the PDF
5. save the new PDF under the same name as the input file, overwriting it.
If the answer to those is yes, then can I use the UI version of PDF-XChange Editor while running the cmdline PDF Tools or is the UI blocked while PDF Tools runs?
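For readers who want to drive the steps above from a script, here is a minimal Python sketch of the outer loop only. It makes several assumptions that are not from this thread: the command template is a placeholder (the actual PDF-Tools command line is not shown here, so take it from the product documentation), `needs_ocr` is a hypothetical predicate you supply yourself, and the function names are invented for illustration. The OCR and the overwrite are left entirely to the external tool.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Iterator, List, Optional, Sequence


def find_pdfs(root: Path) -> Iterator[Path]:
    """Yield every file under root whose extension is .pdf (any letter case)."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".pdf":
            yield path


def build_command(template: Sequence[str], pdf: Path) -> List[str]:
    """Replace the placeholder token "{input}" in the template with the PDF path."""
    return [str(pdf) if part == "{input}" else part for part in template]


def process_tree(
    root: Path,
    template: Sequence[str],
    needs_ocr: Callable[[Path], bool],
    backup_dir: Optional[Path] = None,
    dry_run: bool = True,
) -> List[List[str]]:
    """Run (or, in a dry run, only collect) one command per PDF that needs OCR.

    template  -- placeholder command line; NOT a documented PDF-Tools syntax
    needs_ocr -- hypothetical predicate; return False to skip a file
    """
    commands: List[List[str]] = []
    for pdf in find_pdfs(root):
        if not needs_ocr(pdf):
            continue  # already searchable, leave the file alone
        command = build_command(template, pdf)
        commands.append(command)
        if dry_run:
            continue  # only report what would be run
        if backup_dir is not None:
            # Keep a copy first, because the tool is expected to overwrite its input.
            target = backup_dir / pdf.relative_to(root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(pdf, target)
        subprocess.run(command, check=True)
    return commands
```

Running with `dry_run=True` first prints nothing and changes nothing; it just returns the list of commands, which is a cheap way to check the file selection before committing to a long run. Keep `backup_dir` outside `root` so the copies are not picked up on the next scan.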
++Mark.
https://ctproduced.com
Can PDF Tools OCR and switch Edit on/off via CMD Line?
-
4mc
- User
- Posts: 74
- Joined: Tue Apr 27, 2021 12:42 am
-
Daniel - PDF-XChange
- Site Admin
- Posts: 12930
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Can PDF Tools OCR and switch Edit on/off via CMD Line?
Hello, 4mc
All of that would be possible, with one exception and one important note.
The important note is that for #3, you need to know the password in use on the document, or have the certificate that was used on the file installed locally. If you have neither, we cannot remove the security from the file.
The exception is that you cannot switch the options on or off via the command line. You would need to make two different custom tools, one with editable text enabled and one with searchable text enabled, then run the two selectively on the files that need one or the other. (However, the OCR engine by default ignores areas with existing page text, so if you only want to skip regions that do not need OCR, that is built in.)
Beyond that, Tools is a separate application from the Editor, and the two can most definitely run independently. The only time you would encounter a conflict is when attempting to process a file that is actively open in the other application. An error may occur, such as a processing failure due to the file being in use, or saving in whichever application ran last may overwrite, and possibly lose, the changes made by the first.
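As an illustration of the file-in-use conflict, a batch script can probe each file before handing it to the tool. The Python sketch below is a heuristic, not part of PDF-Tools, and the function names are invented here. It assumes that on Windows an application holding the file without write sharing makes a read/write open fail; an application that shares the file, or that loads it into memory and closes it, will not be detected. On other platforms it mostly reports permissions.

```python
from pathlib import Path
from typing import Iterable, List, Tuple


def can_open_for_update(path: Path) -> bool:
    """Return True if the file can be opened read/write right now.

    The file is opened and closed again without being modified.
    Missing files and permission problems also return False.
    """
    try:
        with open(path, "r+b"):
            return True
    except OSError:
        return False


def split_by_availability(paths: Iterable[Path]) -> Tuple[List[Path], List[Path]]:
    """Partition paths into (ready, busy) using the probe above."""
    ready: List[Path] = []
    busy: List[Path] = []
    for path in paths:
        (ready if can_open_for_update(path) else busy).append(path)
    return ready, busy
```

Files that land in the `busy` list can simply be retried on the next run rather than risking a failed or conflicting save.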
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
[email protected]
-
4mc
- User
- Posts: 74
- Joined: Tue Apr 27, 2021 12:42 am
Re: Can PDF Tools OCR and switch Edit on/off via CMD Line?
Thanks, I think I understand. Just to clarify.
3. What I have here is a lot of PDFs that are saved as PDF/A-2a compliant. When I open them to OCR them manually, I'm prompted to EDIT - there is no password.
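To find out in advance which files in an archive carry a PDF/A claim, one option is to look for the PDF/A identification properties (`pdfaid:part` and `pdfaid:conformance`) in the file's XMP metadata. The Python sketch below is a rough heuristic under stated assumptions: it scans the raw bytes, so it only works when the metadata is stored uncompressed and in an ASCII-compatible encoding, and it reports what the file claims, not whether the file actually conforms. The function names are invented for illustration.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Match both XMP spellings: attribute form (pdfaid:part="2")
# and element form (<pdfaid:part>2</pdfaid:part>).
_PART = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def pdfa_claim(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (part, level) such as (2, "A") if a PDF/A claim is found, else None.

    None means this scan found no claim; it is not proof that none exists.
    """
    part = _PART.search(data)
    if part is None:
        return None
    conformance = _CONFORMANCE.search(data)
    level = conformance.group(1).decode("ascii").upper() if conformance else ""
    return int(part.group(1)), level


def pdfa_claim_of_file(path: Path) -> Optional[Tuple[int, str]]:
    """Read the whole file and scan it for a PDF/A claim."""
    return pdfa_claim(path.read_bytes())
```

A result of `(2, "A")` corresponds to a file that identifies itself as PDF/A-2a. Listing those files up front makes it easier to decide which ones to route through a tool that removes the conformance flag before OCR.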
-
Daniel - PDF-XChange
- Site Admin
- Posts: 12930
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Can PDF Tools OCR and switch Edit on/off via CMD Line?
Hello, 4mc
Ah, yes, removing specialized PDF conformance is most certainly possible; my apologies for the misunderstanding there. You should have no trouble with that. This article may also help with setting up custom tools: https://www.pdf-xchange.com/knowledgeba ... -PDF-Tools
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
[email protected]
-
4mc
- User
- Posts: 74
- Joined: Tue Apr 27, 2021 12:42 am
Re: Can PDF Tools OCR and switch Edit on/off via CMD Line?
In case anyone stumbles across this forum entry at some time in the future - I can confirm I'm very happy with how PDF-Tools can be configured to scan a given file or directory tree and test for OCR presence. If OCR is present, it bypasses the PDF; if not, it scans the PDF and adds OCR, either saving a new PDF or replacing the existing one.
I ran a scan across a 4GB SSD containing 22,000 books and magazines and added OCR to nearly 800 PDFs. PDF-Tools ran for nearly 54 hours to do this, so be careful what you ask for, you just might get it!
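For anyone planning a similar run, the figures above give a rough per-file cost. The Python sketch below just does that arithmetic. It treats "nearly 54 hours" and "nearly 800 PDFs" as exact, and it charges the whole run (including the time spent checking the files that were skipped) to the files that received OCR, so the result is a rough upper-end estimate rather than a benchmark. The monthly figure of 30 files is a made-up example, not a number from this thread.

```python
def average_minutes_per_file(total_hours: float, files_processed: int) -> float:
    """Average wall-clock minutes spent per processed file."""
    if files_processed <= 0:
        raise ValueError("files_processed must be positive")
    return total_hours * 60 / files_processed


def estimate_hours(files: int, minutes_per_file: float) -> float:
    """Projected run time in hours for a batch of the given size."""
    return files * minutes_per_file / 60


# Figures from the run described above: about 54 hours for about 800 PDFs.
per_file = average_minutes_per_file(54, 800)  # roughly 4 minutes per file

# Hypothetical monthly batch of 30 PDFs that arrive without OCR.
monthly_hours = estimate_hours(30, per_file)  # roughly 2 hours
```

On those assumptions the initial backlog is the expensive part; the monthly top-up should fit comfortably into an overnight run.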
-
Sean - PDF-XChange
- Site Admin
- Posts: 1091
- Joined: Wed Sep 14, 2016 5:42 pm
Re: Can PDF Tools OCR and switch Edit on/off via CMD Line?
Sean Godley
Technical Writer
PDF-XChange Co LTD
Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623