Search for PDFs that have not been OCR'd

ragkag
User
Posts: 17
Joined: Sat Jun 10, 2023 1:38 pm

Search for PDFs that have not been OCR'd

Post by ragkag »

Is there a way to search a set of PDFs and identify which ones have not been OCR'd?
Daniel - PDF-XChange
Site Admin
Posts: 12937
Joined: Wed Jan 03, 2018 6:52 pm

Re: Search for PDFs that have not been OCR'd

Post by Daniel - PDF-XChange »

Hello, ragkag

In PDF-Tools, the "OCR pages" action offers a checkbox to skip documents which contain text objects. This is not a catch-all (for example, it would skip a scanned document that has had one page inserted which contains some text content), but it does avoid reprocessing documents that already have text in place.
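
To illustrate the caveat above, here is a minimal Python sketch of the idea, not how PDF-Tools works internally. It assumes you already have the extracted text of each page as a list of strings from whatever text extractor you use; the function name `classify_text_coverage` and the three labels are hypothetical. A scanned document with a single inserted text page comes out as "partial text", which is the case a simple "contains text objects" check would skip.

```python
from typing import List


def classify_text_coverage(page_texts: List[str]) -> str:
    """Classify a document from its per-page extracted text.

    page_texts is a hypothetical input: one string per page, empty
    (or whitespace only) when the page has no text layer.
    """
    pages_with_text = sum(1 for text in page_texts if text.strip())
    if pages_with_text == 0:
        return "no text"              # likely never OCR'd
    if pages_with_text == len(page_texts):
        return "all pages have text"  # likely OCR'd or born digital
    return "partial text"             # mixed: some pages may still need OCR
```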
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

ragkag
User
Posts: 17
Joined: Sat Jun 10, 2023 1:38 pm

Re: Search for PDFs that have not been OCR'd

Post by ragkag »

I have more than 20,000 documents. Will PDF-Tools handle this many?
Stefan - PDF-XChange
Site Admin
Posts: 19942
Joined: Mon Jan 12, 2009 8:07 am

Re: Search for PDFs that have not been OCR'd

Post by Stefan - PDF-XChange »

Hello ragkag,

It should. However, please try with e.g. 1000-2000 files at a time, so that if anything goes wrong you do not need to wait for Tools to process all 20,000 files before seeing that there are errors.
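
If it helps to prepare the batches outside of PDF-Tools, the following is a minimal Python sketch under stated assumptions: the PDFs sit under one root folder, and the helper name `batch_pdfs` and its default batch size are hypothetical. It only groups file paths; it does not call PDF-Tools or modify any file.

```python
from pathlib import Path
from typing import Iterator, List


def batch_pdfs(root: Path, batch_size: int = 1000) -> Iterator[List[Path]]:
    """Yield sorted lists of at most batch_size PDF paths found under root."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Collect every file with a .pdf extension (case-insensitive), recursively.
    pdfs = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    # Slice the sorted list into consecutive batches.
    for start in range(0, len(pdfs), batch_size):
        yield pdfs[start:start + batch_size]
```

Each yielded list could then be copied or moved into its own folder and that folder given to the tool as one run, so a failure only affects the current batch.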

Kind regards,
Stefan