Hi,
I'm trying to fix the OCR on this PDF file which likely had an encoding issue (I don't know how the original OCR was performed).
For the "fix", I tried to Rasterize the file (so that I can OCR it with PDFXCE afterwards), using the following settings to preserve good quality:
- Compression : JPEG
- JPEG Quality : High
- 300 DPI
The issue is that after the Rasterize action, the file size became very large (537MB).
I'm aware that this is expected, since the PDF now consists exclusively of high-definition images.
I tried to Save as Optimized with the "Standard" profile but unfortunately the file size is still way larger (169MB) than the original file (58MB) and the quality has slightly deteriorated.
Maybe you have a solution for my issue,
Thanks for investigating,
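As a rough illustration of why rasterizing at 300 DPI inflates the file, the sketch below estimates the uncompressed size of one rasterized page. The A4 page size and 24-bit RGB color depth are assumptions (the post states neither), and the function name is hypothetical. JPEG compression shrinks the stored result considerably, so the numbers only indicate the raw pixel volume involved.

```python
def raster_page_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Return the uncompressed size in bytes of one rasterized page.

    width_in / height_in are the page dimensions in inches.
    bytes_per_pixel=3 assumes 24-bit RGB (an assumption, not from the post).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumed A4 page (8.27 x 11.69 in) at the 300 DPI used in the post.
a4_300 = raster_page_bytes(8.27, 11.69, 300)
# Same assumed page at 150 DPI, for comparison.
a4_150 = raster_page_bytes(8.27, 11.69, 150)

print(f"300 DPI: {a4_300 / 1e6:.1f} MB uncompressed per page")
print(f"150 DPI: {a4_150 / 1e6:.1f} MB uncompressed per page")
```

Because the pixel count grows with the square of the resolution, halving the DPI cuts the raw pixel data to roughly a quarter.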
Trying to fix a PDF OCR encoding issue SOLVED
Major Stylus topics
- RemoveAnnotationsWithEraser T#6903
- MiniPopupMenuOnTextSelection T#6894
- AbnormalSpikes forum.pdf-xchange.com/viewtopic.php?p=179935&hilit=spikes#p179935
- ForceEraserPreview forum.pdf-xchange.com/viewtopic.php?t=42380
- Daniel - PDF-XChange
Re: Trying to fix a PDF OCR encoding issue
Hello, Loki@99
There should be no need to rasterize the page in this situation. The only time rasterizing is required before OCR is when you only want to generate "searchable" text, but wish to leave the facsimile of the original scanned pages intact.
If you are looking to enable editing for the text, you can simply uncheck the "ignore text" option to get the same effect without creating a number of large images in the file. This will allow the OCR to replace the existing text on the page with a new editable text layer.
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Re: Trying to fix a PDF OCR encoding issue
Hi,
Daniel - PDF-XChange wrote: The only time rasterizing is required before OCR is when you only want to generate "searchable" text, but wish to leave the facsimile of the original scanned pages intact.
Well, that's exactly the reason why I'm trying to fix that PDF OCR.
Fortunately, unchecking "Ignore existing text on page" and selecting the "Searchable Image" output setting fixed the issue.
There's still content left from the old OCR, but it's no big deal as the search feature works now.
Old content
Thanks for your assistance,
- Stefan - PDF-XChange
Re: Trying to fix a PDF OCR encoding issue
Hello Loki@99,
You could delete the previous OCR result first, and then run our OCR - that way the wrong encoding text will be removed and not mess with your content pane.
Kind regards,
Stefan
Re: Trying to fix a PDF OCR encoding issue
Hi Stefan,
Tracker Supp-Stefan wrote: Tue Aug 06, 2024 11:18 am You could delete the previous OCR result first, and then run our OCR - that way the wrong encoding text will be removed and not mess with your content pane.
Unfortunately, it's not possible, as it will delete all data.
GIF - Delete the wrong encoding
- Stefan - PDF-XChange
Re: Trying to fix a PDF OCR encoding issue SOLVED
Hello Loki@99,
Ahh! So the original image pixels have already been removed!
Yes, in that case what you did above should help!
Kind regards,
Stefan