OCR on which layer?

Forum for the PDF-XChange Editor - Free and Licensed Versions


winkelmann
User
Posts: 88
Joined: Sun Jun 09, 2013 8:52 am

OCR on which layer?

Post by winkelmann »

I use OCR intensively. But sometimes the options need to be changed and the OCR run again.
So it would be better to delete the "wrong" text first. But how?
I know there is supposed to be a layer; the explanation in PDF Tools shows it, too. But I cannot find this layer!
Goetz
Daniel - PDF-XChange
Site Admin
Posts: 11566
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR on which layer?

Post by Daniel - PDF-XChange »

Hello, winkelmann

Layers function differently in PDF. For the purposes of OCR, there are simply page objects and comments. Comments cannot be affected by a page content tool like the OCR function, so if the problematic text is a "comment" (selectable with the "Select Comments" tool), you will need to "Flatten" the comments before you can run OCR on them (which, in many cases, will actually place the text as base content items and remove the need to use the OCR tool at all).
In other cases, the "text" may be shapes/images, in which case it is not selectable as a text object at all. Simply running our "Enhanced OCR" (part of the Editor Plus) and choosing to create "editable text and images" from the OCR dialog will overwrite these objects.
Finally, there is the case where the page text is "invisible, searchable" text. This happens when a less powerful OCR has been run on the document at some point in the past. It is generally advised to remove these text items manually, as it is not always possible to remove all of them perfectly and they can negatively impact the OCR process (to remove them manually, use the "Edit > Text" function on the Home tab, then select the items and press Delete). If you do not wish to perform this step manually, you can run OCR while the items are still there by disabling these options:
image.png
This should have the effect of allowing the OCR to run over the existing text, and attempt to replace it (note that depending on the exact positioning of the text, it is possible that you will instead get two overlapping text layers in some areas).
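
For readers wondering how "invisible, searchable" text exists on the page at all: in the PDF format, such text is normally ordinary page content drawn with text rendering mode 3 (the `3 Tr` operator), which neither fills nor strokes the glyphs. The following is a minimal Python sketch, not part of PDF-XChange Editor, that checks an already decompressed content stream for that operator. The function name and the sample streams are made up for illustration, and a simple pattern match like this can be fooled by a string literal that happens to contain the same characters.

```python
import re

# Matches the text rendering mode operator with a single-digit operand, e.g. "3 Tr".
# Assumption: the content stream has already been decompressed to raw bytes.
TR_PATTERN = re.compile(rb"(?<![\w.])(\d)\s+Tr(?![A-Za-z])")


def has_invisible_text(content: bytes) -> bool:
    """Return True if the stream sets text rendering mode 3 (invisible text)."""
    return any(m.group(1) == b"3" for m in TR_PATTERN.finditer(content))


# Hypothetical sample streams, not taken from a real document.
sample_ocr = b"BT 3 Tr /F1 12 Tf 72 700 Td (hidden OCR text) Tj ET"
sample_visible = b"BT 0 Tr /F1 12 Tf 72 700 Td (visible text) Tj ET"

print(has_invisible_text(sample_ocr))
print(has_invisible_text(sample_visible))
```

This only illustrates why the text is selectable and searchable but not visible; it says nothing about how the Editor itself detects or replaces it.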

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Willy Van Nuffel
User
Posts: 2769
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR on which layer?

Post by Willy Van Nuffel »

Some additional info that might help...

If this concerns a PDF that was generated by a scanner (one large image per page and no text at all), and OCR was then applied in a second step to make it searchable, you can use the following method to remove the OCR text:
- activate the Content pane via the View ribbon > Panes
- in the Content pane, click the Options... button, followed by Select > Text
- press the Delete key on your keyboard

The OCR text has now been removed, and from this point on you can re-run OCR with the method and options that you like.
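
Conceptually, the steps above select every text object in the page content and delete it, leaving the scanned image in place. As a rough illustration only, assuming a decompressed content stream in which text sits between `BT` and `ET` operators (as the PDF format defines), the same idea can be sketched in Python. The function name and sample stream are hypothetical, this is not how the Editor implements it, and a naive pattern like this would break on string literals that contain the words `BT` or `ET`.

```python
import re

# A text object in a PDF content stream starts with BT and ends with ET.
# Assumption: the stream is decompressed and contains no BT/ET inside strings.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)


def strip_text_objects(content: bytes) -> bytes:
    """Remove every BT ... ET block and keep all other page content."""
    return TEXT_OBJECT.sub(b"", content)


# Hypothetical page: one scanned image followed by one invisible OCR text object.
page = (
    b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
    b"BT 3 Tr /F1 12 Tf 72 700 Td (old OCR) Tj ET\n"
)
cleaned = strip_text_objects(page)
print(cleaned)
```

After this step only the image-drawing operators remain, which matches the state described above: a page with its scan and no OCR text, ready for a fresh OCR run.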

Kind regards.
Daniel - PDF-XChange
Site Admin
Posts: 11566
Joined: Wed Jan 03, 2018 6:52 pm

Re: OCR on which layer?

Post by Daniel - PDF-XChange »

Hello, Willy Van Nuffel

Excellent point, thank you for that, Willy. That is very helpful!

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

winkelmann
User
Posts: 88
Joined: Sun Jun 09, 2013 8:52 am

Re: OCR on which layer?

Post by winkelmann »

I use Enhanced OCR in PDF-XChange Editor Plus, in the way described above (with the options unchecked as shown). Even so, sometimes the text from the earlier attempt remains, too.
So I would like to select this text and delete it. I can select and copy it, but not delete it.
Is there a secret layer? I cannot find one in the list of layers.

PS: I do not understand the last method. My view looks like this:
grafik.png
Goetz
Willy Van Nuffel
User
Posts: 2769
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR on which layer?

Post by Willy Van Nuffel »

The result of the OCR should be visible in the Content pane (not in the Layers pane).

By clicking the tiny white triangle for a given page (in the Content pane), you can open its content.

OCR text is part of the base content of the PDF document; there is no separate layer for it.

If this information does not help you any further, then it might be good to post examples from before and after applying Enhanced OCR, with an indication of what you would like to be able to delete.

Kind regards.
winkelmann
User
Posts: 88
Joined: Sun Jun 09, 2013 8:52 am

Re: OCR on which layer?

Post by winkelmann »

OK, thanks.
I found it just this moment.
Goetz
Willy Van Nuffel
User
Posts: 2769
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR on which layer?

Post by Willy Van Nuffel »

:-)
Stefan - PDF-XChange
Site Admin
Posts: 19913
Joined: Mon Jan 12, 2009 8:07 am

OCR on which layer?

Post by Stefan - PDF-XChange »

:)