I use OCR intensively. But sometimes the options need to be changed and the OCR run again.
So it would be better to delete the "wrong" text first. But how?
I know there is supposed to be a layer; the explanation in PDF Tools shows it, too. But I cannot find this layer!
OCR on which layer?
- Site Admin
- Posts: 11566
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR on which layer?
Hello, winkelmann
Layers work differently in PDF. For the purposes of OCR, there are simply page objects and comments. Comments cannot be affected by a page-content tool such as the OCR function, so if the problematic text is a "comment" (selectable with the "Select Comments" tool), you will need to "Flatten" the comments before you can run OCR on them. In many cases, flattening will actually place the text in the base content as page objects, which removes the need to use the OCR tool at all.
In other cases, the "text" may be shapes/images, in which case it is not selectable as a text object at all. Simply running our "Enhanced OCR" (part of the Editor Plus) and choosing to create "editable text and images" in the OCR dialog will overwrite these objects.
Finally, there is the case where the page text is "invisible, searchable" text. This happens when a less powerful OCR has been run on the document at some point in the past. It is generally advisable to remove these text items manually, because it is not always possible for the OCR to remove all of them perfectly, and they can negatively impact the OCR process. To remove them manually, use the "Edit > Text" function on the Home tab, then select the text and press Delete. If you do not wish to perform this step manually, you can run OCR while the items are still there by disabling the options shown in the screenshot attached to the original post. This should allow the OCR to run over the existing text and attempt to replace it. Note that, depending on the exact positioning of the text, you may instead get two overlapping text layers in some areas.
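For readers curious about what "invisible, searchable" text is at the file level: OCR tools commonly write it as an ordinary text object in the page content stream, drawn with text rendering mode 3 (invisible), which the PDF specification sets with the `3 Tr` operator. The Python sketch below only illustrates that idea; it is not how PDF-XChange Editor works internally. It assumes you already have one page's content stream as uncompressed bytes, and the function name `strip_text_objects` is hypothetical. It naively removes every `BT ... ET` text object that sets mode 3, or every text object when `only_invisible=False`.

```python
import re

# One text object: from a BT operator to the nearest following ET operator.
# Naive assumption: no string literal inside a text object contains "BT" or "ET"
# as a standalone token.
TEXT_OBJECT = re.compile(rb"\bBT\b.*?\bET\b", re.DOTALL)

# "3 Tr" selects text rendering mode 3 (invisible). The lookbehind stops
# operands such as "13 Tr" from matching.
INVISIBLE_MODE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def strip_text_objects(content: bytes, only_invisible: bool = True) -> bytes:
    """Remove text objects from an uncompressed page content stream (hypothetical helper).

    With only_invisible=True, only text objects that set rendering mode 3
    inside the object itself are removed; all other operators are kept.
    """

    def replace(match: re.Match) -> bytes:
        block = match.group(0)
        if only_invisible and not INVISIBLE_MODE.search(block):
            return block  # visible text: keep it unchanged
        return b""  # drop the whole BT ... ET object

    return TEXT_OBJECT.sub(replace, content)


# A toy "scanned page": one full-page image, one invisible OCR line, one visible line.
page = (
    b"q 612 0 0 792 0 0 cm /Im0 Do Q\n"
    b"BT 3 Tr /F1 12 Tf 72 700 Td (Hello OCR) Tj ET\n"
    b"BT 0 Tr /F1 12 Tf 72 650 Td (Visible) Tj ET\n"
)

print(strip_text_objects(page))
print(strip_text_objects(page, only_invisible=False))
```

This is deliberately simplistic: it does not decompress streams, does not track a `Tr` set outside a text object or restored by `q`/`Q`, and does not tokenize string literals, so a real tool should work through a proper PDF parser instead.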
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
- User
- Posts: 2770
- Joined: Wed Jan 18, 2006 12:10 pm
Re: OCR on which layer?
Some additional info that might help...
If you are dealing with a PDF that was generated by a scanner (one large image per page and no text at all), and OCR was then applied in a second step to make it searchable, you can use the following method to remove the OCR text:
- activate the Content pane via the View ribbon > Panes
- in the Content pane, click the Options... button, then Select > Text
- press the Delete key on your keyboard
The OCR text has now been removed, and from this point on you can re-run OCR with the method and options you prefer.
Kind regards.
- Site Admin
- Posts: 11566
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR on which layer?
Hello, Willy Van Nuffel
Excellent point, thank you for that, Willy. That is very helpful!
Kind regards,
Dan McIntyre - Support Technician
- User
- Posts: 88
- Joined: Sun Jun 09, 2013 8:52 am
Re: OCR on which layer?
I use Enhanced OCR in PDF-XChange Editor Plus, and I did it as described above (with the options unchecked as shown). Even so, sometimes the text from the earlier attempt remains as well.
So I would like to select this text and delete it. But I can only select and copy it, not delete it.
Is there a secret layer? I cannot find it in the list of layers …
PS: I do not understand the last method. My view looks like this (see attached screenshot):
Goetz
- User
- Posts: 2770
- Joined: Wed Jan 18, 2006 12:10 pm
Re: OCR on which layer?
The result of the OCR should be visible in the Content pane (not in the Layers pane).
By clicking the tiny white triangle next to a given page in the Content pane, you can expand its content.
OCR text is part of the base content of the PDF document; there is no separate layer for it.
If this information does not help you further, it might be good to post examples from before and after applying Enhanced OCR, indicating what you would like to be able to delete.
Kind regards.
- User
- Posts: 88
- Joined: Sun Jun 09, 2013 8:52 am
- Site Admin
- Posts: 19913
- Joined: Mon Jan 12, 2009 8:07 am