PDFs with OCR: display problem for bookmarks text

Forum for the PDF-XChange Editor - Free and Licensed Versions

Mirabella
User
Posts: 22
Joined: Sun Jan 28, 2024 5:37 pm

Post by Mirabella »

Hello!

This is a suggestion regarding a recurring problem with displaying bookmarks text in PDFs processed by OCR:

- When you create a bookmark in this type of PDF, the bookmark text is displayed "cropped" on the page: it is partially hidden, sometimes almost entirely (cf. image and, as an attachment, an example of a PDF with this problem).

Hence this suggestion: in the dialogue box, could there be an option to adjust the display of the bookmark by creating a margin above it (in millimetres)?

This would make reading easier and avoid the tedious task of having to modify the cropped view/destination of each bookmark one by one...

Thank you in advance for your help! :)

Kind regards
Mirabella
Sean - PDF-XChange
Site Admin
Posts: 611
Joined: Wed Sep 14, 2016 5:42 pm

Re: PDFs with OCR: display problem for bookmarks text

Post by Sean - PDF-XChange »

Hi Mirabella,

This sounds like a very unusual bug - document text should never be obscured by the creation of a bookmark. Can you please break down the steps you are taking that result in this issue, starting from when you perform OCR?

Kind regards,
Sean Godley
Technical Writer
PDF-XChange Co LTD
Sales: +1 (250) 324-1621
Fax: +1 (250) 324-1623

Post by Mirabella »

Hi Sean,

Thanks for your prompt reply!

Here are the steps, which are actually quite standard:
1. Run OCR on the PDF (or start from a PDF that already contains OCR text);
2. Create bookmarks with the standard options (cf. image => font name/size, tolerance 0.3, no restriction, Allow multiline bookmark titles);
3. Part of the bookmark text is hidden.

I have only observed this result for PDFs with OCR.
I am attaching another example of a PDF (OCR) where the problem is even more pronounced: after creating the bookmarks (same settings as before), the text is almost entirely hidden, if not completely.

More generally, I think a margin option for displaying bookmarks (for all types of PDFs) would be very useful, as it would allow users to choose how they want their bookmarks to be displayed, with a more or less ‘spacious’ view depending on their preferences.

Thank you in advance for your help,

Mirabella

Post by Sean - PDF-XChange »

Hi Mirabella,

So you're referring to the text in the Bookmarks Pane? That was not clear in your first post, which showed only images of the document itself.

Please also clarify which feature you are using to create the bookmarks, as there are a few ways to do that.

Many thanks,
Sean Godley

Post by Mirabella »

Hi Sean,

Sorry if there was any ambiguity in my first message!

UPDATE: I have attached below a video clip describing the problem:
bookmark text (visibility).rar

===> The PDF with which I made the video capture:
Cropped display, example 2.pdf

I am not referring to the bookmarks panel, but to the destination on the PDF page AFTER clicking on a bookmark.

To be perfectly clear:
1. In the bookmarks panel, you click on a bookmark;
2. Then you look at the page, i.e. the destination where the bookmark takes you on the PDF page, and you see that the text on the page is barely visible (partially or almost entirely in some cases).

In the attached PDF file, the problem is really obvious.

Kind regards,

Mirabella

Post by Sean - PDF-XChange »

Hi Mirabella,

Thanks - I understand now. Can you please clarify which feature you are using to create the bookmarks?

Note that you can edit the location of bookmarks at any time using the following steps:

1. Right click the bookmark in the bookmarks pane, and click Properties.

2. In the Properties pane, click the three dots on the right of the bookmark action. The Edit Action List dialog box will open:

image.png

3. Select the bookmark and click Edit. The Edit Action dialog box will open:

image(1).png

You can then determine the exact location of bookmarks as desired - I rectified the issue you were experiencing using the "Use Rectangle" option.

I hope that helps.
Sean Godley

Post by Mirabella »

Hi Sean,

I am familiar with this method: its disadvantage is that it is very tedious, as it must be applied manually for each bookmark one by one, which takes a lot of time and makes the task unmanageable...

Hence this suggestion:
If, when creating bookmarks, there was an option to define a display margin for the bookmark, this would be set automatically in one go for all bookmarks!
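
As a rough illustration of what such an option would have to compute (this is only a sketch, not PDF-XChange code or its API): assuming an unrotated page in default PDF user space (origin at the bottom-left, y increasing upward, units of 1/72 inch) and bookmarks that store a "top" coordinate, a margin in millimetres could be applied to every bookmark in one pass. The names `add_top_margin` and `apply_margin_to_all` and the sample values are hypothetical.

```python
MM_TO_PT = 72.0 / 25.4  # default PDF user space units are 1/72 inch


def add_top_margin(dest_top_pt, margin_mm, page_height_pt):
    """Raise a destination's top edge by margin_mm, clamped to the page height."""
    raised = dest_top_pt + margin_mm * MM_TO_PT
    return min(raised, page_height_pt)


def apply_margin_to_all(bookmarks, margin_mm, page_height_pt):
    """Return new bookmark records with the margin applied; input is not modified."""
    return [
        {**bm, "top": add_top_margin(bm["top"], margin_mm, page_height_pt)}
        for bm in bookmarks
    ]


# Hypothetical bookmarks on an A4-sized page (842 pt high).
bookmarks = [
    {"title": "Chapter 1", "top": 700.0},
    {"title": "Chapter 2", "top": 838.0},  # close to the top edge of the page
]
adjusted = apply_margin_to_all(bookmarks, margin_mm=5.0, page_height_pt=842.0)
for bm in adjusted:
    print(bm["title"], round(bm["top"], 2))
```

The clamp keeps a bookmark near the top of the page from pointing above the page itself.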

Thanks for your help! :)

Mirabella

Post by Sean - PDF-XChange »

Hi Mirabella,

Could you please clarify the feature/process that you are using to create the bookmarks?

Many thanks,
Sean Godley

Post by Mirabella »

Hi Sean,

I use the method "Generate Bookmarks From Page Text".

In order to be as detailed as possible, I have attached a video capture and the PDF file used.

=> Video capture
Video capture - cropped display.rar

=> Test PDF file
Cropped display, example.pdf

I hope this is helpful!

Kind regards

Mirabella

Post by Mirabella »

Hi,

I am somewhat surprised: the topic is marked as resolved, but it is not... :?

Kind regards

Mirabella

Post by Sean - PDF-XChange »

Hi Mirabella,

Thanks - I have passed this on to the development team so that they can take a look.

Kind regards,
Sean Godley

Post by Mirabella »

Hi Sean,

Thank you for your feedback and initiative!

Kind regards

Mirabella
Daniel - PDF-XChange
Site Admin
Posts: 12157
Joined: Wed Jan 03, 2018 6:52 pm

Post by Daniel - PDF-XChange »

Hello, Mirabella,

The "Resolved" tag can be set either by the topic creator (in this case, yourself) or by one of us. We only use it when a situation is resolved but extended discussion is continuing that we do not need to be involved in. According to the logs, it seems that you set it shortly after posting one of your prior messages. If you did not mean to do so, I expect that was a misclick. No need to worry: it happens more often than you think, and it was evidently quite easy to remove! :)

Now, moving on to the actual error at hand: as far as I can tell, this appears to be resolved in the current release.

The reason for the "cutoff" you see on bookmark creation is that the OCR text layer in the file is itself offset:
image.png
image(1).png
Bookmark creation is based on the actual "text" location; the image background has no relevance here, and bookmarks are unaware of it. Since the text that is present has a slight vertical offset, the bookmark links match that text position - as is intended.
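
To make the effect concrete, here is a minimal sketch (not PDF-XChange code) of how a vertical offset in the text layer translates into a "cutoff". It assumes default PDF user space (y increasing upward) and that the view opens with its top edge at the top of the text-layer box; the function name `hidden_fraction` and all coordinates are illustrative only.

```python
def hidden_fraction(glyph_top, glyph_bottom, view_top):
    """Fraction of the visible glyph height that lies above the top edge of the view."""
    height = glyph_top - glyph_bottom
    if height <= 0:
        raise ValueError("glyph_top must be greater than glyph_bottom")
    hidden = glyph_top - view_top
    return max(0.0, min(1.0, hidden / height))


# Illustrative values: the scanned line occupies y = 700..712 in the page image,
# but the OCR text layer sits 6 pt lower than the image.
image_top, image_bottom = 712.0, 700.0
text_layer_top = image_top - 6.0

# The bookmark follows the text layer, so the view starts at text_layer_top.
print(hidden_fraction(image_top, image_bottom, text_layer_top))  # 0.5

# With a correctly aligned text layer, nothing is cut off.
print(hidden_fraction(image_top, image_bottom, image_top))  # 0.0
```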

I have tested removing all of the text content that is already present in these files and re-processing them with both of our available OCR engines (set to "searchable text"), and the results appear to be nearly perfect. The text is positioned so that bookmarks are created where the text passage remains visible when you follow them:
image(2).png
image(3).png
Can I ask you to try re-processing the files and confirm if the issue is still present?

Also, I should note - it is very much possible, likely even (given my test results and the lack of notable changes to both OCR engines), that the original text layer present in these files was not from our OCR engine at all - and by extension the issue was a bit of a red herring.
By default, our OCR process will not process areas where text already exists, and it is quite common these days for scanners to perform their own OCR when scanning a file to PDF. Please ensure that you manually remove the existing invisible text layer (use "Home > Edit text", then drag a box around your pages to select and delete all blocks) before you run the OCR process.
After that, try adding bookmarks again, and you should find there is no more "cutoff" with the new bookmarks.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com