RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Forum for the PDF-XChange Editor - Free and Licensed Versions

DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

I have a document with numerous Text Box objects (some with lengthy comments) and other Annotation objects (highlighting, strikethrough, stamp, ...) in it that I have added. Each one contains my username, "DIV", as the Author (apparently this is the username registered with Tracker). Admittedly, in my case, it's still partially anonymised. But in some instances it would be useful to:
  • remove all Author information from Annotation objects (and any others — form elements, say?)
  • whilst still retaining the content of those objects.
Currently this can be done manually by selecting some/all Annotations and editing the properties. However, that is somewhat inconvenient, and the bigger risk is that a user may overlook this potential leak of personal information — the user may look at the content, and find it already safe to share, but forget or not realise that the Author property exists.

image.png
The other existing alternative is to use the Sanitize Document feature with the "Flatten Comments and Forms" option, which in my case appeared to remove every annotation entirely. That may not be desirable in all cases, as the annotations may contain (safe) content that needs to be preserved.

image(1).png

Therefore I suggest that one more tick-box option be added to the Sanitize Document feature.

—DIV
Dimitar - PDF-XChange
Site Admin
Posts: 2637
Joined: Mon Jan 15, 2018 9:01 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Dimitar - PDF-XChange »

Hello DIV,

Thank you for your suggestion.

I will pass it on to our developers for consideration.

I'm pretty sure they'll add such a feature if it doesn't break the PDF specs in some way.

Regards.
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

Thanks, Dimitar.

As you say, compliance with the PDF specification(s) is required, but I doubt that this would pose a problem here, because the desired outcome is already available manually.
image(1).png
One quirk, incidentally, is that the 'blank' Author property is then shown greyed out as "<Not set>" in the Annotation Properties pane, whilst it's shown as "Anonym" in the Comments pane. I suppose either is OK, although I think I prefer the former convention. "Anonym" is a relatively uncommon word that could be misinterpreted by some as a person's name or username.

—DIV
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

By the way, is it intended that "Flatten Comments and Forms" entirely removes every annotation (of any kind)? I'm not sure that users would necessarily expect that behaviour from the wording.

Firstly, "flatten" doesn't equate to "remove" in my thinking. I would think of "flatten" in relation to layers of images: the bottom image cannot be recovered if the overlaid image is "flattened" onto it; but the topmost image remains visible.

Secondly, "comments" might not be interpreted as broadly as "annotations", perhaps. (Actually, the GUI itself sometimes seems to use these terms interchangeably.)

So if that is indeed the intended behaviour for that option, then perhaps the phrasing could be tweaked.

—DIV
Dimitar - PDF-XChange
Site Admin
Posts: 2637
Joined: Mon Jan 15, 2018 9:01 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Dimitar - PDF-XChange »

Hi DIV,

Thank you for the additional input.

Of course, flattening should not amount to removal.

Flattening means that any type of annotation is converted to main (base) content.

Do you have an example file whose contents were destroyed by this option?
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

Leave it with me. I'll have to modify the existing file to make it suitable to post here.
—DIV
Dimitar - PDF-XChange
Site Admin
Posts: 2637
Joined: Mon Jan 15, 2018 9:01 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Dimitar - PDF-XChange »

Ok. Thanks.
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

Hi, Dimitar.

As requested, I extracted one of the pages from the original document. I have replaced all of the sentences, while retaining the same object types ("text" or "text box") and the same general style & formatting as the original. I also stripped out a little of the metadata.

The resultant 'anonymised' excerpt is this file:
PDF_Sanitise_Test1.pdf
The blue text is in a Text Box, and there are numerous other Annotation objects present too. (You can see in the screenshot that for each of them the Author property was automatically set to "DIV".)
image(2).png
I then used the Sanitize Document feature with only the "Flatten Comments and Forms" option selected. (Ordinarily I would have selected most of the options, but I selected only one here, for clarity of testing.)
image.png
The output 'sanitised' version is this file:
PDF_Sanitise_Test1.sanitised.pdf
As you can see, all of the Annotation objects are gone (they're not visible, and they're not listed in the Comments pane either).
image(1).png
At one point I thought to myself that perhaps the comments became hidden because (although they were added last) they might have been obscured by the white rectangular input boxes that could have ended up in a higher layer that got flattened. However, as far as I can tell, there are not actually any white rectangular input boxes present — there're only a variety of individual straight black lines. Hence I discount that as a possible explanation.

—DIV
Dimitar - PDF-XChange
Site Admin
Posts: 2637
Joined: Mon Jan 15, 2018 9:01 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Dimitar - PDF-XChange »

Hi DIV,

Thank you for the file provided.

I was able to reproduce the problem on my end, so I will forward your report to our development team for further investigation.


Regards.
Mathew
User
Posts: 727
Joined: Thu Jun 19, 2014 7:30 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Mathew »

By the way, if you don't mind running a script, a simple workaround to erase all author names is to run this in the JavaScript console:

for (let a of this.getAnnots()) a.author="";
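
A slightly more defensive variant, offered as a rough sketch rather than anything official. It wraps the same loop in a function (the name clearAnnotAuthors is made up here) and assumes, as the Acrobat JavaScript API describes, that getAnnots() can return null when a document has no annotations. Whether the Editor behaves identically is an assumption.

```javascript
// Sketch: blank the Author of every annotation and report how many were changed.
// `doc` is assumed to be a Doc object exposing getAnnots(), as `this` does in the console.
function clearAnnotAuthors(doc) {
  // Guard against a null result (no annotations) so the loop does not throw.
  const annots = doc.getAnnots() || [];
  let changed = 0;
  for (const a of annots) {
    if (a.author) {
      // Only touch annotations whose Author is not already blank.
      a.author = "";
      changed++;
    }
  }
  return changed;
}
```

In the JavaScript console this would be called as clearAnnotAuthors(this); the return value is the number of annotations whose Author was blanked.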
Dimitar - PDF-XChange
Site Admin
Posts: 2637
Joined: Mon Jan 15, 2018 9:01 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Dimitar - PDF-XChange »

Thanks for the input, Mathew.
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

Dimitar - Tracker Supp wrote: Mon Aug 07, 2023 1:56 pm I was able to reproduce the problem on my end, so I will forward your report to our development team for further investigation.
Thanks, Dimitar.
I would be interested to know whether there's anything specific in the file that could be causing the problem (nothing that I'm aware of so far!), or if it's just a general bug.
At least we know that it's not merely a quirk of my computer's set-up!
Mathew wrote: Mon Aug 07, 2023 11:47 pm By the way, if you don't mind running a script, a simple workaround to erase all author names is to run this in the JavaScript console:

for (let a of this.getAnnots()) a.author="";
Thanks for the suggestion, Mathew.
For me, personally, remembering to delete the author names is the more difficult part. If it's added into the list of options for sanitising the document, that would be a great prompt/cue.

My 'manual' approach of selecting all annotations and then editing the collective Author property (blanking it) is not much more effort than running the script.
I could see more motivation to use that script in, say, some sort of automation code (e.g. auto-sanitise upon saving document?), and/or as an element of a larger script.
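
For what it's worth, the "auto-sanitise upon saving" idea might look roughly like the sketch below. It assumes the Editor supports the Doc.setAction method with the "WillSave" trigger, as described in the Acrobat JavaScript API; I have not confirmed that here. The helper name installAuthorScrubOnSave is hypothetical.

```javascript
// Sketch only: installAuthorScrubOnSave is a hypothetical helper name.
// Assumes the Doc object supports setAction("WillSave", ...) as in the Acrobat JavaScript API.
function installAuthorScrubOnSave(doc) {
  // The action body is stored as a string; it is assumed to run with `this` bound to the document.
  const script = 'for (let a of this.getAnnots() || []) a.author = "";';
  doc.setAction("WillSave", script);
  return script;
}
```

It would be run once per document as installAuthorScrubOnSave(this). Note that a document-level action like this is stored in the PDF itself, so it applies only to that file and travels with it, which may or may not be what you want.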

Looking at the 'big picture', I applaud you for contributing to the set of available examples of scripts, because these can provide a great foundation for other users to build new scripts for their own purposes that adopt or adapt your code.
(You can get a taste of my gradual journey starting from a low base to the next step and then a bigger step.)

—DIV
Paul - PDF-XChange
Site Admin
Posts: 7418
Joined: Wed Mar 25, 2009 10:37 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Paul - PDF-XChange »

DIV wrote: Wed Aug 09, 2023 1:07 pm
Thanks for the suggestion, Mathew.
For me, personally, remembering to delete the author names is the more difficult part. If it's added into the list of options for sanitising the document, that would be a great prompt/cue.

—DIV
Just a suggestion: it might be handy as a button. You could actually make it your own custom button.
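
As a rough illustration of the custom-button idea, here is a minimal sketch. It assumes app.addToolButton is available as in the Acrobat JavaScript API and that the script is loaded at application start-up (for example from a JavaScripts folder). The function name, button name, and label are invented, and it is an assumption that `this` refers to the active document when the button is clicked.

```javascript
// Sketch: register a toolbar button that blanks annotation authors in the active document.
// `appObj` is assumed to be the global `app` object; the names below are invented.
function addScrubAuthorsButton(appObj) {
  const button = {
    cName: "scrubAnnotAuthors",   // hypothetical unique identifier for the button
    cLabel: "Clear Authors",      // text shown on the button
    cTooltext: "Blank the Author of every annotation",
    // cExec is a script string; `this` is ASSUMED to be the active document when it runs.
    cExec: 'for (let a of this.getAnnots() || []) a.author = "";'
  };
  appObj.addToolButton(button);
  return button;
}
```

It would be invoked once as addScrubAuthorsButton(app).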
Best regards

Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
Mathew
User
Posts: 727
Joined: Thu Jun 19, 2014 7:30 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Mathew »

DIV wrote: Wed Aug 09, 2023 1:07 pm For me, personally, remembering to delete the author names is the more difficult part. If it's added into the list of options for sanitising the document, that would be a great prompt/cue.
Good point. Actually, I wish there was a way to run our own scripts from built-in menu items (i.e. if one of the options on Sanitize was "run a script...").

I do create quite a few toolbar scripts for myself for things that I have to do often, and I really appreciate the JavaScript support in PDF-XChange, but like you said, it's an extra step. I think Tracker has something in the works to make it easier to save and run little code snippets, but it's not working yet...
MedBooster
User
Posts: 1411
Joined: Mon Nov 15, 2021 8:38 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by MedBooster »

I agree that it should be easier to see whether your username or author name remains anywhere in the document, in comments, or even base content (?)
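
For the comments part at least, a read-only check seems possible with a short script. This is only a sketch: the function name listAnnotAuthors is made up, and it assumes getAnnots() and the author property behave as in the Acrobat JavaScript API. It changes nothing; it just tallies which Author values are present.

```javascript
// Sketch: count how many annotations carry each non-blank Author value.
// `doc` is assumed to be a Doc object, e.g. `this` in the JavaScript console.
function listAnnotAuthors(doc) {
  const counts = {};
  for (const a of doc.getAnnots() || []) {
    const name = a.author || "";
    if (name) {
      counts[name] = (counts[name] || 0) + 1;
    }
  }
  return counts; // maps each author name to its number of annotations
}
```

An empty result would suggest that no annotation has an Author set. It says nothing about base content or document metadata.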

It seems Mathew's Javascript script removes it everywhere, so I'll remember to use that if needed.
My wishlist https://forum.pdf-xchange.com/viewtopic.php?p=187394#p187394
Disable SPACE page navigation, fix kb shortcut for highlighting advanced search tool search field, bookmarks with numbers, toolbar small icon size, AltGr/Ctrl+Alt keyboard issues
Paul - PDF-XChange
Site Admin
Posts: 7418
Joined: Wed Mar 25, 2009 10:37 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Paul - PDF-XChange »

Mathew wrote: Wed Aug 09, 2023 6:31 pm I think Tracker has something in the works to make it easier to save and run little code snippets but it's not working yet...
Indeed, RT#5877: FR: Customize toolbars "New command" completion

This will allow one to make a command that has an actual Command ID that can be used anywhere the current commands get used, even Keyboard Shortcuts. You would be able to run JavaScript snippets from it.

What it does not offer is the ability to run a script from within an existing dialogue.
Mathew wrote: Wed Aug 09, 2023 6:31 pm I wish there was a way to run our own scripts from built-in menu items (ie if one of the options on Sanitize was "run a script...").
I am afraid that is not on the cards.
DIV
User
Posts: 258
Joined: Fri Jun 23, 2017 1:47 am

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by DIV »

MedBooster wrote: Wed Aug 09, 2023 8:17 pm I agree that it should be easier to see whether your username or author name remains anywhere in the document, in comments, or even base content (?)
It seems Mathew's Javascript script removes it everywhere, so I'll remember to use that if needed.
While I commend Mathew for providing the script, AFAIK it does not remove an author name or username "everywhere". Actually, it is specifically targeted to remove the contents of the Author property of any Annotation objects (also called Comments), which can include text boxes, highlighting, strikethrough, underlining, and so on.
The Author property is automatically set to the user's username (as registered in Editor) when the Annotation object is created. At least, that's what happens within Tracker's Editor application. However, in principle, other software used to annotate a PDF might use something else as the default (a company name, perhaps, or the Windows username). And users can manually modify the Author property.
Notably, the script does not search for the username. No matter what the contents of the Author property, the script will remove those contents (and only those contents).
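
To illustrate the distinction, a variant that does look at the name could be sketched as follows. The function name clearAuthorIfMatches is hypothetical, and the same assumptions about getAnnots() and the author property apply. It blanks the Author only where it exactly equals a given string, leaving other authors' annotations alone.

```javascript
// Sketch: blank the Author only on annotations whose Author exactly equals `name`.
// `doc` is assumed to be a Doc object, e.g. `this` in the JavaScript console.
function clearAuthorIfMatches(doc, name) {
  let changed = 0;
  for (const a of doc.getAnnots() || []) {
    if (a.author === name) {
      a.author = "";
      changed++;
    }
  }
  return changed; // number of annotations that were modified
}
```

For example, clearAuthorIfMatches(this, "DIV") would leave a colleague's annotations attributed to them.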

The script will not touch anything in "base content": not 'ordinary' text, and certainly not images. Nor will it examine metadata, including the nominated Author of the PDF document (as from pressing Ctrl+D).

As I mentioned, one of the benefits of sharing scripts in the forum is that they can be adopted and/or adapted by other users. Mathew's existing script could potentially be extended to create a new (longer) script by adding new code that would have the function of removing specific metadata, or searching the base text, say.
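
As one possible starting point for such an extension, here is a minimal sketch that also blanks the document-level Author. It assumes the Doc object exposes a writable info.Author property, as in the Acrobat JavaScript API; the function name clearAnnotAndDocAuthor is invented. It still does not touch base content or any other metadata fields.

```javascript
// Sketch: blank the Author on every annotation and in the document information.
// `doc` is assumed to expose getAnnots() and a writable info.Author.
function clearAnnotAndDocAuthor(doc) {
  let changed = 0;
  for (const a of doc.getAnnots() || []) {
    if (a.author) {
      a.author = "";
      changed++;
    }
  }
  // Remember whether a document-level Author was present before clearing it.
  const hadDocAuthor = Boolean(doc.info.Author);
  doc.info.Author = "";
  return { annotations: changed, documentAuthor: hadDocAuthor };
}
```

Calling clearAnnotAndDocAuthor(this) would return a small summary of what was cleared, which could be printed to the console as a reminder of what was present.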

In summary, the existing short script is targeted to a very specific use that wasn't covered by the existing Sanitize Document feature.
Nevertheless, the Sanitize Document feature has a much broader capability, and so I reckon that it should probably still be the first choice for most users.

—DIV
MedBooster
User
Posts: 1411
Joined: Mon Nov 15, 2021 8:38 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by MedBooster »

DIV wrote: Fri Aug 11, 2023 8:22 am
The script will not touch anything in "base content": not 'ordinary' text, and certainly not images. Nor will it examine metadata, including the nominated Author of the PDF document (as from pressing Ctrl+D).
Does any base content contain author information though? If you edit it?
Daniel - PDF-XChange
Site Admin
Posts: 12153
Joined: Wed Jan 03, 2018 6:52 pm

Re: RFE: add new option to remove Author data from Annotation objects through Sanitize Document

Post by Daniel - PDF-XChange »

Hello, MedBooster

Base content shouldn't retain author information, as that field is not present for those items. Author information should appear only in comments and in the document metadata fields you see in Ctrl+D (which can be cleaned with the Sanitize feature).

Also, @DIV, I have passed along the request to the dev team. Sorry for the long delay on an answer there. I cannot promise that it will be implemented, but they are aware now that users are looking for such an option.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com