Working with paragraphs: display, fixing, correct automatic tagging  SOLVED

Forum for the PDF-XChange Editor - Free and Licensed Versions


Jensen Head
User
Posts: 626
Joined: Mon Sep 13, 2021 8:12 am

Working with paragraphs: display, fixing, correct automatic tagging

Post by Jensen Head »

I have a question about how to enable the display of paragraphs, correct their incorrect placement, and simplify the placement of the <P>-tag in large texts (hundreds of pages). After the explanations in topic #44978, I realized that paragraphs are not the essence of the PDF format, they are not there, they are a convention introduced by the editor itself for the convenience of the user. Topic #30444 reported that older versions of PDF-XChange Editor had a context menu action and hotkey to set a selected piece of text to paragraph status, but that feature was removed.
  1. If I select everything in the test document through the Content Pane, all paragraphs of each page are added to the tags as one paragraph tag, even if they are marked by first-line indentation and additional line spacing at the first line.
  2. The same happens when adding text selected by selecting everything using the Text Element tool.
  3. The same happens when adding text selected by selecting everything using the Edit Text tool.
  4. The same happens when adding text selected by selecting everything using the Select Text tool.
  5. If I save this document as .txt, the visually existing eight paragraphs are preserved, but six paragraphs are added to them that are not in the text (that is, additional paragraph breaks are inserted).
  6. If I select only one paragraph before adding a paragraph tag, it is inserted correctly, affecting only one paragraph. But if the document has, for example, 300 pages, ten paragraphs on each page, then I will have to manually add 3000 tags. And there can be many more pages, as well as paragraphs, as well as documents that require processing. It's not fun =)
When searching for paragraph-related commands that would help me make my work easier, I found only the following:
  1. Show Uniform Text Style in Comments (Always display the same standard font size and standard paragraph style for text within comments).
  2. Increase Indent (Move paragraph farther away from the margin).
  3. Decrease Indent (Move paragraph closer to the margin).
  4. Remove List (Remove list from paragraphs).
  5. Words from the Same Paragraph (Find words from the same paragraph).
Test document for paragraph tagging.pdf
Daniel - PDF-XChange
Site Admin
Posts: 11315
Joined: Wed Jan 03, 2018 6:52 pm

Re: Working with paragraphs: display, fixing, correct automatic tagging  SOLVED

Post by Daniel - PDF-XChange »

Hello, Jensen Head

There is a distinct separation between "accessibility tagging" with its application of the paragraph (and similar) "tags", and the "paragraph formatting" options, most of which are now present in the properties pane while editing text. I apologize for not explaining this clearly enough in my previous posts.
Jensen Head wrote: Mon Jun 02, 2025 8:42 am After the explanations in topic #44978, I realized that paragraphs are not the essence of the PDF format, they are not there, they are a convention introduced by the editor itself for the convenience of the user.
First, I should clarify that the "paragraph" tags in question from the more recent post are "accessibility tags". They are not there for the convenience of the user, but to enable functions through which third-party software can properly interpret the document in a pre-defined order. No aspect of these tags is intended as something directly visible to the reader of the file, nor do they have any direct relation to the text formatting options which would benefit someone editing the file (in most cases, their use makes later editing more cumbersome).
  • Accessibility tags need to be drawn manually at this time. No, it is not possible to simply "select all" text objects, add the paragraph tag, and get a nicely separated set of tags. Accessibility tagging needs to be done fully manually because a computer is not great at determining reading order, which information is most important, or which items are merely "cliff notes" or otherwise unimportant side notes. Eventually this may become a possibility, but for now, a human touch is needed.

    Accessibility tagging is supposed to be the final step, performed after a document is completed and no further editing will happen. It also only serves to help accessibility software (such as vision-aiding tools) operate in the intended consumption order. If you will not be sharing the document publicly, or with someone who needs these accessibility functions, there is no purpose to such tagging, and it will not make editing your file any easier. In fact, if you later need to make any extended changes to the file after adding these tags, you will likely need to perform tedious maintenance to ensure the new content is correctly tagged and ordered, so it is very important to understand when it is time to use this function, and when it is not.

    Before adding tags, you should ask yourself: do you actually have any need to add these items, and do you need to do so right now? Will it benefit anyone else reading your documents at this point in time? The tools are not easy to use, and while some improvements likely will be made in time, it is an ongoing process, and not a quick one. In the meantime, use of these functions is quite cumbersome, due to the complexity and strict accessibility requirements.
Relating to the later question about the "paragraph options" menu:
  • This is accessible when text is selected by pressing Ctrl+H (it will only affect the current line, unless you actively select the text while in text editing mode). Or you can get to the same area via the right-click context menu. However, this menu has no relation to the accessibility options which you are investigating here today.
    [Attached screenshot: image.png]
    As I mentioned in other posts, the concept of a paragraph does not actually exist in PDF. These settings are presented as a tool to reposition and reflow the selected text objects relative to one another. The tool is "emulating" the existence of a paragraph structure, hence its name, but the result is not a real paragraph object, since such objects simply do not exist. To see text in its actual native format in PDF, you need to use the "Edit Objects > text" tool instead of the "Edit Text" tool (since Edit Text enables text object grouping in an effort to make the unwieldy PDF format more usable in a human capacity).
Moving on to your processes today:
Jensen Head wrote: Mon Jun 02, 2025 8:42 am If I save this document as .txt, the visually existing eight paragraphs are preserved, but six paragraphs are added to them that are not in the text (that is, additional paragraph breaks are inserted).
As before, the concept of tags only serves as an aid for third-party accessibility tools; it does not, and cannot, have an impact on the export functions. When you export to a txt file, the "paragraph" formatting follows the same rules as the "edit text" tool, with some special "post-processing" that is handled entirely by these options:
[Attached screenshot: image(1).png]
Once again, the concept of a paragraph is emulated, and assumed based on the proximity to other text items. If 4 "lines" of text are all positioned in a similar pattern, within a few "points" of one another, they will be assumed to belong together and emulated as a single object for display purposes.
Similarly, there generally are no actual return characters present in a document (though it is possible to insert them manually), and so the "insert line breaks" option does this automatically, based on whether or not the following text item, according to the content order, is inline horizontally.
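
To make the proximity idea above more concrete, here is a minimal Python sketch. It is not PDF-XChange Editor's actual algorithm, which is not described in this thread. The `TextItem` class, the function names, and the tolerance values (2 and 14 points) are all assumptions chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    """A positioned run of text, as a coordinate-based format stores it (hypothetical)."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline position, in points


def group_into_lines(items, y_tolerance=2.0):
    """Items are taken in content order; an item stays on the current line
    only if its baseline is within y_tolerance of the previous item's."""
    lines = []
    for item in items:
        if lines and abs(lines[-1][-1].y - item.y) <= y_tolerance:
            lines[-1].append(item)   # horizontally inline: same line
        else:
            lines.append([item])     # not inline: a line break is inferred
    return lines


def group_into_paragraphs(lines, max_gap=14.0):
    """Consecutive lines whose baselines are at most max_gap apart are
    treated as one emulated paragraph."""
    paragraphs = []
    for line in lines:
        if paragraphs and abs(line[0].y - paragraphs[-1][-1][0].y) <= max_gap:
            paragraphs[-1].append(line)
        else:
            paragraphs.append([line])
    return paragraphs


def to_text(paragraphs):
    """Render paragraphs as plain text: one line per row, blank line between paragraphs."""
    return "\n\n".join(
        "\n".join(" ".join(item.text for item in line) for line in para)
        for para in paragraphs
    )


items = [
    TextItem("Hello", 72, 100),
    TextItem("world", 110, 100.5),
    TextItem("second line", 72, 112),
    TextItem("New para", 72, 140),
]
lines = group_into_lines(items)
paragraphs = group_into_paragraphs(lines)
result = to_text(paragraphs)
```

With these assumed thresholds the four items become three lines and two paragraphs. Changing `max_gap` changes where paragraph breaks land, which is one way to see how an export could contain more breaks than the page visually shows.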

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Jensen Head

Re: Working with paragraphs: display, fixing, correct automatic tagging

Post by Jensen Head »

Thank you very much for the detailed explanations.

Yes, I understand that accessibility tags (including the paragraph accessibility tag) have nothing to do with the exported text, and are only used by voice-reading applications for people with poor vision. In fact, if you reread my first message, you will see that I did not claim that accessibility tags should somehow affect the export. I described various options for working with what is usually called a "paragraph", regardless of whether there is an equivalent for it in the file structure, and whether these entities are related to each other or not.

Many of the scanned editions I work with do not exist in accessible versions (in fact, most of them are not even OCRed), and as a hobby I create editions of these books that can later be used by people with disabilities. But at the same time, I try to preserve the original appearance of the edition as much as possible, including the printing features. So yes, adding accessibility tags is a conscious step on my part. Once again, thank you for the clarifications, after which I will be able to better plan my labor costs.

Regarding the features of paragraphs as the second concept, I hope I have now also understood everything correctly. Returning to the question of their display and editing their markup in the case of scanned text, the user of PDF-XChange Editor (reproduction editor) is severely limited in terms of paragraph re-markup, and, in the case of a desire to obtain a "clean" document (for example, in epub format), it is easier to correct the paragraphs after export than before (since there is nothing to format, because, as you have repeatedly reminded, they do not exist in the document, no matter what is written in the export settings).
Daniel - PDF-XChange

Re: Working with paragraphs: display, fixing, correct automatic tagging

Post by Daniel - PDF-XChange »

Hello, Jensen Head
Jensen Head wrote: Tue Jun 03, 2025 7:03 pm in the case of a desire to obtain a "clean" document (for example, in epub format), it is easier to correct the paragraphs after export than before (since there is nothing to format, because, as you have repeatedly reminded, they do not exist in the document, no matter what is written in the export settings).
Yes, in most cases, editing core text elements will be considerably simpler in a flow-based format than it will be in any PDF software. While PDF editing tools do exist, PDF is a coordinate-based format without the simplified "flowing text" that Doc, Epub, and so on make use of, so editing paragraphs, and in particular ensuring those changes carry through to other nearby items, especially from one page to the next, is extremely difficult to do properly.
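
As an illustration of cleaning up paragraphs after export rather than before, here is a minimal Python sketch that rejoins blocks of an exported .txt file that were split by an extra paragraph break. It rests on one assumed heuristic that is not part of PDF-XChange Editor: a block that does not end in sentence-final punctuation is taken to continue in the next block. The function name and the punctuation set are hypothetical, and real texts (headings, verse, hyphenated line ends) would need more care.

```python
import re

# Assumed heuristic: a real paragraph ends with sentence punctuation,
# optionally followed by a closing quote or bracket.
SENTENCE_END = re.compile(r'[.!?:;]["\')\]]*$')


def merge_spurious_breaks(text):
    """Rejoin blocks that were split mid-sentence by an extra paragraph break."""
    # Split on blank lines, then collapse hard line wraps inside each block.
    blocks = [" ".join(b.split()) for b in re.split(r"\n\s*\n", text) if b.strip()]
    merged = []
    for block in blocks:
        if merged and not SENTENCE_END.search(merged[-1]):
            merged[-1] = merged[-1] + " " + block  # previous block ended mid-sentence
        else:
            merged.append(block)
    return "\n\n".join(merged)


exported = (
    "First paragraph starts here and\n\n"
    "continues after a stray break.\n\n"
    "Second paragraph."
)
cleaned = merge_spurious_breaks(exported)
```

In this example the first two blocks are merged into one paragraph and the third is left alone, since the block before it ends with a full stop.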

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD