Add Language field to Text Properties  SOLVED

Forum for the PDF-XChange Editor - Free and Licensed Versions


Jensen Head
User
Posts: 626
Joined: Mon Sep 13, 2021 8:12 am

Add Language field to Text Properties

Post by Jensen Head »

This is an important element of document accessibility.

Ideally, when a document is recognized using several languages, the OCR engine should itself assign language tags to the text objects it creates and, when the entire document is recognized, also to the document itself, based on the predominant language of the document or of the first page with text (I believe this should be discussed separately).

Currently, the following fields are displayed in Text Properties:
§ Style
  • Fill Color
  • Stroke Color
  • Border Width
  • Rendering Mode
  • Font
  • Font Size
§ Font Details
  • Name
  • Embedded
  • Type
  • Encoding
  • Actual Font
  • Actual Font Type
  • Object Number
Thus, it is not possible to set the language of selected text elements.

[ℹ] Related links

1. Accessibility and Usability at Penn State / Language Tagging / PDF (2015) — accessibility.psu.edu/foreignlanguages/langtag/#pdf

2. W3C / Techniques for WCAG 2.0 (Techniques and Failures for Web Content Accessibility Guidelines 2.0) / PDF19: Specifying the language for a passage or phrase with the Lang entry in PDF documents — w3.org/TR/WCAG20-TECHS/PDF19.html

3. Adobe Community / Acrobat Discussions: "Set the secondary language for the text (…). This is done on each tag that is in the secondary language. Select the tag / Right-click / Properties, and choose the language from the drop-down menu. If the language isn't in the menu, type in the language code listed here at ISO 639-2"
community.adobe.com/t5/acrobat-discussions/pdf-accesibles-en-dos-idiomas-se-puede-hacer/m-p/13765225#bodyDisplay_2d908c73d37e97
Stefan - PDF-XChange
Site Admin
Posts: 19913
Joined: Mon Jan 12, 2009 8:07 am

Re: Add Language field to Text Properties

Post by Stefan - PDF-XChange »

Hello Jensen Head,

Thanks for your comment! I will ask our devs working on accessibility to take a look at this post and your suggestions. I cannot at this point make any promises as to if or when such a feature might be available in our products.

P.S. It appears that it is already possible to specify the Language tag for paragraphs for accessibility:
image_2025_01_02T22_04_09_768Z.png
And I've created a ticket for the FR part of your post - so that such tags could be added by the OCR engines in the future as well:
#7250: OCR: Add accessibility tags

Kind regards,
Stefan
Jensen Head
User
Posts: 626
Joined: Mon Sep 13, 2021 8:12 am

Re: Add Language field to Text Properties

Post by Jensen Head »

Stefan - PDF-XChange wrote (Thu Jan 02, 2025 11:40 am):
"It appears like it is possible to specify the Language tag for paragraphs for accessibility"
I can use the Select / Text command to select all text objects in a document and assign a language to them in bulk:
PDFXEdit (2025-05-28 11-32-21).png
But I can't seem to do the same with paragraphs. Should I make a separate feature request for this, or is the request in this thread sufficient?

Also, some users may find it more convenient to select the "Apply selected to all text objects in the document" checkbox in the Reading Options block of the Advanced tab of the Document Properties dialog box. However, this should be coordinated with the functionality for setting up multiple languages for a document (relevant for terminological translation dictionaries, bilingual books for foreign language learners, and user manuals).
Daniel - PDF-XChange
Site Admin
Posts: 11333
Joined: Wed Jan 03, 2018 6:52 pm

Re: Add Language field to Text Properties  SOLVED

Post by Daniel - PDF-XChange »

Hello, Jensen Head

A "paragraph" does not actually exist in a PDF; we just do a very good job of pretending it does, and the new accessibility tags offer it as a "block" object to aid screen readers with their content ordering. Practically speaking, no paragraph of text in a PDF document has ever been a single contiguous object, and the spec does not seem likely to change this.

Page text objects (which can sometimes be a single floating character) are not intended to contain this language data. The "tags" method Stefan mentioned above is the "accessibility" feature your links refer to, but it is not possible to add these to the base content.

The request Stefan made is for the OCR engine to automatically create those "tagged" areas and attempt to assign the language automatically, based on what it has detected. No special data will be present in the actual text content; it would all be added to the secondary "accessibility tag" region that is created.

As for defining multiple languages in the document properties: as we have mentioned before, that is not the intent of the document properties, and it will not be changing at this time. If the specification changes to indicate that this should be a common case, we will reconsider it then.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Jensen Head
User
Posts: 626
Joined: Mon Sep 13, 2021 8:12 am

Re: Add Language field to Text Properties

Post by Jensen Head »

Stefan, Daniel, thank you for your help! Thanks to you, I now understand that accessibility functionality is an add-on to the basic entities of content, a kind of layer, an intermediary between the content and the specialized tool for reading the document. And the terms "block", "paragraph", "tags" and their properties are not the content itself, but allow you to do things with it that would be difficult or impossible to do without them.

I think I figured out how to assign language properties to text objects. In fact, I now see that my approach was wrong, since assigning languages should be part of comprehensive work on creating the document's tag structure. In addition to all text blocks receiving the correct Paragraph tag, their hierarchical arrangement should reflect the actual structure of the document. This means that headings, subheadings and the main text should be correctly nested inside each other, despite using the same tag type. Also, keep in mind that if the document contains tables, lists or images, their tags (Table, L, Figure) do not need to be converted to Paragraph but should be processed in the same way, making sure that the order in which the blocks on each page are traversed corresponds to the reading order ("Reading Order").

In my case, I do it this way (please correct me if I'm doing something fundamentally wrong):

1. "The document has no tags. Create Tags Root to continue tagging document".
2. New Tag / Type: P, Title: <language name>.
3. Edit Objects / Edit Text / Select all blocks of one language with Ctrl.
4. "Create Tag from Selection" (the selected paragraph blocks are added as nested tags to the tag created above).
5. Repeat steps 2 through 4 for other document languages.
6. Repeat steps 2 through 4 for other data types for all document languages.
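
The tag structure that results from the steps above can be pictured with a small Python model. This is only an illustrative sketch: the Tag class, its fields, the "Root" type label, and the sample block names are hypothetical and are not part of PDF-XChange Editor or of any PDF library. It shows one parent P tag per language, with the selected blocks nested under it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Tag:
    """Hypothetical stand-in for one node of a PDF tag tree."""
    type: str                      # e.g. "Root", "P"
    title: str = ""
    lang: Optional[str] = None     # None means "Default" (nothing set here)
    children: List["Tag"] = field(default_factory=list)


def build_language_groups(blocks: List[Tuple[str, str]]) -> Tag:
    """Group (block_name, lang) pairs under one P tag per language.

    Mirrors steps 1-5: create the root, create a P tag titled after the
    language, then nest the selected blocks of that language under it.
    """
    root = Tag(type="Root", title="Tags Root")
    groups: Dict[str, Tag] = {}
    for name, lang in blocks:
        if lang not in groups:
            groups[lang] = Tag(type="P", title=lang, lang=lang)
            root.children.append(groups[lang])
        # Nested blocks carry no language of their own in this model.
        groups[lang].children.append(Tag(type="P", title=name))
    return root


def dump(tag: Tag, depth: int = 0) -> List[str]:
    """Return an indented text outline of the tag tree."""
    label = f"{'  ' * depth}<{tag.type}> {tag.title}"
    if tag.lang:
        label += f" [Lang={tag.lang}]"
    lines = [label]
    for child in tag.children:
        lines.extend(dump(child, depth + 1))
    return lines


if __name__ == "__main__":
    sample = [("block 1", "en"), ("block 2", "de"), ("block 3", "en")]
    print("\n".join(dump(build_language_groups(sample))))
```

Grouping by language keeps the number of tags that need an explicit language small, but it also flattens the document hierarchy, so a real tag tree would still need headings, lists and tables arranged in reading order.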

I haven't yet figured out how to automatically remove empty <Span> containers, how to automatically move all objects from XForm containers to the root, or how to delete tags without objects (the Delete Empty Tags command doesn't work). I also don't yet understand how to get from a paragraph block in the page space, or from the Content pane, to the tag (or tags) associated with that object. But these questions are not related to the topic, which can be considered closed.

As for defining multiple languages in the document properties: maybe it would then be more accurate to call this property not "Language" but "Primary language of the document" (or something like that), specifying that this is the language of the title page, the cover, or the imprint of the document. This would remove all questions about what to do with this document tag in the case of multilingual documents.

P.S. As always, please forgive any misunderstandings and awkward wording on my part; my English is very poor.
Daniel - PDF-XChange
Site Admin
Posts: 11333
Joined: Wed Jan 03, 2018 6:52 pm

Re: Add Language field to Text Properties

Post by Daniel - PDF-XChange »

Hello, Jensen Head

Yes, you are essentially on the right path now. It is worth noting that most objects in a document do not need to have a language set. "Default" means that they will inherit the language from the document properties (this is part of why you should only have a single language present there). You only need to specify a language for content tags on content that is not written in the document's "primary" language. You suggested changing the name of this option, but it is currently only "Language" (singular), and this seems to be the general presentation offered by many other apps as well, so it is unlikely to change.
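
The "Default" behaviour described above can be sketched as a simple lookup. This is a minimal illustration, not how PDF-XChange Editor implements it: the Node class and the effective_lang function are hypothetical names, and the assumption that a tag with no language of its own first checks its parent tags before falling back to the document language reflects a reading of the PDF specification rather than anything stated in this thread.

```python
from typing import Optional


class Node:
    """Hypothetical tag-tree node; lang=None stands for "Default"."""

    def __init__(self, name: str, lang: Optional[str] = None,
                 parent: Optional["Node"] = None) -> None:
        self.name = name
        self.lang = lang
        self.parent = parent


def effective_lang(node: Node, document_lang: str) -> str:
    """Walk up the tree until an explicit language is found.

    Falls back to the document-level language when neither the node
    nor any of its ancestors sets one.
    """
    current: Optional[Node] = node
    while current is not None:
        if current.lang is not None:
            return current.lang
        current = current.parent
    return document_lang


if __name__ == "__main__":
    doc_lang = "en-US"  # assumed value of the document "Language" property
    root = Node("Tags Root")
    body = Node("P: body text", parent=root)
    quote = Node("P: French quotation", lang="fr-FR", parent=root)
    word = Node("Span inside quotation", parent=quote)
    for n in (body, quote, word):
        print(f"{n.name}: {effective_lang(n, doc_lang)}")
```

In this model only the French quotation needs an explicit language; the body text and the span inside the quotation resolve their language without one.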

Regarding your later questions about XForms, I believe these need to remain in place, as they hold some of the flags for tagging. In essence, a tagged PDF will look much more complex in the content panel. Part of this is necessity: as before, it is designed to help other applications interpret the file content, and to communicate accurately with other computers you need to leave nothing to the imagination (after all, that is something most computers I know tend to lack at this point in time).

In essence, configuring tags will be very difficult in most situations. We are essentially giving a human the tools to try to explain to a computer, in words the computer will understand, how it should explain the content to another human.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
