Languages can be divided into two groups: those that put spaces between words, such as English, and those that do not, such as Japanese and Chinese. When PDF-XChange Editor treats multiple text elements in a PDF as flowing text without line breaks, it appears to simply join them with spaces, without taking these differences between languages into account. This is fine for most languages, but it causes problems for languages such as Japanese and Chinese, which do not put spaces between words. I report four such issues below.
The issues are independent of one another. For each, I compared PDF files created from Japanese, Traditional Chinese and Latin alphabet text files using the "Convert Text Files to PDF" feature in PDF-XChange Editor. For Japanese and Latin alphabet text, the PDF was output with default settings. For Traditional Chinese, "New Paragraph Mode" was set to "Each newline character starts a new paragraph".
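To illustrate the behavior I would expect, here is a minimal Python sketch of script-aware line joining. It is only an illustration and not how PDF-XChange Editor works internally; the function names `is_cjk` and `join_lines` are hypothetical, the Unicode ranges are a rough approximation, and the rule of omitting the space when either side of the line break is a CJK character is my own assumption.

```python
def is_cjk(ch: str) -> bool:
    """Rough check for characters from scripts written without word spaces.

    The ranges below are an approximation chosen for this sketch.
    """
    cp = ord(ch)
    return (
        0x3000 <= cp <= 0x303F      # CJK symbols and punctuation
        or 0x3040 <= cp <= 0x30FF   # Hiragana and Katakana
        or 0x3400 <= cp <= 0x4DBF   # CJK Unified Ideographs Extension A
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
        or 0xFF00 <= cp <= 0xFFEF   # Halfwidth and fullwidth forms
    )


def join_lines(lines):
    """Join the lines of one paragraph into flowing text.

    A space is inserted at a line boundary only when neither the last
    character of the text so far nor the first character of the next
    line is a CJK character (assumed rule for this sketch).
    """
    out = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines inside the paragraph
        if out and not (is_cjk(out[-1]) or is_cjk(line[0])):
            out += " "
        out += line
    return out
```

With this rule, `join_lines(["Hello", "world"])` keeps the space between the two English words, while `join_lines(["吾輩は猫で", "ある。"])` joins the two Japanese lines without inserting anything.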
- Issue 1: Saving as plain text
When saving as a plain text file with the "Insert line breaks" option disabled and the "Insert breaks after each paragraph" option enabled, spaces are inserted on every line and line breaks are not inserted after each paragraph. With the same settings for Latin alphabet text, the output contains no unnecessary spaces, and line breaks are inserted correctly after each paragraph.
- Issue 2: Export to Word document
In the options for exporting to a Word document, even though the layout setting is set to "Retain Flowing Text", spaces are inserted on every line, and line breaks are not inserted after each paragraph. With the same settings for Latin alphabet text, on the other hand, the output file contains no unnecessary spaces, and line breaks are inserted correctly after each paragraph. Apart from the difference between a text file and a Word document, this is essentially the same behavior as described in Issue 1, so I will not go into details.
- Issue 3: Read Selected Text Out Loud
When reading selected text out loud from the Accessibility tab, some SAPI text-to-speech engines produce unnatural silent intervals at each line. For Latin alphabet text, the text is read naturally as flowing text, even across line breaks. Some text-to-speech engines seem to remove the spaces themselves, and only in those cases is the speech played back naturally, without the unnatural silent intervals.
In the verification shown in the waveform diagrams below, the same voice engine (VW Misaki) was specified in Acrobat Reader DC and PDF-XChange Editor, and the first and second paragraphs of a Japanese PDF file were read out loud. The yellow markers in the PDF-XChange Editor waveform indicate the unnatural silent intervals, which correspond to the red lines in the Acrobat Reader DC waveform. They also correspond to the same-numbered sections shown in the Japanese sample.
- Issue 4: Advanced Search
If a search result includes a newline, it is displayed as a space. The effect is less severe than in the three issues above, since it only affects how the results are displayed, but the search results are slightly more difficult to read. For Latin alphabet text, search results do not contain unnecessary spaces and are easy to read.
In the video, I selected the appropriate characters in the Japanese and Traditional Chinese texts, respectively, and performed an advanced search on them. Note the red box in each search result. You will notice that there is an unnecessary space corresponding to a line break in the original PDF file.
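For anyone who wants to check exported text or copied search results for this kind of space, a small Python sketch like the following could be used. The helper names `find_spurious_spaces` and `remove_spurious_spaces` are hypothetical, and the Unicode ranges are only a rough approximation of Japanese and Chinese characters. Note that it would also flag a space that the author deliberately placed between two such characters.

```python
import re

# Approximate ranges for Japanese/Chinese characters (assumption for this sketch).
_CJK = r"\u3000-\u303F\u3040-\u30FF\u3400-\u4DBF\u4E00-\u9FFF\uFF00-\uFFEF"

# One or more ASCII spaces with a CJK character directly on both sides.
_SPURIOUS = re.compile(rf"(?<=[{_CJK}]) +(?=[{_CJK}])")


def find_spurious_spaces(text):
    """Return the start index of every space run between two CJK characters."""
    return [m.start() for m in _SPURIOUS.finditer(text)]


def remove_spurious_spaces(text):
    """Return the text with spaces between two CJK characters removed."""
    return _SPURIOUS.sub("", text)
```

Spaces between Latin words are left alone, so the same check can be run on mixed-language output.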
The above four issues are reported separately, but the root cause might be the same. Incidentally, if you activate edit mode, select a block of text, and copy it to the clipboard, the pasted text does not contain any unnecessary spaces, even in the current build 368.
Hoping that the above information will be of some help to you.
Thank you so much for your continued support.
Best regards,
rakunavi
- PDF-XChange Editor Plus Version: 9.5 build 368.0
- OS Version: Windows 11 Home 22H2 Build 22621.1555
- PC Model: Lenovo IdeaPad C340-15IWL, HP All-in-One 22-c0xx