Summary
If PDF-Tools receives plain-text files in OEM-855 encoding as input, it doesn't recognize them correctly, resulting in garbage output. This is dangerous if the user doesn't notice it, as it can lead to the loss of user data.
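To illustrate how this kind of garbage arises, here is a minimal Python sketch. It assumes the problem is an encoding mismatch, that is, the OEM-855 bytes are decoded under some other code page. Which code page PDF-Tools actually falls back to is not known from this report; the candidates below and the sample word are hypothetical and chosen only for illustration.

```python
def decode_with_guesses(raw: bytes, guesses: tuple[str, ...]) -> dict[str, str]:
    """Decode the same bytes under each candidate code page."""
    return {g: raw.decode(g, errors="replace") for g in guesses}


# Hypothetical sample text, chosen for illustration only.
SAMPLE = "Привет"
RAW = SAMPLE.encode("cp855")  # "cp855" is Python's codec name for OEM-855

if __name__ == "__main__":
    # Only the cp855 line should reproduce the original word.
    candidates = ("cp855", "cp866", "cp1251", "latin-1")
    for name, decoded in decode_with_guesses(RAW, candidates).items():
        print(f"{name:8} -> {decoded}")
```

None of the wrong guesses raises an error, which matches the silent failure described here: the bytes are valid under every code page, they just mean something else.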
Steps to Reproduce
1. Create a Cyrillic plain-text document in OEM-855 encoding with the .doc extension. This format was once very common in the former Soviet Union. You can use the document attached to the provided PDF as an example, or generate your own.
2. Create a tool consisting of "Choose Input Files," "Change Document Properties," "Sanitize Document," and "Export PDF to Microsoft Word Document" (a smaller set might be sufficient, but this one reproduces the error 100% of the time). I also added my own version of the tool to the attachments of the provided PDF.
3. Drag the .doc document onto the tool (or use the extended file selection dialog).
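For step 1, a test file can be generated with a short script. This is only a sketch: the sample text and the file name `oem855_sample.doc` are hypothetical, and it assumes Python's `cp855` codec matches the OEM-855 code page used by the original documents.

```python
from pathlib import Path

# Hypothetical sample text and file name, chosen for illustration only.
SAMPLE_TEXT = "Привет, мир!\r\nЭто тестовый документ.\r\n"
OUTPUT_PATH = Path("oem855_sample.doc")


def write_oem855_sample(path: Path = OUTPUT_PATH, text: str = SAMPLE_TEXT) -> bytes:
    """Write `text` to `path` as raw OEM-855 bytes and return those bytes."""
    data = text.encode("cp855")  # Python's codec name for OEM-855
    path.write_bytes(data)       # no BOM or header, just the encoded text
    return data


if __name__ == "__main__":
    raw = write_oem855_sample()
    print(f"Wrote {len(raw)} bytes to {OUTPUT_PATH}")
```

The resulting file is plain text despite its .doc extension, which is the combination described in step 1.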
Expected Behavior
A .docx file with readable text is created.
Actual Behavior
The resulting .docx contains garbled text instead of readable Cyrillic, and no errors are generated.
Environment
- OS: Microsoft Windows 10.0.26200.7462
- Microsoft Office LTSC Professional Plus 2024 - en-us 16.0.1793220602
- PDF-Tools 10.7.6, build 404
Additional Context
I assume this problem is not specific to the specified encoding.
[bug] OEM-855 encoding in incoming files is not recognized correctly
Jensen Head
Vladimir G - Tracker Dev
Re: [bug] OEM-855 encoding in incoming files is not recognized correctly
Hello Jensen Head,
Due to the very rare use of the OEM-855 encoding, I cannot say whether this behavior will be changed soon. However, you can insert the Create PDF from Text action between the Choose Input Files and Change Document Properties actions in your tool and set the encoding manually.
Best regards,
Vladimir
Software Developer
PDF-XChange Co. LTD