Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: Tracker Support, TrackerSupp-Daniel, Sean - Tracker, Paul - Tracker Supp, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Ivan - Tracker Software, Tracker Supp-Stefan

lyanna
User
Posts: 1
Joined: Sat Sep 28, 2024 2:15 am

Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Post by lyanna »

I have run OCR on some scanned Chinese books, and while it has no problem with the Chinese characters themselves, it can't encode pinyin properly. What I mean by that is the document looks fine, but if you try to copy any pinyin and paste it elsewhere it becomes nonsense, e.g. shTyong. It seems like the tone marks are confusing the OCR, and I'm not sure how to fix this. I chose Simplified CN, Traditional CN, and English for the languages, but pinyin isn't an option, probably because it's not exactly a language. Yet you can type pinyin, so it must be possible to encode it properly.
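
To illustrate the point that pinyin can be encoded: tone-marked vowels are ordinary Unicode characters, so text copied from the OCR layer can be checked for them. Below is a minimal Python sketch, assuming the copied text has been pasted into a string. The function name `has_tone_marks` and the sample word are hypothetical, and nothing here is part of PDF-XChange.

```python
import unicodedata

# Tone-marked pinyin vowels, normalized to single precomposed code points (NFC).
PINYIN_TONE_VOWELS = set(
    unicodedata.normalize("NFC", "āáǎàēéěèīíǐìōóǒòūúǔùǖǘǚǜ")
)


def has_tone_marks(text: str) -> bool:
    """Return True if text contains at least one tone-marked pinyin vowel.

    Hypothetical helper for illustration only.
    """
    # NFC folds a base letter plus a combining accent into one code point,
    # so both ways of encoding the same syllable are treated alike.
    normalized = unicodedata.normalize("NFC", text).lower()
    return any(ch in PINYIN_TONE_VOWELS for ch in normalized)


if __name__ == "__main__":
    # "shTyong" is the garbled output quoted above; the second string is an
    # assumed example of correctly encoded pinyin.
    for sample in ("shTyong", "shǐyòng"):
        print(sample, has_tone_marks(sample))
```

If the pasted text fails this check even though the page displays tone marks, the text layer written by the OCR is likely missing those characters, rather than the fault lying with the application you pasted into.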
Dimitar - Tracker Supp
Site Admin
Posts: 2016
Joined: Mon Jan 15, 2018 9:01 am

Re: Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Post by Dimitar - Tracker Supp »

Hello lyanna,

Welcome to our Forum.

May I ask you for a copy of one of the files you are having this issue with, as well as its converted copy?

Also, please send us a screenshot of the OCR settings you are using.

You may send the files to our support email address: sales@pdf-xchange.com

Regards.