Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

Moderators: PDF-XChange Support, Daniel - PDF-XChange, Chris - PDF-XChange, Sean - PDF-XChange, Paul - PDF-XChange, Vasyl - PDF-XChange, Ivan - Tracker Software, Stefan - PDF-XChange

lyanna
User
Posts: 1
Joined: Sat Sep 28, 2024 2:15 am

Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Post by lyanna »

I have run OCR on some scanned Chinese books, and while it has no problem with the Chinese characters themselves, it can't encode pinyin properly. What I mean by that is the document looks fine, but if you try to copy any pinyin and paste it elsewhere it becomes nonsense, e.g. shTyong. It seems like the tone marks are confusing the OCR, and I'm not sure how to fix this. I chose Simplified CN, Traditional CN, and English for the languages, but pinyin isn't an option, probably because it's not exactly a language. Yet you can type in pinyin, so it must be possible to encode it properly.
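To illustrate the point about encoding: tone-marked pinyin letters are ordinary Unicode characters, each equivalent to a base letter plus a combining mark. Below is a minimal Python sketch of how one might check whether text copied out of the OCR layer still carries its tone marks. It is not part of PDF-XChange; the function name `tone_marks_in` is hypothetical, and "shǐyòng" is only an assumed guess at what the garbled "shTyong" was meant to be.

```python
import unicodedata

# Combining marks for the four pinyin tones (standard Unicode code points).
TONE_MARKS = {
    "\u0304": 1,  # macron, first tone (ā)
    "\u0301": 2,  # acute, second tone (á)
    "\u030C": 3,  # caron, third tone (ǎ)
    "\u0300": 4,  # grave, fourth tone (à)
}

def tone_marks_in(text):
    """Return (base letter, tone number) pairs for each tone mark in text.

    Hypothetical helper for illustration only.
    """
    # NFD splits precomposed letters such as "ǐ" into "i" + U+030C.
    decomposed = unicodedata.normalize("NFD", text)
    found = []
    base = ""
    for ch in decomposed:
        if unicodedata.combining(ch) == 0:
            base = ch  # a non-combining character starts a new letter
        elif ch in TONE_MARKS and base:
            found.append((base, TONE_MARKS[ch]))
    return found

# Assumed intended text: the tone marks survive as real Unicode.
print(tone_marks_in("shǐyòng"))  # [('i', 3), ('o', 4)]
# What pasting actually produced: no tone marks left in the text layer.
print(tone_marks_in("shTyong"))  # []
```

If the pasted text gives an empty list while the page visibly shows tone marks, the OCR text layer has lost them, which matches the behaviour described above.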
Dimitar - PDF-XChange
Site Admin
Posts: 2268
Joined: Mon Jan 15, 2018 9:01 am

Re: Getting OCR to recognize non-standard diacritics (Chinese pinyin)?

Post by Dimitar - PDF-XChange »

Hello lyanna,

Welcome to our Forum.

May I ask you for a copy of one of the files you are having this issue with, as well as its converted copy?

Also, please send us a screenshot of the OCR settings you are using.

You may send the files to our support email address: sales@pdf-xchange.com

Regards.