Latin and Gothic Letters for OCR
Hi there,
I would like to ask if you could please add Latin to the list of additional OCR languages. Even more important for me would be the possibility to OCR books printed in Gothic/blackletter type. I am especially looking for the so-called "Unger-Fraktur", as many German books from the 19th century were printed in this typeface. Do you think this is possible?
Thanks a lot
Ludwig
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Latin and Gothic Letters for OCR
We will add Slovakian, Swedish, and German "Fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using the English (or even another Latin-alphabet) language selection will be fairly good, since the word dictionary weighting is fairly weak (i.e., it will not dominate the results too seriously).
-Walter
Re: Latin and Gothic Letters for OCR
I am very much looking forward to the final release then! Do you have a rough time horizon?
- Paul - PDF-XChange
- Site Admin
- Posts: 7356
- Joined: Wed Mar 25, 2009 10:37 pm
- Contact:
Re: Latin and Gothic Letters for OCR
Hi Ludwig,
Walter tells me this should be available in the next few weeks.
hth
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Latin and Gothic Letters for OCR
Ludwig, I have prepared the Fraktur language pack and sent it to our installation guys. It may be a few days before it becomes available on the website but I thought I would update you to let you know that it will be very soon. It will work with both the viewer and the editor.
-Walter
Re: Latin and Gothic Letters for OCR
Thanks Walter! This is really good news.
- Stefan - PDF-XChange
- Site Admin
- Posts: 19794
- Joined: Mon Jan 12, 2009 8:07 am
- Contact:
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Latin and Gothic Letters for OCR
Ludwig, I have attached the language pack to this post, because I guess it will still be a few days before the installer is ready; our installer people are very busy with the new editor release. You will have to place the files in your language directory yourself, and we cannot provide support for this since we will have a proper installer generated pretty shortly. Languages for the *viewer* are placed in a directory called "ocrdats" under the main Viewer installation directory, e.g.:
C:\Program Files\Tracker Software\PDF Viewer\ocrdats
In the editor, you will have to find PluginsData\OCRLanguages, e.g.:
C:\Program Files\Tracker Software\PDF Editor\PluginsData\OCRLanguages
Copy all the .lng and .dat files into those directories and you should see the Fraktur choices in your OCR preferences / run dialog.
-Walter
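For anyone who would rather script the copy step described above, here is a minimal Python sketch. It assumes the archive has already been extracted to a folder and that you have write access to the target directory (writing under Program Files usually needs an elevated prompt). The function name and the extraction folder in the example call are made up for illustration; only the two target directories come from the post.

```python
import shutil
from pathlib import Path


def install_language_files(source_dir, target_dir):
    """Copy every .lng and .dat file from source_dir into target_dir.

    Returns the list of copied file names, sorted alphabetically.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    if not target.is_dir():
        # Refuse to create the directory: a wrong path would go unnoticed.
        raise FileNotFoundError(f"Language directory not found: {target}")

    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file() and path.suffix.lower() in {".lng", ".dat"}:
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied


# Example call (the first path is an assumption; adjust both to your system):
# install_language_files(
#     r"C:\Temp\Fraktur-Language-Pack",
#     r"C:\Program Files\Tracker Software\PDF Viewer\ocrdats",
# )
```

Files with other extensions in the extracted folder are left alone, so the call can be repeated for the Editor directory without side effects.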
- Attachments
-
- Fraktur-Language-Pack.7z
- (1.46 MiB) Downloaded 512 times
Re: Latin and Gothic Letters for OCR
Hi Walter,
Thank you very much for the files. Using the Viewer Pro (not the Editor), I have tried the new German Fraktur on three books so far (I don't really know what Swedish and Slovakian Fraktur are, so I didn't try those): very promising! Great job!
Ludwig
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Contact:
Re: Latin and Gothic Letters for OCR
Great! I'll pass the message along to Walter 

If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
Re: Latin and Gothic Letters for OCR
Hi, is there a way to train the OCR program for better detection of Fraktur letters? I found that the program systematically misreads "ch", which is turned into just "c". For example, "Bezeicnung" instead of "Bezeichnung".
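Independent of training, a misreading this systematic can sometimes be repaired in the recognized text afterwards. The following is only a sketch of that idea in Python, not a feature of the OCR engine: it assumes the text has been exported to plain text and that you supply your own list of correct words. The function name is hypothetical, and only one missing "h" per word is restored.

```python
import re


def restore_ch(text, known_words):
    """Re-insert a dropped 'h' after 'c' when that yields a known word."""
    known = {word.lower() for word in known_words}

    def fix(match):
        word = match.group(0)
        if word.lower() in known:
            return word  # already a valid word, leave it alone
        # Try inserting 'h' after each 'c' in turn and keep the first hit.
        for index, char in enumerate(word):
            if char in "cC":
                candidate = word[:index + 1] + "h" + word[index + 1:]
                if candidate.lower() in known:
                    return candidate
        return word  # no known word found, keep the OCR reading

    # [^\W\d_]+ matches runs of letters, including umlauts and ß.
    return re.sub(r"[^\W\d_]+", fix, text)


words = ["die", "Bezeichnung", "ist", "klar"]
print(restore_ch("Die Bezeicnung ist klar", words))
```

Because a word is only changed when the corrected form is in the list, words that legitimately contain "c" without "h" are not touched.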
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Latin and Gothic Letters for OCR
Not at the moment. We may release a tool to help with training in the future. However, if you feel ambitious you can email us at support@pdf-xchange.com and I can point you in the right direction, but can't provide detailed support for it - you'd be on your own.
Re: Latin and Gothic Letters for OCR
Walter-Tracker Supp wrote: We will add Slovakian, Swedish, and German "fraktur" language data in the final release of the editor. We will not have direct Latin support, though results using English (or even other Latin alphabet) language selection will be fairly good since the word dictionary weighting is fairly weak (ie, it will not dominate results too seriously).
-Walter
Will the Croatian language be added as well?
Thank you
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: Latin and Gothic Letters for OCR
Croatian will be available on or before the next build, anticipated in about a month's time. Meanwhile you can use any other language we provide which uses the same diacritics, if applicable (I'm not familiar with Croatian myself), because the word dictionary coupling is weak.
I will update this forum posting once we have included it.
Re: Latin and Gothic Letters for OCR
This thread is quite old; nevertheless, I wanted to express my big thanks for the "German Fraktur" OCR set! I had been desperately searching for this feature!
- John - Tracker Supp
- Site Admin
- Posts: 5223
- Joined: Tue Jun 29, 2004 10:34 am
- Contact:
Re: Latin and Gothic Letters for OCR
Pleasure 

If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com
Re: Latin and Gothic Letters for OCR
(Revised for clarity)
Hi, I would like to know which "OCR language" option can recognize the diacritics in the IAST set [1]. They comprise these 17 pairs of characters with diacritics:
Ā Ī Ū ṚṜ ḶḸḺ Ṃ ṄÑṆ Ḥ Ṭ Ḍ ŚṢ
ā ī ū ṛṝ ḷḹḻ ṃ ṅñṇ ḥ ṭ ḍ śṣ
Examples of IAST text can be seen in [2] and [3]. [2] is a scanned PDF, an example of what I want to OCR.
I am a native Chinese speaker and am unfamiliar with the "OCR language" choice in this situation. So I blindly tried some European OCR language options, as listed in [4], but none was the correct choice.
Best Regards.
YC Niu
Reference
[1] Diacritics in IAST set
https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration
[2] Example 1 of IAST text (scanned PDF)
https://archive.org/details/dhatukatha-pts/PTS-Digha-Nikaya-vol-I-TWRD-Carpenter-1899/page/1/mode/2up
[3] Example 2 of IAST text (HTML; right-half side of the web page)
https://suttacentral.net/dn1/en/sujato?layout=sidebyside&reference=none&notes=asterisk&highlight=false&script=latin
[4] None of the languages in this list is suitable for IAST.
localname="Čeština" name="Czech"
localname="Deutsch" name="German"
localname="Español" name="Spanish; Castilian"
localname="Français" name="French"
localname="Română" name="Romanian; Moldavian; Moldovan"
localname="Suomi" name="Finnish"
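To compare the language options less blindly, one could OCR the same short passage with each option, export the text, and check which IAST characters survived. Below is a minimal Python sketch of such a check. It assumes a reference version of the passage has been typed by hand; the function and variable names are only illustrative.

```python
import unicodedata

# The 17 upper/lower-case pairs quoted in the post above.
IAST_UPPER = "ĀĪŪṚṜḶḸḺṂṄÑṆḤṬḌŚṢ"
IAST_LOWER = "āīūṛṝḷḹḻṃṅñṇḥṭḍśṣ"
# NFC makes sure every letter is stored as a single precomposed code point.
IAST_CHARS = set(unicodedata.normalize("NFC", IAST_UPPER + IAST_LOWER))


def iast_coverage(reference, ocr_output):
    """Return (kept, lost): IAST characters of the reference text that
    do / do not appear anywhere in the OCR output."""
    ref = unicodedata.normalize("NFC", reference)
    out = unicodedata.normalize("NFC", ocr_output)
    wanted = {char for char in ref if char in IAST_CHARS}
    found = {char for char in out if char in IAST_CHARS}
    return sorted(wanted & found), sorted(wanted - found)


kept, lost = iast_coverage("Dīgha Nikāya", "Digha Nikāya")
print("kept:", kept, "lost:", lost)
```

The check only looks at which characters occur at all, not at their positions, so it is a rough first filter rather than a full accuracy measure.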
Last edited by YC Niu on Sat Jun 15, 2024 5:40 am, edited 17 times in total.
- Daniel - PDF-XChange
- Site Admin
- Posts: 10910
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Latin and Gothic Letters for OCR
Hello, YC Niu
OCR is intended for direct character recognition, not necessarily transliteration, which is a very complex feature to implement. We will look into it and see what we can offer, but I cannot promise that this will be possible. Nor can I say that this will be something we could offer in any short timeframe.
RT#6964: FR: transliterate phonetically
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Re: Latin and Gothic Letters for OCR
Dan,
Thank you for your reply. My English is quite limited. To make my question clearer and to avoid misunderstanding your answer, I will describe it here in another way.
I want "direct character recognition" without "transliteration." The problem is that I do not know which "OCR language" option can correctly recognize these 17 pairs of characters with diacritics:
Ā Ī Ū ṚṜ ḶḸḺ Ṃ ṄÑṆ Ḥ Ṭ Ḍ ŚṢ
ā ī ū ṛṝ ḷḹḻ ṃ ṅñṇ ḥ ṭ ḍ śṣ
I want to OCR (direct character recognition) these 17 pairs of characters.
For clarity, I revised my original post, including the IAST-related info [1]-[3].
Best Regards,
YC Niu
- Stefan - PDF-XChange
- Site Admin
- Posts: 19794
- Joined: Mon Jan 12, 2009 8:07 am
- Contact:
Re: Latin and Gothic Letters for OCR
Hello YC Niu,
The issue is that the text you want to recognize is already a transliteration of the original Sanskrit. The letters our OCR engine would need to recognize therefore do not match any specific language and its corresponding dictionary, which is why this is still a transliteration case and a considerably more complex task than recognizing letters written in the original script of a language.
Thanks for your clarification - I've added a note to the ticket Dan created so that our devs can check this further.
Kind regards,
Stefan
Re: Latin and Gothic Letters for OCR
Hello, Stefan
Thank you for your explanation. Now I understand the difficulty, and it's clear that there is no ready-made solution. I'm ok with this. Also, thanks, Dan.
Best regards,
YC Niu
- Daniel - PDF-XChange
- Site Admin
- Posts: 10910
- Joined: Wed Jan 03, 2018 6:52 pm
Latin and Gothic Letters for OCR

Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com