Hello!
I urgently need a bug fix!
res = OCR_LoadA(doc, FileName) does not work if the file name contains "öäü" or "ÖÄÜ"!
For example:
res = OCR_LoadA(doc, "C:\Test.pdf") works!
res = OCR_LoadA(doc, "C:\Testö.pdf") does not work!
This is a big problem for me, so I need a fix very urgently.
Can you help, please?
OCR_LoadA not possible with umlauts
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19942
- Joined: Mon Jan 12, 2009 8:07 am
Re: OCR_LoadA not possible with umlauts
Hello Dorwol,
I will pass this to Walter, and he will advise here in this topic a bit later today.
Best,
Stefan
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: OCR_LoadA not possible with umlauts
OCR_LoadA() only accepts ASCII strings (char*, "ascii string literal", LPSTR, CStringA, etc).
Use OCR_LoadW() and pass the filename as a wide string (i.e., L"unicodestring", or WCHAR*/wchar_t*/LPWSTR/CStringW/BSTR/etc). That will take care of your umlaut or any other Unicode character.
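As a rough illustration of why a wide string sidesteps the problem, here is a small Python sketch. Python is used only for demonstration and nothing here calls the OCR library. It assumes the usual Windows convention that wide strings hold UTF-16 code units, and shows that "ö" then has one fixed value regardless of any code page.

```python
# Illustration only: Windows wide strings (wchar_t*) hold UTF-16 code units.
name = "C:\\Test\u00f6.pdf"  # "C:\Testö.pdf"

# UTF-16LE is assumed here as the in-memory layout of a wide string on Windows.
wide_bytes = name.encode("utf-16-le")

# Each character in this particular name takes exactly two bytes, and the
# o-umlaut is always the code unit 0x00F6, whatever the system code page is.
o_umlaut = "\u00f6".encode("utf-16-le")

print(len(name), len(wide_bytes))
print(o_umlaut.hex())
```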
-Walter
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: OCR_LoadA not possible with umlauts
BTW, if this doesn't work and you still have trouble, please email us at [email protected]. We watch the forum constantly, but an email will get our attention fastest.
-Walter
-
Dorwol
- User
- Posts: 275
- Joined: Mon Aug 04, 2008 5:04 pm
Re: OCR_LoadA not possible with umlauts
Walter-Tracker Supp wrote: Use OCR_LoadW() and pass the filename as a wide string .... That will take care of your umlaut or any other unicode character.
Thank you for this very fast help. Marvelous! And yes, this works!
...but...
Walter-Tracker Supp wrote: OCR_LoadA() only accepts ASCII strings
1. Umlauts are not Unicode. They are part of the regular ASCII table (for example, asc("ä") is ASCII value 228!).
2. OCR_SaveA() from your ocrtools.dll works with umlauts.
3. All your other components work with umlauts.
...so I am not quite sure whether this is really correct.
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19942
- Joined: Mon Jan 12, 2009 8:07 am
Re: OCR_LoadA not possible with umlauts
Hello Dorwol,
There are a myriad of 8-bit ASCII extensions (code pages), and the value interpreted at your end as "ä" could be a Greek delta (δ) (Windows-1253), or the letter "д" in a Cyrillic version of the code table (Windows-1251).
So ideally OCR_LoadA() should only be used if all of the symbols in the name and path are from the "English part" of the code table. I will ask Walter to elaborate on why SaveA works and LoadA doesn't in your case.
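A quick way to see this for yourself is the following Python sketch (for illustration only, it is unrelated to the OCR library). It decodes the single byte 228 under a few code pages, using Python's built-in codec names for them.

```python
# Illustration only: decode the single byte 228 (0xE4) under several code pages.
raw = bytes([228])

codepages = ("cp1252", "cp1251", "cp1253", "cp437")
decoded = {cp: raw.decode(cp) for cp in codepages}

# The same byte value comes out as a different letter in each code page.
for cp, ch in decoded.items():
    print(cp, ch)
```

Only under the Western European code page does the byte come out as "ä".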
Best,
Stefan
-
Dorwol
- User
- Posts: 275
- Joined: Mon Aug 04, 2008 5:04 pm
Re: OCR_LoadA not possible with umlauts
Tracker Supp-Stefan wrote: I will ask Walter to elaborate on why SaveA works and LoadA doesn't in your case.
Yes, please!
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: OCR_LoadA not possible with umlauts
The short answer is that you should only use the ASCII version of the load and save functions when every character is in the range #32 (space) to #126 (tilde), the lower "standard" part of the ASCII character table. This covers unaccented Latin letters, digits, and the symbols that happen to be on English keyboards (!@#$%^&*()_+-=, etc). Otherwise, use the wide / Unicode version. It avoids all kinds of potential complications.
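As a minimal sketch of that rule (Python, for illustration only), the check below tests whether a path stays within #32 to #126. The helper names `is_plain_ascii` and `pick_variant` are hypothetical and not part of any library.

```python
def is_plain_ascii(path: str) -> bool:
    """Return True if every character is in the printable ASCII range 32..126."""
    return all(32 <= ord(ch) <= 126 for ch in path)


def pick_variant(path: str) -> str:
    """Hypothetical helper: name the function variant that suits this path."""
    return "OCR_LoadA" if is_plain_ascii(path) else "OCR_LoadW"


print(pick_variant("C:\\Test.pdf"))
print(pick_variant("C:\\Test\u00f6.pdf"))
```

In practice it is simpler to call the wide version unconditionally, since it handles plain ASCII names as well.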
In essence, not all umlaut-a are created equal. Each character in a single-byte string is represented by only 8 bits (1 byte), which gives only 256 possible values, so to accommodate different languages (European languages, Cyrillic, Greek, etc), extended code pages were developed. The extended region stretches from #128 up to #255 (decimal), and the contents of this region vary depending on the code page that is in effect. Letters which look the same may have different numbers in different code pages (or alternatively, the same number may mean a different letter depending on the code page in use).
For example, on code page 437 (which was the standard on many older pre-Unicode computers, and is sometimes still used), LATIN SMALL LETTER A WITH DIAERESIS (umlaut a) is hex code #84 (decimal 132). In other common code pages (and also in Unicode), the same letter is represented by hex code #E4 (decimal 228). So the meaning of umlaut-a is really not particularly clear cut in ASCII (even if most modern code pages try to use decimal 228 for it).
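The two values quoted above can be checked with a short Python sketch (illustration only), which encodes the same letter under code page 437 and under Windows-1252 and compares both with the Unicode code point.

```python
# Illustration only: one letter, two different byte values.
letter = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

cp437_byte = letter.encode("cp437")    # old DOS/OEM code page
cp1252_byte = letter.encode("cp1252")  # Western European Windows code page

print(cp437_byte.hex(), cp1252_byte.hex(), hex(ord(letter)))
```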
So it is best to just bypass the whole mess and use the Unicode functions.
As to why Load fails but Save works: Load must perfectly match whatever is on the filesystem to find the file, whereas Save can write to any filename you give it.
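One way such a mismatch could arise is sketched below in Python (illustration only). The choice of code pages is an assumption made for the example, and the thread does not confirm that this is what happens inside OCR_LoadA. The name is encoded under one code page and read back under another.

```python
# Illustration only: a name encoded under one code page, read under another.
intended = "Test\u00f6.pdf"        # the name as it exists on disk ("Testö.pdf")

raw = intended.encode("cp1252")    # bytes handed over by the caller (assumed Windows-1252)
seen = raw.decode("cp437")         # the name as read under a different code page (assumed)

# A lookup for `seen` cannot find a file stored as `intended`,
# while a save would simply create a file under whatever name results.
print(seen == intended)
```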
-Walter