tuning OCR or outright editing mathematical formulae in OCR output
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
tuning OCR or outright editing mathematical formulae in OCR output
Any suggestions for how to improve the OCR output for an older text written in a somewhat esoteric and not fully consistent formatting hierarchy, with lots of math and numbers? The OCR isn't happy, lol.
-
- Site Admin
- Posts: 2431
- Joined: Mon Jan 15, 2018 9:01 am
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs,
May I ask you to send us one of the files you are having this issue with?
Regards.
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Sure - here are the 2 files I shared in the previous post.
"Chap3-Rotated" is the final image PDF (originally TIF files, with some cropping and luminosity adjustments, then merged into a PDF file). Then we used PDF-XChange to rotate every other page, as the book was scanned on a flatbed, which required spinning the book 180° for every other page to physically fit on the scanner.
"Chap3-Rotated OCRhigh" is the Chap3-Rotated file after an OCR conversion with deskewing, set to 'high'.
Let me know what you think.
best
/D
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
the forum is saying the pre-OCR file is too large - it's 51.35MB
-
- Site Admin
- Posts: 7388
- Joined: Wed Mar 25, 2009 10:37 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Thanks for this makesdocs,
can you upload it to our file service please? https://www.pdf-xchange.com/knowle ... le-service
we will take a look.
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
- User
- Posts: 1825
- Joined: Sat Sep 11, 2021 5:04 am
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hi makesdocs,
If you want EOCR to recognize mathematical formulas, you need to add the "Simple math formulas" language. Is it registered (installed)? However, even with this dictionary you should not expect much accuracy; as its name says, it is meant for "Simple" formulas.
Best regards,
rakunavi
TOP desires for PDFXCE
forum.pdf-xchange.com/viewtopic.php?t=39665 LassoTool
forum.pdf-xchange.com/viewtopic.php?t=38554 CmtGarbled
forum.pdf-xchange.com/viewtopic.php?t=37353 FulScrMultiMon
forum.pdf-xchange.com/viewtopic.php?t=41002 DisableTouchSelect
-
- Site Admin
- Posts: 7388
- Joined: Wed Mar 25, 2009 10:37 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Thanks for that Rakunavi.
you need to add the "Simple math formulas" language
I am just a tad embarrassed that you knew this and I didn't! But you actually use that a lot more than me I suspect. Thanks so much for pointing that out!
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
You folks are so responsive, gotta say tyvm again!
I'll check and load the simple math formulas language and see what that does
I uploaded the pre-OCR file just now
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Just now, looking at the OCR dictionaries, I see Czech, Finnish, French, German, Polish and Spanish in addition to English.
I think it might be worth removing those, as some of the incorrect OCR conversions contain diacritics etc...
I've added the Simple Math and Chemistry dictionaries.
I moved the OCR language files as per the knowledgebase article linked below, and put them into an adjacent directory - is that enough to de-register them from the OCR engine?
https://www.pdf-xchange.com/knowledgebase/352-How-do-I-uninstall-or-remove-OCR-language-packs-from-PDF-XChange-Editor-and-PDF-XChange-Viewer
in any case, I'll give it a shot and report back
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
For my version (9.5.366.0), the following changes:
1) adding the Simple Math and Chemistry dictionaries to the OCR, and
2) moving the potentially superfluous OCR languages to another directory (if that is in fact sufficient to "de-register" them from the OCR engine)
didn't change anything I've caught yet in a side-by-side comparison of the OCR outputs...
I'll wait for some guidance here, and go look at the solutions in my other posts, as I think they will address the quality needed for digitizing the material, even if they take considerable work.
Thanks again for any time and attention you all have shared - super helpful.
D
-
- Site Admin
- Posts: 2431
- Joined: Mon Jan 15, 2018 9:01 am
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs,
Thanks for the files. We will run some tests and advise you on what would be the best approach to handle these types of content.
By the way, the article you are mentioning is not meant for the Enhanced OCR tool present in the version 9 products. Removing the tick on these languages is enough to disable them.
Regards.
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
copy that.
The reason I looked it up is that I can't make any changes to the already-installed OCR languages... their checkboxes are not editable, nor can I deselect any radio buttons. I can for not-installed items, just not the existing ones...
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs
I believe what Dimitar meant was that you can disable the languages by unchecking them in the language dropdown, within the OCR dialog. Currently there is no way to uninstall a language once it has been installed.
Following that, I should mention that we do not create the languages; we simply choose whether or not to make them available for download through our plugin. The creators of the engine, ABBYY, handle which languages are available and their creation. While they do sometimes offer new options, to my knowledge they are no longer creating languages for their V13 releases (which we are using). This means that new languages will likely not be added until we are able to update to a newer release of their engine (which will be quite some time, as their new releases no longer support 32-bit OSes and we are not yet ready to drop 32-bit support ourselves).
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
copy that.
So in playing around with the OCR languages, I am seeing the following behavior:
- I can only select languages that are NOT installed, via the left-hand checkbox (really, by clicking anywhere on the line, I think).
- The radio buttons are non-functional for me as an end user: no language lets me toggle its radio button (perhaps this isn't a radio button, but only a 'status indicator'? Only the installed languages have a green radio button.)
is this all as expected?
also, are you confirming that moving the source OCR language files from the
\Program Files\Common Files\Tracker Software\Common\Dictionaries
and
C:\Program Files\Common Files\Tracker Software\Common\Lanuages
directories does not remove a language from OCR engine analysis?
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs
Yes, that is expected; through the UI there is no way to uninstall languages after installation. As mentioned earlier, Dimitar was not referring to the "add/update languages" window, but to the ability to "disable" an installed language via the languages dropdown menu in the main OCR dialog.
As for manually removing the files, while it is indeed possible, it is not recommended as each language installs parts to various locations within the common files folder. If you desperately need to remove languages, I would suggest you uninstall our software, manually delete the "tracker software" folder from within common files entirely, and then reinstall the software, so that only the default languages are present. After that, carefully download only the languages that you need.
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Oh, I see what you mean re: enabling languages, my bad.
It sounds like a reinstall isn't necessary, if a language that is not enabled isn't affecting the OCR analysis.
We've gotten marginally acceptable quality for the final outputs post-OCR, but it did take quite a while to do all the editing with comments to protect the images. It's tough because a lot of the fractions and other math inline with the text aren't OCR-ing properly, and I can't find a font that matches the kerning and typesetting...
now that we have a process, though, I think it's something we can live with. just going to take a month or two as we find time to effort it.
the searchable pdf is a big deal, as to my knowledge this ref. material has never been available in an indexed digital form...
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs
Sadly there isn't much that can be done with regards to the formulae at the moment, so the comment block workaround is likely the best solution for now. Perhaps in a future update, when we move to a newer EOCR engine, we will see the ability to add a math/science formulae language to the OCR.
I should also note, I spoke with the Dev team and got approval for a request to have the editor offer uninstall options for OCR languages, instead of only install/update. While it may not be something we see soon, it should be coming along eventually.
RT#6353: FR: OCR add/update languages feature - Add uninstall option.
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
You all are very responsive. When I get a chance I'm going to get the license purchased; very much worth it in my estimation.
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Maybe a future evolution of the software could allow markup blocks that apply different libraries/dictionaries.
example:
1) let document OCR have a chosen default language as set via OCR drop down (as currently chosen)
2) allow additions of markups for selected text, blocked images etc., to allow application of a different language/dictionary for those blocks, with libraries optimized for whatever (e.g., mathematical, chemical, scientific symbols instead of text) etc...
TrackerSupp-Daniel wrote (Thu Dec 29, 2022 8:33 pm):
Hello, makesdocs
Sadly there isn't much that can be done with regards to the formulae at the moment, so the comment block workaround is likely the best solution for now. Perhaps in a future update, when we move to a newer EOCR engine, we will see the ability to add a math/science formulae language to the OCR.
I should also note, I spoke with the Dev team and got approval for a request to have the editor offer uninstall options for OCR languages, instead of only install/update. While it may not be something we see soon, it should be coming along eventually.
RT#6353: FR: OCR add/update languages feature - Add uninstall option.
Kind regards,
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs
While that all sounds great, there are two issues:
The first is "document specificity": the PDF specification does not allow us to store information like default OCR settings within the file, and trying to do so could result in file corruption if any other software opens the file and tries to modify that information. Likewise, there should never be a reason to run OCR more than once on a single file, so even if this were possible, it wouldn't be a very useful feature and thus likely wouldn't be implemented.
The second is that the OCR engine itself decides which fonts to use, based on the available system fonts (curated to avoid certain problematic options) and some specialized internal logic that the creators of the engine (ABBYY) use to determine the "closest" available font from that list. There is no way for us to directly control it, let alone specify what it should prefer in specific areas/types of content. The engine does not currently have functions for us to feed it this information selectively, nor facilities to give us the desired output.
Perhaps in the future, when we are able to update to the latest version of ABBYY's engine, we can reach out to them asking for specialized features like this, but much like we do not update older builds of our software, they only offer basic support for troubleshooting, and some critical bugfixes for their older products, so new features that require changes on their end to implement are not on the table at this time.
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Fair enough, thanks for that detail. I was just thinking about how some other software we use works, and what might be interesting to see as functionality in a product that does OCR - it was just a suggestion box item anyway.
-
- Site Admin
- Posts: 19913
- Joined: Mon Jan 12, 2009 8:07 am
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello makesdocs,
We appreciate you bringing that in - but as Daniel said - there's little control that we have over the current OCR engine.
What you can do, though, is "zonal OCR": use the snapshot tool over the formula, then OCR only that region (right-click on the snapshot rectangle), and select only the "Mathematical formulas" 'language' for that OCR, so that the engine does not try to use both e.g. English and Maths for that area.
And here is the result I got for, e.g., the 9¼ degrees at the bottom of your initial screenshot (see the attached image).
Kind regards,
Stefan
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
that's awesome I'll try that!
kinda what I was talking about, except my idea was to mark/block/prep an entire doc, not just a selection at the point of focus
very cool tho
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, makesdocs
I hope that it helps!
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 1385
- Joined: Mon Nov 15, 2021 8:38 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Is the PDF-XChange Editor Enhanced OCR actually able to recognize LaTeX or other mathematical formats? With fraction lines and everything? Greek letters? sin, cos and all of that?
My wishlist https://forum.pdf-xchange.com/viewtopic.php?p=187394#p187394
Disable SPACE page navigation, fix kb shortcut for highlighting advanced search tool search field, bookmarks with numbers, toolbar small icon size, AltGr/Ctrl+Alt keyboard issues
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: tuning OCR or outright editing mathematical formulae in OCR output
Hello, MedBooster
The OCR can recognize items that are valid "font" characters within an installed local font, provided that they are defined by one of the "languages" selected when running the OCR on the file. If, for example, you only had "English" selected, it would recognize the characters from the English dictionary and attempt to prioritize arrangements that create real words in English.
Likewise, if you downloaded the "Greek", "C++", and "numbers" languages and enabled them all, along with English, you would be able to capture a large amount of mathematical formulae. It is worth noting that the order in which you enable languages will give different results. For example, if your language field lists Greek first, the OCR would place priority on text that fits into defined Greek words, and for anything that does not fit that criterion, it would look to the following languages. Inverting this order would give English words priority, followed by other similarities. Due to the differing character sets used by Greek and English, this could result in some variance in the output.
Special characters like Sin and Cos should be part of the "numbers" languages, but I cannot say with certainty how "deep" into special symbols it goes.
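The ordering effect described above can be illustrated with a small toy model in Python. This is only a sketch under stated assumptions: the function `pick_reading`, the one-word vocabularies, and the rule "the first enabled language with a dictionary match wins" are hypothetical simplifications, not ABBYY's or PDF-XChange's actual recognition logic. It shows how the same glyphs ("ΑΝ", the Greek word for "if", written in capitals that look identical to Latin "AN") can come out differently depending on which language is listed first.

```python
# Toy model only: this is NOT ABBYY's (or PDF-XChange's) real algorithm.
# It illustrates one idea: when the same glyphs have several plausible
# readings, languages enabled earlier in the list take priority.
from typing import Optional, Sequence, Set, Tuple

Language = Tuple[str, Set[str]]  # (language name, vocabulary of known words)


def pick_reading(
    candidates: Sequence[str], languages: Sequence[Language]
) -> Tuple[str, Optional[str]]:
    """Return (chosen reading, language that matched it, or None).

    candidates: alternative readings of the same glyphs, best visual match first.
    languages:  enabled languages in priority order (hypothetical vocabularies).
    """
    # Check languages in the order they were enabled; the first language whose
    # vocabulary contains any candidate reading decides the output.
    for name, vocabulary in languages:
        for reading in candidates:
            if reading in vocabulary:
                return reading, name
    # No dictionary match at all: fall back to the best visual match.
    return candidates[0], None


# Greek capital Alpha + Nu ("ΑΝ", Greek for "if") looks just like Latin "AN".
GREEK_AN = "\u0391\u039d"
greek: Language = ("Greek", {GREEK_AN})
english: Language = ("English", {"AN"})
glyph_readings = [GREEK_AN, "AN"]

print(pick_reading(glyph_readings, [greek, english]))  # Greek listed first
print(pick_reading(glyph_readings, [english, greek]))  # English listed first
```

Swapping the order of the two languages is the only difference between the two calls, which mirrors the "inverting this order" case above. The sketch ignores visual confidence scores, real dictionary sizes, and everything else an actual OCR engine would weigh.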
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com