Here's an example of a source book scan (an older manufacturing reference book), side by side with the OCR'd version.
The OCR conversion deskewed the pages fairly well and did a pretty good job of character recognition (especially at the margins, which are a challenge when scanning a bound book on a flatbed scanner).
It didn't recognize the sentence in bold type that opens this paragraph. It has picked up most of this kind of formatting, but not all of it, in the 20 or so pages we scanned. This book is written with a lot of hierarchical formatting like this, and losing it is not ideal.
Is there a way to tune the OCR to capture more of this or, if not, to edit the font for selected text in the document?
Sorry if this is a newbie question. I tried searching for posts on this but didn't find anything that seemed to relate to the issue. We just downloaded the demo and are trying to get through the evaluation (we are pleased so far, but do have questions!)
OCR isn't consistently picking up hierarchical formatting
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
OCR isn't consistently picking up hierarchical formatting
-
- Site Admin
- Posts: 19913
- Joined: Mon Jan 12, 2009 8:07 am
Re: OCR isn't consistently picking up hierarchical formatting
Hello makesdocs,
It is up to the OCR engine (our Enhanced OCR uses ABBYY's FineReader engine) to recognize the font weight and apply it to the recognized text. Apparently, with your sample, the engine does not recognize the bold there and outputs all of the text in standard weight. There isn't really a way to tune or train the recognition engine, either on your end or ours, so the option I see here is for you to manually make that text bold after the OCR has processed your initial file.
Here's how you can edit text inside a PDF file with the Editor:
https://www.pdf-xchange.com/knowle ... -documents
Kind regards,
Stefan
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: OCR isn't consistently picking up hierarchical formatting
Copy that.
Can PDF-XChange handle formulae built in, say, MS Word, or do I need to create them as a graphic and insert that?
-
- Site Admin
- Posts: 7388
- Joined: Wed Mar 25, 2009 10:37 pm
Re: OCR isn't consistently picking up hierarchical formatting
Hi makesdocs,
I'm not sure I completely understand what you mean. If you want to send a sample page with the content you are referring to, we can investigate whether there are issues.
Regards
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: OCR isn't consistently picking up hierarchical formatting
Hey Paul, I was thinking that perhaps I could re-create the formula using the MS Word equation editor (which I think produces formatted text rather than an image), or another math editor,
and then insert it into the PDF file using the text editing functions, if PDF-XChange supports that.
Otherwise I can create the formula and insert it as an image, but that's a ton of work lol.
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR isn't consistently picking up hierarchical formatting
Hello, makesdocs
If you mean something like a formula built from mathematical symbols, as in the appearance of the text, then yes, we should be able to handle it: our software has access to the same font libraries that are available to MS Word. That being said, you will likely want to use the image approach anyway, because if someone without access to that font opens the file, they may be unable to see the text properly.
I should also note that taking a quick screenshot anywhere within Windows is easy these days: simply press Win+Shift+S to snip a section of the screen, then press Ctrl+V to paste that image into our Editor. There is no need to use the insert images function specifically.
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 32
- Joined: Sun Dec 25, 2022 7:28 pm
Re: OCR isn't consistently picking up hierarchical formatting
Copy. Yeah, I'm figuring out how to work with it all.
Thanks so much for all your help; the forum has been very helpful and responsive.
-
- Site Admin
- Posts: 11577
- Joined: Wed Jan 03, 2018 6:52 pm
OCR isn't consistently picking up hierarchical formatting

Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com