Dear all,
Our company has purchased your PDF-XChange Editor SDK, and I need your assistance.
When running OCR with the "op.document.OCRPages" operation, I ran into some confusing issues, described below:
1. If I execute the "op.document.OCRPages" operation several times, it adds a corresponding number of layers to the document (IPXC_Document). If I then convert this document to a Word document (.doc/.docx), the result contains many duplicated layers.
What causes this behavior, and how can I avoid it?
2. If I convert an OCRed PDF (originally an image-only PDF) to Word (.doc/.docx), the result has two layers, with the image layer on top and the text layer behind it. The problem is that the recognized text in the text layer is hidden, whereas I expect it to be shown.
How can I make the recognized text visible instead of hidden?
Thanks in advance.
Editor SDK OCR issues
Moderators: PDF-XChange Support, Daniel - PDF-XChange, Chris - PDF-XChange, Sean - PDF-XChange, Paul - PDF-XChange, Vasyl - PDF-XChange, Ivan - Tracker Software, Stefan - PDF-XChange
Forum rules
DO NOT post your license/serial key, or your activation code - these forums, and all posts within, are public and we will be forced to immediately deactivate your license.
When experiencing some errors, use the IAUX_Inst::FormatHRESULT method to see their description and include it in your post along with the error code.
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19919
- Joined: Mon Jan 12, 2009 8:07 am
Re: Editor SDK OCR issues
Hello Kyo,
When you perform the OCR process with the correct settings, it takes the current file contents and runs OCR on them.
It then adds the new OCR text layer on top of anything existing, without removing anything. That is why you end up with multiple layers of invisible text: each OCR operation adds its own layer on top of all the content already in the file.
The OCR process cannot recognize whether any of the existing content is already an OCR layer; it just adds its own on top.
The conversion from PDF to Word is handled by Word APIs, so why they put the image on top of the text is beyond me. Also, the text will normally be invisible, so you need to make it black inside the Editor before converting to Word (and you can also remove the image if desired).
Regards,
Stefan
-
kyo
- User
- Posts: 130
- Joined: Mon Oct 31, 2016 11:58 am
Re: Editor SDK OCR issues
Dear Stefan,
Thank you for your reply.
I think I mostly understand what you mean.
So, do you have a workaround to ensure that only one OCRed layer is added on top of the PDF?
Please give me some advice on how to handle this in my program.
Any advice will be greatly appreciated.
Thanks.
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: Editor SDK OCR issues
Hi kyo,
If you're running the OCR operation multiple times, there isn't any way to ensure that only one text layer, in total, is added. As Stefan said, it is impossible to differentiate between standard text and OCR'd text. The only way to avoid this is to only OCR a document once, or delete the text layers before running another OCR operation.
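As a rough illustration of the "only OCR a document once" option, the sketch below tracks which documents have already been processed and skips repeat requests. It is not SDK code: `OcrOnceGuard`, `run_ocr` and the document key are hypothetical names, and `run_ocr` stands in for whatever wrapper your program uses to execute "op.document.OCRPages".

```python
from typing import Callable, Set


class OcrOnceGuard:
    """Runs an OCR callback at most once per document key.

    `run_ocr` is a hypothetical wrapper supplied by the caller; in a real
    program it would execute the "op.document.OCRPages" operation.
    """

    def __init__(self, run_ocr: Callable[[str], None]) -> None:
        self._run_ocr = run_ocr
        self._done: Set[str] = set()

    def ocr(self, doc_key: str) -> bool:
        """Return True if OCR was run, False if it was skipped."""
        if doc_key in self._done:
            return False  # already OCRed: skip so no second text layer is added
        self._run_ocr(doc_key)
        self._done.add(doc_key)  # mark only after the callback succeeded
        return True


# Usage: the callback here just records calls instead of running real OCR.
calls = []
guard = OcrOnceGuard(calls.append)
first = guard.ocr("scan.pdf")   # OCR runs
second = guard.ocr("scan.pdf")  # skipped
```

Note that this only remembers documents within one run of the program. It cannot tell whether a file was OCRed earlier or by another tool, so for such files the other option (deleting the text layers before running OCR again) still applies.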
Thanks,
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com