Size of OCRd text

PDF-XChange Editor SDK for Developers

Forum rules
When experiencing some errors, use the IAUX_Inst::FormatHRESULT method to see their description and include it in your post along with the error code.
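
A minimal sketch of that advice in VB.NET, the language used in this thread. It rests on two unverified assumptions: that the IAUX_Inst interface is obtained through GetExtension("AUX"), by analogy with the "AFS" and "PXC" extensions used in the code below, and that FormatHRESULT takes the error code and returns the description as a string. The helper name RunAndDescribeError is hypothetical.

```vbnet
Imports System.Runtime.InteropServices

Module OcrErrorReporting
    ' Runs an operation. Returns Nothing on success, or the error code
    ' together with its description when the call fails with a COM error.
    Function RunAndDescribeError(inst As PDFXEdit.IPXV_Inst, op As PDFXEdit.IOperation) As String
        Try
            op.Do()
            Return Nothing
        Catch ex As COMException
            ' Assumption: "AUX" is the extension name that yields IAUX_Inst.
            Dim auxInst As PDFXEdit.IAUX_Inst = CType(inst.GetExtension("AUX"), PDFXEdit.IAUX_Inst)
            ' Assumption: FormatHRESULT(code) returns the description text.
            Return String.Format("0x{0:X8}: {1}", ex.ErrorCode, auxInst.FormatHRESULT(ex.ErrorCode))
        End Try
    End Function
End Module
```

Catching COMException rather than the general Exception keeps the HRESULT available through ex.ErrorCode, so both the code and its description can be included in a post.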
charuvasudev
User
Posts: 16
Joined: Wed Feb 11, 2009 5:48 am

Size of OCRd text

Post by charuvasudev »

Hi,

I have used PDF-XChange and Tesseract for OCR in my project, and it works fine. My only concern is that the font size of the OCRed text added to the text layer of the PDF file is too small. When I press Ctrl+F and search for a keyword, the match is highlighted, but the highlight is so small that it looks like a dot and is easily missed. Shouldn't the font size be calculated automatically? It works as expected in your end-user control. See the attached image for more clarity.

Here is the code:

' Resolve the ID of the OCR operation and create it.
Dim nID As Integer = Inst1.Str2ID("op.document.OCRPages", False)
Dim Op As PDFXEdit.IOperation = Inst1.CreateOp(nID)
Dim input As PDFXEdit.ICabNode = Op.Params.Root("Input")

' Build the input and output file names.
Dim fsInst As PDFXEdit.IAFS_Inst = CType(Inst1.GetExtension("AFS"), PDFXEdit.IAFS_Inst)
Dim impPath As PDFXEdit.IAFS_Name = fsInst.DefaultFileSys.StringToName(OpenFileDialog1.FileName)
Dim stroutputpath As String = System.IO.Path.GetDirectoryName(OpenFileDialog1.FileName) & "\" & System.IO.Path.GetFileNameWithoutExtension(OpenFileDialog1.FileName) & DateTime.Now.ToString("MMddyyymmss") & ".pdf"
Dim outPath As PDFXEdit.IAFS_Name = fsInst.DefaultFileSys.StringToName(stroutputpath)

' Open the source document and pass it to the operation.
Dim pxcInst As PDFXEdit.IPXC_Inst = CType(Inst1.GetExtension("PXC"), PDFXEdit.IPXC_Inst)
Dim resDoc As PDFXEdit.IPXC_Document = pxcInst.OpenDocumentFrom(impPath, Nothing)
input.v = resDoc

' OCR options: only process pages that have no text yet.
Dim options As PDFXEdit.ICabNode = Op.Params.Root("Options")
options("OutputType").v = 0
options("OCRNoTextPagesOnly").v = True

Try
    Op.Do()
Catch ex As Exception
    ' OCR failed: release the document and stop.
    resDoc.Close()
    Exit Sub
End Try

' Save the OCRed document and release it.
resDoc.WriteTo(outPath)
resDoc.Close()


Please guide.
2025-12-25 12_17_25-.png
charuvasudev
User
Posts: 16
Joined: Wed Feb 11, 2009 5:48 am

Re: Size of OCRd text

Post by charuvasudev »

Hi,

Any update on my issue?
Daniel - PDF-XChange
Site Admin
Posts: 12609
Joined: Wed Jan 03, 2018 6:52 pm

Re: Size of OCRd text

Post by Daniel - PDF-XChange »

Hello, charuvasudev

You posted this during the holidays and we have a bit of a backlog, but the Dev team is working through things. I cannot promise when they will be able to respond, but we will get back to you once they have taken a look.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
Vasyl - PDF-XChange
Site Admin
Posts: 2476
Joined: Thu Jun 30, 2005 4:11 pm

Re: Size of OCRd text

Post by Vasyl - PDF-XChange »

Hi, charuvasudev.

Maybe the difference is because the Editor EU uses the newer "op.document.OCRPages2" operation internally, instead of the much older and now-obsolete "op.document.OCRPages", which you use. I recommend starting to use the new version with the "SkipPagesWithText" parameter, as equivalent of "OCRNoTextPagesOnly" from the older version.
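
A minimal, untested sketch of that switch, in VB.NET to match the code in the question. Only the operation ID and the "SkipPagesWithText" option name come from this post. The helper name RunOcrPages2 is hypothetical, every other option of the new operation is left at its default, and the parameter is set to True on the assumption that it mirrors OCRNoTextPagesOnly = True.

```vbnet
Module OcrPages2Sketch
    ' Runs the newer OCR operation on an already opened document.
    ' inst is the same instance object as Inst1 in the question.
    Sub RunOcrPages2(inst As PDFXEdit.IPXV_Inst, doc As PDFXEdit.IPXC_Document)
        ' Look up the newer operation instead of "op.document.OCRPages".
        Dim nID As Integer = inst.Str2ID("op.document.OCRPages2", False)
        Dim op As PDFXEdit.IOperation = inst.CreateOp(nID)

        Dim input As PDFXEdit.ICabNode = op.Params.Root("Input")
        input.v = doc

        Dim options As PDFXEdit.ICabNode = op.Params.Root("Options")
        ' Assumed replacement for options("OCRNoTextPagesOnly").v = True.
        options("SkipPagesWithText").v = True

        op.Do()
    End Sub
End Module
```

Options from the older operation, such as "OutputType", are deliberately not set here, because their names under the new operation are not confirmed in this thread.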

HTH.
PDF-XChange Co. LTD (Project Developer)

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
Anushka
User
Posts: 13
Joined: Thu Nov 27, 2025 7:33 am

Re: Size of OCRd text

Post by Anushka »

Vasyl - PDF-XChange wrote: Thu Jan 08, 2026 10:35 pm Hi, charuvasudev.

Maybe the difference is because the Editor EU uses the newer "op.document.OCRPages2" operation internally, instead of the much older and now-obsolete "op.document.OCRPages", which you use. I recommend starting to use the new version with the "SkipPagesWithText" parameter, as equivalent of "OCRNoTextPagesOnly" from the older version.

HTH.

Hello Support Team,

As per the suggestion quoted above, I switched to the newer operation "op.document.OCRPages2" and the parameter "SkipPagesWithText". It was working fine, but now this operation suddenly throws an exception when I call Op.Do(). I tried moving back to the older operation "op.document.OCRPages" and the parameter "OCRNoTextPagesOnly". The exception is gone, but the OCR is not done properly: after the operation, searching finds no text and no text can be selected.

Below is the code I am trying, for reference:

' The older operation is active; the newer one is commented out.
Dim nID As Integer = pdfctrl.Inst.Str2ID("op.document.OCRPages", False)
'Dim nID As Integer = pdfctrl.Inst.Str2ID("op.document.OCRPages2", False)
Dim Op As PDFXEdit.IOperation = pdfctrl.Inst.CreateOp(nID)
Dim input As PDFXEdit.ICabNode = Op.Params.Root("Input")
Dim clbk As AuthCallback = New AuthCallback()

' Use the document currently open in the control as the input.
Dim doc As PDFXEdit.IPXV_Document = pdfctrl.Doc
input.v = doc
Dim options As PDFXEdit.ICabNode = Op.Params.Root("Options")
options("OutputType").v = 0
' Option name for the older operation; the OCRPages2 equivalent is commented out.
options("OCRNoTextPagesOnly").v = False
'options("SkipPagesWithText").v = False

Kindly help me identify what could be wrong here.
Thank you.
Anushka
User
Posts: 13
Joined: Thu Nov 27, 2025 7:33 am

Re: Size of OCRd text

Post by Anushka »

Anushka wrote: Tue Mar 31, 2026 12:04 pm
[...]

Hello Support Team,
Related to the above query, I wanted to add that the new operation "op.document.OCRPages2" with the parameter "SkipPagesWithText" gives me an exception stating "Error HRESULT E_FAIL has been returned from a call to a COM component." I tried this even with the latest version of the SDK (10.8.4.409), yet the exception persists. Also, where can I get the latest DLLs of the plugins?

Thank You.
Anushka
User
Posts: 13
Joined: Thu Nov 27, 2025 7:33 am

Re: Size of OCRd text

Post by Anushka »

Anushka wrote: Wed Apr 01, 2026 6:58 am
[...]

Hello, any update on this?
Daniel - PDF-XChange
Site Admin
Posts: 12609
Joined: Wed Jan 03, 2018 6:52 pm

Re: Size of OCRd text

Post by Daniel - PDF-XChange »

Hello, Anushka

I will have to ask for some patience. The Dev team was informed when your first post went up, but we have a very small team and they are quite busy at the moment. When they are able to reply, they will come back here.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
