Size of OCRd text

charuvasudev
User
Posts: 16
Joined: Wed Feb 11, 2009 5:48 am

Post by charuvasudev »

Hi,

I am using PDF-XChange with Tesseract for OCR in my project, and it works fine. My only concern is that the font size of the OCR'd text added to the PDF's text layer is too small. When I press Ctrl+F and search for a keyword, the match is highlighted, but the highlight is so small that it looks like a dot and is easily missed. Shouldn't the font size be calculated automatically? It works as expected in your end-user control. See the attached image for more clarity.

Here is the code:

' nID is the ID of the OCR operation (obtained earlier, not shown here).
Dim Op As PDFXEdit.IOperation = Inst1.CreateOp(nID)
Dim input As PDFXEdit.ICabNode = Op.Params.Root("Input")

' Build the input name and a timestamped output name next to the source file.
Dim fsInst As PDFXEdit.IAFS_Inst = CType(Inst1.GetExtension("AFS"), PDFXEdit.IAFS_Inst)
Dim impPath As PDFXEdit.IAFS_Name = fsInst.DefaultFileSys.StringToName(OpenFileDialog1.FileName)
Dim stroutputpath As String = System.IO.Path.GetDirectoryName(OpenFileDialog1.FileName) & "\" & System.IO.Path.GetFileNameWithoutExtension(OpenFileDialog1.FileName) & DateTime.Now.ToString("MMddyyymmss") & ".pdf"
Dim outPath As PDFXEdit.IAFS_Name = fsInst.DefaultFileSys.StringToName(stroutputpath)

' Open the source document and pass it to the operation.
Dim pxcInst As PDFXEdit.IPXC_Inst = CType(Inst1.GetExtension("PXC"), PDFXEdit.IPXC_Inst)
Dim resDoc As PDFXEdit.IPXC_Document = pxcInst.OpenDocumentFrom(impPath, Nothing)
input.v = resDoc

' OCR options: only process pages that have no text yet.
Dim options As PDFXEdit.ICabNode = Op.Params.Root("Options")
options("OutputType").v = 0
options("OCRNoTextPagesOnly").v = True

' Run the OCR; on failure, close the document and bail out without saving.
Try
    Op.Do()
Catch ex As Exception
    resDoc.Close()
    Exit Sub
End Try

' Save the OCR'd document under the new name.
resDoc.WriteTo(outPath)
resDoc.Close()


Please advise.
2025-12-25 12_17_25-.png

Re: Size of OCRd text

Post by charuvasudev »

Hi,

Any update on my issue?
Daniel - PDF-XChange
Site Admin
Posts: 12516
Joined: Wed Jan 03, 2018 6:52 pm

Re: Size of OCRd text

Post by Daniel - PDF-XChange »

Hello, charuvasudev

You posted this during the holidays, so we have a bit of a backlog, but the Dev team is working through it. I cannot promise when they will be able to respond, but we will get back to you once they have taken a look.

Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD

Vasyl - PDF-XChange
Site Admin
Posts: 2476
Joined: Thu Jun 30, 2005 4:11 pm

Re: Size of OCRd text

Post by Vasyl - PDF-XChange »

Hi, charuvasudev.

The difference may be because the end-user Editor internally uses the newer "op.document.OCRPages2" operation instead of the much older, now-obsolete "op.document.OCRPages" that you are using. I recommend switching to the new version, using its "SkipPagesWithText" parameter as the equivalent of "OCRNoTextPagesOnly" from the older version.
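
A minimal sketch of that change, reusing Inst1 and resDoc from the original post. It is not a tested implementation: it assumes the operation ID is looked up with Str2ID, that "SkipPagesWithText" sits under the "Options" node the way "OCRNoTextPagesOnly" did, and that the remaining options can be left at their defaults. RunOcrPages2 is a hypothetical helper name. Check the option names against the SDK reference for "op.document.OCRPages2" before relying on it.

```vbnet
' Sketch only: Inst1 and resDoc are assumed to be set up as in the original post.
Sub RunOcrPages2(Inst1 As PDFXEdit.IPXV_Inst, resDoc As PDFXEdit.IPXC_Document)
    ' Look up the newer operation by name instead of "op.document.OCRPages".
    Dim nID As Integer = Inst1.Str2ID("op.document.OCRPages2", False)
    Dim Op As PDFXEdit.IOperation = Inst1.CreateOp(nID)

    ' Pass the open document as the operation's input, as before.
    Dim input As PDFXEdit.ICabNode = Op.Params.Root("Input")
    input.v = resDoc

    ' Assumption: SkipPagesWithText lives under "Options" and replaces
    ' OCRNoTextPagesOnly; all other options keep their default values.
    Dim options As PDFXEdit.ICabNode = Op.Params.Root("Options")
    options("SkipPagesWithText").v = True

    ' Exceptions are left to the caller, which also saves and closes resDoc.
    Op.Do()
End Sub
```

The surrounding code (opening the document, WriteTo, Close, and the Try/Catch) would stay as in the original post; only the operation name and the option name change.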

HTH.
PDF-XChange Co. LTD (Project Developer)
