OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Mon Feb 15, 2021 1:03 pm
by baumunk
My code:

PDFPrinter.SetAsDefaultPrinter();
PDFPrinter.Option["Save.ShowSaveDialog"] = "False";
PDFPrinter.Option["Save.File"] = pdfaFile;
PDFPrinter.Option["Saver.ShowProgress"] = "False";
PDFPrinter.Option["General.PageLayout"] = "ShowNone";
PDFPrinter.Option["General.HideUI"] = "True";
PDFPrinter.Option["General.FullScreenMode"] = "ShowNone";
PDFPrinter.Option["General.Specification"] = "-1";
PDFPrinter.Option["Save.RunApp"] = "False";
PDFPrinter.Option["Save.WhenExists"] = "Overwrite";
PDFPrinter.SetRegInfo(dec_key);

// Print the PDF via the shell "print" verb of its associated application.
// The "print" verb needs UseShellExecute = true (default on .NET Framework,
// but not on .NET Core / .NET 5+). A FileName set in the initializer would
// override the constructor argument, so the file is passed to the constructor.
var printJob = new System.Diagnostics.Process
{
    StartInfo = new System.Diagnostics.ProcessStartInfo(pdfFile)
    {
        Verb = "print",
        UseShellExecute = true,
        WindowStyle = System.Diagnostics.ProcessWindowStyle.Hidden
    }
};
printJob.Start();
Files:

sample_pages_ocr.pdf after OCR

sample_pages_ocr_PDFA-1b.pdf after PDF/A conversion (OCR is gone).

Re: OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Tue Feb 16, 2021 5:18 pm
by Paul - PDF-XChange
Hi baumunk,

I am not sure how this applies to the SDK, but when I take your OCR'd PDF and save it as PDF/A-1b, there is an option to "Rasterize unembedded fonts". If I turn that off, I get a PDF/A-1b where the text can be selected.
sample_pages_ocr_PDF-1b-Paul.pdf
Are you able to do a similar thing via your code? If not let me know and I will ask one of the devs to take a look at this.

Re: OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Wed Feb 17, 2021 7:17 am
by baumunk
Hello Paul,

I have not found anything about this in the SDK documentation:
https://help.pdf-xchange.com/pdfxdapi9sdk/
I only have these options (in the settings dialog):
Options.JPG
Please ask the developers how I can achieve this.
We need this.

With kind regards
Ernest Baumunk

Re: OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Wed Feb 17, 2021 9:13 pm
by Paul - PDF-XChange
I spoke to one of the dev team about this.

The reason your result does not include selectable text is that the OCR layer in your original document is invisible text. The Editor can work with that text directly, so saving as PDF/A-1b from the Editor keeps it selectable. Printing invisible text, however, produces no output for it, which is why it cannot be selected in the printed result.
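A minimal sketch of what "invisible text" means at the PDF level, in C# to match the code earlier in the thread. OCR layers are typically drawn with text rendering mode 3 (the `3 Tr` operator: neither fill nor stroke, as defined in the PDF specification), so anything that only reproduces visible marks, such as a printer driver, has nothing to capture. The `InvisibleTextCheck` helper is hypothetical and deliberately naive: it scans an already-decompressed content stream with a regular expression and ignores strings, comments and inline images, so treat it as an illustration rather than a PDF parser.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvisibleTextCheck
{
    // Matches a text rendering mode operator such as "3 Tr" and captures the mode.
    // The lookbehind stops "13 Tr" from being read as "3 Tr".
    private static readonly Regex RenderMode = new Regex(@"(?<![\d.])(\d+)\s+Tr\b");

    // Returns true if the (decompressed) content stream selects rendering
    // mode 3 (invisible text) anywhere. Naive: does not tokenize the stream.
    public static bool UsesInvisibleText(string contentStream)
    {
        foreach (Match m in RenderMode.Matches(contentStream))
        {
            if (m.Groups[1].Value == "3")
                return true;
        }
        return false;
    }

    public static void Main()
    {
        // Typical OCR page: the scanned image is drawn, then text in mode 3 on top.
        string ocrPage = "q 612 0 0 792 0 0 cm /Im1 Do Q BT 3 Tr /F1 12 Tf 72 700 Td (Invoice) Tj ET";
        // Ordinary visible text uses mode 0 (fill).
        string normalPage = "BT 0 Tr /F1 12 Tf 72 700 Td (Invoice) Tj ET";

        Console.WriteLine(UsesInvisibleText(ocrPage));    // True
        Console.WriteLine(UsesInvisibleText(normalPage)); // False
    }
}
```

Text in mode 3 still carries its characters and positions, which is why the Editor can keep it selectable; it simply paints nothing on the page, so a "print" of the page contains no trace of it.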

In short, reprinting is the root of the issue. Use the Editor and/or the Editor SDK to convert to PDF/A-1b, not the printer.

This is a limitation of any printer, not just ours. You already have an OCR'd PDF, so why reprint it and lose data? It is better to convert the PDF to PDF/A directly, without using the printer. If you have large numbers of PDFs that you need to convert to PDF/A, I suggest using PDF-Tools to batch the process.
Both the Editor and PDF-Tools can do this for you without resorting to reprinting, as can the Editor SDK.
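If the batch is scripted in your own code rather than run through PDF-Tools, the folder loop itself is simple. This is a sketch under a loud assumption: `ConvertToPdfA` is a hypothetical delegate that you would wire to a conversion call that does not go through the printer (for example, one made via the Editor SDK). No actual SDK API is shown or implied here, and the `_PDFA-1b.pdf` output naming is likewise just an illustrative choice.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class BatchPdfA
{
    // Hypothetical hook: converts one PDF to PDF/A-1b without printing.
    // Returns true on success. Not a real SDK signature.
    public delegate bool ConvertToPdfA(string inputPath, string outputPath);

    // Converts every *.pdf directly inside inputDir, writing
    // "<name>_PDFA-1b.pdf" into outputDir. Returns the inputs that failed.
    public static List<string> ConvertFolder(string inputDir, string outputDir, ConvertToPdfA convert)
    {
        Directory.CreateDirectory(outputDir);
        var failed = new List<string>();

        foreach (string input in Directory.EnumerateFiles(inputDir, "*.pdf"))
        {
            string output = Path.Combine(outputDir,
                Path.GetFileNameWithoutExtension(input) + "_PDFA-1b.pdf");

            bool ok;
            try { ok = convert(input, output); }
            catch (Exception) { ok = false; } // one bad file must not stop the batch

            if (!ok)
                failed.Add(input);
        }
        return failed;
    }

    public static void Main()
    {
        // Demo with a stand-in "converter" that only copies the file.
        string dir = Path.Combine(Path.GetTempPath(), "pdfa_demo");
        Directory.CreateDirectory(dir);
        File.WriteAllText(Path.Combine(dir, "a.pdf"), "%PDF-1.4");

        List<string> failed = ConvertFolder(dir, Path.Combine(dir, "out"),
            (src, dst) => { File.Copy(src, dst, true); return true; });

        Console.WriteLine("Failed files: " + failed.Count);
    }
}
```

Collecting failures instead of stopping at the first one lets a large run finish and leaves a short list of files to inspect or retry.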

I hope that helps.
sample_pages_ocr_Editor.pdf

Re: OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Thu Feb 18, 2021 7:02 am
by baumunk
Hello O'Rorke

Which one do you mean:
PDF-XChange Editor SDK
https://www.pdf-xchange.com/product/pdf-xchange-editor-sdk

or
PDF-XChange Editor Simple SDK
https://www.pdf-xchange.com/product/pdf-xchange-editor-simple-sdk

Best regards
Ernest Baumunk

Re: OCR detection gone after it was sent to XChange PDF Driver for PDF/A-1B

Posted: Thu Feb 18, 2021 9:24 am
by Sasha - PDF-XChange
Hello baumunk,

PDF-XChange Editor SDK is the one that you should use.

Cheers,
Alex