MultiPage input, single page output

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide the developer with an expanding set of professional tools for optical character recognition tasks.

cflynt
User
Posts: 5
Joined: Fri Aug 31, 2012 1:52 am

Post by cflynt »

I'm using a slightly modified version of the Sample TEST_OCR/TEST_OCR.cpp file as a subroutine.

This is working OK on single-page files, but when I OCR a 4-page file the output is only one page.

Am I missing something in my calls?

int doOCR (char *input, wchar_t *languagePath, wchar_t *output) {
    DWORD dwMaxLevel;
    HRESULT hr;

    // Set options for OCR
    PXO_Options Options;

    // New PXO_inputFields
    PXO_InputFields InFields;
    BSTR textout;

    PXODocument Doc;
    // Initialize the document.
    // Replace NULL with your developer code and license key

    OCR_Init(&Doc, 'SECRET', 'SECRET');

    // Set the callback function
    OCR_SetCallback(Doc,SampleCallback,(LPARAM)&dwMaxLevel);

    // Load the specified PDF into the PXODocument input layer
    hr = OCR_LoadA(Doc, input);

    if (hr != 0) {
        printf("Failed to load file. %s\n", input);
        return -1;
    }

    Options.blacklist = NULL;
    Options.whitelist = NULL;
    Options.ImageFlags = OCR_Image_Autorotate; // check for minor rotation of images and correct it
    Options.lang = PXO_English;
    Options.raster_dpi = 300;
    Options.RegionMode = OCR_Auto;
    Options.DataPath = languagePath;
    Options.accMode = 0; // this field is reserved and unused for now

    hr = OCR_MakeSearchable(Doc, &Options, NULL);

    SampleCallback(98,99,hr);

    if ((hr != 0) && (hr != 0xc20a01f4)) {
        // Print the code as hex with a 0x prefix (the original "%xx" appended a stray 'x')
        printf("Make searchable failed. Error code: 0x%x\n", hr);
        printf("TOERR: 0x%x\n", DS_x_TO_ERROR(hr));
    }
    else
    {
        // Save output layer to a PDF file
        OCR_SaveW(Doc, output);
    }
    return (hr);
}
cflynt
User
Posts: 5
Joined: Fri Aug 31, 2012 1:52 am

Re: MultiPage input, single page output

Post by cflynt »

To put that a bit more coherently:
When the input file to the previous subroutine is a document with 4 pages (my only multi-page test so far), the
OCR runs and the output contains recognized text, but the new PDF document contains only the first page.
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: MultiPage input, single page output

Post by Walter-Tracker Supp »

I'm assuming the OCR_MakeSearchable() call returns no error?

Everything looks okay visually although I'll run a test with your code to make sure I'm not missing anything; in the meantime, maybe you could provide the PDF file, either as an attachment or by email to [email protected]? Does it do this with other PDFs or just the one?
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: MultiPage input, single page output

Post by Walter-Tracker Supp »

Just a quick update: I tested your code with a 7-page document and had no problem with the output. I'll await the other information I asked for in my previous post. You may also want to check the return value of OCR_SaveW, i.e.:

pseudocode: 

hr = OCR_SaveW(...)
if (( hr != 0) ...)
{
   printf("Save failed, error 0x%x\n", hr);
}
-Walter
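
For anyone following along, here is a minimal sketch of that kind of check, written so that every step can report its own failure. It does not call the SDK: HResult is a stand-in for the Windows HRESULT type, and formatResult and stepSucceeded are hypothetical helper names, not part of PDF-X OCR. It treats 0 as success, which is the convention the code in this thread already uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Stand-in for the Windows HRESULT type (assumption), so the sketch
// compiles without the Windows or SDK headers.
using HResult = std::int32_t;

// Hypothetical helper: format a result code as fixed-width hex,
// for example "0xc20a01f4".
std::string formatResult(HResult hr) {
    char buf[16];
    std::snprintf(buf, sizeof buf, "0x%08x",
                  static_cast<unsigned>(static_cast<std::uint32_t>(hr)));
    return buf;
}

// Hypothetical helper: returns true when the step succeeded (hr == 0).
// Otherwise prints the step name and the code to stderr and returns false.
bool stepSucceeded(const char *step, HResult hr) {
    assert(step != nullptr);
    if (hr == 0) {
        return true;
    }
    std::fprintf(stderr, "%s failed, error %s\n", step, formatResult(hr).c_str());
    return false;
}
```

In doOCR this could wrap the save as well as the load, e.g. `if (!stepSucceeded("OCR_SaveW", OCR_SaveW(Doc, output))) return -1;`, so a failed save no longer passes silently.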
cflynt
User
Posts: 5
Joined: Fri Aug 31, 2012 1:52 am

Re: MultiPage input, single page output

Post by cflynt »

The sample PDF file should be in your mailbox.
Thanks.
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: MultiPage input, single page output

Post by Walter-Tracker Supp »

Got it and responded!

I could not reproduce the issue with the included file. I was able to OCR all pages. Have you reproduced it with other files? What is the return code (HRESULT) from OCR_SaveW()?

Also note that you were using a text-based PDF for OCR. This works but will result in loss of information (Text -> Rasterized Image of the Page -> OCR Text). If you were just using it as a quick test of OCR that's fine (and you probably already know this), but for real work this isn't really the best workflow :)

-Walter