I'm using a slightly modified version of the Sample TEST_OCR/TEST_OCR.cpp file as a subroutine.
This is working OK on single-page files, but when I OCR a 4-page file the output is only one page.
Am I missing something in my calls?
```cpp
int doOCR(char *input, wchar_t *languagePath, wchar_t *output) {
    DWORD dwMaxLevel;
    HRESULT hr;
    // Set options for OCR
    PXO_Options Options;
    // New PXO_InputFields
    PXO_InputFields InFields;
    BSTR textout;
    PXODocument Doc;

    // Initialize the document.
    // Replace the placeholders with your developer code and license key
    OCR_Init(&Doc, "SECRET", "SECRET");
    // Set the callback function
    OCR_SetCallback(Doc, SampleCallback, (LPARAM)&dwMaxLevel);
    // Load the specified PDF into the PXODocument input layer
    hr = OCR_LoadA(Doc, input);
    if (hr != 0) {
        printf("Failed to load file: %s\n", input);
        return -1;
    }
    Options.blacklist = NULL;
    Options.whitelist = NULL;
    Options.ImageFlags = OCR_Image_Autorotate; // check for minor rotation of images and correct it
    Options.lang = PXO_English;
    Options.raster_dpi = 300;
    Options.RegionMode = OCR_Auto;
    Options.DataPath = languagePath;
    Options.accMode = 0; // this field is reserved and unused for now
    hr = OCR_MakeSearchable(Doc, &Options, NULL);
    SampleCallback(98, 99, hr);
    if ((hr != 0) && (hr != 0xc20a01f4)) {
        // Print codes as fixed-width hex ("%xx" would print a stray 'x')
        printf("Make searchable failed. Error code: 0x%08lX\n", (unsigned long)hr);
        printf("TOERR: 0x%08lX\n", (unsigned long)DS_x_TO_ERROR(hr));
    }
    else
    {
        // Save output layer to a PDF file
        OCR_SaveW(Doc, output);
    }
    return (hr);
}
```
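A side note on the error printing: `%xx` prints the value in hex followed by a literal `x`, and strictly speaking `%x` expects an `unsigned int` while `HRESULT` is a `LONG`. Below is a minimal sketch of printing a 32-bit status as fixed-width hex and splitting it into the usual HRESULT fields, using the same layout as the Windows `HRESULT_SEVERITY`, `HRESULT_FACILITY` and `HRESULT_CODE` macros. It assumes the OCR SDK's codes follow that standard layout, and the helper names are made up for illustration. Under that assumption 0xc20a01f4 has the severity (failure) bit set, so accepting it alongside 0 is a deliberate choice whose meaning the thread doesn't explain.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers mirroring the winerror.h HRESULT field macros.
   Written against uint32_t so they compile without windows.h. */
static unsigned status_severity(uint32_t hr) { return (hr >> 31) & 0x1u; }
static unsigned status_facility(uint32_t hr) { return (hr >> 16) & 0x1FFFu; }
static unsigned status_code(uint32_t hr)     { return hr & 0xFFFFu; }

/* Same test as SUCCEEDED(): the severity (sign) bit is clear. */
static int status_succeeded(uint32_t hr) { return status_severity(hr) == 0; }

/* Writes e.g. "0xC20A01F4" into buf and returns snprintf's result. */
static int format_status(char *buf, size_t n, uint32_t hr)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "0x%08lX", (unsigned long)hr);
}
```

For example, `format_status(buf, sizeof buf, (uint32_t)hr)` followed by `printf("Make searchable failed: %s\n", buf)` prints the code in a form that can be compared directly against documented values.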
MultiPage input, single page output
-
cflynt
- User
- Posts: 5
- Joined: Fri Aug 31, 2012 1:52 am
Re: MultiPage input, single page output
Being a bit less incoherent.
When the input file to the subroutine in my previous post is a 4-page document (my only multi-page test so far), the text is recognized, but the new PDF document contains only the first page.
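To confirm the symptom independently of the OCR SDK, it can help to compare the page counts of the input and output files. A real PDF library is the reliable way to do this; the sketch below is only a crude scan for `/Type /Page` entries in a file already read into memory (`count_pages_heuristic` is a hypothetical helper, not part of the SDK). It cannot see pages stored inside compressed object streams, so a mismatch is a hint worth investigating, not proof.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rough heuristic: counts "/Type /Page" entries (but not "/Type /Pages")
   in a PDF held in memory. Only uncompressed objects are visible. */
static size_t count_pages_heuristic(const char *buf, size_t len)
{
    static const char type_key[] = "/Type";
    static const char page_val[] = "/Page";
    const size_t tk = sizeof type_key - 1;
    const size_t pv = sizeof page_val - 1;
    size_t count = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + tk <= len; i++) {
        if (memcmp(buf + i, type_key, tk) != 0)
            continue;
        size_t j = i + tk;
        /* PDF allows optional whitespace between the key and the name. */
        while (j < len && isspace((unsigned char)buf[j]))
            j++;
        if (j + pv > len || memcmp(buf + j, page_val, pv) != 0)
            continue;
        j += pv;
        /* Reject longer names such as /Pages. */
        if (j < len && isalnum((unsigned char)buf[j]))
            continue;
        count++;
    }
    return count;
}
```

For example, read both files into buffers and compare `count_pages_heuristic(in, in_len)` with `count_pages_heuristic(out, out_len)`; a 4 vs. 1 result would match the behavior described above.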
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: MultiPage input, single page output
I'm assuming the OCR_MakeSearchable() call returns no error?
Everything looks okay visually, although I'll run a test with your code to make sure I'm not missing anything. In the meantime, could you provide the PDF file, either as an attachment or by email to [email protected]? Does it do this with other PDFs or just this one?
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: MultiPage input, single page output
Just a quick update: I tested your code with a 7-page document and had no problem with the output. I'll wait for the other information I asked for in my previous post. You may also want to check the return value of OCR_SaveW, e.g.:

```cpp
hr = OCR_SaveW(Doc, output);
if (hr != 0)
{
    printf("Save failed, error 0x%08lX\n", (unsigned long)hr);
}
```

-Walter
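The same idea extends to every call in `doOCR`: check each status and report which step failed, so a failing save is not silently ignored. This is a sketch with hypothetical stubs standing in for `OCR_LoadA`, `OCR_MakeSearchable` and `OCR_SaveW` (their real signatures differ; the document and paths would be passed through the `ctx` pointer). It keeps the original code's treatment of 0xc20a01f4 as non-fatal, but only for the step that opts in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical step type: each step returns a 32-bit status, 0 on success. */
typedef uint32_t status_t;
typedef status_t (*step_fn)(void *ctx);

/* Status the original code accepts after OCR_MakeSearchable. */
#define OCR_TOLERATED_STATUS 0xC20A01F4u

struct step {
    const char *name;
    step_fn run;
    int allow_tolerated; /* accept OCR_TOLERATED_STATUS as well as 0 */
};

/* Runs the steps in order and stops at the first unacceptable status. */
static status_t run_steps(const struct step *steps, size_t n, void *ctx)
{
    for (size_t i = 0; i < n; i++) {
        assert(steps[i].name != NULL && steps[i].run != NULL);
        status_t hr = steps[i].run(ctx);
        if (hr == 0 || (steps[i].allow_tolerated && hr == OCR_TOLERATED_STATUS))
            continue;
        fprintf(stderr, "%s failed: 0x%08lX\n", steps[i].name, (unsigned long)hr);
        return hr;
    }
    return 0;
}

/* Hypothetical stubs used only to exercise run_steps. */
static status_t stub_ok(void *ctx)        { (void)ctx; return 0; }
static status_t stub_tolerated(void *ctx) { (void)ctx; return OCR_TOLERATED_STATUS; }
static status_t stub_fail(void *ctx)      { (void)ctx; return 0x80004005u; /* E_FAIL */ }
```

With real wrappers in place of the stubs, a steps table of load, make-searchable (with `allow_tolerated` set) and save would report exactly which call failed.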
-
cflynt
- User
- Posts: 5
- Joined: Fri Aug 31, 2012 1:52 am
Re: MultiPage input, single page output
The sample PDF file should be in your mailbox.
Thanks.
-
Walter-Tracker Supp
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: MultiPage input, single page output
Got it and responded!
I could not reproduce the issue with the included file. I was able to OCR all pages. Have you reproduced it with other files? What is the return code (HRESULT) from OCR_SaveW()?
Also note that you were using a text-based PDF for OCR. This works, but it loses information (text -> rasterized image of the page -> OCR text). If you were just using it as a quick test of OCR, that's fine (and you probably already know this), but for real work it isn't the best workflow.
-Walter
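To put a number on the rasterization step: PDF page sizes are in points (1/72 inch), so rendering at a given resolution yields about points × dpi / 72 pixels per edge. Assuming `Options.raster_dpi` is that rendering resolution, as its name suggests, a US Letter page (612 × 792 pt) at 300 dpi becomes 2550 × 3300 pixels, and a 10 pt em is roughly 42 pixels tall. The text that was already in the PDF is reduced to that pixel grid before being recognized again. `raster_pixels` is a hypothetical helper for the arithmetic only.

```c
#include <assert.h>

/* Pixels along one page edge when rendering `points` (1/72 inch units)
   at `dpi`, rounded to the nearest pixel. */
static long raster_pixels(long points, long dpi)
{
    assert(points >= 0 && dpi > 0);
    return (points * dpi + 36) / 72;
}
```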