Override Document Colors - Issue Scanned PDF

carlhans
User
Posts: 8
Joined: Sun Jan 01, 2012 1:04 pm

Override Document Colors - Issue Scanned PDF

Post by carlhans »

Hello, what a beautiful product this PDF-XChange Viewer is. I own Acrobat X Pro, but I mostly use PDF-XChange Viewer for reading PDFs.

Accessibility - Override Document Colors doesn't work properly with scanned PDF documents. I have OCRed them with ClearScan. The result is that the OCR text color changes, while the background does NOT. The only reader that can do it is Foxit.

Any idea why this is happening?

Many thanks for this product!!

Carl
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: Override Document Colors - Issue Scanned PDF

Post by Stefan - PDF-XChange »

Hello Carl,

If you still have the original scanned image as the background, this image will still be visible, and we won't override the white colour in it.
If you can provide a sample file, we might run some tests on it, but I do not think we have any plans to support such a colour override of portions of images.

Best,
Stefan
carlhans
User
Posts: 8
Joined: Sun Jan 01, 2012 1:04 pm

Re: Override Document Colors - Issue Scanned PDF

Post by carlhans »

Hi Stefan,
Thanks for the reply. I am not sure if I was clear, so let me explain once again.

Situation: a scanned document with OCR (ClearScan technology from Adobe). If you google "change background in scanned pdf", you will see tons of frustration on forums dedicated to PDF.

One example from this forum, by user Ricaz with the same issue:
https://forum.pdf-xchange.com/ ... background

For example, "Use Predefined High-Contrast Color Scheme" with "Green text on black" changes the text to green, but the background stays as it is: white, not black.

I can send you a sample page, but I am pretty sure it is the same for every scanned document.

I would be grateful for some explanation of this issue, to better understand it.

Regards,

Carl
Paul - PDF-XChange
Site Admin
Posts: 7445
Joined: Wed Mar 25, 2009 10:37 pm

Re: Override Document Colors - Issue Scanned PDF

Post by Paul - PDF-XChange »

Hi carlhans,

Why not try scanning the document as an image and then running the free OCR that is included in PDF-XChange Viewer? Take the Adobe OCR right out of the picture, if you'll excuse the pun.

Best regards

Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
carlhans
User
Posts: 8
Joined: Sun Jan 01, 2012 1:04 pm

Re: Override Document Colors - Issue Scanned PDF

Post by carlhans »

Hi Paul,

I did try your method and used PDF-XChange OCR, both ways. Same result, or to be precise, in this case not even the text changes; everything stays the same.

It's strange: scanning is probably the most common "scenario" with PDF, yet not even Adobe can find a proper solution.

I hope we will find some solution to this.

Carl
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Override Document Colors - Issue Scanned PDF

Post by Walter-Tracker Supp »

I think it might help to explain how scanned pages are represented in a PDF.

Think of a PDF document as something like a real-world bound paper photo album. Each page starts out blank, but you can glue pictures in place (images), and you can write notes (text), and paste in various other things. The page "background" in a PDF is like the blank page of a photo album.

When you scan a document, it is like taking pictures of the pages of the document. To make a PDF out of the scan, these pictures are placed into the PDF (like gluing photographs into a photo album). When you place these images they lie on top of the real background, so that the background you see on each scanned page is not the *real* PDF background (the photo album's background paper) but part of the scanned "picture" of the page.

So when you choose to change the PDF background, it changes the background colour *underneath* the image of the scanned page, and you cannot see this change.

To do what you are asking, we would need to provide you with the ability to change *images* in a PDF. This is not impossible but it is not a feature we currently provide.
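
The photo album analogy can be sketched as a toy painting model in Python. This is only an illustration under simple assumptions (a tiny pixel grid, fully opaque layers, made-up names such as `render` and `scanned_image`); it is not how the Viewer is implemented.

```python
# Toy model of how a viewer paints one page: the page background goes down
# first, then each item is drawn in order and covers whatever lies beneath it.
WHITE, BLACK, GREEN = (255, 255, 255), (0, 0, 0), (0, 255, 0)


def render(page_background, layers, width=4, height=2):
    """Paint the page background, then each layer on top of it."""
    canvas = [[page_background] * width for _ in range(height)]
    for layer in layers:
        for (x, y), colour in layer.items():
            canvas[y][x] = colour
    return canvas


def scanned_image(width=4, height=2):
    """An opaque scan: white 'paper' pixels with two black 'ink' pixels."""
    pixels = {(x, y): WHITE for x in range(width) for y in range(height)}
    pixels[(1, 0)] = BLACK
    pixels[(2, 1)] = BLACK
    return pixels


# Overriding the page background to black changes nothing visible here,
# because the scan covers every pixel of the page.
scanned_page = render(BLACK, [scanned_image()])

# A text-only page has no image on top, so the override shows through.
text_page = render(BLACK, [{(1, 0): GREEN, (2, 1): GREEN}])
```

In this sketch `scanned_page` still shows white "paper" everywhere except the ink pixels, while `text_page` shows green text on the overridden black background.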

However, we have a feature in the works that may help you in an upcoming version. Our OCR capability will be expanded to allow you to create text PDF documents - to use the photo album analogy, it is like removing the *picture* of a scanned page completely and replacing it with typed out notes on the page. In this case, you can change background colour to your liking because the original scanned picture is no longer part of the equation.

It may be that there will be some provision for editing images as well, though I will leave that up to other members of the design team responsible for that part of the product to chime in.

-Walter
carlhans
User
Posts: 8
Joined: Sun Jan 01, 2012 1:04 pm

Re: Override Document Colors - Issue Scanned PDF

Post by carlhans »

Hi Walter!
Huge THANK YOU! What a brilliant explanation. I think many other users will now better understand this issue with scanned PDFs.

The problem with OCR is precision. It's fine for 1 or 2 page docs, but still not suitable for book scans. Not even ABBYY. That's why I like ClearScan, which doesn't make text from the scan but provides a sort of improved "bitmap" layer. (That's probably why it reacts to changing the text color.)

Could you give us a recommendation or hint: in which application should we process scanned materials so we can work with text and backgrounds? Photoshop, GIMP or something else?

Again, huge thanks Walter for the explanation.

Regards,

Carl
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Override Document Colors - Issue Scanned PDF

Post by Walter-Tracker Supp »

ClearScan converts the images into text (like our future releases will do), although it uses a special font scheme. The reason the background adjustments work on these documents is that after OCR you are working with a normal text document (not images of the scanned pages). However, like all OCR, you will suffer from the occasional loss of the original words when mistakes are made (although the font scheme may end up hiding these mistakes, which would then only show up if you cut & paste or search).

I am also putting a feature request into our internal wishlist, which will be to provide whole page colour inversion (like a negative) which is a feature that certain accessibility schemes use. For a black text on white background document, this would invert it to white text on black background. I'm not sure how useful this is but maybe you can weigh in; it is definitely the simplest way to allow image manipulations in the PDF without having to turn our product into a fully featured publishing & authoring suite ;)
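
As a rough illustration of what such a whole-page inversion means, here is a minimal Python sketch. It assumes the page has already been rendered to rows of 8-bit RGB pixels; the function names are made up for this example and say nothing about how the feature would be built in the product.

```python
def invert_pixel(rgb):
    """Return the negative of one 8-bit RGB pixel."""
    r, g, b = rgb
    return (255 - r, 255 - g, 255 - b)


def invert_image(pixels):
    """Invert every pixel of an image given as rows of RGB tuples."""
    return [[invert_pixel(p) for p in row] for row in pixels]


# A tiny "scanned page": white paper, one black ink pixel, one light grey smudge.
page = [
    [(255, 255, 255), (0, 0, 0)],
    [(255, 255, 255), (200, 200, 200)],
]
negative = invert_image(page)
```

Because the operation works on pixels, it treats the scanned image and any real text the same way, and applying it twice gives back the original page.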
carlhans
User
Posts: 8
Joined: Sun Jan 01, 2012 1:04 pm

Re: Override Document Colors - Issue Scanned PDF

Post by carlhans »

Walter! You are a hero!

I really wish you success with your OCR solution, because the flexibility and quality of your PDF reader is huge!

Yep, if I may give an insight as a user: inverting colors gives similar results, meaning sharpness on the eyes, only vice versa. But every option is OK; some users will certainly appreciate it.

What kind of OCR do you plan? A ClearScan "version" with cleaning the fonts? Or just classic OCR with a sort of invisible layer?

Carl
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Override Document Colors - Issue Scanned PDF

Post by Walter-Tracker Supp »

ClearScan creates its own special fonts on the fly to try to match the appearance of document fonts. This isn't in the short-term plans, and in fact I'm not even sure what the potential patent issues might be here (i.e., the technique may be protected, although we haven't checked). What we are creating is the ability to make a text-based PDF, replacing the image-based pages with text-based pages (cutting out recognized photos & images and placing them as separate, cropped images). By extension this will allow these documents to become editable (e.g. in the viewer with editing capabilities, or via export to a format like RTF).
swissball
User
Posts: 2
Joined: Wed Sep 26, 2012 12:29 pm

If one page is OCR-ed, what to do then...

Post by swissball »

Hello Walter

thank you for your explanations.
Walter-Tracker Supp wrote:I think it might help to explain how scanned pages are represented in a PDF.
....
However, we have a feature in the works that may help you in an upcoming version. Our OCR capability will be expanded to allow you to create text PDF documents - to use the photo album analogy, it is like removing the *picture* of a scanned page completely and replacing it with typed out notes on the page. In this case, you can change background colour to your liking because the original scanned picture is no longer part of the equation.
....

-Walter
I ran the OCR function on one page (and saved it), but the background and font color did not change (although "use custom color scheme" is activated).
What must I change to get my preferred background and font color?

Thank you for your advice and your time
Greetings
Miguel
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: Override Document Colors - Issue Scanned PDF

Post by Stefan - PDF-XChange »

Hello Miguel,

The feature Walter is describing above is still not available. Currently the OCR tool in our Viewer will place an invisible layer of text on top of the original image. So changing the font is not possible now, and you also cannot see the background, as it's behind the white area of your original image.
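
For readers curious what "an invisible layer of text on top of the original image" can look like inside a PDF page, here is a small Python sketch that builds a toy content stream. It relies on the standard PDF operators (`Do` to paint an image, `3 Tr` for text rendering mode 3, which is invisible). The resource names `Im0` and `F1`, the helper name, and the exact layout are assumptions for illustration; the real output of any OCR tool may be organised differently.

```python
def searchable_scan_stream(image_name, width, height, words):
    """Build a toy page content stream: scan first, invisible text over it.

    `words` is a list of (x, y, font_size, text) tuples in page units.
    """
    ops = [
        "q",
        f"{width} 0 0 {height} 0 0 cm",  # scale the unit image to fill the page
        f"/{image_name} Do",             # paint the scanned image (opaque)
        "Q",
        "BT",
        "3 Tr",                          # text rendering mode 3: invisible text
    ]
    for x, y, size, text in words:
        # Escape the characters that are special inside a PDF literal string.
        escaped = (
            text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        ops.append(f"/F1 {size} Tf")        # hypothetical font resource name
        ops.append(f"1 0 0 1 {x} {y} Tm")   # position the text
        ops.append(f"({escaped}) Tj")       # the recognised word(s)
    ops.append("ET")
    return "\n".join(ops)


stream = searchable_scan_stream(
    "Im0", 612, 792, [(72, 700, 12, "Hello (world)")]
)
```

The text is there for searching and copying, but since it is never painted, a colour override has nothing visible to recolour, and the opaque scan still hides the page background.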

Best,
Stefan
swissball
User
Posts: 2
Joined: Wed Sep 26, 2012 12:29 pm

Re: Override Document Colors - Issue Scanned PDF

Post by swissball »

Hi Stefan,

Thanks for the quick reply.

I discovered a rough (but not very practical) way to change the colors: first do the OCR, then copy the text and create a new document into which you paste it.
But it is a lot of steps and not flawless.

greetings
swiss
Stefan - PDF-XChange
Site Admin
Posts: 19930
Joined: Mon Jan 12, 2009 8:07 am

Re: Override Document Colors - Issue Scanned PDF

Post by Stefan - PDF-XChange »

Hi Miguel,

Yes, this is a good workaround, but as you noticed, it is quite a few steps to do by hand and the result is not always perfect.

As Walter commented, we are considering implementing improvements to our OCR tool in the future, so stay tuned!

Cheers,
Stefan