HighlightTextByFile

PDF-XChange Viewer SDK for Developer's
(ActiveX and Simple DLL Versions)

Post by jeffp »

It doesn't seem that HighlightTextByFile is working properly. Or at least it's not working like Adobe.

Below is a link to my site where you can see how Adobe interprets the hits file. Feel free to grab both of these files for testing on your end.

http://www.lucion.com/files/HitsTest.pd ... tsFile.xml

When I try to incorporate the HitsFile into the ViewerAX via the HighlightTextByFile call, the highlighting is off, and it produces objects that get saved out to the file if I do a Save or SaveAs. This is bad. The hits shouldn't get saved, and the modified property should not get set to 1 on account of this.

In the case of Adobe, if you do a SaveAs on the document in the link above, the PDF saves without the hit highlighting which is what you'd expect.

Below is the code I'm using.

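procedure TPDFViewerAX.HighlightTextByFile(ADocID: Integer; AFileName: String);
var
  ADataIn, ADataOut: OLEVariant;
  AModified: Boolean;
begin
  // Bail out if either the document ID or the hits file is invalid.
  if not IsValidDocID(ADocID) or not FileExists(AFileName) then Exit;

  // Remember the modified flag so the highlighting does not mark the document as changed.
  AModified := Modified[ADocID];
  try
    ADataIn := AFileName;
    FControl.DoDocumentVerb(ADocID, '', 'HighlightTextByFile', ADataIn, ADataOut, PXCVA_NoUI);
  except
    // Errors raised by the control are deliberately ignored here.
  end;
  // Restore the original modified flag.
  Modified[ADocID] := AModified;
end;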

Post by Stefan - PDF-XChange »

Hello Jeffp,

I have passed this case to the devs for investigation and advice, and we will post here as soon as we have any news for you!

Best,
Stefan

Post by Vasyl - PDF-XChange »

Hi, jeffp.

jeffp wrote: "When I try to incorporate the HitsFile into the ViewerAX via the HighlightTextByFile call, the highlighting is off, and it produces objects that get saved out to the file if I do a Save or SaveAs. This is bad. The hits shouldn't get saved, and the modified property should not get set to 1 on account of this. In the case of Adobe, if you do a SaveAs on the document in the link above, the PDF saves without the hit highlighting which is what you'd expect."

The "HighlightTextByFile" operation adds real annotations to the document, so it does in fact modify the document.

As you say, Adobe ignores the modification of the document in this case, but I don't think that is a good idea, because the document was actually changed by this operation.
For your case, you could implement your own "Save/Save As Without Highlighting" feature:

OnSaveSaveAs_WithoutHighlighting()
{
     // Undo the highlighting first. This only works if 'HighlightTextByFile' was the last operation.
     pdfViewer.DoVerb("", "ExecuteCommand", "Undo", 0, 0);
     ...
     // Save/SaveAs
}
Also:
1. While testing our 'HighlightTextByFile' I found one issue: our control does not work properly if you pass a URL with a "#xml=<path_to_highlight_xml_file>" suffix to 'OpenDocument'. This issue will be fixed in the next build.
2. It looks like your HitsFile.xml may have been created by Adobe. Note that Adobe's pdftext-composer and our pdftext-composer are different (Adobe does not document its pdftext composition), so they can give different results.

Best regards.
PDF-XChange Co. LTD (Project Developer)

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.

Post by jeffp »

Ok. I can do the customized SaveAs that you indicate, but in past posts I've made, I got the impression that you liked to do things like Adobe as much as possible.

That is why I sent you the Adobe file link above: to illustrate the differences. In your app, however, I'm opening the PDF and XML files from my file system, not the web.

1. The main difference is that Adobe doesn't appear to add actual highlight objects. It appears to highlight the words the same way search hits are shown. It also adds two extra toolbar buttons when it encounters the XML file so that the user can go to the next/previous hit, which is important.

2. Making the XML file work as a parameter in your open-document call would be nice, but it isn't necessary for me since I can just call HighlightTextByFile right after I open the document. For people who want this out of the box and don't use the API, though, it would be good.

3. The HitsFile.xml was created by the DtSearch engine. I suspect other search engines would create something similar using the Adobe specs. As it stands, the coordinates in this file are off by a few characters when run through HighlightTextByFile. Are you saying this is the way it's going to stay in your ViewerAX? It seems like you would want to make sure this kind of XML file (done according to Adobe specs) gets translated correctly in the ViewerAX.

Post by Stefan - PDF-XChange »

Thanks for the follow-up comments, jeffp.

At our devs' request, I have created a ticket in our internal system:
#1320: Issue with "...#xml=..." tag in open-url
This lets us all follow up on the case and track it correctly. We will post here when there is any further news.

Best,
Stefan

Post by Vasyl - PDF-XChange »

As far as I know, all pdftext-composers (such as DTSearch's, Adobe's, and ours) produce different output text: it may include a different number of spaces between words, include or exclude the 'end-paragraph' symbols, expand or collapse ligatures, etc. The text in a PDF is not strictly defined. In the general case it is a mosaic of text-items (a text-item is one block of symbols) placed anywhere on the page, and any 2D matrix can be specified for each of them. The internal text-composer assembles these text-items into plain text using heuristic algorithms (which Adobe does not document).
So, if you pass a highlight file created by DTSearch to Adobe's SDK or to our SDK (or vice versa), it may (will) give you an incorrect result, because the target SDK's text-composer can build somewhat different text before highlighting.
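
To make the point above concrete, here is a minimal Python sketch of how character offsets computed against one composer's text can land in the wrong place in another composer's text. Everything in it is hypothetical: `TEXT_ITEMS`, `compose_a`, `compose_b`, `find_hit` and `apply_hit` are illustrative names, and neither composer is meant to reproduce what Adobe, DTSearch or the PDF-XChange SDK actually does. The only assumption taken from the thread is that hits are stored as character positions.

```python
# Hypothetical text-items as they might be found on a PDF page.
# Two of them contain ligature code points (U+FB03 "ffi", U+FB01 "fi").
TEXT_ITEMS = ["The", "o\ufb03ce", "\ufb01le", "was", "found"]

# Ligature expansions used by the first (hypothetical) composer.
LIGATURES = {"\ufb03": "ffi", "\ufb01": "fi"}


def compose_a(items):
    """Composer A: joins items with one space and expands ligatures."""
    words = ["".join(LIGATURES.get(ch, ch) for ch in item) for item in items]
    return " ".join(words)


def compose_b(items):
    """Composer B: joins items with one space and keeps ligatures as single characters."""
    return " ".join(items)


def find_hit(text, word):
    """Return (position, length) of the first occurrence of word in text."""
    return text.index(word), len(word)


def apply_hit(text, pos, length):
    """Return the slice of text that a (position, length) hit would highlight."""
    return text[pos:pos + length]


if __name__ == "__main__":
    text_a = compose_a(TEXT_ITEMS)
    text_b = compose_b(TEXT_ITEMS)

    # The "search engine" computes the hit against composer A's text...
    pos, length = find_hit(text_a, "was")

    # ...and the "viewer" applies it to composer B's text.
    print("hit in A:", repr(apply_hit(text_a, pos, length)))
    print("same hit in B:", repr(apply_hit(text_b, pos, length)))
```

In this sketch the hit computed against composer A's text selects the intended word there, but the same position and length select a span shifted by a few characters in composer B's text, because the two expanded ligatures make A's text longer than B's before the hit.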

Post by Stefan - PDF-XChange »

Hello jeffp,

I got a notification that this problem is now resolved, and that we are finalizing build 200 which should be ready around the middle of the month.

Best,
Stefan

Post by jeffp »

Great. I'll give it a try when 200 is posted.

Post by Stefan - PDF-XChange »

:)