Extract Text from Viewer in Javascript

PDF-XChange Viewer SDK for Developer's
(ActiveX and Simple DLL Versions)

muralidhar
User
Posts: 46
Joined: Tue Nov 30, 2010 9:07 am

Post by muralidhar »

Hi,

I want to extract the text from a PDF document loaded in the PDF-XChange Viewer, and I also want to save the file on the server.

How can I do that?

Please help in this regard.

Thanks in advance.
Regards,
Muralidhar.
John - Tracker Supp
Site Admin
Posts: 5223
Joined: Tue Jun 29, 2004 10:34 am

Re: Extract Text from Viewer in Javascript

Post by John - Tracker Supp »

Please see the examples provided. Several of them extract a document's text: there are two in the VB folder for text operations, and others for non-VB developers. Once you have reviewed them all, please come back if you still have problems. Some may require you to adapt the examples to your own needs or development tool/language, as you are expected to do some work yourself to get what you need.

Note also that this post has been moved to the correct forum.

Hope that helps.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
muralidhar
User
Posts: 46
Joined: Tue Nov 30, 2010 9:07 am

Re: Extract Text from Viewer in Javascript

Post by muralidhar »

Hi,

Thanks for your reply.

I have already gone through the samples you mentioned. Those are in C#, VB, etc.,
but I want to do the same in JavaScript.

I am trying to do it as follows, but I am getting a JavaScript error:

var datain;
var dataout;
document.all.PDFView.DoVerb("", "GetAllText", datain, dataout, 0);
====== This is the error description:
Line: 69
Error: Could not complete the operation due to error 82130001.
======

Please help me.

Thanks in advance.
Regards,
Muralidhar.
Vasyl - PDF-XChange
Site Admin
Posts: 2445
Joined: Thu Jun 30, 2005 4:11 pm

Re: Extract Text from Viewer in Javascript

Post by Vasyl - PDF-XChange »

Hi,
muralidhar wrote:
I have already gone through the samples you mentioned. Those are in C#, VB, etc.,
but I want to do the same in JavaScript.
1. The correct C# code:

Code:

int docId = 0;
PDFView.GetActiveDocument(out docId);
if (docId > 0)
{
      // 1. Extract all text to a file or stream
      object dataIn = "C:\\TextFromPDF.txt"; // you can also pass a stream object as the destination
      object dataOut = null;
      PDFView.DoDocumentVerb(docId, "", "GetAllText", dataIn, out dataOut, 0);

      // 2. OR extract all text to a simple string
      dataIn = null;  // reuse the variables declared above
      dataOut = null;
      PDFView.DoDocumentVerb(docId, "", "GetAllText", dataIn, out dataOut, 0);
      // If this succeeded, dataOut contains all the text from the document
      string str = (string)dataOut;
}
2. You cannot call the DoVerb function from PDF JavaScript. From JS you can obtain the word list of each page separately.
A simple script for this:

Code:

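var text = "";
var numWords;
// Walk every page and append its words in order
for (var p = 0; p < this.numPages; p++)
{
     numWords = this.getPageNumWords(p);
     for (var w = 0; w < numWords; w++)
     {
         // third argument false: keep each word's punctuation and whitespace
         text += this.getPageNthWord(p, w, false);
     }
}
text; // final expression: the value the script returns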
- Pass this script to RunJavaScript(...); it will return the result as a string.

Note: this method may be slow, and its result may differ somewhat from that of the normal "GetAllText" operation.
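
If it helps, the same per-page loop can be wrapped in a function so that it can be tried outside the viewer first. This is only a sketch under assumptions: extractAllText and fakeDoc are hypothetical names, fakeDoc is a stand-in that mimics numPages, getPageNumWords and getPageNthWord, and collecting the words in arrays and joining them once is just one way to avoid repeated string concatenation. Whether that is actually faster depends on the viewer's script engine.

```javascript
// Sketch only: `doc` is any object exposing numPages, getPageNumWords and
// getPageNthWord, as the document object does in the script above.
function extractAllText(doc) {
    var pages = [];
    for (var p = 0; p < doc.numPages; p++) {
        var words = [];
        var numWords = doc.getPageNumWords(p);
        for (var w = 0; w < numWords; w++) {
            // false: keep the punctuation and whitespace attached to each word
            words.push(doc.getPageNthWord(p, w, false));
        }
        pages.push(words.join(""));
    }
    // Separate pages with a newline (a choice made for this sketch)
    return pages.join("\n");
}

// Hypothetical two-page stand-in, used only to try the function outside the viewer
var fakeDoc = {
    numPages: 2,
    pageWords: [["Hello ", "world."], ["Second ", "page."]],
    getPageNumWords: function (p) { return this.pageWords[p].length; },
    getPageNthWord: function (p, w, strip) { return this.pageWords[p][w]; }
};

var result = extractAllText(fakeDoc);
```

Inside a script passed to RunJavaScript(...), the stand-in would be dropped and the last line would presumably become extractAllText(this); so that the string is the script's result.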

HTH
PDF-XChange Co. LTD (Project Developer)
