Text extraction for attachments?

PDF-XChange Viewer SDK for Developers
(ActiveX and Simple DLL Versions)

cew
User
Posts: 213
Joined: Tue Feb 01, 2011 8:14 am

Post by cew »

How can I extract text from attachments?

Best,
cew
Stefan - PDF-XChange
Site Admin
Posts: 19913
Joined: Mon Jan 12, 2009 8:07 am

Re: Text extraction for attachments?

Post by Stefan - PDF-XChange »

Hi cew,

Do you mean PDF attachments, e.g. the files in a portfolio?

Extracting their text directly is not possible, so you will need to open the attachment first and then perform the text extraction on the separately loaded file.

Best,
Stefan
cew
User
Posts: 213
Joined: Tue Feb 01, 2011 8:14 am

Re: Text extraction for attachments?

Post by cew »

Tracker Supp-Stefan wrote: Do you have in mind pdf attachments like e.g. in a portfolio?
How did you know that?
Ok, you just read my post about portfolios :)
Tracker Supp-Stefan wrote: It's not possible directly, so you will need to open the attachment first - and then perform the text extraction in the separately loaded file.
Do you have a code snippet on how to do that?

Best,
cew
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: Text extraction for attachments?

Post by Walter-Tracker Supp »

To extract text you will need to use the PDF Tools SDK or ActiveX Viewer.

See section "Text Extraction" in the PDF Tools SDK manual.

In the ActiveX viewer, use the methods "GetAllText" and "GetAllSelectedText". See Section 3.7 of the ActiveX manual titled "How to Extract Text from Document?"

e.g.

Code:

// Get all text into a string variable (DataOut):
DoVerb("Documents[0]", "GetAllText", NULL, DataOut, 0);
// Write all text to the specified file (backslash escaped as in a C-style string literal):
DoVerb("Documents[0]", "GetAllText", "C:\\PdfText.txt", NULL, 0);
// Write all text to the stream object passed in DataIn:
DoVerb("Documents[0]", "GetAllText", DataIn, NULL, 0);
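
To tie this back to the attachment question, here is a rough C++ sketch of the second step only: once the attachment has been opened as its own document, the same "GetAllText" verb is aimed at that document's entry instead of "Documents[0]". Please treat it as an illustration under assumptions rather than SDK code. The Viewer interface, DocumentPath, GetAllText and FakeViewer are hypothetical names made up for this example, the zero-means-success return code is an assumption, and the step that opens the attachment is not shown because the thread does not name a call for it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the viewer control. Only the shape of DoVerb
// (object name, verb, input, output, flags) is modelled; this is NOT the
// SDK's real COM interface.
struct Viewer {
    virtual ~Viewer() = default;
    virtual long DoVerb(const std::string& object, const std::string& verb,
                        const std::string& dataIn, std::string* dataOut,
                        long flags) = 0;
};

// Builds the object path used in the thread, e.g. "Documents[0]".
std::string DocumentPath(int index) {
    assert(index >= 0);
    return "Documents[" + std::to_string(index) + "]";
}

// Requests all text of the document at `index` into `textOut`.
// Assumption: a result code of 0 signals success.
bool GetAllText(Viewer& viewer, int index, std::string& textOut) {
    return viewer.DoVerb(DocumentPath(index), "GetAllText", "", &textOut, 0) == 0;
}

// Test double so the sketch can run without the control installed:
// it records the call and returns canned text.
struct FakeViewer : Viewer {
    std::string lastObject;
    std::string lastVerb;
    long DoVerb(const std::string& object, const std::string& verb,
                const std::string& /*dataIn*/, std::string* dataOut,
                long /*flags*/) override {
        lastObject = object;
        lastVerb = verb;
        if (dataOut) *dataOut = "text of " + object;
        return 0;
    }
};
```

If the separately opened attachment ends up as the second document in the control (an assumption about ordering), the call would be GetAllText(viewer, 1, text). With the real control, the Viewer wrapper would forward to the ActiveX DoVerb method with the arguments shown in the snippet above.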