Provide a tool to extract to RTF or other structured format

Posted: Sat Sep 25, 2004 2:33 am
by adrianus
It would be very useful to have a tool to extract from PDF to a formatted document, e.g. RTF, the format used by Open Office, or HTML.

Extraction needs to retain the paragraph structure properly (e.g. current text extraction seems to put a CR at the end of each line, which Word interprets as a new paragraph ...)

It would also be useful to have an option in text extraction to put a CR only at the end of each paragraph, so that Word or other editing tools would be able to recognise paragraphs ...
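As a rough illustration of the "CR only at end of para" option requested above, here is a minimal Python sketch that post-processes plain extracted text. It is not how PDF-Tools works; the function name `reflow` and the `short_ratio` threshold are assumptions made up for this example. The heuristic: a line that is clearly shorter than the longest line and ends in sentence punctuation is treated as the last line of a paragraph, and a blank line always ends a paragraph.

```python
def reflow(text: str, short_ratio: float = 0.75) -> str:
    """Join hard-wrapped lines so that only paragraph ends keep a line break.

    Hypothetical heuristic: a line ends a paragraph if it is shorter than
    short_ratio * (longest line) and ends with terminal punctuation.
    """
    lines = [ln.rstrip() for ln in text.splitlines()]
    non_empty = [ln for ln in lines if ln]
    if not non_empty:
        return ""
    max_len = max(len(ln) for ln in non_empty)

    paragraphs = []
    current = []
    for ln in lines:
        if not ln:
            # A blank line always closes the paragraph in progress.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(ln.strip())
        is_short = len(ln) < short_ratio * max_len
        if is_short and ln.endswith((".", "!", "?", ":")):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


sample = (
    "The quick brown fox jumps over the lazy\n"
    "dog and keeps running.\n"
    "A second paragraph starts here and goes\n"
    "on a bit.\n"
)
result = reflow(sample)
print(result)
```

A text-only heuristic like this will misjudge paragraphs whose last line happens to fill the full width, so it is best seen as a fallback when no layout information is available.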

Posted: Sat Sep 25, 2004 10:14 am
by John - Tracker Supp
Hi Adrianus,

Thanks for the suggestions. We can confirm that we are working towards this - we have a new parser in progress that will extract such info, and we hope tables etc. too.

This may take a little time yet - but it is our objective to provide such functionality.

best

Extracting tables to Excel my priority

Posted: Thu Jan 06, 2005 9:12 am
by guyghk
If you could come up with something that extracted to Excel, I think that would be very popular.

There are a few products that claim to do it - I've just tested three with a view to getting one. However they are either:

- very limited in functionality, e.g. verypdf.com, which just extracts to txt and is not much better at preserving text positioning than the existing "Extract text from pdf" within PDF Tools;

- useless at dealing with complicated tables, e.g. abbyy.com;

- pretty good (though not 100%) at extracting tables, but over-priced, e.g. investintech.com at $90.

If PDF-XChange & PDF Tools are anything to go by, I'd expect you to come up with a better product at a better price. I'd definitely pay for an upgrade to this.

Guy

Posted: Thu Jan 06, 2005 12:18 pm
by John - Tracker Supp
We are looking this spring at introducing a number of new end user PDF2.... extraction tools - thanks for your input - Word and Excel formats are top of the list.

Posted: Wed May 25, 2005 3:49 am
by guyghk
Hello,

Is there any update on the timing of this? Not meaning to push but is it still in the pipeline?

Thanks,

Guy

Posted: Tue Jun 14, 2005 11:11 am
by John - Tracker Supp
Yes, but still some weeks away.

PDF to Word - further thoughts

Posted: Thu Oct 05, 2006 4:00 pm
by adrianus
Based on experience with several PDF to Word tools, one gap in functionality I've observed is a lack of smarts in dealing with paragraphs. None of the tools I've seen so far seems able to do anything clever about reconstructing paragraphs from the structure generated in the PDF file. Instead, each line in the paragraph is converted to Word as a separate paragraph.

Will you be trying to address this issue in your conversion tool eventually?

Posted: Thu Oct 05, 2006 4:10 pm
by John - Tracker Supp
Hi,

Have you tried the PDF to RTF extraction function in PDF-Tools?

https://www.pdf-xchange.com/home/pr ... /pdftools/

Does this do as required? If not, please provide some sample PDFs and RTF files (zipped) and we would be pleased to look into any issues found.

thanks

PDF to Word - further thoughts

Posted: Thu Oct 05, 2006 10:20 pm
by adrianus
I'm currently using PDF-XChange Pro - my driver is V3.6 Build 0102 (I couldn't find where the version of the tools was held). The PDF to Word/RTF tool in my version can't do the sort of thing I was proposing. I will try to generate an example in due course - it may take a week or two, though, due to a few time pressures!

Regards

Posted: Fri Oct 06, 2006 10:17 am
by John - Tracker Supp
Ok thanks - Look forward to it.

In the meantime - you may want to take a look at PDFTransformer from ABBYY - it will almost certainly offer you what you need and uses an entirely different process to achieve the conversion - very impressive.

They also have a deal with us to bundle PDF-XChange in with this - hence we have no objection to promoting another publisher's products on this occasion.

http://www.pdftransformer.com

HTH

Re: PDF to Word - further thoughts

Posted: Sat Oct 07, 2006 11:24 am
by adrianus
I have two sample conversions ready as a zip file but I couldn't submit it as an attachment - I kept getting the message that the maximum size for all attachments was exceeded. The attachment itself is only 122Kb, so I have no idea what prevents submitting it. Perhaps you could give me an alternative way of submitting it to you ...

In one of the examples, just the first line of every paragraph ends up with a paragraph break in Word, which doesn't match how one would like to see the conversion done ...

In the other example, every line ends up with a paragraph break.

As a human reader, it's easy to see what should really happen to get a perfect conversion, but no doubt the format of the PDF file makes it harder for the software to work this out ... Nevertheless, it would be nice if it had an option for being a bit more clever ... e.g. detecting that a set of single-spaced lines followed by a wider spacing is really a paragraph, or alternatively that an indented line marks a paragraph start etc...
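The two cues suggested above (a wider vertical gap after a run of single-spaced lines, and an indented first line) can be sketched in Python as follows. This is only an illustration under stated assumptions: it presumes the extractor can report each line's position, and the `Line` record, `group_paragraphs`, `gap_factor` and `indent_tol` are hypothetical names invented for this example, not part of PDF-Tools or any other product.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Line:
    text: str
    top: float   # distance of the line from the top of the page (points)
    left: float  # x position where the line starts (points)


def group_paragraphs(lines, gap_factor=1.5, indent_tol=5.0):
    """Group positioned lines into paragraphs.

    A new paragraph starts when the gap to the previous line is more than
    gap_factor times the typical (median) gap, or when the line is indented
    by more than indent_tol relative to the leftmost line.
    """
    if not lines:
        return []
    lines = sorted(lines, key=lambda ln: ln.top)
    gaps = [b.top - a.top for a, b in zip(lines, lines[1:])]
    typical_gap = median(gaps) if gaps else 0.0
    margin = min(ln.left for ln in lines)

    paragraphs = [[lines[0].text]]
    for prev, cur in zip(lines, lines[1:]):
        wide_gap = (cur.top - prev.top) > gap_factor * typical_gap
        indented = (cur.left - margin) > indent_tol
        if wide_gap or indented:
            paragraphs.append([cur.text])
        else:
            paragraphs[-1].append(cur.text)
    return [" ".join(p) for p in paragraphs]


# Spacing cue: the third gap (24pt) is twice the usual 12pt gap.
spaced = [
    Line("First para,", 100, 72),
    Line("line two,", 112, 72),
    Line("line three.", 124, 72),
    Line("Second para,", 148, 72),
    Line("line two.", 160, 72),
]
by_spacing = group_paragraphs(spaced)

# Indent cue: even spacing, but paragraph starts are indented by 18pt.
indented = [
    Line("First para,", 100, 90),
    Line("line two.", 112, 72),
    Line("Second para,", 124, 90),
    Line("line two.", 136, 72),
]
by_indent = group_paragraphs(indented)
print(by_spacing, by_indent)
```

Real documents would also need handling for multiple columns, headings and hanging indents, which this sketch deliberately ignores.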

I'm currently using PDF-XChange Pro - my PDF Tools are V3.6 Build 0102

Posted: Sat Oct 07, 2006 11:29 am
by John - Tracker Supp
Hi,

Could you please email the files to usrfiles@tracker-software.com with a link in the body of the email back to this post and we will take a look.

Please zip the files sent.

Thanks

I will also increase your Forum personal attachments limit, as I suspect you have exceeded your overall per-user limit for all posts combined.

PDF Tools - further suggestions

Posted: Sat Oct 07, 2006 1:29 pm
by adrianus
File emailed as requested. All attempts to submit were of a zip file ...

Regards

Posted: Sun Oct 08, 2006 10:08 am
by John - Tracker Supp
Thanks Andy,

We will analyse the files and advise when we have made some progress.

Many thanks.