convert to TXT : sticky or cut out lines?
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
convert to TXT : sticky or cut out lines?
The original file contains cases, each made of a relatively stable header (an identifier) followed by a variable number of references (e.g. page numbers).
In the original pdf, a case may run over two lines (many references).
If I check "detect paragraph", the original cases are well restored (multiple-line cases end up on one line), BUT there are many "sticky" lines (2 or 3 of the original cases glued together).
If I uncheck "detect paragraph", there are no more sticky lines, but the long original cases are cut into lines as they are laid out in the document.
I wish I could get the best of both options.
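A possible stopgap outside PDF-Tools is to convert with "detect paragraph" unchecked and re-join the wrapped lines afterwards. The following is only a minimal Python sketch, not a PDF-Tools feature. It assumes that a continuation line starts with a digit (more page references) while a new case starts with its identifier; the function name `join_wrapped_records` and the page numbers in the sample are made up for illustration.

```python
import re

# Assumption: a continuation line begins with a digit (further page references),
# whereas a new case begins with its identifier (a letter).
CONTINUATION = re.compile(r"^\s*\d")


def join_wrapped_records(lines):
    """Merge continuation lines into the preceding case and drop blank lines."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines (e.g. those emitted at page breaks) are skipped
        if records and CONTINUATION.match(line):
            records[-1] += " " + line  # glue the wrapped references back on
        else:
            records.append(line)  # a new case starts here
    return records


# Names are taken from the thread; the page numbers are invented.
sample = [
    "Achon Ozanne-Anne 12, 15, 18,",
    "21, 34",
    "",
    "Belou Jacques 40",
]
for record in join_wrapped_records(sample):
    print(record)
```

If the real index uses a different layout (for example, identifiers that begin with a digit), the `CONTINUATION` pattern would need to be adapted.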
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: convert to TXT : sticky or cut out lines?
Hi jakesp,
Thanks for the post; however, I'm not entirely sure what is meant here. Could you send a sample file and a screen-shot that illustrate the issue?
Cheers, let me know!
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
1 - Original file: Stef_index.pdf
2 - Stef_index_txt.txt
"long" cases are correct, e.g. Achon Ozanne-Anne, around the 10th line
"sticky" lines are present, e.g. Bélanger Marie Françoise Charlotte and Belou Jacques (before/after the second page break)
3 - Stef_index_txt1.txt
"long" cases are "cut" into lines, e.g. Achon Ozanne-Anne, around the 10th line
no "sticky" lines
N.B. 2 and 3 are in the zip file.
Ends of pages appear as blank lines (not much appreciated).
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: convert to TXT : sticky or cut out lines?
Hi jakesp,
Thanks for that; however, I'm afraid that I still don't understand the issue, as the documents appear to render as the text files suggest that they should. Could you please send some screen-shots that show what the issue is? Also, could you please provide clearer instructions to reproduce the issue?
Cheers, let me know!
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
I forgot to zip the pdf file. Here it is
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: convert to TXT : sticky or cut out lines?
Hi Jake,
I'll actually need a screen-shot demonstrating the issue, as I'm not seeing any inconsistencies between the text file and the PDF itself. Could you send in a screen-shot that clearly demonstrates the issue that you're referring to?
Cheers,
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
I have documented in the attached Word file occurrences of the 3 "problems", "long cases", "sticky lines" and "page breaks" in the original pdf and in TXT and TXT1 (difference due to the paragraph end option). You could find many other examples by just comparing the 3 files.
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: convert to TXT : sticky or cut out lines?
Hi Jake,
Thanks for that - I understand now. We'll definitely consider the suggestion when it comes time to implement new features, though I cannot give a definite promise that the feature will be implemented.
Cheers, hope that helps!
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
I would love to know which new "feature" you have in mind. If it is about the blank line added at page breaks, I would understand (a year ago I already pointed out that "feature" and got .. the same kind of answer).
But as for the "sticky" lines, I would think it is a bug, unless you have found that the original pdf was at fault, which you do not mention. In that case, the ball would remain with Tracker Software, because the pdf was generated by "PDFXchange for Généatique".
-
Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
Re: convert to TXT : sticky or cut out lines?
Hi jakesp,
Perhaps I still need further clarification on what you mean by "sticky lines", etc. From my understanding of the issue, the driver is doing everything that it should be, which is why I believed that you were asking for a new feature or an addition to an existing feature.
Please clarify what exactly is meant by "sticky or cut out lines" and provide images that clearly demonstrate what exactly is sticking that shouldn't be, or what is cut out and shouldn't be. I'm afraid that neither I nor any of my colleagues quite understand what you mean at this point.
Thanks, I look forward to hearing back from you.
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
Let me explain again what I have shown in PDFTOOLS_example1.docx (in the zip file sent some time ago).
"Sticky lines" are two (sometimes 3) lines that are logically distinct in the original pdf but glued together as one continuous line in the txt output. They appear if I check "Detect paragraph" in the Text save type set-up (I refer to that type of output as TXT).
The advantage of that option is that it respects the lines that are logically continuous in the original (logical records extending over several pdf lines).
If I do not check "Detect paragraph", I do not find "sticky lines"; each pdf line gives the corresponding txt line, even for the logically continuous lines.
I am not asking for a new feature; I just noticed the effects of the "detect paragraph" option. Either it is some kind of PDFTools malfunction (ends of paragraphs are occasionally forgotten or misinterpreted), or the original pdf file does not systematically contain the proper code (end of paragraph).
If you opt for the second possibility and find that the pdf "misses" some end-of-paragraph marks, then you will have to look at the way this pdf was generated: using the "PDFChange 5.0 pour Généatique".
I add the original pdf file here to the documentation of this case.
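To make the "sticky lines" symptom concrete, here is a minimal Python sketch of how a glued line could be split again after conversion. It rests on a loud assumption about the index layout that this thread does not confirm: inside a glued line, a new case begins wherever a page reference (digits) is followed by whitespace and an uppercase letter. `split_sticky_line` is a hypothetical helper, and the page numbers in the sample are invented.

```python
import re

# Assumption: a case boundary inside a glued line is "digit, whitespace, uppercase letter".
# The character class covers A-Z plus the accented Latin-1 capitals (e.g. É).
BOUNDARY = re.compile(r"(?<=\d)\s+(?=[A-ZÀ-ÖØ-Þ])")


def split_sticky_line(line):
    """Split one glued output line back into its individual cases."""
    return BOUNDARY.split(line.strip())


# Names are taken from the thread; the page numbers are invented.
glued = "Bélanger Marie Françoise Charlotte 52 Belou Jacques 53"
for case in split_sticky_line(glued):
    print(case)
```

A line with a single case and comma-separated references is left untouched, because no digit is directly followed by whitespace and a capital letter.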
-
jakesp
- User
- Posts: 7
- Joined: Thu Dec 24, 2009 9:15 pm
Re: convert to TXT : sticky or cut out lines?
Can I expect some kind of an answer or should I find another tree to bark at ...???
-
Stefan - PDF-XChange
- Site Admin
- Posts: 19930
- Joined: Mon Jan 12, 2009 8:07 am
Re: convert to TXT : sticky or cut out lines?
Hi jakesp,
Sorry we've missed replying to your post! I will ask Will to take a look at it and give a detailed reply when he comes to work a bit later today!
Regards,
Stefan
-
Paul - PDF-XChange
- Site Admin
- Posts: 7445
- Joined: Wed Mar 25, 2009 10:37 pm
Re: convert to TXT : sticky or cut out lines?
Hi jakesp,
Will asked me to take a look at this to get some clarification. My apologies if we are missing something obvious here.
jakesp wrote: Either it is some kind of PDFTools malfunction (ends of paragraphs are occasionally forgotten or misinterpreted), or the original pdf file does not systematically contain the proper code (end of paragraph).
I think this is the crux of the issue. I will have one of my engineers take a look at stef_index.pdf to see if there are issues with the formatting at the end of a line/paragraph.
jakesp wrote: If you opt for the second possibility and find that the pdf "misses" some end-of-paragraph marks, then you will have to look at the way this pdf was generated: using the "PDFChange 5.0 pour Généatique"
Indeed, if there are issues with the PDF itself and it was created using PDF-XChange 5.0, we need to look into that. From what format was the conversion done, and would we be able to have access to the source document, please?
To summarize, if you could confirm: we need to first ascertain whether the PDF itself has a problem; if so, look at the driver and why; if not, look at why PDF-Tools is inconsistent with those cases.
Does that accurately describe the issue?
regards
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com