[Bug] Extra Spaces in Copied Text
-
lbdyck
- User
- Posts: 58
- Joined: Thu Feb 07, 2008 3:34 pm
When I copy some text (a code example) from a PDF and paste it, I see extra spaces in the pasted result. See the attached zip file for the sample page from the PDF and a text file showing the results of pasting.
----------
Lionel B. Dyck <><
-
quant
- User
- Posts: 151
- Joined: Fri Jan 18, 2008 2:48 pm
This in fact happens in quite a lot of PDFs. I had similar problems with another PDF program. The developer's reply was something like: sometimes there is no space character between words in the PDF file structure, so the program has to guess how many spaces should appear in the extracted text, and whether there should be any space at all. Not easy.
-
Podhorny
- User
- Posts: 88
- Joined: Tue Oct 09, 2007 8:03 am
It looks correct: in the PDF there are always 2 spaces between the words in that code sample. Try copying the text above it as well; it has only one space between words:
Code: Select all
You must update the PROFILE EXEC for any user ID that will be running
GOMMAIN. The default user ID is OPMGRM1. The PROFILE EXEC for these user
IDs should include the following statements:
/* Sample lines to include in OPMGRM1 PROFILE EXEC */
’CP SET RUN ON’
’ACCESS 194 D’
-
Ivan - Tracker Software
- Site Admin
- Posts: 3603
- Joined: Thu Jul 08, 2004 10:36 pm
To be honest, the text in your PDF sample contains two spaces between words. Try the Text selection tool and you will see this.
PDF-XChange Co Ltd. (Project Director)
When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.
-
quant
- User
- Posts: 151
- Joined: Fri Jan 18, 2008 2:48 pm
Ivan - Tracker Software wrote: To be honest, the text in your PDF sample contains two spaces between words. Try the Text selection tool and you will see this.
That is exactly the point. How do you know that there are two spaces? Visually it looks that way, but that line could just be justified to fill the line (as opposed to left aligned), so there is still one space between the words, it's just a bit wider.
See the example I provide. The comparison of extracted text from Adobe and pdf-xchange:
... no comparison, because this editor removes double spaces, haha, this is funny. OK, extract this sentence:
"Recently, the characterization of q-optimal equivalent martingale mea-
sures in market models with jumps has been studied in several papers."
You will see double spaces in the text extracted by pdf-xchange, but there are no double spaces in the original text, nor in what Adobe extracts. The line is simply justified.
-
Ivan - Tracker Software
- Site Admin
- Posts: 3603
- Joined: Thu Jul 08, 2004 10:36 pm
There are no space characters at all in the text you sent. The text is specified in the PDF as shown in the code below.
Please note the numbers like -464, -462, etc.: they give the distance between pieces of text. The viewer analyzes these and converts them to spaces. The number of spaces depends on the font's metrics.
Code: Select all
[(Recen)27(tly)84(,)-495(the)-464(c)27(haracterization)-462(of)]TJ
/F2 10.91 Tf
164.08 0 TD
[(q)]TJ
/F7 10.91 Tf
5.26 0 TD
[(-optimal)-462(equiv)54(alen)28(t)-462(martingale)-464(mea-)]TJ
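To make the conversion concrete, here is a minimal Python sketch of turning one TJ array into text. It is only an illustration, not the viewer's actual code: the `extract_tj` function and its `space_width` parameter are hypothetical, and the rounding rule is an assumption. The point is that the same gaps yield one or two spaces depending on the width assumed for the font's space glyph.

```python
import re

# Matches either a literal string "(...)" or a number inside a TJ array.
# Assumption: the strings contain no escaped or nested parentheses,
# which holds for the sample lines above.
TOKEN = re.compile(r"\(([^()]*)\)|(-?\d+(?:\.\d+)?)")


def extract_tj(array_text, space_width=333.0):
    """Convert the body of a TJ array to plain text.

    space_width is a hypothetical width of the space glyph, in the same
    thousandths-of-a-unit scale as the numbers in the array. A real
    viewer would take it from the font's metrics.
    """
    out = []
    for literal, number in TOKEN.findall(array_text):
        if number:
            # A negative number moves the next piece to the right,
            # so the visible gap is the negated value.
            gap = -float(number)
            if gap > 0:
                out.append(" " * round(gap / space_width))
        else:
            out.append(literal)
    return "".join(out)


line = "[(Recen)27(tly)84(,)-495(the)-464(c)27(haracterization)-462(of)]TJ"
print(repr(extract_tj(line, space_width=333.0)))
print(repr(extract_tj(line, space_width=250.0)))
```

With the wider assumed space glyph each gap rounds to a single space; with the narrower one the same gaps round to two, which is the effect reported in this thread.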
PDF-XChange Co Ltd. (Project Director)
When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.
-
quant
- User
- Posts: 151
- Joined: Fri Jan 18, 2008 2:48 pm
Ivan - Tracker Software wrote: Please note the numbers like -464, -462, etc.: they give the distance between pieces of text. The viewer analyzes these and converts them to spaces. The number of spaces depends on the font's metrics.
OK, but you can see for yourself that
"Viewer analyzes this and convert to spaces. Number of spaces depends of font's metric."
is probably not the best way to go about this. Clearly, the original author didn't put 2 spaces between words; the line was simply justified. I would think that an "intelligent viewer/text extractor" would take this into account, and not merely apply the hardcoded formula
number of spaces = distance / (font metric)
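As a rough sketch of what I mean, in Python: the `spaces_for_gap` function, its `collapse` flag and the 0.5 threshold are all made up for illustration and are not anything the viewer actually implements. The idea is to treat any gap wide enough to be a word break as exactly one space, while keeping the proportional formula available for text such as aligned code listings.

```python
def spaces_for_gap(gap, space_width, collapse=True):
    """Return the spaces to emit for a horizontal gap between text pieces.

    gap and space_width are in the same units. The 0.5 factor is an
    assumed cutoff separating kerning adjustments from word breaks.
    """
    if gap < 0.5 * space_width:
        return ""  # too small to be a word break
    if collapse:
        return " "  # justified text: one stretched space
    # Proportional rule: number of spaces = distance / (font metric)
    return " " * round(gap / space_width)


print(repr(spaces_for_gap(464, 250)))                  # collapsed
print(repr(spaces_for_gap(464, 250, collapse=False)))  # proportional
print(repr(spaces_for_gap(27, 250)))                   # kerning only
```

Collapsing would be wrong for text where the wide gaps are intentional, so a switch between the two behaviours seems more realistic than a single rule.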
-
John - Tracker Supp
- Site Admin
- Posts: 5225
- Joined: Tue Jun 29, 2004 10:34 am
Hi,
We are looking at adding an optional feature that will handle both cases, so that changing how duplicate spaces are handled does not create an issue in other situations, which it very easily could.
HTH
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com