Ligatures and copying text

Wamma
User
Posts: 69
Joined: Thu Feb 07, 2008 3:52 am

Ligatures and copying text

Post by Wamma »

Hi,

PDF-XChange does not seem to handle ligatures correctly when copying text from a PDF into a plain text file. For example, the attached PDF file contains several ligatures. In the abstract section, see the 'fl' in 'fluidly' (and the 'fi' in 'high-fidelity') in this sentence: "We show that these
controllers produce motion that fluidly responds to several dimensions
of user control and environmental constraints in realtime."

When I select this text and paste it into a text editor, the 'fl' in 'fluidly' shows up as '?uidly'. Adobe Acrobat copies and pastes the 'fl' correctly. Additionally, it would be nice if line-break hyphenation were removed automatically (Acrobat does this as well).
Bhikkhu Pesala
User
Posts: 1776
Joined: Tue May 29, 2007 9:29 am

Re: Ligatures and copying text

Post by Bhikkhu Pesala »

What is the text editor that you're using? All seems to be working OK here when pasting to Wordpad or Opera.

We show that these
controllers produce motion that fluidly responds to several dimen-
sions of user control and environmental constraints in realtime. Our
results indicate that very few basis functions are required to create
high-fidelity character controllers which permit complex user navi-
gation and obstacle-avoidance tasks.

I think it would be best if the soft hyphens were copied as soft hyphens, but it would be good to remove the line breaks. This is what happens if you use Text Properties, Copy to Clipboard:

We show that thesecontrollers produce motion that fluidly responds to several dimen-sions of user control and environmental constraints in realtime. Ourresults indicate that very few basis functions are required to createhigh-fidelity character controllers which permit complex user navi-gation and obstacle-avoidance tasks.

Line breaks are not copied as spaces, and soft hyphens are copied as hyphens.
Windows 10 Home 64-bit • AMD Ryzen 5 3400G, 8 Gb
Review: http://www.softerviews.org/PDF-XChange.html
Wamma
User
Posts: 69
Joined: Thu Feb 07, 2008 3:52 am

Re: Ligatures and copying text

Post by Wamma »

Ah, I did not realize it worked properly in Word. I see now that switching Notepad2 from ANSI to Unicode mode fixes the problem. Thanks for the tip!
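
For anyone wondering where the '?' comes from, here is a minimal Python sketch of what probably happens. It assumes the clipboard text contains the single 'fl' ligature character (U+FB02) and that "ANSI" means the Windows-1252 code page; both are my assumptions, not something confirmed about PDF-XChange or Notepad2.

```python
# Assumption: the copied text holds the single ligature character U+FB02.
copied = "\ufb02uidly"

# Windows-1252 ("ANSI" on many Western systems) has no slot for U+FB02,
# so a lossy conversion substitutes a question mark.
ansi_view = copied.encode("cp1252", errors="replace").decode("cp1252")

# A Unicode encoding such as UTF-8 round-trips the ligature unchanged.
unicode_view = copied.encode("utf-8").decode("utf-8")

print(ansi_view)
print(unicode_view == copied)
```

The first line printed is '?uidly', which matches what I saw in ANSI mode.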

The "Text Properties -> Copy to Clipboard" command does still need to replace the newlines with spaces, however.
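
As a rough illustration of the clean-up I mean, here is a small Python sketch. The function name `unwrap` is made up for this post, and the rule is deliberately naive: any hyphen directly before a line break is treated as line-break hyphenation, so a genuine hyphen that happens to fall at the end of a line would be lost too.

```python
import re

def unwrap(text: str) -> str:
    """Join wrapped lines into one paragraph (hypothetical helper)."""
    # Rejoin words split across lines: "dimen-\nsions" -> "dimensions".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Replace every remaining line break with a single space.
    return re.sub(r"\s*\n\s*", " ", text).strip()

sample = (
    "controllers produce motion that fluidly responds to several dimen-\n"
    "sions of user control and environmental constraints in realtime. Our\n"
    "results indicate"
)
print(unwrap(sample))
```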


Edit: Actually, it doesn't work the way I want it to. What is copied into Word/WordPad/Opera is the actual, single Unicode ligature character 'ﬁ' and not the two characters 'f' and 'i'. Acrobat converts the ligature to an individual 'f' and 'i'. Of course, I have no idea how difficult this might be to implement.
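
To show the kind of conversion I have in mind, here is a minimal Python sketch that expands the common Latin ligature characters into separate letters. The table and the name `expand_ligatures` are my own illustration; I am not claiming this is how Acrobat does it or how PDF-XChange would.

```python
# Unicode "Alphabetic Presentation Forms" ligatures and their plain-letter
# equivalents (illustrative table, limited to the common Latin ones).
LIGATURES = str.maketrans({
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
    "\ufb06": "st",
})

def expand_ligatures(text: str) -> str:
    """Replace single ligature characters with individual letters."""
    return text.translate(LIGATURES)

print(expand_ligatures("\ufb02uidly, high-\ufb01delity"))
```

Python's `unicodedata.normalize("NFKC", text)` gives the same result for these characters, but it also rewrites other compatibility characters (superscripts, full-width forms and so on), so an explicit table is the more conservative choice if only ligatures should change.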