I am using library 162. I can convert a Word document to PDF by printing it from Office, and I can extract the contents of that PDF to a text file with the function PXCp_ET_GetPageContentAsTextW; the Simplified Chinese text comes out complete and correct. Now I want to convert an HTML file to PDF with the PDF-XChange printer. When printing finishes, I extract the contents of the PDF to a text file, but the Simplified Chinese text is not shown properly. Why?
the source file
Print HTML file to PDF, can't extract Chinese words to text
-
hhswg
- User
- Posts: 12
- Joined: Mon Apr 27, 2009 3:50 am
Print HTML file to PDF, can't extract Chinese words to text
-
Lzcat - Tracker Supp
- Site Admin
- Posts: 677
- Joined: Thu Jun 28, 2007 8:42 am
Re: Print HTML file to PDF, can't extract Chinese words to text
This issue has no relation to Viewer or even to PDF-XChange Pro 4.0.
PDF allows you to store glyph indices (or any other codes) instead of character codes, and as long as the font provides information on how to map these codes to glyphs, the file will display correctly. However, to extract text (Unicode) in such cases there must be additional information on how to map the codes to Unicode values, and this information is optional. By default PDF-XChange does not embed this information into the PDF (to keep files smaller), but you can change this: just check the Embed Extended Font/Character info option on the Fonts tab in the printer driver preferences.
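To make the point above concrete, here is a small toy model in Python. It is only a sketch: the glyph indices and both tables are made up for illustration and are not read from a real PDF, and it does not use the PDF-XChange API. In PDF terms the optional code-to-Unicode mapping is usually a ToUnicode CMap; that this is what the printer option embeds is an assumption here.

```python
# Toy model only: these tables are invented and are not read from a real PDF.

# What the page content stores: font-specific glyph indices, not characters.
page_codes = [0x0102, 0x0305, 0x0411]

# Always available from the font: glyph index -> glyph shape (enough to draw the page).
glyph_shapes = {
    0x0102: "<shape of 中>",
    0x0305: "<shape of 文>",
    0x0411: "<shape of 字>",
}

# Optional extra information: glyph index -> Unicode character.
to_unicode = {0x0102: "中", 0x0305: "文", 0x0411: "字"}


def render(codes):
    """Drawing needs only the glyph shapes, so it works without to_unicode."""
    return [glyph_shapes[c] for c in codes]


def extract_text(codes, cmap=None):
    """Extraction needs the code -> Unicode map; without it the codes are a guess."""
    if cmap is None:
        # Fallback: treat each code as if it were a Unicode code point (wrong here).
        return "".join(chr(c) for c in codes)
    # Codes missing from the map become U+FFFD (replacement character).
    return "".join(cmap.get(c, "\ufffd") for c in codes)


print(render(page_codes))                    # the page draws correctly either way
print(extract_text(page_codes, to_unicode))  # mapping embedded: the Chinese text
print(repr(extract_text(page_codes)))        # mapping missing: unrelated characters
```

The same page renders identically in both cases; only the extraction result changes, which matches the symptom described in the first post.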
Victor
Tracker Software
Project manager
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
-
John - Tracker Supp
- Site Admin
- Posts: 5225
- Joined: Tue Jun 29, 2004 10:34 am
Re: Print HTML file to PDF, can't extract Chinese words to text
Also, there is an additional option you may need to check: look at the printer preferences -> Document Info
and check > Place Additional information into the Document
This may make a difference.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.
Best regards
Tracker Support
http://www.tracker-software.com