I have a number of documents that were initially generated from images of manuscripts or other historical documents, to which I have added a comment box near the top of the page (as a kind of post-it). All of my documents are also tagged with metadata, which makes it easy to find a given item when needed.
I've recently noticed that when I run a Windows Search for documents matching specific criteria, the text of the comment box sometimes appears in the hit list and sometimes does not. When it does appear, the text also seems to be indexed and is therefore searchable. I'm mystified as to what distinguishes the documents that display this data from those that do not. What's the secret?
Thanks.
Athena
What determines whether or not a comment will be indexed by
-
Athena
- User
- Posts: 17
- Joined: Fri Mar 27, 2009 6:44 pm
What determines whether or not a comment will be indexed by
Last edited by Athena on Mon Mar 12, 2012 9:52 pm, edited 1 time in total.
-
Paul - PDF-XChange
- Site Admin
- Posts: 7445
- Joined: Wed Mar 25, 2009 10:37 pm
Re: What determines whether or not a comment will be indexed
Hi Athena,
The original documents from images: have they had OCR run on them? That would account for the text from the body showing up in your searches. Could it be as simple as some of your documents not having had OCR, or otherwise not having text data in them to find?
Or do you mean specifically that the metadata is not consistent in the search results?
Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
Athena
- User
- Posts: 17
- Joined: Fri Mar 27, 2009 6:44 pm
Re: What determines whether or not a comment will be indexed
None of these documents have been OCRed, at least not to my knowledge. And it's not content from the document that is (sometimes) being read by Windows; it's the text in comment boxes that I've added with PDF-XChange.
I don't have any problem with the metadata in any of the documents. I thought I did at first because the search results looked "blank" for some but when I looked at the details pane, I realized that the subject and keywords were coming through fine for both categories of documents; it was just that some also included the annotations.
I just noticed this a couple of days ago when I added some freshly annotated documents to a folder then had to search for one. I was surprised to see the annotations in the search results...and then realized that it wasn't happening consistently.
Am I allowed to attach a screenshot here or perhaps send it to you?
Update: I just noticed that in one case, some sort of attempt at OCR appears to have taken place. I highlighted a sentence (now that I think about it, that wouldn't have been possible without it being converted to text) and gibberish is displayed in the hit list.
Athena
Last edited by Athena on Mon Mar 12, 2012 10:12 pm, edited 1 time in total.
-
Paul - PDF-XChange
- Site Admin
- Posts: 7445
- Joined: Wed Mar 25, 2009 10:37 pm
Re: What determines whether or not a comment will be indexed
Hi av,
Sure, you can send screenshots. Just put them in a zip archive and attach them to the post.

Best regards
Paul O'Rorke
PDF-XChange Support
http://www.pdf-xchange.com
-
Athena
- User
- Posts: 17
- Joined: Fri Mar 27, 2009 6:44 pm
Re: What determines whether or not a comment will be indexed
Okay, here you go.
I've attached two files that appear to be identical -- US Census schedules each with a comment attached. As shown in the screen shot, Windows search displays the content of one text box but not the other. How are these two documents different?
Thanks
Athena
-
Ivan - Tracker Software
- Site Admin
- Posts: 3603
- Joined: Thu Jul 08, 2004 10:36 pm
Re: What determines whether or not a comment will be indexed
If you right-click the file "B Hudson - 1920.pdf", select "Properties..." from the menu, and switch to the "Previous Versions" page of the Properties dialog for this file, do you see any records of previous versions of the file?
PDF-XChange Co Ltd. (Project Director)
When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.