I have been using a couple of programs, UltraSearch and X1, to search the text contents of documents, including many pdfs created in PDF-Xchange Editor.
I recently created a series of documents in PDF-XCE by importing a number of jpg images, producing pdfs that contain only images, something I've done countless times in the past.
After I added text box comments to some of the pages in one of these docs and saved it, I expected that the text in the text boxes would be searchable (by external file search programs) as regular text, since such text box text has always been searchable and findable before by the two search programs I use, X1 Search and UltraSearch.
But for some reason this text is not being found in what appear to be otherwise normal pdfs. To be clear, the text I'm searching for is not part of the image — it is live text in live text box objects created on the top layer of the page. This text has always been searchable from outside the document by search programs looking at the closed files.
So I'm wondering is there some condition that would inhibit text from being seen by search programs that are otherwise able to find text in similar documents?
I've previously discovered that bookmark text in pdfxce is not searchable by these search programs, but text box text has always been visible to these local search engines.
Hoping for some insight.
Thank you!
Searchable text in docs created in PDF-XCE
-
- User
- Posts: 93
- Joined: Sat Jan 15, 2022 6:48 pm
Re: Searchable text in docs created in PDF-XCE
I've attached an example of a file that shows the issue.
It contains 20 pages, each a full image.
The first page contains a text box, whose content would normally be searchable in a search program capable of searching for text content inside a pdf (e.g., UltraSearch, X1 Search.)
I find that the text in the text box is not being seen by the two above search programs.
But if I delete the last 10 pages, then save, the text becomes searchable to these programs.
I've made an animated gif screen cap showing the same file being alternately visible and invisible to X1 Search (which monitors the file system and indexes content in realtime), depending on the presence of certain pages.
Unable to embed the gif. But here's a link to it. [url]https://i.postimg.cc/HsdNkZp0/pdf-search.gif[/url]
[img]https://i.postimg.cc/HsdNkZp0/pdf-search.gif[/img]
Very strange!
Any ideas?
-
- Site Admin
- Posts: 11263
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Searchable text in docs created in PDF-XCE
Hello, XChangeGirl
I can't say much about why these files would be handled differently, aside from the images present being fairly heavy, which might impact indexing processes. I don't believe anything we can do should have an impact here... have you reached out to the developers of the search tools which are experiencing this issue, to ask them if there are any logs or such you can take from their end, which might indicate what is happening?
Kind regards,
Dan McIntyre - Support Technician
PDF-XChange Co. LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 93
- Joined: Sat Jan 15, 2022 6:48 pm
Re: Searchable text in docs created in PDF-XCE
Hi Dan. Thanks for the reply.
The thing is, it's not just those two programs. When I upload the doc to Google Drive, which is able to search for text content in pdfs, I get the same results as the other two locally installed programs. Some condition of the document is preventing its content from being found by three completely different programs that can otherwise search pdfs.
Could it really be the size of the images? I know the images are on the large side... but still. It's not as though the search is taking a long time... it's coming back negative. But I will batch reduce the image size and recreate the docs and see what happens.
Thanks again.
-
- Site Admin
- Posts: 11263
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Searchable text in docs created in PDF-XCE
Hello, XChangeGirl
The note about the same happening in Google Drive is helpful; now I have something I can test with, instead of simply theorizing. Thank you for that information.
If I can reproduce it there, I will bring this to the attention of the Dev team after some extended tests, to see if there is a root cause on our end. Hopefully they can find something more tangible than what I was able to.
[Edit]
A quick update to this... on my end, Google Drive is unable to find results for either the full-sized (21 page) file or the cut-down 10 page variant. However, both show up almost instantly when searching in File Explorer (admittedly, our shell extensions are enabled). Can you confirm whether, if you take the document you uploaded here, upload it anew, and then search for it in Google Drive, the search works on your end?
Kind regards,
Dan McIntyre - Support Technician
-
- User
- Posts: 93
- Joined: Sat Jan 15, 2022 6:48 pm
Re: Searchable text in docs created in PDF-XCE
Hi Dan. Thank you for looking into this!
When I upload the 21-page test file that I attached above (which is mislabeled as 80 pages; sorry about that!) to Google Drive, it does NOT return search results when searched by Drive.
I've done some testing by creating pdfs with smaller image sizes, e.g., creating new pdfs by importing about 100 jpgs each, all small, within the 400-600 KB range.
What I've found is that text box text becomes unsearchable in documents longer than about 48 to 55 pages. That number is somewhat inconsistent, but the results as I add and remove pages in that range are much more consistent.
I've been testing mainly with X1 Search, because it monitors the file system and shows results instantly without my having to upload the docs to Drive. But I've confirmed the X1 results in Google Drive.
Right around the 48 page point is where the searches start coming up empty.
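A rough sketch of how that add-and-remove testing could be turned into a bisection, in Python. `is_searchable` is a hypothetical callback (for example: build an n-page test pdf, wait for the indexer, run the search) and is not part of X1 Search, UltraSearch, or PDF-XChange Editor. It also assumes the result flips only once between the two starting page counts, which may not hold exactly given the inconsistency noted above.

```python
from typing import Callable


def first_failing_page_count(
    lo: int, hi: int, is_searchable: Callable[[int], bool]
) -> int:
    """Return the smallest page count in (lo, hi] where the text is not found.

    Assumptions (not verified here): is_searchable(lo) is True,
    is_searchable(hi) is False, and the result flips only once in between.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_searchable(mid):
            lo = mid  # still found at mid pages, so the threshold is higher
        else:
            hi = mid  # not found at mid pages, so the threshold is mid or lower
    return hi
```

Each call to the callback is one manual or scripted test, so a range of 10 to 100 pages needs about seven tests rather than one per page count.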
Just for the record: Am I correct that documents with that number of images shouldn't present issues? I know they can get large and slow, but are there other reasons to avoid them?
Thanks again!
-
- Site Admin
- Posts: 11263
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Searchable text in docs created in PDF-XCE
Hello, XChangeGirl
Inherently, there should not be any issues with larger/longer files. The litmus test for search functions generally should be, "does File Explorer find it?". (I also tested the same after inflating the document to 63 pages, and Explorer was again able to find it without issue.)
If it can, then the content should be searchable to anything else which is reading that data properly. I do not know why these apps are unable to read the comment data after that point, but as we create our comments with strict adherence to the PDF specification, I can only suggest reaching out to them directly and asking if there are any logs or some such that can be taken from their end to indicate where the issue is.
That is not to say there is no chance of something going wrong on our end, but without feedback from the creators of those systems telling us what specific parts of our files have caused the issue, we don't really know what else to look at. If they come back with any information pertaining to the error being caused by the files themselves, could you please forward that to us via support@PDF-XChange.com?
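One way to gather something concrete for either vendor is to check how the text box string is stored in the saved file. The following is a minimal Python sketch, not a PDF-XChange tool and not a real PDF parser. It assumes the comment text is stored as an ordinary PDF string and looks for it first in the raw bytes, then inside Flate-compressed streams. The function names are made up for illustration; literal strings with escape sequences, encrypted files, and other stream filters are not handled.

```python
import re
import zlib

# Bytes between a "stream" keyword and the following "endstream" keyword.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def candidate_patterns(text):
    """Byte sequences under which `text` might be stored as a PDF string."""
    if not text:
        raise ValueError("text must be non-empty")
    utf16 = text.encode("utf-16-be")
    patterns = [
        utf16,                               # raw UTF-16BE string body
        utf16.hex().encode("ascii"),         # hex string, lowercase digits
        utf16.hex().upper().encode("ascii"), # hex string, uppercase digits
    ]
    try:
        patterns.append(text.encode("latin-1"))  # simple literal string
    except UnicodeEncodeError:
        pass  # text has characters outside Latin-1; skip this form
    return patterns


def locate_text(pdf_bytes, text):
    """Return 'plain', 'compressed', or None depending on where text is found."""
    patterns = candidate_patterns(text)
    if any(p in pdf_bytes for p in patterns):
        return "plain"
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the zlib data
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data, e.g. a JPEG image stream
        if any(p in inflated for p in patterns):
            return "compressed"
    return None
```

Usage would be along the lines of `locate_text(open("test.pdf", "rb").read(), "some words from the text box")`, where the file name is a placeholder. If the long file reported "compressed" and the short one "plain", that would point toward indexers that do not inflate streams, but that is only a hypothesis to test, not a confirmed cause.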
Kind regards,
Dan McIntyre - Support Technician
-
- User
- Posts: 93
- Joined: Sat Jan 15, 2022 6:48 pm
Re: Searchable text in docs created in PDF-XCE
Thanks again, Dan.
When you refer to "Does File explorer find it?", I assume you mean using the basic search field in Explorer, located to the right of the path field. When I use that search method, I get the same results: content is not found in documents over a certain size, roughly 50 pages.
I have been using Windows 10 Pro, but just tested on a completely different computer, running Windows 11 Pro. Same files, same results.
Hmm.... I am not using the most recent version of pdfxce. I will update from 10.4.0 build 388 to the most recent. Could that be it?
I'll find out!
-
- Site Admin
- Posts: 11263
- Joined: Wed Jan 03, 2018 6:52 pm
Re: Searchable text in docs created in PDF-XCE
Hello, XChangeGirl
Do let me know if the update helps. I can't imagine that it would have an impact on search in Google Drive, but if the others are locally installed, they may be using, or able to make use of, our "ifilter". I don't think there have been any notable changes in that area, but it never hurts to try.
I am, however, very surprised to hear that you experienced the same issue in Explorer, since I tested with 3 versions of this file, all of which were found very quickly in it.
Kind regards,
Dan McIntyre - Support Technician