Content pane data after OCR: hierarchy and editing
Posted: Mon May 01, 2023 11:30 am
After completing OCR on a scanned document, the Content pane contains a hierarchy of many items comprising an image (or images) and numerous text objects.
Initially the text objects were arranged as words (with punctuation & spaces) grouped by line (one group for each line on the page/column).
However, when I edited the text (blind!) using the Edit Content button, the grouping changed. My document had two columns, and all the text in the column I edited became organised into one single group. The text in the other (unedited) column remained organised in many groups (one per line), and the grouping of the (unedited) text in the header was likewise unchanged. Note that my edit consisted of overwriting just two characters, in two words on a single line.
I'm just curious: if one of those types of grouping is better than the other, then why not always use the superior grouping?
Apart from editing blind, or changing colours and hiding the image, is there a compelling reason why the text cannot be edited directly in the Content pane? (And could such a feature be made available even for non-OCR'ed PDF files?)
From the user's perspective this could presumably work much like the editing of bookmark names in the Bookmarks pane.
—DIV