Metadata file

Sphax · Post by **Sphax** » Thu Feb 16, 2012 3:48 pm

Hello,
I read on a website that it is possible to view the contents of the .xmp file (all the PDF metadata).
Is it possible to export this information automatically?
Thank you in advance.
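
For readers with the same question, here is a minimal sketch of one way to pull the XMP metadata out of a PDF without any PDF library. It is not a feature of the products discussed in this thread. It assumes the metadata stream is stored uncompressed and unencrypted, so the packet markers can be found by scanning the raw bytes; the function name `extract_xmp` is made up for illustration.

```python
import re
from typing import Optional

# An XMP packet is wrapped in <?xpacket begin=...?> and <?xpacket end=...?>
# processing instructions. This pattern captures whatever lies between them.
_XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def extract_xmp(pdf_bytes: bytes) -> Optional[str]:
    """Return the body of the first XMP packet in raw PDF bytes, or None.

    Assumption: the metadata stream is neither compressed nor encrypted.
    """
    match = _XPACKET.search(pdf_bytes)
    if match is None:
        return None
    return match.group(1).decode("utf-8", errors="replace").strip()
```

To export the result, read the file with `open(path, "rb").read()`, pass the bytes to `extract_xmp`, and write the returned string to a `.xmp` file. If the function returns `None`, the PDF either has no XMP packet or stores it in a form this scan cannot see.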

Thu Feb 16, 2012 4:10 pm

Hello Sphax,

I am afraid I don't fully understand what you are trying to achieve.
Also, are you interested in our end-user or developer products?

For now our products work only with PDF files, and the Viewer can import/export comments and form data to fdf/xfdf files. Any other format will need another tool for processing.

Best,
Stefan

timtak · Post by **timtak** » Thu Apr 12, 2012 9:03 am

I like writing comments in PDF-XChange. If I could then export them as a text file, I could move them to my Zotero DB or to MS Word.

Perhaps if I created some generic empty PDF form (I have no idea how, and my attempts to google it failed) I could import my fdf files into that.

I would be grateful if PDF-XChange saved comments to text files or opened fdf files. Then I could copy and paste the comments into Word or a text file.

Thu Apr 12, 2012 11:18 am

Hello timtak,

Exporting comments is only possible to the fdf or xfdf file formats, but there is also the "Summarize Comments" feature in our Viewer, and one of its possible output formats is text. Have you tried that?

Also, direct opening of .fdf/.xfdf files is coming in the next major version of the Viewer later this year.
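
Until then, an exported xfdf file can be read with any XML parser, since XFDF is plain XML. The following is a rough sketch, not something the Viewer provides. It assumes the export uses the standard `http://ns.adobe.com/xfdf/` namespace, that each annotation under `<annots>` carries a `page` attribute, and that the note text sits in a `<contents>` child (rich-text notes stored in `<contents-richtext>` are ignored here). The function name `xfdf_comments` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple, Union

# Namespace used by XFDF documents.
XFDF_NS = {"x": "http://ns.adobe.com/xfdf/"}


def xfdf_comments(xfdf_data: Union[str, bytes]) -> List[Tuple[int, str]]:
    """Return (page_index, comment_text) pairs from XFDF markup.

    Annotations without a non-empty <contents> element are skipped.
    """
    root = ET.fromstring(xfdf_data)
    annots = root.find("x:annots", XFDF_NS)
    comments: List[Tuple[int, str]] = []
    if annots is None:
        return comments
    for annot in annots:
        contents = annot.find("x:contents", XFDF_NS)
        text = (contents.text or "").strip() if contents is not None else ""
        if not text:
            continue
        # The page attribute is a zero-based index in the files assumed here.
        comments.append((int(annot.get("page", "0")), text))
    return comments
```

Writing the second element of each pair to a text file, one per line, gives a plain list of comments that can be pasted into Word or Zotero.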

Best,
Stefan

timtak · Post by **timtak** » Mon Apr 16, 2012 7:56 am

I had not tried that, no. It works. That is great!

Is there any control over what is written to the text file? The notes screen in Zotero is pretty small, and only the first line is shown, so the "Author: TT Subject: Sticky Note Date: 2012/04/12, 17:56:03" line and the "Page: " line are something I would ideally like to remove. I could search and remove them. I could probably work out a regular expression in Word, but it would add a step.

Summarize Comments is included in the free edition (though it was not displayed until I clicked on "display pro functionality"). The product comparison is difficult for me to follow; it does not mention "Summarize Comments". The only other thing that I would like to be able to do is annotate PDFs even if they are password protected. I tried the wizard, which recommended the free product but said that the pro version can't do 4 things that I want it to do. Strange that the free product should be more powerful than the pro. I don't think that the free product can alter password-protected files, but I am probably wrong. Would you be so kind as to recommend a product?

timtak · Post by **timtak** » Mon Apr 16, 2012 10:03 am

If PDF-XChange were to export in the following format...

http://forums.zotero.org/discussion/225 ... es/#Item_0

"You would essentially just need to prefix each note with

"N1  - "

and put those lines between

"TY  - BOOK"

and

"ER  - "

such that it looked something like this:

TY  - BOOK
N1  - 060: Benton: “The ‘rule’ announces more than sanction…
N1  - 061: Benton: “One can certainly read the violence committed…
ER  - 

(Those are two spaces before each hyphen and one after.)"

The resulting file could then be dragged and dropped into Zotero.
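
The wrapping described in the quoted Zotero post can be sketched in a few lines. This is only an illustration of that layout (tag, two spaces, hyphen, one space), not an export option of PDF-XChange; the function name `notes_to_ris` and the default `BOOK` item type are assumptions taken from the example above.

```python
from typing import Iterable


def notes_to_ris(notes: Iterable[str], item_type: str = "BOOK") -> str:
    """Wrap plain-text notes in a single RIS record.

    Each note becomes one "N1  - " line between "TY  - <type>" and "ER  - ".
    Line breaks inside a note are collapsed to spaces so that every note
    stays on one line; empty notes are dropped.
    """
    lines = [f"TY  - {item_type}"]
    for note in notes:
        flat = " ".join(note.split())
        if flat:
            lines.append(f"N1  - {flat}")
    lines.append("ER  - ")
    return "\n".join(lines) + "\n"
```

Saving the returned string with a `.ris` extension gives a file in the layout shown above.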

From the Zotero blog (to give you an idea of the potential customer base)
http://www.zotero.org/blog/zoteros-next-big-step/
Downloaded millions of times since 2006 and used by hundreds of thousands of researchers daily, Zotero has grown to the world’s largest and most diverse online research community, with nearly 50 million library items presently synced to zotero.org.

Mon Apr 16, 2012 12:46 pm

Hello timtak,

We can't promise any Zotero-specific modifications to the way comments are exported in our Viewer, but if time allows we can review the link you've sent.

To answer the questions in your earlier post as well - all the Summarize Comments features are available in the same screen.

As for annotating password protected documents - you will need to either have the document allow modifications, or know the author password. Our Viewer will otherwise obey the PDF restrictions and not let you perform the modifications you want.

Best,
Stefan

timtak · Post by **timtak** » Tue Apr 17, 2012 5:10 am

Thank you.

I am slowly putting together a set of regular expressions (Word wildcard searches, where ^13 is a paragraph mark):

replace
*Page*Page > Page
with
nothing

replace
Author*[0-9][0-9]:[0-9][0-9]:[0-9][0-9]
with
nothing

replace
Page: ([0-9])^13
with
\1 (single trailing space)

replace
^13[0-9] ^13
with
nothing

This leaves an extra number at the end, but it pretty much works to remove everything except line numbers before notes (no highlights etc.), and a Word macro to do this is below.

Sub pdfxchange2zotero()
'
' pdfxchange2zotero Macro
'
'
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "*Page*Page"
        .Replacement.Text = " Page"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute
    With Selection
        If .Find.Forward = True Then
            .Collapse Direction:=wdCollapseStart
        Else
            .Collapse Direction:=wdCollapseEnd
        End If
        .Find.Execute Replace:=wdReplaceOne
        If .Find.Forward = True Then
            .Collapse Direction:=wdCollapseEnd
        Else
            .Collapse Direction:=wdCollapseStart
        End If
        .Find.Execute
    End With
    With Selection.Find
        .Text = "Author*[0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "Page: ([0-9])^13"
        .Replacement.Text = "\1"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
Sub pdf2zot()
'
' pdf2zot Macro
'
'
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "*Page*Page"
        .Replacement.Text = "Page"
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute
    With Selection
        If .Find.Forward = True Then
            .Collapse Direction:=wdCollapseStart
        Else
            .Collapse Direction:=wdCollapseEnd
        End If
        .Find.Execute Replace:=wdReplaceOne
        If .Find.Forward = True Then
            .Collapse Direction:=wdCollapseEnd
        Else
            .Collapse Direction:=wdCollapseStart
        End If
        .Find.Execute
    End With
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "Author*[0-9][0-9]:[0-9][0-9]:[0-9][0-9]"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        .Text = "Page: ([0-9])^13^13"
        .Replacement.Text = "\1 "
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^13[0-9] ^13"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchByte = False
        .MatchAllWordForms = False
        .MatchSoundsLike = False
        .MatchFuzzy = False
        .MatchWildcards = True
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
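
For anyone who would rather not go through Word, the same clean-up can be sketched as a small script. This is an approximation of what the macro does, not a tested equivalent. It assumes the summary text file has a preamble, then for each comment a "Page: N" line, an "Author: ... Date: ... HH:MM:SS" line, and the comment text; the function name `clean_summary` and the `keep_pages` switch are made up for illustration.

```python
import re
from typing import List, Optional

# Assumed layout of the "Summarize Comments" text export:
#   Page: 3
#   Author: TT Subject: Sticky Note Date: 2012/04/12, 17:56:03
#   <comment text>
_PAGE = re.compile(r"^Page:\s*(\d+)\s*$")
_HEADER = re.compile(r"^Author:.*\d{2}:\d{2}:\d{2}\s*$")


def clean_summary(summary: str, keep_pages: bool = True) -> List[str]:
    """Return the comment lines, optionally prefixed with the page number."""
    notes: List[str] = []
    page: Optional[str] = None
    for raw in summary.splitlines():
        line = raw.strip()
        if not line:
            continue
        page_match = _PAGE.match(line)
        if page_match:
            page = page_match.group(1)
            continue
        # Skip the preamble before the first "Page:" line and the
        # "Author: ... Subject: ... Date: ..." header lines.
        if page is None or _HEADER.match(line):
            continue
        notes.append(f"{page} {line}" if keep_pages else line)
    return notes
```

Calling `clean_summary(text, keep_pages=False)` drops the page numbers and returns one comment per line. The sketch also accepts multi-digit page numbers, which the single `[0-9]` in the wildcard patterns above would not match.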

timtak · Post by **timtak** » Tue Apr 17, 2012 6:34 am

But, oops, I have just realised... the page numbers that PDF-XChange provides are counted from the top of the document, and are not the document's own page numbers (which would be required when citing the text). I realise that it would be next to impossible for PDF-XChange to extract the original page information.

So I think it is best just to delete everything, and just supply each comment on a new line.

Tue Apr 17, 2012 12:14 pm

Hello timtak,

Yes - the Viewer counts pages starting at 0 for the first one.

Custom page numbering will be supported in v3 of the Viewer, but whether that will also work with your regex I can't tell at the moment.

Best,
Stefan
