pdfmeat

Ken Mankoff
I just discovered pdfmeat at https://code.google.com/p/pdfmeat/ which, as of one year ago, includes a stand-alone CLI Python script that generates a BibTeX entry for a given PDF file. To get it working on OS X I had to change the path for the Firefox cookie file from ~/mozilla/firefox/ to ~/Library/Application Support/FireFox/Profiles/ and run "pip install X" for each X it complained was missing; after that it "just worked". Nice.

It seems like this could be wrapped in a plugin so that dragging and dropping PDF files onto BibDesk would auto-import them with metadata, just like Mendeley, Papers, Zotero, and all the other PDF/bibliography managers.

I searched for "pdfmeat" and "bibdesk" and found almost no hits. Is anyone aware of such a plugin? Or is anyone familiar with plugin writing who was unaware of pdfmeat and is now interested in creating one? If not, I may have to figure out how BibDesk plugins work...

  -k.

_______________________________________________
Bibdesk-users mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/bibdesk-users
Re: pdfmeat

Ken Mankoff
So I've made some progress on this AppleScript, but I'm seeking help from those on this list who have used AppleScript before and understand the BibDesk dictionary a bit. I can now drag PDFs into BibDesk, where they get blank records. I can then hit Cmd+Shift+P and the full BibTeX record from pdfmeat.py is displayed. What I need help with is looping through each element of this BibTeX record and inserting it into the current blank one. My current code follows. Any help or hints will be much appreciated.

Thanks,

    -k.

tell application "BibDesk"
  set theDoc to document 1
  tell theDoc
    set theSel to selection
    set thePub to item 1 of theSel
    tell thePub
      -- current (probably empty?) BibTeX record
      set curBibTeXRecord to get BibTeX string of thePub
      -- get the BibTeX record using pdfmeat.py. Wrap it in a shell script because it needs a full path, write access to a folder, etc.
      set theFiles to get linked files
      set thePath to POSIX path of theFiles
      set shellCmd to "/Users/mankoff/bin/pdfmeat.sh " & "'" & thePath & "'"
      set pdfMeatOutput to do shell script shellCmd
      --display dialog pdfMeatOutput
      --display dialog curBibTeXRecord
    end tell
  end tell
end tell





Re: pdfmeat

Ken Mankoff
Ok, it seems there is no mechanism to parse the raw BibTeX record (it is read-only in BibDesk AppleScript).

I guess a simple work-around is to import an entirely new record using the pdfmeat.py output, link the old file (the file link being the only thing in the previous empty record) to the new record, and then delete the old record.

I have the import working, and know how to delete a record thanks to the sample script, but can't figure out how to move the linked file from the old to the new.

   -k.





Re: pdfmeat

Fischlin  Andreas
Dear Ken,

On 25/01/2014, at 03:11 , Ken Mankoff wrote:

> Ok, it seems there is no mechanism to parse the raw BibTeX record (it is read-only in BibDesk AppleScript).

That is not so. You can fully read and write BibDesk records using AppleScript. It is, however, not fully clear what you mean by "raw BibTeX record". What I do know is that you should not meddle with those; leave it to BibDesk to manage the data in the bib file. Otherwise you risk ending up with a huge mess.


> I guess a simple work-around is to import an entirely new record using the pdfmeat.py output, link the old file (the file link being the only thing in the previous empty record) to the new record, and then delete the old record.

There are plenty of example AppleScripts within the BibDesk community which can do what you seem to be looking for. Why don't you have a look at those ("Help -> Visit BibDesk Wiki")? Here is a link to the ones I offer on my web site: http://www.sysecol.ethz.ch/people/afischli (follow the links Software -> BibDesk -> AppleScripts).



> I have the import working, and know how to delete a record thanks to the sample script, but can't figure out how to move the linked file from the old to the new.

see above

What seems to be missing is a loop over all records in the current selection, which might look similar to the following:

set thePubs to selection
if (count of thePubs) > 0 then
  repeat with thePub in thePubs
    tell aPub
       set theAuthor to value of field "Author"
       set value of field "Doi" to "10.1007/s10584-010-9923-5"
       ...
       etc.
       ...
    end tell
  end repeat
else
  beep
end if


Regards,
Andreas



Re: pdfmeat

Fischlin  Andreas
Sorry, I made a mistake in the AppleScript: please replace "tell aPub" with "tell thePub".

Regards,
Andreas


Re: pdfmeat

Christiaan Hofman
In reply to this post by Ken Mankoff

On Jan 25, 2014, at 3:11, Ken Mankoff wrote:

> Ok, it seems there is no mechanism to parse the raw BibTeX record (it is read-only in BibDesk AppleScript).
>
> I guess a simple work-around is to import an entirely new record using the pdfmeat.py output, link the old file (the file link being the only thing in the previous empty record) to the new record, and then delete the old record.
>
> I have the import working, and know how to delete a record thanks to the sample script, but can't figure out how to move the linked file from the old to the new.


It is probably easier to copy the fields from the (temporary) new item you get from pdfmeat to the old item. If they're always the same fields, you could go over those fields explicitly. Or you could loop over the fields of the temporary item.  I do something like that in my BibDesk Download script available on the Wiki.

Christiaan

Re: pdfmeat

Ken Mankoff
Hi Christiaan and other AppleScript experts,

On Sat, Jan 25, 2014 at 5:13 AM, Christiaan Hofman <[hidden email]> wrote:

> It is probably easier to copy the fields from the (temporary) new item you get from pdfmeat to the old item. If they're always the same fields, you could go over those fields explicitly. Or you could loop over the fields of the temporary item.  I do something like that in my BibDesk Download script available on the Wiki.


I have looked over the Download script, but the key line to link the files doesn't "just work" on my system.


The key lines, I think, are:

make new linked file with data (theFile as alias)
my linkFile(it, theFile) 

After my code runs I get the following error message:

{publication id "bdskidentifier://3B137B15-F381-4BEE-BF7D-18B34CD2EF7A" of document "test.bib"} doesn’t understand the “make” message.

But I realize I may not actually need to link the PDF! pdfmeat.py includes a field in the BibTeX record:

file={file:///path/to/file.pdf:pdf}

So perhaps I don't need to link something from the old record, I just need to tell the new record to convert "file" to "linked file", however that happens...

   -k.


Re: pdfmeat

Ken Mankoff
OK, it turns out that the import command returns an array, even if you only give it one BibTeX record, so the improved code needs:

set newPubs to import from pdfMeatOutput

set newPub to (get item 1 of newPubs)

   -k.
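
A minimal sketch of how that fix might sit inside a script. The braces in the earlier error message suggest that the "make" message was sent to a list rather than to a single publication, which is what unpacking the first item avoids. The sample BibTeX string, the empty-result guard, and the error text below are illustrative assumptions, not part of the original script; only "import from", "cite key", and "generated cite key" are taken from this thread.

```applescript
-- Stand-in for the text printed by pdfmeat.py (made-up record, for illustration only)
set pdfMeatOutput to "@article{tmp, title = {An Example Title}, year = {2014}}"

tell application "BibDesk"
  tell document 1
    -- import hands back a list, even when the text holds a single record
    set newPubs to import from pdfMeatOutput
    if (count of newPubs) is 0 then
      -- hypothetical guard: stop early if nothing could be imported
      error "pdfmeat output did not yield a publication"
    end if
    -- address the publication itself, not the list that contains it
    set newPub to item 1 of newPubs
    tell newPub
      set cite key to generated cite key
    end tell
  end tell
end tell
```

With newPub bound to a single publication, commands sent inside "tell newPub" reach a publication object rather than a list.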



Re: pdfmeat

Ken Mankoff
Ok, all done! Thanks to those who supplied tips. With this AppleScript and a working pdfmeat.py (test it from the command line first), I can drag a PDF into BibDesk, select it, and populate it with all of the data from Google Scholar.

  -k.

property useFileURL : true
-- use relative or absolute path?
property useRelativePath : false
-- delete linked files/URLs after converting?
property deleteLinkedFiles : false

tell application "BibDesk"
  set theDoc to document 1
  tell theDoc
    set theSel to selection
    set thePub to item 1 of theSel
    tell thePub
      -- current (probably empty?) BibTeX record
      set curBibTeXRecord to get BibTeX string of thePub
      -- get the BibTeX record using pdfmeat.py. Wrap it in a shell script because it needs a full path, write access to a folder, etc.
      set theFile to get linked files
      set thePath to POSIX path of theFile
      set shellOpts to "cd /tmp; PATH=$PATH:/usr/local/bin " -- path to pdftotext program
      set pdfMeatCmd to "/path/to/python /path/to/pdfmeat.py  "
      set shellCmd to shellOpts & pdfMeatCmd & "'" & thePath & "'"
      set pdfMeatOutput to do shell script shellCmd
    end tell
    set newPubs to import from pdfMeatOutput
    set newPub to (get item 1 of newPubs)
    tell newPub
      make new linked file with data theFile at beginning of linked files
      set cite key to generated cite key
    end tell
    show thePub
    show newPub
    --delete thePub
  end tell
end tell



Re: pdfmeat

Roberto
Sounds like it would be a great improvement if it came with the standard BibDesk in the future.
Great!


Re: pdfmeat

Christiaan Hofman
In reply to this post by Ken Mankoff

On Jan 25, 2014, at 15:59, Ken Mankoff wrote:

> set theFile to get linked files


This is really an array, so this can go wrong. You can use linked file 1.

> set shellCmd to shellOpts & pdfMeatCmd & "'" & thePath & "'"


AppleScript has the standard "quoted form of" property to get the quoting right.
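
To illustrate why this matters, here is a minimal sketch in plain AppleScript (no BibDesk needed). The path is a made-up example containing an apostrophe, and "ls" and "echo" merely stand in for the real pdfmeat command.

```applescript
-- Hypothetical path containing an apostrophe, for illustration only
set thePath to "/Users/me/Papers/O'Brien 2014.pdf"

-- Hand-rolled quoting: the apostrophe inside the path closes the quoted
-- string early, so the shell would no longer see a single argument
set fragileCmd to "ls " & "'" & thePath & "'"

-- quoted form wraps the text in single quotes and escapes any embedded ones
set safeCmd to "ls " & quoted form of thePath

-- The shell receives the path as one argument and echoes it back
do shell script "echo " & quoted form of thePath
```

The same applies to any other text spliced into a "do shell script" string, such as the paths to python and pdfmeat.py.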

make new linked file with data theFile at beginning of linked files

You can also do:

add (linked files of thePub) to linked files

set cite key to generated cite key

end tell


show thePub

show newPub

--delete thePub

end tell

end tell
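
To illustrate the quoting point above: wrapping the path in literal single quotes breaks as soon as a file name contains an apostrophe. The sketch below is in Python rather than AppleScript and is not part of the script in this thread; the function names and the example file name are made up for illustration, and the /path/to/... placeholders are the same ones used in the script.

```python
import shlex

PYTHON = "/path/to/python"      # placeholder, as in the script above
SCRIPT = "/path/to/pdfmeat.py"  # placeholder, as in the script above


def naive_command(pdf_path):
    """Mimic the script above: wrap the path in literal single quotes."""
    return PYTHON + " " + SCRIPT + " '" + pdf_path + "'"


def build_pdfmeat_command(pdf_path):
    """Quote every argument so the shell sees each one as a single word."""
    return " ".join(shlex.quote(part) for part in (PYTHON, SCRIPT, pdf_path))


if __name__ == "__main__":
    tricky = "/tmp/O'Neil 2013.pdf"  # hypothetical file name with an apostrophe
    print(naive_command(tricky))
    print(build_pdfmeat_command(tricky))
    # shlex.split parses a string the way a POSIX shell would,
    # so this checks that the path survives as one argument.
    print(shlex.split(build_pdfmeat_command(tricky))[-1] == tricky)
```

In AppleScript, "quoted form of thePath" plays the role that shlex.quote plays here.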


As I said, you can also copy the fields from newPub to thePub. This may be better if you need some more control, for instance if thePub may already have some fields set. Something like:

set ignoredFields to {"Date-Added", "Date-Modified"}
repeat with theField in fields of newPub
    set theName to name of theField
    -- only fill fields that are still empty in thePub
    if theName is not in ignoredFields and value of field theName of thePub is "" then
        set value of field theName of thePub to value of theField
    end if
end repeat
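
The same merge rule, sketched in Python with plain dictionaries standing in for the two publications (this is only an illustration of the logic, not BibDesk's API; the function name and the sample field values are made up):

```python
IGNORED_FIELDS = {"Date-Added", "Date-Modified"}


def merge_fields(existing, fetched, ignored=IGNORED_FIELDS):
    """Fill empty or missing fields of `existing` from `fetched`.

    Values already set in `existing` are kept, and fields named in
    `ignored` are never copied.
    """
    merged = dict(existing)
    for name, value in fetched.items():
        if name in ignored:
            continue
        if merged.get(name, "") == "":
            merged[name] = value
    return merged


if __name__ == "__main__":
    # Hypothetical records: one typed by hand, one returned by a lookup.
    existing = {"Title": "My hand-typed title", "Author": "", "Date-Added": "2014-01-25"}
    fetched = {"Title": "Title From Scholar", "Author": "Doe, J.",
               "Year": "2014", "Date-Added": "2014-01-26"}
    print(merge_fields(existing, fetched))
```

The hand-typed title wins, the empty author and the missing year are filled in, and the bookkeeping dates are left alone.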

Christiaan



Re: pdfmeat

Matthew Heun
In reply to this post by Roberto
Great work!  I've been watching this development with much interest, and I really hope this feature is rolled into BibDesk soon.  

Autofill is the feature I miss most from Papers.  It is the one thing that slows me down compared to Papers.  So, this is awesome!

Thanks to everyone who worked on this.  

Cheers,

Matt



On Jan 25, 2014, at 10:07 AM, Roberto <[hidden email]> wrote:

> Sounds like it would be a great improvement if it comes with the standard Bibdesk in the future.
> Great
>
>
> On Sat, Jan 25, 2014 at 3:59 PM, Ken Mankoff <[hidden email]> wrote:
> OK, all done! Thanks to those who supplied some tips. With this AppleScript and a working pdfmeat.py (test it from the command line), I can drag a PDF into BibDesk, select it, and populate it with all of the data from Google Scholar.
>
>   -k.



Re: pdfmeat

Ken Mankoff
So I've put the code here: https://github.com/mankoff/BibDeskAppleScripts

Unfortunately the script is stored as a binary blob, so merging changes isn't easy. I have a setup that lets me edit AppleScript in Emacs as plain ASCII, so I guess a hook like that could be built into git, but...

Anyway, there is a lot more I plan to do to this code to make it more generic and useful, all wrapping around pdfmeat.py since that does an OK job of returning BibTeX records from Google Scholar. But I don't think there will be much progress for a while as I have other things to do...

   -k.


Re: pdfmeat

Ken Mankoff
OK, perhaps the last post for now on this project: https://github.com/mankoff/BibDeskAppleScripts

It seems like it would be quite easy to download the full PDF from any site, plus any metadata provided by either Google Scholar or the publisher. The key is to use third-party tools.

pdfmeat.py does an excellent job, in my experience, of fetching BibTeX records. I'd like to modify it to accept a specified search string instead of always parsing one from the PDF, which is error-prone.

bibfetch.pl does an OK job of fetching BibTeX records (Google Scholar only) and is used when no PDF file is available for (the current version of) pdfmeat.py. bibfetch.pl also has an option to report any PDF URLs that Google Scholar lists. A few more commands (for example, a shell call to curl) would therefore download the PDF, which could then be auto-filed with the BibDesk record being modified.
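
A rough Python sketch of that curl step, assuming a PDF URL has already been obtained somehow (the URL, the destination path and both function names here are hypothetical; nothing below reflects how bibfetch.pl itself reports URLs):

```python
import subprocess


def build_curl_command(pdf_url, dest_path):
    """Return the curl argument list for saving pdf_url to dest_path."""
    return [
        "curl",
        "--location",    # follow redirects, common on publisher sites
        "--fail",        # non-zero exit on HTTP errors, so no error page is saved as a PDF
        "--silent",
        "--show-error",
        "--output", dest_path,
        pdf_url,
    ]


def download_pdf(pdf_url, dest_path):
    """Run curl and raise CalledProcessError if the download fails."""
    subprocess.run(build_curl_command(pdf_url, dest_path), check=True)
    return dest_path


if __name__ == "__main__":
    # Only prints the command; call download_pdf() to actually fetch a file.
    print(build_curl_command("https://example.org/paper.pdf", "/tmp/paper.pdf"))
```

Passing the arguments as a list avoids the shell entirely, so odd characters in the URL or file name need no quoting.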

This is now a fairly comprehensive solution that downloads PDFs if available, from any website. If the PDF already exists but there is no metadata in BibDesk, that is OK, it still gets most of the data thanks to pdfmeat.py. Either way, any missing records are filled in.

The drawback of course is two external tools, which each have their own dependencies. The Python script required me to "pip install" a few things, and uses pdftotext, which suggests an entire LaTeX install. This might not be a problem with the BibDesk crowd. bibfetch.pl needed a few CPAN packages installed too.

Cheers,

    -k.
