I wrote a Perl script that takes a RIS-formatted citation library and converts it into a JSON file compatible with CiteULike’s API. It’s a bit sloppy, but considering I’ve had access to their API since October, it’s about fucking time I did it.
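For anyone unfamiliar with RIS: each record is a run of lines made of a two-character tag, two spaces, a hyphen and a space, then the value, and an "ER" line closes the record. Here's a minimal sketch of reading that layout into Perl hashes. The `parse_ris` helper and the sample record are made up for illustration; this is not part of the script further down.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split RIS text into records (hashrefs of tag => [values]).
# Assumes the standard "TAG  - value" layout, with an "ER  -" line closing each record.
sub parse_ris {
    my ($text) = @_;
    my (@records, %current);
    for my $line (split /\r?\n/, $text) {
        next unless $line =~ /^([A-Z][A-Z0-9])  -(?: (.*))?$/;
        my ($tag, $value) = ($1, defined $2 ? $2 : '');
        if ($tag eq 'ER') {
            # End of record: keep a copy and start a fresh one
            push @records, { %current } if %current;
            %current = ();
        } else {
            # Tags such as AU can repeat, so every tag maps to a list of values
            push @{ $current{$tag} }, $value;
        }
    }
    return @records;
}

# Made-up sample record in the shape of a Papers RIS export
my $sample = join("\n",
    "TY  - JOUR",
    "ID  - smith2009",
    "T1  - An example title",
    "JF  - Journal of Examples",
    "L1  - file://localhost/Users/me/Papers/smith2009.pdf",
    "ER  - ",
);

for my $rec (parse_ris($sample)) {
    print "$rec->{ID}[0]: $rec->{T1}[0]\n";
}
```

Collecting whole records first (instead of reacting line by line) means the order of tags inside a record stops mattering.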

It’d be relatively trivial to tweak this to parse a Papers library XML export, or to rework it for BibTeX-formatted citations. I’ll slowly work on getting something up and running that converts between formats and parses them automatically. But first, I need to mash this all into one script that does everything; none of this leaving a password lying around in a flat text file. Bleargh! Unfortunately, for now it’s tailored to Mac systems (because of how it builds the file path). I’ll tweak that as I go and see if I can get it working in a Windows environment (it’s going to make me feel dirty, though).
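One possible route to a less Mac-specific path: instead of stripping `file://localhost` by hand, let the URI module (the same distribution that provides URI::Escape) turn the file URL into a local path. Its `file` method decodes the %-escapes, accepts a `localhost` host, and takes an optional OS name. The `url_to_path` wrapper and the example paths are hypothetical; this is a sketch, not tested against a real Windows Papers export.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use URI;

# Hypothetical helper: turn an RIS "L1" file URL into a local file path.
# $os selects the path syntax ("unix", "win32", ...); undef means the current OS.
sub url_to_path {
    my ($url, $os) = @_;
    # Returns undef if the URL can't be mapped to a local file
    return URI->new($url)->file($os);
}

# Mac/Unix style: %20 is decoded and the localhost host is dropped
print url_to_path('file://localhost/Users/me/Papers/My%20Paper.pdf', 'unix'), "\n";

# Windows style: drive letter kept, backslashes used as separators
print url_to_path('file:///C:/Users/me/Papers/My%20Paper.pdf', 'win32'), "\n";
```

In the script, something like `$path = url_to_path(substr($_, 6));` could stand in for the `uri_unescape` call plus the substitution, with the OS left to Perl's `$^O`.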

Granted, this took me most of the evening, but I’m quite proud of myself so far. I’ve already used it to parse and upload my entire Papers library and finally merge it with my CiteULike library. The code is below. And note to self: find a better way to embed code in my WP blog.

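#!/usr/bin/env perl
use strict;
use warnings;
use URI::Escape;

# This Perl script takes a RIS-formatted library (given as the first
# command-line argument) and produces a file (library.json) formatted for use
# with the pdf_upload.pl API script provided for multiple file uploads at
# CiteULike.org.

die("Usage: $0 library.ris\n") unless @ARGV;
open(my $infile, '<', $ARGV[0]) || die("Could not open file $ARGV[0]: $!\n");

print STDOUT ("Please enter your CiteULike username:\t");
chomp(my $username = <STDIN>);
print STDOUT ("Please enter the password for your CiteULike account:\t");
chomp(my $pw = <STDIN>);
print STDOUT ("Enter the name of the group you would like this library uploaded to (leave blank if you do not wish to define a group):\t");
chomp(my $culgroup = <STDIN>);

# Escape backslashes and double quotes so values like 'The "X" effect' stay valid JSON
sub json_str { my $s = shift; $s =~ s/(["\\])/\\$1/g; return $s; }

open(my $outfile, '>', 'library.json') || die("Could not write library.json: $!\n");
print $outfile ("{\n\t\"username\" : \"" . json_str($username) . "\",\n\t\"password\" : \"" . json_str($pw) . "\",\n");
# Only post to a group when one was entered
# (assumption: without post_username the upload goes to your own library)
print $outfile ("\t\"post_username\" : \"group:" . json_str($culgroup) . "\",\n") if length $culgroup;
print $outfile ("\t\"files\" : [\n");

my ($id, $journal, $title, $path) = ('', '', '', '');
my @entries;
# Parse the RIS library for relevant information
while (<$infile>) {
    s/\r?\n\z//;    # strip the line ending, including Windows-style CRLF
    # Obtain article ID from library
    if (/^ID\s\s/) { $id = substr($_, 6) }
    # Obtain periodical name (for troubleshooting)
    if (/^JF\s\s/) { $journal = substr($_, 6) }
    # Obtain title (for troubleshooting)
    if (/^T1\s\s/) { $title = substr($_, 6) }
    # Obtain file path for PDF (if available)
    if (/^L1\s\s/) {
        # Decode %-escapes and drop the file://localhost prefix, leaving a plain Mac path
        $path = uri_unescape(substr($_, 6));
        $path =~ s{^file://localhost}{};
        # Record a JSON entry only when a path is present, so entries without a file are skipped
        push(@entries, "\t\t{\n\t\t\t\"article_id\" : \"" . json_str($id) . "\",\n\t\t\t\"full_journal\" : \"" . json_str($journal) . "\",\n\t\t\t\"title\" : \"" . json_str($title) . "\",\n\t\t\t\"path\" : \"" . json_str($path) . "\"\n\t\t}");
    }
    # Reset per-record fields at the end of each record so they don't leak into the next one
    if (/^ER\s\s/) { ($id, $journal, $title) = ('', '', '') }
}
# Join entries with commas (no trailing comma after the last one) and close out the JSON file
print $outfile (join(",\n", @entries), "\n\t]\n}\n");
close($infile);
close($outfile);
__END__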
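If hand-building the JSON strings gets annoying, another option is to build the upload description as a Perl data structure and let JSON::PP (a core module since Perl 5.14) take care of quoting, escaping and commas. This is a sketch: the field names mirror the script above, but every value below is a placeholder, not real account data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use JSON::PP;

# Placeholder values; field names follow the library.json layout used above
my %upload = (
    username      => 'your_username',
    password      => 'your_password',
    post_username => 'group:test_upload',
    files         => [],
);

# One entry per RIS record that has an L1 file path
push @{ $upload{files} }, {
    article_id   => 'smith2009',
    full_journal => 'Journal of Examples',
    title        => 'A title with "quotes" in it',
    path         => '/Users/me/Papers/smith2009.pdf',
};

# pretty: indented output; canonical: keys sorted so the file is stable between runs
my $json = JSON::PP->new->pretty->canonical->encode(\%upload);

open(my $out, '>', 'library.json') or die "Could not write library.json: $!\n";
print $out $json;
close($out);
```

The trade-off is one more module in exchange for never having to think about trailing commas or stray quotes in titles again.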
