HIP XML Parser (v0.01)

This is some code that I’ve been meaning to make available for public consumption for weeks, but we’ve been up to our necks with our RFID tender at Huddersfield recently.
The basic idea is to convert the XML output of HIP 2 and HIP 3 into a Perl data structure, which you can then use to repurpose your bib data and searches (e.g. to provide an OpenSearch interface).
The first chunk of code I’m making available provides a function (parseBib) that will convert the XML from a full bib page into a data structure. Given the v0.01, you should treat this as alpha code at best!
http://www.daveyp.com/files/stuff/xmlparser/bib.pl
The above Perl script also contains some code to fetch the XML (using LWP) and will also dump (using Data::Dumper) the resulting Perl data structure to an output text file (dump_output.txt). I’ve also uploaded the code as a CGI file that you can run to display the Data::Dumper output – e.g.:
Building an object-oriented database system : the story of O2 /
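The fetch-and-dump step described above might look roughly like this. It is only a sketch, not the code in bib.pl: fetch_xml and dump_to_file are made-up names, and the URL is whatever address makes your own HIP install return XML.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: fetch the raw XML for a bib page.
# $url is an assumption -- supply the address that returns XML on your HIP.
sub fetch_xml {
    my ($url) = @_;
    require LWP::UserAgent;    # loaded on demand, only when a fetch is made
    my $ua       = LWP::UserAgent->new( timeout => 30 );
    my $response = $ua->get($url);
    die 'Fetch failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Hypothetical helper: write a Data::Dumper view of a structure to a file.
sub dump_to_file {
    my ( $data, $filename ) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order between runs
    local $Data::Dumper::Indent   = 1;    # compact but readable nesting
    open my $fh, '>', $filename or die "Cannot write $filename: $!\n";
    print {$fh} Dumper($data);
    close $fh or die "Cannot close $filename: $!\n";
    return;
}
```

With these in place, you would fetch the XML with fetch_xml, hand the result to parseBib, and pass the returned reference to dump_to_file along with a name such as dump_output.txt.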
Just to get you started, here’s some further info…

If $content contains the XML output from HIP, then create the data structure by calling parseBib like this:
my $info = parseBib( \$content );
The above passes $content by reference, but you could also pass it by value with parseBib( $content ) if you wanted to.
parseBib returns a reference to the data structure. Here are some sample bits of code to give you an idea of how to access the data:
# main item title...
print $info->{title};
# ...or...
print $info->{titles}[0];
# main/first author...
print $info->{authors}[0];
# ...and second author (if any)...
print $info->{authors}[1];
# main publisher...
print $info->{publishers}[0];
# series title...
print $info->{bibContents}->{Series}[0];
# ISBN...
print $info->{isbn};
# first subject heading...
print $info->{subjects}[0];
# total number of subject headings...
print $info->{subjectCount};
If something goes pear-shaped, $info->{error} will contain a value, so check whether error is set before you assume that $info actually contains useful information.
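As a rough illustration of that check, here is a small sketch. The describe_bib helper and the two stand-in hashes are made up for the example (a real $info would come from parseBib), and the exact form of the error value is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return a one-line summary, or an error message
# if the structure says that parsing failed.
sub describe_bib {
    my ($info) = @_;

    # Check the error key first, before trusting anything else.
    if ( defined $info->{error} && length $info->{error} ) {
        return "Could not parse bib: $info->{error}";
    }

    my $title = defined $info->{title} ? $info->{title} : '(no title)';
    my $author =
        $info->{authors} && @{ $info->{authors} }
        ? $info->{authors}[0]
        : '(no author)';
    return "$title / $author";
}

# Stand-in structures for illustration only.
my $good = { title => 'Example title', authors => ['Example, Author'] };
my $bad  = { error => 'example failure' };

print describe_bib($good), "\n";
print describe_bib($bad),  "\n";
```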
I’ll post the companion code which parses a set of results soon!
If you have any comments or suggestions, then please post a reply or email me (d.c.pattern{at}hud.ac.uk)
[update]
Just a few more comments:
1) As a rule of thumb, no two HIP installs are the same — your “Call No.” label is someone else’s “Dewey Class” label. These little customisations all affect the XML output from HIP, and will therefore feed into the Perl data structure. So, if you don’t include notes (or local notes) in your HIP display, then you won’t get them appearing in the XML output.
2) Some parts of the Perl data structure are fixed, but others are more fluid and will depend upon the bib you are looking at. The Perl data structure is littered with various counts (e.g. $info->{subjectCount}) that you can check to see if there’s usable data. For example, if you wanted to print out all the subject headings, then any of these three chunks of code will do the job:
# chunk 1
foreach my $subjectHeading ( @{$info->{subjects}} ) {
    print $subjectHeading."\n";
}

# chunk 2
foreach my $loop ( 1 .. $info->{subjectCount} ) {
    my $offset = $loop-1;
    print $info->{subjects}[$offset]."\n";
}

# chunk 3
print join( "\n", @{$info->{subjects}} );
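Going back to point 1, one way to cope with labels that differ between installs is to try several candidates in turn. This is only a sketch: first_bib_value is a made-up helper, the sample data is invented, and it assumes your labels turn up as keys under bibContents in the same way that Series does (check your own Data::Dumper output to confirm).

```perl
use strict;
use warnings;

# Hypothetical helper: return the first value found under any of the
# given labels, or undef if none of them has usable data.
sub first_bib_value {
    my ( $info, @labels ) = @_;
    my $contents = $info->{bibContents} || {};
    for my $label (@labels) {
        my $values = $contents->{$label};
        next unless ref($values) eq 'ARRAY' && @{$values};
        return $values->[0];
    }
    return;    # nothing found under any label
}

# Invented sample data standing in for what parseBib might return.
my $info = {
    bibContents => {
        'Dewey Class' => ['005.75'],
        Series        => ['Example series'],
    },
};

my $class = first_bib_value( $info, 'Call No.', 'Dewey Class' );
print defined $class ? "$class\n" : "no class mark found\n";
```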

2 thoughts on “HIP XML Parser (v0.01)”

Peter says:

11 November 2009 at 5:46 am
Edit: Tags got stripped out because I put brackets around them
It’s an old post, but I started playing with this and it fails because our XML output lacks a whole bunch of elements. Tags like TITLE, AUTHOR, CALL, SUBJECT etc. are all missing. We’re on 3.08 as well and I passed in all the same parameters. Is there something that needs to be set in HIP for this to work?
Dave Pattern says:

13 November 2009 at 7:36 pm
Hi Peter
I think some of the tags depend on how HIP has been configured. Could you post (or email) a link for your OPAC and I’ll have a look?
It might also be worth looking at the Data::Dumper file that should get generated. That’ll show how the XML output from HIP is mapped into the Perl data structure.
