I’m on holiday this week, so it’s giving me a chance to catch up on things.
I bit the bullet last night and installed IE7 beta 2 on my laptop — partly to see if all of our library web sites work okay, but mostly to see how it handles RSS and OpenSearch.
After a virtual prod a couple of weeks ago from Richard Wallis @ Talis, I added an OpenSearch interface to our OPAC (webcat.hud.ac.uk/OpenSearch.xml). Being able to use the a9.com site to do a MetaLib-like cross search of multiple resources (e.g. Wikipedia and the OPAC) is a pretty cool feature, especially if you’re doing research — just bring up an article from Wikipedia and you get to see relevant library holdings at the same time:
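If you fancy doing the same for your own OPAC, an OpenSearch description document is just a small XML file. A minimal sketch (the names and the template URL below are purely illustrative, not copied from our real OpenSearch.xml):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Library Catalogue</ShortName>
  <Description>Keyword search of the library OPAC</Description>
  <Url type="application/rss+xml"
       template="http://example.ac.uk/opensearch?q={searchTerms}&amp;page={startPage?}"/>
</OpenSearchDescription>
```

Point a9.com (or any OpenSearch-aware client) at the file and it knows how to fire keyword searches at your catalogue.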
say hi to “pewbot”!
I’ve knocked together a web service front end for our “people who borrowed this, borrowed that…” data. For want of a better name, I’ve christened it “pewbot” (people who borrowed this).
To use the pewbot service, call it using a URL in the format:
https://library.hud.ac.uk:4128/pewbot/[ISBN]
…where [ISBN] is a 10-digit ISBN (sorry – no ISBN-13 support just yet!)
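If you want to sanity-check an ISBN before calling the service, the ISBN-10 check digit is easy to verify yourself. A small Python sketch (the pewbot service itself is Perl; this is just the standard check-digit rule):

```python
def valid_isbn10(isbn):
    """Check the ISBN-10 check digit: weights 10 down to 1, sum divisible by 11."""
    isbn = isbn.replace("-", "").upper()
    if len(isbn) != 10 or not isbn[:9].isdigit() or isbn[9] not in "0123456789X":
        return False
    # the final character may be "X", which stands for the value 10
    digits = [10 if c == "X" else int(c) for c in isbn]
    return sum(w * d for w, d in zip(range(10, 0, -1), digits)) % 11 == 0

print(valid_isbn10("1904633048"))  # a valid ISBN-10 → True
print(valid_isbn10("garbage"))     # fails the length/character check → False
```

Anything that fails this test is going to get the “invalid ISBN” error back anyway, so you may as well catch it client-side.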
There are 5 possible error messages that might get returned:

- invalid ISBN – the ISBN was not valid
- ISBN not found – the ISBN was not found on our catalogue
- not enough data for ISBN – the ISBN was found on our catalogue, but we don’t have enough circ data to generate any “borrowed that”s
- time out – the service timed out before it completed processing the request
- database unavailable – the backend database is unavailable
To see a sample error, try https://library.hud.ac.uk:4128/pewbot/garbage.
Assuming you don’t get an error, you’ll get a list of ISBNs and frequency counts in the following format:
<isbn count="[COUNT]">[ISBN]</isbn>
…where ISBN is a “borrowed that” item and COUNT is the number of borrowers who borrowed both that ISBN and the original ISBN that you sent to the web service.
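To give an idea of how easy the output is to consume, here’s a rough Python sketch that parses a pewbot-style response (the sample ISBNs are invented, and I’m assuming the response is just a run of those `<isbn>` elements; the real service may wrap them differently):

```python
import re

# invented sample of a pewbot-style response body
sample = '''
<isbn count="12">0140449132</isbn>
<isbn count="7">0586089241</isbn>
<isbn count="3">0749386487</isbn>
'''

def parse_pewbot(body):
    """Return a list of (isbn, count) tuples, highest count first."""
    pairs = re.findall(r'<isbn count="(\d+)">(\d{9}[\dXx])</isbn>', body)
    return sorted(((isbn, int(count)) for count, isbn in pairs),
                  key=lambda p: -p[1])

for isbn, count in parse_pewbot(sample):
    print(isbn, count)
```

From there it’s a short hop to “people who borrowed this also borrowed…” links on an OPAC page.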
Live OPAC search terms display
Another shameless hack inspired by the “Making Visible the Invisible” installation at SPL.
I’ve tweaked HIP to cache keyword search terms and then put together a couple of pages that display successful searches (in tasteful shades of purple and lilac) and failed searches (in gruesome greens).
IE has a nice CSS blur filter, so I’ve coupled that with Ajax to provide a constantly updating web page where new terms appear at the front and then drop slowly to the back, becoming darker and more blurred as they recede (click to view full size versions):
Dewey DNA Profile your checkouts
It has come to my attention that we have a large number of items being removed from our shelves. Whoever is doing this is being extremely clever by not removing too many from any one shelf.
I have long harboured suspicions that this is in some way related to all of those people who keep wandering into the Library. I suspect that our Counter Staff are in cahoots with these so-called “borrowers” as they allow many of them to walk out of the Library unchallenged carrying piles of books.
To aid my investigations of this secretive “Lending Culture”, I have compiled a Dewey DNA Profile of items that were “borrowed” in the last 28 days:
If you wish to create a similar profile for items “borrowed” from your Library, then you may find that the deweydna Perl script can aid you in your sleuthing.
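For the curious, the profiling itself is little more than bucketing call numbers by their Dewey hundreds class. A quick Python sketch of the idea (the deweydna script itself is Perl, and the call numbers below are invented):

```python
from collections import Counter

# invented sample: Dewey call numbers of recently "borrowed" items
checkouts = ["004.678 PAT", "823.912 TOL", "616.61 SMI",
             "004.43 WAL", "823.8 BRO", "004.165 JON"]

# bucket each item by its Dewey hundreds class (000, 100, ..., 900)
profile = Counter(int(float(c.split()[0])) // 100 * 100 for c in checkouts)

# print a crude bar per class to make the "DNA" visible
for klass in sorted(profile):
    print(f"{klass:03d}: {'#' * profile[klass]}")
```

Run over a month of circ data, the bars make it very obvious which classes your “borrowers” are pilfering from.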
Now, if you will excuse me, I must don my deerstalker hat and re-light my pipe.
————-
Seriously tho, this is kinda inspired by the wonderful “Making Visible the Invisible” at Seattle Public Library.
Curse you Superpatron!
It’s way past my bedtime, but the Ann Arbor Superpatron has been planting ideas in my head again…
Recently Checked Out Books feed (in RSS or otherwise)
I’ve not built a feed, but I have come up with these two representations of the most recent check outs (click for larger versions):
1) The last 30 covers to walk out the door…
2) Word Splat!…
…that particular splat is entitled “And Treacle Challenge Yorkshire” and is now on sale for only $395,000 (serious bidders only please!)
Word Splat! is made up of words from the titles of the most recent X check outs (where X is roughly a handful).
I made a typo when initially coding the Word Splat!, and ended up with a random sub selection of words at the top left. I kinda like that, so whatever you get at the top left (if anything) is officially the title of that Splat!
update…
I’ve added three more book collages:
1) 30 Overdue Books
2) 30 Most Recent Requests
3) 30 Most Borrowed Books
The “Overdue Books” are a random selection of items that were due back on the previous day, but have yet to turn up.
A Perfect Library 2.0 Day
Just relaxing with a glass of wine after a very very Library 2.0 day 🙂
With a lot of help from Iman Moradi (blog/flickr), we ran an introduction to Library 2.0 for members of our Subject Teams and Tech Services this afternoon. Then, after a coffee break, we watched the SirsiDynix Institute Weblogs & Libraries: Communication, Conversation, and the Blog People web seminar given by Michael Stephens.
All in all, it’s given us a lot to discuss as we look towards (hopefully) implementing a Library Services or Computing & Library Services weblog. Fingers crossed that next week’s Library 2.0 Web Seminar will be as much fun. I’m keen to run into Stephen Abram at the upcoming SirsiDynix SuperConference in Birmingham as I want to find out what Library 2.0 things the company has in the pipeline — the API layer in the upcoming Horizon 8 release is definitely a welcome step in the right direction.
There was a lot of interest amongst staff in the new NCSU OPAC, especially as a lot of pioneering work on faceted searching was carried out here at Huddersfield by Amanda Tinker and Steve Pollit. I’m hoping that there might be potential for us to implement some of Amanda and Steve’s research into our OPAC.
We’ve also got a plateful of potential new features to unleash on our unsuspecting students — simple renewals via email, RSS feeds, keyword search alerts, “people who borrowed this…”, and more. I’m hoping to see if we can’t do some cool stuff with SMS as well.
2006 is already shaping up to be a busy year for the Library Systems Team — we’ll be involved in the RFID implementation and stock conversion (we’re currently out to tender on this) and we’re also implementing Talis Reading List. One thing I can’t stand is having nothing to do, so I’m not complaining 😀
I noticed Talis have stated that both John Blyberg and myself are developing these things purely for our own patrons/students. Whilst that’s true to an extent (after all, I work for Huddersfield not SirsiDynix), we’re both freely sharing much of our code so that other Innovative and SirsiDynix customers can play around with it if they want to. Librarians have a long and proud tradition of sharing freely and I don’t intend to buck that trend just yet.
Speaking of which, I’ve been busy working on a Perl module to process the XML output from HIP 2.x/3.x and turn it into a simple Perl data structure. The XML output from HIP gives you pretty much all the information you need, but the structure is a little unwieldy. I’m hopeful the module will make it easier to quickly develop cool stuff like RSS feeds and OpenSearch interfaces from the OPAC. Once I’ve got the module finished (and posted on this site), I’ll also use it to underpin the REST interface. In turn, that should make the REST code more manageable and I might be able to get that code to a stage where I’d be happy to make it available to the SirsiDynix community.
Unfortunately I’m currently suffering from a mild case of tendonitis in my right arm and hand, so I’m not doing as much coding as normal until it clears up. Still, as long as I can lift a glass of wine and snuggle up to Bry on the sofa in front of the TV, I’m happy 🙂
“Did You Mean?” – part 2
I’ve been keeping an eye on the search terms and suggestions over the last few days, and I noticed that quite a few people are getting failed keyword searches simply because there’s nothing that matches the term.
In particular, we’ve got a lot of students searching for diuretics. As there are no matches found, the spell checker jumps in and suggests things like dietetics, natriuretic or diabetics. That got me wondering if there was a way of generating suggestions relevant to diuretics, rather than words that look or sound like it.
As a prototype, I’ve modified the Perl script to query the Answers.com web site and parse the response. The hyperlink text is compared with known keywords in the subject index and a tag cloud is generated (click to view larger version):
I’ve named it “Serendipity” simply because I’ve no idea what’s going to appear in there — the suggested keywords might be relevant (Hypertension and Caffeine) or they may be too broad (Medicine) to be of use.
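The matching step is simple enough once you’ve scraped the link text. Here’s a rough Python sketch of the idea (the real script is Perl, and the link text and subject index below are invented):

```python
from collections import Counter

# invented data: link text scraped from an encyclopaedia page for the failed
# term, plus a set of keywords known to exist in our subject index
page_links = ["Hypertension", "Caffeine", "Medicine", "Edema",
              "Pharmacology", "Hypertension", "Caffeine", "Kidney"]
subject_index = {"hypertension", "caffeine", "medicine", "pharmacology"}

# keep only links that are known subject keywords, weighted by frequency
counts = Counter(t.lower() for t in page_links if t.lower() in subject_index)

# scale the frequencies into font sizes for a simple tag cloud
lo, hi = 80, 200  # percent
top = counts.most_common()
for term, n in top:
    size = lo + (hi - lo) * (n - 1) // max(top[0][1] - 1, 1)
    print(f'<span style="font-size:{size}%">{term}</span>')
```

Anything on the page that isn’t a known subject keyword (Edema and Kidney in this sample) simply gets dropped, which is what keeps the cloud clickable: every suggestion is guaranteed to bring back results.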
“Did You Mean?” for HIP 2 or 3
[update: we’re now using Aspell and the Text::Aspell module]
HIP 4 contains a spellchecking “did you mean?” facility which, although not as powerful as Google’s, is certainly a step in the right direction. One of the basic rules of designing any web-based system that supports searching or browsing is to always give the user choices — even if they have gone down a virtual one way street and hit a dead end.
Unfortunately it’s going to be another few months before SirsiDynix release the UK enhanced version of HIP 4 for beta testing, so I thought I’d have a stab at adding the facility to our existing HIP 3.04 server.
Fortunately Perl provides a number of modules for this kind of thing, including String::Approx, Text::Metaphone, and Text::Soundex.
String::Approx is good at catching simple typos (e.g. Hudersfield or embarassement) whereas the latter two modules attempt to find “sounds like” matches — for example, when given batched, Text::Metaphone suggests scratched, thatched and matched.
To set something like this up, you need to have a word list. You could download one (e.g. a list of dictionary words), but it makes more sense to generate your own — in my case I’ve parsed Horizon’s title table to create a list of keywords and frequency. That’s given me a list of nearly 67,000 keywords that all bring up matches in either a general or title keyword search.
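Generating that sort of list is straightforward. A Python sketch of the idea (the titles and the stopword list below are invented; the real list came from Horizon’s title table via Perl):

```python
import re
from collections import Counter

# invented bib titles standing in for Horizon's title table
titles = ["The railways of Yorkshire", "Yorkshire railway stations",
          "Railway engineering", "A history of Huddersfield"]
stopwords = {"the", "of", "a", "and", "an"}

# tokenise each title and tally keyword frequencies
freq = Counter()
for t in titles:
    freq.update(w for w in re.findall(r"[a-z]+", t.lower())
                if w not in stopwords)

print(freq.most_common())
```

The frequency counts come in handy later: when two candidate suggestions are equally close to the typo, you can rank the one that appears in more titles first.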
Once I’d got the keyword list, I ran it through Text::Metaphone and Text::Soundex to generate the relevant phonetic values — doing that in advance means that your spellchecking code can run faster as it doesn’t need to generate the values again for each incoming request.
Next up, I wrote an Apache mod_perl handler to create the suggestions from a given search term. As String::Approx often gives the best results, the term is run against that first. If no suggestions are found, the term is run against Text::Metaphone and then Text::Soundex in turn to find broader “sounds like” suggestions.
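Here’s a rough Python sketch of the same cascade. Python’s stdlib has no Metaphone, so difflib stands in for String::Approx and a simplified hand-rolled Soundex (it skips the special h/w rule) stands in for Text::Soundex; the keyword list is invented:

```python
import difflib
from collections import defaultdict

def soundex(word):
    """Simplified Soundex code, e.g. 'Robert' -> 'R163'."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = word.lower()
    out, prev = [], codes.get(word[0], "")
    for ch in word[1:]:
        d = codes.get(ch, "")
        if d and d != prev:    # skip runs of the same code
            out.append(d)
        prev = d
    return (word[0].upper() + "".join(out) + "000")[:4]

keywords = ["huddersfield", "yorkshire", "scratched",
            "thatched", "matched", "diuretics"]

# precompute the phonetic codes once, so each request is a cheap dict lookup
phonetic = defaultdict(list)
for kw in keywords:
    phonetic[soundex(kw)].append(kw)

def suggest(term, n=5):
    # 1) close string matches first (stands in for String::Approx)
    close = [w for w in difflib.get_close_matches(term, keywords, n=n)
             if w != term]
    if close:
        return close
    # 2) broader "sounds like" fallback (stands in for Text::Soundex)
    return [w for w in phonetic.get(soundex(term), []) if w != term]

print(suggest("hudersfield"))
print(suggest("batched"))
```

The precomputed `phonetic` index is the pay-off from the previous step: the expensive work happens once at build time, not on every failed search.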
Assuming that one of the modules comes up with at least one suggestion, it gets displayed in HIP:
There’s still more work to do, as the suggestions only appear for a failed single keyword. Handling two misspelled words (or more) is technically challenging — what’s the best method of presenting all the possible options to a user? You could just give them a list of possibilities, but I’d prefer to give them something they can click on to initiate a new search.
Amazon.co.uk Greasemonkey script
This Greasemonkey script (based on Carrick Mundell’s script) for the Firefox web browser adds details of (and links to) our library holdings to the Amazon.co.uk site:
To use the script, you’ll need to do the following:
1) Make sure you have the latest version of Firefox installed
2) Go to http://greasemonkey.mozdev.org/ and install the Greasemonkey extension
3) Close down all of your Firefox windows and then restart the browser
4) You should now see a little smiling Greasemonkey at the bottom right hand corner of the browser
5) Now go to https://library.hud.ac.uk/firefox/, click on the hudamazon.user.js script, and then click on the “Install” button.
6) Now go to Amazon.co.uk and search for some books!
~ o ~
For reference, here are the various messages that might appear:
1) Available…
Copies of the item are available.
2) Available in electronic format…
Access is available to an electronic version of the item.
3) Due back…
The item is currently on loan.
4) Other editions of this title are available…
We don’t have this specific edition of the item, but we do have others – click on the link to show them all.
5) ISBN not found…
The Amazon ISBN doesn’t match anything on our catalogue, but you can click on the link to start a title keyword search.
REST output from Huddersfield’s catalogue
Inspired by John Blyberg’s middleware, which provides REST output from the Ann Arbor catalogue, I’ve put together something similar for ours:
https://library.hud.ac.uk/rest/info.html
Here are some sample results:
http://161.112.232.203:4128/rest/keyword/author/tolkien
http://161.112.232.203:4128/rest/keyword/subject/yorkshire railways
http://161.112.232.203:4128/rest/record/id/411760
http://161.112.232.203:4128/rest/record/isbn/1904633048
I’ve decided to include more information in the output than John did — primarily because I want to use the REST output to power a staff OPAC. Amongst other things, the output includes:
- borrowing suggestions (based on historical circ data)
- links to cover scans and thumbnails
- loan history data (at both the bib and item level)
- “other edition” links (using the OCLC xISBN service)
The output is littered with xlink links, which can be used to issue further REST requests.
For those of you who like gory techie details, the REST output is generated by approx 1,000 lines of Perl coded as a single mod_perl handler. The code works by fetching the XML output from HIP and then parsing it to strip out anything that’s not required in the REST output. At the same time, extra information is pulled in from other sources (e.g. direct from the Horizon database, and from the xISBN service).
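The strip-and-enrich approach looks roughly like this in Python (the production code is Perl, and all of the element names below are invented; HIP’s real XML differs, which is part of why the code doesn’t port easily):

```python
import xml.etree.ElementTree as ET

# invented HIP-style response standing in for the real (much messier) XML
hip_xml = """<searchresponse>
  <hit><id>411760</id><title>The Hobbit</title>
       <session>cache-token-1234</session></hit>
</searchresponse>"""

KEEP = {"id", "title"}  # the only fields the REST output needs

root = ET.fromstring(hip_xml)
rest = ET.Element("results")
for hit in root.iter("hit"):
    rec = ET.SubElement(rest, "record")
    for child in hit:
        if child.tag in KEEP:          # strip anything not required
            rec.append(child)
    # graft in extra data fetched from other sources (circ tables, xISBN)
    ET.SubElement(rec, "editions").set("source", "xISBN")

out = ET.tostring(rest, encoding="unicode")
print(out)
```

The nice side effect of rebuilding the document rather than editing it in place is that anything you forget to whitelist (like the session token above) is dropped by default.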
Unfortunately, looking at the XML output from other HIP servers, I doubt the code can quickly be used by other Horizon sites. Also, not everyone has their own mod_perl server to run the code on. However, if anyone wants to play around with the code then please send me an email (d.c.pattern [at] hud.ac.uk). There’s also a cloud on the Horizon (pun intended) relating to getting XML output out of HIP 4 — it seems Dynix have chosen to make it harder (not easier) to do this with the latest version of their OPAC (boo! hiss!).
I’ve already said that I’m planning to use the REST output to power a staff OPAC, but what I’m really keen on is letting our students loose on the data for use in final year projects, etc. I’m also planning to use the output for a revised version of the Amazon Greasemonkey script.
The University is gradually moving towards a portal environment and I’m hoping the REST output will come in handy for dropping live catalogue content into other systems.
There’s still quite a bit of work to do, especially with adding information for journals. We’ve already got live journal information from our SFX OpenURL server appearing in our OPAC, so I might as well include that in the REST output too: