Another shameless hack inspired by the “Making Visible the Invisible” at SPL.
I’ve tweaked HIP to cache keyword search terms and then put together a couple of pages that display successful searches (in tasteful shades of purple and lilac) and failed searches (in gruesome greens).
IE has a nice CSS blur, so I’ve coupled that with Ajax to provide a constantly updating web page where new terms appear at the front and then drop slowly to the back, becoming more and more blurred and darker as they recede (click to view full size versions):
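For anyone wondering about the plumbing, the Ajax page just polls a little server-side script for the latest terms. Here's a minimal sketch of what that script could look like (the log file path and its tab-separated format are stand-ins for illustration, not the actual HIP tweak):

```perl
#!/usr/bin/perl
# Minimal sketch: emit the most recent cached search terms as XML for
# the Ajax page to poll. The log path and its "epoch<tab>term<tab>hits"
# format are hypothetical; adjust to match however you cache the terms.
use strict;
use warnings;

my $log_file = '/var/log/hip/keyword_searches.log';    # hypothetical
my $max      = 20;

open my $fh, '<', $log_file or die "can't open $log_file: $!";
my @lines = <$fh>;
close $fh;

# newest entries are at the end of the log
my $n = @lines < $max ? scalar @lines : $max;
my @recent = reverse @lines[ -$n .. -1 ];

print "Content-type: text/xml\n\n";
print "<searches>\n";
for (@recent) {
    chomp;
    my ( $epoch, $term, $hits ) = split /\t/;
    next unless defined $hits;
    my $status = $hits > 0 ? 'success' : 'failed';
    $term =~ s/&/&amp;/g; $term =~ s/</&lt;/g; $term =~ s/>/&gt;/g;
    print qq{  <term status="$status">$term</term>\n};
}
print "</searches>\n";
```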
Dewey DNA Profile your checkouts
It has come to my attention that we have a large number of items being removed from our shelves. Whoever is doing this is being extremely clever by not removing too many from any one shelf.
I have long harboured suspicions that this is in some way related to all of those people who keep wandering into the Library. I suspect that our Counter Staff are in cahoots with these so-called “borrowers” as they allow many of them to walk out of the Library unchallenged carrying piles of books.
To aid my investigations of this secretive “Lending Culture”, I have compiled a Dewey DNA Profile of items that were “borrowed” in the last 28 days:
If you wish to create a similar profile for items “borrowed” from your Library, then you may find that the Perl deweydna script can aid you in your sleuthing.
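For fellow sleuths, the gist of the script is along these lines (the connection string, table and column names are hypothetical stand-ins for wherever your LMS keeps its circulation data):

```perl
#!/usr/bin/perl
# Sketch of a "Dewey DNA" profile: bucket the call numbers of items
# charged out in the last 28 days into the ten Dewey hundreds classes.
# The DBI connection string and the SQL are hypothetical stand-ins.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:Sybase:server=horizon', 'user', 'pass',
    { RaiseError => 1 } );

my $sth = $dbh->prepare(q{
    SELECT call_number
    FROM   recent_charges
    WHERE  charge_date >= dateadd(day, -28, getdate())
});
$sth->execute;

my %profile;
while ( my ($call) = $sth->fetchrow_array ) {
    next unless defined $call && $call =~ /^(\d)\d\d/;    # Dewey only
    $profile{ $1 . '00' }++;
}
$dbh->disconnect;

# crude ASCII rendering of the profile
for my $class ( sort keys %profile ) {
    printf "%s  %s (%d)\n", $class,
        '#' x ( 1 + int( $profile{$class} / 10 ) ), $profile{$class};
}
```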
Now, if you will excuse me, I must don my deerstalker hat and re-light my pipe.
————-
Seriously tho, this is kinda inspired by the wonderful “Making Visible the Invisible” at Seattle Public Library.
Curse you Superpatron!
It’s way past my bedtime, but the Ann Arbor Superpatron has been planting ideas in my head again…
Recently Checked Out Books feed (in RSS or otherwise)
I’ve not built a feed, but I have come up with these two representations of the most recent check outs (click for larger versions):
1) The last 30 covers to walk out the door…
2) Word Splat!…
…that particular splat is entitled “And Treacle Challenge Yorkshire” and is now on sale for only $395,000 (serious bidders only please!)
Word Splat! is made up of words from the titles of the most recent X number of check outs (where X is roughly a handful).
I made a typo when initially coding the Word Splat!, and ended up with a random sub-selection of words at the top left. I kinda like that, so whatever you get at the top left (if anything) is officially the title of that Splat!
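For anyone who fancies making their own splat, the general idea boils down to scattering title words at random positions and sizes. This sketch is a from-memory simplification rather than the actual code (and the @titles data is hard-coded for illustration; in practice it would come from your circulation data):

```perl
#!/usr/bin/perl
# Word Splat! sketch: scatter the words from the most recent handful
# of checked-out titles across an HTML page at random positions and
# font sizes. Titles are hard-coded placeholders for illustration.
use strict;
use warnings;

my @titles = (
    'Yorkshire pudding and treacle tart',
    'The great British baking challenge',
);

my @words = map { split /\s+/ } @titles;

print "Content-type: text/html\n\n<html><body>\n";
for my $word (@words) {
    my ( $x, $y ) = ( int rand 600, int rand 400 );    # position (px)
    my $size = 10 + int rand 40;                       # font size (px)
    printf qq{<span style="position:absolute; left:%dpx; top:%dpx; }
         . qq{font-size:%dpx">%s</span>\n}, $x, $y, $size, $word;
}
print "</body></html>\n";
```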
update…
I’ve added three more book collages:
1) 30 Overdue Books
2) 30 Most Recent Requests
3) 30 Most Borrowed Books
The “Overdue Books” are a random selection of items that were due back on the previous day, but have yet to turn up.
A Perfect Library 2.0 Day
Just relaxing with a glass of wine after a very very Library 2.0 day 🙂
With a lot of help from Iman Moradi (blog/flickr), we ran an introduction to Library 2.0 for members of our Subject Teams and Tech Services this afternoon. Then, after a coffee break, we watched the SirsiDynix Institute Weblogs & Libraries: Communication, Conversation, and the Blog People web seminar given by Michael Stephens.
All in all, it’s given us a lot to discuss as we look towards (hopefully) implementing a Library Services or Computing & Library Services weblog. Fingers crossed that next week’s Library 2.0 Web Seminar will be as much fun. I’m keen to run into Stephen Abram at the upcoming SirsiDynix SuperConference in Birmingham as I want to find out what Library 2.0 things the company has in the pipeline — the API layer in the upcoming Horizon 8 release is definitely a welcome step in the right direction.
There was a lot of interest amongst staff in the new NCSU OPAC, especially as a lot of the pioneering work on faceted searching was carried out here at Huddersfield by Amanda Tinker and Steve Pollitt. I’m hoping that there might be potential for us to implement some of Amanda and Steve’s research into our OPAC.
We’ve also got a plateful of potential new features to unleash on our unsuspecting students — simple renewals via email, RSS feeds, keyword search alerts, “people who borrowed this…”, and more. I’m hoping to see if we can’t do some cool stuff with SMS as well.
2006 is already shaping up to be a busy year for the Library Systems Team — we’ll be involved in the RFID implementation and stock conversion (we’re currently out to tender on this) and we’re also implementing Talis Reading List. One thing I can’t stand is having nothing to do, so I’m not complaining 😀
I noticed Talis have stated that both John Blyberg and myself are developing these things purely for our own patrons/students. Whilst that’s true to an extent (after all, I work for Huddersfield not SirsiDynix), we’re both freely sharing much of our code so that other Innovative and SirsiDynix customers can play around with it if they want to. Librarians have a long and proud tradition of sharing freely and I don’t intend to buck that trend just yet.
Speaking of which, I’ve been busy working on a Perl module to process the XML output from HIP 2.x/3.x and turn it into a simple Perl data structure. The XML output from HIP gives you pretty much all the information you need, but the structure is a little unwieldy. I’m hopeful the module will make it easier to quickly develop cool stuff like RSS feeds and OpenSearch interfaces from the OPAC. Once I’ve got the module finished (and posted on this site), I’ll also use it to underpin the REST interface. In turn, that should make the REST code more manageable and I might be able to get that code to a stage where I’d be happy to make it available to the SirsiDynix community.
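To give a flavour of what I'm aiming for, here's a stripped-down sketch using XML::Simple; the URL parameters and the element names in the walk are illustrative placeholders, as HIP's real XML structure is rather more convoluted (which is the whole point of the module):

```perl
#!/usr/bin/perl
# Sketch: fetch the XML output from a HIP 2.x/3.x search and turn it
# into a simple Perl data structure. The URL parameters and element
# names below are hypothetical placeholders, not HIP's actual schema.
use strict;
use warnings;
use LWP::Simple qw(get);
use XML::Simple qw(XMLin);

my $term = 'tolkien';
my $url  = "http://opac.example.ac.uk/ipac20/ipac.jsp?index=.GW"
         . "&term=$term&GetXML=1";                    # hypothetical

my $xml = get($url) or die "no response from HIP";
my $ref = XMLin( $xml, ForceArray => ['row'] );

# walk the (hypothetical) structure into something friendlier
for my $row ( @{ $ref->{searchresults}{row} || [] } ) {
    printf "%s / %s\n", $row->{title}, $row->{author};
}
```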
Unfortunately I’m currently suffering from a mild case of tendonitis in my right arm and hand, so I’m not doing as much coding as normal until it clears up. Still, as long as I can lift a glass of wine and snuggle up to Bry on the sofa in front of the TV, I’m happy 🙂
“Did You Mean?” – part 2
I’ve been keeping an eye on the search terms and suggestions over the last few days, and I’ve noticed that quite a few people are getting failed keyword searches simply because there’s nothing that matches the term.
In particular, we’ve got a lot of students searching for diuretics. As there are no matches found, the spell checker jumps in and suggests things like dietetics, natriuretic or diabetics. That got me wondering if there was a way of generating suggestions relevant to diuretics, rather than words that look or sound like it.
As a prototype, I’ve modified the Perl script to query the Answers.com web site and parse the response. The hyperlink text is compared with known keywords in the subject index and a tag cloud is generated (click to view larger version):
I’ve named it “Serendipity” simply because I’ve no idea what’s going to appear in there — the suggested keywords might be relevant (Hypertension and Caffeine) or they may be too broad (Medicine) to be of use.
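Here's a simplified sketch of the prototype's approach (the Answers.com URL pattern and the keyword file format are placeholders):

```perl
#!/usr/bin/perl
# "Serendipity" sketch: fetch the Answers.com page for a failed search
# term, pull out the text of each hyperlink, and keep only those that
# match known keywords from the subject index. URL pattern and keyword
# file format are simplified placeholders.
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TokeParser;

my $term = 'diuretics';

# load the known subject keywords (one per line -- placeholder format)
open my $fh, '<', 'subject_keywords.txt' or die $!;
my %known = map { chomp; lc($_) => 1 } <$fh>;
close $fh;

my $html = get("http://www.answers.com/$term") or die "no response";

# tally link text that matches a known subject keyword
my %cloud;
my $p = HTML::TokeParser->new( \$html );
while ( my $tag = $p->get_tag('a') ) {
    my $text = lc $p->get_trimmed_text('/a');
    $cloud{$text}++ if $known{$text};
}

# biggest counts first -- these become the largest words in the cloud
for my $word ( sort { $cloud{$b} <=> $cloud{$a} } keys %cloud ) {
    print "$word ($cloud{$word})\n";
}
```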
“Did You Mean?” for HIP 2 or 3
[update: we’re now using Aspell and the Text::Aspell module]
HIP 4 contains a spellchecking “did you mean?” facility which, although not as powerful as Google’s, is certainly a step in the right direction. One of the basic rules of designing any web-based system that supports searching or browsing is to always give the user choices — even if they have gone down a virtual one-way street and hit a dead end.
Unfortunately it’s going to be another few months before SirsiDynix release the UK enhanced version of HIP 4 for beta testing, so I thought I’d have a stab at adding the facility to our existing HIP 3.04 server.
Fortunately Perl provides a number of modules for this kind of thing, including String::Approx, Text::Metaphone, and Text::Soundex.
String::Approx is good at catching simple typos (e.g. Hudersfield or embarassement) whereas the latter two modules attempt to find “sounds like” matches — for example, when given batched, Text::Metaphone suggests scratched, thatched and matched.
To set something like this up, you need to have a word list. You could download one (e.g. a list of dictionary words), but it makes more sense to generate your own — in my case, I’ve parsed Horizon’s title table to create a list of keywords and their frequencies. That’s given me a list of nearly 67,000 keywords that all bring up matches in either a general or title keyword search.
Once I’d got the keyword list, I ran it through Text::Metaphone and Text::Soundex to generate the relevant phonetic values — doing that in advance means that your spellchecking code can run faster as it doesn’t need to generate the values again for each incoming request.
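The pre-processing step looks roughly like this, assuming a plain tab-separated keyword/frequency file (a simplification of what actually comes out of the title table):

```perl
#!/usr/bin/perl
# Pre-compute the Metaphone and Soundex values for every keyword so the
# spellcheck handler doesn't need to generate them per request. Assumes
# a "keyword<tab>frequency" input file (a simplification).
use strict;
use warnings;
use Text::Metaphone qw(Metaphone);
use Text::Soundex   qw(soundex);

open my $in,  '<', 'keywords.txt'          or die $!;
open my $out, '>', 'keywords_phonetic.txt' or die $!;

while (<$in>) {
    chomp;
    my ( $word, $freq ) = split /\t/;
    next unless defined $word && defined $freq && length $word;
    my $sdx = soundex($word);
    $sdx = '' unless defined $sdx;   # soundex() gives undef for non-words
    print {$out} join( "\t", $word, $freq, Metaphone($word), $sdx ), "\n";
}
close $in;
close $out;
```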
Next up, I wrote an Apache mod_perl handler to create the suggestions from a given search term. As String::Approx can often give the best results, the term is run against that first. If no suggestions are found, the term is run against Text::Metaphone and then Text::Soundex in turn to find broader “sounds like” suggestions.
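Stripped of the mod_perl wrapping, the core of the handler is a cascade along these lines (it assumes the pre-computed lists from the previous step have already been loaded into the structures passed in):

```perl
# Sketch of the suggestion cascade: try String::Approx first (best for
# simple typos), then fall back to the broader Metaphone and Soundex
# "sounds like" matches. The keyword list and the two phonetic lookup
# hashes come from the pre-processing step above.
use strict;
use warnings;
use String::Approx  qw(amatch);
use Text::Metaphone qw(Metaphone);
use Text::Soundex   qw(soundex);

sub suggest {
    my ( $term, $keywords, $by_metaphone, $by_soundex ) = @_;

    # 1) approximate string matches catch simple typos
    my @hits = amatch( $term, ['i'], @$keywords );
    return @hits if @hits;

    # 2) broader "sounds like" matches via Metaphone...
    my $meta = Metaphone($term);
    return @{ $by_metaphone->{$meta} } if $by_metaphone->{$meta};

    # 3) ...and finally Soundex
    my $sdx = soundex($term);
    return @{ $by_soundex->{$sdx} } if defined $sdx && $by_soundex->{$sdx};

    return;    # no suggestions found
}
```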
Assuming that one of the modules comes up with at least one suggestion, then that gets displayed in HIP:
There’s still more work to do, as the suggestions only appear for a failed single keyword. Handling two misspelled words (or more) is technically challenging — what’s the best method of presenting all the possible options to a user? You could just give them a list of possibilities, but I’d prefer to give them something they can click on to initiate a new search.
Amazon.co.uk Greasemonkey script
This Greasemonkey script (based on Carrick Mundell’s script) for the Firefox web browser adds details of (and links to) our library holdings to the Amazon.co.uk site:
To use the script, you’ll need to do the following:
1) Make sure you have the latest version of Firefox installed
2) Go to http://greasemonkey.mozdev.org/ and install the Greasemonkey extension
3) Close down all of your Firefox browser windows and then restart the browser
4) You should now see a little smiling Greasemonkey at the bottom right hand corner of the browser
5) Now go to https://library.hud.ac.uk/firefox/, click on the hudamazon.user.js script, and then click on the “Install” button.
6) Now go to Amazon.co.uk and search for some books!
~ o ~
For reference, here are the various messages that might appear:
1) Available…
Copies of the item are available.
2) Available in electronic format…
Access is available to an electronic version of the item.
3) Due back…
The item is currently on loan.
4) Other editions of this title are available…
We don’t have this specific edition of the item, but we do have others – click on the link to show them all.
5) ISBN not found…
The Amazon ISBN doesn’t match anything on our catalogue, but you can click on the link to start a title keyword search.
REST output from Huddersfield’s catalogue
Inspired by John Blyberg’s middleware, which provides REST output from the Ann Arbor catalogue, I’ve put together something similar for ours:
https://library.hud.ac.uk/rest/info.html
Here are some sample results:
http://161.112.232.203:4128/rest/keyword/author/tolkien
http://161.112.232.203:4128/rest/keyword/subject/yorkshire railways
http://161.112.232.203:4128/rest/record/id/411760
http://161.112.232.203:4128/rest/record/isbn/1904633048
I’ve decided to include more information in the output than John did — primarily because I want to use the REST output to power a staff OPAC. Amongst other things, the output includes:
- borrowing suggestions (based on historical circ data)
- links to cover scans and thumbnails
- loan history data (at both the bib and item level)
- “other edition” links (using the OCLC xISBN service)
The output is littered with xlink links, which can be used to issue further REST requests.
For those of you who like gory techie details, the REST output is generated by approx 1,000 lines of Perl coded as a single mod_perl handler. The code works by fetching the XML output from HIP and then parsing it to strip out anything that’s not required in the REST output. At the same time, extra information is pulled in from other sources (e.g. direct from the Horizon database, and from the xISBN service).
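For the terminally curious, the basic shape is something like this (mod_perl 1 style, with the HIP-fetching and enrichment guts waved away behind a hypothetical hip_search() helper):

```perl
package Huddersfield::REST;
# Sketch of the handler's basic shape (mod_perl 1 style). It parses
# /rest/<search>/<index>/<value> out of the URI, fetches the matching
# XML from HIP, and would then strip and enrich it. Those guts are
# waved away here: hip_search() is a hypothetical helper.
use strict;
use warnings;
use Apache::Constants qw(OK NOT_FOUND);

sub handler {
    my $r = shift;

    # e.g. /rest/keyword/author/tolkien
    my ( undef, undef, $search, $index, $value ) = split m{/}, $r->uri;
    return NOT_FOUND unless $search && $index && defined $value;

    my $xml = hip_search( $search, $index, $value );
    return NOT_FOUND unless defined $xml;

    # ...strip unwanted elements, add circ data, xISBN "other edition"
    # links, cover/thumbnail links, xlink attributes, etc...

    $r->content_type('text/xml');
    $r->send_http_header;
    $r->print($xml);
    return OK;
}

sub hip_search {
    my ( $search, $index, $value ) = @_;
    # hypothetical: fetch the raw XML for this query from the HIP server
    return undef;
}

1;
```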
Unfortunately, looking at the XML output from other HIP servers, I doubt the code can quickly be used by other Horizon sites. Also, not everyone has their own mod_perl server to run the code on. However, if anyone wants to play around with the code then please send me an email (d.c.pattern [at] hud.ac.uk). There’s also a cloud on the Horizon (pun intended) relating to getting XML output out of HIP 4 — it seems Dynix have chosen to make it harder (not easier) to do this with the latest version of their OPAC (boo! hiss!).
I’ve already said that I’m planning to use the REST output to power a staff OPAC, but what I’m really keen on is letting our students loose on the data for use in final year projects, etc. I’m also planning to use the output for a revised version of the Amazon Greasemonkey script.
The University is gradually moving towards a portal environment and I’m hoping the REST output will come in handy for dropping live catalogue content into other systems.
There’s still quite a bit of work to do, especially with adding information for journals. We’ve already got live journal information from our SFX OpenURL server appearing in our OPAC, so I might as well include that in the REST output too:
Taggytastic – part 3
God bless Bryony – she puts up with a lot! Last night she had to put up with me adding a tagging system to our test HIP server (wwwlibrarycat.hud.ac.uk).
To be honest, the amount of interest in the subject keyword cloud took me by surprise, and it was fascinating to read all of the other blogs that picked it up. A number of blogs made the very valid point that it wasn’t a true tag cloud – the tags were created using the existing subject keywords and not from tags added by our users.
I began to feel that tag clouds in OPACs are a true “chicken & egg” scenario — to be able to add that kind of functionality to our live OPAC, I need to be able to prove that it’s a valid and worthwhile new feature… but to do that, I need to have already implemented it and to have had a healthy number of tags added by our users. I think this is backed up by the sheer number of people out there who think it’s a “good idea” but who are not in a position to start the ball rolling. Obviously, it’d be a different story if our OPACs already had tag functionality “out of the box”!
I’m sure there are a dozen ways of implementing user tags in an OPAC, but here’s mine…
- I want to have some degree of control over the tagging (yes I know, I should be trusting our users!), so on a live system I’d need a user to log into the OPAC before they could add tags.
- If they’ve logged in, then I can let them add new tags or delete ones they’ve previously added.
- I want to be able to make tag suggestions – for example, if a user wants to tag “Web Site Design using XML and HTML”, then I want to be able to suggest relevant tags that other users have already added (e.g. xml, html, web design, etc). Unless other users have already tagged that particular book, how can I generate suggestions?
There’s more than one way of doing this, but I decided to do it by adding tags to the subject headings as well as to books. In other words, when someone tags a book with html then I also add that tag to all of the subject headings for that book as well. Then, when someone wants to tag a different book that has one of those subject headings, I can suggest html as a possible tag.
This does mean that irrelevant tags can get added to subject headings but (in theory) over time the relevant tags will outweigh the irrelevant ones for each heading. As well as using the tags linked to subject headings, I also take into account any existing tags for that book.
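In case it helps anyone thinking of doing something similar, here's one way the storage and suggestions could be modelled. The tag_link table, its columns, and the subject_headings_for() helper are hypothetical, not my actual schema:

```perl
# Sketch of the tag/suggestion logic. A single tag_link table joins a
# tag to either a bib or a subject heading; the table, columns, and
# subject_headings_for() helper are hypothetical stand-ins.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:mysql:opac_tags', 'user', 'pass',
    { RaiseError => 1 } );

# when a user tags a bib, also attach the tag to its subject headings
sub add_tag {
    my ( $bib, $tag, $user ) = @_;
    $dbh->do(
        'INSERT INTO tag_link (type, target, tag, user) VALUES (?,?,?,?)',
        undef, 'bib', $bib, $tag, $user );
    for my $heading ( subject_headings_for($bib) ) {
        $dbh->do(
            'INSERT INTO tag_link (type, target, tag, user) VALUES (?,?,?,?)',
            undef, 'subject', $heading, $tag, $user );
    }
}

# suggestions for a bib: ranked tags attached to its subject headings
sub suggest_tags {
    my ($bib)    = @_;
    my @headings = subject_headings_for($bib);
    return unless @headings;
    my $in = join ',', ('?') x @headings;
    return @{ $dbh->selectcol_arrayref(
        "SELECT tag FROM tag_link
         WHERE  type = 'subject' AND target IN ($in)
         GROUP  BY tag ORDER BY COUNT(*) DESC LIMIT 10",
        undef, @headings ) };
}

sub subject_headings_for {
    my ($bib) = @_;
    # hypothetical: look up the bib's subject headings in the catalogue
    return @{ $dbh->selectcol_arrayref(
        'SELECT heading FROM bib_subject WHERE bib = ?', undef, $bib ) };
}
```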
It’s been interesting watching the beta tagging feature on Amazon, although (it being early days) they seem to be getting a huge number of irrelevant tags – for example, the latest Harry Potter is tagged with things like good, kerri, brothers present and Jill. How long will it take before the irrelevant tags on Amazon sink away and the relevant ones rise to the top? More to the point, would library patrons be more likely to use relevant tags on an OPAC? Again, it’s the “chicken & egg” — without a healthy critical mass of relevant tags, how do you prove to cynical members of staff that tagging is a good thing and not a method of adding virtual graffiti to the OPAC?
I’ve still got more work to do with our initial attempt at OPAC tagging — at the moment, all you can do is add tags. In particular, I don’t have a method of selecting a tag and then showing all the items that have been tagged… but I’m hoping that Casey Durfee (Seattle Public Library) will come to my rescue. Spookily, I woke up this morning wondering how on earth I could hack our HIP server to allow this and then found that Casey had already emailed me to let me know it was possible!
I’m fairly happy with the suggestions based on subject headings – for example, when I try to add the very first tag to Internet technology and e-commerce, the following suggestions appear:
The suggestions are based on items that have been previously tagged which have the subject headings “Web site development” and/or “Electronic commerce”. Some of the suggestions are more relevant than others (e.g. ecommerce), but at least I’m not getting anything too weird appearing (yet!). Obviously there’s nothing to stop the user adding a new tag, but hopefully having a few suggestions available will help them out.
This should also have the added benefit that each of the subject headings will get a healthy selection of (hopefully relevant) tags attached to them. For example, here are some of the tags I’ve currently got attached to the “Web site development” subject heading (in ranked order):
- web services
- xml
- java
- microsoft
- dotnet
- asp
- html
- ecommerce
- php
- macromedia
- portals
- security
- soap
- lucene
- web design
- databases
…wouldn’t it be cool if the OPAC could tie those together when the user does a subject keyword search? For example, if they searched for XML, then the OPAC could suggest that they might be interested in other books under the “Web site development” subject heading. Or, if they were looking at books under the “Web site development” subject heading, then the OPAC could suggest that they might also be interested in books tagged with things like xml, web services and java.
I guess I’ve tagged about 150 books so far (mostly those to do with XML and HTML), but what I’d really like to do is throw open the doors and invite anyone who reads this weblog post to jump in and start tagging our catalogue.
When you first try to tag an item, the server will attempt to save a cookie in your web browser — you can see the value of this cookie in the “debug” section, along with the user number assigned to that cookie. The only reason I’m doing this is to try and give you an option to remove any tags that you’ve added (but not ones that other people have added).
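For reference, the cookie handling amounts to little more than this (the cookie name and the user-number scheme are illustrative):

```perl
# Sketch of the cookie handling: assign a user number on first tagging
# and plant it in a cookie, so later "remove my tag" requests can be
# matched back to the same user. Names and ID scheme are illustrative.
use strict;
use warnings;
use CGI;
use CGI::Cookie;

my $cgi  = CGI->new;
my %jar  = CGI::Cookie->fetch;
my $user = $jar{tagger} ? $jar{tagger}->value : next_user_number();

my $cookie = CGI::Cookie->new(
    -name    => 'tagger',
    -value   => $user,
    -expires => '+1y',
);
print $cgi->header( -cookie => $cookie );

sub next_user_number { return time }    # placeholder ID scheme
```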
Also, the script doesn’t refresh the book page with your new tags — so, once you’ve added your tags, you’ll need to refresh the book page to make them appear.
I’ll keep on working on the code over the holidays and, once it’s in a stable state, I’ll post the scripts here.
If anyone has any comments or suggestions, please feel free to email me:
d.c.pattern [at] hud.ac.uk
email [at] daveyp.com
Have fun folks!
Folksonomies
I’ve just stumbled across an excellent article by Ellyssa Kroski about folksonomies (user-created, rather than expert-created, taxonomies):
The Hive Mind: Folksonomies and User-Based Tagging
Now that Amazon is letting its users tag items, how long before we see this functionality in the OPAC? I’m getting very tempted to add a tagging facility to our test OPAC server.