A Perfect Library 2.0 Day

Just relaxing with a glass of wine after a very very Library 2.0 day 🙂
With a lot of help from Iman Moradi (blog/flickr), we ran an introduction to Library 2.0 for members of our Subject Teams and Tech Services this afternoon.  Then, after a coffee break, we watched the SirsiDynix Institute Weblogs & Libraries: Communication, Conversation, and the Blog People web seminar given by Michael Stephens.
All in all, it’s given us a lot to discuss as we look towards (hopefully) implementing a Library Services or Computing & Library Services weblog.  Fingers crossed that next week’s Library 2.0 Web Seminar will be as much fun.  I’m keen to run into Stephen Abram at the upcoming SirsiDynix SuperConference in Birmingham as I want to find out what Library 2.0 things the company has in the pipeline — the API layer in the upcoming Horizon 8 release is definitely a welcome step in the right direction.
There was a lot of interest amongst staff in the new NCSU OPAC, especially as a lot of pioneering work on faceted searching was carried out here at Huddersfield by Amanda Tinker and Steve Pollit.  I’m hoping there might be potential for us to incorporate some of Amanda and Steve’s research into our OPAC.
We’ve also got a plateful of potential new features to unleash on our unsuspecting students — simple renewals via email, RSS feeds, keyword search alerts, “people who borrowed this…”, and more.  I’m hoping to see if we can’t do some cool stuff with SMS as well.
2006 is already shaping up to be a busy year for the Library Systems Team — we’ll be involved in the RFID implementation and stock conversion (we’re currently out to tender on this) and we’re also implementing Talis Reading List.  One thing I can’t stand is having nothing to do, so I’m not complaining 😀
I noticed Talis have stated that both John Blyberg and I are developing these things purely for our own patrons/students.  Whilst that’s true to an extent (after all, I work for Huddersfield, not SirsiDynix), we’re both freely sharing much of our code so that other Innovative and SirsiDynix customers can play around with it if they want to.  Librarians have a long and proud tradition of sharing freely and I don’t intend to buck that trend just yet.
Speaking of which, I’ve been busy working on a Perl module to process the XML output from HIP 2.x/3.x and turn it into a simple Perl data structure.  The XML output from HIP gives you pretty much all the information you need, but the structure is a little unwieldy.  I’m hopeful the module will make it easier to quickly develop cool stuff like RSS feeds and OpenSearch interfaces from the OPAC.  Once I’ve got the module finished (and posted on this site), I’ll also use it to underpin the REST interface.  In turn, that should make the REST code more manageable and I might be able to get that code to a stage where I’d be happy to make it available to the SirsiDynix community.
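To give a flavour of the kind of flattening the module does, here’s a rough sketch — in Python rather than the actual Perl, and with invented element names standing in for HIP’s real XML (attributes are ignored for simplicity):

```python
import xml.etree.ElementTree as ET

def xml_to_dict(xml_text):
    """Collapse an XML document into nested dicts/lists of plain strings."""
    def walk(el):
        children = list(el)
        if not children:
            return (el.text or "").strip()
        out = {}
        for child in children:
            value = walk(child)
            # repeated elements become a list
            if child.tag in out:
                if not isinstance(out[child.tag], list):
                    out[child.tag] = [out[child.tag]]
                out[child.tag].append(value)
            else:
                out[child.tag] = value
        return out
    return walk(ET.fromstring(xml_text))

# hypothetical record, just to show the shape of the output
sample = ("<record><title>The Hobbit</title>"
          "<isbn>0261103342</isbn><isbn>0261102214</isbn></record>")
record = xml_to_dict(sample)
```

Once the output is a plain data structure like this, generating an RSS feed or OpenSearch response is just a matter of walking dicts and lists rather than wrestling with the raw XML.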
Unfortunately I’m currently suffering from a mild case of tendonitis in my right arm and hand, so I’m not doing as much coding as normal until it clears up.  Still, as long as I can lift a glass of wine and snuggle up to Bry on the sofa in front of the TV, I’m happy 🙂

“Did You Mean?” – part 2

I’ve been keeping an eye on the search terms and suggestions over the last few days, and I’ve noticed that quite a few people are getting failed keyword searches simply because nothing in the catalogue matches their term.
In particular, we’ve got a lot of students searching for diuretics.  As no matches are found, the spell checker jumps in and suggests things like dietetics, natriuretic or diabetics.  That got me wondering if there was a way of generating suggestions relevant to diuretics, rather than words that just look or sound like it.
As a prototype, I’ve modified the Perl script to query the Answers.com web site and parse the response.  The hyperlink text is compared with known keywords in the subject index and a tag cloud is generated (click to view larger version):

I’ve named it “Serendipity” simply because I’ve no idea what’s going to appear in there — the suggested keywords might be relevant (Hypertension and Caffeine) or they may be too broad (Medicine) to be of use.
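The idea can be sketched roughly as follows — in Python rather than the actual Perl, and with a canned HTML snippet standing in for the real Answers.com response: pull out the link text, keep only the terms that are known subject keywords, and use the counts as tag-cloud weights.

```python
from html.parser import HTMLParser
from collections import Counter

class LinkTextParser(HTMLParser):
    """Collect the text inside <a>...</a> elements."""
    def __init__(self):
        super().__init__()
        self.in_a = False
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_a = True
    def handle_endtag(self, tag):
        if tag == "a":
            self.in_a = False
    def handle_data(self, data):
        if self.in_a and data.strip():
            self.links.append(data.strip().lower())

def serendipity(html, subject_keywords):
    p = LinkTextParser()
    p.feed(html)
    # weight = how often each known subject keyword appears as link text
    return Counter(t for t in p.links if t in subject_keywords)

# hypothetical stand-in for the parsed Answers.com page
page = ('<p><a href="#">hypertension</a> <a href="#">caffeine</a> '
        '<a href="#">hypertension</a> <a href="#">kidney</a></p>')
cloud = serendipity(page, {"hypertension", "caffeine", "medicine"})
```

Anything that isn’t already a subject keyword (like “kidney” above) gets filtered out, which is what keeps the cloud clickable — every suggestion is guaranteed to bring up matches.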

“Did You Mean?” for HIP 2 or 3

[update: we’re now using Aspell and the Text::Aspell module]
HIP 4 contains a spellchecking “did you mean?” facility which, although not as powerful as Google’s, is certainly a step in the right direction.  One of the basic rules of designing any web-based system that supports searching or browsing is to always give the user choices — even if they have gone down a virtual one-way street and hit a dead end.
Unfortunately it’s going to be another few months before SirsiDynix release the UK enhanced version of HIP 4 for beta testing, so I thought I’d have a stab at adding the facility to our existing HIP 3.04 server.
Fortunately Perl provides a number of modules for this kind of thing, including String::Approx, Text::Metaphone, and Text::Soundex.
String::Approx is good at catching simple typos (e.g. Hudersfield or embarassement) whereas the latter two modules attempt to find “sounds like” matches — for example, when given batched, Text::Metaphone suggests scratched, thatched and matched.
To set something like this up, you need to have a word list.  You could download one (e.g. a list of dictionary words), but it makes more sense to generate your own — in my case I’ve parsed Horizon’s title table to create a list of keywords and frequency.  That’s given me a list of nearly 67,000 keywords that all bring up matches in either a general or title keyword search.
Once I’d got the keyword list, I ran it through Text::Metaphone and Text::Soundex to generate the relevant phonetic values — doing that in advance means that your spellchecking code can run faster as it doesn’t need to generate the values again for each incoming request.
Next up, I wrote an Apache mod_perl handler to create the suggestions from a given search term.  As String::Approx often gives the best results, the term is run against that first.  If no suggestions are found, the term is run against Text::Metaphone and then Text::Soundex in turn to find broader “sounds like” suggestions.
Assuming that one of the modules comes up with at least one suggestion, it gets displayed in HIP:

There’s still more work to do, as the suggestions only appear for a failed single keyword.  Handling two misspelled words (or more) is technically challenging — what’s the best method of presenting all the possible options to a user?  You could just give them a list of possibilities, but I’d prefer to give them something they can click on to initiate a new search.
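For a single keyword, the cascade above can be sketched like this — in Python rather than the original Perl modules, with difflib standing in for String::Approx and a small hand-rolled Soundex standing in for Text::Soundex (Metaphone is omitted, and the keyword list here is a toy one rather than the real 67,000-word list):

```python
import difflib

def soundex(word):
    """Standard American Soundex: first letter plus three digits."""
    codes = {c: str(d) for d, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in letters}
    word = word.lower()
    first, digits, prev = word[0].upper(), [], codes.get(word[0], "")
    for c in word[1:]:
        if c in "hw":
            continue           # h and w don't separate duplicate codes
        d = codes.get(c, "")
        if d and d != prev:
            digits.append(d)
        prev = d
    return (first + "".join(digits) + "000")[:4]

# precompute the phonetic values once, so each incoming request
# only does lookups rather than regenerating them every time
keywords = ["huddersfield", "hypertension", "diuretics", "dietetics"]
sdx_index = {}
for k in keywords:
    sdx_index.setdefault(soundex(k), []).append(k)

def suggest(term, n=5):
    # 1) close string matches catch simple typos...
    hits = difflib.get_close_matches(term.lower(), keywords, n=n, cutoff=0.8)
    if hits:
        return hits
    # 2) ...otherwise fall back to broader "sounds like" matches
    return sdx_index.get(soundex(term), [])[:n]
```

The cutoff value and the choice to try the approximate matcher first are tuning decisions — in practice you’d experiment against real failed searches to see which ordering gives the most useful suggestions.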

Amazon.co.uk Greasemonkey script

This Greasemonkey script (based on Carrick Mundell‘s script) for the Firefox web browser adds details of (and links to) our library holdings to the Amazon.co.uk site:

To use the script, you’ll need to do the following:
1) Make sure you have the latest version of Firefox installed
2) Go to http://greasemonkey.mozdev.org/ and install the Greasemonkey extension

3) Close down all of your Firefox windows, and then restart the browser
4) You should now see a little smiling Greasemonkey at the bottom right hand corner of the browser

5) Now go to https://library.hud.ac.uk/firefox/, click on the hudamazon.user.js script, and then click on the “Install” button.

6) Now go to Amazon.co.uk and search for some books!

~ o ~

For reference, here are the various messages that might appear:
1) Available…
Copies of the item are available.

2) Available in electronic format…
Access is available to an electronic version of the item.

3) Due back…
The item is currently on loan.

4) Other editions of this title are available…
We don’t have this specific edition of the item, but we do have others – click on the link to show them all.

5) ISBN not found…
The Amazon ISBN doesn’t match anything on our catalogue, but you can click on the link to start a title keyword search.

REST output from Huddersfield’s catalogue

Inspired by John Blyberg’s middleware, which provides REST output from the Ann Arbor catalogue, I’ve put together something similar for ours:

https://library.hud.ac.uk/rest/info.html

Here are some sample results:

http://161.112.232.203:4128/rest/keyword/author/tolkien
http://161.112.232.203:4128/rest/keyword/subject/yorkshire railways
http://161.112.232.203:4128/rest/record/id/411760
http://161.112.232.203:4128/rest/record/isbn/1904633048

I’ve decided to include more information in the output than John did — primarily because I want to use the REST output to power a staff OPAC.  Amongst other things, the output includes:

The output is littered with xlink links, which can be used to issue further REST requests.
For those of you who like gory techie details, the REST output is generated by approximately 1,000 lines of Perl, written as a single mod_perl handler.  The code works by fetching the XML output from HIP and then parsing it to strip out anything that’s not required in the REST output.  At the same time, extra information is pulled in from other sources (e.g. direct from the Horizon database, and from the xISBN service).
Unfortunately, looking at the XML output from other HIP servers, I doubt the code can quickly be used by other Horizon sites.  Also, not everyone has their own mod_perl server to run the code on.  However, if anyone wants to play around with the code then please send me an email (d.c.pattern [at] hud.ac.uk).  There’s also a cloud on the Horizon (pun intended) relating to getting XML output out of HIP 4 — it seems Dynix have chosen to make it harder (not easier) to do this with the latest version of their OPAC (boo! hiss!).
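The pruning step at the heart of it looks roughly like this — a Python sketch rather than the actual mod_perl code, with invented element names standing in for the real HIP output:

```python
import xml.etree.ElementTree as ET

# hypothetical noise elements that the REST output doesn't need
UNWANTED = {"searchform", "sessiondata"}

def prune(xml_text):
    """Parse the OPAC's XML and drop any elements we don't want."""
    root = ET.fromstring(xml_text)
    for parent in root.iter():
        # snapshot the children so removal is safe while looping
        for child in list(parent):
            if child.tag in UNWANTED:
                parent.remove(child)
    return root

sample = ("<results><sessiondata>abc</sessiondata>"
          "<record><title>Yorkshire Railways</title></record></results>")
pruned = prune(sample)
```

The extra data (Horizon tables, xISBN, and so on) is then merged into what’s left before the result is serialised back out as the REST response.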
I’ve already said that I’m planning to use the REST output to power a staff OPAC, but what I’m really keen on is letting our students loose on the data for use in final year projects, etc.  I’m also planning to use the output for a revised version of the Amazon Greasemonkey script.
The University is gradually moving towards a portal environment and I’m hoping the REST output will come in handy for dropping live catalogue content into other systems.
There’s still quite a bit of work to do, especially with adding information for journals.  We’ve already got live journal information from our SFX OpenURL server appearing in our OPAC, so I might as well include that in the REST output too:

Have fun!

Taggytastic – part 3

God bless Bryony – she puts up with a lot! Last night she had to put up with me adding a tagging system to our test HIP server (wwwlibrarycat.hud.ac.uk).

To be honest, the amount of interest in the subject keyword cloud took me by surprise, and it was fascinating to read all of the other blogs that picked it up.  A number of blogs made the very valid point that it wasn’t a true tag cloud – the tags were created using the existing subject keywords and not from tags added by our users.

I began to feel that tag clouds in OPACs are a true “chicken & egg” scenario — to be able to add that kind of functionality to our live OPAC, I need to be able to prove that it’s a valid and worthwhile new feature… but to do that, I need to have already implemented it and to have had a healthy number of tags added by our users.  I think this is backed up by the sheer number of people out there who think it’s a “good idea” but who are not in a position to start the ball rolling.  Obviously, it’d be a different story if our OPACs already had tag functionality “out of the box”!

I’m sure there are a dozen ways of implementing user tags in an OPAC, but here’s mine…

  • I want to have some degree of control over the tagging (yes I know, I should be trusting our users!), so on a live system I’d need a user to log into the OPAC before they could add tags.
     
  • If they’ve logged in, then I can let them add new tags or delete ones they’ve previously added.
     
  • I want to be able to make tag suggestions – for example, if a user wants to tag “Web Site Design using XML and HTML”, then I want to be able to suggest relevant tags that other users have already added (e.g. xml, html, web design, etc).  Unless other users have already tagged that particular book, how can I generate suggestions?

    There’s more than one way of doing this, but I decided to do it by adding tags to the subject headings as well as to books.  In other words, when someone tags a book with html then I also add that tag to all of the subject headings for that book as well.  Then, when someone wants to tag a different book that has one of those subject headings, I can suggest html as a possible tag.

    This does mean that irrelevant tags can get added to subject headings but (in theory) over time the relevant tags will outweigh the irrelevant ones for each heading.  As well as using the tags linked to subject headings, I also take into account any existing tags for that book.
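The scheme above can be sketched in a few lines — in Python rather than Perl, with plain dicts standing in for the real database tables and some hypothetical book/heading data:

```python
from collections import Counter, defaultdict

book_tags = defaultdict(Counter)     # book id -> tag counts
heading_tags = defaultdict(Counter)  # subject heading -> tag counts
book_headings = {                    # hypothetical catalogue data
    "b1": ["Web site development", "Electronic commerce"],
    "b2": ["Web site development"],
}

def add_tag(book_id, tag):
    book_tags[book_id][tag] += 1
    # propagate the tag to every subject heading on the book
    for heading in book_headings.get(book_id, []):
        heading_tags[heading][tag] += 1

def suggest_tags(book_id, n=5):
    # rank tags already on the book plus tags on its subject headings
    scores = Counter(book_tags[book_id])
    for heading in book_headings.get(book_id, []):
        scores.update(heading_tags[heading])
    return [tag for tag, _ in scores.most_common(n)]

# tag one book...
add_tag("b1", "html")
add_tag("b1", "ecommerce")
# ...and an untagged book sharing a heading now gets suggestions
suggestions = suggest_tags("b2")
```

Because suggestions are ranked by count, the hope is that relevant tags attached to a heading gradually drown out the odd irrelevant one.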

It’s been interesting watching the beta tagging feature on Amazon, although (it being early days) they seem to be getting a huge number of irrelevant tags – for example, the latest Harry Potter is tagged with things like good, kerri, brothers present and Jill.  How long will it take before the irrelevant tags on Amazon sink away and the relevant ones rise to the top?  More to the point, would library patrons be more likely to use relevant tags on an OPAC?  Again, it’s the “chicken & egg” — without a healthy critical mass of relevant tags, how do you prove to cynical members of staff that tagging is a good thing and not a method of adding virtual graffiti to the OPAC?

I’ve still got more work to do with our initial attempt at OPAC tagging — at the moment, all you can do is add tags.  In particular, I don’t have a method of selecting a tag and then showing all the items that have been tagged… but I’m hoping that Casey Durfee (Seattle Public Library) will come to my rescue.  Spookily, I woke up this morning wondering how on earth I could hack our HIP server to allow this and then found that Casey had already emailed me to let me know it was possible!

I’m fairly happy with the suggestions based on subject headings – for example, when I try to add the very first tag to Internet technology and e-commerce, the following suggestions appear:

The suggestions are based on items that have been previously tagged which have the subject headings “Web site development” and/or “Electronic commerce”.  Some of the suggestions are more relevant than others (e.g. ecommerce), but at least I’m not getting anything too weird appearing (yet!).  Obviously there’s nothing to stop the user adding a new tag, but hopefully having a few suggestions available will help them out.

This should also have the added benefit that each of the subject headings will get a healthy selection of (hopefully relevant) tags attached to them.  For example, here are some of the tags I’ve currently got attached to the “Web site development” subject heading (in ranked order):

  • web services
  • xml
  • java
  • microsoft
  • dotnet
  • asp
  • html
  • ecommerce
  • php
  • macromedia
  • portals
  • security
  • soap
  • lucene
  • web design
  • databases

…wouldn’t it be cool if the OPAC could tie those together when the user does a subject keyword search?  For example, if they searched for XML, then the OPAC could suggest that they might be interested in other books under the “Web site development” subject heading.  Or, if they were looking at books under the “Web site development” subject heading, then the OPAC could suggest that they might also be interested in books tagged with things like xml, web services and java.

I guess I’ve tagged about 150 books so far (mostly those to do with XML and HTML), but what I’d really like to do is throw open the doors and invite anyone who reads this weblog post to jump in and start tagging our catalogue.

When you first try to tag an item, the server will attempt to save a cookie in your web browser — you can see the value of this cookie in the “debug” section, along with the user number assigned to that cookie.   The only reason I’m doing this is to try and give you an option to remove any tags that you’ve added (but not ones that other people have added).

Also, the script doesn’t refresh the book page with your new tags — so, once you’ve added your tags, you’ll need to refresh the book page to make them appear.

I’ll keep on working on the code over the holidays and, once it’s in a stable state, I’ll post the scripts here.

If anyone has any comments or suggestions, please feel free to email me:

d.c.pattern [at] hud.ac.uk
email [at] daveyp.com

Have fun folks!

HIP Tips!

Adding a message to your log in page
At Huddersfield, we’ve added some text to our HIP login page:

This is a really easy hack and just involves editing a single XSL page (security.xsl).
As always, make a safe backup of the file before you do any editing!
Firstly, open up security.xsl in your favourite text editor.  If you use Microsoft Notepad and it looks a mess — e.g. you get weird squares (𘂅) appearing — then try opening the stylesheet using Wordpad instead.
Go down to the end of the file, and it should look a little like this:

</center>
</td>
</tr>
</table>
</form>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

Simply insert some well-formed XHTML between the </form> and the </xsl:if> – for example:

</center>
</td>
</tr>
</table>
</form>
<!-- extra info for users - added by Dave -->
<p /><div align="center" style="font-size:80%">
<b>Borrower ID</b> is the 10 digits of your Campus ID Card
<br />
<b>PIN Number</b> is the 4 digits of the day and month
of your birth (e.g. 0206 for 2nd of June)
<p />
<span style="border-bottom:dashed black 1px; color:red;">
<b>Don't forget to logout when you have finished!</b>
</span>
</div>
<!-- end of extra info -->
</xsl:if>
</xsl:template>
</xsl:stylesheet>

…you should always add comments to any code you add to a stylesheet — it will help you locate your changes when it’s 3 months down the line and you can’t remember what you did, or why you did it!

All this was done using HIP 3.04 (UK release) and it should work for other versions of HIP 3.

p.s. if you like the idea of reminding a logged in user to log out when they’ve finished, check out this tip too!

Quotes of the Month

Jenny Levine has linked to an excellent article by Roy Tennant on the Library Journal web site:

What I Wish I Had Known

I love Roy’s statement that:

I wish I had known that the solution for needing to teach our users how to search our catalog was to create a system that didn’t need to be taught — and that we would spend years asking vendors for systems that solved our problems but did little to serve our users.

A few minutes later, I stumbled across Jennifer Matthews‘ blog – she’s a student of English and Comparative Literary Studies at the University of Warwick:

So I figure that the library is evil. And it hates me.
(The Library)

HIP Tips!

Do your 856 URLs show up in a big font size that doesn’t seem to quite fit in with the rest of the text on the full bib page?
The quickest way to fix it is to fire up the Horizon table editor, select marc_map, and then locate the marc_map that you use for your 856 URLs.
In the “HTML format (Info Portal only)” field, insert class="smallAnchor" before the href. For example, if your HTML format looks like this:

<a href="$_">{<img src="$9">|$y|$_}</a>

…then change it to:

<a class="smallAnchor" href="$_">{<img src="$9">|$y|$_}</a>

Save the change and then restart JBoss, and the 856 links should pick up the formatting of the “smallAnchor” class from your HIP cascading style sheets (CSS).
And, for the more adventurous – if you’d like to know which 856 links your users are clicking on, then you can set your marc_map up to redirect to a CGI script that logs the URL and then redirects the user’s web browser to the true 856 link.
Once you’ve got your CGI script ready (in this case, I’ve called it logit.pl), you just need to change the 856 marc_map to link to the script – e.g.

<a href="http://foo.com/cgi/logit.pl?$_">{<img src="$9">|$y|$_}</a>

Once you’ve saved that and restarted JBoss, your 856 URLs will look like this in HIP:

http://foo.com/cgi/logit.pl?http://www.ebooks.com/12345

Your CGI script just needs to take the contents of the QUERY_STRING environment variable (in the above, it’s http://www.ebooks.com/12345), append it to your log, and then issue a redirect to that URL.
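Here’s a minimal sketch of that script — in Python rather than the original Perl, with an arbitrary log filename — just to show how little it needs to do:

```python
#!/usr/bin/env python3
import os
import datetime

def log_click(url, log_path):
    # append one timestamped line per click
    with open(log_path, "a") as fh:
        fh.write("%s\t%s\n" % (datetime.datetime.now().isoformat(), url))

def redirect_response(url):
    # an HTTP redirect is just a Status/Location header pair
    # followed by a blank line
    return "Status: 302 Found\r\nLocation: %s\r\n\r\n" % url

# in a real CGI script, the target URL arrives in QUERY_STRING
if os.environ.get("QUERY_STRING"):
    target = os.environ["QUERY_STRING"]
    log_click(target, "clicks.log")
    print(redirect_response(target), end="")
```

From the user’s point of view the redirect is instant, so they shouldn’t notice the extra hop on the way to the 856 target.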
(disclaimer: all of the above was done with Horizon 7.32 UK and HIP 3.04 – your mileage may vary depending on which versions you’ve got!)