Taggytastic – part 3

God bless Bryony – she puts up with a lot! Last night she had to put up with me adding a tagging system to our test HIP server (wwwlibrarycat.hud.ac.uk).

To be honest, the amount of interest in the subject keyword cloud took me by surprise, and it was fascinating to read all of the other blogs that picked it up.  A number of blogs made the very valid point that it wasn’t a true tag cloud – the tags were created using the existing subject keywords and not from tags added by our users.

I began to feel that tag clouds in OPACs are a true “chicken & egg” scenario — to be able to add that kind of functionality to our live OPAC, I need to be able to prove that it’s a valid and worthwhile new feature… but to do that, I need to have already implemented it and to have had a healthy number of tags added by our users.  I think this is backed up by the sheer number of people out there who think it’s a “good idea” but who are not in a position to start the ball rolling.  Obviously, it’d be a different story if our OPACs already had tag functionality “out of the box”!

I’m sure there are a dozen ways of implementing user tags in an OPAC, but here’s mine…

  • I want to have some degree of control over the tagging (yes I know, I should be trusting our users!), so on a live system I’d need a user to log into the OPAC before they could add tags.
     
  • If they’ve logged in, then I can let them add new tags or delete ones they’ve previously added.
     
  • I want to be able to make tag suggestions – for example, if a user wants to tag “Web Site Design using XML and HTML”, then I want to be able to suggest relevant tags that other users have already added (e.g. xml, html, web design, etc).  Unless other users have already tagged that particular book, how can I generate suggestions?

    There’s more than one way of doing this, but I decided to do it by adding tags to the subject headings as well as to books.  In other words, when someone tags a book with html then I also add that tag to all of the subject headings for that book as well.  Then, when someone wants to tag a different book that has one of those subject headings, I can suggest html as a possible tag.

    This does mean that irrelevant tags can get added to subject headings but (in theory) over time the relevant tags will outweigh the irrelevant ones for each heading.  As well as using the tags linked to subject headings, I also take into account any existing tags for that book.

It’s been interesting watching the beta tagging feature on Amazon, although (being early days) they seem to be getting a huge number of irrelevant tags – for example, the latest Harry Potter is tagged with things like good, kerri, brothers present and Jill.  How long will it take before the irrelevant tags on Amazon sink away and the relevant ones rise to the top? More to the point, would library patrons be more likely to use relevant tags on an OPAC?  Again, it’s the “chicken & egg” — without a healthy critical mass of relevant tags, how do you prove to cynical members of staff that tagging is a good thing and not a method of adding virtual graphitti to the OPAC?

I’ve still got more work to do with our initial attempt at OPAC tagging — at the moment, all you can do is add tags.  In particular, I don’t have a method of selecting a tag and then showing all the items that have been tagged… but I’m hoping that Casey Durfee (Seattle Public Library) will come to my rescue.  Spookily, I woke up this morning wondering how on earth I could hack our HIP server to allow this and then found that Casey had already emailed me to let me know it was possible!

I’m fairly happy with the suggestions based on subject headings – for example, when I try to add the very first tag to Internet technology and e-commerce, the following suggestions appear:

The suggestions are based on items that have been previously tagged which have the subject headings “Web site development” and/or “Electronic commerce”.  Some of the suggestions are more relevant than others (e.g. ecommerce), but at least I’m not getting anything too weird appearing (yet!).  Obviously there’s nothing to stop the user adding a new tag, but hopefully having a few suggestions available will help them out.

This should also have the added benefit that each of the subject headings will get a healthy selection of (hopefully relevant) tags attached to them.  For example, here are some of the tags I’ve currently got attached to the “Web site development” subject heading (in ranked order):

  • web services
  • xml
  • java
  • microsoft
  • dotnet
  • asp
  • html
  • ecommerce
  • php
  • macromedia
  • portals
  • security
  • soap
  • lucene
  • web design
  • databases

…wouldn’t it be cool if the OPAC could tie those together when the user does a subject keyword search?  For example, if they searched for XML, then the OPAC could suggest that they might be interested in other books under the “Web site development” subject heading.  Or, if they were looking at books under the “Web site development” subject heading, then the OPAC could suggest that they might also be interested in books tagged with things like xml, web services and java.

I guess I’ve tagged about 150 books so far (mostly those to do with XML and HTML), but what I’d really like to do is throw open the doors and invite anyone who reads this weblog post to jump in and start tagging our catalogue.

When you first try to tag an item, the server will attempt to save a cookie in your web browser — you can see the value of this cookie in the “debug” section, along with the user number assigned to that cookie.   The only reason I’m doing this is to try and give you an option to remove any tags that you’ve added (but not ones that other people have added).

Also, the script doesn’t refresh the book page with your new tags — so, once you’ve added your tags, you’ll need to refresh the book page to make them appear.

I’ll keep on working on the code over the holidays and, once it’s in a stable state, I’ll post the scripts here.

If anyone has any comments or suggestions, please feel to email me:

d.c.pattern [at] hud.ac.uk
email [at] daveyp.com

Have fun folks!

HIP Tips!

Adding a message to your log in page
At Huddersfield, we’ve added some text to our HIP login page:

This is a really easy hack and just involves editing a single XSL page (security.xsl).
As always, make a safe backup of the file before you do any editing!
Firstly, open up security.xsl in your favourite text editor.  If you use Microsoft Notepad and it looks a mess — e.g. you get weird squares (𘂅) appearing — then try opening the stylesheet using Wordpad instead.
Go down to the end of the file, and it should look a little like this:

</center>
</td>
</tr>
</table>
</form>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

Simply insert some well formatted (i.e. XHTML) code between the </form> and the </xsl:if> – for example:

</center>
</td>
</tr>
</table>
</form>
<!-- extra info for users - added by Dave -->
<p /><div align="center" style="font-size:80%">
<b>Borrower ID</b> is the 10 digits of your Campus ID Card
<br />
<b>PIN Number</b> is the 4 digits of the day and month
of your birth (e.g. 0206 for 2nd of June)
<p />
<span style="border-bottom:dashed black 1px; color:red;">
<b>Don't forget to logout when you have finished!</b>
</span>
</div>
<!-- end of extra info -->
</xsl:if>
</xsl:template>
</xsl:stylesheet>

…you should always add comments to any code you add to a stylesheet — it will help you locate your changes when it’s 3 months down the line and you can’t remember what you did, or why you did it!

All this was done using HIP 3.04 (UK release) and it should work for other versions of HIP 3.

p.s. if you like the idea of reminding a logged in user to log out when they’ve finished, check out this tip too!

Quotes of the Month

Jenny Levine has linked to an excellent article by Roy Tennant on the Library Journal web site:

What I Wish I Had Known

I love Roy’s statement that:

I wish I had known that the solution for needing to teach our users how to search our catalog was to create a system that didn’t need to be taught — and that we would spend years asking vendors for systems that solved our problems but did little to serve our users.

A few minutes later, I stumbled across Jennifer Matthews‘ blog – she’s a student of English and Comparative Literary Studies at the University of Warwick:

So I figure that the library is evil. And it hates me.
(The Library)

HIP Tips!

Do your 856 URLs show up in a big font size that doesn’t seem to quite fit in with the rest of the text on the full bib page?
The quickest way to fix it is to fire up the Horizon table editor, select marc_map, and then locate the marc_map that you use for your 856 URLs.
In the “HTML format (Info Portal only)” field, insert class="smallAnchor" before the href. For example, if your HTML format looks like this:

<a href="$_">{<img src="$9">|$y|$_}</a>

…then change it to:

<a class="smallAnchor" href="$_">{<img src="$9">|$y|$_}</a>

Save the change, and then restart JBoss and the 856 links should pick up the formatting of the “smallAnchor” element from your HIP cascading style sheets (CSS).
And, for the more adventurous – if you’d like to know which 856 links your users are clicking on, then you can set your marc_map up to redirect to a CGI script that logs the URL and then redirects the user’s web browser to the true 856 link.
Once you’ve got your CGI script ready (in this case, I’ve called it logit.pl), you just need to change the 856 marc_map to link to the script – e.g.

<a href="http://foo.com/cgi/logit.pl?$_">{<img src="$9">|$y|$_}</a>

Once you’ve saved that and restarted JBoss, your 856 URLs look like this in HIP:

http://foo.com/cgi/logit.pl?http://www.ebooks.com/12345

Your CGI script just needs to take the contents of the QUERY_STRING environment variable (in the above, it’s http://www.ebooks.com/12345), append it to your log, and then issue a redirect to that URL.
(disclaimer: all of the above was done with Horizon 7.32 UK and HIP 3.04 – your mileage may vary depending on which versions you’ve got!)

CODI 2005 – session links

I’ve put together a page listing each of the CODI 2005 sessions along with (hopefully!) all the PowerPoint, handout, podcast, blog, etc links.

http://www.daveyp.com/files/stuff/codi2005links.html

Please feel free to re-use the link or to circulate it.
If you have any additions or corrections, please email them to me:

d.c.pattern [at] hud.ac.uk

using “circ_tran” to show borrowing suggestions in HIP

One of the things we’re trying to do this year at Huddersfield is to make better use of our data archives:
…as each student goes through a library turnstile, data is written away…
…as each student borrows a book, more data is quietly written away…
…as each student uses an electronic resource, data is written away…
…as each student logs onto a PC, yet another piece of data is…
…okay, enough already – you get the idea!
We’re not particularly interested in what an individual student has done, but we’d like to see the broader pictures. For example, we open the Library 24/7 at certain times of the year (e.g. Easter) – we’d like to know more about the kinds of students who come in late at night and leave early the next morning:

  • are certain ethnic groups more likely to use the Library outside of the standard opening hours?
  • do we get more male or female students using the Library in the wee small hours?
  • are students coming in to use the computers, to issue/return items, or to sit quietly in a corner and study?

The answers to those kinds of questions tend to be found in several databases. The Sentry database tells us when someone entered the Library, but it doesn’t tell us if they are male or female, Asian or Caucasian – that kind is information is stored in the Student Records System. Also, the Sentry database doesn’t tell us what the student actually did – Circ transactions are in Horizon and PC usage info is stored in other databases.
So, long term we’re looking at ways of trying to combine data from all of those sources into meaningful and enlightening stats.
“What has this got to do with showing borrowing suggestions in HIP?”, I hear you ask!
Well, once I’d had a hunt around in our circ_tran table in Horizon, it seemed like a great use of all that historical Circ data would be to do an Amazon-like “patrons who borrowed this book also borrowed…”.
Before I proceed with the “how to”, I’ve got a hunch that not everyone has got a circ_tran table – it might be something that SirsiDynix needs to set for you, rather than a default table that ships with Horizon (can anyone confirm this?)
The circ_tran table contains (amongst other things) two very useful bits of information – the borrower# and the item# of the item they borrowed. You can use the item# to look up the bib# of that item (using the item table).
Once you’ve got the borrower# and bib#, you can use that to create two lists of data:

  • a list of all the bib#s that a specific borrower has ever borrowed
  • a list of all the borrower#s who have borrowed a specific bib#

To build the list of borrowing suggestions, you start with a bib# and:

  • 1) build the list of all the borrower#s who have borrowed that bib#
  • 2) for each of those borrower#s, compile all the bib#s of all the items they’ve borrowed to a single big list of bib#s
  • 3) take that big list and count how many times each bib# appears in the list
  • 4) sort your list of individual bib#s by the count of how many times they appear in the big list

…those bib#s that appear the most times in the big list are therefore the most appropriate ones to suggest.
Unfortunately those 4 steps can take some serious CPU time, so it’s not possible to do it on the fly as each of your patrons brings up a full bib page in HIP. Therefore, you need to pre-process each of your bib#s to generate a list of other suggested bib#s.
I wrote a Perl script this evening (which I’ll make available soon) that slurps up the entire circ_tran table into your PCs memory and then processes each of the bib#s to create up to 10 other suggested bib#s. Each of those suggestions is then pumped into a MySQL database where it will sit until a patron views that bib#s page in HIP.
A single line of JavaScript added to the fullnonmarcbib.xsl stylesheet then pulls in dynamic content from a Perl CGI script. That CGI script simply fetches the list of suggested bib#s from the MySQL database, quickly runs them via the title table in Horizon, and then displays a random selection of them underneath the copy/holding info:
click to view larger image
The only real drawback is that it’s not working with your circ_tran data in real time – the list of 10 possible suggestions per bib# won’t change until I run the slurping Perl script again to rebuild all of the suggestions. On our database of 2,046,180 circ_tran entries, that took about 3 hours to process. So, in theory, you could schedule it to run once a week or once a month.

Taggytastic! (part 2)

Wow! Fame and glory – hopefully the untold riches will be just around the corner! 😉
For anyone who wants to have a go with their Horizon/HIP, I’ve uploaded the script to here:
http://www.daveyp.com/files/stuff/tags/
I’ve done a little bit of tweaking, and the final keyword list now looks like this.
You’ll need to download the Perl script and the sample config.txt file.
As with many of the other scripts I’ve uploaded, you’ll need a working ODBC connection to your Horizon database – if you’re running ReportSmith or EasyAsk, then you’ll know all about that. You’ll also need to have Perl installed, along with the DBD::ODBC module from CPAN.
The config.txt file has three columns:

  • the first column defines the range for each keyword count, and this works with the $threshold variable to select the font size & colour for each keyword in the HTML output
  • the second column defines the font size – you should be able to use any valid CSS value (e.g. 50%, 10px, or x-small, etc)
  • the final column defines the font colour – in the example file I’ve gone for a blue gradient (#006 thu #77D), but if you prefer a single colour then just change all the entries to that (e.g. #00F) – again, you should be able to use any valid CSS value (#123456, red, etc)

To run the script, just put it in the same directory as the config.txt file and run it (e.g. perl getsubjects.txt). The HTML output file should get created in the same directory.
There’s a few variables that you can tweak:

  • $minimumBibs – this is used in the intial SQL query on the subject table, so a lower value means more subjects will be included for processing, but the query might take longer to run and/or hit your Horizon server harder
  • $threshold – once all the subjects have processed, any whose total number of matching bibs fall below the threshold value will be exlcuded from the output – if you’d prefer a smaller list of keywords in the HTML output, then choose a higher value and vice versa
  • $spacing – this is a string of characters to insert between each keyword
  • $hipUrl – unless you really want to link to our HIP, then you’ll need to tweak this URL

Have fun with the script!
Jenny emailed me to ask if the script could work with other systems (e.g. Innovative), so I’m going to have a go writing a smaller version of the script that will take a list of keywords and counts, such as the example below, and then create the same output:
1237 American poetry
381 Java
857 World Wide Web
…so, as long as you can query your system to get something in the above format, then it should work.
[one quick sandwich and Coke later]
…and here is a more general version of the script that should work with other systems:
http://www.daveyp.com/files/stuff/tags2/
As well as the Perl script, you’ll need to download the config.txt file. Also, you’ll need to create your own subjects.txt file – I’ve included a sample one so you can get a rough idea of the layout.
As before, you can do a bit of tweaking with the variables and the config.txt file to customise the final HTML output.
Horizon users who don’t want to faff around with getting the first script to use ODBC can generate their own subjects.txt by copying the output of running the following SQL statement in SQL Advantage (or similar):
select n_bibs,processed from subject where n_bibs > 50
…however, you won’t get the advantage of the way the first script collapses sub-subjects together.

Taggytastic!

Inspired by Jenny Levine‘s mock up of an OPAC with keyword tags, I’ve gone a step further and used our Horizon database (the “subject” table in particular) to generate a real page based on subject keywords with more than 10 bibs:
http://www.daveyp.com/files/stuff/subjects.html
I did a bit of tweaking so that sub-subjects (is that a real word?) are collapsed into the parent subject – if you hover your mouse pointer over one of the links, then you should get a better idea of what I mean. Once I’d got the totals for each parent subject, I excluded anything with less than 100 bibs.

CODI 2005 – Homeward Bound!

Sadly the two “spare days” after the end of CODI 2005 have flown by and tomorrow morning we’re setting off back to the UK. By the way, just in case anyone wants to know what “the sun going down on CODI 2005” looked like, here it is/was:
sun set on Wednesday evening
We spent most of yesterday in St Paul (the sibling to Minneapolis in the title “The Twin Cities”). As some of the Horizon mailing list regulars will know, we were keen to visit the Catbus exhibit at the Minnesota Children’s Museum – and just to prove we did, here’s Bryony in the front seat:
driving the Catbus
…and here’s Totoro himself:
Totoro
…and there’s more pictures here!
For lunch, we walked about a mile out of St Paul to Red’s Pizza Savoy. After the cosmopolitan Minneapolis, Red’s felt like a taste of true Americana.
We even made it back in time to go and give the Loring Park squirrel’s their tea: