Presentation to the TILE Project meeting in London

About 90 minutes ago, I had the pleasure of doing a short presentation to the JISC TILE Project’s “Sitting on a gold mine” workshop in London. Unfortunately I wasn’t able to present in person, so we had a go at doing it all via a video-conferencing link. As far as I can tell, it seemed to go okay!
The presentation was an opportunity to formally announce the release of the usage data.
Our Repository Manager was keen to try putting something non-standard into the repository and twisted my arm into recording the audio… and I’d forgotten how much I hate hearing my own voice!!!
Anyway, as soon as SlideShare starts playing ball, I’ll have a go at uploading and syncing the audio track. Otherwise, here’s a copy of the PowerPoint: “Can You Dig It?: A Systems Perspective” and you can hear the audio by clicking on the Flash player below…
[audio:https://library.hud.ac.uk/ppt/CanYouDigIt.mp3]
The workshop had a copy of the PowerPoint that they were running locally, so every now and then you’ll hear me say “next slide”.
I haven’t listened to much of the audio, so I’ve got my fingers crossed I didn’t say anything too stupid!!!
[edit]
Well, here’s my first attempt at SlideCasting…

(embedded SlideShare slidecast: “Can You Dig It”)

…I had no idea how much I go “erm” when presenting! :-S

Free book usage data from the University of Huddersfield

I’m very proud to announce that Library Services at the University of Huddersfield has just done something that would have perhaps been unthinkable a few years ago: we’ve just released a major portion of our book circulation and recommendation data under an Open Data Commons/CC0 licence. In total, there’s data for over 80,000 titles derived from a pool of just under 3 million circulation transactions spanning a 13 year period.
https://library.hud.ac.uk/usagedata/
I would like to lay down a challenge to every other library in the world to consider doing the same.
This isn’t about breaching borrower/patron privacy — the data we’ve released is thoroughly aggregated and anonymised. This is about sharing potentially useful data to a much wider community and attaching as few strings as possible.
I’m guessing some of you are thinking: “what use is the data to me?” Well, possibly of very little use — it’s just a droplet in the ocean of library transactions, and it’s only data from one medium-sized new university somewhere in the north of England. However, if just a small number of other libraries were to release their data as well, we’d be able to begin seeing wider trends in borrowing.
The data we’ve released essentially comes in two big chunks:
1) Circulation Data

This breaks down the loans by year, by academic school, and by individual academic courses. This data will primarily be of interest to other academic libraries. UK academic libraries may be able to directly compare borrowing by matching up their courses against ours (using the UCAS course codes).

2) Recommendation Data

This is the data which drives the “people who borrowed this, also borrowed…” suggestions in our OPAC. This data had previously been exposed as a web service with a non-commercial licence, but is now freely available for you to download. We’ve also included data about the number of times the suggested title was borrowed before, at the same time, or afterwards.
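For the curious, here’s a minimal sketch of how this kind of suggestion data can be derived from anonymised loan records. The input format (one borrower-ID/ISBN pair per line) and the threshold value are illustrative choices of mine, not the layout of the actual files:

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Build "people who borrowed this, also borrowed..." counts from
  # anonymised loans, read as "borrower_id<TAB>isbn" lines on STDIN.
  my %borrowed;    # borrower_id => hash of ISBNs they borrowed
  while (<STDIN>) {
      chomp;
      my ($borrower, $isbn) = split /\t/;
      $borrowed{$borrower}{$isbn} = 1;
  }

  # Count, for every pair of ISBNs, how many borrowers they share.
  my %also;        # isbn => { other_isbn => shared borrower count }
  for my $titles (values %borrowed) {
      my @isbns = keys %$titles;
      for my $this (@isbns) {
          for my $that (@isbns) {
              $also{$this}{$that}++ unless $this eq $that;
          }
      }
  }

  # Only keep suggestions shared by a minimum number of borrowers;
  # this threshold is also what protects borrower anonymity.
  my $MIN_SHARED = 5;
  for my $isbn (sort keys %also) {
      my @suggestions = sort { $also{$isbn}{$b} <=> $also{$isbn}{$a} }
                        grep { $also{$isbn}{$_} >= $MIN_SHARED }
                        keys %{ $also{$isbn} };
      my @top = splice @suggestions, 0, 5;
      print "$isbn => @top\n" if @top;
  }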

Smaller data files provide further details about our courses, the relevant UCAS course codes, and expanded ISBN lookup indexes (many thanks to Tim Spalding for allowing the use of thingISBN data to enable this!).
All of the data is in XML format and, in the coming weeks, I’m intending to create a number of web services and APIs which can be used to fetch subsets of the data.
The clock has been ticking to get all of this done in time for the “Sitting on a gold mine: improving provision and services for learners by aggregating and using learner behaviour data” event, organised by the JISC TILE Project. Therefore, the XML format is fairly simplistic. If you have any comments about the structuring of the data, please let me know.
I mentioned that the data is a subset of our entire circulation data — the criteria for inclusion were that the relevant MARC record must contain an ISBN and that borrowing must have been significant. So, you won’t find any titles without ISBNs in the data, nor any books which have only been borrowed a couple of times.
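Until the web services appear, a few lines of Perl will get you into the XML. N.B. the element names below are invented for illustration, so check the downloaded files for the real structure:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use XML::Simple;

  # Pull loan counts per title out of the circulation file. The
  # element names used here ("title", "name", "isbn", "loans") are
  # my own guesses, not the published schema.
  my $data = XMLin('circulation_data.xml', ForceArray => ['title']);

  for my $title (@{ $data->{title} }) {
      printf "%s (ISBN %s): %d loans\n",
          $title->{name}, $title->{isbn}, $title->{loans};
  }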
So, this data is just a droplet — a single pixel in a much larger picture.
Now it’s up to you to think about whether or not you can augment this with data from your own library. If you can’t, I want to know what the barriers to sharing are. Then I want to know how we can break down those barriers.
I want you to imagine a world where a first year undergraduate psychology student can run a search on your OPAC and have the results ranked by the most popular titles as borrowed by their peers on similar courses around the globe.
I want you to imagine a book recommendation service that makes Amazon’s look amateurish.
I want you to imagine a collection development tool that can tap into the latest borrowing trends at a regional, national and international level.
Sounds good? Let’s start talking about how we can achieve it.


FAQ (OK, I’m trying to anticipate some of your questions!)
Q. Why are you doing this?
A. We’ve been actively mining circulation data for the benefit of our students since 2005. The “people who borrowed this, also borrowed…” feature in our OPAC has been one of the most successful and popular additions (second only to adding a spellchecker). The JISC TILE Project has been debating the benefits of larger-scale aggregations of usage data, and we believe such aggregation would greatly increase the end benefit to our users. We hope that the release of the data will stimulate a wider debate about the advantages and disadvantages of aggregating usage data.
Q. Why Open Data Commons / CC0?
A. We believe this is currently the most suitable licence to release the data under. Restrictions limit (re)use and we’re keen to see this data used in imaginative ways. In an ideal world, there would be services to harvest the data, crunch it, and then expose it back to the community, but we’re not there yet.
Q. What about borrower privacy?
A. There’s a balance to be struck between safeguarding privacy and allowing usage data to improve our services. It is possible to have both. Data mining is typically about looking for trends — it’s about identifying sizeable groups of users who exhibit similar behaviour, rather than looking for unique combinations of borrowing that might relate to just one individual. Setting a suitable threshold on the minimum group size ensures anonymity.

Coming soon, to a blog near here…

Okay — I’m the first to admit I don’t blog enough… I still haven’t even blogged about how great Mashed Library 2008 was (luckily other attendees have already blogged about it!).
Anyway, unless I get run over by a bus, later on this week I’m going to post something fairly big — well, it’s about 90MB which perhaps isn’t that “big” these days — that I’m hoping will get a lot of people in the library world talking. What I’ll be posting will just be a little droplet, but I’m hoping one day it’ll be part of a small stream …or perhaps even a little river.

(view slideshow of Mashed Library 2008)

Dewey friend wheel

I’ve been meaning to have a stab at creating something similar to a friend wheel, but using library data, for a while now. Here’s a prototype which uses our “people who borrowed this, also borrowed…” data to try to find strong borrowing relationships…
Dewey friends
I picked three random Dewey numbers and hacked together a quick PerlMagick script to draw the wheel:

  • 169 – Logic -> Analogy (orange)
  • 822 – English & Old English literatures -> Drama (purple)
  • 941 – General history of Europe -> British Isles (light blue)

The thickness and brightness of each line indicates the strength of the relationship between the two classifications. For example, for people who borrowed items from 941, we also see heavy borrowing in the 260s (Christian social theology), 270s (Christian church history), and 320s (Political science).
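If you fancy having a go yourself, the guts of the script are fairly simple. This is a stripped-down sketch rather than my actual code, and the %strength scores are made up for illustration:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Image::Magick;
  use Math::Trig qw(pi);

  # Stripped-down friend wheel: Dewey classes spaced evenly around a
  # circle, with a chord drawn for each borrowing relationship.
  my %strength = ( '941:260' => 40, '941:270' => 35, '941:320' => 30 );

  my ($size, $radius) = (800, 350);
  my @classes = do {
      my %seen;
      $seen{$_}++ for map { split /:/ } keys %strength;
      sort keys %seen;
  };

  # position each class evenly around the circle
  my %pos;
  for my $i (0 .. $#classes) {
      my $angle = 2 * pi * $i / @classes;
      $pos{ $classes[$i] } = [ $size / 2 + $radius * cos($angle),
                               $size / 2 + $radius * sin($angle) ];
  }

  my $img = Image::Magick->new(size => "${size}x${size}");
  $img->ReadImage('xc:black');

  # thicker and brighter lines = stronger relationships
  my ($max) = sort { $b <=> $a } values %strength;
  for my $pair (keys %strength) {
      my ($from, $to) = split /:/, $pair;
      my $weight = $strength{$pair} / $max;
      my $grey   = int(255 * $weight);
      $img->Draw(
          primitive   => 'line',
          points      => "$pos{$from}[0],$pos{$from}[1] $pos{$to}[0],$pos{$to}[1]",
          stroke      => sprintf('#%02x%02x%02x', $grey, $grey, $grey),
          strokewidth => 1 + 4 * $weight,
      );
  }
  $img->Write('wheel.png');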
The next step will be to churn through all one thousand Dewey numbers and draw a relationship wheel for our entire book stock. I’ve left my work PC on to crunch through the raw data overnight, so hopefully I’ll be able to post the image tomorrow.

Our books, arranged by Hue and Lightness

Sunday afternoons were made for doing this kind of thing…
Book drop
(click here for the biggest version)
Several thousand of our books, arranged vertically by hue and horizontally by lightness. The value was calculated by finding the average colour of the book cover and then converting that to the relevant HSL value. There’s a little bit of randomness thrown in too, in terms of rotation and position. The image was created using Perl and ImageMagick.
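Here’s the core of the colour calculation, in case anyone wants to try it on their own covers. The filename is illustrative, and I’m assuming PerlMagick’s default of returning normalised 0 to 1 values from GetPixel:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use Image::Magick;

  # Average-colour step: shrink the cover to a single pixel (so that
  # ImageMagick does the averaging), then convert that RGB value to HSL.
  my $img = Image::Magick->new;
  $img->Read('cover.jpg');
  $img->Resize(geometry => '1x1!');

  my ($red, $green, $blue) = $img->GetPixel(x => 0, y => 0);

  my ($max) = sort { $b <=> $a } $red, $green, $blue;
  my ($min) = sort { $a <=> $b } $red, $green, $blue;
  my $l = ($max + $min) / 2;                      # lightness

  my ($h, $s) = (0, 0);                           # hue, saturation
  if ($max != $min) {
      my $d = $max - $min;
      $s = $l > 0.5 ? $d / (2 - $max - $min) : $d / ($max + $min);
      if    ($max == $red)   { $h = ($green - $blue) / $d; $h += 6 if $h < 0 }
      elsif ($max == $green) { $h = ($blue - $red) / $d + 2 }
      else                   { $h = ($red - $green) / $d + 4 }
      $h *= 60;                                   # degrees, 0 to 360
  }
  printf "H=%.0f S=%.2f L=%.2f\n", $h, $s, $l;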
If nothing else, it shows that we have more red and blue books than green or pink ones!

Horizon 7.4.2 – available “worldwide”

The press release for Horizon 7.4.2 has just gone online.
Both Talin Bingham (Chief Technology Officer) and Gary Rautenstrauch (Chief Executive Officer) use the word “worldwide” in the press release:

This new version adds functionality requested by our customers worldwide and offers great benefits to libraries and patrons alike…

Providing the features librarians need and delivering the best user experience worldwide are SirsiDynix’s highest priorities.

However, the reality is that Horizon 7.4.2 is a North American only release. Much as I would love to be able to roll out some of those new features here at Huddersfield, and much as I would love to have all those really nasty security holes in HIP fixed, the bottom line is that I can’t — SirsiDynix’s definition of “worldwide” is a curiously US-centric one.
Horizon customers in the UK, France, Germany, Sweden, Belgium, Netherlands, etc, are not “qualifying customers”, despite paying their yearly maintenance.
SirsiDynix International made a decision a year or two ago that they would no longer provide regional variations of Horizon, and I can fully understand why. As a non-American customer, I might not be happy about it, but I can understand why. What I can’t understand (and frankly, it’s starting to really piss me off) is why the company continues to pretend in public that its releases are “worldwide”.
If anyone senior from the SirsiDynix US office would like to contact me today, then please do — I’m sure you’ll find my direct telephone number in your UK customer contacts database. Maybe there’s a perfectly good reason why most of your Horizon customers in Europe are no longer classified as being part of your “worldwide” customer base and I’d really love to hear it.

Google Book Search Data API

The new Google Book Search Data API has some really cool features, and I’m wondering how much of it I can shoehorn into the OPAC.
Our students increasingly expect the OPAC search box to search the full text of our book stock — i.e. they type in a few words describing the topic they’d like a book about. Searching just the bog-standard MARC metadata, you’ll be lucky to get much back… and perhaps then only if we’ve got the full table of contents in the MARC record.
So, for example, if I do a keyword search for “english media coverage of immigrants and social exclusion” on our OPAC, I’ll find nothing. However, if I run the same query through the Google API and then filter the results (using the ISBN) to just items we hold in the library, I get 6 hits from the first 40 results that Google sends me.

(I’d probably find more if I also used thingISBN or xISBN to match on associated ISBNs)
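Here’s roughly what that looks like in Perl. The feed URL is the one given in the Data API documentation at the time of writing, and the %holdings hash is a stand-in for a proper lookup against the LMS:

  #!/usr/bin/perl
  use strict;
  use warnings;
  use LWP::Simple qw(get);
  use URI::Escape qw(uri_escape);

  # ISBNs we hold; in real life this would be a lookup against the LMS
  my %holdings = map { $_ => 1 } qw(0415198437);

  my $query = 'english media coverage of immigrants and social exclusion';
  my $url   = 'http://books.google.com/books/feeds/volumes?q='
            . uri_escape($query) . '&max-results=40';

  my $atom = get($url) or die "no response from Google\n";

  # crude parse: pull the ISBN identifiers out of the Atom feed
  my @isbns = $atom =~ m{<dc:identifier>ISBN:([0-9Xx]+)</dc:identifier>}g;

  print "We hold: $_\n" for grep { $holdings{$_} } @isbns;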
I’m not going to claim that those 6 are the most relevant books we hold in the library for that particular search (I’m not sure if I’d find anything of use in the “California politics” book)… but that’s only because I have no idea what the most relevant books are and, no matter how closely I scrutinise our MARC records, I probably never will 😉 So, short of quizzing a Subject Librarian, some of those books might be worth a quick browse… which I could do virtually with the Embedded Viewer API:
GBS_insertPreviewButtonPopup('ISBN:0415198437');
I guess the big question is “how many API searches will Google let me do every day?”

Green eco-friendly catalogue PCs

Warning — long blog post ahead!
I’ve been promising to post something about our new catalogue PCs …but first, a bit of background:
Like most large(ish) academic libraries, we’ve got dedicated catalogue PCs… lots of them… on every floor! From memory, we had at least 35 of them before the start of the refurbishment. We tended to use PCs that were no longer suitable for staff, and they’d often be 5 or 6 years old. Unless staff remembered to turn them off every evening, chances are they’d get left on 24/7.
After a quick Google search, it looks like the average PC & monitor costs around 2.5 pence (UK) per hour to run (probably more now that electricity costs have risen in the last 12 months). So, if left on 24/7, each PC would cost 60 pence per day, £4.20 per week, or around £218 per year. Multiply that by the total number of PCs (35) and we might have been paying around £7,600 per year! :-S
When I saw the plans for the refurbished floors, the first thing I noted was that there was an increased number of catalogue PCs on each floor (bringing the grand total to 45). Again, if left on 24/7, that could cost us nearly £10,000 per year.
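For anyone who wants to check my sums:

  # 2.5p per hour, 24 hours a day, 52 weeks a year
  my $per_pc = 0.025 * 24 * 7 * 52;           # ~ £218 per PC per year
  printf "35 PCs: ~£%.0f\n", 35 * $per_pc;    # ~ £7,644
  printf "45 PCs: ~£%.0f\n", 45 * $per_pc;    # ~ £9,828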
Anyway, a couple of things coincided this summer. Firstly, the University (which has been busy improving recycling, etc) was crowned the “Most Improved University” in the annual People & Planet’s Green League table (more info here). Secondly, at the Poster Promenade event in June, I spotted something interesting on one of the stands…
pp_013
On the left-hand side of that photo is a small black box with a cool blue LED — a Viglen MPC-L mini PC. It ships with Xubuntu Linux, 256MB of memory and an 80GB hard drive, and has all the usual connections that you’d see on a PC (6xUSB, VGA, audio, and network). There’s no fan inside, and the metal case acts as a large heatsink for the low-spec CPU.
Our IT Dept had evaluated them, but the non-standard operating system and the relatively poor performance had put them off. However, they looked ideal for catalogue PCs and, according to the Viglen web site, they only use £1 of electricity per year!
A quick hunt around on the Viglen web site also threw up the fact that they can be purchased with a VESA mount, so the PC can be attached to the back of a flat screen monitor — potentially a huge space saver.
Due to the limited time available, I didn’t fancy trying to figure out how to run Xubuntu as a PAC and instead I installed XP and configured it in the same way as our other catalogue PCs (using Public Web Browser as the Windows shell). The mini PC is *just* about powerful enough to run a web browser smoothly. We normally use McAfee antivirus on University PCs, but that killed the mini PC (it uses far too much CPU and too much memory), so I went with a freebie antivirus option instead.
The mini PCs weren’t too difficult to image. After finally managing to get Norton Ghost to run off a USB drive, it took about 20 minutes to image each mini PC.
So, enough talk, let’s get to the good bit with some pictures!
First of all, you’ll need a TFT monitor with 4 VESA mounting holes on the back:
minipc_001
The VESA mounting cage for the mini PC looks like this:
minipc_002
You can see the mini PC connections on these two photos:
minipc_003 minipc_004
And here you can get a feel for the size (that’s a 17″ TFT monitor behind it):
minipc_005 minipc_006
The mini PC would have no problems fitting into a 5.25″ drive bay on a standard PC:
minipc_007 minipc_008
Here’s the mini PC inside its cage:
minipc_009 minipc_010
Next up, you screw the cage onto the back of the monitor:
minipc_011
It’s a shame they don’t bundle a short VGA lead with the PC:
minipc_012 minipc_013
Then slip the mini PC into its cage and hook up the VGA cable:
minipc_014 minipc_015
The whole thing is secured using a padlock, which traps all the cables (no more stolen mice!):
minipc_016
From above, you can see just how small the mini PC is:
minipc_017 minipc_018
Setting them up took a little bit of time, as tidying up the various cables so that they’re hidden behind the TFT is a bit tricky:
minipc_019 minipc_020
And, voila — 6 new eco-friendly catalogue PCs and not an ugly PC base unit in sight!
minipc_021
I set the mini PCs up to drop the monitor into standby after 15 minutes, so hopefully we’re going to save a few thousand pounds in electricity this year and maybe we’ll manage to stay in the top 10 in next year’s Green League table 🙂
—-
[edit] I forgot to mention that the mini PC is powered using a 12 volt laptop-style power adaptor.

Playing with Processing

Iman first mentioned Processing ages ago, but it’s only recently that I’ve gotten around to having a play with it.
So, this is my first stab at coming up with something visual, and it’s in the same vein as Dewey Blobs…
proc1
…you’ll need Java installed to view it.
Rather than lay Dewey out on a 2D grid, I’m using a 10x10x10 cube (000 is at the front-top-left and 999 is at the back-bottom-right of the cube). The code then cycles through all of the check-outs (orange) and check-ins (blue) from a single day, with a zigzagging 3D line linking up the previous transactions.
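The Dewey-to-cube mapping itself is trivial. Here it is as a Perl snippet (the real thing is a Processing sketch, and which digit drives which axis is my own guess from the description):

  #!/usr/bin/perl
  use strict;
  use warnings;

  # Map a three-digit Dewey number onto the 10x10x10 cube:
  # 000 => (0,0,0) at the front-top-left, 999 => (9,9,9) at the
  # back-bottom-right.
  sub dewey_to_xyz {
      my ($dewey) = @_;
      return split //, sprintf('%03d', $dewey);
  }

  my ($x, $y, $z) = dewey_to_xyz(941);
  print "941 sits at ($x, $y, $z)\n";   # prints: 941 sits at (9, 4, 1)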
What I originally wanted to achieve was to have two curving lines, snaking their way through the cube, but figuring out how to do the Bezier curves made my brain hurt 😉 Anyway, if you want to see a version where the line runs more quickly, click here — it’s harder to read the book titles, but the lines fade away more realistically. Or, here’s a 3rd version that doesn’t include the Dewey classification or book title.
A word of warning: the Java might chomp away at your CPU, so I’m not sure how well it’ll run on a slower PC.