Peaks and troughs in borrowing

A good couple of years ago, I blogged about “lending paths”, but we’ve not really progressed things any further since then. I still like the idea that you can somehow predict books that people might/should borrow and also when you might get a sudden rush of demand on a particular title.
Anyway, whilst heading back up north after the “Library Domain Model” workshop, I got to wondering whether we could use historical circulation data to manage the book stock more effectively.
Here’s a couple of graphs — the first is for “Strategic management: awareness and change” (Thompson, 1997) and the second is for “Strategic management: an analytical introduction” (Luffman, 1996)…


The orange bars show the total number of times the book has been borrowed in that particular month. The grey bars show how many times we’d have expected the book to be loaned in that month if the borrowing for that book had followed the global borrowing trends for all stock.
Just to explain that in a little more depth — by looking at the loans for all of our stock, we can build up a monthly profile that shows the peaks and troughs throughout the academic year. If I know that a particular book has been loaned 200 times, I can have a stab at predicting what the monthly breakdown of those 200 loans would be. So, if I know that October accounts for 20% of all book loans and July accounts for only 5%, then I could predict that 40 of those 200 loans would be from October (200 x 20%) and that 10 would be from July (200 x 5%). Those predictions are the grey bars.
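Just to make that arithmetic concrete, here’s a minimal Perl sketch of the prediction step. The monthly percentages (and the %monthly_share and $total_loans names) are made up purely for illustration, with October at 20% and July at 5% as in the example above, rather than being our actual figures or code…

#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical global monthly profile: the share of all loans that fall in each
# month (these percentages are made up purely for illustration)
my @months = qw( Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun Jul );
my %monthly_share = (
    Aug => 0.02, Sep => 0.03, Oct => 0.20, Nov => 0.15, Dec => 0.05, Jan => 0.10,
    Feb => 0.10, Mar => 0.10, Apr => 0.08, May => 0.07, Jun => 0.05, Jul => 0.05,
);

# total number of times this particular book has been loaned
my $total_loans = 200;

# predicted loans per month = total loans for the book x global share for that month
foreach my $month ( @months ) {
    printf( "%s: %.0f expected loans\n", $month, $total_loans * $monthly_share{ $month } );
}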
For both of the books, the first thing that jumps out is the disconnect between the actual (orange) number of loans in May and the prediction (grey). In other words, both books are unusually popular (when compared to all the other books in the library) in that month. So, maybe in March or April, we should think about taking some of the 2 week loan copies and changing them to 1 week loans (and then changing them back in June), especially if students have had to place hold requests in previous years.


For some reason, I didn’t take any photos at the “Library Domain Model” event itself, but I did do the “tourist thing” on the South Bank…

“Engaging our Digital Natives”, University of Bradford

On Friday, I had the pleasure of giving a presentation (“Web 2.0 and You Too”) as part of the “Engaging our Digital Natives” event at the University of Bradford. For some reason, Slideshare isn’t showing the notes from the presentation, but they should be available if you download the PowerPoint.
Some photographs from the day are available on Flickr or as a slideshow.

All change!

I’m moving the blog onto a new web server, so things might be a little weird for a couple of days 🙂
To help make the move, I’m using my “other” domain (daveyp.co.uk), but the blog will eventually revert to daveyp.com once everything is working okay.

Quick plug: CILIP U&CR Y&H Open Source event

Just a quick plug to say that there are still spaces available at the “Open Source: Free Speech, Free Beer and Free Kittens!” event at Huddersfield on Friday 26th June. Full details and a link to the booking form are available on the CILIP University College and Research Group web site.
Speakers at the event include:
– Ken Chad (Ken Chad Consulting)
– Nick Dimant and Jonathan Field (PTFS Europe)
– Nicolas Morin (BibLibre)
– Richard Wallis (Talis)
…although I don’t think there’ll be any free beer or kittens on offer to delegates, there will be a free lunch which is kindly being sponsored by PTFS Europe 🙂

Web service for the free book usage data

For ages, I’ve been meaning to get around to adding a web service front end to the book usage data that we released in December. So, better late than never, here it is!
It’s not the fastest bit of code I’ve ever written, but (if there’s enough interest) I could speed it up.
The web service can be called a couple of different ways:
1) using an ISBN
Examples:
a) https://library.hud.ac.uk/api/usagedata/isbn=0415014190 (“Language in the news”)
b) https://library.hud.ac.uk/api/usagedata/isbn=159308000X (“The Adventures of Huckleberry Finn”)
Assuming a match is located, data for 1 or more items will be returned. This will include FRBR style matching using the LibraryThing thingISBN data, as shown in the second example where we don’t have an item which exactly matches the given ISBN.
2) using an ID number
Examples:
a) https://library.hud.ac.uk/api/usagedata/id=125120 (“Language and power”)
The item ID numbers are included in the suggestion data and are the internal bibliographic ID numbers used by our library management system.
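If you want to call the web service from a script, something along these lines should do the trick. It’s just a rough sketch using LWP::UserAgent that fetches the data for one of the example ISBNs and prints out the raw XML; I’m not assuming anything about the XML structure here (see edit 1 below for the format, and note that edit 2 mentions the URL has since changed slightly)…

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# fetch the usage data for a given ISBN, using the URL pattern shown above
my $isbn = '0415014190';
my $url  = "https://library.hud.ac.uk/api/usagedata/isbn=$isbn";

my $ua       = LWP::UserAgent->new( timeout => 30 );
my $response = $ua->get( $url );

if( $response->is_success ) {
    # just dump the raw XML (the format is described in edit 1 below)
    print $response->decoded_content;
} else {
    die "Request failed: " . $response->status_line . "\n";
}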
——————-
edit 1: I should also have mentioned that the XML returned is essentially the same format as described here.
edit 2: I’ve now rewritten the code as a mod_perl script (to make it faster when using ISBNs) and slightly altered the URL.

Keeping everyone happy at a conference

At Mashed Library UK 2009, we’re planning to kick the event off with six 30 minute opening sessions. We’ve got two rooms, so there’ll be a session running in each room at the same time. Since a delegate can’t be in two places at the same time, they’ll only be able to go to three of the six sessions. So, how do you ensure that you keep everyone happy and that you don’t have too many clashes (i.e. having to miss a session you’d have quite liked to have gone to)?
Having never organised an event before, I’m guessing the usual way would be to try and schedule sessions together that target different audiences? However, that sounds like a potential headache inducer and I’m a programmer, not a planner!
So, what we’re going to do, once we’ve got all six sessions finalised, is to let each of the 60 odd delegates (and by that I mean we’ve got more than 60 delegates!) rank the sessions in order of preference. So, their 1st, 2nd, and 3rd choices would be the three sessions that they’d most like to go to.
With that kind of data, you’d expect to see some clustering (i.e. delegates making the same or similar choices) and so (in theory) there will be an optimal sequencing of sessions that gives the most delegates the best chance of going to their top three choices.
There’s a wide variety of programming techniques for finding optimal solutions to problems, from the simple to the complex (e.g. simulated annealing and genetic algorithms). However, because I’d got a bath running, I decided to knock up a quick hack using the simplest method — randomly generate a session sequence and then see how well it meets the choices of the delegates. By the way, if you want to learn more about calculating optimal solutions, see “Programming Collective Intelligence” by Toby Segaran (ISBN 9780596529321).
With any optimal solution code, you need a way of measuring the success of a given solution. To my mind, that would be “happiness” — if you find a solution that lets a delegate attend their top three choices, they’ll be very happy, but if you have a session clash for their 1st and 2nd choices, they won’t be happy. Once you’ve calculated the overall “happiness” for all the delegates, that allows you to compare that particular solution with other random solutions (i.e. “does this session sequence generate more happiness or less than the previous one?”).
I hadn’t planned on releasing the code, as it really was a 5 minute “quick and dirty” hack, but Ben tweeted to say he might find it useful, so I’ve uploaded the Perl script here. I’ve also included a sample file containing some dummy delegate choices.
For each delegate, there’s a comma-separated list showing their session preferences (1 = top choice)…

Andy    2,4,3,5,6,1

…so Andy’s top choice is session 6, followed by session 1, then session 3, etc.
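If you’re wondering how that format might be read in, here’s a rough sketch (not the actual script) that turns each line into a hash of ranks. The Beth line is just made up so there’s more than one delegate to print…

#!/usr/bin/perl
use strict;
use warnings;

# one delegate per line: a name, some whitespace, then a comma separated list
# where the Nth value is the rank that delegate gave to session N (1 = top choice)
my %rank;    # $rank{ delegate }{ session } = rank

while( my $line = <DATA> ) {
    chomp( $line );
    next unless $line =~ /\S/;
    my( $name, $prefs ) = split( /\s+/, $line, 2 );
    my @ranks = split( /,/, $prefs );
    foreach my $session ( 1 .. scalar( @ranks ) ) {
        $rank{ $name }{ $session } = $ranks[ $session - 1 ];
    }
}

# e.g. print each delegate's top choice
foreach my $name ( sort( keys( %rank ) ) ) {
    my( $top ) = grep { $rank{ $name }{ $_ } == 1 } keys( %{ $rank{ $name } } );
    print "$name: top choice is session $top\n";
}

__DATA__
Andy    2,4,3,5,6,1
Beth    1,2,3,4,5,6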
If you run the Perl script, it’ll pick a random session sequence and calculate the happiness. It’ll keep on looping and trying to find better solutions until it finds one that can’t be improved upon. You’d probably want to run the code several times to ensure that the final solution really is the best one. You might also want to try one of the alternative $overall calculations to see if that produces the same session sequence.
Here’s an example of an early solution…

[1]     session 1 = 11 delegate(s)
[1]     session 6 = 4 delegate(s)
[2]     session 5 = 6 delegate(s)
[2]     session 4 = 9 delegate(s)
[3]     session 2 = 8 delegate(s)
[3]     session 3 = 7 delegate(s)
HAPPINESS = 87 (5.8)
        1       Andy    -4.8
        3       Beth    -2.8
        3       Cary    -2.8
        9       Dave    +3.2
        5       Earl    -0.8
        9       Fred    +3.2
        9       Gene    +3.2
        3       Hans    -2.8
        9       Iggy    +3.2
        5       Jane    -0.8
        5       Karl    -0.8
        9       Leah    +3.2
        9       Macy    +3.2
        3       Neil    -2.8
        5       Owen    -0.8
CLASHES = 7 / OVERALL = 12.4285714285714 / DIFF = 38.4

In the above output, it’s proposing to run sessions 1 & 6 together, then 5 & 4, and finally 2 & 3. By looking at the delegate choices, you can easily calculate which of the two concurrent sessions each delegate would prefer to go to (i.e. 11 delegates would choose to go to session 1).
The code also calculates a “happiness” value for each delegate. If a delegate gets to go to their 1st, 2nd and 3rd choices, then they’d get a maximum happiness score of 9 (3 x 3 points). If a 1st choice session is being run at the same time as their 2nd choice (or a 2nd at the same time as the 3rd), that would make them unhappy, so a point is deducted. If a 1st choice runs at the same time as their 3rd choice, they’d probably accept that (however, nothing is added to their happiness score).
Once all the scores have been calculated, we get an overall happiness of 87 (out of a possible 135, i.e. 15 delegates x the maximum happiness score of 9) and the average happiness is 5.8 out of 9.
We can also see how (un)happy each delegate is and how much they deviate from the average happiness. Dave, Fred, Gene, Iggy, Leah and Macy all get to go to their top 3 choices, so they’ve all got scores of 9 out of 9. Andy is very unhappy (1 out of 9). The others are somewhere in the middle, so they’ve all had to make compromises and won’t be going to all of their top 3 choices.
There are 7 clashes (when a 1st choice runs at the same time as the 2nd, or the 2nd at the same time as the 3rd). Ideally, we’d like to keep the clashes to a minimum.
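In case it helps to see those scoring rules as code, here’s a stripped-down sketch of how I’ve described them above (3 points for each top 3 choice a delegate gets to attend, minus a point whenever their 1st and 2nd, or 2nd and 3rd, choices end up in the same slot), wrapped in a simple “generate a random pairing, keep the best one so far” loop. The delegate choices in it are made up and it’s a simplification rather than the uploaded script itself, so don’t expect it to reproduce the exact numbers above…

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw( shuffle );

# made up delegate choices: the Nth value is the rank given to session N (1 = top choice)
my %rank = (
    'Andy' => [ 2, 4, 3, 5, 6, 1 ],
    'Beth' => [ 1, 2, 3, 4, 5, 6 ],
    'Cary' => [ 6, 5, 4, 3, 2, 1 ],
    'Dave' => [ 3, 1, 2, 6, 4, 5 ],
);

my( $best_pairing, $best_happiness );

foreach my $attempt ( 1 .. 10_000 ) {
    # randomly split the 6 sessions into 3 time slots of 2 concurrent sessions
    my @order   = shuffle( 1 .. 6 );
    my @pairing = ( [ @order[ 0, 1 ] ], [ @order[ 2, 3 ] ], [ @order[ 4, 5 ] ] );

    my $happiness = 0;
    foreach my $delegate ( keys %rank ) {
        my @ranks = @{ $rank{ $delegate } };
        foreach my $slot ( @pairing ) {
            # the ranks this delegate gave to the two sessions in this slot
            my( $rank_a, $rank_b ) = map { $ranks[ $_ - 1 ] } @{ $slot };
            my $attended = $rank_a < $rank_b ? $rank_a : $rank_b;   # they attend the one they ranked higher
            $happiness += 3 if $attended <= 3;                      # 3 points per attended top 3 choice
            # lose a point if their 1st & 2nd (or 2nd & 3rd) choices clash in this slot
            my %in_slot = map { $_ => 1 } ( $rank_a, $rank_b );
            $happiness -= 1 if ( $in_slot{ 1 } && $in_slot{ 2 } ) || ( $in_slot{ 2 } && $in_slot{ 3 } );
        }
    }

    if( !defined( $best_happiness ) || $happiness > $best_happiness ) {
        $best_happiness = $happiness;
        $best_pairing   = [ @pairing ];
    }
}

print "best happiness found = $best_happiness\n";
foreach my $slot ( @{ $best_pairing } ) {
    print "  sessions " . join( ' & ', @{ $slot } ) . " run at the same time\n";
}

(As an aside, if you ignore the order of the time slots there are only 15 distinct ways of pairing up 6 sessions, so you could just brute force the lot; the random approach simply scales a bit more gracefully if we ever add more sessions or slots.)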
Here’s an example of a better solution (which might actually be the optimal solution for the dummy data)…

[1]     session 3 = 9 delegate(s)
[1]     session 5 = 6 delegate(s)
[2]     session 4 = 9 delegate(s)
[2]     session 6 = 6 delegate(s)
[3]     session 1 = 10 delegate(s)
[3]     session 2 = 5 delegate(s)
HAPPINESS = 101 (6.73333333333333)
        5       Andy    -1.73333333333333
        9       Beth    +2.26666666666667
        3       Cary    -3.73333333333333
        3       Dave    -3.73333333333333
        5       Earl    -1.73333333333333
        3       Fred    -3.73333333333333
        5       Gene    -1.73333333333333
        9       Hans    +2.26666666666667
        5       Iggy    -1.73333333333333
        9       Jane    +2.26666666666667
        9       Karl    +2.26666666666667
        9       Leah    +2.26666666666667
        9       Macy    +2.26666666666667
        9       Neil    +2.26666666666667
        9       Owen    +2.26666666666667
CLASHES = 2 / OVERALL = 50.5 / DIFF = 36.2666666666667

The average happiness is now up to 6.73 per delegate and there are only 2 clashes, which is much better. Cary, Dave and Fred will be the most affected by this particular session scheduling, but we now have 8 delegates attending their top choices.
So, the big question will be: what happens when we get the real data from the 60 odd delegates who are coming to Mashed Library? Stay tuned for the answer!

Transcript of the #cilip2 Twitter hashtag

Despite a widespread network failure that seemed to affect quite a few universities, I finally managed to pick up all of the #cilip2 tweets from today’s event: http://www.daveyp.com/files/stuff/cilip2.html
Whenever I get a spare half-an-hour, I’ll do some analysis of the tweets. If anyone wants a tab-separated version of the data, you can grab it from here.