Course level journal article feed

Following on from the last blog post, I’ve done some coding to see how well (or not!) a course level new journal article feed might work.
The process behind the code is…

  1. for a given course, identify the most frequently accessed journal titles
  2. use JournalTOCs to fetch the latest articles from each journal’s RSS feed
  3. for each course, create a list of articles (sorted in descending date order)

…and you can see the initial output from the code here: http://www.daveyp.com/files/stuff/journals/
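For anyone curious about what that might look like in practice, here's a minimal sketch of the three steps in Python. The post doesn't include the actual code, so treat this as an illustration rather than the real implementation — in particular, the JournalTOCs URL pattern and the course-to-journal mapping below are assumptions.

```python
import time

import feedparser  # pip install feedparser

# hypothetical data: course name -> journal ISSNs ranked by usage
COURSE_JOURNALS = {
    "BSc Criminology": ["0047-2352", "1362-4806"],
}

def journal_feed_url(issn):
    # assumed JournalTOCs URL pattern -- check their documentation for the real one
    return "http://www.journaltocs.ac.uk/api/journals/%s?output=articles" % issn

def latest_articles_for_course(course, max_items=20):
    articles = []
    for issn in COURSE_JOURNALS.get(course, []):
        feed = feedparser.parse(journal_feed_url(issn))
        for entry in feed.entries:
            published = entry.get("published_parsed") or entry.get("updated_parsed")
            articles.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "journal": feed.feed.get("title", issn),
                "date": published,
            })
    # newest first; undated entries sink to the bottom
    articles.sort(key=lambda a: a["date"] or time.gmtime(0), reverse=True)
    return articles[:max_items]

if __name__ == "__main__":
    for a in latest_articles_for_course("BSc Criminology"):
        when = time.strftime("%Y-%m-%d", a["date"]) if a["date"] else "undated"
        print(when, a["journal"], "-", a["title"])
```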
For some courses (e.g. Educational Administration) it looks like usage is focused on a single journal, but most seem to bring in content from multiple titles — for example, BSc Criminology is bringing in content from:

One of the opportunities here is to use the journal usage data to identify potentially relevant journals that aren’t being used on a course and include those in the feed. In the above example, the Journal of Criminal Justice might be such a journal.

Quick idea from #jisclms event

A mega quick blog post before the afternoon session kicks off!
Lynn Connaway‘s talk mentioned that they’d found that students wanted the library/librarian to provide a filtered feed of relevant stuff, so here’s our idea…
1) capture OpenURL usage data along with user data (so you know who’s looking at which journals)
2) identify the most popular journals for individual courses
3) for each course, use TicTOCs/JournalTOCs to provide an aggregated feed of new articles for those journals
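As a rough sketch of steps 1 and 2, assuming the link resolver's OpenURL requests can be boiled down to (user, ISSN) pairs and student records can map users to courses — the CSV layouts and column names here are hypothetical:

```python
import csv
from collections import Counter, defaultdict

def popular_journals_by_course(openurl_log_csv, user_course_csv, top_n=5):
    # hypothetical columns: user_id, course
    user_to_course = {}
    with open(user_course_csv, newline="") as f:
        for row in csv.DictReader(f):
            user_to_course[row["user_id"]] = row["course"]

    # hypothetical columns: user_id, issn (one row per OpenURL request)
    counts = defaultdict(Counter)
    with open(openurl_log_csv, newline="") as f:
        for row in csv.DictReader(f):
            course = user_to_course.get(row["user_id"])
            if course and row["issn"]:
                counts[course][row["issn"]] += 1

    # step 2: the most popular journals for each course
    return {course: c.most_common(top_n) for course, c in counts.items()}
```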

Summon 4 HN — bits o’ code

As part of the JISC Summon 4 HN project, we’ll be releasing some chunks of code that I’ve knocked together for our Summon implementation at Huddersfield.
The code will cover these areas:

  1. updating Summon with MARC record additions, updates and deletions from Horizon
  2. providing live availability information from Horizon without resorting to screen-scraping the OPAC
  3. customising 360 Link using jQuery

In theory, the first two might also be of interest to Horizon sites that are implementing an alternative OPAC (e.g. VuFind or AquaBrowser) where you need to set up regular MARC exports. The third might be of interest to 360 Link sites in general.
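To give a flavour of the first of those chunks, here's a hedged Python sketch of splitting a full Horizon MARC dump into an updates file and a deletes list for Summon. The real code (and how you get the lists of changed/deleted bib numbers out of Horizon) will differ, and using the 001 field as the bib number is an assumption.

```python
from pymarc import MARCReader, MARCWriter  # pip install pymarc

def read_ids(path):
    # one bib number per line (however your own Horizon reporting produces them)
    with open(path) as fh:
        return {line.strip() for line in fh if line.strip()}

def split_deltas(full_dump="horizon_full.mrc",
                 changed_ids_file="changed_bibs.txt",
                 deleted_ids_file="deleted_bibs.txt"):
    changed = read_ids(changed_ids_file)
    deleted = read_ids(deleted_ids_file)

    # additions + updates: pull the matching records out of the full dump
    writer = MARCWriter(open("summon_updates.mrc", "wb"))
    with open(full_dump, "rb") as fh:
        for record in MARCReader(fh):
            fields = record.get_fields("001")
            bib_id = fields[0].data.strip() if fields else None
            if bib_id in changed:
                writer.write(record)
    writer.close()

    # deletions: a plain list of identifiers is usually enough
    with open("summon_deletes.txt", "w") as out:
        for bib_id in sorted(deleted):
            out.write(bib_id + "\n")
```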
Keep an eye on the Project Code section of the Summon 4 HN blog for details of the code 🙂


I couldn’t find a relevant photo for this blog post, so instead, let’s have another look at those infamous MIMAS #cupcakes from ILI2009 🙂
[photo: ili2009_013]

Here comes Summ(er|on)

It’s probably a sign of getting old and decrepit, but this year has just flown by — it doesn’t seem like two minutes since we kicked off our implementation of Serials Solutions’ Summon and now it’s gone fully live (it actually went fully live halfway through the Mashed Library event we ran the other week).
The bulk of the implementation was done and dusted by early January 2010, and the majority of the implementation time was spent populating 360 Link (the Serials Solutions link resolver) with our journal holdings — a task our Journals Team found much easier than when we implemented SFX back in 2006.  As the plan had always been to run Summon in parallel to MetaLib during the 2009/10 academic year, it meant we had lots of time to play and tweak. 
We flipped the link resolver over from SFX to 360 Link in late January and then formally “soft” launched Summon during the University’s Research Festival in early March.  Throughout the academic year, usage of Summon has been growing and the vast majority of the feedback has been positive 🙂
As part of the JISC Summon4HN Project, we’ll be documenting the implementation and releasing chunks of code that we hope might be of use to the community, including:

  • code for automating the export of deleted, new and updated MARC records from Horizon so that they can be imported into Summon (or VuFind, AquaBrowser, etc)
  • code for creating “dummy” journal title records (so that known journal titles can be easily located in Summon, e.g. American Journal of Nursing)
  • a basic mod_perl implementation of the DLF spec for exposing availability data for library collections
  • details of the various tweaks we’ve made to our 360 Link instance

Also, as part of the roll out of Summon, we’ve been revamping our E-Resources Wiki to provide a browseable list of resources — as with the journal titles, we’ve been dropping dummy MARC records into Summon so that known resources can be located via a search (e.g. Mintel Reports).
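As an illustration of the dummy record idea (not the exact records we load into Summon), here's a short pymarc sketch — the field choices are assumptions, and the flat subfield lists follow pymarc 4.x (pymarc 5 uses Subfield objects instead):

```python
from pymarc import Field, MARCWriter, Record  # pip install pymarc

def dummy_record(bib_id, title, url):
    rec = Record()
    rec.add_field(Field(tag="001", data=bib_id))
    rec.add_field(Field(tag="245", indicators=["0", "0"],
                        subfields=["a", title]))
    rec.add_field(Field(tag="856", indicators=["4", "0"],
                        subfields=["u", url, "z", "Connect to this resource"]))
    return rec

writer = MARCWriter(open("dummy_records.mrc", "wb"))
writer.write(dummy_record("dummy0001", "American Journal of Nursing",
                          "https://example.org/journal-link"))   # placeholder URL
writer.write(dummy_record("dummy0002", "Mintel Reports",
                          "https://example.org/mintel-link"))    # placeholder URL
writer.close()
```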

Non/low library usage and final grades

Whilst chatting to one of the delegates at yesterday’s “Gaining business intelligence from user activity data” event (my Powerpoint slides can be grabbed from here) about non & low-usage of library services/resources, I began wondering how that relates to final grades.
In the previous blog post, we’ve seen that there appears to be evidence of a correlation between usage and grades, but that doesn’t really give an indication of how many students are non/low users. For example, if we happened to know that 25% of all students never borrow anything from the library, does that mean that 25% of students who gain the highest grades don’t borrow a book?
Let’s churn the data again 🙂
In the following 3 graphs, we’re looking at:

  • X axis: bands of usage (zero usage, then incremental bands of 20, then everything over 180 uses)
  • Y axis: as a percentage, what proportion of the students who achieved a particular grade are in each band

You can click on the graphs to view a full-sized version.
One of the things to look for is which grade peaks in each band of usage.
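If you want to reproduce this sort of banding from your own data, here's a rough pandas sketch — the column names and example data are hypothetical, and the band labels just follow the description above:

```python
import pandas as pd

def usage_bands_by_grade(df, usage_col="loans", grade_col="grade"):
    # zero usage, then bands of 20, then everything over 180
    bins = [-1, 0, 20, 40, 60, 80, 100, 120, 140, 160, 180, float("inf")]
    labels = ["0", "1-20", "21-40", "41-60", "61-80", "81-100",
              "101-120", "121-140", "141-160", "161-180", ">180"]
    banded = pd.cut(df[usage_col], bins=bins, labels=labels)
    # for each grade, the percentage of its students falling in each band
    return (pd.crosstab(banded, df[grade_col], normalize="columns") * 100).round(1)

# made-up example data
df = pd.DataFrame({"grade": ["1", "2:1", "2:2", "3"] * 25,
                   "loans": range(100)})
print(usage_bands_by_grade(df))
```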
Borrowing
The usage bands represent the number of items borrowed from the library during the final 3 years of study…
[graph: horizon — items borrowed, by usage band and final grade]
caveat: we have a lot of distance learners across the world and we wouldn’t expect them to borrow anything from the library
In terms of non-usage (i.e. never borrowing an item), there’s a marked difference between those who get the two highest grades (1 and 2:1) and those who get the lowest honours grade (3). It seems that those who get a third-class honours degree are twice as likely to be non-users as those who get a first-class or 2:1 degree.
E-Resource Usage
The usage bands represent the number of times the student logged into MetaLib (or AthensDA) during the final 3 years of study…
[graph: metalib — MetaLib/AthensDA logins, by usage band and final grade]
caveat: this is a relatively crude measure of e-resource usage, as it doesn’t measure what the student accessed or how long they accessed each e-resource
Even at a quick glance, we can see that this graph tells a different story to the previous one — the number of non-users is lower, but there’s a huge (worrying?) amount of low usage (the “1-20” band). I can only speculate on that:

  • did students try logging in but found the e-resources too difficult to use?
  • how much of an impact do the barriers to off-campus access (e.g. having to know when & how to authenticate using Athens or Shibboleth) have on repeat usage?
  • are students finding the materials they need for their studies outside of the subscription materials?

As I mentioned previously, Summon is a different kettle of fish to MetaLib, so it’s unlikely we’ll be able to capture comparative usage data — if you’ve tried using Summon, you’ll know that you don’t need to log in to use it (authentication only kicks in when you try to access the full-text). However, we’re confident that Summon’s ease-of-use and the work we’ve done to improve off-campus access will result in a dramatic increase in e-resource usage.
As before, we see it’s those students who graduate with third-class honours who are the most likely to be non or low-users of e-resources.
Visits to the Library
The usage bands represent the number of visits to the library during the final 3 years of study…
[graph: sentry — visits to the library, by usage band and final grade]
caveat: we have a lot of distance learners across the world and we wouldn’t expect them to visit the library in person
Again, the graph shows that those who gain a third-class degree are twice as likely never to visit the library as those who gain a first-class or 2:1.

Library usage and final grades

It’s high time I started blogging again, so let’s start off with something that my colleagues in the library have been talking about at recent conferences — the link between the usage of library services and the final academic grades achieved by students.
As a bit of background to this, it’s probably worth mentioning that we’ve had an ongoing project (since 2006?) in the library looking at non and low-usage of library resources. That project has helped identify the long term trends in book borrowing, e-resource usage and library visits by the students at Huddersfield. Plus, we’ve used that information to help identify specific courses and cohorts of students who probably aren’t using the library as much as they should be, as well as when is the most effective time during a course to do refresher training.
Towards the back end of last year, we worked with the Student Records Team to build up a profile of library usage by the previous 2 years’ worth of graduates. For each graduate, we compared their final degree grade with their last 3 years of library usage data — specifically:

  • Items loaned — how many things did they borrow from the library?
  • MetaLib/AthensDA logins — how often did they access e-resources?
  • Entry stats — how many times did they venture in to the library?

Now, I’ll be the first to admit that these are basic & crude measures…

  • A student might borrow many items, but maybe he’s just working his way through our DVD collection for fun.
  • A login to MetaLib doesn’t tell you what they looked at or how long they used our e-resources.
  • Students might (and do) come into the library for purely social reasons.
  • Using the library is just one part of the overall academic experience.

…but they are rough indicators, useful for a quick initial check to see if there is a correlation. Plus, we know from the non & low-usage project that there are still many students who (for many reasons) don’t use the library much.
So, let’s churn the data! 🙂
Here’s the average usage by the 3,400 or so undergraduate degree students who graduated with honours in the 2007/8 academic year:
[graph: average usage by final grade, 2007/8]
In terms of visits to the library, there’s no overall correlation — the average number of visits per student ranges from 109 to 120 — although we do see some correlation at the level of individual courses. What does this tell us (if anything)? I’d say it’s evidence that the library is for everyone, regardless of their ability and academic prowess.
We do see a correlation with stock usage and e-resource usage. Those who achieved a first (1) on average borrowed twice as many items as those who got a third (3) and logged into MetaLib/AthensDA to access e-resources 3.5 times as much. The correlation is fairly linear across the grades, although there’s a noticeable jump up in e-resource usage (when compared to stock borrowing) in those who gained a first.
Now the data for the 3,200 or so students from the following academic year, 2008/9:
[graph: average usage by final grade, 2008/9]
As before, no particular correlation with visits to the library, but a noticeable correlation with stock & e-resource usage. Again we see that jump in e-resource usage for those who got the highest grade.
Note too that the average usage has increased. We’ve not changed the way we measure logins or item circulation, so this is a real year-on-year growth. (Side note: as we make the move from MetaLib to Summon, the concept of an “e-resource login” will change dramatically, so we won’t be able to accurately compare year-on-year in future)
Finally, here’s both years of graduates’ usage combined onto a single graph:
[graph: average usage by final grade, 2007/8 & 2008/9 combined]
I’m curious about that jump in e-resource usage. Does it mean, to gain the best marks, students need to be looking online for the best journal articles, rather than relying on the printed page? If that is the case, will Summon have a measurably positive impact on improving grades (it certainly makes it a lot easier to find relevant articles quickly)?
Going forward, we’ve still got a lot of work to do drilling down into the data — analysing it by individual courses, looking deeper into the books that were borrowed and the e-resources that were accessed, etc. We also need to prove that all of this has statistical significance. Not only that, but how can we use that knowledge and insight to improve the services which the library offers — it’d be foolish to say “borrow more books and you’ll get better grades”, but maybe we can continue to help guide students to the most relevant materials for their studies.
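As a starting point for the statistical side, here's a hedged sketch using a rank-based test (Spearman's rho), since grades are ordinal — this isn't necessarily the test the analysis will end up using, and the column names and example data below are made up:

```python
import pandas as pd
from scipy.stats import spearmanr

GRADE_RANK = {"3": 1, "2:2": 2, "2:1": 3, "1": 4}   # lowest -> highest honours

def usage_grade_correlation(df, usage_col):
    ranks = df["grade"].map(GRADE_RANK)
    rho, p_value = spearmanr(ranks, df[usage_col])
    return rho, p_value

# made-up example data: one row per graduate
df = pd.DataFrame({"grade": ["1", "2:1", "2:2", "3"] * 50,
                   "loans": [30, 22, 18, 12] * 50})
rho, p = usage_grade_correlation(df, "loans")
print("Spearman rho = %.2f, p = %.3g" % (rho, p))
```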
It’s all exciting stuff and, believe me, the University of Huddersfield Library is a great environment to work in… I just wish there were more hours in the day! 🙂

ILI 2009 Presentation

I really struggled to shoehorn everything I wanted to talk about during my ILI 2009 presentation into the slides, so this blog post goes into a bit more depth than I’ll probably talk about…
slides 1 & 2

I’m still in two minds about whether or not the word “exploit” has too many negative connotations, but what the heck!
If you do use any of the content from the presentation, please drop me an email to let me know 🙂
slide 3

As part of the development of the UK version of Horizon back in the early 1990s, libraries requested that the company (Dynix) add code to log all circulation transactions. Horizon was installed at Huddersfield in 1996 and has been logging circulation data since then. At the time of writing this blog post, we’ve got data for 3,157,111 transactions.
slide 4

With that volume of historical data, it seemed sensible to try and create some useful services for our students. In November 2005, we started dabbling with an Amazon-style “people who borrowed this” service on our OPAC. After some initial testing and tweaking, the service went fully live in January 2006. The following month, we added a web service API (named “pewbot”).
To date, we’ve had over 90,000 clicks on the “people who borrowed this, also borrowed…” suggestions, with a peak of 5,229 clicks in a single month (~175 clicks per day). Apart from the “Did you mean?” spelling suggestions, this has been the most popular tweak we’ve made to our OPAC.
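The underlying idea is simple co-occurrence counting over the circulation data. Here's a minimal Python sketch — not the production code; real implementations usually add thresholds, weighting and housekeeping, and the pair format and cut-offs below are assumptions:

```python
from collections import Counter, defaultdict

def build_suggestions(transactions, min_co_loans=3, top_n=10):
    # transactions: iterable of (borrower_id, item_id) pairs
    items_per_borrower = defaultdict(set)
    for borrower_id, item_id in transactions:
        items_per_borrower[borrower_id].add(item_id)

    # count how often each pair of items was borrowed by the same person
    co_counts = defaultdict(Counter)
    for items in items_per_borrower.values():
        for a in items:
            for b in items:
                if a != b:
                    co_counts[a][b] += 1

    # keep the strongest suggestions for each item
    return {item: [(other, n) for other, n in counter.most_common(top_n)
                   if n >= min_co_loans]
            for item, counter in co_counts.items()}

# usage: suggestions = build_suggestions([("u1", "bib1"), ("u1", "bib2"), ("u2", "bib2")])
```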
slide 5

Because we’re an academic library, we get peaks and troughs of borrowing throughout the academic year. The busiest times are the start of the new academic year in October and Easter.
slide 6

If you compare the number of clicks on the “people who borrowed this, also borrowed…” suggestions, you can see that it’s broadly similar to the borrowing graph, except for the peak usage. Due to the borrowing peak in October, in November a significant portion of our book stock will be on loan. When our students find that the books they want aren’t available, they seem to find the suggestions useful.
I’m hoping to do some analysis to see if there’s a stronger correlation between the suggested books that are clicked on and then borrowed on the same day during November than during the other months.
slide 7

Once a user logs into the OPAC, we can provide a personal suggestion by generating the suggestions for the books they’ve borrowed recently and then picking one of the titles that comes out near the top.
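A sketch of that personalisation step, building on the co-occurrence table from the previous sketch (again an illustration, not the live code):

```python
from collections import Counter

def personal_suggestion(recent_items, suggestions, already_borrowed=()):
    # pool the "also borrowed" scores across everything borrowed recently
    pooled = Counter()
    for item in recent_items:
        for other, score in suggestions.get(item, []):
            if other not in recent_items and other not in already_borrowed:
                pooled[other] += score
    # pick whichever title comes out near the top
    return pooled.most_common(1)[0][0] if pooled else None
```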
slide 8

I was originally asked to come up with some code to generate new book lists for each of our seven academic schools. It turned out to be extremely hard to figure out which school a book might have been purchased for, so I turned to the historical book circulation data to come up with a better method.
Rather than having a new book list per school, we’re now offering new book lists per course of study.
The way it’s done is really simple — for each course, we analyse all of the books borrowed by students on that course and then automatically build up a Dewey lending profile. Whenever a new book is added to our catalogue, we check to see which courses have previously borrowed heavily from that Dewey class and then add the book details to their feeds.
The feeds are picked up by the University Portal, so students should see the new book list for their course and (touch wood!) the titles will be highly relevant to their studies.
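Here's a rough sketch of the Dewey-profile matching described above — the 10% threshold and the data shapes are purely illustrative, not the figures we actually use:

```python
from collections import Counter, defaultdict

def dewey_class(dewey):
    return str(dewey)[:3]                      # e.g. "364.1" -> "364"

def build_dewey_profiles(course_loans):
    # course_loans: iterable of (course, dewey_number) pairs from loan history
    by_course = defaultdict(Counter)
    for course, dewey in course_loans:
        by_course[course][dewey_class(dewey)] += 1
    profiles = {}
    for course, counts in by_course.items():
        total = sum(counts.values())
        profiles[course] = {cls: n / total for cls, n in counts.items()}
    return profiles

def courses_for_new_book(dewey, profiles, threshold=0.10):
    # a course gets the book in its feed if it borrows heavily from that class
    cls = dewey_class(dewey)
    return [course for course, profile in profiles.items()
            if profile.get(cls, 0) >= threshold]

# usage:
# profiles = build_dewey_profiles([("BSc Criminology", "364.1"), ...])
# courses_for_new_book("364.4", profiles)
```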
slide 9

One of the comments I frequently hear is that book recommendation services might create a “vicious circle” of borrowing, with only the most popular books being recommended. At Huddersfield, we’ve seen the opposite — since adding recommendations and suggestions, the range of stock being borrowed has started to widen.
From 2000 to 2005, the range of titles being borrowed per year was around 65,000 (which is approximately 25% of the titles held by the library). Since adding the features in early 2006, we’ve seen a year-on-year increase in the range of titles being borrowed. In 2009, we expect to see over 80,000 titles in circulation, which is close to 33% of the titles held by the library.
I strongly believe that by adding serendipity to our catalogue, we’re seeing a very positive trend in borrowing by our students.
slide 10

Not only are students borrowing more widely than before, they’re also borrowing more books than before. From 2000 to 2005, students would borrow an average of 14 books per year. In 2009, we’re expecting to see borrowing increase to nearly 16 books per year. We’re also seeing a year-on-year decrease in renewals — rather than keeping hold of a book and renewing it, students seem to be returning items sooner and borrowing more than ever before.
slide 11

We’re also logging keyword searches on the catalogue — since 2006, we’ve logged over 5 million keyword searches and it’s fun looking at some of the trends.
As we had a bit of dead space on the OPAC front page, we decided to add some “eye candy” — in this case, it’s a keyword cloud of the most popular search terms from the last 48 hours. Looking at the usage statistics, we’re seeing that new students find the cloud a useful way of starting their very first search of the catalogue, with the usage in October nearly twice that of the next highest month.
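A small sketch of how the cloud's underlying counts can be produced, assuming the logged searches are available as (timestamp, keyword string) pairs — display weighting and stop-word handling are left out:

```python
from collections import Counter
from datetime import datetime, timedelta

def popular_terms(search_log, hours=48, top_n=50):
    # search_log: iterable of (timestamp, keyword_string) pairs
    cutoff = datetime.now() - timedelta(hours=hours)
    counts = Counter()
    for when, keywords in search_log:
        if when >= cutoff:
            counts.update(k.lower() for k in keywords.split())
    return counts.most_common(top_n)
```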
slide 12

A much more useful service that we’ve built from the keywords is one that suggests good keywords to combine with your current search terms.
In the above example, we start with a general search for “law” which brings back an unmanageable 7000+ results. In the background, the code quickly searches through all of the previous keyword searches that contained “law” and pulls together the other keywords that are most commonly used in multi-keyword searches that included “law”. With a couple of mouse clicks, the user can quickly narrow the search down to a manageable 34 results for “criminal law statutes”.
There’re two things I really like about this service:
1) I didn’t have to ask our librarians to come up with the lists of good keywords to combine with other keywords — they’ve got much more important things to do with their time 🙂
2) The service acts as a feedback loop — the more searches that are carried out, the better the suggestions become.
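A minimal sketch of that feedback loop, assuming the search log is just a list of keyword strings (the real service will differ in the details):

```python
from collections import Counter

def suggest_keywords(current_term, search_log, top_n=10):
    # search_log: previously logged keyword searches, e.g. ["criminal law statutes", ...]
    current_term = current_term.lower()
    co_occurring = Counter()
    for search in search_log:
        terms = set(search.lower().split())
        if current_term in terms and len(terms) > 1:
            co_occurring.update(terms - {current_term})
    return [term for term, _ in co_occurring.most_common(top_n)]

# e.g. suggest_keywords("law", logged_searches) -> ["criminal", "statutes", ...]
```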
slide 13

I forget exactly how this came about (but I suspect a conversation with Ken Chad sowed the initial seed), but we decided to release our circulation and recommendation data into “the wild” in December 2008 — see here for the blog post and here for the data.
The data was for every item that has an ISBN in the bibliographic record, as we felt that the ISBN would be the most useful match point for mashing the data up with other web services (e.g. Amazon).
We realised that we’d need to use a licence for the data release and, after a brief discussion with Ken Chad, it became increasingly obvious that a Public Domain licence was the most appropriate. Accordingly, the data was released jointly under an Open Data Commons licence and a second open licence (partly because we couldn’t decide which licence was the best one!). In other words, we wanted it to be really clear that there were “no strings” attached to how the data could be used.
slide 14

Within a couple of days of releasing the data, Patrick Murray-John at the University of Mary Washington had taken it and “semantified” the data.
A few weeks later, I had the privilege of chatting to Patrick and Richard Wallis when we took part in a Talis Podcast about the data release.
slide 15

My great friend Iman Moradi (formerly a lecturer at Huddersfield and now the Creative Director of Running in the Halls) used some of the library data as part of the Multimedia Design course.
slides 16 & 17

Iman’s students used the library data to generate some really cool data visualisations — it was really hard to narrow them down to just two images for the ILI presentation. The second image made me think of Ranganathan‘s 5th Law of Library Science: “The library is a growing organism” 🙂
slide 18

The JISC funded MOSAIC Project (Making Our Shared Activity Information Count), which followed on from the completed TILE Project, is exploring the benefits that can be derived from library usage and attention data.
Amongst the goals of the project are to:

  • Encourage academic libraries to release aggregated/anonymised usage data under an open licence
  • Develop a prototype search engine capable of providing course/subject specific relevancy ranked results

The prototype search engine is of particular interest, as it uses the pooled usage/attention data to rank results so that the ones which are more relevant to the student (based on their course) are boosted. For example, if a law student did a search for “ethics”, books on legal ethics would be ranked higher than those relating to nursing ethics, ethics in journalism, etc. This is achieved by deep analysis of the behaviour of other law students at a variety of universities.
slide 19

The MOSAIC Project is also encouraging the developer community to engage with the usage data, and this included sponsorship of a developer competition.
slides 20 & 21

It was hard to pick which competition entries to include in the presentation, so I just picked a couple of them at random. The winning entry, and the two runners up, should be announced shortly — keep an eye on the project web site!
slide 22

The library usage graphs on slides 9 and 10 clearly show that borrower behaviour has changed since the start of 2006. Given that this change coincided with the introduction of suggestions, recommendations and serendipity in the library catalogue, I believe that there’s a compelling argument that they have played a role in initiating that change.
With the continuing push for Open Data (e.g. see the recent TED talk by Tim Berners-Lee), I believe libraries should be seriously considering releasing their usage and attention data.
slide 23

Most usage based services require some initial data to work with. So, given that disk storage space is so cheap, it makes sense to capture as much usage/attention data as possible in advance, even if you have no immediate thoughts about how to utilise it.

Simple API for JISC MOSAIC Project Developer Competition data

For those of you interested in the developer competition being run by the JISC MOSAIC Project, I’ve put together a quick & dirty API for the available data sets. If it’s easier for you, you can use this API to develop your competition entry rather than working with the entire downloaded data set.

edit (31/Jul/2009): Just to clarify — the developer competition is open to anyone, not just UK residents (however, UK law applies to how the competition is being run). Fingers crossed, the Project Team is hopeful that a few more UK academic libraries will be adding their data sets to the pot in early August.

The URL to use for the API is https://library.hud.ac.uk/mosaic/api.pl and you’ll need to supply a ucas and/or isbn parameter to get a response back (in XML), e.g.:

The “ucas” value is a UCAS Course Code. You can find these codes by going to the UCAS web site and doing a “search by subject”. Not all codes will generate output using the API, but you can find a list of codes that do appear in the MOSAIC data sets here.
If you use both a “ucas” and “isbn” value, the output will be limited to just transactions for that ISBN on courses with that UCAS course code.
You can also use these extra parameters in the URL…

  • show=summary — only show the summary section in the XML output
  • show=data — only show the data in the XML output (i.e. hide the summary)
  • prog=… — only show data for the specified progression level (e.g. staff, UG1, etc, see documentation for full list)
  • year=… — only show data for the specified academic year (e.g. 2005 = academic year 2005/6)
  • rows=… — max number of rows of data to include (default is 500) n.b. the summary section shows the breakdown for all rows, not just the ones included by the rows limit

The format of the XML is pretty much the same as shown in the project documentation guide, except that I’ve added a summary section to the output.
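If you'd rather poke at the API from Python than craft URLs by hand, here's a quick sketch using only the parameters described above. The placeholder UCAS code needs substituting with one from the list linked earlier, and nothing is assumed about the XML structure beyond it being well-formed:

```python
import xml.etree.ElementTree as ET

import requests  # pip install requests

API = "https://library.hud.ac.uk/mosaic/api.pl"

def mosaic_query(ucas=None, isbn=None, **extra):
    # extra can include show=, prog=, year=, rows= as described above
    params = {k: v for k, v in dict(ucas=ucas, isbn=isbn, **extra).items()
              if v is not None}
    resp = requests.get(API, params=params, timeout=30)
    resp.raise_for_status()
    return ET.fromstring(resp.content)

# e.g. summary only, first-year undergraduates, academic year 2005/6
# (replace "XXXX" with a UCAS code that appears in the MOSAIC data sets)
root = mosaic_query(ucas="XXXX", show="summary", prog="UG1", year=2005)
print(ET.tostring(root, encoding="unicode")[:500])
```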
Notes
The API was knocked together quite quickly, so please report any bugs! Also, I can’t guarantee that the API is 100% stable, so please let me know (e.g. via Twitter) if it appears to be down.

Peaks and troughs in borrowing

A good couple of years ago, I blogged about “lending paths”, but we’ve not really progressed things any further since then. I still like the idea that you can somehow predict books that people might/should borrow and also when you might get a sudden rush of demand on a particular title.
Anyway, whilst heading back up north after the “Library Domain Model” workshop, I got wondering about whether we could use historical circulation data to manage the book stock more effectively.
Here’s a couple of graphs — the first is for “Strategic management: awareness and change” (Thompson, 1997) and the second is for “Strategic management: an analytical introduction” (Luffman, 1996)…


The orange bars are the total number of times the book has been borrowed in that particular month. The grey bars show how many times we’d have expected the book to be loaned in that month if the borrowing for that book had followed the global borrowing trends for all stock.
Just to explain that in a little more depth — by looking at the loans for all of our stock, we can build up a monthly profile that shows the peaks and troughs throughout the academic year. If I know that a particular book has been loaned 200 times, I can have a stab at predicting what the monthly breakdown of those 200 loans would be. So, if I know that October accounts for 20% of all book loans and July accounts for only 5%, then I could predict that 40 of those 200 loans would be from October (200 x 20%) and that 10 would be from July (200 x 5%). Those predictions are the grey bars.
For both of the books, the first thing that jumps out is the disconnect between the actual (orange) number of loans in May and the prediction (grey). In other words, both books are unusually popular (when compared to all the other books in the library) in that month. So, maybe in March or April, we should think about taking some of the 2 week loan copies and changing them to 1 week loans (and then change them back in June), especially if students have had to place hold requests in previous years.
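Here's a small sketch of the grey-bar calculation described above, assuming the loan history is available as (item_id, loan_date) pairs:

```python
from collections import Counter

def monthly_profile(all_loans):
    # all_loans: iterable of (item_id, loan_date) pairs covering all stock
    by_month = Counter(loan_date.month for _, loan_date in all_loans)
    total = sum(by_month.values())
    return {month: by_month.get(month, 0) / total for month in range(1, 13)}

def expected_vs_actual(item_id, all_loans, profile):
    actual = Counter(date.month for item, date in all_loans if item == item_id)
    total_for_item = sum(actual.values())
    return {month: {"actual": actual.get(month, 0),
                    "expected": round(total_for_item * profile[month], 1)}
            for month in range(1, 13)}

# e.g. if a title has 200 loans in total and October accounts for 20% of all
# lending, its expected October figure is 200 * 0.20 = 40
```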


For some reason, I didn’t take any photos at the “Library Domain Model” event itself, but I did do the “tourist thing” on the South Bank…
[photos: london_021, london_019, london_037, london_024]