Relevancy Rules

Inspired by the Summon result click stats that Matthew Reidsma has extracted (and, to be honest, I find myself being regularly inspired by what Matthew’s doing!), I’ve started tracking the clicks on our Summon instance too.
Anyone who’s had the misfortune to hear me present recently will know I’ve been waffling on about the importance of making e-resources easy to use and painless to access, and the fact that most of us are biologically programmed to follow the easiest route to information

…an information [seeker] will tend to use the most convenient search method, inthe least exacting mode available.Information seeking behaviour stops assoon as minimally acceptable results are found.
Wikipedia, Principle of least effort

Why will our students not get up and walk ahundred meters to access a key journal article in the library? … the overwhelmingpropensity of most people is to invest as absolutely little effort into information seeking as they possibly can.
Prof Marcia J. Bates, “Toward an Integrated Model of Information Seeking & Searching” (2002)

…numerous studies have shown users areoften willing to sacrifice informationquality for accessibility. This fast food approach to information consumption drives librarians crazy. “Our information is healthier and tastes better too” they shout. But nobody listens. We’re too busy Googling.
Peter Morville, “Ambient Findability” (O’Reilly 2005)

As early as 2004, in a focus group for one of my research studies, a collegefreshman bemoaned, “Why is Googleso easy and the library so hard?”
Carol Tenopir, “Visualize the Perfect Search” (Library Journal 2009)

The present findings indicated that the principle of least effort prevailed in the respondents’ selection and use of information sources.
Liu & Yang, “Factors Influencing Distance-Education Graduate Students’ Use of Information Sources: A User Study” (2004)

People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable โ€” so long as it requires little effort to find โ€” rather than using information they know to be of high quality and reliable, though harder to find.
Jason Vaughan, “Web Scale Discovery Services” (ALA TechSource 2011)

ili2010_026

If you’re looking at Discovery Services, demand a trial and don’t get distracted by how many options the advanced search page has, how well it handles complex Boolean queries, or how many obscure specialist subject headings it supports — to misquote Obi-Wan Kenobi, “these are not the features you are looking for”. The real questions you should be asking are:

  • Can students use the skills they’ve already picked up from a lifetime of searching Google to use this thing?
  • If I pluck 2 or 3 vaguely relevant keywords out of the air and type them in (possibly misspelling them), do I get useful and relevant results?
  • If I choose some slightly more carefully considered keywords, are the first 5 results on the first page all relevant?
  • Does the interface look uncluttered, straightforward to use and, if I wanted to, is it obvious how to refine the search?
  • Does this product work with EZProxy (or similar) to provide easy off-campus access to articles?

…in fact, and please don’t take this wrong way, you’re possibly not the best person to be answering some of those questions as your neural pathways have been severely damaged by years of using poorly designed journal database interfaces and you have an unhealthy (bordering on the sexually perverse) obsession with “advanced” search pages ๐Ÿ˜‰
Instead, grab some of your newest students (ideally ones who look blankly at you when you ask them if they know what a Boolean operators is) and let them play with it — the more Information Illiterate they are, the better! Treat their comments as pearls of wisdom (“out of the mouth of babes…”) and try to see the library’s e-resource world through their eyes for what it really is: a scary alien landscape of weird library terminology, perplexing login screens, and unnecessary friction at every turn. Above all, never forget that “Libraries are a cruel mistress“!
Matt Borg nicely summed up the above when he cheekily said (and apologies for paraphrasing you, Matt!)…

The trouble with Summon is that students don’t need to be taught how to use it, but librarians do

In other words, you shouldn’t have to be an Information Professional to use a Discovery Service and you don’t have to become a mini-librarian just in order to figure out how the damn thing works. If the interface looks comfortable and familiar to you, it’s probably been designed for librarians to use and will the scare the bejebus out of most of your students. Swallow hard, gird your loins and remember that you’re not buying this product to make your life easier (although chances are it will), you’re buying it to make life easier for your users.
Or, to put it another way, if a Discovery Service looks like a journal database and acts like a journal database, then it probably is a journal database and not a Discovery Service. There’s a very good reason Summon looks more like Google and less like like <insert name of your favourite database here> ๐Ÿ˜€
(If your idea of a “good time” is to scare undergraduates in training sessions by showing them journal database interfaces — “it’s OK, I’m a friendly librarian and I’m here to show you just how hard it can be to find an article!” — then it’s probably high time you sought medical counselling ;-))
OK, so why am I ranting on about all this stuff? It’s simply because I’ve been pulling out some usage stats from our Summon instance…

  • The library’s print collection accounts for just 0.3% of the items, but accounts for 10.3% of the result clicks — I think our users are trying to tell us that they think our OPAC sucks and they’d rather use Summon to search for books
  • 89% of the results clicked on appeared on the first page of results — as with Google, users rarely delve any further the page 1 of the results
  • Only 2% of result clicks came from beyond the 4th page of the results — very few users will explore the long tail of results
  • 50.5% of result clicks were for the first 4 results on page 1 — the majority of users won’t even bother to scroll down the page!
  • 72.3% of searches used 3 keywords or less — students are using their Google skills
  • Since launching Summon, we’ve seen increases of 300% to 1000% in the COUNTER full-text download stats for many of the journal platforms we subscribe to — although “cost per use” can be a crude measure, we’re getting much better value out of our e-resource subscriptions now

All of the above tells me that Summon is doing all the things we originally bought it for and that the relevancy ranking is schmokin’!
“Yes”, there’s still a place for Information Literacy in all of this, and, “yes”, we need to be able to support researchers and Boolean Buffs, but the majority of students just want to whack in a few keywords and quickly find something that’s relevant — if you select a product that allows them to do just that, they will come ๐Ÿ™‚

Hacking Summon for Fun and Profit (part 1)

OK, I’ll admit it, I’ve fallen in love with jQuery over the last 18 months ๐Ÿ™‚
I’ve ended up using quite a bit of jQuery in our new reading list software (“MyReading”), to add various bells and whistles, including dropping an “add to MyReading” option into the Summon interface.
Like they say, “when you’ve got a hammer, everything looks like a nail”, once you know a bit of jQuery, every web page looks hackable, so I’ve pondering what else might be fun and/or sensible to do. To be honest, I really like the Summon interface, so making any major changes to it feels a bit like drawing a moustache on the Mona Lisa (or Mr Graham Stone, for that matter).
So, rather than hack the interface around too much, you could use jQuery to start collecting usage data from Summon (“hmmmm… [drool] usage data!”)…

…or maybe add a helpful hint if a search brings back a silly number of results?

To do the above, you’ll need to host a JavaScript file on your own web server and then include a link to that file in the Summon Administration options, e.g.

Because Summon already uses jQuery, it means you can put jQuery code into your JavaScript file without having to worry about loading the jQuery library yourself. To do the above helpful hint, you could use the following 7 lines of code:

$(document).ready(function() {
  var count = $('#summary .highlight:last').html( );
  count = count.replace(/[^0-9]/g,'');
  if( count > 50000 ) {
    $('#summary').append('<div style="margin-top:5px;"><span id="refineSearchHelp" style="display:none; font-style:italic;">Too many results? Use the options below to refine your search...</span>&nbsp;</div>');
    $('#refineSearchHelp').delay(1000).fadeIn(1000);
} });

Let’s walk through each of those lines…
line 1
Typically, you don’t want your jQuery JavaScript to run until the web page has finished loading, so you’ll often see this line of code — it ensures what follows won’t be executed until after the web page has loaded. If you’ve coded JavaScript before, you’ll probably be familiar with using the onload event in the body HTML tag to do that.
line 2
jQuery lets you easily grab bits of the web page, typically by referencing id attributes (which should be unique) and/or class attributes (which can be repeated). In the same way that CSS uses “#” and “.” to style ids and classes, jQuery uses them to select elements of the page.
If you hunt through the source of a Summon results page, you’ll find something like the following bit of HTML…

<h1 id="summary">
<span class="label">Search Results:</span>
Your search for
<span class='highlight'>germany</span>
returned
<span class='highlight'>3,892,793</span>
results
</h1>

…so, the number of results (3,892,793) appears in a span with a class value of highlight, which itself is inside a h1 with an id of summary. Unfortunately, there’s another span that also has the same class value before it, so we need to use :last in the jQuery to make sure we fetch the HTML contents of the second (i.e. last) span.
line 3
OK, at this point, we should have a JavaScript variable named count that contains the string 3,892,793, so this line strips out the commas (in fact, it strips out anything that isn’t a digit), which should leave count containing 3892793.
line 4
How many results is too many results? Let’s say we’ll display the message for anything more than 50,000 results…
line 5
Time for some more jQuery! ๐Ÿ™‚
jQuery lets you add new bits of HTML to a page, so let’s create a new div — that will appear underneath the results summary message — by appending it to that existing h1. Just to show off, we’re going to have the helpful hint gradually fade in, so we’ll pop the text within its own span that has an id value of refineSearchHelp and we’ll style it so it’s initially hidden (display:none).
In case you’re wondering, I added that space character &nbsp; just so that the div contains something to start off with, which should ensure the page doesn’t suddenly jump as the hint fades in.
line 6
So, now that we’ve got our helpful hint in a hidden span, let’s wait a second (delay(1000) …OK, we’ll actually wait 1,000 milliseconds!) before letting the message gradually fade in (fadeIn(1000)).
line 7
We’ve got to balance the books, so for every brace and bracket we’ve opened, we need to close them, otherwise the web browser might get upset.
Disclaimer!
Dropping jQuery into Summon isn’t officially supported by Serials Solutions, so be sure to take full responsibility for anything to do and thoroughly test it to make sure you’ve not broken Summon for your users, otherwise they’ll be grumpy.
The other thing to be aware of is that Summon is in a state of coninual development, so you’ll need to test any tweaks you’ve made after each update (to make sure that they still work) and that they don’t conflict with any changes Serials Solutions have made to the Summon HTML.
Appendum
By subverting the “Custom Link” option to insert the JavaScript file, you lose the opportunity to add in a normal custom link (this appears to the left of the “Help | About | Feedback” options at the top right of the Summon interface)… or do you?
Well, there’s absolutely no reason why you can’t use jQuery to do that and, in fact, rather than just having one custom link, you could add 2 or 3…

$('#topbar .link').prepend('<a href="https://library.hud.ac.uk/wiki/">A to Z List of Electronic Resources</a>');

The default links appear in a div with a class of link, which has a parent div with an id of topbar. To add in our new extra link before those existing links, we have to prepend it.

UKSG 2012

uksg2012_023
You can grab a copy of my presentation (“I wouldn’t start from here”) from daveyp.com/files/stuff/uksg2012/ (PDF or PPTX).
Claire Gravely wrote up a summary of the session for the UKSG blog.
Unfortunately it looks like I’ve managed to lose the USB stick with the final version of the presentation, so the above is the closest version I could find on my netbook. I’ve snipped out about 25 slides of screenshots that showed an e-resource problem reported by a student and the fun & games I had trying to get to the full-text (which ended with me being asked to pay $59) — the purpose wasn’t to single out any specific vendor or platform for criticism, but to show an example of just how painful the end user experience can be when compared to Google.
After uploading it, I released I’d forgotten to include explicit CC info. Feel free to treat the original content as being CC BY-SA.
The PowerPoint file was too big for SlideShare, so I’ve uploaded the PDF version with notes instead…

More “stuff like this”…

Just a little follow on from the previous blog post
Spurred on by comments from Lisa, I’m exploring if we can filter the recommendations so that they become more relevant to students in a specific academic school, or even to students on a specific course, and the initial results look fairly promising ๐Ÿ™‚
Let’s look at a couple of examples:
International Journal of Sociology and Social Policy (ISSN 0144-333X)
Here are the recommendations based on usage by all users. A quick browse through the items shows a range of subject areas — social exclusion, economics, human resources, etc, and a student would need to sift through to spot the items relevant to their subject area.
Now let’s filter the recommendations so that they’re only based on usage by students in a specific academic school:

…hopefully you can spot that the recommendations suddenly jump to becoming much more relevant to courses in that particular school.
Managerial Auditing Journal (ISSN 0268-6902)
Let’s drill a bit deeper this time and look at courses in the Business school:

Without knowing how our course codes are created, you can probably guess that courses starting with “BA…” are mostly accountancy & finance, and that those starting with “BM…” are to do with leadership and management.

“People who looked at this thing, also looked at this stuff…”

We’ve had serendipity suggestions on the OPAC for nearly 7 years now, but they’ve been based entirely around the physical collection in the library.
After Friday’s Skype chat to the SPLURGE Hackfest, I got to thinking about how we can hook the e-stuff into the recommendations, so I’ve spent the weekend gathering data from our library management system, our link resolver and our EZProxy logs to see what happens if they all go into the same melting pot.
It’s a very rough & ready “crappy prototype”, but you can have a play around with it here. If you get an empty page, click on the “pick random item” link until something interesting happens.
At the moment, the recommendations are being built from a database of just over 5 million events (approx 70% of those are item loans and the rest are accesses of online journals). If you take the “Midwifery” journal as a starting point, you’ll get a list of the other books and journals that people have looked at. The algorithm behind it is the same one I’ve discussed previously.
If you hover over a title, you’ll see the usage info breakdown, e.g. “42 / 56” means that 56 different users in total have looked at the recommended item, and 42 of those also looked at the item we’re generating the recommendations for.
I’ve not done any de-duping, so you might get the same journal title being repeated (once for the print ISSN and once for the e-ISSN), and I’ve not included any ebook usage data yet. I’ve also avoided merging the two lists together until I can figure out a suitable way of weighting book loans against online journal usage.
Picking random items, it’s apparent that some courses lean more towards book borrowing (i.e. very few journal recommendations), whilst stundents studying other subjects are heavy online journal users (i.e. very few book recommendations).
So, what do you think — is it useful to be able to show more than just book recommendations to students?

Tweaking the Summon Search Widget code

Summon has a really cool new custom search box building widget that includes the ability to pre-limit a search to a specific discipline (or disciplines). The widget also allows you to pre-select which facets to apply to the search.
A question came up on the SummonClients mailing list asking if it was possible to exclude facets from the search — “[is there] a way to exclude newspapers AND book reviews (AND possibly Dissertations) from the initial search”? There isn’t an obvious way at the moment to do that, but I’m a shambrarian and I like to tweak and tinker with things ๐Ÿ˜€
So, to exclude a content type facet…
1) Go into the Search Box Builder widget and expand the Content Type selection:

2) Select any Content Types to you want to exclude (e.g. Book Review, Dissertation/Thesis and Newspaper Article):

3) Make any other changes you want (appearance, other facets, etc) and click on Get Code to get the widget’s HTML:

At this point, we’ve got a search widget that will only find results that are Book Reviews, Dissertation/Thesis (Thesii? Thesissesses?) or Newspaper Articles. So, the final change to make is to tweak the HTML so that those 3 types are excluded, which you can do by adding a ,t to each of them:

...["ContentType,Dissertation,t",
"ContentType,Book Review,t",
"ContentType,Newspaper Article,t"]...

The result should be a custom search box that excludes the chosen content types:

Using the Serials Solutions APIs for the MyReading project

dallas_063
I had planned to go along to SummonCamp at ALA Midwinter on Sunday and talk about using the Summon API but, perhaps all too predictably, I ended up staying up waaaaay too late on Saturday night sampling some yummy US beers, forgot to set my alarm and overslept ๐Ÿ™
Anyway, here’s what I would have talked about if I hadn’t been asleep at the time…
MyReading Project
For the last 12 months, I’ve been working on developing reading list software for the University of Huddersfield (home page and blog). By making use of both the Summon and 360 Link APIs, I’ve been able to cut down development time and also improve the functionality of the software for both staff and students.
360 Link API
E-journals and e-journal articles make up about 15% of all the reading list references in the software. One of the primary issues was how to provide accurate links to that material and how to ensure those links are updated whenever we change e-journal subscriptions or database platforms. On top of that, we also needed to ensure that authentication was as seamless as possible. Seeing as our link resolver (360 Link) already does all of the above, it made sense to use that.
So, for journal and article references, we’re storing the OpenURL so that we can query the 360 Link API on-the-fly to fetch back current access links. As 360 Link also handles the creation of EZProxy URLs for authentication, the API will return EZProxy prepended URLs when relevant.
If we take this reference to Iodine status of UK schoolgirls: a cross-sectional survey from The Lancet, we’ve stored the OpenURL as part of the reference:

By calling the 360 Link API with the above OpenURL, we can get back a page of XML.
At the time of writing, the ssopenurl:linkGroups element contains a couple of ssopenurl:linkGroup elements of type holding which, in turn, contain the current article access links for SwetsWise Online Content and ScienceDirect Journals.
So, as long as we’ve got an accurate OpenURL for a reference, we should be able to automatically insert the correct access links into the reading list. But, how do you get the OpenURL in the first place…?
Summon API
Once staff are logged into the reading list software, they’ll find an option to import any result from Summon as a reference into one of their reading lists…

Although Summon doesn’t officially support modifications like this, unofficially it’s possible to execute jQuery by hacking in a link to suitable JavaScript via the “Custom Link” option within the Summon Administration Console…

As doing this isn’t officially supported by Serials Solutions, it’s possible that it could stop working at any time. But, until that day comes, it’s a useful way of making minor tweaks to the Summon interface ๐Ÿ˜‰
I’m only a beginner with jQuery, so the following might not be the most efficient and/or elegant way of adding the custom links, but it does the job…

$(document).ready(function(){ doMyReading( ); });
function doMyReading( )
{
  $( '.metadata' ).each(function(intIndex)
  {
    var myReadingDocID = $( this ).parent().parent().parent().parent().parent().parent().parent().attr("id");
    if( myReadingDocID )
    {
      $( this ).append( '<div style="margin-top:3px;background:#004088;color:#ccf;padding:3px 8px;font-size:98%; white-space:nowrap;">item options: <a title="add this item to MyReading" style="color:#fff;" href="https://library.hud.ac.uk/myreading/perl/admin/import_summon.pl?id='+myReadingDocID+'">add to MyReading</a></div>' );
    }
  });
}

…the important bit is that we grab the document ID value for the result (myReadingDocID in the above), which we can then use to retrieve the exact same result via the Summon API.
When the staff user clicks on the “add to MyReading” link, the reading list software uses the document ID to pull in the reference’s details from the Summon API and automatically populates the reference form…

…which includes the OpenURL and DOI, both of which can subsequently be used to query the 360 API to fetch access links ๐Ÿ™‚
We can also use the document ID to retrieve the article’s subject terms and abstract from Summon…

Summary
So, in summary, we’ve used the APIs to:

  1. avoid having to manually maintain links to e-journal content
  2. make it both quicker and easier for staff to add items from Summon (which currently encompasses over 600,000,000 items!) to reading lists
  3. enhance records by bringing in abstracts and subject terms from Summon