May 2012 – Self Plagiarism is Style

A week on Summon (revisited)

After yesterday’s blog post, I thought I’d have a go at narrowing down my definition of a “separate search”.
If a user enters some search terms, and then uses 2 facets to refine the search before clicking on a result, I was classing that as 3 separate searches — what niggled me overnight was that that approach might inflate the facet use statistics …after all, 30.6% of all searches used at least one facet felt a little high given that I’m forever hearing staff moan that students never use the facets, no?!
For today’s blog post, I’ve removed all searches that didn’t lead to a result click. (There’s a small caveat that my jQuery code currently doesn’t capture a result click for links to the OPAC where the user clicked on the availability message (highlighted in red below) — this is because my jQuery code that captures the result clicks runs once the page has loaded, but before the AJAX’d availability information has been retrieved. When I get some time, I’ll see if I can find a way around that.)

So, let’s see how much of a difference that makes to yesterday’s stats…

29.4% of searches used at least one facet to refine the results
10.4% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
7.8% of searches were refined to just items with full text available online
9.4% of searches were refined by publication date
5.6% of searches were refined to just articles from scholarly publications (including peer-review)
3.7% of searches were refined using the language facets
2.5% of searches were refined using the subject term facets
2.1% of searches used a Boolean operator, with nearly all of them being AND

So, that overall figure for the % of searches which used at least one facet hasn’t dropped by much from yesterday’s figure of 30.6%.
Anyone who follows me on Twitter will know that I like to cheekily mock the importance of Boolean and the data from the last 7 days reveals a few things:

no-one who used a Boolean NOT in their search clicked on a result
only 0.07% of searches (that’s just 7 searches in every 10,000!) used a Boolean OR, which is arguably the most useful operator to use
unless you’re using a search that includes one of the other Boolean operators, the use of AND is pretty much redundant as it’s the default Boolean operator in a search (i.e. the search “dogs AND cats” is the same as “dogs cats”)… so why are we telling students to use it in Summon?

After poking a bit of fun at someone for entering a 356 word search query yesterday, I can reveal that the longest search in the last 7 days that resulted in a result click was 98 words (it was a paragraph copied and pasted from a journal article).
I guess the big question here is why the disconnect between the “students don’t use facets” mantra and the actual usage data?
Finally, I thought I’d figure out how many results are clicked on after a search…

A week on Summon

[ update: slightly revised stats are available here! ]
We’ve just started collecting in-depth data about how students are searching Summon (keywords entered, facets selected, etc) and I thought some of you might be interested in an early analysis from the last 7 days (just under 40,000 separate searches by 2,807 students)…

On average, students used 4.5 keywords per search (the mode is 3 keywords and the majority of searches used 3 keywords or less — view graph) [1]
30.6% of searches used at least one facet to refine the results [2]
11.7% of searches were refined using the content type facet (e.g. newspaper articles, book reviews, books/ebooks, journal articles, etc)
9.5% of searches were refined to just items with full text available online
9.2% of searches were refined by publication date [3]
7.2% of searches were refined to just articles from scholarly publications (including peer-review)
3.4% of searches were refined using the language facets [4]
2.6% of searches were refined using the subject term factes
2.3% of searches used a Boolean operator, with AND being by far the most common (2.23% of searches) [5]

notes:
[1] – One student copied & pasted the following 356 word title & abstract into the search box!

Peter J. Shaw, David J. Rawlins Article first published online 2 AUG 2011 DOI:10.1111/j.1365-2818.1991.tb03168.x 1991 Blackwell Science Ltd Issue Journal of Microscopy Volume 163, Issue 2, pages 151–165, August 1991 Additional Information(Show All) How to CiteAuthor InformationPublication History SEARCH Search Scope Search String Advanced >Saved Searches > ARTICLE TOOLS Get PDF (1119K) Save to My Profile E-mail Link to this Article Export Citation for this Article Get Citation Alerts Request Permissions Share Abstract References Cited By Get PDF (1119K) Keywords Confocal microscopy;three-dimensional fluorescence microscopy;point-spread function;deconvolution;computer image processing SUMMARY We have measured the point-spread function (PSF) for an MRC-500 confocal scanning laser microscope using subresolution fluorescent beads. PSFs were measured for two lenses of high numerical aperture—the Zeiss plan-neofluar 63 × water immersion and Leitz plan-apo 63 × oil immersion—at three different sizes of the confocal detector aperture. The measured PSFs are fairly symmetrical, both radially and axially. In particular there is considerably less axial asymmetry than has been demonstrated in measurements of conventional (non-confocal) PSFs. Measurements of the peak width at half-maximum peak height for the minimum detector aperture gave approximately 0·23 and 0·8 μm for the radial and axial resolution respectively (4·6 and 15·9 in dimensionless optical units). This increased to 0·38 and 1·5 μm (7·5 and 29·8 in dimensionless units) for the largest detector aperture examined. The resulting optical transfer functions (OTFs) were used in an iterative, constrained deconvolution procedure to process three-dimensional confocal data sets from a biological specimen—pea root cells labelled in situ with a fluorescent probe to ribosomal genes. The deconvolution significantly improved the clarity and contrast of the data. Furthermore, the loss in resolution produced by increasing the size of the detector aperture could be restored by the deconvolution procedure. Therefore for many biological specimens which are only weakly fluorescent it may be preferable to open the detector aperture to increase the strength of the detected signal, and thus the signal-to-noise ratio, and then to restore the resolution by deconvolution. Get PDF (1119K) More content like thisFind more content like this article Find more content written by Peter J. ShawDavid J. RawlinsAll Authors ABOUT USHELPCONTACT USA

…sadly, Summon failed to find a result for that as we don’t subscribe to the article!
[2] – Normally, you search Summon by entering your keywords then, after the results appear, you select facets to refine your search and each facet selection invokes a new search. So, if you ran a search and then select 2 facets, that will be logged as 3 separate searches (1 without any facets, and 2 with).
[3] – Mostly, the publication date facet is being used to limit the search to the X most recent years.
[4] – The vast majority of the content in our Summon instance is in English and, apart from one search that refined the results to just Italian, every use of the language facet was to refine the results to English only.
[5] – Boolean operators have to be entered in UPPER CASE in Summon, with an invisible AND being implict in any multi keyword search that doesn’t include Boolean. Looking at the searches queries that included a Boolean operator, 6% were entered entirely in upper case, implying that the user wasn’t conciously invoking a Boolean search.

Dave’s Law

I’d love to have a law named after me, so here goes:

Dave’s Law

users should not have to become mini-librarians in order to use the library

If you ever find yourself needing to invoke Dave’s Law, please let me know 🙂

Relevancy Rules

Inspired by the Summon result click stats that Matthew Reidsma has extracted (and, to be honest, I find myself being regularly inspired by what Matthew’s doing!), I’ve started tracking the clicks on our Summon instance too.
Anyone who’s had the misfortune to hear me present recently will know I’ve been waffling on about the importance of making e-resources easy to use and painless to access, and the fact that most of us are biologically programmed to follow the easiest route to information…

…an information [seeker] will tend to use the most convenient search method, inthe least exacting mode available.Information seeking behaviour stops assoon as minimally acceptable results are found.
Wikipedia, Principle of least effort

Why will our students not get up and walk ahundred meters to access a key journal article in the library? … the overwhelmingpropensity of most people is to invest as absolutely little effort into information seeking as they possibly can.
Prof Marcia J. Bates, “Toward an Integrated Model of Information Seeking & Searching” (2002)

…numerous studies have shown users areoften willing to sacrifice informationquality for accessibility. This fast food approach to information consumption drives librarians crazy. “Our information is healthier and tastes better too” they shout. But nobody listens. We’re too busy Googling.
Peter Morville, “Ambient Findability” (O’Reilly 2005)

As early as 2004, in a focus group for one of my research studies, a collegefreshman bemoaned, “Why is Googleso easy and the library so hard?”
Carol Tenopir, “Visualize the Perfect Search” (Library Journal 2009)

The present findings indicated that the principle of least effort prevailed in the respondents’ selection and use of information sources.
Liu & Yang, “Factors Influencing Distance-Education Graduate Students’ Use of Information Sources: A User Study” (2004)

People do not just use information that is easy to find; they even use information that they know to be of poor quality and less reliable — so long as it requires little effort to find — rather than using information they know to be of high quality and reliable, though harder to find.
Jason Vaughan, “Web Scale Discovery Services” (ALA TechSource 2011)

If you’re looking at Discovery Services, demand a trial and don’t get distracted by how many options the advanced search page has, how well it handles complex Boolean queries, or how many obscure specialist subject headings it supports — to misquote Obi-Wan Kenobi, “these are not the features you are looking for”. The real questions you should be asking are:

Can students use the skills they’ve already picked up from a lifetime of searching Google to use this thing?
If I pluck 2 or 3 vaguely relevant keywords out of the air and type them in (possibly misspelling them), do I get useful and relevant results?
If I choose some slightly more carefully considered keywords, are the first 5 results on the first page all relevant?
Does the interface look uncluttered, straightforward to use and, if I wanted to, is it obvious how to refine the search?
Does this product work with EZProxy (or similar) to provide easy off-campus access to articles?

…in fact, and please don’t take this wrong way, you’re possibly not the best person to be answering some of those questions as your neural pathways have been severely damaged by years of using poorly designed journal database interfaces and you have an unhealthy (bordering on the sexually perverse) obsession with “advanced” search pages 😉
Instead, grab some of your newest students (ideally ones who look blankly at you when you ask them if they know what a Boolean operators is) and let them play with it — the more Information Illiterate they are, the better! Treat their comments as pearls of wisdom (“out of the mouth of babes…”) and try to see the library’s e-resource world through their eyes for what it really is: a scary alien landscape of weird library terminology, perplexing login screens, and unnecessary friction at every turn. Above all, never forget that “Libraries are a cruel mistress“!
Matt Borg nicely summed up the above when he cheekily said (and apologies for paraphrasing you, Matt!)…

The trouble with Summon is that students don’t need to be taught how to use it, but librarians do

In other words, you shouldn’t have to be an Information Professional to use a Discovery Service and you don’t have to become a mini-librarian just in order to figure out how the damn thing works. If the interface looks comfortable and familiar to you, it’s probably been designed for librarians to use and will the scare the bejebus out of most of your students. Swallow hard, gird your loins and remember that you’re not buying this product to make your life easier (although chances are it will), you’re buying it to make life easier for your users.
Or, to put it another way, if a Discovery Service looks like a journal database and acts like a journal database, then it probably is a journal database and not a Discovery Service. There’s a very good reason Summon looks more like Google and less like like <insert name of your favourite database here> 😀
(If your idea of a “good time” is to scare undergraduates in training sessions by showing them journal database interfaces — “it’s OK, I’m a friendly librarian and I’m here to show you just how hard it can be to find an article!” — then it’s probably high time you sought medical counselling ;-))
OK, so why am I ranting on about all this stuff? It’s simply because I’ve been pulling out some usage stats from our Summon instance…

The library’s print collection accounts for just 0.3% of the items, but accounts for 10.3% of the result clicks — I think our users are trying to tell us that they think our OPAC sucks and they’d rather use Summon to search for books
89% of the results clicked on appeared on the first page of results — as with Google, users rarely delve any further the page 1 of the results
Only 2% of result clicks came from beyond the 4th page of the results — very few users will explore the long tail of results
50.5% of result clicks were for the first 4 results on page 1 — the majority of users won’t even bother to scroll down the page!
72.3% of searches used 3 keywords or less — students are using their Google skills
Since launching Summon, we’ve seen increases of 300% to 1000% in the COUNTER full-text download stats for many of the journal platforms we subscribe to — although “cost per use” can be a crude measure, we’re getting much better value out of our e-resource subscriptions now

All of the above tells me that Summon is doing all the things we originally bought it for and that the relevancy ranking is schmokin’!
“Yes”, there’s still a place for Information Literacy in all of this, and, “yes”, we need to be able to support researchers and Boolean Buffs, but the majority of students just want to whack in a few keywords and quickly find something that’s relevant — if you select a product that allows them to do just that, they will come 🙂

Hacking Summon for Fun and Profit (part 1)

OK, I’ll admit it, I’ve fallen in love with jQuery over the last 18 months 🙂
I’ve ended up using quite a bit of jQuery in our new reading list software (“MyReading”), to add various bells and whistles, including dropping an “add to MyReading” option into the Summon interface.
Like they say, “when you’ve got a hammer, everything looks like a nail”, once you know a bit of jQuery, every web page looks hackable, so I’ve pondering what else might be fun and/or sensible to do. To be honest, I really like the Summon interface, so making any major changes to it feels a bit like drawing a moustache on the Mona Lisa (or Mr Graham Stone, for that matter).
So, rather than hack the interface around too much, you could use jQuery to start collecting usage data from Summon (“hmmmm… [drool] usage data!”)…

Something I added to a heated debate about the Summon discovery service.gvsulib.com/temp/summon_cl…What’s that? Oh, just DATA IN THE HOUSE.

— Matthew Reidsma (@mreidsma) May 5, 2012

…or maybe add a helpful hint if a search brings back a silly number of results?

To do the above, you’ll need to host a JavaScript file on your own web server and then include a link to that file in the Summon Administration options, e.g.

Because Summon already uses jQuery, it means you can put jQuery code into your JavaScript file without having to worry about loading the jQuery library yourself. To do the above helpful hint, you could use the following 7 lines of code:

$(document).ready(function() {
  var count = $('#summary .highlight:last').html( );
  count = count.replace(/[^0-9]/g,'');
  if( count > 50000 ) {
    $('#summary').append('<div style="margin-top:5px;"><span id="refineSearchHelp" style="display:none; font-style:italic;">Too many results? Use the options below to refine your search...</span>&nbsp;</div>');
    $('#refineSearchHelp').delay(1000).fadeIn(1000);
} });

Let’s walk through each of those lines…
line 1
Typically, you don’t want your jQuery JavaScript to run until the web page has finished loading, so you’ll often see this line of code — it ensures what follows won’t be executed until after the web page has loaded. If you’ve coded JavaScript before, you’ll probably be familiar with using the onload event in the body HTML tag to do that.
line 2
jQuery lets you easily grab bits of the web page, typically by referencing id attributes (which should be unique) and/or class attributes (which can be repeated). In the same way that CSS uses “#” and “.” to style ids and classes, jQuery uses them to select elements of the page.
If you hunt through the source of a Summon results page, you’ll find something like the following bit of HTML…

<h1 id="summary">
<span class="label">Search Results:</span>
Your search for
<span class='highlight'>germany</span>
returned
<span class='highlight'>3,892,793</span>
results
</h1>

…so, the number of results (3,892,793) appears in a span with a class value of highlight, which itself is inside a h1 with an id of summary. Unfortunately, there’s another span that also has the same class value before it, so we need to use :last in the jQuery to make sure we fetch the HTML contents of the second (i.e. last) span.
line 3
OK, at this point, we should have a JavaScript variable named count that contains the string 3,892,793, so this line strips out the commas (in fact, it strips out anything that isn’t a digit), which should leave count containing 3892793.
line 4
How many results is too many results? Let’s say we’ll display the message for anything more than 50,000 results…
line 5
Time for some more jQuery! 🙂
jQuery lets you add new bits of HTML to a page, so let’s create a new div — that will appear underneath the results summary message — by appending it to that existing h1. Just to show off, we’re going to have the helpful hint gradually fade in, so we’ll pop the text within its own span that has an id value of refineSearchHelp and we’ll style it so it’s initially hidden (display:none).
In case you’re wondering, I added that space character   just so that the div contains something to start off with, which should ensure the page doesn’t suddenly jump as the hint fades in.
line 6
So, now that we’ve got our helpful hint in a hidden span, let’s wait a second (delay(1000) …OK, we’ll actually wait 1,000 milliseconds!) before letting the message gradually fade in (fadeIn(1000)).
line 7
We’ve got to balance the books, so for every brace and bracket we’ve opened, we need to close them, otherwise the web browser might get upset.
Disclaimer!
Dropping jQuery into Summon isn’t officially supported by Serials Solutions, so be sure to take full responsibility for anything to do and thoroughly test it to make sure you’ve not broken Summon for your users, otherwise they’ll be grumpy.
The other thing to be aware of is that Summon is in a state of coninual development, so you’ll need to test any tweaks you’ve made after each update (to make sure that they still work) and that they don’t conflict with any changes Serials Solutions have made to the Summon HTML.
Appendum
By subverting the “Custom Link” option to insert the JavaScript file, you lose the opportunity to add in a normal custom link (this appears to the left of the “Help | About | Feedback” options at the top right of the Summon interface)… or do you?
Well, there’s absolutely no reason why you can’t use jQuery to do that and, in fact, rather than just having one custom link, you could add 2 or 3…

$('#topbar .link').prepend('<a href="https://library.hud.ac.uk/wiki/">A to Z List of Electronic Resources</a>');

The default links appear in a div with a class of link, which has a parent div with an id of topbar. To add in our new extra link before those existing links, we have to prepend it.

The Revolution will not be Catalogued

.@paulwalk Librarians! Cast off the chains of MARCist oppression! Smash the Spinning Booleans! Vive la Révolution Bibliothèque!

— Dave Pattern (@daveyp) May 5, 2012