Dave – Page 3 – Self Plagiarism is Style

Kitteh! Kitteh! Kitteh!

What a weird 10 hours we’ve had!

Just as we were getting ready for bed last night, we heard a loud and slightly distressed “meow” from outside the house. Fearing a cat had been run over, we went out to investigate but couldn’t see anything in the darkness and the meowing had stopped. After a quick rummage for a torch, we checked the front of the house again, and, just to be sure, had a quick look around the back garden — again, nothing.

On the way back round the side of the house, I noticed something small by the side of the rubbish bins — a little ginger kitten cowering in the torchlight.

After scooping her up and bringing her indoors, Bryony checked with the neighbours to see if anyone was missing a kitten — no-one was. However, a neighbour said a stray cat had been asking for food earlier on yesterday. On her way back, Bry spotted the mother pacing around and heading back into our garden. So, back out with the torch and this time we found two more kittens.

After eventually coaxing the mother (who’s quite tame but looks very young) to come into the house, we had a couple more checks just to make sure we’d got the entire litter — hopefully we have, as the mother stopped fretting once she saw the three were safe and sound (we’ve checked the garden again this morning).

We’re pleased to say all four are doing fine this morning and have pretty much drunk all of Elmo’s cat milk supply (which we give her as a treat) and chomped their way through plenty of food. Not surprisingly, Elmo’s sulking about not having the house to herself.

Hopefully the local Cats Protection League will be able to come and collect the family and find them a good home, especially as they all seem in good health.

If your heart has been melted by kitteh cuteness — ours certainly have been! — please feel free to make a small donation to Cats Protection (we’ll be making one and I’m happy to match your donation as well if you let me know) 🙂

———————

[update 16/Sep/2011]

The local branch of the Cats Protection League will be coming to collect the family on Monday, so we’ve got a Kitten Weekend to look forward to! A big “thank you” to everyone who’s donated to the CPL — we’ll be matching those with a donation to the main charity, along with another donation to the local branch.

The mother and kittens are in fine fettle and eating us out of house and home!

If you want an overdose of cuteness, there are lots more photos on Flickr 🙂

E. Henry Thripshaw’s Disease

Over the last 5 years or so, I seem to have developed a habit of mistyping words and I’m curious to know if anyone else does it? …or can I name the condition after myself? 😀
As a bit of background, I’m a touch-typist, so type fairly quickly without looking at the keyboard.
Anyway, what happens is that I’ll start typing one word, but I’ll mash that word up with the next word, before I then type that next word. What’s particularly annoying/frustrating is that I only seem to do it if the mashed word is itself a valid, correctly spelt word.
As an example, I was just typing an email with the phrase “…you’ll end up with the same end result”, except what I actually typed was “…you’ll end up with the send end result”.
On a bad day, I can do this several times in a single email (or a blog post). So, before I hit the send button, I’ll usually have to re-read the email a couple of times to try and catch the glitches (as the spellchecker doesn’t catch them).
So, am I the only person in the world who does this?!

5 years of book loans and grades (revisited)

Nigel’s comment on the “5 years of book loans and grades” post reminded me that I did do a breakdown by discipline of the same data.
One of the caveats with this is that it represents nearly a decade’s worth of usage and, during that time, the seven academic schools at Huddersfield have changed — e.g. some courses and subjects have moved from one school to another.
Music, Humanities and Media

In terms of books, the students in this school rack up the highest average number of loans.
Business

Business students’ borrowing is much lower, but considerably more stable across the 5 years of graduates.
Computing and Engineering

I guess there are no surprises here — when I did my HND in Computing at Huddersfield in the 1990s, I only visited the library once 😀
Education and Professional Development

There’s something interesting going on here: the borrowing levels for firsts and thirds are very similar, with 2:1 and 2:2 being lower. Very curious!
Human & Health Sciences

Applied Sciences

From memory, Applied Sciences make much higher usage of journals than books.
Art, Design and Architecture

The art stock is much more likely to be used within the library, rather than loaned.

Librarian/Shambrarian Venn Diagram

To go with Ned’s “Great Library Stereotypometer“, which seems to be lacking one vital item, here’s a handy Venn Diagram…

You may find it useful to copy the diagram out onto a small piece of card and keep about your person for reference purposes.
If you are a librarian and you meet a shambrarian:

DO ask questions such as “would you like some more cake?” and “what is your favourite cake anecdote?”
DO feel free to compliment the shambrarian if they are wearing a particularly witty t-shirt
DO NOT bore the shambrarian by talking about your recent holiday tour of “Ye Olde Gin Palaces of London Town” or by reciting verbatim your top 50 gin based cocktail recipes
DO NOT attempt to sexually arouse the shambrarian by showing them photographs of library porn (e.g. this, this, this or this)
UNDER NO CIRCUMSTANCES should you say “if all the librarians got together, we could easily index the entire web… probably using an index card based system”

If you are a shambrarian and you meet a librarian:

DO ask them questions such as “where is your closed stack¹?” and “what is the Dewey classification for Chocolate Guinness Cake?”
DO feel free to compliment the librarian if you think that they have particularly nice cupcakes
DO NOT bore the librarian by showing them your Roy Tennant Fan Club membership card
DO NOT embarrass the librarian by asking them if “colon classification” means what you think it means
DO NOT attempt to sexually arouse the librarian by showing them photographs of shambrarian porn (e.g. this, this, this or this)
UNDER NO CIRCUMSTANCES should you say “Google Scholar is much better than that *very* expensive product your library just bought”

¹ The “closed stack” is where librarians store their cakes and usually has a “NO ENTRY — LIBRARIANS ONLY” sign on the door. If the librarian does not have enough room in their office, the closed stack may also be used to house the library’s gin distillery.

Co-operative Group advertising in Rupert Murdoch’s newspapers

A copy of the email I’ve just sent to the Co-operative, who I’ve banked with all my working life…

To: customer.relations@co-op.co.uk
Dear Sir or Madam
I am writing to express my disappointment that the Co-operative Group has decided to continue advertising in Rupert Murdoch’s newspapers: http://bit.ly/mzXiTI
As a banking customer of nearly 20 years, I have been extremely proud of your ethical stance. However, I believe that this is compromised by your public support of his newspapers at this time. By continuing to advertise in them, I also feel that the Co-operative Group is implicitly condoning their unlawful and highly immoral reporting practices.
In order to help me decide whether or not to close my bank account, I would be grateful if you could respond within 7 days with an explanation as to why you believe your continued advertising is in the best interest of your existing members and customers.
yours
Dave Pattern

Extending the availability messages in Summon

At the recent SummonCamp in New Orleans, there was a question about the local “Availability:” messages that appear in Summon for things like books, e.g.

Availability: available, Huddersfield (Loan Collection Floor 6 – 2 wk loan)

By default, Summon either scrapes your OPAC or makes use of an ILS/LMS API to get real time availability. If neither is available, or if the OPAC takes too long to respond, a “check availability” message appears instead (which typically links through to the item page on the OPAC).
Early on in our Summon implementation, we were concerned about the potential impact on our OPAC — SirsiDynix HIP — of screen scraping. In particular, HIP wasn’t designed to be scraped like this or to be indexed by search engines (many Horizon sites deliberately block Google et al from indexing their HIP) and it creates a new session ID for each request. As each new session takes up some of the OPAC server’s resources, there’s a theoretical limit to the number of concurrent sessions the OPAC can maintain before slowing down (or even crashing). Also, if you’ve done a search in Summon that delivers 25 book results, it takes time for the OPAC to respond to the 25 HTTP requests generated by Summon, and so you often end up getting the “check availability” message anyway.
So, working with Andrew Nagy at Serials Solutions, we implemented a very basic DLF XML web service (code and brief documentation available here) that bypasses our OPAC and pulls the live availability data straight from the Horizon database. Not only does it ensure the OPAC doesn’t take a performance hit, it’s also extremely fast (especially if you run it using mod_perl with a persistent database connection to Horizon) — you can see a typical response (for this book) here: library.hud.ac.uk/perl/summon/dlf.pl?497856
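As a rough illustration of the idea (and not the actual dlf.pl code, which is Perl), here is a minimal Python sketch of a service that skips the OPAC and reads availability straight from a database. Everything specific in it is an assumption: an in-memory SQLite database stands in for Horizon, the `item` table and its columns are invented, the "on loan" row is made-up demo data, and the XML element names are simplified placeholders rather than the real DLF schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_demo_db():
    """Create an in-memory stand-in for the ILS database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE item (bib_id INTEGER, status TEXT, site TEXT, collection TEXT)"
    )
    conn.executemany(
        "INSERT INTO item VALUES (?, ?, ?, ?)",
        [
            (497856, "available", "Huddersfield", "Loan Collection Floor 6 - 2 wk loan"),
            (497856, "on loan", "Huddersfield", "Loan Collection Floor 6 - 2 wk loan"),
        ],
    )
    return conn


def availability_xml(conn, bib_id):
    """Return a small XML document listing the status of each copy of a title."""
    root = ET.Element("record", {"id": str(bib_id)})
    rows = conn.execute(
        "SELECT status, site, collection FROM item WHERE bib_id = ? ORDER BY rowid",
        (bib_id,),
    )
    for status, site, collection in rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "status").text = status
        ET.SubElement(item, "location").text = f"{site} ({collection})"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # The connection is opened once and reused for every request, which is
    # the same idea as keeping a persistent connection under mod_perl.
    conn = open_demo_db()
    print(availability_xml(conn, 497856))
```

Because the connection is created once and then reused, each availability request costs a single indexed query rather than a new OPAC session.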
In his Code4Lib Journal article — “Hacking Summon” — Michael B. Klein talks about enhancing an availability API to include extra info and even embedded hyperlinks. This would also be a great way of including item level hold/request functionality into Summon.
At Huddersfield, we’ve done something similar to Michael for our e-resource/database level links, e.g.:

Availability: available, online resource (University network login required)

To help with known item searching, we’ve created some dummy MARC records on our library catalogue for most of the resources listed on our e-resources wiki and these get pushed out to Summon (in the same way that book MARC records do). If the user clicks on the result, they get passed through to the relevant wiki page. However, we also decided we wanted to try and save the user a mouse-click by embedding the actual URL to the resource into the availability message.
To do this, we extended the DLF script so that it detects when an incoming availability request from Summon is for one of the dummy MARC records (rather than a book). The script then does the following:

as the link to the wiki page for that resource is part of the dummy MARC record (the 856 field), it extracts that URL from the record in Horizon
it then web scrapes that wiki page to extract the actual link to the e-resource (in this particular case, it’s an EZproxy’d link)
the DLF XML is then generated, including the link: library.hud.ac.uk/perl/summon/dlf.pl?646531
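
The three steps above could be sketched roughly as follows. This is an illustrative Python version under stated assumptions, not the real script: the dictionaries stand in for Horizon (the 856 URL lookup) and for fetching the wiki page over HTTP, the example.org/example.com URLs and the `class="access"` marker on the link are invented, and the XML element names are simplified placeholders.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical stand-ins: bib ID -> wiki URL taken from the dummy record's 856 field.
DUMMY_RECORDS = {646531: "https://wiki.example.org/eresources/ExampleDB"}

# Hypothetical stand-in for fetching the wiki page over HTTP.
WIKI_PAGES = {
    "https://wiki.example.org/eresources/ExampleDB": (
        "<html><body><p>"
        '<a class="access" href="https://ezproxy.example.org/login?url=https://db.example.com/">'
        "Access Link</a></p></body></html>"
    )
}


class AccessLinkParser(HTMLParser):
    """Remembers the href of the first <a class="access"> element it sees."""

    def __init__(self):
        super().__init__()
        self.link = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "access" and self.link is None:
            self.link = attributes.get("href")


def fetch_wiki_page(url):
    """Return the HTML of a wiki page (here, from the in-memory stand-in)."""
    return WIKI_PAGES[url]


def availability_for(bib_id):
    """Build availability XML for a dummy e-resource record, or None for a book."""
    wiki_url = DUMMY_RECORDS.get(bib_id)  # step 1: URL from the 856 field
    if wiki_url is None:
        return None  # not a dummy record, so the normal book path applies

    parser = AccessLinkParser()
    parser.feed(fetch_wiki_page(wiki_url))  # step 2: scrape the wiki page

    root = ET.Element("record", {"id": str(bib_id)})  # step 3: build the XML
    ET.SubElement(root, "status").text = "available"
    ET.SubElement(root, "location").text = (
        "online resource (University network login required)"
    )
    if parser.link:
        ET.SubElement(root, "link").text = parser.link
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(availability_for(646531))
```

If the wiki page has no recognisable access link, the sketch still returns the availability message and simply omits the link.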

One thing that we’ve not done yet, but plan to do, is to include an extra step that queries our E-Resources Blog to check if there are any known problems for that e-resource. If there were, then a link through to the relevant blog post would also be included.

Relevancy ranking in Summon

Yesterday, Tim Fletcher tweeted me a question about Summon:

How does Summon rank results? is there a logic?

…it’s not the kind of question that you can answer in 140 characters, but I quickly knocked off an email to Tim. This morning David F. Flanders suggested I should also blog the response.
So, first of all, a quick caveat: much of the following was gleaned from various presentations over the last couple of years or so and may not be 100% accurate (I’m particularly good at misunremembering stuff!)
The first time I saw Summon (back in early 2009), I believe Serials Solutions were still using the default relevancy ranking that comes with the Open Source Lucene software (which is documented here). In a nutshell, Lucene generates a score for each indexed item (that matches the search query) and then those items are sorted by score (in descending order) to produce the ranked results.
I’ve read quite a few times that the relevancy ranking engine in Lucene is regarded as one of the best, which might be one of the reasons why SirsiDynix recently moved Enterprise from using Brainware to Lucene.
When you mention Lucene, chances are Solr won’t be too far behind. Solr (which is also Open Source) extends Lucene to provide a host of extra features, including facets.
As Summon has developed, and in response to customer feedback, Serials Solutions have gradually tweaked the way their Lucene installation generates the scores by giving each result an additional boost (or reduction) depending on a variety of factors, including:

Currency – newer items are given a slight boost over older items
Content type – books, ebooks and journal articles get a boost to their scores, whilst newspaper articles and book reviews have their scores reduced
Local collections – things that come from the user’s library (e.g. books, repository items, local archives, etc) get a little boost
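
To make the idea of boosting concrete, here is a small Python sketch that adjusts a base relevancy score using the three factors listed above and then sorts by the result. The multipliers, the five-year "recent" window and the `Hit` structure are all invented for illustration; the actual values and formula used by Serials Solutions aren't described here.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    title: str
    base_score: float  # the score the search engine produced for the query
    year: int
    content_type: str
    is_local: bool


# All of these numbers are assumptions chosen purely for the example.
TYPE_BOOST = {
    "book": 1.2,
    "ebook": 1.2,
    "journal article": 1.2,
    "newspaper article": 0.8,
    "book review": 0.8,
}
LOCAL_BOOST = 1.1
RECENT_BOOST = 1.05
RECENT_WITHIN_YEARS = 5


def boosted_score(hit, current_year):
    """Apply content type, local collection and currency boosts to a hit."""
    score = hit.base_score * TYPE_BOOST.get(hit.content_type, 1.0)
    if hit.is_local:
        score *= LOCAL_BOOST
    if current_year - hit.year <= RECENT_WITHIN_YEARS:
        score *= RECENT_BOOST
    return score


def rank(hits, current_year):
    """Sort hits by boosted score, highest first."""
    return sorted(hits, key=lambda h: boosted_score(h, current_year), reverse=True)


if __name__ == "__main__":
    hits = [
        Hit("A book review", 1.0, 2011, "book review", False),
        Hit("An older local book", 0.9, 1990, "book", True),
        Hit("A recent article", 0.9, 2010, "journal article", False),
    ]
    for hit in rank(hits, current_year=2011):
        print(f"{boosted_score(hit, 2011):.3f}  {hit.title}")
```

In this toy data the book review starts with the highest base score but ends up last once the boosts are applied.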

Additionally, the Summon search engine handles certain words and phrases differently. For example, Lucene normally treats the singular and plural version of words as the same, so searches for “africa hospital” and “africas hospitals” both bring back roughly the same number of results. However, Summon understands that “africa aid” isn’t the same thing as “africa aids“.
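One way to picture this behaviour is a plural-stripping step with a list of protected words that are left alone. The Python sketch below uses a deliberately naive "drop a trailing s" rule and a one-word protected list; both are assumptions for illustration and are not how Lucene's analysers or Summon are actually implemented.

```python
# Words that should never have a trailing "s" removed (assumed list).
PROTECTED = {"aids"}


def normalise(term):
    """Lower-case a term and strip a simple plural "s" unless it is protected."""
    term = term.lower()
    if term in PROTECTED:
        return term
    if len(term) > 3 and term.endswith("s") and not term.endswith("ss"):
        return term[:-1]
    return term


def normalise_query(query):
    """Normalise every whitespace-separated term in a query."""
    return [normalise(term) for term in query.split()]


if __name__ == "__main__":
    for query in ("africa hospital", "africas hospitals", "africa aid", "africa aids"):
        print(query, "->", normalise_query(query))
```

With the protected list in place, the two hospital queries normalise to the same terms while "aid" and "aids" stay distinct.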
Given that few users go beyond the first page of results (I was told the exact figure last week, but it’s slipped from my memory — I think it was less than 5%?), Serials Solutions put a lot of effort into trying to ensure that the most relevant results appear on that first page. Given that the Summon master index is fast approaching 1,000,000,000 items, that’s no trivial task!
As they say, the proof of the pudding is in the eating, so feel free to run some searches on our Summon instance to see how well you think it ranks the results.

Hurricanes and shrimp po’boys (part 1)

I’m jetlagged (this is the first time I’ve had jetlag that feels like being drunk) and still coming down from an ALA-induced high, but here goes a blog post!
I’m currently fortunate enough to be a member of the Serials Solutions Summon Advisory Board, and last week saw the fourth pre-ALA meeting, this time in the one and only New Orleans, the home of hurricane cocktails, shrimp po’boys, high heat & humidity and more seafood than you can shake a stick at…

(seafood platter at the Grand Isle Restaurant)
Summon Advisory Board notes

there are now more than 250 Summon customers around the world
the company is currently concentrating on comprehensiveness (in terms of coverage and seamless access to articles)
gone are the days when Serials Solutions had to approach publishers and argue the case for them to make their content available in Summon; most publishers now realise the value and are approaching the company directly to have their content added
John Law’s mantra is currently “relevancy, relevancy, relevancy!” and, with 800,000,000 items in Summon, relevancy is key to ensuring the user gets the right articles on the first page of results
it wasn’t until I saw some demo searches that the awesomeness of the deal with HathiTrust Collection integration began to sink in. Librarians of the world, this truly is a game changer! (on a practical note, it’s going to take Serials Solutions a little while to complete the indexing of the entire HathiTrust Collection)
a pilot with JSTOR means that a Summon search box is integrated into the JSTOR web site interface — it appears when a JSTOR search produces only a small number (or zero) results, so that the user’s search can be expanded to other journal platforms
due to being en route from the UK to New Orleans, I’d missed this announcement, but the long-awaited deal with Elsevier has been signed
for journal articles, Serials Solutions create “super records” that combine the best metadata from multiple sources — this is de-duping on steroids!
coming soon — discipline searching (currently 63 subject disciplines have been defined, which work at the journal title and journal article level)
coming soon — new article linking improvements (when relevant, Summon results will link directly to the article abstract page on the supplier’s platform, instead of using OpenURLs)
Daniel Forsman (Chalmers University of Technology, Gothenburg, Sweden) suggested that we should promote Summon to our users as being more comprehensive than Google Scholar
although librarians often get hung-up on what’s not in Summon, some analysis by a Summon customer indicated that the non-indexed content is often low quality “filler material” added by aggregator platforms to bump up journal totals

(a bourbon nightcap after the Advisory Board Meeting)

EZproxy and Summon

I’ll flesh out this blog post later on today, but just wanted to post some screenshots (partly as a rebuttal to Nicole’s blog post “Some thoughts about (authentication) discovery aimed at librarians“) to show how well EZproxy fits as the authentication layer between a discovery service (such as Summon) and journal articles on publisher sites.
As Nicole well knows, I’m not a librarian and I couldn’t give two hoots about the “official CILIP endorsed librarian way of doing things” (n.b. my quote, not Nicole’s) when it comes to e-resource access. All I care about is trying to get the user to where they want to be (e.g. the full text of a journal article) with the least number of mouse clicks, and the least amount of swearing, frustration and death-threats against the library for making it so flippin’ difficult ;-D
[edit] Apologies: I didn’t mean to imply that librarians don’t care about users. I just took offence when I felt Nicole’s post implied this was a librarian problem and/or that librarians were the root cause of the problem. As I’m not a librarian myself, I felt it was wrong to infer that anything I say or do is endorsed by, or represents, librarianship in general, or is the way a librarian would choose to do it. To the best of my knowledge, librarians prefer not to have barriers (such as stupidly complicated publisher log in pages) in the way when it comes to accessing information.
This first example is about as good as it gets. A student uses Summon to locate an article (“Ethics, Public Policy, and Global Warming”)…

…when they click on the article link, Summon opens a new browser window and passes the OpenURL details for the article to the link resolver (360 Link). If the user isn’t already authenticated (e.g. by accessing Summon via the University Portal or via the VLE), they’ll need to log in. If they have already authenticated, then they don’t see this screen at all.
The login process logs the user into EZproxy, and also establishes an Athens session in the background (which isn’t required to access the article, but might be useful if they end up wandering off to look at other resources)…

…as this particular article is on JSTOR, the user is able to view the article straight away (via the “Page Scan” preview) or they can choose to download the PDF…

So, from Summon, there’s either a single click (if the user has already authenticated) or two clicks (if the user needs to log in) to get to the full-text (or a page that has a link to the PDF). Ignoring the ethical/moral/technical/philosophical issues of using a proxy solution instead of Shib, I think this is as good as it gets for students.
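For anyone curious about the plumbing, the two kinds of URL involved can be sketched in a few lines of Python. This is only an illustration: the resolver and EZproxy hostnames are hypothetical, the OpenURL carries just a handful of standard key/value pairs, and exactly how the link resolver and EZproxy are chained together at Huddersfield isn’t described here.

```python
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.org/openurl"  # hypothetical link resolver
EZPROXY_LOGIN = "https://ezproxy.example.org/login"  # hypothetical EZproxy host


def openurl_for_article(article_title):
    """Build a minimal OpenURL 1.0 (key/value) query describing a journal article."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": article_title,
    }
    return f"{RESOLVER_BASE}?{urlencode(params)}"


def via_ezproxy(target_url):
    """Wrap a target URL in an EZproxy starting point URL (target appended as-is)."""
    return f"{EZPROXY_LOGIN}?url={target_url}"


if __name__ == "__main__":
    print(openurl_for_article("Ethics, Public Policy, and Global Warming"))
    print(via_ezproxy("https://www.jstor.org/"))
```

The point of the second function is that the user only ever sees one familiar login page, whichever publisher site the article lives on.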
If they do have to authenticate, it’s a familiar login page and they’re not having to figure out which link on the publisher’s web site to use — do they try putting their university network login details into the username & password fields (1), do they scroll through a list of nearly 200 institutions (2) to find Huddersfield (and are we “University of Huddersfield” or “Huddersfield University”?) …or can they remember that the librarian told them to look for the “Athens” link (3) during the library induction all those months ago?

Plus, if they’ve found this article via Google Scholar, how do they even know if they have access to it? If you want to frustrate a student, nothing does it better than pointing them at a useful article that they can’t access ;-D
This doesn’t mean that there isn’t a role for Shib/Athens, but I feel it’s a different part of the jigsaw puzzle. If I’m an off-campus Huddersfield student wanting to get to ScienceDirect, there’s lots of ways to get there, but one of the simplest is to just Google “science direct huddersfield” (we don’t tell students about EZproxy, so they would never include that as a search term on Google)…

…where the first result takes them through to our electronic resources wiki page for ScienceDirect (which is where most of the other routes end up)…

…the first “Access Link” is the Athens link to ScienceDirect and the (slightly superfluous) note beneath is really just for students who’ve gone directly to ScienceDirect and who aren’t sure which of the various login options to select on the site.

CILIP Cymru Conference

I’m journeying down to Llandrindod Wells tomorrow to give a presentation about usage data to the Welsh Libraries, Archives and Museums Conference (hashtag #cilipw11). I’ve been promised that there’ll be real ale there 🙂
You can grab a draft copy of my presentation (“If you want to get laid, go to college…”) from here (15MB).
The main web links in the presentation are:
– JISC Library Impact Data Project
– JISC Activity Data Programme (including a list of the projects)
– Rufus Pollock (Open Data and Componentization, XTech 2007)
– Paul Walk (“The coolest thing to do with your data will be thought of by someone else”)
– University of Huddersfield – Open Data Release (from Dec 2008)