Using the Serials Solutions APIs for the MyReading project

I had planned to go along to SummonCamp at ALA Midwinter on Sunday and talk about using the Summon API but, perhaps all too predictably, I ended up staying up waaaaay too late on Saturday night sampling some yummy US beers, forgot to set my alarm and overslept 🙁
Anyway, here’s what I would have talked about if I hadn’t been asleep at the time…
MyReading Project
For the last 12 months, I’ve been working on developing reading list software for the University of Huddersfield (home page and blog). By making use of both the Summon and 360 Link APIs, I’ve been able to cut down development time and also improve the functionality of the software for both staff and students.
360 Link API
E-journals and e-journal articles make up about 15% of all the reading list references in the software. One of the primary issues was how to provide accurate links to that material and how to ensure those links are updated whenever we change e-journal subscriptions or database platforms. On top of that, we also needed to ensure that authentication was as seamless as possible. Seeing as our link resolver (360 Link) already does all of the above, it made sense to use that.
So, for journal and article references, we’re storing the OpenURL so that we can query the 360 Link API on the fly to fetch current access links. As 360 Link also handles the creation of EZproxy URLs for authentication, the API will return EZproxy-prepended URLs when relevant.
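As a rough illustration of that on-the-fly lookup, here’s a JavaScript sketch that turns a stored OpenURL into a 360 Link XML API request URL. The endpoint pattern (http://{client ID}.openurl.xml.serialssolutions.com/openurlxml?version=1.0&…) reflects my understanding of how the XML API is addressed and should be checked against the Serials Solutions documentation. The helper name build360LinkRequest and the client ID XX1234 are hypothetical.

```javascript
// Sketch: turn a stored OpenURL into a 360 Link XML API request URL.
// ASSUMPTION: the endpoint pattern below reflects my understanding of the
// 360 Link XML API; check it against the Serials Solutions documentation.
// The client ID (e.g. "XX1234") is a hypothetical placeholder.
function build360LinkRequest(clientId, storedOpenUrl) {
  // The client ID becomes part of the hostname, so only allow letters and digits
  if (!/^[A-Za-z0-9]+$/.test(clientId)) {
    throw new Error("unexpected 360 Link client ID: " + clientId);
  }
  // Accept either a full OpenURL (http://resolver/?key=value...) or just its query string
  const queryStart = storedOpenUrl.indexOf("?");
  const kev = queryStart >= 0 ? storedOpenUrl.slice(queryStart + 1) : storedOpenUrl;
  return "http://" + clientId + ".openurl.xml.serialssolutions.com/openurlxml?version=1.0&" + kev;
}
```

Because only the OpenURL is stored, the resulting URL can be fetched each time a reading list is displayed. The returned links therefore follow any change in subscriptions or platforms.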
If we take this reference to Iodine status of UK schoolgirls: a cross-sectional survey from The Lancet, we’ve stored the OpenURL as part of the reference:

By calling the 360 Link API with the above OpenURL, we can get back a page of XML.
At the time of writing, the ssopenurl:linkGroups element contains a couple of ssopenurl:linkGroup elements of type holding which, in turn, contain the current article access links for SwetsWise Online Content and ScienceDirect Journals.
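Pulling the holding links out of that response is straightforward. The JavaScript sketch below uses regular expressions only so that it runs in Node.js without an XML library; in the browser, DOMParser or jQuery would be more robust. The child elements it looks for (ssopenurl:databaseName, and ssopenurl:url with type="article") are my assumption about the response layout, so check them against a real response.

```javascript
// Decode the five predefined XML entities (&amp; last, so "&amp;lt;" stays "&lt;")
function decodeEntities(s) {
  return s
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
}

// Sketch: collect { database, url } pairs from linkGroup elements of type "holding".
// ASSUMPTION: element names inside each linkGroup are based on my reading of the
// 360 Link XML format, not on anything confirmed above.
function extractHoldingLinks(xml) {
  const links = [];
  // \b after "linkGroup" stops this matching the outer <ssopenurl:linkGroups> element
  const groupRe = /<ssopenurl:linkGroup\b[^>]*\btype="holding"[^>]*>([\s\S]*?)<\/ssopenurl:linkGroup>/g;
  let group;
  while ((group = groupRe.exec(xml)) !== null) {
    const body = group[1];
    const name = /<ssopenurl:databaseName>([\s\S]*?)<\/ssopenurl:databaseName>/.exec(body);
    const url = /<ssopenurl:url\b[^>]*\btype="article"[^>]*>([\s\S]*?)<\/ssopenurl:url>/.exec(body);
    if (url) {
      links.push({
        database: name ? decodeEntities(name[1].trim()) : "",
        url: decodeEntities(url[1].trim()),
      });
    }
  }
  return links;
}
```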
So, as long as we’ve got an accurate OpenURL for a reference, we should be able to automatically insert the correct access links into the reading list. But, how do you get the OpenURL in the first place…?
Summon API
Once staff are logged into the reading list software, they’ll find an option to import any result from Summon as a reference into one of their reading lists…

Although Summon doesn’t officially support modifications like this, unofficially it’s possible to execute jQuery by hacking in a link to suitable JavaScript via the “Custom Link” option within the Summon Administration Console…

As doing this isn’t officially supported by Serials Solutions, it’s possible that it could stop working at any time. But, until that day comes, it’s a useful way of making minor tweaks to the Summon interface 😉
I’m only a beginner with jQuery, so the following might not be the most efficient and/or elegant way of adding the custom links, but it does the job…

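// Add an "add to MyReading" link beneath each Summon result once the page has loaded
$(document).ready(function(){ doMyReading(); });

function doMyReading()
{
  $( '.metadata' ).each(function()
  {
    // The Summon document ID sits on the 7th ancestor of each .metadata block
    // (the same element as calling .parent() seven times), so this breaks if Summon's markup changes
    var myReadingDocID = $( this ).parents().eq(6).attr("id");
    if( myReadingDocID )
    {
      // Encode the ID so any reserved characters survive the trip through the query string
      $( this ).append( '<div style="margin-top:3px;background:#004088;color:#ccf;padding:3px 8px;font-size:98%;white-space:nowrap;">item options: <a title="add this item to MyReading" style="color:#fff;" href="http://library.hud.ac.uk/myreading/perl/admin/import_summon.pl?id=' + encodeURIComponent(myReadingDocID) + '">add to MyReading</a></div>' );
    }
  });
}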

…the important bit is that we grab the document ID value for the result (myReadingDocID in the above), which we can then use to retrieve the exact same result via the Summon API.
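Fetching a single record back from the Summon API by its document ID means sending a signed request. The Node.js sketch below uses the built-in crypto module to show the general shape as I understand it from the Summon API documentation. It computes an HMAC-SHA1 digest over the Accept header, the x-summon-date header, the host, the path and the sorted query string. Treat the /2.0.0/search path, the s.fids parameter for fetching by ID and the exact digest layout as assumptions to verify. buildSummonRequest is a hypothetical helper, not part of the MyReading code.

```javascript
const crypto = require("crypto");

// Sketch of a signed Summon API request that fetches one record by document ID.
// ASSUMPTIONS to verify against the Summon API documentation: the /2.0.0/search
// path, the s.fids ("fetch IDs") parameter and the layout of the signed string.
function buildSummonRequest({ accessId, secretKey, docId, date = new Date() }) {
  const host = "api.summon.serialssolutions.com";
  const path = "/2.0.0/search";
  const accept = "application/json";
  const summonDate = date.toUTCString(); // e.g. "Fri, 01 Jan 2010 00:00:00 GMT"

  const params = { "s.fids": docId };
  const keys = Object.keys(params).sort();
  // The signed form of the query string uses sorted, unencoded key=value pairs...
  const signedQuery = keys.map(k => `${k}=${params[k]}`).join("&");
  // ...while the URL actually requested carries the same pairs URL-encoded
  const wireQuery = keys
    .map(k => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");

  // One value per line, each followed by a newline (including the last)
  const idString = [accept, summonDate, host, path, signedQuery].join("\n") + "\n";
  const digest = crypto.createHmac("sha1", secretKey).update(idString, "utf8").digest("base64");

  return {
    url: `http://${host}${path}?${wireQuery}`,
    headers: {
      "Accept": accept,
      "x-summon-date": summonDate,
      "Authorization": `Summon ${accessId};${digest}`,
    },
  };
}
```

The returned url and headers can be handed to whatever HTTP client is in use. The MyReading import script itself is Perl, so this only illustrates the idea.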
When the staff user clicks on the “add to MyReading” link, the reading list software uses the document ID to pull in the reference’s details from the Summon API and automatically populates the reference form…

…which includes the OpenURL and DOI, both of which can subsequently be used to query the 360 Link API to fetch access links 🙂
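To show what an article OpenURL actually carries, and where the DOI fits in, here’s a JavaScript sketch that builds an OpenURL 1.0 (Z39.88-2004) key/encoded-value query string for a journal article. The rft.* keys come from the KEV journal format, and the DOI is carried as an info:doi/ identifier. The input property names (articleTitle, journalTitle and so on) are hypothetical.

```javascript
// Sketch: build an OpenURL 1.0 KEV query string for a journal article.
// Input property names are hypothetical; empty or missing values are skipped.
function buildArticleOpenUrl(ref) {
  const pairs = [
    ["url_ver", "Z39.88-2004"],
    ["ctx_ver", "Z39.88-2004"],
    ["rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"],
    ["rft.genre", "article"],
    ["rft.atitle", ref.articleTitle],
    ["rft.jtitle", ref.journalTitle],
    ["rft.issn", ref.issn],
    ["rft.date", ref.year],
    ["rft.volume", ref.volume],
    ["rft.issue", ref.issue],
    ["rft.spage", ref.startPage],
  ];
  // The DOI travels as a referent identifier in the info: URI scheme
  if (ref.doi) pairs.push(["rft_id", `info:doi/${ref.doi}`]);
  return pairs
    .filter(([, value]) => value !== undefined && value !== null && value !== "")
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join("&");
}
```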
We can also use the document ID to retrieve the article’s subject terms and abstract from Summon…

Summary
So, in summary, we’ve used the APIs to:

  1. avoid having to manually maintain links to e-journal content
  2. make it both quicker and easier for staff to add items from Summon (which currently encompasses over 600,000,000 items!) to reading lists
  3. enhance records by bringing in abstracts and subject terms from Summon

Extending the availability messages in Summon

At the recent SummonCamp in New Orleans, there was a question about the local “Availability:” messages that appear in Summon for things like books, e.g.

Availability: available, Huddersfield (Loan Collection Floor 6 – 2 wk loan)


By default, Summon either scrapes your OPAC or makes use of an ILS/LMS API to get real-time availability. If neither is available, or if the OPAC takes too long to respond, a “check availability” message appears instead (which typically links through to the item page on the OPAC).
Early on in our Summon implementation, we were concerned about the potential impact on our OPAC — SirsiDynix HIP — of screen scraping. In particular, HIP wasn’t designed to be scraped like this or to be indexed by search engines (many Horizon sites deliberately block Google et al from indexing their HIP) and it creates a new session ID for each request. As each new session takes up some of the OPAC server’s resources, there’s a theoretical limit to the number of concurrent sessions the OPAC can maintain before slowing down (or even crashing). Also, if you’ve done a search in Summon that delivers 25 book results, it takes time for the OPAC to respond to the 25 HTTP requests generated by Summon, and so you often end up getting the “check availability” message anyway.
So, working with Andrew Nagy at Serials Solutions, we implemented a very basic DLF XML web service (code and brief documentation available here) that bypasses our OPAC and pulls the live availability data straight from the Horizon database. Not only does it ensure the OPAC doesn’t take a performance hit, it’s also extremely fast (especially if you run it using mod_perl with a persistent database connection to Horizon) — you can see a typical response (for this book) here: library.hud.ac.uk/perl/summon/dlf.pl?497856
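The Huddersfield service is written in Perl, so the following is not that script. It’s a minimal JavaScript sketch of the kind of response such a service generates, loosely following the DLF ILS-DI simpleavailability elements. The namespace URI and element names are from my memory of the ILS-DI 1.1 recommendation, and the exact envelope Summon expects may differ. The item shape ({ barcode, status, location, message }) is hypothetical.

```javascript
// Escape text for use in XML element content and attribute values
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build an availability document for one bibliographic record.
// items: [{ barcode, status, location, message }] (hypothetical shape, e.g. rows read
// from the Horizon item tables); status is "available" or "not available".
function dlfAvailabilityXml(bibId, items) {
  const lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<dlf:record xmlns:dlf="http://diglib.org/ilsdi/1.1">',
    `  <dlf:bibliographic id="${escapeXml(bibId)}"/>`,
    "  <dlf:items>",
  ];
  for (const item of items) {
    lines.push(
      `    <dlf:item id="${escapeXml(item.barcode)}">`,
      "      <dlf:simpleavailability>",
      `        <dlf:identifier>${escapeXml(item.barcode)}</dlf:identifier>`,
      `        <dlf:availabilitystatus>${escapeXml(item.status)}</dlf:availabilitystatus>`
    );
    // The optional free-text message is where extra detail (or a link) can go
    if (item.message) {
      lines.push(`        <dlf:availabilitymsg>${escapeXml(item.message)}</dlf:availabilitymsg>`);
    }
    lines.push(
      `        <dlf:location>${escapeXml(item.location)}</dlf:location>`,
      "      </dlf:simpleavailability>",
      "    </dlf:item>"
    );
  }
  lines.push("  </dlf:items>", "</dlf:record>");
  return lines.join("\n");
}
```

Reading straight from the database and emitting a small document like this avoids creating an OPAC session per request. That’s the performance point made above.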
In his Code4Lib Journal article — “Hacking Summon” — Michael B. Klein talks about enhancing an availability API to include extra info and even embedded hyperlinks. This would also be a great way of including item level hold/request functionality into Summon.
At Huddersfield, we’ve done something similar to Michael for our e-resource/database level links, e.g.:

Availability: available, online resource (University network login required)


To help with known item searching, we’ve created some dummy MARC records on our library catalogue for most of the resources listed on our e-resources wiki and these get pushed out to Summon (in the same way that book MARC records do). If the user clicks on the result, they get passed through to the relevant wiki page. However, we also decided we wanted to try and save the user a mouse-click by embedding the actual URL to the resource into the availability message.
To do this, we extended the DLF script so that it detects when an incoming availability request from Summon is for one of the dummy MARC records (rather than a book). The script then does the following:

  1. as the link to the wiki page for that resource is part of the dummy MARC record (the 856 field), it extracts that URL from the record in Horizon
  2. it then web scrapes that wiki page to extract the actual link to the e-resource (in this particular case, it’s an EZproxy’d link)
  3. the DLF XML is then generated, including the link: library.hud.ac.uk/perl/summon/dlf.pl?646531
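The three steps above can be sketched in JavaScript as follows. This is a hypothetical illustration, not the actual (Perl) DLF script. The record shape, the isDummy flag used to spot dummy records, the class="resource-link" marker on the wiki page and the "access now" link text are all assumptions. The wiki fetch is passed in as a function, so the logic can be tried without a network connection.

```javascript
// Step 1: take the wiki page URL from the first 856 field ($u) of the dummy record
function wikiUrlFromRecord(record) {
  const f856 = (record.fields["856"] || [])[0];
  return f856 && f856.u ? f856.u : null;
}

// Step 2: pull the real resource link out of the wiki page HTML.
// ASSUMPTION: the wiki marks that link with class="resource-link".
function resourceLinkFromWikiHtml(html) {
  const tag = /<a\b[^>]*\bclass="resource-link"[^>]*>/.exec(html);
  if (!tag) return null;
  const href = /\bhref="([^"]*)"/.exec(tag[0]);
  return href ? href[1].replace(/&amp;/g, "&") : null;
}

// Step 3: build the availability message with the link embedded.
// Returns null for ordinary (non-dummy) records, which use the normal item lookup.
function availabilityMessageFor(record, fetchWikiHtml) {
  if (!record.isDummy) return null;
  const wikiUrl = wikiUrlFromRecord(record);
  if (!wikiUrl) return null;
  // Fall back to the wiki page itself if no resource link can be found
  const link = resourceLinkFromWikiHtml(fetchWikiHtml(wikiUrl)) || wikiUrl;
  const safeLink = link.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  return `available, online resource (University network login required) <a href="${safeLink}">access now</a>`;
}
```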

One thing that we’ve not done yet, but plan to do, is to include an extra step that queries our E-Resources Blog to check if there are any known problems for that e-resource. If there were, then a link through to the relevant blog post would also be included.

Relevancy ranking in Summon

Yesterday, Tim Fletcher tweeted me a question about Summon:

How does Summon rank results? is there a logic?

…it’s not the kind of question that you can answer in 140 characters, but I quickly knocked off an email to Tim. This morning David F. Flanders suggested I should also blog the response.
So, first of all, a quick caveat: much of the following was gleaned from various presentations over the last couple of years or so and may not be 100% accurate (I’m particularly good at misunremembering stuff!)
The first time I saw Summon (back in early 2009), I believe Serials Solutions were still using the default relevancy ranking that comes with the Open Source Lucene software (which is documented here). In a nutshell, Lucene generates a score for each indexed item (that matches the search query) and then those items are sorted by score (in descending order) to produce the ranked results.
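To make “a score for each indexed item” a bit more concrete, here’s a much-simplified JavaScript sketch of the classic TF-IDF similarity that Lucene used by default before BM25. Each matching query term contributes sqrt(term frequency) × idf² × 1/sqrt(number of terms in the document), where idf = 1 + ln(numDocs / (docFreq + 1)). The total is then multiplied by the fraction of query terms matched. Query normalisation and boosts are left out (query normalisation doesn’t change the order anyway), Lucene stores the length norm only approximately, and the simple word-splitting tokeniser is a stand-in for a real analyser.

```javascript
// Simplified sketch of Lucene's classic TF-IDF scoring, for illustration only.
// queryTerms are assumed to be lower-case; docs are { id, text }.
function rankDocuments(queryTerms, docs) {
  const tokenised = docs.map(d => ({ ...d, terms: d.text.toLowerCase().split(/\W+/).filter(Boolean) }));
  const numDocs = tokenised.length;

  // idf(t) = 1 + ln(numDocs / (docFreq + 1))
  const idf = {};
  for (const t of queryTerms) {
    const docFreq = tokenised.filter(d => d.terms.includes(t)).length;
    idf[t] = 1 + Math.log(numDocs / (docFreq + 1));
  }

  return tokenised
    .map(d => {
      let sum = 0;
      let matched = 0;
      for (const t of queryTerms) {
        const freq = d.terms.filter(x => x === t).length;
        if (freq > 0) {
          matched++;
          // tf = sqrt(freq); lengthNorm = 1 / sqrt(number of terms in the document)
          sum += Math.sqrt(freq) * idf[t] * idf[t] * (1 / Math.sqrt(d.terms.length));
        }
      }
      const coord = matched / queryTerms.length; // reward documents matching more terms
      return { id: d.id, score: coord * sum };
    })
    .filter(r => r.score > 0)             // only matching items are ranked
    .sort((a, b) => b.score - a.score);   // highest score first
}
```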
I’ve read quite a few times that the relevancy ranking engine in Lucene is regarded as one of the best, which might be one of the reasons why SirsiDynix recently moved Enterprise from using Brainware to Lucene.
When you mention Lucene, chances are Solr won’t be too far behind. Solr (which is also Open Source) extends Lucene to provide a host of extra features, including facets.
As Summon has developed, and in response to customer feedback, Serials Solutions have gradually tweaked the way their Lucene installation generates the scores by giving each result an additional boost (or reduction) depending on a variety of factors, including:

  • Currency – newer items are given a slight boost over older items
  • Content type – books, ebooks and journal articles get a boost to their scores, whilst newspaper articles and book reviews have their scores reduced
  • Local collections – things that come from the user’s library (e.g. books, repository items, local archives, etc) get a little boost
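Boosts like those listed above might be layered on top of the base score along the lines of this JavaScript sketch. Every weight, field name and content-type label here is invented for illustration, since Serials Solutions haven’t published their actual values. In a real Lucene/Solr setup such boosts would more likely be applied inside the query (for example, with Solr’s edismax boost parameters) than by re-sorting afterwards.

```javascript
// Hypothetical multipliers: >1 boosts a content type, <1 reduces it
const CONTENT_TYPE_BOOST = {
  "Book": 1.2,
  "eBook": 1.2,
  "Journal Article": 1.2,
  "Newspaper Article": 0.8,
  "Book Review": 0.8,
};

// Apply currency, content-type and local-collection adjustments to a base score
function boostedScore(result, nowYear) {
  let score = result.baseScore;
  // Currency: +10% for items from this year, tapering off as items get older
  const age = Math.max(0, nowYear - result.year);
  score *= 1 + 0.1 / (1 + age);
  // Content type: unknown types are left unchanged
  score *= CONTENT_TYPE_BOOST[result.contentType] ?? 1;
  // Local collections: a small extra boost
  if (result.isLocal) score *= 1.1;
  return score;
}

function rerank(results, nowYear) {
  return results
    .map(r => ({ ...r, score: boostedScore(r, nowYear) }))
    .sort((a, b) => b.score - a.score);
}
```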

Additionally, the Summon search engine handles certain words and phrases differently. For example, Lucene is normally configured to treat the singular and plural versions of words as the same, so searches for “africa hospital” and “africas hospitals” both bring back roughly the same number of results. However, Summon understands that “africa aid” isn’t the same thing as “africa aids”.
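To illustrate the difference, here’s a deliberately naive JavaScript sketch of plural folding with a protected-word list, so “hospitals” becomes “hospital” but “aids” is left alone. It isn’t how Summon does it, because those details aren’t public. Real engines use proper stemmers such as Porter or Snowball, and Solr, for example, lets you exempt words from stemming with KeywordMarkerFilterFactory and a protected-words file. The word list below is an assumption.

```javascript
// Words that must never be reduced (hypothetical list)
const PROTECTED = new Set(["aids", "news", "physics"]);

// Naive plural folding: "studies" -> "study", "hospitals" -> "hospital",
// but protected words, short words and "-ss" endings are left as they are
function stem(word) {
  const w = word.toLowerCase();
  if (PROTECTED.has(w) || w.length <= 3) return w;
  if (w.endsWith("ies")) return w.slice(0, -3) + "y";
  if (w.endsWith("s") && !w.endsWith("ss")) return w.slice(0, -1);
  return w;
}

// Normalise a whole query before matching it against the (equally normalised) index
function normaliseQuery(query) {
  return query.split(/\s+/).filter(Boolean).map(stem).join(" ");
}
```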
Given that few users go beyond the first page of results (I was told the exact figure last week, but it’s slipped my memory; I think it was less than 5%?), Serials Solutions put a lot of effort into ensuring that the most relevant results appear on that first page. With the Summon master index fast approaching 1,000,000,000 items, that’s no trivial task!
As they say, the proof of the pudding is in the eating, so feel free to run some searches on our Summon instance to see how well you think it ranks the results.

Hurricanes and shrimp po'boys (part 1)

I’m jetlagged (this is the first time I’ve had jetlag that feels like being drunk) and still coming down from an ALA-induced high, but here goes a blog post!
I’m currently fortunate enough to be a member of the Serials Solutions Summon Advisory Board, and last week saw the fourth pre-ALA meeting, this time in the one and only New Orleans, the home of hurricane cocktails, shrimp po’boys, high heat & humidity and more seafood than you can shake a stick at…
(seafood platter at the Grand Isle Restaurant)
Summon Advisory Board notes

  • there are now more than 250 Summon customers around the world
  • the company is currently concentrating on comprehensiveness (in terms of coverage and seamless access to articles)
  • gone are the days when Serials Solutions had to approach publishers and argue the case for making their content available in Summon; most publishers now realise the value and are approaching the company directly to have their content added
  • John Law’s mantra is currently “relevancy, relevancy, relevancy!” (with 800,000,000 items in Summon, relevancy is key to ensuring the user gets the right articles on the first page of results)
  • it wasn’t until I saw some demo searches that the awesomeness of the HathiTrust Collection integration deal began to sink in. Librarians of the world, this truly is a game changer! (On a practical note, it’s going to take Serials Solutions a little while to complete the indexing of the entire HathiTrust Collection.)
  • a pilot with JSTOR means that a Summon search box is integrated into the JSTOR web site interface: it appears when a JSTOR search produces only a few results (or none), so that the user’s search can be expanded to other journal platforms
  • as I was en route from the UK to New Orleans, I’d missed this announcement, but the long-awaited deal with Elsevier has been signed
  • for journal articles, Serials Solutions create “super records” that combine the best metadata from multiple sources — this is de-duping on steroids!
  • coming soon — discipline searching (currently 63 subject disciplines have been defined, which work at the journal title and journal article level)
  • coming soon — new article linking improvements (when relevant, Summon results will link directly to the article abstract page on the supplier’s platform, instead of using OpenURLs)
  • Daniel Forsman (Chalmers University of Technology, Gothenburg, Sweden) suggested that we should promote Summon to our users as being more comprehensive than Google Scholar
  • although librarians often get hung up on what’s not in Summon, some analysis by a Summon customer indicated that the non-indexed content is often low-quality “filler material” added by aggregator platforms to bump up journal totals
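As a loose illustration of the “super record” idea, here’s a JavaScript sketch. It isn’t Serials Solutions’ actual method, which I don’t know the details of. It groups records that share a DOI (falling back to a normalised title) and keeps the longest non-empty value for each field. Both the matching rule and the “longest wins” rule are deliberately simplistic assumptions.

```javascript
// Match key: normalised DOI if present, otherwise a normalised title
function matchKey(rec) {
  if (rec.doi) return "doi:" + rec.doi.trim().toLowerCase();
  return "title:" + (rec.title || "").toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Merge each group of matching records into one, keeping the longest value per field
function mergeRecords(records) {
  const groups = new Map();
  for (const rec of records) {
    const key = matchKey(rec);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(rec);
  }
  return [...groups.values()].map(group => {
    const merged = {};
    for (const rec of group) {
      for (const [field, value] of Object.entries(rec)) {
        if (value === undefined || value === null || value === "") continue; // ignore empty metadata
        const current = merged[field];
        if (current === undefined || String(value).length > String(current).length) {
          merged[field] = value;
        }
      }
    }
    merged.sourceCount = group.length; // how many source records were combined
    return merged;
  });
}
```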

(a bourbon nightcap after the Advisory Board Meeting)

Summon 4 HN — bits o' code

As part of the JISC Summon 4 HN project, we’ll be releasing some chunks of code that I’ve knocked together for our Summon implementation at Huddersfield.
The code will cover these areas:

  1. updating Summon with MARC record additions, updates and deletions from Horizon
  2. providing live availability information from Horizon without resorting to screen-scraping the OPAC
  3. customising 360 Link using jQuery
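For the first item, one way to work out which records need sending as additions, updates and deletions is to compare each export with the previous one. The JavaScript sketch below does that with an MD5 checksum per bib record. It’s a hypothetical approach for illustration, not necessarily how the Summon 4 HN code does it, and the Map-based snapshot format is an assumption.

```javascript
const crypto = require("crypto");

// Checksum of one record's raw MARC data
function checksum(marc) {
  return crypto.createHash("md5").update(marc).digest("hex");
}

// previous: Map of bibId -> checksum from the last run
// current:  Map of bibId -> raw MARC string from today's export
function marcDelta(previous, current) {
  const added = [];
  const updated = [];
  const deleted = [];
  const nextSnapshot = new Map(); // save this for the next run
  for (const [id, marc] of current) {
    const sum = checksum(marc);
    nextSnapshot.set(id, sum);
    if (!previous.has(id)) added.push(id);
    else if (previous.get(id) !== sum) updated.push(id);
  }
  // Anything in the old snapshot but missing today has been deleted
  for (const id of previous.keys()) {
    if (!current.has(id)) deleted.push(id);
  }
  return { added, updated, deleted, nextSnapshot };
}
```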

In theory, the first two might also be of interest to Horizon sites that are implementing an alternative OPAC (e.g. VuFind or AquaBrowser) where you need to set up regular MARC exports. The third might be of interest to 360 Link sites in general.
Keep an eye on the Project Code section of the Summon 4 HN blog for details of the code 🙂


I couldn’t find a relevant photo for this blog post, so instead, let’s have another look at those infamous MIMAS #cupcakes from ILI2009 🙂

Here comes Summ(er|on)

It’s probably a sign of getting old and decrepit, but this year has just flown by — it doesn’t seem like two minutes since we kicked off our implementation of Serials Solutions’ Summon and now it’s gone fully live (it actually went fully live halfway through the Mashed Library event we ran the other week).
The bulk of the implementation was done and dusted by early January 2010, and the majority of the implementation time was spent populating 360 Link (the Serials Solutions link resolver) with our journal holdings, a task our Journals Team found much easier than when we implemented SFX back in 2006. As the plan had always been to run Summon in parallel with MetaLib during the 2009/10 academic year, we had lots of time to play and tweak.
We flipped the link resolver over from SFX to 360 Link in late January and then formally “soft” launched Summon during the University’s Research Festival in early March. Throughout the academic year, usage of Summon has been growing and the vast majority of the feedback has been positive 🙂
As part of the JISC Summon4HN Project, we’ll be documenting the implementation and releasing chunks of code that we hope might be of use to the community, including:

  • code for automating the export of deleted, new and updated MARC records from Horizon so that they can be imported into Summon (or VuFind, AquaBrowser, etc)
  • code for creating “dummy” journal title records (so that known journal titles can be easily located in Summon, e.g. American Journal of Nursing)
  • a basic mod_perl implementation of the DLF spec for exposing availability data for library collections
  • details of the various tweaks we’ve made to our 360 Link instance
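To give a feel for how small a “dummy” title record can be, here’s a JavaScript sketch that writes one out in MarcEdit-style mnemonic text. Each line has the form =TAG, two spaces, the indicators, then $-prefixed subfields, and literal dollar signs are escaped as {dollar}, which is how I understand MarcEdit handles them. The choice of fields (001, 245 and an 856 link), the leader values and the example ID and URL are illustrative assumptions, not the records we actually load into Summon.

```javascript
// Sketch: a minimal dummy journal title record in MarcEdit-style mnemonic text.
// Field choices and leader values are illustrative assumptions.
function dummyJournalRecord({ id, title, url }) {
  // A literal "$" would be read as a subfield delimiter, so escape it
  const clean = s => String(s).replace(/\$/g, "{dollar}");
  return [
    // 24-character leader: "nas" = new record, language material, serial
    "=LDR  00000nas  2200000   4500",
    `=001  ${clean(id)}`,
    // 245 indicators "00": no added entry, no non-filing characters (no leading article)
    `=245  00$a${clean(title)}`,
    // 856 indicators "40": HTTP access to the linked resource
    `=856  40$u${clean(url)}`,
  ].join("\n");
}
```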

Also, as part of the roll out of Summon, we’ve been revamping our E-Resources Wiki to provide a browseable list of resources — as with the journal titles, we’ve been dropping dummy MARC records into Summon so that known resources can be located via a search (e.g. Mintel Reports).