OPAC search cloud and failed searches

Seeing as I’ve got my head in the clouds at the moment, here’s one showing the most popular keyword search words used on our OPAC during the last 6 months…
To be honest, there aren’t too many surprises in there — students studying business & law and the health sciences are the heaviest users of the library.
Unlike Yahoo, not a single person has done a search for “Britney” on our OPAC in the last 6 months …and “yes”, you would get a relevant hit if you did 😉
I’ve also separated out words that appear in failed keyword searches (i.e. they produced no hits) and removed those which did appear in other successful searches — this gives a list of keywords that probably don’t match anything on the catalogue:

  1. newspapermen (96)
  2. socail (90)
  3. buisness (84)
  4. brantingham (74)
  5. renew (74)
  6. metalib (73)
  7. reserach (72)
  8. mortor (67)
  9. vehclos (66)
  10. gieber (63)
  11. thoery (63)
  12. writting (62)
  13. psycology (59)
  14. contempory (58)
  15. donky (51)
  16. facism (47)
  17. reserch (46)
  18. reasearch (39)
  19. ans (38)
  20. hypodermic (38)
  21. ielts (38)
  22. televison (38)
  23. estimation (37)
  24. priciples (36)
  25. superficial (36)
  26. immanual (35)
  27. infomation (34)
  28. ligament (34)
  29. tuberclosis (34)
  30. centuary (33)
  31. resourse (33)
  32. topshop (33)
  33. treetment (33)
  34. devlopment (32)
  35. petherick (32)
  36. proffesional (32)
  37. quantitive (32)
  38. stamps (32)
  39. theorys (32)
  40. enviromental (31)
  41. pschology (31)
  42. statistic (31)
  43. syringe (31)
  44. hanbook (30)
  45. simnet (30)
  46. stratergy (30)
  47. intoduction (29)
  48. pestel (29)
  49. physio (29)
  50. pratice (29)

The words in bold are valid spellings (according to Microsoft Word) and the figure in brackets is the number of separate searches that contained the word.
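For anyone wanting to try the same filtering on their own logs, here is a minimal Python sketch of the idea described above. It is not the script used to build this list. It assumes the search log is already available as (query, hit count) pairs, and the function name and sample data are made up for illustration.

```python
from collections import Counter


def failed_only_keywords(searches, top_n=50):
    """Return the most common words seen only in zero-hit searches.

    searches: iterable of (query_string, hit_count) pairs (assumed format).
    """
    failed_counts = Counter()
    successful_words = set()
    for query, hits in searches:
        # A set means each word is counted once per separate search
        words = set(query.lower().split())
        if hits == 0:
            failed_counts.update(words)
        else:
            successful_words |= words
    # Drop any word that also appeared in a search that produced hits
    for word in successful_words:
        failed_counts.pop(word, None)
    return failed_counts.most_common(top_n)


# Made-up sample log
log = [
    ("socail work", 0),
    ("social work", 12),
    ("socail policy", 0),
    ("renew", 0),
]
print(failed_only_keywords(log))
```

In the sample, “work” is dropped because it also appears in a successful search, while “policy” stays in simply because the tiny log never saw it succeed. That is why the resulting list only contains words that probably don’t match anything.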
Compared to the cloud, this is much more interesting…
1) many of them are simple typos — another good reason to add a spellchecker to your OPAC if you haven’t got one!
2) the fifth most common word is “renew” — are our users trying to renew their books by typing the word into the OPAC, or are they expecting the OPAC to work like a search engine and return something like “How to renew your books” as the first result?
3) the sixth most common word is “metalib” — it looks like a lot of people are trying to find help on using MetaLib in the OPAC… maybe we should create a dummy catalogue record that contains 856 links to MetaLib and our Electronic Resources Wiki?
4) “mortor” is an oddity in the list… but the entry for “pestel” near the end makes me wonder if people were searching for “mortar and pestle”?
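On point 1, a rough idea of how a “did you mean?” suggestion could be produced for typos like these. This is only a sketch using Python’s standard difflib module, not the spellchecker running on our OPAC, and the small word list stands in for whatever vocabulary you can extract from your own catalogue.

```python
import difflib


def did_you_mean(word, vocabulary, max_suggestions=3):
    """Return vocabulary words that closely resemble a failed keyword."""
    return difflib.get_close_matches(
        word.lower(), vocabulary, n=max_suggestions, cutoff=0.8
    )


# Hypothetical list of words known to appear in the catalogue
catalogue_words = ["business", "research", "psychology", "theory", "social"]

for typo in ["buisness", "reserach", "psycology"]:
    print(typo, "->", did_you_mean(typo, catalogue_words))
```

The cutoff of 0.8 is a guess. Set it lower and you get more suggestions, but more of them will be irrelevant.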
Outside of the top 50, there are some other interesting failed keywords (with links to Wikipedia or other sites when relevant):

Lending paths

Whilst working on Pewbot, I wondered if you could really predict the future borrowing pattern of a user based on a specific book — in other words, if they borrow book X will they then go on to borrow book Y and then book Z?
Anyway, I’ve knocked together a basic script that will extrapolate the most likely lending path (both past and future) for a specific book.
For example, here’s the lending path for “Learning SQL: a step by step guide using Oracle”:
The book in question is displayed in bold. The title directly before it (“Java: the first semester”) is the title that is most frequently borrowed prior to “Learning SQL”, and the one directly after (“Database systems: a practical approach to design…”) is the most likely to be borrowed subsequently.
In turn, I then continue to extrapolate the paths in either direction until I run out of data or a title gets duplicated.
What we end up with is a hypothetical path showing what someone is most likely to have borrowed previously, and will then go on to borrow in the future.
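The extrapolation can be sketched in a few lines of Python. This is an illustration rather than the actual script. It assumes each borrower’s history is available as an ordered list of titles, and that “borrowed prior” means the loan directly before. All names and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(histories):
    """Count which titles are borrowed directly before and after each title."""
    before = defaultdict(Counter)
    after = defaultdict(Counter)
    for history in histories:
        for prev_title, next_title in zip(history, history[1:]):
            after[prev_title][next_title] += 1
            before[next_title][prev_title] += 1
    return before, after


def extend(start, links, seen):
    """Follow the most frequent link until data runs out or a title repeats."""
    path, current = [], start
    while links.get(current):
        candidate = links[current].most_common(1)[0][0]
        if candidate in seen:
            break  # a duplicated title ends the path
        seen.add(candidate)
        path.append(candidate)
        current = candidate
    return path


def lending_path(title, histories):
    """Return the most likely past titles, the title itself, then future titles."""
    before, after = build_transitions(histories)
    seen = {title}
    past = extend(title, before, seen)
    future = extend(title, after, seen)
    return past[::-1] + [title] + future


# Made-up borrowing histories, oldest loan first
histories = [
    ["Java", "SQL", "Databases"],
    ["Java", "SQL", "Databases", "HCI"],
    ["Networks", "SQL", "HCI"],
]
print(lending_path("SQL", histories))
```

Only the single most frequent neighbour is followed at each step, which is why the result is one hypothetical path rather than a tree of possibilities.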
What’s interesting is the flow of subjects along the path — the books before are all IT books, but the future path flows into HCI, IT management, and then into corporate strategy and business titles.
If you click on a book title, then it’ll take you through to the OPAC. If you click on the “path” link, then you’ll see the lending path for that particular title.
Once you’re in the OPAC, there’s a link to the lending path at the foot of every full bib page (although the path can only be generated if there’s enough raw circulation data).
If nothing else, it proves that our students are sensible enough to borrow the Harry Potter books in the correct order! 😀

A serendipity of clouds

Inspired by the BBC Radio 1 tag cloud mentioned by Richard Wallis on the Panlibus blog, I quickly threw a couple together for the most recent search keywords used on our OPAC:


The pages use Ajax and should automatically refresh with updated content every few seconds (assuming that someone has been searching the OPAC recently).
No points for guessing that the larger the font, the more times the word has been used in recent searches!
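If you fancy building something similar, the sizing is the easy bit. Here is a minimal Python sketch that scales font size linearly with how often each word appears. The pixel range and the sample keywords are arbitrary assumptions, and it is not the code behind the pages above.

```python
from collections import Counter


def cloud_sizes(keywords, min_px=10, max_px=36):
    """Map each keyword to a font size that grows with its frequency."""
    counts = Counter(word.lower() for word in keywords)
    if not counts:
        return {}
    low, high = min(counts.values()), max(counts.values())
    spread = (high - low) or 1  # avoid dividing by zero when all counts match
    return {
        word: round(min_px + (count - low) * (max_px - min_px) / spread)
        for word, count in counts.items()
    }


# Made-up list of recent search keywords
recent = ["law", "business", "law", "nursing", "law", "business"]
for word, size in cloud_sizes(recent).items():
    print(f'<span style="font-size:{size}px">{word}</span>')
```

A linear scale is the simplest choice. If one word dominates and squashes everything else, a logarithmic scale is a common alternative.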

white dog poo?

Sarah Houghton (aka LibrarianInBlack) has blogged that Answers.com has a new natural language “Web Answers” feature which lets you pose life’s great unanswerables – e.g.:

(I should point out that Sarah didn’t pose that exact question, but it’s one that’s been niggling at the back of my mind for years!)
After reverse engineering the new feature, it looks like they’re using Ajax and XML – e.g.:

Some of you will already know that we’ve been using Answers.com on our OPAC to provide serendipity keyword suggestions, so I’ll have a go at incorporating the “Web Answers” output into those suggestions too.

OPAC keyword email alerts

One of the medical conditions I suffer from is the common “not-enough-hours-in-the-day-itus”: bits of software and new stuff get prototyped or developed to the proof-of-concept stage, and then put to one side when something more important comes up.
This is something I originally coded in January 2006, briefly blogged about in mid February, then got slightly miffed when Hennepin County Library went live with something similar, and finally almost managed to forget all about it!
Anyway, I’ve dusted off the code and plugged it into ye olde OPACeth.  All I can do now is sit back and see if anyone will actually use it!

Getting HIP updates & add-ons via a HTTP proxy

A couple of years ago we wanted to try out the optional ADA Profile for HIP 3 but, try as I might, I could not get the add-on to download using the HIP admin pages.
After much pondering, I realised that it was because our external firewall was blocking HIP from connecting to the SirsiDynix server to fetch the download. Even the servers in our DMZ need to be configured to use the university’s Squid HTTP proxy servers before they can get external web access.
Google soon came up with the answer on one of the JBoss discussion sites and here’s what you need to do for a Windows HIP 3 server (it should be a similar process for a Unix/Linux HIP 3 server):
1) locate the batch file that starts JBoss — firstly, find the directory you installed the Application Server into, then open the “jboss” folder, then the “bin” folder, and you should find a Windows Batch File named “run.bat”
2) make a safe backup of the “run.bat” file before you make any changes
3) right-click on the “run.bat” file and select the “edit” option — this should open the file for editing in Notepad
4) if you search through the file, you’ll find several lines that start with set JAVA_OPTS=
5) find the very last occurrence, and insert the following two new lines of text after it:

set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyHost=
set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyPort=3128

…where the value after “-Dhttp.proxyHost=” should be the IP address of your HTTP proxy and “3128” is the port number.
For example, on our HIP 3.04 UK server that section of the file now looks like this:

rem Standard options
set JAVA_OPTS=-server -Xms384m -Xmx512m -DISO_8859_1=UTF-8 %JAVA_OPTS%
set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyHost=
set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyPort=3128

Once you’ve added the settings for the proxy server, save the file and restart JBoss.
If all has gone well, then you should be able to fetch updates and add-ons using the HIP admin interface!
Or, if all has gone pear-shaped, then simply restore that safe backup of the file you made earlier!
It should be a similar process for Linux/Unix users — locate the script that starts JBoss, open it in Vi, and add the extra two lines to the script.
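If you’d rather not edit the file by hand, steps 4 and 5 could be scripted along these lines. This is just a Python sketch: the function name is made up, the proxy address in the example is a placeholder from the documentation address range, and you should still run it against a copy of “run.bat” rather than the live file.

```python
def add_proxy_opts(batch_text, proxy_host, proxy_port=3128):
    """Insert proxy settings after the last 'set JAVA_OPTS=' line."""
    lines = batch_text.splitlines()
    positions = [
        i for i, line in enumerate(lines)
        if line.strip().lower().startswith("set java_opts=")
    ]
    if not positions:
        raise ValueError("no 'set JAVA_OPTS=' line found")
    extra = [
        f"set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyHost={proxy_host}",
        f"set JAVA_OPTS=%JAVA_OPTS% -Dhttp.proxyPort={proxy_port}",
    ]
    insert_at = positions[-1] + 1
    return "\n".join(lines[:insert_at] + extra + lines[insert_at:]) + "\n"


# Cut-down stand-in for the real batch file
sample = (
    "rem Standard options\n"
    "set JAVA_OPTS=-server -Xms384m -Xmx512m %JAVA_OPTS%\n"
    "echo starting JBoss\n"
)
print(add_proxy_opts(sample, "192.0.2.10"))  # placeholder proxy address
```

Note that the output uses plain newlines, so you may want to convert the line endings back to Windows style before saving.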

HIP Tip: changing the timeout

This is in response to an email Anne Barnard posted to the Horizon-L mailing list:

I have my global settings session timeout set to 5 minutes, and my search timeout set to 2 minutes. I’m starting to get complaints from remote users that they timeout to quickly. How long are other libraries making their settings? We’re a public library and people frequently walk away without logging out.

I didn’t see anyplace where this could be set for profiles rather than globally.

Assuming that your public OPACs have a specific range of IP addresses allocated to them (e.g. you’ve set them up on their own subnet), then it’s possible to tweak the expiretimer.xsl to only use the timeout for those machines:


…the bits you need to add are shown in red, and you’ll need to amend the IP address accordingly.
If you need to check for multiple IP addresses, then simply expand that if statement, e.g.:

if(ip.indexOf(" 10.2.8.")>0 || ip.indexOf(" 10.2.9.")>0)

…will only run the timeout for IP addresses starting with 10.2.8.* and 10.2.9.*
There are probably quite a few ways of achieving the above, so please let me know if you’ve got a simpler method!
The usual notes apply:

  • this worked fine with HIP 3.04 UK, but may not work with any other release
  • make sure you back the file up before editing
  • try it on a test HIP installation first

Three Coins in the OPAC

Inspired by Lorcan Dempsey’s post about Coins in Open WorldCat, I’ve been messing around with adding Coins to our OPAC.
I still need to research the specification in further depth, but it’s been relatively easy to add a prototype to our OPAC. Here’s how it displays in Firefox using the Openly OpenURL Referrer extension:

I’ve configured the extension to link to our SFX server, so clicking on the SFX icon takes me through to our SFX menu:

Obviously there’s little point linking from our OPAC to our own OpenURL resolver; the idea is more that you can configure the extension to point to your preferred resolver.
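For the curious, a Coins entry is just an empty span with the class “Z3988” whose title attribute carries an OpenURL ContextObject as key/value pairs. Here is a minimal Python sketch of building one for a book. It isn’t the code used in our OPAC, the function name is made up, and only a handful of the possible “rft.” fields are included.

```python
from html import escape
from urllib.parse import quote, urlencode


def coins_span(title, author=None, isbn=None, date=None):
    """Build a COinS span describing a book."""
    fields = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),
        ("rft.btitle", title),
    ]
    if author:
        fields.append(("rft.au", author))
    if isbn:
        fields.append(("rft.isbn", isbn))
    if date:
        fields.append(("rft.date", date))
    # Percent-encode the values, then escape the ampersands for use in HTML
    context_object = urlencode(fields, quote_via=quote)
    return f'<span class="Z3988" title="{escape(context_object)}"></span>'


# Placeholder bibliographic details
print(coins_span("Learning SQL", author="A. N. Author", date="2006"))
```

The span itself is empty, so it’s invisible on the page unless the browser has an extension that looks for it.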

More Ajax goodness

I’ve spent the afternoon Ajax-ing the “did you mean?” code on the OPAC, and also finishing off the serendipity suggestions.
The serendipity suggestions take longer to generate than before, as the code now considers keyword phrases returned by answers.com, instead of just single keywords. As an example, here’s what appears if I try searching for the film “Faraway, So Close” on our OPAC:

Obviously the suggestion of searching for “Close Faraday” is of little use. However, most of the serendipity suggestions are relevant to the film, and at least two of them will lead me straight through to the catalogue page for “Der Himmel über Berlin” (the prequel to “Faraway, So Close”).
One rather cool outcome of this is that our OPAC can now sometimes answer questions!  Sadly the results don’t always lead to relevant items, but at least our OPAC knows the answer to the Ultimate Question!