Tweet Clouds

I have a confession to make — I grew bored of Twitter after a couple of days.
However, I felt obliged to keep on Twittering something… anything… so I hooked our OPAC into the feed instead. Every 5 minutes, a bit of code checks to see what the most popular keyword(s) used on our OPAC have been recently and, if they’re different to the last run, it fires them off to Twitter. I was so lazy, I didn’t even bother filtering out stopwords.
The result is an eclectic mix of words that encapsulate our students’ usage of the library catalogue: little snapshots of what was important to a bunch of students (or perhaps one particularly determined student). Topics meander semi-randomly, occasionally repeating at unusual intervals.
Sometimes, there’s not a single popular keyword, but several. Sometimes the multiple words make sense, other times they create weird phrases…

  • british genetics music
  • angina attachment theatre
  • education picasso sex
  • rape skills study
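
For anyone curious how a feed like this might hang together, here’s a minimal Python sketch of the “post only if the top keyword(s) changed” logic. It isn’t the code that runs against our OPAC: the function names are made up for illustration, ties are simply joined in alphabetical order (an assumption), and the bit that actually talks to Twitter is left out.

```python
from collections import Counter


def top_keywords(recent_searches):
    """Return the most-used keyword(s) as one space-separated string.

    No stopword filtering is done, matching the lazy approach described above.
    Tied keywords are joined in alphabetical order (an assumption).
    """
    counts = Counter()
    for search in recent_searches:
        counts.update(search.lower().split())
    if not counts:
        return None
    best = max(counts.values())
    return " ".join(sorted(word for word, n in counts.items() if n == best))


def next_tweet(recent_searches, last_posted):
    """Return the text to post, or None if nothing has changed since the last run."""
    current = top_keywords(recent_searches)
    if current is None or current == last_posted:
        return None
    return current
```

A scheduler (cron, say) would call `next_tweet` every 5 minutes with the latest batch of searches and remember whatever it returned for the next run.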

Anyway, a few days ago I spotted Tweet Clouds and decided to see what it made of my feed…
tweetcloud
http://www.tweetclouds.com/user_pages/daveyp.html
…and here’s a cloud I made back in December 2006
opacsearches
I must admit, I feel kinda guilty that I ate up 23 minutes of CPU time on the Tweet Cloud site :-S

“Spin, spin, spin the Wheel of Justice…”

Kudos if you automatically sang to yourself “…see how fast the bastard turns” 😉
If you’ve no idea what I’m on about, then YouTube is your friend.
Anyway, I got to playing around with the OPAC keyword cloud data and ImageMagick and came up with this (reload that web page to get a new image)…
wheel4 wheel3 wheel5 wheel10 wheel8 wheel11 wheel13 wheel12
I was struggling to remember how to find the points on the circumference of a circle until I remembered that one of the chapters in the original ZX Spectrum manual covered the topic.
The word in the middle is chosen at random from the top 200 most popular keywords used on our OPAC and the surrounding words are those most commonly used with that word.
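
The circle trick boils down to a bit of trigonometry: a point at angle θ on a circle of radius r centred on (cx, cy) sits at (cx + r·cos θ, cy + r·sin θ). Here’s a small Python sketch that spaces n points evenly around the circumference. The function name and parameters are just for illustration, and it only works out the coordinates; drawing the words is a separate job.

```python
import math


def circle_points(cx, cy, radius, n):
    """Return n (x, y) points spaced evenly around a circle."""
    points = []
    for i in range(n):
        angle = 2 * math.pi * i / n  # angle in radians for the i-th point
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        points.append((x, y))
    return points
```

With the centre keyword placed at (cx, cy), each surrounding word can then be positioned at one of the returned points.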

HIPpie update (20/Nov/2007)

Just a quick update — I’ve not had too much spare time to work on HIPpie since announcing it, partly due to work and conference commitments, but I have been slowly beavering away.
The first chunk is some of the back-end code for the “did you mean” spell checker. To try and make the code as re-usable as possible (especially for other OPACs), the back-end has been coded so that it can be used as a standalone web service:
library.hud.ac.uk/wikis/hippie/index.php/Spell_checker
Various options can also be specified to affect the output, e.g. for “newmonia thrombrosis”:

The grand plan is that anyone who wants to make use of it (either as a web service or the code that will embed into HIP) will have an account. By logging into the account, they’ll be able to specify a dictionary to use (e.g. standard US English) or they’ll be able to upload their own word list (e.g. generated from the indexes in the ILS).
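
To give a flavour of what a “did you mean” lookup against a word list involves, here’s a minimal Python sketch. It is not the HIPpie back-end: it leans on the `difflib` module from Python’s standard library for the fuzzy matching, and the function name, word list and cutoff value are all assumptions for the sake of the example.

```python
import difflib


def did_you_mean(query, word_list, max_suggestions=3, cutoff=0.6):
    """Suggest replacements for query words that aren't in the word list.

    Returns a dict mapping each unknown word to its closest matches,
    best match first. Words with no close match are left out.
    """
    known = set(word_list)
    suggestions = {}
    for word in query.lower().split():
        if word in known:
            continue  # spelled correctly as far as this word list is concerned
        matches = difflib.get_close_matches(
            word, word_list, n=max_suggestions, cutoff=cutoff
        )
        if matches:
            suggestions[word] = matches
    return suggestions
```

Swapping in a word list generated from the catalogue’s own indexes would mean suggestions are always words that actually return hits.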
It’s still early days, but if anyone has any comments or suggestions, please get in touch!

OPAC keyword cloud

This is crying out to be done like the visual word map in AquaBrowser, but here’s a browseable tag cloud based on data from nearly 2 million keyword searches on our OPAC.
shakespeare performance
The code looks for other keywords that were entered as part of the same search (e.g. “ethics of nursing care”) to draw out the most commonly used words. For example, the most common keyword used with “performance” is “management”. The size of the word in the cloud is determined by how often it appears with the search keyword.
nursing
I’ve not removed keywords that generated zero search results, so the cloud for “acrobat” includes “abode”. (I’ve now removed zero result searches)
I’ll have to have a play to see if there’s a way of incorporating the cloud into the OPAC. For example, if you used a vague/general keyword such as “health”, then maybe the OPAC could suggest more specific searches for “health care”, “mental health” or “health promotion”?
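
The co-occurrence counting behind a cloud like this can be sketched in a few lines of Python. This is an illustration rather than the code behind the cloud above: the function names are made up, and the linear font-size scaling (with its pixel limits) is purely an assumption.

```python
from collections import Counter


def cooccurring_keywords(searches, keyword, top_n=10):
    """Count the words that were entered in the same search as `keyword`."""
    counts = Counter()
    for search in searches:
        words = set(search.lower().split())  # count each word once per search
        if keyword in words:
            counts.update(words - {keyword})
    return counts.most_common(top_n)


def font_size(count, max_count, min_px=10, max_px=40):
    """Scale a count linearly to a font size in pixels (hypothetical limits)."""
    return min_px + (max_px - min_px) * count / max_count
```

The same counts could drive search suggestions: for a vague keyword, offer the top few co-occurring words as more specific searches.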

Go Danbury, Go!

Congratulations to Danbury Library in Connecticut for being the first to add LibraryThing for Libraries to their live OPAC!
For anyone wondering if it works with the Dynix/Horizon HIP OPAC, let me tell you that it works a treat 🙂
In completely unrelated news, one of my work colleagues visited Grimsby today. A quick look at Wikipedia and I was able to amaze her with the fact that Grimsby produces more pizzas than anywhere else. Not only that, it looks like Grimsby has been making pizzas since the Dark Ages:

Hmmmmmmm… Medieval Margherita with 6 slices of Mozzarella!

More Solr fun

Darn, I should have known I was following in a great man’s footsteps…
http://www.code4lib.org/2007/durfee
Anyway, a couple more hours of coding has resulted in this…
http://161.112.232.18/modperl/facet6.pl?q=medicine
solr5
Hopefully NCSU won’t be setting their lawyers on me (copying is the most sincere form of flattery!), but the prototype has certainly borrowed one or two ideas from their wonderful OPAC.
It’s still a way off being a full OPAC replacement and I need to shrink the book covers down to a more sensible size, but I’m quite chuffed with what I’ve been able to achieve in just a few hours of coding.

Solr + 2 hours = faceted OPAC

I’ve been meaning to have a play around with Solr, which is…

an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface

It’s mostly the “faceted” part I’m interested in and, after a couple of hours of messing around, I’ve got a basic OPAC search interface up and running with around 10,000 records pulled in from our catalogue.
It looks like Solr automatically handles word stems, as searches for “score”, “scores”, and “scoring” find the same results. The results are also relevancy ranked, although I need to find a way to fine-tune the default ranking algorithm.
All in all, I’m very impressed with what Solr can do and how quickly it handles searches.
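
For the curious, here’s roughly what asking Solr for facets looks like over its HTTP API, sketched in Python. The `q`, `rows`, `wt`, `facet`, `facet.field` and `facet.mincount` parameters are standard Solr query parameters, but the base URL and the field names (`subject`, `format`) are assumptions that depend on how the index has been set up. The parsing assumes Solr’s default JSON layout, where each facet field comes back as a flat list of alternating values and counts.

```python
import json
from urllib.parse import urlencode


def build_facet_query(base_url, query, facet_fields, rows=10):
    """Build a Solr select URL that asks for facet counts on the given fields."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "json"),          # ask for a JSON response
        ("facet", "true"),       # switch faceting on
        ("facet.mincount", 1),   # skip facet values with no matching records
    ]
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


def parse_facets(response_text):
    """Turn Solr's flat [value, count, value, count, ...] lists into pairs."""
    data = json.loads(response_text)
    fields = data["facet_counts"]["facet_fields"]
    return {
        name: list(zip(flat[0::2], flat[1::2]))
        for name, flat in fields.items()
    }
```

Each (value, count) pair can then be shown as a clickable link that narrows the search to that facet value.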

SirsiDynix to build Rome OPAC on Evergreen

Finally some proof that the new management at SirsiDynix are listening to their customers! I really shouldn’t post this until SirsiDynix make the official announcement on Thursday, but I just have to spill the beans because I’m so excited about the news…
Since the announcement of Rome, many Dynix and Horizon sites have been discussing a move to open source systems (such as Koha and Evergreen) and it looks like the top brass at SirsiDynix have realised that “if you can’t beat them, join them” — on Thursday they’ll be announcing a partnership with the people at Georgia Public Library Service who develop the Evergreen system.
How did I find out about this? Well, a couple of years ago I was given access to the Dynix development website and I regularly check it to see what the company has in the software pipeline. Imagine my surprise when I spotted a link titled “Evergreen Partnership OPAC” this morning. What could I do but click to see what it was!
evergreen
I honestly thought that SD staff might have put it on there as some kind of joke, but a quick phone call to the press office at Huntsville confirmed the news and also that the formal announcement would come before the end of the week. They did ask me to swear that I wouldn’t leak the news, but I had my fingers crossed at the time so it doesn’t count!!! 😀
This is really great news as the Evergreen OPAC has a host of features not currently available in most ILS vendor OPAC products (including facets and lots of cool AJAX stuff).

Revish and reviews

Just spotted that Revish are gearing up for launching at the end of March 🙂
On the blog, Dan mentions that the site will provide APIs for getting at the data and I can’t wait to see if we can do anything with that data in our OPAC.
I quietly flicked on the ability to add reviews and comments to our OPAC last week, but we’ve yet to have our first student-generated comment. This has slightly surprised me (i.e. made my right eyebrow rise by about 3mm) as we’ve already had several hundred book ratings added in the last few weeks. However, adding a rating doesn’t require you to log in but adding a comment does (at the request of our Librarians).
If we don’t get any bites soon, I’ll probably tweak the code to allow anonymous comments. These will need to be fully moderated, as it seems these days that any HTML <form> on a public web page will attract spam 🙁
However, it does raise some interesting questions:

  • Is having to log in to post comments too much of a barrier?
  • Are public library users (e.g. those at AADL) more likely to post comments/reviews than students at an academic library?
  • What motivates someone to write a review/comment?
  • Have I finally managed to code an OPAC tweak that no-one will use?
  • Did I leave the iron on?

OPAC search cloud and failed searches

Seeing as I’ve got my head in the clouds at the moment, here’s one showing the most popular keyword search words used on our OPAC during the last 6 months…
opacsearches
www.daveyp.com/files/stuff/opacsearches.html
To be honest, there aren’t too many surprises in there — students studying business & law and the health sciences are the heaviest users of the library.
Unlike Yahoo, not a single person has done a search for “Britney” on our OPAC in the last 6 months …and “yes”, you would get a relevant hit if you did 😉
I’ve also separated out words that appear in failed keyword searches (i.e. they produced no hits) and removed those which did appear in other successful searches — this gives a list of keywords that probably don’t match anything on the catalogue:

  1. newspapermen (96)
  2. socail (90)
  3. buisness (84)
  4. brantingham (74)
  5. renew (74)
  6. metalib (73)
  7. reserach (72)
  8. mortor (67)
  9. vehclos (66)
  10. gieber (63)
  11. thoery (63)
  12. writting (62)
  13. psycology (59)
  14. contempory (58)
  15. donky (51)
  16. facism (47)
  17. reserch (46)
  18. reasearch (39)
  19. ans (38)
  20. hypodermic (38)
  21. ielts (38)
  22. televison (38)
  23. estimation (37)
  24. priciples (36)
  25. superficial (36)
  26. immanual (35)
  27. infomation (34)
  28. ligament (34)
  29. tuberclosis (34)
  30. centuary (33)
  31. resourse (33)
  32. topshop (33)
  33. treetment (33)
  34. devlopment (32)
  35. petherick (32)
  36. proffesional (32)
  37. quantitive (32)
  38. stamps (32)
  39. theorys (32)
  40. enviromental (31)
  41. pschology (31)
  42. statistic (31)
  43. syringe (31)
  44. hanbook (30)
  45. simnet (30)
  46. stratergy (30)
  47. intoduction (29)
  48. pestel (29)
  49. physio (29)
  50. pratice (29)

The words in bold are valid spellings (according to Microsoft Word) and the figure in brackets is the number of separate searches that contained the word.
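
The filtering used for the list above (keep words from failed searches, drop any that also turned up in a successful search) can be sketched in Python like this. It’s a sketch only: the function name and the assumed log format, a list of (search string, number of hits) pairs, are both made up for illustration.

```python
from collections import Counter


def unmatched_keywords(search_log):
    """Return (word, count) pairs for words only ever seen in failed searches.

    `search_log` is an iterable of (query, hit_count) pairs (assumed format).
    The count is the number of separate failed searches containing the word.
    """
    failed = Counter()
    succeeded = set()
    for query, hits in search_log:
        words = set(query.lower().split())  # one count per search, not per occurrence
        if hits == 0:
            failed.update(words)
        else:
            succeeded |= words
    return [(word, n) for word, n in failed.most_common() if word not in succeeded]
```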
Compared to the cloud, this is much more interesting…
1) many of them are simple typos — another good reason to add a spellchecker to your OPAC if you haven’t got one!
2) the fifth most common word is “renew” — are our users trying to renew their books by typing the word into the OPAC, or are they expecting the OPAC to work like a search engine and return something like “How to renew your books” as the first result?
3) the sixth most common word is “metalib” — it looks like a lot of people are trying to find help on using MetaLib in the OPAC… maybe we should create a dummy catalogue record that contains 856 links to MetaLib and our Electronic Resources Wiki?
4) “mortor” is an oddity in the list… but the entry for “pestel” near the end makes me wonder if people were searching for “mortar and pestle”?
Outside of the top 50, there are some other interesting failed keywords (with links to Wikipedia or other sites when relevant):