OPAC keyword cloud

This is crying out to be done like the visual word map in AquaBrowser, but here’s a browseable tag cloud based on data from nearly 2 million keyword searches on our OPAC.

The code looks for other keywords that were entered as part of the same search (e.g. “ethics of nursing care”) to draw out the most commonly used words. For example, the most common keyword used with “performance” is “management”. The size of the word in the cloud is determined by how often it appears with the search keyword.
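
As a rough illustration of the idea (a minimal Python sketch, not the code behind this cloud), the counting and sizing might look something like the following. The stop-word list, the sample searches, the function names and the linear pixel scale are all assumptions made up for the example.

```python
from collections import Counter

# Hypothetical stop words and sample searches, for illustration only.
STOP_WORDS = {"of", "the", "and", "in", "for", "a"}


def co_occurring_keywords(searches, keyword, stop_words=STOP_WORDS):
    """Count the words that were entered in the same search as `keyword`."""
    keyword = keyword.lower()
    counts = Counter()
    for search in searches:
        words = [w for w in search.lower().split() if w not in stop_words]
        # Only multi-keyword searches can tell us about co-occurrence.
        if keyword in words and len(words) > 1:
            # Count each co-occurring word once per search.
            counts.update(w for w in set(words) if w != keyword)
    return counts


def font_sizes(counts, min_px=12, max_px=36):
    """Scale counts linearly onto a font-size range (assumed to be pixels)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_px + (count - lo) * (max_px - min_px) / span)
        for word, count in counts.items()
    }


searches = [
    "performance management",
    "performance management",
    "performance appraisal",
    "ethics of nursing care",
    "performance",
]
counts = co_occurring_keywords(searches, "performance")
sizes = font_sizes(counts)
```

With this made-up sample, “management” gets the largest size for “performance” because it co-occurs most often. A linear scale is only one option; a logarithmic one may look better when a few words dominate.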

~~I’ve not removed keywords that generated zero search results, so the cloud for “acrobat” includes “abode”.~~ (I’ve now removed zero result searches)
I’ll have to have a play to see if there’s a way of incorporating the cloud into the OPAC. For example, if you used a vague/general keyword such as “health”, then maybe the OPAC could suggest more specific searches for “health care”, “mental health” or “health promotion”?
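
One way the suggestion idea might be sketched (again in Python, and purely hypothetical: the function name, the sample log and the ranking by raw frequency are assumptions, not anything the OPAC does) is to rank the multi-keyword searches that contain the vague keyword:

```python
from collections import Counter


def suggest_searches(search_log, keyword, limit=3):
    """Return the most frequent multi-keyword searches containing `keyword`."""
    keyword = keyword.lower()
    phrases = Counter()
    for search in search_log:
        # Normalise case and whitespace so near-identical searches are merged.
        words = search.lower().split()
        if keyword in words and len(words) > 1:
            phrases[" ".join(words)] += 1
    return [phrase for phrase, _ in phrases.most_common(limit)]


# Made-up sample log, one search per entry.
log = [
    "health care",
    "mental health",
    "health care",
    "Health  Care",
    "health",
    "health promotion",
    "mental health",
    "nursing care",
]
suggestions = suggest_searches(log, "health")
```

A real version would probably also want to drop searches that returned zero results before ranking them.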

6 thoughts on “OPAC keyword cloud”

Josh Greenberg says:

18 November 2007 at 7:25 am
Would you be willing to share the code? I’d *love* to feed this a record of our patron searches (both catalog and website), and build off of existing code rather than start from scratch.
Dave Pattern says:

18 November 2007 at 10:35 am
Hi Josh
It’s just a crappy prototype at the moment, but I’ll tidy up the code today and upload it to the blog.
Dave Pattern says:

18 November 2007 at 1:15 pm
Here’s a cleaned-up version of the code:
http://www.daveyp.com/files/stuff/keywordcloud/
The Perl script is cloud.pl and it uses a list of stop words (i.e. words to ignore) from stopwords.txt.
The list of keyword searches needs to have each search on a separate line. I’ve included a short sample file (newcache.txt) to give you the idea. You can speed up the code by removing any single keyword searches from the file (i.e. any entries where just a single word appears on a line by itself), as you’re only interested in searches where multiple keywords were used.
The main chunk of ugliness in the original code was for working out the font sizes, so I’ve removed that and replaced it with the HTML::TagCloud module (which you’ll need to install).
You can see the new code in action here.
Dave Pattern says:

18 November 2007 at 4:11 pm
I’ve added the suggestions to our OPAC — they only appear if you’ve done a single keyword search…
Pingback: "Spin, spin, spin the Wheel of Justice…" » "Self-plagiarism is style"
Pingback: 2008 — The Year of Making Your Data Work Harder » "Self-plagiarism is style"
