Many thanks to those of you who’ve tested the code from yesterday! Those of you outside of the UK might want to see if this version works slightly faster for you:
hippie_spellcheck_v0.02.txt
The next thing I’ll be looking at is how to optimise the spellchecker dictionary for each library. Some of you will already have read this in the email I sent out this morning or in the comment I left previously, but I’m thinking of attacking it this way:
1) Start off with a standard word list (e.g. the 1000 most commonly used English words) to create the spellcheck dictionary for your library, as the vast majority should match something on your catalogue.
2) Add some extra code to your HIP so that all successful keyword searches get logged. Those keywords can then be added to your dictionary.
Starting with an empty dictionary might even prove more effective (i.e. skip step 1): just let the “network effect” of your users searching your OPAC generate the dictionary from scratch (how “2.0” is that?!)
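As a rough illustration of the two steps above, here is a minimal Python sketch that merges an optional starter word list with the words taken from successful keyword searches. The function names and the tokenising rule are assumptions made for this example; they are not taken from the HIPpie code.

```python
import re


def tokenize(search):
    """Split a keyword search into lowercase words (assumed tokenising rule)."""
    return re.findall(r"[a-z0-9']+", search.lower())


def build_dictionary(seed_words, successful_searches):
    """Combine a starter word list with words from successful searches.

    Pass an empty seed list to let the searches build the dictionary from scratch.
    """
    dictionary = {word.lower() for word in seed_words}
    for search in successful_searches:
        dictionary.update(tokenize(search))
    return dictionary


# Hypothetical sample data, for illustration only
seed = ["the", "history", "library"]
searches = ["Medieval History", "tolkien hobbit"]
print(sorted(build_dictionary(seed, searches)))
```

Calling `build_dictionary([], searches)` corresponds to skipping step 1 and relying on user searches alone.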
To avoid any privacy issues, the code for capturing the successful keywords could be hosted locally on your own web server (I should be able to knock up suitable Perl and PHP scripts for you to use). Then, periodically, you’d upload your keyword list to HIPpie so that it can add the words to your spellchecker dictionary.
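The locally hosted capture step might look something like the sketch below. It is written in Python purely for illustration (the scripts mentioned above would be Perl or PHP), and the log format, function names and `hit_count` argument are all assumptions. Only the keywords are stored, with no IP address, session ID or timestamp, and the upload to HIPpie is left out because its format has not been described.

```python
from pathlib import Path


def log_successful_search(log_path, keywords, hit_count):
    """Append the keywords to a local log, but only if the search found something."""
    if hit_count <= 0:
        return False
    with Path(log_path).open("a", encoding="utf-8") as log:
        log.write(keywords.strip().lower() + "\n")
    return True


def export_keyword_list(log_path):
    """Return the unique logged words, sorted, ready for a periodic upload."""
    path = Path(log_path)
    if not path.exists():
        return []
    words = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        words.update(line.split())
    return sorted(words)
```

Logging only searches with at least one hit means misspellings that return nothing never reach the dictionary.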
What if you don’t have SirsiDynix HIP? Well, as mentioned previously, the spellchecker has been implemented as a web service (more info here), and the HIP spellchecker uses that web service to get a suggestion. At the moment it only returns text or XML, but I’m planning to add JSON as an option soon. Also, if you have a look at the HIP stylesheet changes, you can see the general flow of the code:
1) insert a div with an id of “hippie_spellchecker” into the HTML
2) make a call to “https://library.hud.ac.uk/hippie_perl/spellchecker2.pl” with your library ID (currently “demo”) and the search term(s) as the parameters
3) the call to “spellchecker2.pl” returns JavaScript to update the div from step 1
4) clicking on the spelling suggestion triggers the “hippie_search” JavaScript function which is responsible for creating a search URL suitable for the OPAC (which might include things like a session ID or an index to search)
None of the 4 steps above is tied specifically to SirsiDynix HIP, and they should be transferable to other OPACs. I’ve put together a small sample HTML page that does nothing apart from pull in a suggestion using those 4 steps:
example001.html
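For anyone adapting the flow to another OPAC, steps 2 and 4 come down to building two URLs. The Python sketch below shows the idea only. The query parameter names (`library`, `term`, `session`, `index`) and the OPAC base URL are hypothetical, since the real names are not given here, so check the stylesheet changes for the actual ones.

```python
from urllib.parse import urlencode

SPELLCHECKER_URL = "https://library.hud.ac.uk/hippie_perl/spellchecker2.pl"


def spellchecker_request_url(library_id, search_terms):
    """Step 2: build the call to the spellchecker script.

    "library" and "term" are assumed parameter names, not the documented ones.
    """
    query = urlencode({"library": library_id, "term": search_terms})
    return f"{SPELLCHECKER_URL}?{query}"


def opac_search_url(base_url, suggestion, session_id=None, index=None):
    """Step 4: build a search URL for the OPAC from the chosen suggestion.

    Plays the role of the "hippie_search" function; every parameter name
    here is hypothetical and will differ from one OPAC to another.
    """
    params = {"term": suggestion}
    if session_id:
        params["session"] = session_id
    if index:
        params["index"] = index
    return f"{base_url}?{urlencode(params)}"


print(spellchecker_request_url("demo", "hary poter"))
print(opac_search_url("https://opac.example.org/search", "harry potter",
                      session_id="abc123", index="keyword"))
```

Keeping the OPAC-specific details (session ID, index) inside one function means only that function needs rewriting for a different catalogue.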
If you do want to have a go with your own OPAC, please let me know — at some point I’ll need people to register their libraries so that each can have their own dictionary, and I might start limiting the number of requests that any single IP address can make using the “demo” account. Also, it would be good to build up a collection of working implementations for different OPACs.
11 thoughts on “HIPpie — how to build a dictionary”
Dave, just a note to let you know we set this up this afternoon and it seems to be working very well. Thanks!
http://staff.lanepl.org/?q=node/515
Is there any library that uses your scripts with PICA catalogues? I wonder if and how it works.
Hi Chip — that’s really cool! As soon as the scripts are ready for building custom dictionaries, I’ll let you know.
I’m not aware of any PICA catalogues using the script yet, but please feel free to try and figure out a way of making it work 🙂
Dave,
This is very, very nice! I’ve added it to our development box and, with Admin’s blessing, plan to have it available to our patrons. You are a gem.
Just FYI, got an email about this today:
http://www.jaunter.com/
(We’d looked at them ages ago.)
Thanks Chip — I’ve never heard of Jaunter before! It looks like they’re storing details of all successful searches.
dave, i’ve got this running on our test server calling your perl webservice, and it looks fine, but i was interested in implementing it with our particular set of keywords … any update on this post in terms of local dictionary use, or local implementation for those of us with in-house resources? thanks! — lare mischo
Hi Dave,
The link to hippie_spellcheck_v0.02.txt is broken – is this code still available?
Thanks
John
Hi John
Thanks for the “heads up”! Just moved the site to a new server, so it looks like I forgot to copy across the “hippie” stuff. I’ll see if I can find it again.
Dave –
I was so pleased to find your HIPpie spellcheck after a frustrating year working with Jaunter trying to implement their spellcheck piece. I hope you are able to get this back up & running. I would also be interested in more information about putting in a local dictionary for this. I was not successful in following the blog posts on this from last year.
Thanks.
Judy
Dave,
We are looking at implementing your spell check in our Horizon HIP (roundrocklibrary.org). Will I be able to get what I need from the blog posts? I haven’t read through all of them yet.