HIPpie “Did you mean?” ready for testing (again!)

A thousand apologies to those of you who’ve been waiting for HIPpie to reach the testing phase. Has it really been 6 months since I last posted anything?! HIPpie is a project that I work on in my spare time and, unfortunately, since Christmas, my spare time has been taken up with everything but working on HIPpie!
Anyway, having realised that it’s so long since I posted anything, I was shamed into making some time and I’m now at the stage where some brave HIP 3.x library can alpha test the spellchecker code. Ideally you want to be doing this on a test HIP 3.x server, unless you’re feeling particularly reckless.
The usual caveats apply: make sure you safely back up any files you edit, and promise not to hold me responsible if your server room mysteriously burns down shortly after you add the code. Also, altering your XSL stylesheets may have an impact on what support SirsiDynix will be able to give you.
To test the code, you’ll need to edit the searchinput.xsl stylesheet. Once you’ve found the file, make a safe backup before you make the changes! Open up the file and scroll down to around line 580, where you should see a <center> tag. After that tag, you need to insert (i.e. copy & paste) the contents of this file:
hippie_spellcheck_v0.01.txt
Save the altered file and give your HIP server a minute to pick up the altered stylesheet. Now fire up a web browser and run a search for a misspelled word. If you get an error message, then double-check the changes you made to the stylesheet and, if all else fails, you can revert to your backed-up version. Touch wood, you should get a “did you mean” suggestion which looks like this:
[Image: hippie_v001]
If you do test the code, please feed back!
Notes
This test version of the code is using a fairly small American dictionary of words, so you may not get appropriate suggestions for your locale.

19 thoughts on “HIPpie “Did you mean?” ready for testing (again!)”

  1. Just a few more notes…
    1) The bit you add to the stylesheet includes a JavaScript variable for the default HIP index you want to search (var hippie_index = '.GW';). If your default/general index has a different code, then change the value accordingly (e.g. yours might be GW rather than .GW). Unfortunately, I couldn’t spot a way of determining the default/general index “on the fly”.
    2) In the event that the user searches for several correctly spelled terms that simply don’t match anything on your catalogue, the “did you mean” probably isn’t going to be too helpful to them as it’ll take a “second best” guess at the words. So, a search for “hypodermic syringe” might suggest “did you mean hypothermic springe?”. In those kinds of cases, do you think it would be better not to return anything? It’s partly for those scenarios that I added serendipity keywords to our OPAC.
    3) The next step will be to figure out the best way of optimising the spellchecker dictionary to your own library. I’m leaning towards letting your users do the hard work for you by logging every successful keyword search and then using those keywords to create and refine the dictionary. To do that, it would mean I’d be harvesting keyword searches from your users. Would that create any privacy issues for your library? The only information I’d need to collect is the keywords used and the number of results returned. In return, I’d also be able to generate things like breakdown stats, the keyword cloud we have on the front page of our OPAC, and also the “Search for X combined with A, B, or C” that you can see on searches like this.
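
One possible answer to the question in note 2 is to stay quiet whenever every search term is already a known word. The sketch below is only an illustration of that idea, not the HIPpie code: the function name `should_suggest` and the tiny dictionary are hypothetical, and it assumes terms are compared in lowercase.

```python
def should_suggest(query, dictionary):
    """Return True only if at least one search term is missing from the dictionary."""
    terms = query.lower().split()
    return any(term not in dictionary for term in terms)


# Hypothetical three-word dictionary, for illustration only.
dictionary = {"hypodermic", "syringe", "library"}

print(should_suggest("hypodermic syringe", dictionary))  # False: both words are known
print(should_suggest("hypodermic syrnge", dictionary))   # True: "syrnge" is unknown
```

With a check like this, a zero-hit search for correctly spelled words would simply show no suggestion instead of a “second best” guess.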

  2. Hi Dave,
    I just put this on our system and it seems to be working. I have now removed it as I wasn’t sure what implications there were elsewhere in terms of support – and I hadn’t read your further info at that point. I think this is really good and I am sure we would be interested in helping with development.

  3. Hi Tim
    Do you mean support from me, or the impact on the support that SD might offer?
    If it’s the former, then I’m happy for you to leave it live if you want to. The only initial issue is that the demo dictionary is a US word list, so you might get a few Americanisms creeping in.
    The next stage of testing is to figure out the best way of customising the dictionary for each library that wants to use the code.

  4. I’ve just sent an email out to those of you who’d previously expressed an interest in HIPpie. If you’d like to be included in future emails, just let me know (d.c.pattern@hud.ac.uk).

  5. Thanks Talin!
    Apologies — the blog’s been up and down all day. I think I’ve traced it back to a problem with the MySQL database and hopefully it’s now been fixed.

  6. That’s cool Aaron! The word list I’m using for testing does have a few surnames and placenames, although I’ve noticed it’s missing some common words (e.g. warming).
    By the way, if anyone who’s testing would like to send me an alternative word list, just get in touch (d.c.pattern@hud.ac.uk). The list just needs to be a text file with each word either on a new line or separated by whitespace.
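
For anyone preparing a list, the format described above (one word per line, or words separated by any whitespace) can be read with a single split. This is a minimal sketch under assumptions: `load_word_list` is a hypothetical helper, not part of HIPpie, and lowercasing the words is my own choice for illustration.

```python
def load_word_list(text):
    """Split a word list on any whitespace (spaces, tabs, newlines) into a set of lowercase words."""
    return {word.lower() for word in text.split()}


# Sample list mixing newlines, double spaces and a tab.
sample = "warming\nlibrary  catalogue\tHuddersfield\n"
words = load_word_list(sample)

print(sorted(words))  # ['catalogue', 'huddersfield', 'library', 'warming']
```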

  7. Hello Dave:
    This is awesome! Just put it on our test server now and we love it. Thanks for the wonderful work. Please include me in your emails about HIPpie in the future.
    Have one question though. Maybe it’s not going to be easy to do. Right now the “did you mean” search will default back to the default general keyword search (.GW). Is it possible at all for your script to detect whichever search was used initially (title keyword, author keyword…etc.) and just have the “did you mean” search use that? Just curious if this is possible.
    BTW I checked with my boss and it’s fine for you to harvest the keywords used in our HIP. We’d like to help in any way we can!
    Jie@Loudoun County Public Library, VA

  8. It’s definitely possible. Using the general keyword gives you the best chance of the spelling suggestion matching something, but if you’d prefer to keep the same index in use, then replace that “.GW” line with:
    var hippie_index = "<xsl:value-of select="/searchresponse/yoursearch/searchdata/search/shortcut"/>";
    (if your browser has broken that up, it needs to be added as a single line)

  9. Hello Dave, I tried the suggestion and it works perfectly! I am also running on spellcheck 0.02 now. Here it is on our test HIP site. http://10.3.9.24/ipac20/ipac.jsp?&profile=train#focus
    I am concerned that our web server will not be able to host our own dictionary. The county has our webserver run on dot net so sadly it cannot run PHP or Perl. It looks like we have to depend on the script hosted by you.
    Thanks again,
    Jie

  10. Hi Jie
    No problem at all.
    I’ll put together the Perl and PHP scripts, which should be fairly simple as all they do is dump the keywords to a text file, and it may be that someone can come up with an ASP version quite easily.
    Failing that, and if you’re happy with the privacy issues, we can capture the keywords to create the dictionary at Huddersfield.
    If neither of those options is viable, then we can look at other ways of generating a word list from your ILS.
    Once all the code is a little more complete and we move out of the testing phase, I’ll make all the code available under an Open Source license (so someone might be able to host a version of HIPpie in the US).
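
To give a rough idea of how little a keyword-dumping script needs to do, here is a sketch in Python rather than the Perl or PHP mentioned above. Everything in it is an assumption made for illustration: the function names, the tab-separated line format, and the rule that only searches returning at least one result feed the dictionary. It records nothing beyond the keywords and the result count.

```python
from collections import Counter


def log_search(log_path, keywords, result_count):
    """Append one search to a tab-separated text file: keywords, then result count."""
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{keywords}\t{result_count}\n")


def build_dictionary(log_path):
    """Count the words from every logged search that returned at least one result."""
    counts = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            # Split on the last tab so the result count is always the final field.
            keywords, _, result_count = line.rstrip("\n").rpartition("\t")
            if result_count.isdigit() and int(result_count) > 0:
                counts.update(keywords.lower().split())
    return counts
```

Calling `log_search("searches.txt", "global warming", 42)` after each search and `build_dictionary("searches.txt")` later would give word counts that could be used to decide which words belong in a library's own dictionary; the file name here is just an example.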

  11. Hi Dave,
    Belated response to your “support” question (Comment 4). I was thinking more in terms of the support I give to colleagues who give support to users seeing this for the first time. I thought it would only be polite to warn them that something was on its way! Not very “permanent Beta” of me but makes for a quieter life 😉
    Tim

  12. Hi Dave,
    I just added this to both of our HIP servers. We are a consortium of 49 public libraries and 31 school libraries and this spell suggestion is awesome. Our members have been asking for something like this since we migrated back in 2004.
    Will this work with the Kid’s HIP? If so would it be added to the kidsipac_searchinput.xsl around the same lines or will it need to go elsewhere?
    Thanks,
    Melissa

  13. Hi Melissa
    I’m not familiar with the Kid’s HIP, but try giving it a go anyway. The line numbers might not be the same, so you may need to experiment a little.
    Dave

  14. Hi Dave,
    I was able to get it to work in the Kids HIP. In the kidsipac_searchinput.xsl your code needs to be added around line 942.
    Feel free to capture our keywords to build the dictionary. If you need anything from us to gather the keywords, please let me know.
    Thanks,
    Melissa

  15. Hi Dave,
    I have just found time to put the script on to our (live) catalogue and it seems to be working.
    I was amused to see that if you look for the word “libary” it suggests “libra”.

  16. Howdy Dave,
    I just added this to our catalog and it’s working! I haven’t done much in the way of customizing our HIP, but this was so easy! Thank you so much!
    Beth
