I’ve been having a few email discussions about whether it’s better to build an OPAC spellchecker on a standard dictionary of words or on an index created from the library’s actual holdings…
Standard dictionary
pros: correct spelling
cons: suggestion might not find any results, might not contain buzz/new words
Custom dictionary
pros: suggestions should find results
cons: will contain mis-spellings (e.g. “mangement”), needs regular updates, might be difficult to extract the words from ILS/LMS/OPAC
I’m beginning to think that the best of both worlds might be to start with a standard dictionary and then let your users/patrons build upon that. In other words, whenever someone carries out a successful keyword search on the OPAC, automatically add the keyword(s) they used to your dictionary so that they can appear as spelling suggestions in the future.
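A minimal sketch of that idea in Python, to make the moving parts concrete. Everything named here is hypothetical: the `LearningSpellchecker` class, the `record_search` hook and the `hit_count` argument are assumptions about how an OPAC might report a search, not part of any real ILS/LMS. Suggestions use the standard library's `difflib.get_close_matches` as a stand-in for whatever matching a real spellchecker would use.

```python
import difflib


class LearningSpellchecker:
    """Starts from a standard word list and learns from successful searches."""

    def __init__(self, base_words):
        # Seed the dictionary with the standard word list (lower-cased).
        self.words = {w.lower() for w in base_words}

    def record_search(self, query, hit_count):
        """Hypothetical hook: call this after every keyword search.

        Keywords are only learned when the search found at least one record,
        so every learned word is known to produce results.
        """
        if hit_count > 0:
            self.words.update(query.lower().split())

    def suggest(self, word, n=3):
        """Return up to n dictionary words that closely resemble `word`."""
        return difflib.get_close_matches(
            word.lower(), sorted(self.words), n=n, cutoff=0.8
        )
```

For example, after `record_search("folksonomy tagging", hit_count=12)` the word "folksonomy" becomes available as a suggestion for a later typo such as "folksonmy", whereas a search with `hit_count=0` adds nothing. Note that a mis-spelled keyword that happens to match a mis-spelled record would still be learned.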
Any comments?
3 thoughts on “Spellchecker + Network Effect = Better Spellchecker?”
“I’m beginning to think that the best of both worlds might be to start with a standard dictionary and then let your users/patrons build upon that. In other words, whenever someone carries out a successful keyword search on the OPAC, automatically add the keyword(s) they used to your dictionary so that they can appear as spelling suggestions in the future.”
I think this could be an interesting way to go. It may also provide a way into ‘healing’ a catalogue: typos in records, for example, would get surfaced by the matching typos in search query terms.
If they carried out a successful keyword search in your catalog, that means their keyword was present in some record in your catalog.
So why not just add all the words found in your catalog records to your dictionary? And skip the step of the search needing to be done first to ‘prime’ it. If it’s a word in your corpus, then it’s a valid search target, no?
Of course, the problem with this is that we all know there are _misspelled_ words in our corpora too. If there are misspelled words in the catalog, then one DOES need to search under the misspelled word to find that particular record, so it’s still a valid query target. But the obvious problem is that the spellchecker would then offer those misspellings as suggestions.
The only solution I can think of to this is to periodically spell check your entire catalog against an actual dictionary, and then (the important step we often don’t do) actually fix the misspellings found!
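A rough sketch of that periodic check in Python, assuming the record text can be exported as plain strings (which may not be possible on every system). The function names are made up for illustration, and the tokenizer only handles unaccented a-z words, which is a simplification.

```python
import re
from collections import Counter


def catalogue_word_counts(records):
    """Count how often each word appears across the exported record text."""
    counts = Counter()
    for text in records:
        # Simplification: only unaccented a-z runs are treated as words.
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


def suspect_words(records, standard_words):
    """List catalogue words missing from the standard dictionary.

    Rarest words come first, on the assumption that a typo usually
    occurs in fewer records than a genuine term.
    """
    known = {w.lower() for w in standard_words}
    counts = catalogue_word_counts(records)
    unknown = [w for w in counts if w not in known]
    return sorted(unknown, key=lambda w: (counts[w], w))
```

The output is a list for a person to review rather than an automatic fix list: a word missing from the standard dictionary may be a typo like "mangement", but it may equally be a legitimate new word or buzzword.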
Hi Jonathan
I should have given a bit of background to the initial discussions: some of the sites that want to use HIPpie (I’m thinking especially of Dynix sites) might not be able to extract a word list from their catalogue, so they would otherwise be stuck with a standard dictionary.