Apologies to everyone who was interested in this for the delay in getting it posted!
The Horizon database stores several fields (e.g. title, author, etc.) as a “processed” / “reconst” pair of columns. The “processed” column contains the text stripped of punctuation and of any leading definite or indefinite article, and the “reconst” column contains the stripped characters, so that the original text can be rebuilt from the two.
For example, the title “The great Aussie fashion : Australian fashion designers 1984-1985 /” has the following “processed” value in Horizon:
great Aussie fashion Australian fashion designers 1984 1985
The “processed” version of the title is much more suitable for sorting than the original title.
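To illustrate why the “processed” form sorts better, here is a rough Perl sketch. The approximate_processed() helper is hypothetical: it only mimics the stripping described above (leading English article and punctuation) and is not Horizon's actual algorithm. The second and third sample titles are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: approximates the "processed" form of a title.
# This is an assumption for illustration, not how Horizon really does it.
sub approximate_processed {
    my ($title) = @_;
    my $text = $title;
    $text =~ s/^\s*(?:the|an|a)\s+//i;   # drop a leading English article
    $text =~ s/[[:punct:]]+/ /g;         # replace punctuation with spaces
    $text =~ s/\s+/ /g;                  # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;             # trim both ends
    return $text;
}

# The first title is from the example above; the other two are invented.
my @titles = (
    'An introduction to cataloguing',
    'Zen and the art of report writing',
    'The great Aussie fashion : Australian fashion designers 1984-1985 /',
);

# Sort on the stripped form, but keep the original titles for display.
my @sorted = sort {
    lc( approximate_processed($a) ) cmp lc( approximate_processed($b) )
} @titles;

print "$_\n" for @sorted;
```

Sorting on the raw titles would file “An introduction...” before “The great...”, because the articles are compared. Sorting on the stripped form puts “great” ahead of “introduction”, which is the order a reader would expect.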
Browsing through the Horizon mailing list archive, I came across a set of instructions for interpreting the “reconst” value. As we generate a lot of custom HTML reports at Huddersfield, I decided to have a stab at coding the instructions in Perl:
To use it in your own Perl script, just paste the subroutine in and call it with the “processed” and “reconst” strings, e.g.:
reconstructTitle( $processed, $reconst );
I’ve only used the code for reconstructing titles so far, but it might also work with author names, call numbers, etc.
The code is definitely “beta” and I’m not sure if it handles every “reconst” command yet, but feel free to make use of it. If you can improve it, please do!