Taggytastic! (part 2) – Self Plagiarism is Style

Wow! Fame and glory – hopefully the untold riches will be just around the corner! 😉
For anyone who wants to have a go with their Horizon/HIP, I’ve uploaded the script here:
http://www.daveyp.com/files/stuff/tags/
I’ve done a little bit of tweaking, and the final keyword list now looks like this.
You’ll need to download the Perl script and the sample config.txt file.
As with many of the other scripts I’ve uploaded, you’ll need a working ODBC connection to your Horizon database – if you’re running ReportSmith or EasyAsk, then you’ll know all about that. You’ll also need to have Perl installed, along with the DBD::ODBC module from CPAN.
The config.txt file has three columns:

the first column defines the range for each keyword count, and this works with the $threshold variable to select the font size & colour for each keyword in the HTML output
the second column defines the font size – you should be able to use any valid CSS value (e.g. 50%, 10px, or x-small, etc)
the final column defines the font colour – in the example file I’ve gone for a blue gradient (#006 through #77D), but if you prefer a single colour then just change all the entries to that (e.g. #00F) – again, you should be able to use any valid CSS value (#123456, red, etc)
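
As a rough illustration of how a three-column file like this could drive the styling, here is a minimal Python sketch (the script itself is Perl, and this is not its code). It assumes the columns are separated by whitespace, that the first column is the minimum count for each band, and that darker colours go with larger counts. The sample rows and the names `parse_config` and `style_for` are made up for the example.

```python
# Hypothetical config.txt contents: "<minimum count> <font size> <colour>".
# The real file's separator and band values may differ.
SAMPLE_CONFIG = """\
50 x-small #77D
200 small #55B
500 medium #339
1000 large #006
"""


def parse_config(text):
    """Return a list of (minimum_count, font_size, colour), sorted by count."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        minimum, size, colour = line.split()
        rows.append((int(minimum), size, colour))
    return sorted(rows)


def style_for(count, rows):
    """Pick the highest band whose minimum the count reaches.

    Counts below every band fall back to the lowest band.
    """
    chosen = rows[0]
    for row in rows:
        if count >= row[0]:
            chosen = row
    return chosen[1], chosen[2]


rows = parse_config(SAMPLE_CONFIG)
print(style_for(381, rows))
```

With these made-up bands, a keyword with 381 bibs lands in the band that starts at 200.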

To run the script, just put it in the same directory as the config.txt file and run it (e.g. perl getsubjects.txt). The HTML output file should get created in the same directory.
There are a few variables that you can tweak:

$minimumBibs – this is used in the initial SQL query on the subject table, so a lower value means more subjects will be included for processing, but the query might take longer to run and/or hit your Horizon server harder
$threshold – once all the subjects have been processed, any whose total number of matching bibs falls below the threshold value will be excluded from the output – if you’d prefer a smaller list of keywords in the HTML output, then choose a higher value and vice versa
$spacing – this is a string of characters to insert between each keyword
$hipUrl – unless you really want to link to our HIP, you’ll need to tweak this URL
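
To show how these settings might fit together, here is a small Python sketch of the filtering and joining step (again, not the Perl script's actual code). The search URL is a placeholder rather than a real HIP address, appending the keyword as a query value is an assumption about how the link is built, and `build_cloud` is a hypothetical name.

```python
from html import escape
from urllib.parse import quote_plus

threshold = 400                                   # drop keywords with fewer bibs
spacing = " &nbsp; "                              # inserted between keywords
hip_url = "http://hip.example.org/search?term="   # placeholder, not a real HIP


def build_cloud(counts, threshold, spacing, hip_url):
    """Return the HTML links for every keyword that meets the threshold."""
    links = []
    for keyword in sorted(counts):
        if counts[keyword] < threshold:
            continue  # excluded from the output
        links.append(
            '<a href="%s%s">%s</a>'
            % (hip_url, quote_plus(keyword), escape(keyword))
        )
    return spacing.join(links)


counts = {"American poetry": 1237, "Java": 381, "World Wide Web": 857}
cloud_html = build_cloud(counts, threshold, spacing, hip_url)
print(cloud_html)
```

Raising `threshold` shortens the list, as described above: at 400, "Java" (381 bibs) is left out.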

Have fun with the script!
Jenny emailed me to ask if the script could work with other systems (e.g. Innovative), so I’m going to have a go at writing a smaller version of the script that will take a list of keywords and counts, such as the example below, and then create the same output:
1237 American poetry
381 Java
857 World Wide Web
…so, as long as you can query your system to get something in the above format, then it should work.
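
Reading that format is simple enough. A minimal Python sketch, assuming one entry per line with the count first and whitespace before the keyword (`parse_subjects` is a hypothetical name, not a function from the script):

```python
SAMPLE = """\
1237 American poetry
381 Java
857 World Wide Web
"""


def parse_subjects(text):
    """Map each keyword to its count, skipping lines that do not fit."""
    counts = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # blank or malformed line
        number, keyword = parts
        # Repeated keywords have their counts added together.
        counts[keyword] = counts.get(keyword, 0) + int(number)
    return counts


subjects = parse_subjects(SAMPLE)
print(subjects)
```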
[one quick sandwich and Coke later]
…and here is a more general version of the script that should work with other systems:
http://www.daveyp.com/files/stuff/tags2/
As well as the Perl script, you’ll need to download the config.txt file. Also, you’ll need to create your own subjects.txt file – I’ve included a sample one so you can get a rough idea of the layout.
As before, you can do a bit of tweaking with the variables and the config.txt file to customise the final HTML output.
Horizon users who don’t want to faff around with getting the first script to use ODBC can generate their own subjects.txt by copying the output of running the following SQL statement in SQL Advantage (or similar):
select n_bibs,processed from subject where n_bibs > 50
…however, you won’t get the advantage of the way the first script collapses sub-subjects together.
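
For anyone who wants to approximate that collapsing on a subjects.txt of their own, one possible approach is sketched below in Python. It is not how the first script does it: it simply assumes sub-subjects are separated by " -- " and adds their counts to the top-level heading, which can double-count a bib that carries more than one sub-heading. The sample headings and counts are invented.

```python
import re


def collapse(counts):
    """Fold 'Heading -- Sub-heading' entries into their top-level heading."""
    collapsed = {}
    for heading, n_bibs in counts.items():
        # Keep only the text before the first " -- " separator.
        top = re.split(r"\s+--\s+", heading.strip(), maxsplit=1)[0]
        collapsed[top] = collapsed.get(top, 0) + n_bibs
    return collapsed


# Invented sample data for illustration only.
sample = {
    "Great Britain -- History -- 18th century": 120,
    "Great Britain -- History -- 1936-1945": 80,
    "Java": 381,
}
collapsed = collapse(sample)
print(collapsed)
```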

8 thoughts on “Taggytastic! (part 2)”

What would be really sweet is if you clicked on one of the collapsed tags, it would replace the cloud with a new cloud of just the sub-subjects. So you click on “Great Britain” and the cloud gets replaced with a new one made up of “18th Century”, “1936-1945”, and so on.

Nice idea 🙂
Even though the subjects are essentially a flat file list, it should be possible to use the punctuation (e.g. “a -- b (c)”) to build a tree hierarchy that’s 2 or 3 layers deep for browsing.

Just a quick update – I’ve had a stab at doing the hierarchy browsing, but knocking the subject headings into shape (using an automated script, rather than tidying them up manually) is taking a little longer than I originally thought it might.
As they say, “keep watching this space”!

Inspired by your subject cloud (and using some of your code) I created an LC Class cloud of items currently checked out of our library. Unlike your subject cloud, I can’t do a live connection to our ILS (Innovative), so it’s snapshot data. But it still gives us an interesting way of looking at how our collection is being used.
My circ cloud is at:
http://faculty.washington.edu/murata/circdata/
Next step is to do subject clouds for the same circ data.
Thanks for the inspiration and bits of code.

Hi Corey
Using the circ data is a great idea! I’m still trying to think of new/better ways that we can make use of our circ and circ history data at Huddersfield.
I think that (for most purposes) a snapshot is just as relevant as a “real time” cloud. For most libraries, generating their clouds automatically whilst their ILS is idle (e.g. during the early hours of the morning) would ensure that daytime performance of the system wouldn’t be affected.

I just wanted to give you an update.
I’ve got my subject clouds working and I have it set to go to a second level subject heading cloud if there are more than 50 items for the heading. I’m not sure it’s the most efficient code, but it works.
http://faculty.washington.edu/murata/circdata/

That’s cool 🙂
I’ve been trying to knock our subject keywords into shape (e.g. we have 5 separate entries for Great Britain), although I’m not quite there with a second level cloud yet…
https://library.hud.ac.uk/subjects/
https://library.hud.ac.uk/subjects/out2/index_d.html

Dave-
I think this is a terrific idea, especially the real time aspect. I agree with Casey that collapsible tags would be a great feature. Even more so, I think this application has the potential to be a very popular feature with patrons if it were applied to a “Hot Titles” list based on what was popular: the most frequently checked out titles in the last year (or 6 months, whatever). Coming from a public library, I favor the latter application. Both applications could prove useful to collection development librarians.
Tricia Brauer
