November 2005 – Self Plagiarism is Style

Quotes of the Month

Jenny Levine has linked to an excellent article by Roy Tennant on the Library Journal web site:

I love Roy’s statement that:

I wish I had known that the solution for needing to teach our users how to search our catalog was to create a system that didn’t need to be taught — and that we would spend years asking vendors for systems that solved our problems but did little to serve our users.

A few minutes later, I stumbled across Jennifer Matthews’ blog – she’s a student of English and Comparative Literary Studies at the University of Warwick:

So I figure that the library is evil. And it hates me.
(The Library)

Do your 856 URLs show up in a big font size that doesn’t seem to quite fit in with the rest of the text on the full bib page?
The quickest way to fix it is to fire up the Horizon table editor, select marc_map, and then locate the marc_map that you use for your 856 URLs.
In the “HTML format (Info Portal only)” field, insert class="smallAnchor" before the href. For example, if your HTML format looks like this:

<a href="$_">{<img src="$9">|$y|$_}</a>

…then change it to:

<a class="smallAnchor" href="$_">{<img src="$9">|$y|$_}</a>

Save the change, and then restart JBoss and the 856 links should pick up the formatting of the “smallAnchor” class from your HIP cascading style sheets (CSS).
And, for the more adventurous – if you’d like to know which 856 links your users are clicking on, then you can set your marc_map up to redirect to a CGI script that logs the URL and then redirects the user’s web browser to the true 856 link.
Once you’ve got your CGI script ready (in this case, I’ve called it logit.pl), you just need to change the 856 marc_map to link to the script – e.g.

<a href="http://foo.com/cgi/logit.pl?$_">{<img src="$9">|$y|$_}</a>

Once you’ve saved that and restarted JBoss, your 856 URLs look like this in HIP:

http://foo.com/cgi/logit.pl?http://www.ebooks.com/12345

Your CGI script just needs to take the contents of the QUERY_STRING environment variable (in the above, it’s http://www.ebooks.com/12345), append it to your log, and then issue a redirect to that URL.
(disclaimer: all of the above was done with Horizon 7.32 UK and HIP 3.04 – your mileage may vary depending on which versions you’ve got!)
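
For anyone who'd rather see the idea than read about it, here's a minimal sketch of what a logging redirect script might look like. It's written in Python rather than Perl, it is not the actual logit.pl, and the log path and the http/https check are my own assumptions rather than anything described above:

```python
#!/usr/bin/env python3
"""Hypothetical stand-in for logit.pl: log the requested URL, then redirect."""
import os
import sys
from datetime import datetime, timezone
from urllib.parse import urlsplit

LOG_PATH = "/var/log/hip/856-clicks.log"  # assumed location, change to suit


def build_response(query_string):
    """Return (target URL or None, raw CGI response text)."""
    target = query_string.strip()
    scheme = urlsplit(target).scheme.lower()
    # Only redirect to http(s) URLs, and refuse embedded line breaks so the
    # Location header cannot be split (an added safety check, not in the post).
    if scheme not in ("http", "https") or any(c in target for c in "\r\n"):
        return None, ("Status: 400 Bad Request\r\n"
                      "Content-Type: text/plain\r\n\r\nBad link\r\n")
    return target, "Status: 302 Found\r\nLocation: " + target + "\r\n\r\n"


def log_click(target, log_path=LOG_PATH):
    """Append a timestamped line for the clicked URL to the log file."""
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{target}\n")


if __name__ == "__main__":
    target, response = build_response(os.environ.get("QUERY_STRING", ""))
    if target is not None:
        log_click(target)
    sys.stdout.write(response)
```

The scheme check is worth keeping in some form: without it, the script would happily redirect to anything that gets pasted after the question mark.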

CODI 2005 – session links

I’ve put together a page listing each of the CODI 2005 sessions along with (hopefully!) all the PowerPoint, handout, podcast, blog, etc links.

http://www.daveyp.com/files/stuff/codi2005links.html

Please feel free to re-use the link or to circulate it.
If you have any additions or corrections, please email them to me:

d.c.pattern [at] hud.ac.uk

using “circ_tran” to show borrowing suggestions in HIP

One of the things we’re trying to do this year at Huddersfield is to make better use of our data archives:
…as each student goes through a library turnstile, data is written away…
…as each student borrows a book, more data is quietly written away…
…as each student uses an electronic resource, data is written away…
…as each student logs onto a PC, yet another piece of data is…
…okay, enough already – you get the idea!
We’re not particularly interested in what an individual student has done, but we’d like to see the broader pictures. For example, we open the Library 24/7 at certain times of the year (e.g. Easter) – we’d like to know more about the kinds of students who come in late at night and leave early the next morning:

are certain ethnic groups more likely to use the Library outside of the standard opening hours?
do we get more male or female students using the Library in the wee small hours?
are students coming in to use the computers, to issue/return items, or to sit quietly in a corner and study?

The answers to those kinds of questions tend to be found in several databases. The Sentry database tells us when someone entered the Library, but it doesn’t tell us if they are male or female, Asian or Caucasian – that kind of information is stored in the Student Records System. Also, the Sentry database doesn’t tell us what the student actually did – Circ transactions are in Horizon and PC usage info is stored in other databases.
So, long term we’re looking at ways of trying to combine data from all of those sources into meaningful and enlightening stats.
“What has this got to do with showing borrowing suggestions in HIP?”, I hear you ask!
Well, once I’d had a hunt around in our circ_tran table in Horizon, it seemed like a great use of all that historical Circ data would be to do an Amazon-like “patrons who borrowed this book also borrowed…”.
Before I proceed with the “how to”, I’ve got a hunch that not everyone has got a circ_tran table – it might be something that SirsiDynix needs to set for you, rather than a default table that ships with Horizon (can anyone confirm this?)
The circ_tran table contains (amongst other things) two very useful bits of information – the borrower# and the item# of the item they borrowed. You can use the item# to look up the bib# of that item (using the item table).
Once you’ve got the borrower# and bib#, you can use that to create two lists of data:

a list of all the bib#s that a specific borrower has ever borrowed
a list of all the borrower#s who have borrowed a specific bib#

To build the list of borrowing suggestions, you start with a bib# and:

1) build the list of all the borrower#s who have borrowed that bib#
2) for each of those borrower#s, compile all the bib#s of all the items they’ve borrowed to a single big list of bib#s
3) take that big list and count how many times each bib# appears in the list
4) sort your list of individual bib#s by the count of how many times they appear in the big list

…those bib#s that appear the most times in the big list are therefore the most appropriate ones to suggest.
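
The four steps above can be sketched roughly as follows. This is an illustration in Python, not the Perl script itself: the function name and the input format (a list of borrower#/bib# pairs, already resolved from item# via the item table) are assumptions, and I've chosen to count each borrower only once per bib# and to leave the starting bib# out of its own suggestions.

```python
from collections import Counter, defaultdict


def build_suggestions(loans, max_suggestions=10):
    """loans: iterable of (borrower#, bib#) pairs.

    Returns a dict mapping each bib# to a ranked list of suggested bib#s.
    """
    bibs_by_borrower = defaultdict(set)   # everything a borrower has borrowed
    borrowers_by_bib = defaultdict(set)   # everyone who has borrowed a bib
    for borrower, bib in loans:
        bibs_by_borrower[borrower].add(bib)
        borrowers_by_bib[bib].add(borrower)

    suggestions = {}
    for bib, borrowers in borrowers_by_bib.items():
        counts = Counter()
        # Steps 1 to 3: pool the bibs of every borrower of this bib and count them.
        for borrower in borrowers:
            counts.update(bibs_by_borrower[borrower] - {bib})
        # Step 4: most frequently co-borrowed first, ties broken by bib#.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        suggestions[bib] = [other for other, _ in ranked[:max_suggestions]]
    return suggestions


loans = [(1, 100), (1, 200), (1, 300), (2, 100), (2, 200),
         (3, 100), (3, 200), (3, 400)]
print(build_suggestions(loans)[100])  # [200, 300, 400]
```

Very popular titles will tend to float to the top of every list with this kind of raw count, so some form of weighting might be worth experimenting with.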
Unfortunately those 4 steps can take some serious CPU time, so it’s not possible to do it on the fly as each of your patrons brings up a full bib page in HIP. Therefore, you need to pre-process each of your bib#s to generate a list of other suggested bib#s.
I wrote a Perl script this evening (which I’ll make available soon) that slurps up the entire circ_tran table into your PC’s memory and then processes each of the bib#s to create up to 10 other suggested bib#s. Each of those suggestions is then pumped into a MySQL database where it will sit until a patron views that bib#’s page in HIP.
A single line of JavaScript added to the fullnonmarcbib.xsl stylesheet then pulls in dynamic content from a Perl CGI script. That CGI script simply fetches the list of suggested bib#s from the MySQL database, quickly runs them via the title table in Horizon, and then displays a random selection of them underneath the copy/holding info:

The only real drawback is that it’s not working with your circ_tran data in real time – the list of 10 possible suggestions per bib# won’t change until I run the slurping Perl script again to rebuild all of the suggestions. On our database of 2,046,180 circ_tran entries, that took about 3 hours to process. So, in theory, you could schedule it to run once a week or once a month.

Taggytastic! (part 2)

Wow! Fame and glory – hopefully the untold riches will be just around the corner! 😉
For anyone who wants to have a go with their Horizon/HIP, I’ve uploaded the script to here:
http://www.daveyp.com/files/stuff/tags/
I’ve done a little bit of tweaking, and the final keyword list now looks like this.
You’ll need to download the Perl script and the sample config.txt file.
As with many of the other scripts I’ve uploaded, you’ll need a working ODBC connection to your Horizon database – if you’re running ReportSmith or EasyAsk, then you’ll know all about that. You’ll also need to have Perl installed, along with the DBD::ODBC module from CPAN.
The config.txt file has three columns:

the first column defines the range for each keyword count, and this works with the $threshold variable to select the font size & colour for each keyword in the HTML output
the second column defines the font size – you should be able to use any valid CSS value (e.g. 50%, 10px, or x-small, etc)
the final column defines the font colour – in the example file I’ve gone for a blue gradient (#006 through #77D), but if you prefer a single colour then just change all the entries to that (e.g. #00F) – again, you should be able to use any valid CSS value (#123456, red, etc)
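
To give a rough idea of how a three-column file like that can drive the output, here's a small Python sketch. It is not lifted from the Perl script: the whitespace-separated layout, the reading of the first column as a lower bound on the keyword count, and the sample values are all assumptions made for illustration.

```python
# Hypothetical config rows: lower bound of count range, font size, colour.
SAMPLE_CONFIG = """\
0 x-small #77D
100 small #55B
500 medium #339
1000 large #006
"""


def parse_config(text):
    """Parse 'lower_bound font_size colour' rows into a sorted list of tuples."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        rows.append((int(parts[0]), parts[1], parts[2]))
    return sorted(rows)


def style_for(count, rows):
    """Return (font_size, colour) from the last row whose bound is <= count."""
    chosen = rows[0]  # assumes at least one valid row was parsed
    for row in rows:
        if row[0] <= count:
            chosen = row
    return chosen[1], chosen[2]


rows = parse_config(SAMPLE_CONFIG)
print(style_for(381, rows))   # ('small', '#55B')
print(style_for(1237, rows))  # ('large', '#006')
```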

To run the script, just put it in the same directory as the config.txt file and run it (e.g. perl getsubjects.txt). The HTML output file should get created in the same directory.
There are a few variables that you can tweak:

$minimumBibs – this is used in the initial SQL query on the subject table, so a lower value means more subjects will be included for processing, but the query might take longer to run and/or hit your Horizon server harder
$threshold – once all the subjects have been processed, any whose total number of matching bibs falls below the threshold value will be excluded from the output – if you’d prefer a smaller list of keywords in the HTML output, then choose a higher value and vice versa
$spacing – this is a string of characters to insert between each keyword
$hipUrl – unless you really want to link to our HIP, then you’ll need to tweak this URL

Have fun with the script!
Jenny emailed me to ask if the script could work with other systems (e.g. Innovative), so I’m going to have a go writing a smaller version of the script that will take a list of keywords and counts, such as the example below, and then create the same output:
1237 American poetry
381 Java
857 World Wide Web
…so, as long as you can query your system to get something in the above format, then it should work.
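
As a rough illustration of how little is needed to work from that format, here's a Python sketch that reads "count keyword" lines and turns them into a row of links. The function names, the threshold and spacing parameters (borrowed in spirit from the variables described earlier) and the example.org search URL are all placeholders of mine, not the real script or a real HIP URL.

```python
import html
from urllib.parse import quote_plus


def parse_subjects(text):
    """Parse lines of 'count keyword phrase' into a {keyword: count} dict."""
    counts = {}
    for line in text.splitlines():
        count, _, keyword = line.strip().partition(" ")
        if count.isdigit() and keyword.strip():
            counts[keyword.strip()] = int(count)
    return counts


def render(counts, threshold=0, spacing=" ",
           search_url="http://example.org/search?q="):
    """Return HTML links for every keyword at or above the threshold."""
    links = []
    for keyword in sorted(counts, key=str.lower):
        if counts[keyword] < threshold:
            continue
        links.append('<a href="%s%s" title="%d bibs">%s</a>' % (
            search_url, quote_plus(keyword), counts[keyword],
            html.escape(keyword)))
    return spacing.join(links)


sample = "1237 American poetry\n381 Java\n857 World Wide Web\n"
print(render(parse_subjects(sample), threshold=500, spacing=" | "))
```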
[one quick sandwich and Coke later]
…and here is a more general version of the script that should work with other systems:
http://www.daveyp.com/files/stuff/tags2/
As well as the Perl script, you’ll need to download the config.txt file. Also, you’ll need to create your own subjects.txt file – I’ve included a sample one so you can get a rough idea of the layout.
As before, you can do a bit of tweaking with the variables and the config.txt file to customise the final HTML output.
Horizon users who don’t want to faff around with getting the first script to use ODBC can generate their own subjects.txt by copying the output of running the following SQL statement in SQL Advantage (or similar):
select n_bibs,processed from subject where n_bibs > 50
…however, you won’t get the advantage of the way the first script collapses sub-subjects together.

Taggytastic!

Inspired by Jenny Levine’s mock up of an OPAC with keyword tags, I’ve gone a step further and used our Horizon database (the “subject” table in particular) to generate a real page based on subject keywords with more than 10 bibs:
http://www.daveyp.com/files/stuff/subjects.html
I did a bit of tweaking so that sub-subjects (is that a real word?) are collapsed into the parent subject – if you hover your mouse pointer over one of the links, then you should get a better idea of what I mean. Once I’d got the totals for each parent subject, I excluded anything with fewer than 100 bibs.
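
The collapsing step might look something like the Python sketch below. Treat it as a guess at the general shape rather than what the script does: in particular, the " -- " separator between a subject and its sub-subject is an assumption (your subject headings may be stored differently), and simply adding up the counts will double count any bib that sits under more than one sub-subject.

```python
from collections import Counter


def collapse(subject_counts, separator=" -- ", minimum=100):
    """Sum bib counts per parent subject and drop totals below the minimum.

    subject_counts: iterable of (subject heading, number of bibs) pairs.
    """
    totals = Counter()
    for subject, n_bibs in subject_counts:
        # Everything before the first separator is treated as the parent.
        parent = subject.split(separator, 1)[0].strip()
        totals[parent] += n_bibs
    return {parent: n for parent, n in totals.items() if n >= minimum}


rows = [("Java", 60), ("Java -- Programming", 50),
        ("Knitting", 40), ("Knitting -- History", 30)]
print(collapse(rows))  # {'Java': 110}
```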

CODI 2005 – Homeward Bound!

Sadly the two “spare days” after the end of CODI 2005 have flown by and tomorrow morning we’re setting off back to the UK. By the way, just in case anyone wants to know what “the sun going down on CODI 2005” looked like, here it is/was:

We spent most of yesterday in St Paul (the sibling to Minneapolis in the title “The Twin Cities”). As some of the Horizon mailing list regulars will know, we were keen to visit the Catbus exhibit at the Minnesota Children’s Museum – and just to prove we did, here’s Bryony in the front seat:

…and here’s Totoro himself:

…and there’s more pictures here!
For lunch, we walked about a mile out of St Paul to Red’s Pizza Savoy. After the cosmopolitan Minneapolis, Red’s felt like a taste of true Americana.
We even made it back in time to go and give the Loring Park squirrels their tea:

CODI 2005 – Day Three (pm)

Planning for Hardware: It Doesn’t Have to be Hard (Tim Hyde – tim.hyde@sirsidynix.com)
Tim’s presentation covered a lot of the same ground that Jolynn’s Planning for 8.0 and 4.2 did. In fact Tim’s session was really a summary of what many of us had seen throughout the 3 days. As one of the final CODI sessions it was ideal – we didn’t want any new shocks or dropping of bombshells 🙂
Tim started off by summarising Horizon 8, and listed the main new features as:

state-of-the-art uPortal
record ownership
agency modelling
support for native open SQL databases (Oracle, DB2, MS SQL)
full Unicode support
total Java/J2EE solution
e-commerce
UniMARC, MARC21, MARCXML…
LDAP
Kerberos encryption
Shibboleth
thin client (can run on Windows, Mac, and Linux)

Tim also shed some more light on the lack of Sybase in that list of databases: apparently Sybase isn’t 100% Unicode compliant so, until Sybase resolve that, SirsiDynix won’t certify it for use with Horizon 8.0.
For those of you who are thinking about running HIP 4.0 or the Horizon 8.0 application server under Windows 2003, you need to be aware that Microsoft currently limits the Java Virtual Machine to using a maximum of 2GB RAM. In other words, if you load your hardware up with 8 GB of RAM, then HIP/Horizon ain’t going to use it all!
The official hardware recommendations won’t be available until the end of Jan 2006. However, the unofficial word is that if your current hardware is recent, isn’t being stressed out by running Horizon 7.x, and (ideally) has some room for expansion (e.g. extra CPUs or extra memory), then the chances are that it will be suitable for running Horizon 8.0.
For small to medium sized libraries, you should be able to run the application and database servers on the same box, but large libraries should look to run them on separate servers. Every session I’ve been to where that has been stated, a hand has always gone up and someone has said “can you define what you mean by small, medium and large?”…

Yeah – a medium sized library is one that’s smaller than a large one, but bigger than a small one.
(paraphrasing Tim Hyde, SirsiDynix)

Finally, clustering options won’t be available until the release of Horizon 8.1 (Q2/Q3 2006).

CODI 2005 – Day Three (am) – pt 2

Tailored Just for U: uPortal Customised for Academics (Dennis Todd)
The HIP 4 admin tool is built on the 8.0 code base and will run on any desktop that can run Java.
Dennis had prepared a useful “HIP 4.1 Customisation Parameter List” document, but it wasn’t too obvious where this was going to be available to download from.
Dennis also introduced some of the new HIP 4 terminology:

targets – anything that HIP 4 can search against (e.g. Horizon database, Z39.50 targets, Digital Library, etc)
common codes – groups together similar result/search attributes from the targets so that a single author keyword search could match against authors, composers, editors, etc – whatever was closest to the concept of an “author” for that particular database or resource (MetaLib sites will already be familiar with this “lowest common denominator” idea)

A hint from Dennis – change the quick search (in the portal properties record) so that it searches against more than just the Horizon database.
Templates:

profile_user
template user – the “look & feel” that a logged in user gets
profile_guest
template guest user – the “look & feel” a non-logged in user gets

…make sure that the “system admin” box is ticked for both of the above!
When you log into HIP to make changes to those templates, firstly save the layout and then click save template user – don’t do it the other way around!!!
Remember that guest users shouldn’t have a “preferences” tab.
Anyone who is set up as a “system admin” gets a “manage channels” icon so that they can add new channels to HIP.
In HIP 4, new tabs are added in HIP – not from inside the Java HIP admin tool.
Dennis stressed the importance of getting your templates set up exactly how you want them before going live. If you decide to make changes after going live, then any patrons who had already logged in won’t get to see those new changes.
That raised the question of exactly when do you make those changes – if you’re busy upgrading from HIP 3.x to HIP 4.x, then you don’t have the luxury of saying to your patrons “Hey – the PAC’s not going to be available for a couple of weeks, because we want to play around with it first and make it all pretty for you!”.
Someone suggested that you disable logins until you’ve got the “logged in” template set up – but that would mean patrons wouldn’t be able to make requests or renew items via the PAC.
The only solution seems to be that you need to plan ahead and decide in advance what you want to appear in your HIP and how it should be laid out. Then, once you’ve finished the upgrade, cross your fingers and try and set it up as quickly as possible!

CODI 2005 – Day Three (am)

Insights into Web Reporter and NarrowCast (Eileen Kontrovitz & Brian Rawlings)

Wonderful product, but the roll-out hasn’t been the best!
(Brian Rawlings, Alpha G)

Optional components (add on services) for Web Reporter…

OLAP – used for data mining:
- report objects – include items included in the SQL but not included in the report (e.g. correct sorting by “reconst” fields such as title or call/class number)
- view filters – includes items in the SQL Query but filters the results displayed in the report
- derived metrics – create a new metric on the fly based on existing metrics on the report
- …who benefits? – sites with large databases will benefit the most, as well as people creating “what if?” reports
- consider purchasing OLAP only for the administrator

Report Services:
- Crystal Reports type interface that can draw data from multiple grids
- useful for creating letter-type output (e.g. invoice notice letters)
- …who benefits? – schools, home services, anyone wanting to create form letters
- a larger number of Report Services documents will be created in future metadata releases
- not every user needs Report Services

Narrowcast:
- pro-active, automated report delivery
- reports can be sent to email, files, printers, or SMS devices (text messaging)
- …who benefits? – everyone!
- Narrowcast users are cheaper – you may have plenty already
- savings – you can enter multiple email addresses for the same user

[Narrowcast] is the most exciting part of Web Reporter
(Brian Rawlings, Alpha G)

Brian’s general recommendations:

buy as few users as possible
buy analysts licenses rather than reporter licenses
compliance is based on the number of logins created
enable add-on services for individual users as needed

General MetaData rule: when including item attributes, look for them first in the request, circ, circ_history, burb, and burb_history folders
Narrowcast automation can…
1) save you and your staff money and time:

automate report delivery
notices (inc. pre-overdue)
newsletters
performance based alerts

2) deliver reports to a fixed group of people, or a dynamic group of people on a specified schedule
3) deliver any type of email notice:

dynamic subscription, dynamic content
hold notices, pre-overdues, overdue, billing, etc
html formatted email, text completely customisable
queries database for notice conditions, updates records after sending email
needs a few custom attributes if using MSTR 7.5.0

Narrowcast can keep a copy of emails sent, or it can write a block to the Horizon database.
Narrowcast Newsletters:

email newsletters and event calendars
keep your patrons informed of library events

Narrowcast can send performance based alerts – e.g. alert me when Day End did not run
