The “Harry Potter Effect”

If you look at the overall keyword cloud for HotStuff 2.0, you can see librar* bloggers like to talk about libraries, books, reading, books and libraries.
When some things are more popular than others, this gives rise to Tim Spalding’s “Harry Potter Effect” — everyone’s got the HP books on their shelves, so, if you’re not careful, they end up becoming the top recommendations/suggestions for almost any type of book.
In our case, in many of the keyword clouds, “library” and “book” keep on coming out as the largest words. Whilst this is an accurate reflection of what the blogs are talking about, it does hide some of the more interesting and relevant keywords.
In honour of Mr Spalding, and at the risk of getting sued silly by Mrs Rowling, I’ve added a bit of JavaScript to toggle between a full version of the cloud (“incrementum!”) and one that can sometimes bring out more interesting/relevant keywords (“redactum!”).*
As an example, the full keyword cloud for presentation has “library” as the largest word…

…click on “redactum!” and you get a cloud with some more interesting words such as “audiences” and “interaction”…

* apologies for the cod Latin!
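
For the curious, here is a rough Python sketch of one way a "redactum!" style view could be produced. It is an illustration only, not the JavaScript used on the site: the assumption is that the reduced cloud simply drops the few words that dominate the overall cloud. The function name `redact_cloud`, the `top_n` cut-off and the sample counts are all made up for the example.

```python
from collections import Counter

def redact_cloud(cloud_counts, overall_counts, top_n=2):
    """Return a copy of cloud_counts without the top_n most common overall words.

    Assumption: "dominant" means "among the top_n words across all blogs".
    """
    dominant = {word for word, _ in overall_counts.most_common(top_n)}
    return Counter({w: c for w, c in cloud_counts.items() if w not in dominant})

# Made-up word counts: the overall cloud, and the cloud for one keyword.
overall = Counter({"library": 900, "book": 750, "search": 120,
                   "audiences": 15, "interaction": 12})
presentation = Counter({"library": 40, "book": 22,
                        "audiences": 9, "interaction": 7})

full = presentation                               # the "incrementum!" view
redacted = redact_cloud(presentation, overall)    # the "redactum!" view

print(full.most_common(2))
print(redacted.most_common(2))
```

With these sample numbers, "library" and "book" drop out of the reduced view, so the smaller words get to be the biggest ones in the cloud.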

Another day, another bit of code…

I’ve added another bit of code to HotStuff 2.0 to try and locate blogs with similar content. In theory, the suggestions should improve as more posts are consumed (the good ol’ Network Effect) as this gives the code more data to find matches on.

For those interested in such things, the code compares the word frequencies of the blog in question with those of all the other blogs to try and locate those whose content is similar.
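
A minimal sketch of that kind of comparison, assuming cosine similarity over raw word counts (the post doesn't say which measure HotStuff actually uses, so treat this as one plausible approach). The blog names and counts are invented, and `cosine_similarity` and `similar_blogs` are hypothetical helper names.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two word-frequency vectors (dicts/Counters)."""
    shared = set(a) & set(b)
    dot = sum(a[w] * b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_blogs(target, blogs, limit=3):
    """Rank every other blog by similarity to the target blog, best first."""
    scores = [
        (name, cosine_similarity(blogs[target], freqs))
        for name, freqs in blogs.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]

# Invented word frequencies for three imaginary blogs.
blogs = {
    "blog_a": Counter({"catalogue": 5, "opac": 3, "search": 2}),
    "blog_b": Counter({"catalogue": 4, "opac": 2, "metadata": 1}),
    "blog_c": Counter({"gaming": 6, "teens": 4, "search": 1}),
}

for name, score in similar_blogs("blog_a", blogs):
    print(name, round(score, 2))
```

More posts mean fuller frequency vectors, which is why the matches should get better as more content is consumed.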

HotStuff 2.0 widgets

For anyone who’s interested, I’ve just posted a couple of HotStuff widgets: www.daveyp.com/hotstuff/widgets/
If you’ve got a blog which is listed, you can add a widget to show your current “Hot or Not” rating…

The second widget allows you to add a word cloud (based on either all words, words used by a specific blog, or for a specific word)…

Both widgets are available as either WordPress sidebar widgets or as embeddable JavaScript.

HotStuff 2.0 – new features

I’ve added a couple of new features to HotStuff 2.0 today…
1) “Top blogs” for specific words — this locates the blogs which contain the highest ratio of posts containing that word (matching on the common word stem). For example, currently The Kept-Up Academic Librarian is the top blog for universities and Phil Bradley is top for searching.
2) “Hot or not” score for each blog — using a top secret formula (which I might patent as “BiblioBlogRank”!), for each day’s blog posts, points are added or subtracted to the overall score for that blog. Points are gained for using words which have seen a recent increase in usage, but are lost for using words that are declining in usage. For reasons that even I’m not too sure about, Slaw is today’s hottest blog and TangognaT is the least!
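
Both features can be sketched in a few lines of Python. This is a guess at the general shape, not the real (top secret!) formula: the sketch assumes one point up or down per trending word, and `crude_stem` is a deliberately naive stand-in for a proper stemmer. All names and sample data here are hypothetical.

```python
import re

def crude_stem(word):
    """Very naive suffix stripping; a stand-in for a real stemming algorithm."""
    word = word.lower()
    for suffix in ("ies", "ing", "es", "y", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def post_stems(text):
    """Set of word stems appearing in one post."""
    return {crude_stem(w) for w in re.findall(r"[a-z]+", text.lower())}

def top_blogs(posts_by_blog, word):
    """Rank blogs by the ratio of their posts that contain the word's stem."""
    target = crude_stem(word)
    ratios = {}
    for blog, posts in posts_by_blog.items():
        if posts:
            hits = sum(1 for post in posts if target in post_stems(post))
            ratios[blog] = hits / len(posts)
    return sorted(ratios.items(), key=lambda pair: pair[1], reverse=True)

def hot_or_not_delta(post, trend):
    """Assumed scoring: +1 per rising word in the post, -1 per declining word.

    trend maps a stem to its recent change in usage (positive = rising).
    """
    delta = 0
    for stem in post_stems(post):
        change = trend.get(stem, 0)
        if change > 0:
            delta += 1
        elif change < 0:
            delta -= 1
    return delta

# Invented posts and usage trends.
posts_by_blog = {
    "blog_a": ["Universities are changing", "A university library tour", "Lunch"],
    "blog_b": ["Searching the catalogue", "University funding"],
}
trend = {"universit": 4, "catalogue": -2}

print(top_blogs(posts_by_blog, "universities"))
print(hot_or_not_delta("Searching the catalogue", trend))
```

Using a ratio rather than a raw count stops the most prolific blogs from winning every word, and a blog's running score would just be the sum of the daily deltas for its posts.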

HotStuff 2.0 – live and kicking

As promised/threatened just before Christmas, the new version of HotStuff is now up and running: www.daveyp.com/hotstuff/
It’s still early days, so it’ll be a week or two before it really starts to pick up on the hot new topics in the biblioblogosphere. So far, it’s sucked in just under 1,000 blog posts and found nearly 17,000 unique words.
Each day, it’ll create a new Word of the Day blog post using a word that’s seen a sizeable increase in usage in the previous few days. Today’s word was “skills”.
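
One simple way to pick such a word, sketched in Python under some assumptions: compare each word's average daily usage over the last few days with its average over the days before that, and take the biggest riser. The window size, the `min_recent` threshold, the add-one smoothing and the sample data are all invented for the example, and `word_of_the_day` is a hypothetical name.

```python
from collections import Counter

def word_of_the_day(daily_counts, recent_days=3, min_recent=5):
    """Pick the word whose daily usage rose most in the last recent_days.

    daily_counts is a list of Counters, oldest day first.
    """
    recent_slice = daily_counts[-recent_days:]
    earlier_slice = daily_counts[:-recent_days]

    recent, earlier = Counter(), Counter()
    for counts in recent_slice:
        recent.update(counts)
    for counts in earlier_slice:
        earlier.update(counts)

    best_word, best_ratio = None, 0.0
    for word, count in recent.items():
        if count < min_recent:
            continue  # ignore words that are still too rare to be interesting
        recent_rate = count / len(recent_slice)
        # Add-one smoothing so brand new words don't divide by zero.
        earlier_rate = (earlier[word] + 1) / max(len(earlier_slice), 1)
        ratio = recent_rate / earlier_rate
        if ratio > best_ratio:
            best_word, best_ratio = word, ratio
    return best_word

# Six invented days of counts: "library" is steady, "skills" takes off.
days = [Counter({"library": 10, "skills": 1}) for _ in range(3)]
days += [Counter({"library": 10, "skills": 6}) for _ in range(3)]

print(word_of_the_day(days))
```

Comparing rates rather than totals is what stops "library" winning every single day.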
You can also search for specific words (e.g. Dewey, LCSH or cool) or view keyword clouds for specific blogs (e.g. “Walt at Random” or “Tame the Web”). There’s also a keyword cloud that pulls everything together to show the most frequently used words from all the blogs.
Once again — if you’d like your RSS/Atom feed adding, just leave a comment (same goes for if you’d like your feed removing!). You can see a list of the current feeds on Bloglines: www.bloglines.com/public/liblogs

HotStuff 2.0

After killing off Hot Stuff due to a server upgrade, I find that I’m kinda missing it!
So, I’ve decided to have a second stab at the problem and this time the code is much cleaner and faster. In particular, I’m using Bloglines to handle fetching all of the feeds and then grabbing the new posts via the Bloglines API.
It’s too early for the code to start spotting new keywords and topics yet, so it’ll be early in the new year before it launches fully. In the meantime, feel free to check that your favourite library/librarian blogs are included in the list of sites I’m pulling content from: http://www.bloglines.com/public/liblogs.
Please post a comment with the URL of any blogs you’d like including!
I’m hoping to make the new code a little more visual, so expect to see things like these…
[images: final6_50_1, final_015]
[edit] HotStuff 2.0 is gradually appearing here: http://www.daveyp.com/hotstuff/

spam in the hot topics

Apologies for the spam words that are currently appearing in the hot topics cloud.
It looks like the BlogJunction blog has been hacked: if you view the page source for the blog, you’ll find multiple hidden links to gambling sites (the links are currently being hosted by Universitat Oberta de Catalunya, UOC).
I’ve removed BlogJunction from the list of sites used for the cloud, so the spam should disappear in the next 48 hours.