Wednesday, February 04, 2009

Random tags for scholars

Someone asked me to come up with a page of truly random tags for an academic project that needed to assess typical tagging. It might prove interesting to other students and scholars doing projects on LibraryThing.

Here's the page.

It's an HTML page, not an XML feed or other such format. Techies will scoff, but I've been asked for a lot of data like this, particularly by MLS students. The people who can easily parse XML in a programming language are generally not the ones writing graduate school papers.


Anonymous staffordcastle said...

Tim, quite a few of the tags just say "private member" - can you add on a script to exclude private libraries from the pool you're drawing from?

2/05/2009 12:30 AM  
Blogger Tim said...

Yeah, the user wanted to know how much was being excluded. Meh. I don't think it hurts too much.

2/05/2009 12:34 AM  
Anonymous kevinashley said...

Interesting stuff. One discovers all sorts of things looking at random data. Until now, I was blissfully ignorant of 'sports sex' as a genre.

How are you making the random choices? Is it choose a random work, then choose a random tag from that work, or do you start with the tag? I ask because it can make a difference to the analysis (depending on what the students are doing with it).

I can understand you not offering it in XML given who wants it. But CSV should be accessible to most students without effort, and would make some tasks a little easier, such as sorting.
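For anyone who wants CSV anyway, here is a minimal sketch of converting an HTML table to CSV with Python's standard library. It assumes the page lays its data out as `<tr>` rows of `<td>`/`<th>` cells, which the post does not confirm; the names `TableRows` and `html_table_to_csv` are made up for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Assumption: the page is a plain HTML table. Adjust the tag names
    if the real markup differs.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_csv(html):
    """Return the table rows found in `html` as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Once the data is in CSV, sorting is a spreadsheet click away. Cells containing commas are quoted by the `csv` module, so they survive the round trip.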

2/05/2009 4:31 AM  
Blogger Xach said...

If only there were some hybrid of HTML and XML.

2/05/2009 5:05 AM  
Blogger Katya said...

How are you making the random choices? Is it choose a random work, then choose a random tag from that work, or do you start with the tag?

My assumption is that he's starting with the tags, picking one at random, and then pulling out the individual using the tag, the work being tagged, and the rest of the tag cloud assigned by that individual to that work. (Tim, am I right?)

It's very interesting because it's not just a generic tag, but a specific instance of the use of a tag, which is actually even more specific than I originally envisioned. Thanks again!
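To see why the question matters, here is a minimal sketch comparing the two approaches on made-up data. It is not how the page is actually generated (Tim has not said); the records, user names, and function names are all hypothetical. Each record is one use of a tag by one member on one work.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction

# Hypothetical data: each tuple is one use of a tag (user, work, tag).
APPLICATIONS = [
    ("user_a", "work_1", "fiction"),
    ("user_b", "work_1", "fiction"),
    ("user_c", "work_1", "fantasy"),
    ("user_a", "work_2", "history"),
]


def sample_application_first(apps, rng):
    """Pick one tag use uniformly from all tag uses."""
    return rng.choice(apps)


def sample_work_first(apps, rng):
    """Pick a work uniformly, then one tag use on that work."""
    works = sorted({work for _, work, _ in apps})
    work = rng.choice(works)
    return rng.choice([a for a in apps if a[1] == work])


def tag_probabilities_application_first(apps):
    """Exact chance of drawing each tag when sampling tag uses directly."""
    counts = Counter(tag for _, _, tag in apps)
    return {tag: Fraction(n, len(apps)) for tag, n in counts.items()}


def tag_probabilities_work_first(apps):
    """Exact chance of drawing each tag when sampling a work first."""
    by_work = defaultdict(list)
    for _, work, tag in apps:
        by_work[work].append(tag)
    probs = defaultdict(Fraction)  # Fraction() is zero
    for tags in by_work.values():
        for tag in tags:
            probs[tag] += Fraction(1, len(by_work) * len(tags))
    return dict(probs)
```

On this toy data, "history" is drawn a quarter of the time when starting from tag uses but half the time when starting from works, because the lightly tagged work gets the same weight as the heavily tagged one. Which behaviour counts as "typical tagging" depends on the analysis.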

2/05/2009 9:00 AM  
Anonymous Anonymous said...

Interesting way to find spam postings: see line 340, and the other 'books' in that user's/author's library.

2/05/2009 11:38 AM  
Blogger ladycassilis said...

Anonymous, the page is different every time, so your spammer has slipped through the system, I fear :(

2/05/2009 7:16 PM  
Blogger jm said...

Feature request: accept a parameter, minimum_uses, and only return tags that have been applied to at least that many things.

2/07/2009 3:38 AM  
Blogger Tim said...


Can't do that accurately. I'd have to work off summary tables (i.e., frozen information).
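For illustration only, a minimal sketch of what such a filter could look like if it did work off a summary table. The function name, the parameter spelling `minimum_uses`, and the dictionary standing in for the summary table are all assumptions, not LibraryThing code.

```python
def filter_by_minimum_uses(sampled_tags, summary_counts, minimum_uses):
    """Keep sampled tags whose summary-table count is at least minimum_uses.

    `summary_counts` maps tag -> number of uses as of the last time the
    summary was built, so the result can be stale (the inaccuracy Tim
    describes). Tags missing from the summary are treated as zero uses.
    """
    return [
        tag for tag in sampled_tags
        if summary_counts.get(tag, 0) >= minimum_uses
    ]
```

A tag that has crossed the threshold since the summary was last rebuilt would still be dropped, and one that has fallen below it would still be kept.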

2/07/2009 6:06 AM  
