Thursday, November 16, 2006

Arguing against tags

I just read the short "Beneath the Metadata: Some Philosophical Problems with Folksonomy" by Elaine Peterson (D-Lib, Nov. 2006), which demonstrates that "A traditional classification scheme will consistently provide better results to information seekers [than a folksonomy]."

I hardly know where to begin, but take this idea:
"[I]f users can continuously add tags to articles, at some point it is likely that the whole system will become unusable. A folksonomic system threatens to undermine its own usefulness."
The reasoning is that, as more tags are added, the number of wrong tags will increase. More bad tags mean less usefulness, eventually sliding all the way to complete uselessness (our old friend, the map of China that is the size of China, gets a mention). But tags are deployed statistically where possible, not by the one-for-one correspondence of a card-catalog subject heading. All arguments in favor of tags and all significant efforts to find and order information with tags (e.g., Del.icio.us, LibraryThing, Flickr, CiteULike) are predicated on the heavy use of algorithms and statistics. This is a key part of the argument for tags, but Peterson's article doesn't mention it.

Imagine an argument against subject headings with a similar deficiency of key information: "LCSHs won't work because most of us live too far away to visit the Library of Congress regularly." I don't think this misunderstanding is any less basic. Once you factor in statistics you'll understand that as tag density increases, it becomes easier to spot and discount noise, not harder. If the census visited just one house in Maine, it might decide state residents were all Aleutian Islanders. As it visits more houses, the chance of coming to that conclusion swiftly vanishes.
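To make the census point concrete, here is a rough simulation. It is only a sketch under made-up assumptions: each tagger picks the "right" tag 60% of the time and otherwise picks one of five noise tags at random. The function name, the probabilities and the tag names are all hypothetical, and this is not how LibraryThing or any other site actually weights tags. It simply estimates how often a wrong tag ends up as the most-used tag on a book as the number of taggers grows.

```python
import random
from collections import Counter


def top_tag_is_wrong_rate(n_taggers, p_correct=0.6, trials=2000, seed=0):
    """Estimate how often the most-used tag on a book is a wrong one.

    Assumptions (illustrative only): each tagger independently applies the
    tag "right" with probability p_correct, and otherwise one of five
    equally likely noise tags.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wrong_tags = ["noise-%d" % i for i in range(5)]
    wrong = 0
    for _ in range(trials):
        # Tally the tags applied to one simulated book.
        counts = Counter(
            "right" if rng.random() < p_correct else rng.choice(wrong_tags)
            for _ in range(n_taggers)
        )
        top_tag, _count = counts.most_common(1)[0]
        if top_tag != "right":
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    # More taggers should mean the top tag is wrong less often.
    for n in (1, 5, 25, 100):
        print(n, top_tag_is_wrong_rate(n))
```

With a single tagger the top tag is wrong roughly as often as an individual tagger is wrong. With many taggers the noise is spread thinly across several tags while the right tag keeps piling up, so the wrong answer becomes rare.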

I need to decide how to approach this stuff. I do not have, and never will have, an MLS. This is a real disadvantage. There are also political minefields to be negotiated. When you're in a discipline you know whom you can safely argue with, and whom you can't.

I was contacted by an academic publisher today, interested to find out if I had a book in me. A proposal to discuss user-contributed metadata, particularly tags, in the library catalog did not prove interesting. I had meant to bow out anyway—I have no time!—but being refused lit a fire under me. Someone needs to write a good book on the topic. If not me, who? All I need are a dozen more plane trips without wifi. Fortunately or unfortunately, it looks like I'll get that.

PS: I'm going to see Abby (and John Blyberg) talk tomorrow at a NELINET event, "OPAC 2.0: Reinventing the Library Catalog." I'm thinking I'll tape her talk on my MacBook. I wish the iSight camera faced outward. As it is, I'll have to film myself reacting—pensive! amused! shocked! itchy!

Update: The article is similarly (if more politely) panned on Dystmesis. The blogger also wrote a paper on LibraryThing's tagging, which I'll blog soon.

19 Comments:

Anonymous Anonymous said...

> heavy use of algorithms and statistics. ... but Peterson's article doesn't mention it.

Maybe she doesn't know it??


Link to Opac 2.0 comes up with http://www.nelinet.net/travreg/invaliduser.asp , btw.

11/17/2006 2:29 AM  
Anonymous Anonymous said...

> A traditional classification scheme will consistently provide better results to information seekers

So: a book is mainly about A but also a bit about B. With traditional subject headings you have to decide to either leave B out or give it the same weight as A.

With LT's tag cloud (bigger or smaller font, depending on how much the tag was used) - I can see within seconds: 'Oh, it's mainly about A, quite a bit about B, a little bit about C.' D also gets mentioned, which is unexpected (but as the tagging comes from somebody who had time to read the book, it's additional information), and quite a few people liked it (from the 'terrific' tag). Also, somebody got it for their birthday. ;-)

11/17/2006 3:02 AM  
Anonymous Anonymous said...

Another argument in favour of tags might also be quite simply that they can be very personal: see some Amazon books tagged with tags like "present John" or "christmas tim"...
Tags which will not be a problem for the general public, given the statistics, but which will be useful to the person who chose them.
Bad tags are good for me when I use them.

11/17/2006 3:29 AM  
Anonymous Anonymous said...

(Link to Opac 2.0 now working fine - might have been on their side.)

11/17/2006 3:48 AM  
Anonymous Anonymous said...

After having read the article:

- LT allows users to combine tags (and many users are ready to put time into doing so), thus eliminating the difficulties arising from wrong or different spelling.

- Tagging is a complement to classifying, not its alternative. User-generated content should be clearly marked as such. There is a difference between following a link to 'similarly tagged' and one to 'put under the same classification / subject heading'. The classification is narrower, the tagging wider. Both have advantages and disadvantages.

- Displaying only the most used tags for a book, maybe with a link to 'see all', would hide the 'noisy' tags.

- "... inconsistency that allows a work to be both about A and not about A."
Only relevant if you don't weight the tags.

> I need to decide how to approach this stuff.

Can't you just say: yes, classification is better - in what classification wants to do. But tagging opens a pool of _additional_ information.

How much of the heat in the argument boils down to the fear of losing library funding? It does remind me of your link to crowdsourcing (http://www.wired.com/wired/archive/14.06/crowds.html).

11/17/2006 4:41 AM  
Anonymous Anonymous said...

> pensive! amused! shocked! itchy!

doodling! ;-)

11/17/2006 5:33 AM  
Blogger Robert J. said...

As someone who specializes in systematics and who has even published on the "1:1 map" (which goes back to Lewis Carroll in 1893; is the Borges citation really correct?), I guess I didn't find that essay terribly deep, either philosophically or in terms of its understanding of the technology of the moment. Is there reason to think anyone will actually pay much attention to it?

11/17/2006 8:30 AM  
Blogger Abby said...

no doodling! I'll be too engaging a speaker for Tim to have time to doodle!

11/17/2006 9:21 AM  
Anonymous Anonymous said...

:-)

11/17/2006 9:55 AM  
Anonymous Anonymous said...

I would actually love to write that book.

I know of a university library that is experimenting with user-supplied tags for their library functions, so I have a suspicion that when working in tandem, both LCSH (or any subject heading standard) and tags could benefit the searcher greatly.

11/17/2006 11:07 AM  
Anonymous Anonymous said...

Is there something in the fact that the citation for a solid foundation (i.e., mathematics) is from 1954, and so completely misses Mac Lane?

11/17/2006 11:11 AM  
Blogger Dystopos said...

Regarding your Macbook's iSight, I wonder if one of these wouldn't come in handy.

11/17/2006 1:19 PM  
Blogger Tim said...

Sunny: "Can't you just say: yes, classification is better - in what classification wants to do. But tagging opens a pool of _additional_ information."

I can get behind this, or almost. I think classification is better for some things, tags for others. It's TV and radio.

11/17/2006 1:49 PM  
Anonymous Anonymous said...

Huckleberry?

11/17/2006 2:42 PM  
Blogger Poss said...

I think I go with Nicomo on this one. My tags are personal in that they represent what the book means to me and not an alternative to classification. I doubt that I'm actually consistent myself over time. It would be pie in the sky to expect that users would agree on some sort of controlled vocabulary or terms derived from a thesaurus. Classification is for a public library that can put its copy of a book in only one place on its shelves. I sometimes have problems deciding where to put a book. Sometimes size alone settles the matter — though I've yet to use 'big book' as a LibraryThing tag.

11/17/2006 3:10 PM  
Anonymous Anonymous said...

Hmm - I think she's missed a step - categorisation of items into one or more categories using a controlled vocabulary.

Classification results in an item having only one 'place' (e.g. using Dewey to shelve books, where a book can only ever be in one place on a shelf), whereas categorisation can result in an item being placed in one or more categories (e.g. Library of Congress subject headings, which is somewhat more useful).

Folksonomy is just distributed categorisation without the use of a controlled vocabulary. My personal feeling is that folksonomies have only novelty value for finding textual material on the web but they're really useful for non-text data (eg video, images, audio - just look at how handy they are on Flickr).

Academics are so impractical :)

11/19/2006 11:10 PM  
Anonymous Anonymous said...

In my indexing class, a fellow student cited this article about "democratic indexing": Hidderley, Rob and Pauline Rafferty. "Democratic indexing: an approach to the retrieval of fiction." Information Services & Use 17.2/3 (1997): 101-110.

Basically, the authors discuss exactly what you did, Tim: that a process they call "reconciliation" will examine the various privately assigned tags and create a public index based on that. The article refers specifically to image indexing but they do also discuss the applicability of democratic indexing to any format, particularly fiction.

11/29/2006 6:14 AM  
Anonymous Anonymous said...

'Big book' as a tag? yes, I think I might need that one - how else can I find those hot-off-the-press hbs that won't fit on the shelf with all the other (pb) titles by the same author?

11/01/2007 10:26 AM  
Anonymous Anonymous said...

Peterson's article says "Wikipedia allows any person on the Internet to contribute articles to it without judgment from others." This is absolute nonsense. There are lots of rules, dispute-resolution procedures, transparency and nasty troll factions pushing POVs, all of which constitute "judgment from others". So my guess is that she doesn't know about algorithms and statistics.

Her core argument about relativism is probably correct, though. Tags let anyone impose whatever scheme on the data they want, including spam and other very questionable schemes of use only to the author or tagger, not to the reader.

Factionally-defined schemes could become a halfway point - a relational or pluralistic solution that doesn't let any individual set up a scheme that doesn't meet some basic audit criteria. Instead, groups that genuinely share a point of view on the data can collaborate to sponsor a scheme of use to their own purposes. Choosing one of these or several of these would be easy, but getting one past audit criteria would be relatively hard. You would need, for instance, an exclusion test for each category - how do I know NOT to apply this tag to this object?

With that requirement imposed, it becomes very easy to decide not only whether a given tag applies in a specific hierarchy or taxonomy but also to decide which of several meanings of the same tag is closer to what is meant.

2/16/2008 5:21 PM  
