Sunday, May 14, 2006

Tagging meets Subject Headings

Tim has just announced a new LibraryThing feature - the addition of subjects. Now you can look at a book and see both the user-created tags and the librarian-assigned subject headings. This puts us in the middle of the age-old debate: tags or subject headings? Folksonomies or taxonomies? OK, maybe the question isn’t quite that old, but it’s certainly debated. Subject analysis is a fuzzy discipline - decisions on "aboutness" are hard. But is it necessarily a question of one over the other? Can they work together at all?

Tags are touted as one of the great new things coming out of Web 2.0. People organize their information using their own vocabulary, deciding for themselves what their books are “about” and what words they will use to classify them. Tags can also be incoherent, unsystematic, and haphazard. Some tags, like fiction or unread, are more useful to the user who applied them than to other people. (The unreadable tag, which I just discovered, on the other hand, is fascinating!)

There are certainly cases where tags work well. Take Armistead Maupin’s Tales of the City, for example. The top tags include queer and gay fiction, whereas the subject headings are City and town life > Fiction, Humorous stories and San Francisco (Calif.) > Fiction. Someone looking for Tales of the City is unlikely to start their search under City and town life > Fiction (San Francisco, however, might prove a good access point, which is also highlighted in the tags).*

Subject headings, on the other hand, use controlled vocabularies to show hierarchical relationships. They are assigned by professionals; they are vast, structured, and consistent; and they organize books into conceptual categories.

Subject headings work great for browsing a subject area because of their hierarchical structure. Under the tag civil war is a haphazard collection of books. The subject page for United States > History > Civil War, 1861-1865, on the other hand, lists its subdivisions, letting you browse in a more informed way. Interested in the fiction? Historiography? Women in the Civil War?

There are far more subject headings than tags, and using them well is a balancing act of precision. When LCSH terms are too specific, they pull up only a few books; when they are too general, thousands appear. Check out the subject heading Married People > Drama, which brings up four books in LibraryThing, including two Shakespeare works, but strangely, not Macbeth.

The ordered structure of subject headings gives added meaning. History > Philosophy is very different from Philosophy > History - a distinction that isn't necessarily apparent when searching history or philosophy separately as tags.
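To make the browsing and ordering points concrete, here is a minimal Python sketch, assuming each heading is stored as an ordered tuple of its subdivisions. The book titles and the Civil War subdivisions below are hypothetical sample data, not real LCSH assignments, and `subdivisions` and `books_with_heading` are illustrative helpers, not anything LibraryThing actually exposes. Listing the next level under a prefix is roughly what a subject page's list of subdivisions amounts to, and comparing ordered tuples (rather than sets) is what keeps History > Philosophy distinct from Philosophy > History.

```python
from collections import defaultdict

# Hypothetical sample data (not real LCSH assignments): each book maps to a
# list of subject headings, each an ordered tuple of subdivisions, so
# "History > Philosophy" becomes ("History", "Philosophy").
CIVIL_WAR = ("United States", "History", "Civil War, 1861-1865")
BOOKS = {
    "Book A": [CIVIL_WAR + ("Fiction",)],
    "Book B": [CIVIL_WAR + ("Historiography",)],
    "Book C": [CIVIL_WAR + ("Women",), CIVIL_WAR + ("Fiction",)],
    "Book D": [("History", "Philosophy")],
    "Book E": [("Philosophy", "History")],
}


def subdivisions(books, prefix):
    """Map each next-level subdivision under `prefix` to the number of
    distinct books filed under it (what a subject page could list)."""
    titles = defaultdict(set)
    n = len(prefix)
    for title, headings in books.items():
        for heading in headings:
            # Only headings that extend the prefix have a next level.
            if len(heading) > n and heading[:n] == prefix:
                titles[heading[n]].add(title)
    return {sub: len(ts) for sub, ts in sorted(titles.items())}


def books_with_heading(books, heading):
    """Books carrying exactly this heading; tuple order matters."""
    return sorted(t for t, headings in books.items() if heading in headings)


print(subdivisions(BOOKS, CIVIL_WAR))
# {'Fiction': 2, 'Historiography': 1, 'Women': 1}
print(books_with_heading(BOOKS, ("History", "Philosophy")))  # ['Book D']
print(books_with_heading(BOOKS, ("Philosophy", "History")))  # ['Book E']

# Flat tags behave like unordered sets, so the distinction disappears:
print({"History", "Philosophy"} == {"Philosophy", "History"})  # True
```

The last line shows why flat tags lose the distinction: a set of tags has no order, so the two combinations collapse into one.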

Another example: the top two books under the tag dystopia are 1984 and Brave New World. Interestingly enough, the subject Dystopias gives exactly the same top two books. This is also a good demonstration of the binary nature of subjects: a book either does or does not belong to a subject. According to LC, The Time Machine is a dystopia. A tag, by contrast, can effectively say The Time Machine is "sort of" a dystopia.
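The contrast between a yes/no subject and a "sort of" tag can be sketched in a few lines of Python. This assumes, purely for illustration, that a tag's strength is the share of a book's taggers who applied it; that is not necessarily how LibraryThing computes tag weight, and the users and tags below are made-up sample data.

```python
# Hypothetical sample data: the tags each (anonymous) user applied to a book,
# plus the subject headings assigned to it.
TAGGINGS = {
    "1984": {"u1": {"dystopia", "fiction"}, "u2": {"dystopia"}, "u3": {"dystopia", "classic"}},
    "The Time Machine": {"u1": {"science fiction"}, "u2": {"dystopia", "time travel"}, "u3": {"time travel"}},
}
SUBJECTS = {"1984": {"Dystopias"}, "The Time Machine": {"Dystopias"}}


def has_subject(book, subject):
    """Subject assignment is binary: the heading is either there or not."""
    return subject in SUBJECTS.get(book, set())


def tag_strength(book, tag):
    """Share of the book's taggers who used `tag`, from 0.0 to 1.0.
    An assumed, illustrative measure, not LibraryThing's actual formula."""
    users = TAGGINGS.get(book, {})
    if not users:
        return 0.0
    return sum(tag in tags for tags in users.values()) / len(users)


for book in TAGGINGS:
    print(book, has_subject(book, "Dystopias"), round(tag_strength(book, "dystopia"), 2))
# 1984 True 1.0
# The Time Machine True 0.33
```

Both books get the same yes/no answer from the subject heading, while the tag data says 1984 is squarely a dystopia and The Time Machine only "sort of" is.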

And still, there are times when tags and subjects seem to mesh. Check out Jamie O'Neill's At Swim Two Boys: the tags and subject headings are pretty complementary.

This comparing and contrasting is getting addictive, but I'll stop. The data's there - go try it yourself!

*[We owe the idea of looking at "gay" and "queer" tags to Clay Shirky's seminal talk/essay "Ontology is Overrated." The phrasing of a low tag score saying something is "sort of" something is David Weinberger adapting Joshua Schachter (source). — Tim]

16 Comments:

Blogger Steve Lawson said...

As a librarian, I'm all for Library of Congress Subject Headings plus user-generated tags. The tags can be more meaningful for individuals and for certain user communities (imagine a college library catalog where students have been tagging books with course numbers when they use them for assignments--actually, it would probably bring up a whole new set of problems, but they would be interesting problems).

If you forgive the self-link, you might enjoy the recent fiction subject headings quiz at my blog, See Also. Can you identify novels just from the (sometimes odd) Library of Congress Subject Headings? Maybe Tim could whip up a "guess that book" from its top five tags?

5/15/2006 12:16 AM  
Blogger Robert J. said...

Some stray analytical thoughts that have been accumulating for a while:

LibraryThing's tags, as they are used in practice by people, fall into several groups, and I think it's important to keep these separate in discussion because they perform different functions:

1. Personal call number tags: these are the location notes you sometimes see, like "in the bedroom," or "home" vs. "office"; conceptually, a book may be given only one of these by a user because the physical object is only in one place, just as with library-assigned call numbers.

2. Personal flags: these are the "unread" vs. "read" markers you see, or "on loan," or one of mine, working copy, used for mass-market paperbacks or damaged copies that I would like to be able to replace someday. (I took the tag's name from one of the great condition-descriptions in bibliography, used by the venerable natural history dealer Wheldon & Wesley: "Working copy: useful for information but not a pleasure to possess.") These personal flags are "status markers" that could be assigned to any book regardless of its subject matter, which is what distinguishes them from...

3. Personal subject tags: these are the personal (currently uncontrolled) counterpart of the controlled subject headings of LC or any other cataloging agency. Many LT users have elaborate systems of personal subject tagging; I haven't used very many, but have put a few in where the existing LC headings aren't as effective as I would like, such as numismatics and residential colleges. The primary operational difference between these personal subject tags and the LC subject headings as they operate at the moment in LT is that the LC subject headings are hierarchical. Personal tags are not hierarchical now, but this is an obvious next step that could be taken. (I know WordPress allows you to specify a "parent tag" for any given tag in a blog, but I have never experimented with that; that's just creating a tag hierarchy, similar to the one in the LC subject headings.)

So let's think about each of these three categories of tags.

1. Personal call numbers will always be just what they are: personal. These are the kinds of tags you'd probably want to omit from system-wide displays and calculations.

2. Personal flags or status markers: in most cases these are similarly personal, and should probably be left out of system-wide calculations.

3. Personal subject headings: this is of course the interesting part. To subdivide this category further:

3a. Descriptive (meta)data that in some cases would have a natural home somewhere in a MARC record, but because of LT's simplified structure, doesn't obviously fit anywhere. Some of these things would be note fields (5xx) in a MARC record; others might correspond to an illustrator or binder. I have flagged a few books Bruce Rogers when I know they were designed by that famous typographer. They aren't *about* Rogers, nor is he the author: he's the typographer/designer of the volume. A similar flag I've used is presentation copy for autographed/signed copies of a work. (More on that terminology in a moment.)

3b. Personal subjects proper: these are the user-generated tags that really are subject headings, like my examples of "numismatics" or "residential colleges" above. As I see it, these will always exist along a spectrum from casual and irregular to formal and controlled (like the LC subject headings). At the casual end, some people will always want to use tags like "pets" or "favorite stories," and that's just fine. At the formal end, one obvious enhancement would be to let users select official headings from the LC Subject Headings and assign them to their own titles as they please. Hennig's Phylogenetics, for example, really should have the LC subject heading Cladistic analysis added to it, since it was one of the foundational works in that field, but it probably was catalogued before the term "cladistic analysis" came into widespread use. (Some old librarians will remember having to explain to people looking for information about "World War I" in the card catalog that they should look under "Great War" or "European War," since those were the headings used before WWII caused the Great War to be renamed WWI.) I think it would be extraordinary to be able to pull down a list of LC subject headings directly and assign them to items in my catalog. LibraryThing's "works" system would explode with valuable information unlike anything now in existence. (Maybe you would have to put a limit on it; say, no more than 10 or 20 LC headings for any given title.)

The in-between zone, between casual and formal, is where most of the creative action will be. I think it is inevitable that particular subject domains will develop their own controlled vocabularies, and that these will be pointed-to via something like RDF. This general approach has been around for centuries in any number of specialized fields; any serious work in natural history, for example, will always tell you what nomenclatural authority it is following, and highly technical works will point you directly to the original sources for how a particular name or vocabulary term is used. I can imagine LT's tag pages sprouting links to authority pages (or directly incorporating wiki-like editable authority data) that specify the vocabulary used within specialized domains. None of this will preclude casual tagging; it will just make non-casual tagging more powerful.

A small example of this from the (3a) category: I mentioned above my use of "presentation copy" as a tag. This is a bit fuzzy in my usage, and I'd like to clean it up, but I need a controlled vocabulary to do so. Some copies are simply "signed by author," others are "author's presentation copy," while others might be "presentation copy from third party." These are categories that are probably described with precision in the ISBD for rare books, but I haven't delved into it yet. This is a perfect example of where I'd really appreciate a specialist providing an easily-accessible controlled tag set that a non-specialist like me could apply with ease in LT.

5/15/2006 12:52 AM  
Blogger Tim said...

Additional type: Opinion tags.

These are like your flags, but they're meant for other people to read. Yes, sometimes someone will tag something "crap" because they keep it in their "crap shelf" (we all have one). But more often they're trying to review-by-tag.

I mention this particularly because Amazon's book tags are much heavier in opinion tags. This is largely because you've no other reason to tag on Amazon and because you can tag things not in your collection--so controversial books get lots of nasty tags.

I think one could come up with other, more complex tag "classifications." But maybe we should just tag the tags!

:)

5/15/2006 1:14 AM  
Blogger Robert J. said...

But maybe we should just tag the tags!

OMG, that is so meta!

5/15/2006 1:19 AM  
Anonymous Anonymous said...

For instance, if you have "ancient languages", "latin", and "greek", do you put the first on everything with the latter two? The answer seems to depend on your planned needs and not the set theory.

That's a beautiful example of why this really is practical taxonomy and not ontology.

5/15/2006 1:49 AM  
Anonymous Anonymous said...

The published Library of Congress Subject Headings contains tons of additional information about "related tags," beyond the straightforward nesting of divisions and subdivisions. Here's the entry for one class (P123), transcribed as part of an exercise I did some years ago on the LC subject headings relating to the historical sciences:


Comparative linguistics [P123]

Here are entered works which compare languages or groups of languages for the specific purpose of determining their common origin, or discuss the method of comparison, as represented by the 19th century comparative philology and its subsequent developments. Works which compare or contrast two or more languages with the aim of finding principles which can be applied to practical problems in language teaching and translation are entered under the heading Contrastive linguistics.

Used for:
Comparative philology
Philology, Comparative

Broader topics:
Historical linguistics

Narrower topics:
Glottochronology
Nostratic hypothesis
Semantics


As LibraryThing's subject pages develop, this is just the sort of data that should go at the top in an "About this subject heading" area, so people will be able to connect to related fields directly (and understand why the headings have been assigned as they have been in some cases).

5/15/2006 2:13 AM  
Blogger Tim said...

Good point about the LCSH's. There's a good deal of additional info both in that document and in the fields themselves. In my experience, OPACs don't expose that either.

5/15/2006 8:29 AM  
Blogger Robert J. said...

Good point about the LCSH's. There's a good deal of additional info both in that document and in the fields themselves. In my experience, OPACs don't expose that either.

Right on. One of the most profitable things one can do in a real library is browse the stacks -- that's the classic "recommendation system." Historically, OPACs have not been very good at supporting an equivalent kind of browsing. (I'd argue that old card catalogs were even better for browsing than most OPACs.) Imagine LibraryThing with the full LCSH incorporated into it, including heading commentaries like the one shown above, and with a linked overlay of personal tags. That's real browsing and recommendation power.

RJO, D.Reb.
Councilor at Large, American Thingology Association
Editor, Progressive Thingologist

5/15/2006 12:40 PM  
Blogger Tim said...

Hey, I'm the editor!

5/15/2006 1:40 PM  
Blogger Tim said...

RJO gets blogged: http://washtublibrarian.blogspot.com/2006/05/tagging-versus-subject-headings-redux.html

5/15/2006 1:45 PM  
Anonymous Anonymous said...

Hey, I'm the editor!

Hmmmm, maybe you're the Editor-in-Chief and Publisher? (More prestige, less work?)

RJO gets blogged

I thought I felt a little funny today.

Could we please standardize on the degree conferred upon Thingologists? Is it D.Re. or D.Reb.?

Standards, standards, standards. Everybody keeps talking about standards. Abbreviations just want to be free!

(I was thinking of the parallel to D.Phil. and D.Mus., but then there's also D.Sc. I should have paid more attention in high school Latin: what would it be in full, Doctor Rebologiae?)

5/15/2006 2:32 PM  
Blogger Tim said...

See (my site) AncientLibrary.com, which has Edward's _Greek-English Lexicon_. Chrema, I suppose.

http://ancientlibrary.com/eng-grk/0286.html

Since I don't have my Big Liddell online, and I'm lazy, I don't know if there's an adjectival form for bibliotheke. ChremaBibliothekion?

Latin: ResLibraria?

I'm sure Languagehat will correct me...

5/15/2006 5:32 PM  
Anonymous Anonymous said...

You left off one other characteristic of LCSH: expensive. Here's another: in small databases, LCSH too often creates a "onesy" problem, where a SH leads to one or maybe two items.

OCLC's FAST is an intriguing tag-like alternative, sort of "LC Lite." After reading about it, all I can think is "tagging." :-)

5/15/2006 7:00 PM  
Blogger Tim said...

FAST is interesting, and new to me. Abby's looking at it too. I wonder if LibraryThing can "reverse engineer" FAST the way it did FRBR :)

5/16/2006 7:58 AM  
Blogger Robert J. said...

We are not alone:

Classification modules or mini-classification schemes with special focuses can also be built on the basis of LCC to meet the needs for effectively organizing web resources and digital libraries in specific subject areas (e.g., education, human environmental sciences, mathematics, engineering), industries (e.g., petroleum, manufacturing, entertainment), consumer-oriented topics (automobiles, travel, sports), problems (e.g., environment, aging, juvenile delinquency), as well as the needs of specialized user communities (e.g., special libraries, corporate information centers, personal resource collections). Where more details are needed in a particular situation, the basic structure of LCC can be extended, thus making the specialized scheme interoperable among one another, within the main LCC structure. The availability of tools such as Classification Plus and Classification Web greatly facilitates the creation of these domain-specific taxonomies and specialized subject classifications. With these new vehicles, the Library of Congress Classification, initiated over a century ago, may look forward to another productive century of service, extending its usefulness beyond that of a shelving device to a tool for organizing and providing access to electronic and networked resources.

5/17/2006 1:38 PM  
Anonymous Anonymous said...

It's so hard for me to read that subject headings (in LC practice) are "structured, consistent". Eeeek!

I really don't know why you, American librarians, are still using the thing at all! The LCSH give subject indexing a bad name, and it is so sad to see them held up, in contrast with folksonomies, as examples of structure.

Historically, American librarianship seems insulated by LC practice, and winds up believing that LCSH do present a sound syndetic structure and that LC Classification is a good system!

Meanwhile... the British Library's PRECIS was dismissed by the very library that begot it. Sigh!

Caio Monteiro
Brazilian librarian

4/25/2008 1:44 AM  
