Friday, February 27, 2009

Rocky Mountain News: Final Edition

Another important newspaper dies.



Sure, models change and things are gained too. But things are also lost. Denver is definitely the worse for this. You've got to worry it'll be publishers and libraries in ten years.*


*Both are suffering now (witness the recent HarperCollins layoffs and the Philadelphia closings), but newspapers are in really deep trouble.


Monday, February 23, 2009

Classify your heart out

Here it is, the revised list of top level categories. These have been vetted by all of us for a while and it's time to start building subcategories. We've created threads in the Group to discuss the subcategories of each top level. Keep in mind that these need to be comprehensive, but not excessively granular. Take a look at this example of possible subcategories for PETS.

After more of the second levels are fleshed out, we plan to have a new classify-this feature to test out the classification system on books in LibraryThing.

Until then, classify and discuss!


uClassify contest winner

After some delay, I can announce that the LibraryThing/uClassify contest has been won by Kelly Vista—the only entrant, but a worthy one. (Kelly gets a copy of Programming Collective Intelligence and $100 from Amazon or IndieBound.) She described her "LibraryThing classifier" as follows:
"My goal was to create a classifier that would automatically “tag” any book description based on actual LibraryThing tags. For example, if you paste the book description for “Truman” into UClassify, it should return to you LibraryThing tags that suit the book. This is one step more general than one of [Tim's] ideas (fiction vs. non-fiction)."
In my testing, it does a pretty good job of hitting the top tags. Pasted descriptions of Harry Potter give "young adult" and "children's." John Adams gives "american history" and "biography." It's not perfect—Adams is also labelled "young adult"—but the initial results are good and the whole point of uClassify is to enable accelerating accuracy.
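For readers curious what "tagging a description" can look like in code, here is a toy sketch. It is not Kelly's classifier and not how uClassify works internally; it is a bare-bones naive Bayes ranker, and the class name `TinyTagger`, the helper `tokenize` and the two training sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only purely alphabetic words."""
    return [w for w in text.lower().split() if w.isalpha()]


class TinyTagger:
    """Toy naive Bayes tagger (hypothetical; for illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word frequencies
        self.doc_counts = Counter()              # tag -> descriptions seen
        self.vocab = set()

    def train(self, description, tag):
        words = tokenize(description)
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocab.update(words)

    def rank(self, description):
        """Return all known tags, best match first."""
        words = tokenize(description)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for tag in self.doc_counts:
            counts = self.word_counts[tag]
            # Add-one smoothing so unseen words do not zero out a tag.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.doc_counts[tag] / total_docs)
            for w in words:
                score += math.log((counts[w] + 1) / denom)
            scores[tag] = score
        return sorted(scores, key=scores.get, reverse=True)


# Invented training examples, one per tag.
tagger = TinyTagger()
tagger.train("a boy wizard attends a school of magic", "young adult")
tagger.train("the life and presidency of an american founder", "biography")

print(tagger.rank("a young wizard at school"))
print(tagger.rank("the presidency of a founder"))
```

A real tagger would train on thousands of descriptions per tag, which is exactly where something like the John Adams "young adult" slip comes from: shared words pull a description toward the wrong tag.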

uClassify seems to be growing apace. They recently opened up public classifications for external access, so I'll be looking into automatic text-language classification of LibraryThing reviews.


Research libraries clobber OCLC Policy

The Association of Research Libraries released its report on the new, now delayed OCLC Policy, and it's a doozy—a forceful rejection of both the process and content of the Policy.

The full report makes for enjoyable reading—outside of Dublin, Ohio anyway. The task force members, research-library heavyweights all, fully and finally put to rest the notion that the only people bothered by OCLC's power grab are open-data crazies and evil commercial companies.

There appears to have been a significant split. The majority felt it "desirable to have a policy that limits large-scale redistribution of records that could be harmful to the collective" and a minority did not. (It's great to hear that a team of veterans had at least one member willing to reject the whole structure of cooperative-restriction!) But if the majority felt some policy was called for, they were apparently unanimous in condemning OCLC's unilateral, non-consultative approach and concerned by a host of issues, large and small. Surveying the current Policy they urge a "fresh start."

Vague legal language, unclear goals, worrying process, the split between the "nice" FAQs and the actual language of the Policy, issues of clouded ownership and responsibility for bibliographic data, termination provisions, the lack of respect for federal libraries and the legal impossibility of binding them without explicit renegotiation—it's all here! There's even a legal opinion, attached to the document, pouring cold water on the idea that the Policy will have any "downstream" effect on parties that haven't explicitly agreed to it (i.e., LibraryThing members). In all, a good drinking game could be invented—every time the ARL report validates or recapitulates a point made on this blog, or on other opponents' blogs, drink. (If you're going to Code4Lib this week, I'll buy the drinks!)

Most striking are the report's vision of OCLC as a cooperative, and the ways the OCLC policy undermined that trust:
"The collective activity of shared cataloging is a source of deep pride and success in libraries in the U. S. and around the world. OCLC was created as, and is viewed as, a membership organization formed for the purpose of enabling this collective activity.... Members view WorldCat as a collective enterprise, not as a product that they license for use. ..."

"The new Policy is clearly intended as a unilateral contract, unilaterally imposed on any entity using records from the WorldCat database, including member libraries.... The member community has seen the introduction of the new Policy as a fundamental change in the nature of the relationship between OCLC and its member libraries. In the eyes of the community, the guidelines expressed a mutual social contract, and the new Policy represents an authoritarian, unilaterally imposed legal restriction."
Now let's see what comes of this. OCLC has a needle to thread. The ARL report sets a high bar for consultation and consensus—higher than I think OCLC can reach without rethinking its whole communication model. And the core research-library concerns are serious*. I don't think OCLC can address them without giving up what I believe to be the Policy's true intent—establishing a permanent and lucrative data monopoly.

My prediction: Keep an eye on OCLC's "regional service providers." Various signs, including what reporters call "highly-placed sources," confirm that OCLC/regional tension is at an all-time high, with OCLC increasingly rewriting the rules there too—selling directly to libraries in unprecedented ways. I think we can see in these moves a common historical pattern: when the structures that give a powerful institution strength start to weaken, it reaches for a new level of authority not based in the previous structure and therefore not susceptible to weakening. (In this case, OCLC is moving from a robust, often mediated cooperative to an unmediated, contractually-drawn licensure.) Sometimes the effort succeeds; sometimes the attempt crystallizes opposition and hastens and ensures the institution's decline.


*Even if they picked the members of the Review Board, they may still face trouble from that direction. I doubt that OCLC's Review Board has what the ARL board apparently had—members who apparently questioned the very idea of restricting access and use!—but all but one of the board members are academic/research librarians and can be expected to understand and appreciate the concerns raised by their ARL colleagues.


Friday, February 20, 2009

What do Ben Franklin and C.S. Lewis have in common?

Answer: They're both on LibraryThing!

I'm pleased to announce the completion of Benjamin Franklin's LT catalog. This project wouldn't have been possible without the gracious permission of the American Philosophical Society and the Library Company of Philadelphia, the publishers of The Library of Benjamin Franklin (Philadelphia: American Philosophical Society, 2006). Not only have they made the book available via Google Books (here), but they also gave us permission to enter the data from it completely, including the wonderful and incredibly useful annotations by Edwin Wolf 2nd and Kevin Hayes, whose hard work and bibliographical sleuthing made the book possible in the first place.

On the LT end, thanks to pdxwoman, who got the project off the ground way back in January 2008, to hopeglidden and benjclark who cataloged portions of the collection, and to katya0133, who entered a major chunk of the titles. I jumped in in November and worked to add more titles and augment the records by entering the annotations. We got on a roll in January; since the start of the year, Katya and I added 2,009 titles, ~800 of them in the last ten days.

You can browse the catalog here, read Franklin's reviews, and check out his stats. Not surprisingly, he shares many titles with his other Early American comrades.

No sooner is one finished than another is begun, around here. I'll be tackling the Virginia Georges next (Washington and Wythe) but BOB81 has taken on the task of heading up the creation of an LT catalog for C. S. Lewis, based on a listing created by the Marion E. Wade Center at Wheaton College. If you're interested in helping out, sign up here.

[So far Lewis and Franklin only have one work in common, The Spectator. More to come, I'm sure.]


Flash-mob cataloging tomorrow in Rhode Island

Join us tomorrow for the second Flash-Mob Cataloging Party, at the Audubon Society of Rhode Island in Smithfield, RI.

See the main post.

I'll be driving some people from Boston tomorrow morning. If you want a ride—no guarantees—drop me an email (tim@librarything). I check it all the time.


Thursday, February 19, 2009

Seeing parallels

Steve Lawson wrote this wonderful piece for his blog See also..., reprinted here (by permission) in full:
There is a large organization whose main business isn’t producing information, but instead hosting and aggregating information for many thousands of users on the web. Users upload content, and use the service to make that content public worldwide, and, likewise, to find other users’ content. Then one day the large organization decides to change the rules about how that information is shared, giving the organization more rights–to the point where it sounds to some people like the organization is trying to claim ownership of the users’ content, rather than simply hosting it and making it available on the web.

A small but vocal and influential group of users object to the policy change. The organization protests that it isn’t their intent to fundamentally change their relationship with their users and that legal documents tend to sound scarier than they really are. Most customers are either unaware or unconcerned by the change in policy, but the outcry continues until the organization backs down a bit, sticking with the old policy for the time being. The future, though, is up in the air.

Facebook? Or OCLC?
Perfect, just perfect.


Monday, February 16, 2009

Portland, not the other one!

American City Business Journals has named LibraryThing's home town, Portland, ME as the 10th-best place to start a small business. Best of all, Portland beat "the other Portland." (And did you know they were named after us?)

Three cheers for Portland. But at the risk of being ejected from the ranks of Portland, Maine's tech startup community, I think that—wait, there's no local startup community to be ejected from! There's LibraryThing. There's Foneshow (two guys?) and that's about it! What businesses are they talking about anyway?

This city has grown on me. It's scenic, quirky and cheap. My wife and I think we can find both the right school and the right house, and avoid some of the craziness of Boston. But the business climate here leaves a lot to be desired, especially if you aren't in tourism.

American City Business Journals must be talking about some industry I'm not in, with very different inputs. For a tech startup the labor market is a train wreck—way too small and illiquid. Even if you could hire them, the people are wrong. There aren't any top-notch universities spitting smart young hackers out into the local community.* And there are too many people who want "quality of life," which is great if you can get it, but hard-driving companies want hard-driving employees.** As Paul Graham wrote, ambition is a big city phenomenon. New Yorkers want to get richer. Cambridge people smarter. I still don't quite understand what Portland people want. Smart, ambitious people tend to leave Maine—it's a big problem.***

I'm sorry for the harsh tone of this post, but I generally don't hide my feelings. Do you run a local small business? A local tech business? Send me a comment and I'll buy you lunch. As we both know, there are some amazing places to eat around here.


*There are, it's true, more local tech people than it seems at first. But, like Alexandria, they're mostly "in" not "of" Portland—Bostonians who moved to Portland and still service Boston-area clients.
**That comment will no doubt draw objections. But nobody with knowledge of the community in Cambridge or the Valley can dispute it. Startups work because people make them their lives. And anyway, when startup people aren't working, they want to hang out with other driven people.
***Back in 2003, a study concluded that "half of the state's college graduates in 1998 wanted to live and work in Maine, but three of four ultimately left." Subsidizing Maine graduates who stay in Maine probably helps, but it's not the answer.

Photo by PhilipC, from Wikimedia Commons (link).


Sunday, February 15, 2009

Can your Kindle read to you?

The new Kindle apparently can "read out loud"—that is, speech-synthesize—its books. Paul Aiken, director of the Authors Guild, told the Wall Street Journal they can't do that:
"They don’t have the right to read a book out loud. ... That’s an audio right, which is derivative under copyright law."
Renowned (and Newbery) author Neil Gaiman begs to differ:
"When you buy a book, you're also buying the right to read it aloud, have it read to you by anyone, read it to your children on long car trips, record yourself reading it and send that to your girlfriend etc. This is the same kind of thing, only without the ability to do the voices properly, and no-one's going to confuse it with an audiobook."
My opinion. Gaiman is right on the way it should work. The Kindle, with its DRM model, undermines what Gaiman got from "buying" a physical book, but it's certainly strange to imagine people can own a piece of text free and clear, but not be allowed to run a program that reads it aloud.

On the legal grounds, however, I fear Aiken might be right. As a rule authors grant publishers highly specific rights. These limits generally include countries, copies, covers, formats and timeframes. That's one reason eBooks took so long to take off—a million contracts needed to fly here and there before publishers could sell their books in the new format.

Anticipating future media is hard. My favorite passage in Shelley's Prometheus Unbound (okay, the only passage I remember from that deeply weird work*) predicts a world of freedom in which
[L]ovely apparitions...
Shall visit us the progeny immortal
Of Painting, Sculpture, and rapt Poesy,
And arts, though unimagined, yet to be.
In the real world, I fear, "arts, though unimagined, yet to be," require a contract addendum.


*The passage made it into the LibraryThing terms of use. I love my job.


Why Wirral? One partial explanation.

A recent article in the Telegraph describes a worrying fall-off in library books and library usage in the UK.

Over the past six years books in public libraries in the UK have fallen 12%, from 116 million to 103.2 million. Library check-outs have fallen faster—16.5%. According to the Telegraph, UK librarians are bracing for another round of declining numbers, coming amid budget shortfalls across the board—and expecting to get their budgets slashed.

Reflecting on these problems, the CEO of the Museums, Libraries and Archives Council (MLA) told the Telegraph:
"[W]e live in an age where books can be bought cheaply from supermarkets or the internet so the reasons to visit a library have changed for many users."


Wirral as a microcosm. Cuts have started. The Wirral council system in NW England (LibraryThing Local) is closing 11 of 24 branches.

They sure don't deserve it. Taking a look at the Wirral Libraries website, anyone can see they're doing a lot of things right. The branches look well-organized and inviting. They've got a fair number of computers and free Wifi. They have a special outreach program for the house-bound. They even lend toys!*

But they are doing one thing very wrong—namely that Wirral, like most libraries, isn't really "on" the web.

People are finding things in supermarkets and on the internet because it's easy to do so. On the internet, one-stop shopping means that a huge panoply of useful and interesting things are available from a single, unified and well-understood interface—from local bars, to local bands, to some 600 pizza and 400 curry joints in the area (Man, I love Britain!). Many of these resources are not only in Google searches, but Google will plot them on a map for your convenience.

What isn't online are library books! The Wirral Libraries' catalog, a Talis Prism OPAC, hardly registers in Google, which knows only 7,000 pages, from a library with more than 300,000 items. Worse, virtually every Wirral page in Google is broken. On the right is a representative sample of what Google knows about from the Wirral catalog. Each link has the same title. And each links to an expired session that proclaims:



You can, of course, get to the Wirral Libraries catalog if you know that's where you want to go—fifth link down, then the top rounded button on the right. That's not the same thing.

And even if you find a book, you can't bookmark it for yourself or forward it to a friend, because the links die off in a few minutes. In refusing to allow links and spiders, the Wirral website sets itself apart from the other websites Wirral residents might use. The rest of the web just works—it's in your search box, where most internet-aware people do most of their information finding.

Lastly, where is WorldCat in all this, the "switching mechanism" and "point of concentration" (Karen Calhoun) OCLC provides libraries as an alternative to the "lunacy" (Roy Tennant) of libraries being on the web for themselves? Nowhere. None of the Wirral Libraries are in it, and WorldCat doesn't list a copy of Harry Potter and the Deathly Hallows closer than 60 miles away (postal code: CH46 6DE). One may speculate that Wirral wasn't willing to pay for the service, which anyway gets quite insignificant traffic.***

Who's to blame? Wirral Libraries' misfortunes are no doubt many, and not being part of the web is not the largest. But it's a part. Wirral citizens aren't seeing their library appear in their search results. They aren't as aware of its riches as they might otherwise be. If they were aware, it's likely they'd use these resources more, and the system would be easier to defend politically.

It won't do to blame Wirral for this. Library vendors have long handicapped their products in this way, and Wirral Libraries surely bought their Talis Prism system a while ago.** Budgets are short—and getting shorter. Both the web and this recession have taken libraries by surprise.

But refusing to participate in the central information technology of the age has its costs. And the leaders of Libraryland who advocated and continue to advocate for closed solutions, closed data and staying out of search indexes—except as "negotiated" with Google—have contributed to this situation. The respected guides have taken libraries off the great river of information, and left them grounded on the shore. Now someone's coming for the boat.

I hope the residents of Wirral fight like hell to keep their libraries open. Then they should fight like hell to make their libraries truly open.


*I don't know how common this is in Britain. I get the sense it's not too common in the US, but it happens. The Hingham Public Library in Hingham, MA lends practically everything, from toys to paintings on the wall.
**It's ironic that Wirral's OPAC was made by Talis, now one of the more progressive and forward-thinking library vendors. I'll put this in a footnote to avoid "shilling," but if Wirral can get a new OPAC, I'll arrange for them to get LibraryThing for Libraries for free until they get back most of their funding. Maybe Talis would kick in an incentive to upgrade their OPAC?
***WorldCat is supposed to be the central website of Libraryland, but third-tier websites like LibraryThing and Dogster—the social network for dog lovers!—are currently beating it.


Monday, February 09, 2009

Open Shelves Classification Update

Hello! Well, we have been busy since Tim announced the classify-this feature. The OSC group has been extremely active, with more than 300 posts about the top level categories (not to mention insightful threads popping up to discuss second level categories). Thank you for your feedback! Meanwhile, at the Midwinter meeting of the American Library Association we were able to have a really valuable face-to-face conversation with LibraryThing users.

We have been processing all your feedback and working on version 2.0 of the top level categories. Before we get to that, we wanted to let everyone know that we do read all the posts in the Open Shelves Classification group. Because of the high quantity of posts (and our day jobs) we cannot comment or respond individually as often as we would like.

Some key points after discussion, feedback and analysis:

-The number of categories in the top level. As decided last summer, we will have more rather than fewer top level categories. The top levels are not supposed to represent an even distribution of all possible branches of knowledge. Instead, the OSC top levels should represent the largest categories that public libraries will want to use. [Similar to how Library of Congress classification was built to meet the needs of the Library of Congress, while Dewey's system tried to contain all recorded knowledge.]

-Complaints about specific topics in the top level. Remember, there is no value judgment in a topic being placed at the top level or underneath a broader topic. For now, topics like Pets, Gardening, and True Crime are present because of feedback from public librarians that these are heavily requested books that are often pulled out into their own sections. As a guiding principle, the OSC will be statistically tested, so some of our top level categories may change as actual libraries begin to reclassify their collections.

-The nature of classification. Any classification system forces us to choose one topic for the book, even though that book may be about more than one topic. This is not a flaw in the OSC categories but in the nature of classification. Libraries will still use multiple subject headings in the catalog to capture all the topical aspects of the work.

-Facets. As talked about a few months ago, we currently plan on the top level categories being only topical while other aspects of the work will be represented by facets. For example, format will be captured in a separate facet. [And to clear up any lingering confusion, Comics will be a format facet.] Another facet talked about was audience. This means children's books will be tagged in the audience facet. We envision that these facets will be optional and libraries can use them if, for example, they want to pull out all the comics and shelve them in a unique section. Alternatively, the facet could be ignored and then graphic novels would be intershelved with other like topics. Here is a picture of what we are envisioning:

-Classification versus Signage. The top level categories have nothing to do with signage. This is particularly true with children's books, which can be grouped/displayed as the library desires (e.g. picture books, infants, board books, etc.).
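To make the facet idea in the key points above concrete, here is a minimal sketch of how a topical category plus optional facets might be modeled. Nothing here is part of the OSC itself: the `Work` class, the `shelf_section` function and the sample title are all hypothetical, and a real implementation would live in a library's catalog system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Work:
    """A cataloged work: one topical category plus optional facets."""
    title: str
    topic: str                      # top level category, topical only
    format: Optional[str] = None    # optional facet, e.g. "comics"
    audience: Optional[str] = None  # optional facet, e.g. "children"


def shelf_section(work, pull_out_formats=frozenset()):
    """Pick a shelf section for a work.

    A library that wants a separate section for a format lists it in
    pull_out_formats; otherwise the facet is ignored and the work is
    intershelved under its topic.
    """
    if work.format in pull_out_formats:
        return work.format.upper()
    return work.topic.upper()


# Invented example record.
gn = Work("An example graphic novel about gardening", "Gardening", format="comics")

print(shelf_section(gn))              # facet ignored: shelved by topic
print(shelf_section(gn, {"comics"}))  # facet used: pulled out by format
```

The point of the design is that the same record supports both libraries: the one that wants a comics section and the one that does not.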

We will be posting an updated version of the top levels very soon, so stay tuned!


Friday, February 06, 2009

Facebook in reality

Wednesday, February 04, 2009

Random tags for scholars

Someone asked me to come up with a page of truly random tags for an academic project that needed to assess typical tagging. It might prove interesting to other students and scholars doing projects on LibraryThing.

Here's the page.

It's an HTML page, not an XML feed or other such format. Techies will scoff, but I've been asked for a lot of data like this, particularly from MLS students. The people who can easily parse XML in programming languages are not generally writing graduate school papers.
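That said, for anyone who does want to pull the tags out programmatically, a plain HTML page is workable too. The sketch below uses only Python's standard library and assumes, purely for illustration, that each tag appears as the text of a link; the actual markup of the page may differ, and the `SAMPLE` snippet is invented.

```python
from html.parser import HTMLParser


class LinkTextParser(HTMLParser):
    """Collect the visible text of every <a> element."""

    def __init__(self):
        super().__init__()
        self._in_link = False
        self._buf = []
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buf = []

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            text = "".join(self._buf).strip()
            if text:
                self.tags.append(text)
            self._in_link = False


def extract_tags(html):
    """Return the link texts found in an HTML string, in page order."""
    parser = LinkTextParser()
    parser.feed(html)
    parser.close()
    return parser.tags


# Invented sample markup; the real page may be structured differently.
SAMPLE = (
    '<ul><li><a href="/tag/history">history</a></li>'
    '<li><a href="/tag/science+fiction">science fiction</a></li></ul>'
)

print(extract_tags(SAMPLE))
```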


Monday, February 02, 2009

Microsoft Songsmith: The only blog post you need to read.

The ascending hilarity around Microsoft Songsmith is a bit far from our usual topics here. I could attempt to connect it to social networks, open data and virtues like experiment, remixability and authenticity, but I think I should just shut up and let you enjoy three videos on the topic.

1. The Microsoft Songsmith promo video. It makes me want to turn myself inside out like a slug in beer. What does it do to you?



Hat-tip: TechCrunch.

2. White Wedding, redone in Microsoft Songsmith.



Hat-tip: Mashable, with many more. "Beat It" is also wonderful.

3. Economic Failure Medley by Microsoft Songsmith. Melodies from stock charts, proving yet again how the web can spin gold from tin.



Hat-tip: Mashable


OSC gets the once-over at ALA in Denver

As most of you know, back in July the Open Shelves Classification was conceived as a free, crowdsourced alternative to the Dewey Decimal System. The Group has been very active during initial development, and the top levels are being heatedly debated.

David, Tim and I held an OSC open-discussion at the American Library Association (ALA) conference in Denver. A great group of people participated in a lively debate about the project.

To summarize:

There was some room confusion with the Marriott and, unfortunately, many people left before it was all figured out.

10 people attended: Tim, Laena, David, a mix of public librarians, academic librarians, and one interested non-librarian. The librarians were catalogers, reference librarians, and one library director.

Comments during the meeting included:

The random works feature is not that useful because half of all the works are fiction and fiction is not broken out at the top level.

If a public library may reasonably want to aggregate at a certain level (e.g. fiction or science) then it should exist as a top level. No one aggregates at non-fiction, hence it is not useful.

Working on the second level for fiction should happen sooner rather than later.

Children’s books are a challenge.

Perhaps using an audience facet would help (for example, CH, YA)?

  • Yes, but the topics of some books are hard to determine. Should they be put in fiction? If so, a scope note is needed.
  • Speaking of which, there is no good way of dealing with series when written by separate authors, like Spongebob Books.

How should series be handled in a classification?

The Darien library is reorganizing their collection, particularly children’s books, in interesting ways (here, you can listen to Gretchen Hams tell you all about it).

For OSC to be successful, it must be easy to implement for public libraries.

It must be inexpensive to go from DDC-OSC.

A crosswalk is essential!

  • There needs to be a way to determine how much space is needed ahead of time to move the books around.
  • It must be easy to print labels.
  • Backstage Library Works was a company that moved Duke University Libraries from DDC to LC, so there must be models out there on how to do this.
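For anyone unfamiliar with the term, a crosswalk is just a lookup table from one scheme to the other. Here is a minimal sketch of the idea, matching on the longest DDC prefix. The mapping entries are invented for illustration (no DDC-to-OSC crosswalk exists yet), and `CROSSWALK` and `ddc_to_osc` are hypothetical names.

```python
# Hypothetical crosswalk: DDC number prefixes -> OSC top level names.
# These pairings are made up for illustration, not agreed mappings.
CROSSWALK = {
    "2": "Religion",
    "5": "Science",
    "635": "Gardening",
    "636.7": "Pets",
}


def ddc_to_osc(ddc_number):
    """Return the category for the longest matching DDC prefix, or None."""
    digits = ddc_number.strip()
    # Try the full number first, then drop one character at a time.
    for end in range(len(digits), 0, -1):
        match = CROSSWALK.get(digits[:end])
        if match is not None:
            return match
    return None


for number in ["636.7", "635.9", "599.5", "813.54"]:
    print(number, "->", ddc_to_osc(number))
```

Numbers that come back as `None` are the ones a person would still have to classify by hand, which gives a rough way to estimate the cost of a move before committing to it.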

An audience facet would be a good way to handle reading level as well, either by grade or age.

  • Example: 0-1, 1-3, 9-10, etc.
  • There is a tension between having too many optional facets and universality.

The facets need to transcend stickering, the current practice in most public libraries.

We need a reality check before getting too far down the road with proposed schedules for OSC: will it work in an actual library?

  • We could upload a library’s MARC records into LT and try it there virtually before asking a library to use it.
  • Two potential public libraries were listed as testing grounds.

So far, the top levels testing on LibraryThing has provided the following results:

  • 56,641 acts of classification
  • By 1000+ users
  • On 22,000+ works

What are the biggest tags in LibraryThing, can we use those to determine the levels?

  • They were looked at and evaluated, hence True Crime is a top level.
  • This can’t really be done in an automated way.

What is the product plan for OSC?
  • The data is open source & free.
  • If people want to package services around the data (such as reclassifying books for you), then that is a possibility, but we do not see this developing for at least a year or so.

What does “shelf-ready” mean?

  • A vendor puts on labels, dust jackets, tattle tape, creates catalog records for a public library.
  • Different people at the meeting had differing levels of success with outsourcing their books to be made shelf-ready by vendors.

Is bleed over between categories in OSC a bug or a feature?

  • Memoirs/Autobiographies was seen as a bug.
  • Others such as Pets/ScienceAnimals were not seen that way.

Putting categories in an order may reduce people’s confusion about where to put things.

  • This is called “flow” in bookstores.
  • E.g. Cooking—Health—Sports or Biography—History—Poli Sci

Confusion arose over facets.

  • You add and delete depending on the library’s needs.
  • Huge collection? Use them all. Small and only need Science and Religion? Go ahead, the system is flexible.

The top level testing will stop and the levels will begin to be re-worked this week.

How should Art, Architecture, Design, and Photography be handled?

  • After much discussion, the consensus was reached that Art, Architecture, and Design should be separate top level categories, but that Photography would go under Art.

The first test round has been closed. Visit the Open Shelves Classification group for details.

Meeting like this was great and very helpful in making OSC usable. Another meeting is planned in New York for early April; we'll keep you posted!


Sunday, February 01, 2009

Right this way for the homophily, sir.

Over on the main blog I posted a long-ish blog post on "homophily" and serendipity. I should have posted it on Thingology instead (the main blog tends to focus on feature development and other, more concrete issues). But people have commented on it and made links, so I can't move it.

Check out the post here.


The evil 3.26%

The question has arisen of why I advocate against OCLC's attempt to monopolize library data. Roy Tennant of OCLC, an intelligent, likeable man who, although we disagree on some issues, has done more for libraries than most, accused me of writing and talking about the issue because:
"... your entire business model is built on the fact that you can use catalog records for free that others created and not contribute anything back unless they pay (yes, there is a limited set of data available via an API, but then they need the chops to do something with it)."
Fair enough. Let's look at the numbers, and the argument.

I did a comprehensive analysis, available here as a text file, with both output and PHP code. If anyone doubts it, send me an email and I'll let you run the SQL queries yourself.

The numbers. As of 6:17pm Sunday, some 3.5 years after LibraryThing began, our members have added 35,831,904 books from 690 sources:
  • 85.48% came from bookstore data (almost exclusively Amazon).
  • 4.88% were entered manually by members
  • 9.63% were drawn from library sources
Now, where did that 9.63% come from?

These sources were in every case free and open Z39.50 connections our members accessed through us. Very frequently they accessed records of their own academic institution, but in any case, these members accessed these records alongside everyone else—libraries, museums, public agencies of one sort or another and all the students and scholars who use RefWorks, EndNote and other such services. Meanwhile LibraryThing has never been asked to stop accessing a source. On the contrary, libraries frequently ask to include themselves on our list of sources.

Of the 9.63%, by far the largest source is the US Library of Congress, the source of 2,203,182 books, or 6.15% of the total. The Library of Congress is a Federal organization, created for the benefit of the country and falling under the government-wide rule that public work is for the benefit of the public, and cannot be copyrighted or otherwise "owned." For as long as the technology has been there, the Library of Congress has allowed access to its cataloging data; the OCLC policy change will not affect that.* We are grateful the Library of Congress does this. But insofar as we are taxpayers and support the American notion of public ownership of public resources, I will not apologize for it. (On the contrary, I feel that OCLC should apologize for attempting to restrict and profit from public work.)

3.26%. That leaves 3.48%—more appropriately 3.26%**—the evil sliver upon which our "entire business model is built." Take a look at the top fifteen here:
  • Koninklijke Bibliotheek: 130,406 books (0.36%)
  • National Library of Scotland: 80,826 books (0.23%)
  • British Library (powered by Talis): 80,205 books (0.22%)
  • Gemeinsamer Bibliotheksverbund (GBV): 77,190 books (0.21%)
  • National Library of Australia: 72,896 books (0.2%)
  • Helsinki Metropolitan Libraries: 70,551 books (0.2%)
  • The Royal Library of Sweden (LIBRIS): 63,430 books (0.18%)
  • Italian National Library Service: 60,643 books (0.17%)
  • Vlaamse Centrale Catalogus: 58,936 books (0.16%)
  • LIBRIS, svenska forskningsbibliotek: 54,339 books (0.15%)
  • ILCSO (Illinois Libraries): 28,517 books (0.08%)
  • Yale University: 26,885 books (0.08%)
  • Det kongelige Bibliotek: 24,564 books (0.07%)
  • University of California: 20,098 books (0.06%)
  • Bibliotek.dk: 19,628 books (0.05%)
With 690 possible sources, it's a long, long tail. We take 2,087 records from the Russian State Library, 1,067 from the Magyar Országos Közös Katalógus, 286 from Princeton, 106 from Koç (in Istanbul), 63 from Hong Kong Baptist, 4 from the Universidad Pública de Navarra, etc.

It should be apparent to anyone looking at the above that the 3.26% is largely about satisfying the needs of foreign LibraryThing members, a small percentage of our membership and hardly central to our "business model." Equally clear is the government orientation of the list: only one, Yale, is a private institution. The rest are all government agencies. Of course, no records actually came from OCLC itself!

All-in-all, library data from non-federal sources is a negligible component of LibraryThing's content. LibraryThing is not some big plot to capture library records. That idea is simply not in the figures.
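For anyone who wants to check the arithmetic without digging through the text file, here is a minimal Python sketch. It is not the PHP analysis mentioned above; the variable and function names are my own, and the only inputs are figures quoted in this post.

```python
# Figures quoted in the post; names here are illustrative, not from the PHP analysis.
TOTAL_BOOKS = 35_831_904      # all books added by members
LIBRARY_SHARE_PCT = 9.63      # share drawn from library sources, as published
LOC_BOOKS = 2_203_182         # books from the US Library of Congress
BL_TALIS_BOOKS = 80_205       # British Library records obtained through Talis


def share_pct(count, total=TOTAL_BOOKS):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


loc_pct = share_pct(LOC_BOOKS)
bl_talis_pct = share_pct(BL_TALIS_BOOKS)

# Library sources other than the Library of Congress.
non_federal_pct = round(LIBRARY_SHARE_PCT - loc_pct, 2)

# The same figure with the Talis-supplied British Library records set aside.
excluding_talis_pct = round(non_federal_pct - bl_talis_pct, 2)

print(f"Library of Congress: {loc_pct}%")
print(f"Other library sources: {non_federal_pct}%")
print(f"Excluding British Library via Talis: {excluding_talis_pct}%")
```

The subtraction works on the rounded, published percentages, which is how the figures in the post were derived from one another.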

Do we give back? What of the second half of the accusation, that we "not contribute anything back unless they pay," and the bit against APIs?

First, assuming Roy means LibraryThing data generally, it's absurd to suggest that because LibraryThing draws 3.26% of its data from free, unlicensed sources, our members' data and services are owned by OCLC or its members. OCLC no more owns members' tags and reviews on bibliographic metadata than Saudi Aramco owns the furniture I bring home in my car. Who in their right mind would ever accept a list of titles and authors from a library, if that meant ceding ownership over what they think about the book?

LibraryThing and OCLC both have terms. But LibraryThing's license terms are unlike OCLC's in a number of ways. LibraryThing members know what they're getting, unlike OCLC members, who thought they were sharing with other libraries but find themselves the lynchpin of a monopoly. From our inception LibraryThing has reserved a right to sell aggregate or anonymized data. We also sell some reviews, giving members the option to deny them to us. All our member data is non-exclusively licensed, so members can do anything they want with it outside of LibraryThing, and members can leave at any time. Neither is true of OCLC members' data under the Policy.

Cataloging data. That leaves LibraryThing cataloging data, of which we have three types. We don't have any legal responsibility to make it free, but we do so anyway.

First, we would be happy to offer downloads of original or modified MARC records! We haven't done so in order to avoid attracting a suit from OCLC. But perhaps we were mistaken. If OCLC would like us to start releasing our MARC records to others, someone should let us know. We will release them under the same terms they were given to us—freely.

Second, our Common Knowledge cataloging (series, awards, characters, etc.) is free and available to all. We can't think of a better way to provide it than through an API, but we're all ears if Roy knows of one. And if OCLC would like to admit it to WorldCat, without subverting its always-free license, they don't even need our permission. Go on, OCLC, make my day!

Third, there's ThingISBN, which was directly patterned on OCLC's xISBN service. Despite Roy's criticism, the two are identical in format and delivery, so if there's something wrong with XML APIs, OCLC has only itself to blame. Indeed the only difference is cost: ThingISBN is completely free, both as an API and as a feed; xISBN, which member data creates, is sold back to members.

Stop killing the messenger. It's time for OCLC to recognize they made this mess, not others. They have perpetrated some astounding missteps, from attempting to sneak through a major rewrite of the core member policy in a few days without consultation, to a comic series of rewrites and policy reversals, culminating in withdrawing the policy entirely for discussion. (It now seems clear they did so on the heels of a member revolt, whether general or just of some key libraries.)

It's also important to see that, before OCLC started threatening companies and non-profits doing interesting but non-competing things with book data—notably LibLime, Open Library and LibraryThing—they had none of the problems they have now. Now, by attempting to control all book data, they've spurred the creation of LibLime's ‡Biblios system, a free, free-data alternative to OCLC and, well, sent me, Aaron Swartz of Open Library and dozens of prominent library bloggers into orbit.

Being caught so flat-footed can't feel nice. It must be hard feeling like royalty and discovering your subjects think themselves a confederacy. But this is no time for OCLC to start attacking the credibility of its opponents. Surely LibraryThing is an unusual case—a company that has an opinionated, crusading—okay, loud—president. But the thousands of librarians and other individuals who supported our calls, or raised other objections to the OCLC policy are not less well-motivated than OCLC and its employees. They do not love libraries less. They are, rather, concerned that OCLC's urge to control library metadata threatens longstanding library traditions of sharing, and sets libraries on a path of narrowness and restriction that will surely prove no benefit in this increasingly open, connected world.

*I need to write a blog post on this, but I was recently informed that whatever changes OCLC makes cannot touch federal libraries without explicit authorization. That is, federal law does not recognize clauses like "if you continue to use" or "we can change this at any time."
**It should more accurately be 3.26%, because we are getting our British Library records through Talis, who have a contract with the British Library.

Labels: