thingISBN data in one file
thingISBN is a simple API for discovering related editions. Give it an ISBN and it returns a list of other ISBNs—different formats, translations, etc. We offer the API free for non-commercial use. Today we're releasing thingISBN in one giant feed, under the same conditions.*
thingISBN is based on LibraryThing's first-of-its-kind "work" system, by which regular people—LibraryThing members, mostly—combine and separate editions. Members run over 2,000 work-combination actions per day. Although some do it for pure altruism, combining editions helps LibraryThing users by improving the quality of their connections.
LibraryThing's results compare very favorably with its competition, OCLC's xISBN service (also free for non-commercial use). xISBN's coverage is better, but where LibraryThing is built on the collective judgment of humans, xISBN is just a computer algorithm. As the fella says, xISBN is "based on a world which is built on rules and because of that, [it] will never be as strong or as fast as [thingISBN] can be."**
APIs, while nifty, can be a pain. Both thingISBN and xISBN are limited to 1,000 requests per day. So, starting today, thingISBN is also available in feed format: one giant XML file with all the data from over two million unique ISBNs.
Here's a sample file with just 1000 ISBNs:
http://www.librarything.com/feeds/thingISBN_small.xml
As you can see, the format is not ISBN-to-ISBNs. That would involve too much repetition, and the full XML file is already 96MB! Instead, it goes work by work, listing the ISBNs inside each:
<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn>2020006014</isbn>
<isbn>0745300359</isbn>
<isbn>0394179900</isbn>
<isbn>9867574397</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>
As you can see, some ISBNs are listed as "uncertain." This happens when an ISBN crosses works. In a perfect world, these works would be combined, but LibraryThing doesn't do it automatically. There are a couple of ways that can go wrong. For example, "great books" sets often sport a single ISBN across volumes. It wouldn't do to combine "Pride and Prejudice" with "Moby Dick" just because their publisher wouldn't pony up for two ISBNs.
So, you can use the "uncertains" if you are willing to accept more errors. Otherwise, ignore them.
This format should go into a database well, e.g.,
CREATE TABLE isbn_to_work (
  itw_workcode mediumint(8) unsigned NOT NULL,
  itw_isbn char(13) NOT NULL,
  itw_uncertain tinyint(4) NOT NULL default '0',
  PRIMARY KEY (itw_workcode,itw_isbn)
);
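To make the ISBN-to-ISBNs lookup concrete, here is a minimal Python sketch. It is not LibraryThing's own code. It parses a trimmed copy of the sample work, loads it into an in-memory SQLite rendering of the isbn_to_work table, and finds the other ISBNs sharing a work with a given one. The function names rows_from_xml and alternates, and the choice of SQLite rather than MySQL, are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# The sample work from the post, trimmed to three ISBNs.
SAMPLE = """<work workcode="183">
<isbn>0802150845</isbn>
<isbn>0802143008</isbn>
<isbn uncertain="true">999107371X</isbn>
</work>"""

# SQLite rendering of isbn_to_work (an assumption; the post's DDL is MySQL).
SCHEMA = """CREATE TABLE IF NOT EXISTS isbn_to_work (
    itw_workcode INTEGER NOT NULL,
    itw_isbn TEXT NOT NULL,
    itw_uncertain INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (itw_workcode, itw_isbn)
)"""


def rows_from_xml(xml_text):
    """Return (workcode, isbn, uncertain) tuples for every <work> in xml_text."""
    rows = []
    # iter("work") also matches the root, so a bare <work> fragment is fine.
    for work in ET.fromstring(xml_text).iter("work"):
        workcode = int(work.get("workcode"))
        for isbn in work.iter("isbn"):
            uncertain = 1 if isbn.get("uncertain") == "true" else 0
            rows.append((workcode, (isbn.text or "").strip(), uncertain))
    return rows


def alternates(conn, isbn, include_uncertain=False):
    """Return the other ISBNs that share a work with `isbn`, sorted."""
    sql = (
        "SELECT DISTINCT b.itw_isbn FROM isbn_to_work a"
        " JOIN isbn_to_work b ON a.itw_workcode = b.itw_workcode"
        " WHERE a.itw_isbn = ? AND b.itw_isbn <> ?"
    )
    if not include_uncertain:
        # Ignore "uncertain" rows on both sides of the join.
        sql += " AND a.itw_uncertain = 0 AND b.itw_uncertain = 0"
    sql += " ORDER BY b.itw_isbn"
    return [row[0] for row in conn.execute(sql, (isbn, isbn))]


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT OR IGNORE INTO isbn_to_work VALUES (?, ?, ?)", rows_from_xml(SAMPLE)
)
print(alternates(conn, "0802150845"))  # ['0802143008']
print(alternates(conn, "0802150845", include_uncertain=True))  # ['0802143008', '999107371X']
```

The include_uncertain switch reflects the trade-off described above: leaving it off ignores the uncertain rows, and turning it on accepts more matches along with more errors.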
The feed itself is in http://www.librarything.com/feeds/ and is called "thingISBN.xml.gz". It is 16MB compressed.
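Since the full file is large, it can help to read it as a stream rather than load it whole. The sketch below is one way to do that in Python with the standard gzip and xml.etree.ElementTree modules. The name iter_feed_rows is hypothetical, and the code assumes the work elements carry no XML namespace, as in the sample above. It does not depend on the name of the feed's root element, which this post does not show.

```python
import gzip
import xml.etree.ElementTree as ET


def iter_feed_rows(source):
    """Yield (workcode, isbn, uncertain) from a gzipped work-by-work feed.

    `source` may be a filename or a binary file object. Assumes <work>
    and <isbn> elements are not namespaced.
    """
    with gzip.open(source, "rb") as fh:
        # "end" events fire once an element and all its children are parsed.
        for _event, elem in ET.iterparse(fh, events=("end",)):
            if elem.tag != "work":
                continue
            workcode = elem.get("workcode")
            for isbn in elem.iter("isbn"):
                yield workcode, (isbn.text or "").strip(), isbn.get("uncertain") == "true"
            elem.clear()  # drop this work's children once they have been yielded
```

With a local copy of the download, usage would look like `for workcode, isbn, uncertain in iter_feed_rows("thingISBN.xml.gz"): ...`, and each tuple can be written to whatever store you prefer.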
We'd love to hear what people are doing with the data.
*Commercial use requires our permission. See http://www.librarything.com/api.php.
**Okay, the comparison is inexact, but OCLC does have a "Matrix" feel to it.
8 Comments:
Tim,
This is excellent news, just what I need for a desktop z39.50 client app I've been writing.
I love the open data attitude that LibraryThing and Index Data are demonstrating (see: Index Data launches Open Content service). It is very much appreciated.
--bill (Simmons MLS student)
Great stuff! How often will the big file be updated? Is it a daily dump?
-- William Denton
At whim?
In case you're curious, the database (excluding the uncertains) will be almost two million rows. However, if you're only interested in finding alternative editions for books and thus don't need the works with just one ISBN to them, it's just 800,000 rows.
Anyway, this is a very useful tool. Thanks!
Is there an API that returns tags for a work, or tags for a user for a work? Is there one that returns all the books in a user's collection?
Not yet. There should be one for individual users.
hi Tim
love it
What do you mean when you say LibraryThing users combine and separate editions?
I am a LibraryThing user. Where and how can I do it?
thanks