Saturday, May 13, 2006

Universal library

Nice article in the Times updates us on how various book-digitizing projects are coming along. As usual, lawyers are gumming up progress :-)

I look forward in a few years to storing 10 million books (equivalent to, e.g., Widener Library at Harvard) plus an image of the entire Web on the terabyte drive of my laptop. This is completely feasible technically; the main barriers are legal and economic. Sadly, this does suggest that I have more faith in storage technologists than in the rollout of superfast broadband. If we had really good broadband, I wouldn't have to carry any data around with me on my laptop!
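
A quick back-of-envelope check of the terabyte claim, as a minimal sketch. It assumes decimal units (1 TB = 10^12 bytes) and a hypothetical 500 KB of uncompressed plain text per book; neither figure comes from the post. It only budgets for the books, not the Web image, and ignores page scans entirely.

```python
# Back-of-envelope storage budget for the "universal library on a laptop" idea.
# ASSUMPTIONS (not from the post): decimal units, plain text only (no page
# images), and a hypothetical uncompressed size per book.

TERABYTE = 10**12        # bytes, decimal terabyte as drive makers count it
NUM_BOOKS = 10_000_000   # roughly the size of Widener Library, per the post


def bytes_per_book(capacity: int, books: int) -> int:
    """Average bytes available to each book if the drive holds only books."""
    return capacity // books


def required_compression(raw_size: int, budget: int) -> float:
    """Compression ratio needed to fit a book of raw_size bytes into budget bytes."""
    return raw_size / budget


budget = bytes_per_book(TERABYTE, NUM_BOOKS)
print(budget)  # 100000 bytes, i.e. about 100 KB per book

ASSUMED_RAW_TEXT = 500_000  # hypothetical: ~500 KB of plain text per book
print(required_compression(ASSUMED_RAW_TEXT, budget))  # 5.0
```

Under these assumptions each book gets about 100 KB, so plain text would need roughly 5:1 compression to fit; changing the assumed book size scales that ratio proportionally.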

See previous post with more numbers here.

Like many other functions in our global economy, however, the real work has been happening far away, while we sleep. We are outsourcing the scanning of the universal library. Superstar, an entrepreneurial company based in Beijing, has scanned every book from 900 university libraries in China. It has already digitized 1.3 million unique titles in Chinese, which it estimates is about half of all the books published in the Chinese language since 1949. It costs $30 to scan a book at Stanford but only $10 in China.

Raj Reddy, a professor at Carnegie Mellon University, decided to move a fair-size English-language library to where the cheap subsidized scanners were. In 2004, he borrowed 30,000 volumes from the storage rooms of the Carnegie Mellon library and the Carnegie Library and packed them off to China in a single shipping container to be scanned by an assembly line of workers paid by the Chinese. His project, which he calls the Million Book Project, is churning out 100,000 pages per day at 20 scanning stations in India and China. Reddy hopes to reach a million digitized books in two years.

The idea is to seed the bookless developing world with easily available texts. Superstar sells copies of books it scans back to the same university libraries it scans from. A university can expand a typical 60,000-volume library into a 1.3 million-volume one overnight. At about 50 cents per digital book acquired, it's a cheap way for a library to increase its collection. Bill McCoy, the general manager of Adobe's e-publishing business, says: "Some of us have thousands of books at home, can walk to wonderful big-box bookstores and well-stocked libraries and can get Amazon.com to deliver next day. The most dramatic effect of digital libraries will be not on us, the well-booked, but on the billions of people worldwide who are underserved by ordinary paper books." It is these underbooked — students in Mali, scientists in Kazakhstan, elderly people in Peru — whose lives will be transformed when even the simplest unadorned version of the universal library is placed in their hands.

...Just as a Web article on, say, aquariums can have some of its words linked to definitions of fish terms, any and all words in a digitized book can be hyperlinked to other parts of other books. Books, including fiction, will become a web of names and a community of ideas.

Search engines are transforming our culture because they harness the power of relationships, which is all links really are. There are about 100 billion Web pages, and each page holds, on average, 10 links. That's a trillion electrified connections coursing through the Web. This tangle of relationships is precisely what gives the Web its immense force. The static world of book knowledge is about to be transformed by the same elevation of relationships, as each page in a book discovers other pages and other books. Once text is digital, books seep out of their bindings and weave themselves together. The collective intelligence of a library allows us to see things we can't see in a single, isolated book.
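
The link arithmetic in the passage above rests on a simple identity: the total number of links in a graph is the sum of each page's outgoing links, i.e. the number of pages times the average links per page. A minimal sketch, using a small hypothetical graph of pages (the page names are invented for illustration) and then the article's own figures:

```python
# Total links in a link graph = sum of out-degrees = pages x average out-degree.


def count_links(graph: dict[str, list[str]]) -> int:
    """Count directed links in a graph given as page -> pages it links to."""
    return sum(len(targets) for targets in graph.values())


# Toy graph (hypothetical pages, for illustration only): a Web article and
# chapters of a digitized book linking to one another.
toy = {
    "aquariums": ["fish-glossary", "book:ch3", "book:ch7"],
    "fish-glossary": ["aquariums"],
    "book:ch3": ["book:ch7", "fish-glossary"],
    "book:ch7": [],
}
links = count_links(toy)
print(links)              # 6
print(links / len(toy))   # 1.5 average links per page

# Scaling the same identity to the figures quoted in the article.
PAGES = 100 * 10**9       # "about 100 billion Web pages"
LINKS_PER_PAGE = 10       # "on average, 10 links"
print(f"{PAGES * LINKS_PER_PAGE:.0e}")  # 1e+12, about a trillion links
```

The same counting applies once digitized book pages join the graph: every page that gains outgoing links adds directly to the total.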

When books are digitized, reading becomes a community activity. Bookmarks can be shared with fellow readers. Marginalia can be broadcast. Bibliographies swapped. You might get an alert that your friend Carl has annotated a favorite book of yours. A moment later, his links are yours. In a curious way, the universal library becomes one very, very, very large single text: the world's only book.

...So what happens when all the books in the world become a single liquid fabric of interconnected words and ideas? Four things: First, works on the margins of popularity will find a small audience larger than the near-zero audience they usually have now. Far out in the "long tail" of the distribution curve — that extended place of low-to-no sales where most of the books in the world live — digital interlinking will lift the readership of almost any title, no matter how esoteric. Second, the universal library will deepen our grasp of history, as every original document in the course of civilization is scanned and cross-linked. Third, the universal library of all books will cultivate a new sense of authority. If you can truly incorporate all texts — past and present, multilingual — on a particular subject, then you can have a clearer sense of what we as a civilization, a species, do know and don't know. The white spaces of our collective ignorance are highlighted, while the golden peaks of our knowledge are drawn with completeness. This degree of authority is only rarely achieved in scholarship today, but it will become routine.

Finally, the full, complete universal library of all works becomes more than just a better Ask Jeeves. Search on the Web becomes a new infrastructure for entirely new functions and services. Right now, if you mash up Google Maps and Monster.com, you get maps of where jobs are located by salary. In the same way, it is easy to see that in the great library, everything that has ever been written about, for example, Trafalgar Square in London could be present on that spot via a screen. In the same way, every object, event or location on earth would "know" everything that has ever been written about it in any book, in any language, at any time. From this deep structuring of knowledge comes a new culture of interaction and participation.

The main drawback of this vision is a big one. So far, the universal library lacks books. Despite the best efforts of bloggers and the creators of the Wikipedia, most of the world's expertise still resides in books. And a universal library without the contents of books is no universal library at all.

There are dozens of excellent reasons that books should quickly be made part of the emerging Web. But so far they have not been, at least not in great numbers. And there is only one reason: the hegemony of the copy.

2 comments:

Anonymous said...

"I look forward in a few years to storing 10 million books plus an image of the entire Web on the terabyte drive of my laptop"

Why not just put it on your iPod?

Steve Hsu said...

Yes, the iPod beats the laptop (and will be feasible), but I/O is kind of a problem. How will I do my linking and metatagging (or even just reading)?

Perhaps the iPod will have a projection screen and keyboard.
