A Visit to the Internet Archive

The Internet Archive HQ

A number of us from Ace Monster Toys went to visit the Internet Archive on April 1, 2011. The creator of the DIY Book Scanner, Daniel Reetz, had come into town to meet with people there on, you guessed it, book scanning. As always, he came by Ace Monster Toys when he arrived to take a look at the progress on our own book scanner. He invited me, Myles, Robbie, and others to go with him to the Internet Archive to check out the book scanners that they use for their ongoing book digitizing efforts.

Friday morning, I gathered everyone up in the trusty Prius and we drove over to the archive in San Francisco. As you can see above, they’ve taken over a Christian Science church building, which gives the place the kind of regal air that you would expect of an institution dedicated to the public good. We had lunch with the archive’s employees and guests and got to hear a round of introductions and updates on what people had been working on or why they were interested in the archive. The diversity of people there, including visitors from two different universities (I believe), was pretty inspiring. They’d just moved their data center into the building and were excited to be retiring the last of the old servers that week.

After lunch, we were taken next door to check out their book scanning machines. Let’s just say that theirs were built on a whole different scale than our own. In comparison, here is the current image of the one we’ve been working on at AMT:

This is what they’re using at the Internet Archive (with the darkening covers pulled off):

As you can see, there is a little difference in sizing. Their design is much more robust than our own. Of course, other than our cheap cameras, ours has probably cost us $60 or $70 in parts, and theirs, when first built many years ago, cost more than $10,000 each. These days, they can probably make them much more cheaply, but they are looking to simplify their designs and also make them much more portable. Our design, based on the work of Daniel and others, is a tabletop one, and I carry it around the AMT space quite often as we work on it.

We were given a walkthrough of how they use their scanners. Myles and Daniel had a number of engineering questions, as the weakest link in most book scanner designs is the platen that we use to hold pages spread for the photos. The Internet Archive’s design is elegant and a thing of beauty, as you can see, and we took many pictures so we can reference things as we try to figure out a way to make our own design better.

One of the things that Myles and I have spoken about a bit, and he’s been conferring with Daniel on this also, is coming up with practical ways to make DIY Book Scanner kits. These kits would (potentially) include the parts for the platen, which is the most complex piece, along with plans and some parts for assembling the rest of the scanner out of standard materials that you can get at a hardware store. We’re still very much in the idea stage, but it is a goal in our thinking as we work on our scanner. We’re already planning to redo the platen on ours as soon as we figure out what we want to do, as it has some issues (though it works).

It was rather cool to get a chance to go out to the Internet Archive, see what they are doing, and meet a group of people who are really working to benefit everyone rather than enrich a series of investors. There is a phenomenal body of ephemeral materials in the world, be they books in local languages dating back hundreds of years, records in war zones, home movies and films, even television and other popular media of recent decades. Much of this material will be lost if people do not work to preserve it, but it is the kind of thing for which universities have lost much of their funding. The Internet Archive seems to be pretty well focused on this problem, along with the ephemeral nature of the Internet.

I’ve put a photo set of the pictures that I took (mostly of their book scanners) up on Flickr for those interested.