
Digitizing Knowledge

Harvard-supported Digital Public Library of America looks to share intellectual wealth of top research libraries

The Digital Public Library of America hopes to digitize the vast troves held by Harvard's libraries.
By Gautam S. Kumar and Sirui Li, Crimson Staff Writers

In a secure glass case in the Memorial Room at the heart of Widener Library—the centerpiece of Harvard’s system of 80 libraries and 16 million volumes—is Harvard’s Gutenberg Bible.

The Gutenberg Bibles, with 21 copies in the world, were the first major volumes printed with a movable type printing press, and they are said to have immeasurably transformed the history of printing.

But 560 years later, books may undergo one of the most aggressive transformations since the printing of those Bibles.

A core group of Harvard professors is currently leading a national endeavor to digitize every book in the world, including the Gutenberg Bible, and make them all available to every American through an internet-based service called the Digital Public Library of America.

The DPLA, its leaders say, will place the resources of the world’s top research libraries in the pocket of every American. They foresee a time when some of the scarcest books are easily accessible to “independent scholars” on their home desktops and smartphones.

“In human history, the concept of a digital library is potentially the greatest revolution in learning since the creation of the university,” says Jim Leach, current chairman of the National Endowment for the Humanities, an agency that represents federal concerns for the advancement of knowledge.


Harvard University Librarian Robert C. Darnton ’60 has an office in the yellow Wadsworth House, constructed in 1726 and home to a long line of the University’s former presidents. His walls are lined with bound volumes, some dating back to the French Revolution, worn but in good condition.

But in conversation, his words project a future in which books are immortalized in ones and zeros.

Darnton is heading up DPLA, a venture that began in October 2010 when he called together leaders from libraries, universities, and the technology world to discuss the possibility of putting all the volumes from America’s biggest research collections online.

Teaming up with Law School Professor John G. Palfrey Jr. ’94 at the Berkman Center for Internet and Society, Darnton has become the face of a cadre of Harvard professors supporting the project.

The online collection, Darnton says, will build a bridge between people across the country and the treasures that are the nation’s top libraries.

“In my view, the Harvard library is so great that it’s a national asset,” Darnton says. “We wanted to reach out to say it’s meant for American citizens and not just college professors.”


Darnton, former director of the Harvard University Library, comes to the project from a perspective of change.

An active member of the former Library Implementation Work Group, he contributed to the most drastic recasting of the University library system in decades.

The reform will consolidate the many libraries at Harvard under a single administrative body, reducing barriers to facilitate access to the University’s full collection.

“It has evolved over the years to truly be complex and labyrinthine,” says University Provost Steve E. Hyman, who oversaw the library redevelopment. “The goal is to make a great library better, to make it more efficient,” he says.

The new structure appoints a Harvard Library Board—a rotating collection of professors from each of the University’s schools—and installs an executive director of the library system.

Helen Shenton, who will serve as the first executive director, says that her charge means that she will coordinate across the University.

“We’re going in a collaborative direction,” Shenton says.

Shenton says Darnton has played a key role in the transformation of the libraries, calling him, as University Librarian, the “intellectual voice of the Harvard Library.”


During his tenure, Darnton has already overseen the incorporation of some of the most advanced technology in the field as the library has sought to conserve its most valued volumes.

The library also began preserving its texts in digital form through state-of-the-art scanning machines, taking its first steps into digitizing its books.

It’s an intensive process—in costs, personnel, and hours.

An archivist must go page-by-page—though some scanning technology can turn the pages automatically—as a camera scans all the images and text on an open spread of a book.

According to Darnton, the scanning of a vast collection like Harvard’s can run in the millions of dollars.

So it seemed like a great deal to Harvard library administrators when Google Inc. asked the University for permission to digitize its books and offered to do it for free.


In 2005, the Google Books Library Project approached the Harvard University Library with a proposition. Harvard would join four other research libraries in scanning its books for what Google hoped would become an international online collection under the corporation’s control.

Harvard agreed to take part, and Google proceeded to scan 850,000 of Harvard’s books in the public domain.

The Google Books Library Project represented the most ambitious mission to date to create an online collection. A pet project of Google co-founder Larry Page, it aimed to scan every book in existence.

But Darnton says he began to question what he considered the program’s sloppy workflow. Some famous works were miscategorized—Walt Whitman’s “Leaves of Grass” was filed under “Gardening,” he says—and unanticipated costs popped up on Harvard’s balance sheet.

“We spent $1.9 million on the ‘free’ project,” Darnton says.

While scanning is costly, it accounts for only part of the associated costs. For every volume requested by Google, library staff members had to check out the book and make sure it was strong enough to endure the transportation, meticulously cataloging its journey.

Darnton says that when Google pressed for copyrighted books, it crossed a line that would fracture the partnership and set the stage for DPLA.

In October 2008, following a $125 million settlement between Google and the Authors Guild that extended the life of the program, Harvard backed out. Citing legal risks involved in digitizing copyright-protected materials, the University announced it would not continue its partnership with Google.

The company spent the next two years defending the new settlement against Justice Department inquiries into possible anti-trust violations.

But the courts said no. In his March 2011 decision, Judge Denny Chin wrote that the settlement would give Google a “significant advantage over competitors, rewarding it for engaging in wholesale copying of copyrighted works without permission.” Google is appealing the court decision.

But Harvard faculty and the DPLA steering committee were already at work on the Google Books Library Project’s little cousin.


As a scholar of the French Revolution, Darnton says he uses “the word ‘revolutionary’ with caution.”

But when referring to the DPLA project, Darnton genuinely calls its mission “revolutionary as during the Gutenberg period.”

After securing a donation from the Sloan Foundation in December 2010, the DPLA had collected enough capital to enter its planning phase. And with the Library of Congress and Smithsonian Institution’s quick support, the DPLA has garnered a formidable pool from which to draw.

In a March op-ed in the New York Times following Google’s legal setback, Darnton optimistically called on the corporation to contribute the 2 million scanned books in the public domain—of 15 million total—to the DPLA.

“Perhaps Google itself could be enlisted to the cause of the digital public library,” he wrote.

The company has made no such move, but in a statement to The Crimson, Google praised the “vast undertaking” of moving the world’s books online.

“Digitization and preservation initiatives, such as the Digital Public Library of America project, are important for making these books accessible online,” the statement says.

DPLA leaders acknowledge that they are following in the footsteps of the Google Books initiative, but they say they have a very different mission in mind.

Unlike Google’s project, DPLA is a non-profit organization, envisioned as an online public library.

And the distinction, those familiar with the project say, could mean a world of difference in the legal sphere.

“Google Books wasn’t a digital library. Google Books was a digital bookstore,” says Maria A. Pallante, acting register of copyrights in the Library of Congress.


DPLA has confined itself, in its early stages, to the collection of works in the public domain, hoping for future legislative reform that will open up access to copyrighted material.

Currently, these books represent only a sliver of world literature: Though most works published before 1923 are in the public domain, for works produced after 1923, U.S. copyright law severely limits the available pool.

But DPLA leaders have their sights set on a far larger class of works. They are looking for new legislation that would let the online collection include all copyrighted out-of-print works, a reform they say would make millions more books accessible to the public.

If Congress authorizes the type of collective licensing in question, DPLA could move to obtain the rights to works on an entirely new scale, without requiring collectors to examine works individually.

“Collective licensing, this is the piece that I think people are sad to see go from the Google case. This is something that Congress has to legislate,” Pallante says.

While Google failed to forge a legal path, coming under fire for alleged anti-trust violations, DPLA organizers hope that the law will change in the future for a non-profit public library like theirs.


DPLA is only entering the beginning phases of its execution, and members involved with the project are quick to emphasize that fact.

In September, a beta version of the web interface will go live, and DPLA will begin an 18-month implementation phase in the fall.

“My best hope is that even if it won’t turn out what we exactly expect now, it will turn out to be something extraordinary,” says David Weber, vice president of programs of the Alfred P. Sloan Foundation.

Three months before users are able to peruse the library’s collections, Darnton discusses the project with obvious practice.

But he is careful to stress that this is not Harvard’s “pet project”—rather, it is a mission in which Harvard, as the largest private library system in the world, should participate.

“Part of this is being able to share the extraordinary resources of our libraries much more broadly through digitization processes that we’ve done some of in the last decade or so,” University President Drew G. Faust says. “I think ten years from now ... sitting here, we would be astonished at what had happened.”

