Tool Finds Trends in Google Books

A team of Harvard researchers has created a new tool that analyzes language patterns in published books to quantify cultural and historical trends from 1800 to 2000.

The tool made its debut yesterday in an article titled “Quantitative Analysis of Culture Using Millions of Digitized Books,” published online in the journal Science, and was also launched as a feature on Google.

Dubbed “culturomics,” the tool enables the public to use the Google Books database as a “genome of culture,” according to Adrian Veres ’12, one of the article’s authors.

“Fundamentally speaking, you can see what society is interested in by tracking the frequency of the word,” Veres said. “The more common the word, the more important it is.”

Google’s online version of the tool allows users to type in a word or phrase and see how its usage has changed over the past two centuries.
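
The idea behind such a lookup can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the researchers' or Google's actual code: the function name `yearly_frequency` and the toy corpus are hypothetical, and splitting text on whitespace stands in for the far more careful processing a real system would need.

```python
from collections import Counter


def yearly_frequency(corpus, word):
    """Return {year: share of all words published that year that equal `word`}.

    `corpus` is an iterable of (year, text) pairs. Dividing by the yearly
    word total keeps years with more books from dominating the result.
    """
    hits = Counter()    # occurrences of the target word per year
    totals = Counter()  # total number of words per year
    target = word.lower()
    for year, text in corpus:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus of (publication year, text) pairs
toy_corpus = [
    (1900, "the telegraph and the railway"),
    (1900, "the railway timetable"),
    (1950, "the television and the radio"),
]

print(yearly_frequency(toy_corpus, "railway"))  # {1900: 0.25, 1950: 0.0}
```

Plotting those yearly shares against time gives the kind of usage curve the tool displays.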

Google Books is a set of digitized texts that includes about 4 percent of all books ever printed. While more than 70 percent of the books are printed in English, the database also includes texts in French, Spanish, German, Chinese, Russian, and Hebrew.

This publicly accessible database provides a “quantitative aspect to the social sciences and the humanities that has never been paralleled before,” said Veres, adding that he believes published books provide important historical insights, as they reveal traces of culture as perceived by people.

“These books are a meaningful representation of what’s important at [a specific] time,” he said.

In addition to discussing how the tool was conceived, the article showcases the type of analysis it provides. For example, users can see how language and grammar have evolved, how lexicography and censorship have trended over time, and how individuals have garnered fame, as measured by the frequency of their names in published works.

Psychology Professor Steven Pinker, a co-author of the article, noted the academic significance of the tool, which will enable linguists to see how often and in what context certain words were used.

“The tool revolutionizes the humanities by answering questions about the influence of humans and ideas quantitatively,” said Pinker, who focused in the article on the past tense and on lexical “dark matter,” a term that describes infrequently used words that do not appear in standard dictionaries.

Led by two Harvard affiliates, Jean-Baptiste Michel and Erez Lieberman-Aiden, the team included researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary.

—Staff writer Jane Seo can be reached at janeseo@college.harvard.edu.
