Harvard’s Office of Technology Development recently licensed a new technology for analyzing large quantities of unstructured text to Crimson Hexagon, Inc., a start-up co-founded by Government Professor Gary King last year.
King, who developed the technology with a team of researchers at the Institute for Quantitative Social Science, said in an e-mail that the algorithm can sort through thousands of blogs, books, articles and other sources of information in real time and extract a common opinion.
In 2007, King founded Crimson Hexagon with Candace Fleming—a Harvard Business School graduate—and serves as its “Chief Scientist,” according to the company’s Web site.
King said that the technology, referred to as “readme” by its developers, arose from a previous project that the team had worked on for the World Health Organization that analyzed causes of death around the world in order to better allocate health resources.
In the process, they developed what they call a “fully automated procedure” that required no personal judgment from physicians and factored in statistical uncertainties.
This objective technique, King said, proved useful for another project—interpreting opinions expressed about U.S. presidential candidates in political blogs.
For this project, the team needed a technique that could analyze up to 185 million blogs throughout the world every day, tracking changes in opinion.
“We decided to use our own software,” said Matthew L. Knowles ’07, a Harvard Law School student who helped code the algorithm.
After adapting the algorithm to analyze text on political blogs, Knowles wrote, the team realized the software could have broader appeal for the general public.
“I knew that other people could use our software for content analysis on the Internet,” he wrote.
Several companies have already been using the technology to analyze customer reaction to their products and marketing tactics, King added.
In the short term, King said, he believes companies will continue to use “readme” technology through Crimson Hexagon as a marketing tool.
But King said he hopes that in the future, clients will be able to apply the technology more broadly—“from the microblog to the data warehouse.”
“Every 15 minutes an amount of information appears on the Internet equivalent to all the information in the Library of Congress, and a lot of this is unstructured text,” he wrote. “There would appear to be many potential applications on the horizon.”