Banned User Abused Factiva

Business school blocks user who downloaded 5 million articles from web service

Since July, a data-hungry user has downloaded more than 5 million articles from Factiva, an amount so excessive that it jeopardized the University’s contract with the popular online research service. Yesterday, library administrators at Harvard Business School (HBS) blocked the conspicuous Harvard-network IP address from accessing Factiva and notified the suspected offender of the infringements.

The mystery user downloaded an average of 55,000 documents per day, according to Lydia Petersen, a content manager for HBS’s Baker Library. The user retrieved the documents at a rate as high as four per second, which led Factiva and library officials to believe that an automated script controlled the downloads. The use of such a script is prohibited by Factiva.

Under normal conditions, users across the entire University download a total of just 1,000 Factiva articles per day.

A Factiva representative notified Petersen in late July about the unusually high number of downloads coming from one server port at HBS. Petersen and other University library staff have been working with Factiva since August on the problem.

“We waited for the statistics to come in,” Petersen said. “We thought that it was just some fluke.”

But the excessive downloads persisted and eventually prompted officials to trace the IP address and block its access.

“We believe we have identified the person,” said Peter Kosewski, director of communications for the Harvard University Library. Kosewski said the library’s Office of Information Services notified the user of the violations.

He would not release any information about the user or how the identification was made.

Neither Kosewski nor Petersen said they suspect any purposeful wrongdoing. “There might be someone who is trying to repackage the articles [for sale], but I doubt that this is the case,” Petersen said.

Instead, Petersen said that the user might have been downloading articles for “text-mining,” a research method that uses complex natural language processing to extract information from, and identify patterns in, large aggregations of text.

“Text-mining is increasingly becoming a legitimate research method,” Petersen said. “Vendors and academicians are going to have to come to some understanding about this use. Academicians need these texts to do this kind of research, but at the same time vendors need to protect their intellectual property.”

Factiva is an important resource for HBS students because it provides extensive access to industry and trade publications, and is Harvard’s only access point to the Wall Street Journal’s digital archive.

“There is no particularly good alternative to Factiva,” Petersen said. “Content in LexisNexis Academic is quite limited, certainly when you get to the corporate world.”

Harvard pays an undisclosed flat rate to Factiva for usage across the University by a specific number of simultaneous users. But because the service is so valuable to business students, HBS has signed a contract with Factiva to provide its students with six additional access “seats.”

The Factiva customer representative for Harvard is on vacation and could not be reached for comment.

The University frequently investigates and addresses cases of excessive service use, according to Petersen.

“It’s not, unfortunately, a terribly unusual situation,” she said. “In most cases, we’ve been able to resolve these situations satisfactorily, once we’ve been able to educate our users about the terms of use.”

There are no explicit limits on how many articles a user can download, Petersen said, but Factiva reserves the right to terminate a contract at will.

“Essentially, they just cut you off and refuse to do business with you anymore,” she said. “That is not something we want to face.”

—Staff writer Jeremy S. Singer-Vine can be reached at jsvine@fas.harvard.edu.