Network Suffers Heavy Instability
The Faculty of Arts and Sciences (FAS) computer network experienced long periods of instability yesterday, stranding thousands of users without access to e-mail or the Web.
At the time of publication, Harvard Arts and Sciences Computer Services (HASCS) officials had not yet pinpointed the problem but predicted the network would be up and running again by this morning.
"The working assumption here is that by the morning, the system will be stabilized," said Kevin S. Davis '98, the coordinator for residential computing at HASCS.
Davis said that at 3:45 p.m. yesterday the FAS network began experiencing huge spikes of activity through one of its core routers, a computer that directs most of the Internet traffic over the FAS network.
The network was so unstable that HASCS engineers were unable to diagnose whether the problem was hardware- or software-related.
"The load levels are so high they can't even do diagnostics on it," Davis said.
A group of five HASCS network engineers was replacing the router with a more advanced model early this morning, in the hope that the higher-capacity device would allow them to stabilize the network and diagnose its problems.
HASCS officials said yesterday's network problems were the worst in recent memory.
"We've had our fair share of isolated issues, but it's been a long time since we've had something this widespread and this permanent," said Rick Osterberg '96, a database applications specialist for HASCS.
Osterberg did not rule out the possibility of an attack on the FAS network, but said he thought the length of the outage indicated that something more fundamental was wrong.
"This sounds like a Cisco router issue, either software or hardware gone haywire," he said.
Davis said HASCS engineers would be working on the problem throughout the night--and perhaps into today--until the network was up and running again.
"We realize that our services need to be available 24 hours a day, so when something goes down we're here until it gets fixed," he said.
Davis said HASCS's immediate priority would be to get the network up and running again, and that finding out what went wrong would come later.
"At this point, the focus is not on finding out why things failed," he said. "Right now, the most important thing is to get systems up and running."