The story is as close to a feel-good fairy tale as one might imagine. The protagonists were novices, an unlikely team from Virginia Tech: a brilliant young computer scientist, a creative architect turned multimedia expert, a diligent aerospace engineer, a former technocrat in Gov. Doug Wilder's administration, and an assistant dean with a doctorate in curriculum and instruction. Their coaches: a former chief scientist for the San Diego Supercomputer Center and one of the developers of the Blacksburg Electronic Village (BEV).

They entered a race in which there was already a "TOP500 List" of contestants--and they emerged as international heroes. The Tech team of underdogs outplayed every other university in the world, building the world's fastest supercomputer in higher education circles and the third fastest among all competitors. Equally amazing was the price tag, about one-tenth the cost of the average supercomputer.

Unorthodox minds come together

The game plan to create the supercomputer started from a burgeoning friendship in the spring of 2003. Although Srinidhi Varadarajan and Jason Lockhart (architecture '95) work in different areas of Torgersen Hall, their offices are only four doors apart, and the two men first met when they moved into the building, which is dedicated to multidisciplinary research, in September 2000.

Varadarajan, who directs a small supercomputing facility, was a boy wonder in his homeland of India. At 15, he had designed one of the first marketed anti-virus computer software programs. Pace Computers sold his product, and the software, PC-Clinic, gained prominence when the Indian Institute of Management adopted its use. He was reared in an intellectual household, with his father a director of one component of India's space program and his mother a Sanskrit lecturer at the LD Arts College. Varadarajan moved to America in 1995 to pursue his doctorate in computer science at the State University of New York at Stony Brook.
Afterward, he joined Virginia Tech and quickly became a National Science Foundation (NSF) CAREER Award winner. When he secured an additional NSF grant to upgrade the Virginia Tech facility he directed, he started brainstorming with Lockhart, who administers the College of Engineering's Multimedia Laboratory.

Lockhart, whose father has worked for IBM since 1966, says he had been using DOS/Windows "since birth" and wasn't introduced to the Mac platform until he came to Virginia Tech to study architecture. He began working in the Multimedia Laboratory as an undergraduate and was named director in 1997.

Lockhart's immediate supervisor, Glenda Scales (Ph.D. curriculum and instruction '96), who oversees distance learning and computing for the College of Engineering, also became involved. One of her goals is to foster a closer relationship with the university's information technology organization, managed by Erv Blythe of Blacksburg Electronic Village fame. Working for Blythe are Patricia Arvin, an associate vice president who enjoys strong ties to state government, and Kevin Shinpaugh (Ph.D. aerospace and ocean engineering '94). Scales' boss is Hassan Aref, the new dean of Virginia Tech's College of Engineering, a physicist by training, whose credentials include a stint as chief scientist at San Diego's supercomputing facility.

All of these faculty members became part of the team to build a supercomputer, and they began to plan their strategy. Varadarajan's idea was to use off-the-shelf products to design a supercomputer that recorded a minimum of 10 trillion operations per second, or 10 teraflops. He targeted price/performance since he did not have the hundreds of millions of dollars that it had taken to build the current top two supercomputers in the world. Japan's Earth Simulator, estimated to cost between $250 million and $350 million, remains No. 1 at 35.86 teraflops. The Department of Energy's ASCI-Q, a dedicated weapons facility, stays at No. 2, operating at about 14 teraflops, with an estimated construction cost of $215 million.

Racing against the clock
Varadarajan and Lockhart brainstormed. How could they achieve this incredible goal of building a 10-teraflop machine that would rank among the top five in the world, and do it in a race against time? They had about six months to make the Oct. 1 deadline for University of Tennessee Professor Jack Dongarra's annual TOP500 List of supercomputers. This listing would then determine who was eligible to compete for funding from NSF's Cyber Infrastructure program, a new agency thrust area.

Lockhart was an Apple Mac devotee. Varadarajan had yet to use one. The entire team decided to try partnering with Dell Computers. For two months, they worked together on the details of the plan, but, in the final hour, when papers were ready for signatures, Dell withdrew.

Now it was mid-May. The team was devastated--for about 24 hours. Then, they quickly returned to work, had some talks with a few vendors, and settled on Apple and its newly announced Power Mac G5. With the support of Aref and Blythe, a contingent flew to Cupertino, Calif., and met with Tim Cook, vice president of Apple. Although Apple had never played in the supercomputing arena, Cook promised the Virginia Tech team delivery of 1,100 G5s as they rolled off the manufacturing line in August.

The race became much tighter. Arvin and Scales assumed stronger roles, establishing timelines, weekly internal meetings, and conference calls among Apple and the three other companies that contributed to the project: Mellanox Technologies, Emerson Network Power, and Cisco.

The company representatives caught the excitement of the project, working hard to meet seemingly impossible deadlines. For example, Mellanox provided the primary communications fabric between the 1,100 G5s, but its product, InfiniBand, had never been used with Apple's Mac OS X on a Power Mac G5. And Mellanox was based in Israel, where its engineers, several thousand miles away, worked day and night to execute their part of the project.
Also requiring some ingenuity was Emerson's new rack-mounted cooling system, designed specifically for the Virginia Tech cluster--if the 36 tons of equipment in the 3,000-square-foot machine room had been cooled by traditional floor-mounted systems, the winds generated would have exceeded 60 miles per hour.

On the opposite side of the world from Mellanox was Japan's Kazushige Goto, who optimized the IBM processor, a PowerPC 970, used in the Apple G5 for the Virginia Tech team. Back in the States, Dhabaleswar K. Panda of Ohio State assisted with the communications language for the 1,100-node cluster.

Lylah Sartin, the computing center's facilities manager, coordinated the construction crews as if she were building the arena for a U.S. Olympics event. She prioritized orders, cleared hurdles with town government, and even helped host a congratulatory party for the contractors who had worked double shifts for weeks to keep on target.

The management team relied on student power. Volunteers worked assembly lines to unpack and process the G5s as they arrived, and were paid in pizzas and free T-shirts. Running at full speed, the students processed 238 machines in under two hours.

"We were all dead tired. We worked on the physical aspect of the project with everyone, but we were also mentally working out everything," Lockhart recalls.

Varadarajan, the main architect of the new supercomputer, familiarized himself with the Mac. Within three days of having a PowerBook, he was a convert. He then compiled his own compatible software and made his ace-in-the-hole responsive to Macs. Varadarajan had created Deja vu, a software program currently being licensed by Virginia Tech Intellectual Properties (VTIP) that provides a fault-tolerant software environment so that if any one component in the new supercomputer were to fail, the queuing system would be alerted.
Within milliseconds, a free node would take over, averting the need to restart a calculation from scratch, which can potentially represent months of work.

The countdown

As the project progressed, leaks to the media occurred. With more than 100 student volunteers, the geek underworld started talking. The Web sites ThinkSecret.com and Slashdot.org created some of the early buzz, forcing the university to distribute its first cautiously worded news release at the beginning of September.

By mid-September, the university took a stronger stand while co-hosting a major information technology conference. A release, based on President Charles Steger's announcement that Virginia Tech was building a supercomputer that would be among the top 10 in the world, was distributed to about 100 outlets.

The British Broadcasting Corporation (BBC) decided the venture was worth a trip to Blacksburg. Ian Hardy, the BBC correspondent, produced a five-minute feature that aired for seven straight days with an estimated 250 million viewers.

Now the world was taking notice--The New York Times, Business Week On-Line, The Chronicle of Higher Education, the Voice of America, Fortune magazine, The Wall Street Journal, and others were calling to talk with team members. An Associated Press story written by Chris Kahn of Roanoke, Va., went international. The Indian, German, and French press called.

In late September, Varadarajan received a standing ovation at a computing conference. And the announcement was not yet even official.

That changed on Nov. 16, during the Supercomputing 2003 conference, when Virginia Tech's newly named System X--representative of the more than 10 teraflops of speed it records--was named the third fastest supercomputer in the world and the fastest in any academic setting on Dongarra's TOP500 List.
Dongarra, who also holds an appointment at Oak Ridge National Laboratory, told The Richmond Times-Dispatch that the "notable aspect" of Virginia Tech's supercomputer "is the $5.2 million price for all that computing power."

Aref says, "We believed that we could build a very high performance machine for a fifth to a tenth of what supercomputers now cost, and we did. And we wanted to have our own supercomputer to use for our new Institute for Critical Technology and Applied Science, where we will be conducting multidisciplinary work on such topics as nanoelectronics, aerodynamics, and the molecular modeling of proteins. With this machine, our researchers will be able to build computer modeling in days, not years."

Exhausted, but thrilled, the team has already announced its next plateau. With the first machine named "X" for 10 teraflops, they think it might be fun to name their second one a higher Roman numeral. Time will tell what that number is.

Lynn Nystrom is director of news and external relations for the College of Engineering.