Supercomputers - Past, Present and the Future

"At each increase of knowledge, as well as on the contrivance of every new tool, human labor becomes abridged" - Charles Babbage
Computers have become an integral part of everyday life, more so than Charles Babbage might have foreseen. They are more popular and much faster today than they were even a few years ago. There has been constant improvement in the speed and complexity of microprocessors - the heart of every computing machine. In 1965, Gordon Moore, the co-founder of Intel, predicted that the complexity of integrated circuits would approximately double every year. From single-core to multi-core processors, the complexity of these microchips is directly linked to their processing speed, and both have been improving exponentially over time, broadly following Moore's Law. The IBM 704, introduced by IBM in 1954, was the first mass-produced computer with floating point arithmetic hardware. It was capable of executing 40,000 instructions per second. Today's desktop and laptop computers can perform billions of instructions per second.
We have come a long way in our desire for speedier calculation. Supercomputers - the computers at the forefront of all computing machines - were designed to fulfill that desire. From the mysterious Antikythera mechanism to modern petaflop supercomputers, we have constantly been searching for ways to make computations faster. Today's supercomputers can perform a thousand trillion floating point operations per second, and they keep improving. Applications such as weather prediction or nuclear reaction simulation consist of enormous numbers of operations and may take more than a few days to complete. IBM, Cray, Hitachi, SGI, Fujitsu and many others have invested a large number of work-hours and millions of dollars to develop systems that solve these complex problems.
In the past, these systems were available only to government-funded national and international research centers, but recent advances in technology and competition in the technology industry have made these machines relatively cheaper and more valuable for IT organizations.
Past: Early Supercomputers
The CDC-6600, a mainframe computer released by Control Data Corporation in 1965, is regarded as the first successful supercomputer. Architected by Seymour Cray and Jim Thornton, the machine was capable of operating at 9 megaflops (MFLOPS). That is about a thousand times slower than current desktops. The CDC-6600 was also the first machine to introduce separate processors for housekeeping tasks such as memory access and input/output (I/O). Until then, the central processing unit (CPU) was in charge of performing all operations - computing, memory access and I/O. I/O in those days was usually done using punch cards or standard magnetic tapes and was extremely slow. With separate processors handling these tasks, the CPU was responsible only for computation and was thus left with far fewer instructions. This also resulted in a smaller processor, allowing it to be operated at a higher clock rate. This novel idea, now often regarded as a forerunner of the reduced instruction set computer (RISC), allowed the CPU, peripheral processors (PPs) and I/O units to operate in parallel, thus improving the overall speed and performance of the machine.
Cray's engineering work continued at CDC, resulting in the CDC-7600, the successor to the CDC-6600 and ten times faster. Jim Thornton, on the other hand, became part of a new project, the STAR-100, which was designed to operate at 100 MFLOPS. CDC's STAR-100 was released in 1974 and was one of the first machines to use a vector processor to improve math performance. The vector processing design allowed the CPU to perform a mathematical operation on multiple data elements simultaneously. The CPU thus had to decode only a single instruction, set up the hardware and start feeding in the data. This technique remained very popular in the scientific community and formed the basis of the design of several supercomputers in the 1980s and 1990s. However, the STAR-100, although designed to perform at 100 MFLOPS, delivered lower than expected numbers in 'real world' environments because the serial (scalar) part of the processing was still slow, and switching between vector and scalar work was time consuming. Gene Amdahl had formalized this limitation in 1967 in what is now known as Amdahl's Law, but it was ignored by the architects of the STAR-100.
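Amdahl's Law can be written as a one-line formula: if a fraction p of the run time can be accelerated by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The short Python sketch below evaluates it. The function name and the example fractions are illustrative assumptions, not measurements from the STAR-100.

```python
def amdahl_speedup(accelerated_fraction: float, factor: float) -> float:
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time that benefits
        (for example, the vectorizable part of a program), between 0 and 1.
    factor: how many times faster that share becomes.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / factor)


if __name__ == "__main__":
    # Hypothetical workloads: the vector unit is 10x faster in every case,
    # but the share of vectorizable work varies.
    for fraction in (0.5, 0.9, 0.99):
        print(f"{fraction:.0%} vectorizable -> {amdahl_speedup(fraction, 10.0):.2f}x")
```

Even with 90% of the work vectorized, a tenfold faster vector unit gives only a little over five times the overall speed, because the remaining serial part dominates the run time.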
In 1972, Seymour Cray, unable to secure sufficient funds for his project at CDC, left the company to form Cray Research, where he designed the Cray-1 (160 MFLOPS). The design of the Cray-1 struck a good balance between scalar and vector performance and also used registers to dramatically improve performance. Registers are small amounts of memory storage available on the processor itself. Their contents can be accessed much faster than data held in main memory or on external I/O components. But since they reside on the processor's chip, they are more expensive to manufacture. They also provide less flexibility in terms of size, and thus Cray's machine could only read small parts of the data at a time. The Cray-1 was first released in 1976 and displaced the STAR-100 from its top spot as the fastest supercomputer of that time. The first official customer, the National Center for Atmospheric Research (NCAR), paid $8.86 million to own the supercomputer. This machine shaped the computer industry for years to come. The Cray-1 was also Cray's first supercomputer to use integrated circuits (ICs).
The Cray-1 was succeeded in 1982 by the Cray X-MP (800 MFLOPS), Cray's first multiprocessor supercomputer, and in 1985 by the Cray-2, the first machine to break the gigaflops barrier at 1.9 GFLOPS. It was built entirely from IC components instead of individual components. The Cray-2 remained the fastest machine until 1987, when ETA Systems, a spin-off from CDC, designed a 10 GFLOPS machine called the ETA-10. The ETA-10 used fiber optics for communication between processors and I/O devices. ETA later merged back with CDC in 1989. In the meantime, two new companies - Thinking Machines Corporation (1982) and nCUBE (1983) - were founded. Both companies specialized in parallel computing architectures. Thinking Machines, started by graduates of the Massachusetts Institute of Technology, produced several supercomputers released as Connection Machines. By 1993, four of the five fastest supercomputers belonged to Thinking Machines. nCUBE, on the other hand, was started by a group of Intel employees who wanted Intel to enter parallel computing but could not convince the decision-makers. nCUBE released a parallel computer with the same name. In the mid-1990s the supercomputer market collapsed and both companies were acquired by bigger players in the business. The crash also forced Cray Research to merge with Silicon Graphics, Inc. (SGI) in 1996.
One big company not mentioned so far is IBM. Although IBM had by then built several of the world's fastest computers (such as the IBM 7030), it was not until 1993 that it entered the supercomputer market with the IBM SP-1. It was the first member of IBM's Scalable POWERparallel family of distributed memory parallel computers, based on the RISC System/6000 processing element, whose architecture later became known as POWER (Performance Optimization With Enhanced RISC). In a distributed memory system, the memory and address space of each processor in a multi-processor system are local to that processor. Data can be shared between processors only by passing messages, using an interface such as IBM's Message Passing Library (MPL). IBM went on to release several successors to the SP-1 and faced stiff competition from other players like Hitachi and Intel. At the turn of the century, IBM was at the top of the fastest supercomputer list with ASCI White, which had 8,192 processors, 6TB of memory and 160TB of storage space, and operated at 7.226 TFLOPS.
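The distributed memory idea can be illustrated with a toy simulation. The Python sketch below is not MPL or any real message passing API; the `Node` class and `distributed_sum` function are hypothetical names, and everything runs inside a single process. It only mimics the rule that a node never reads another node's memory and must instead send it a message.

```python
from collections import deque


class Node:
    """A simulated processor with private memory and an inbox for messages."""

    def __init__(self, rank, local_data):
        self.rank = rank
        self.local_data = list(local_data)  # private: other nodes never touch this
        self.inbox = deque()

    def send(self, other, message):
        # The only way to share data: place a message in the receiver's inbox.
        other.inbox.append((self.rank, message))

    def receive(self):
        # Take the oldest pending message as a (sender_rank, message) pair.
        return self.inbox.popleft()


def distributed_sum(chunks):
    """Sum data spread over several nodes by sending partial sums to node 0."""
    nodes = [Node(rank, chunk) for rank, chunk in enumerate(chunks)]
    root = nodes[0]
    # Every other node reduces its own local data and sends only the result.
    for node in nodes[1:]:
        node.send(root, sum(node.local_data))
    # The root combines its own partial sum with the received messages.
    total = sum(root.local_data)
    while root.inbox:
        _sender, partial = root.receive()
        total += partial
    return total


if __name__ == "__main__":
    print(distributed_sum([[1, 2], [3, 4], [5]]))
```

On a real machine each node would be a separate processor with its own physical memory, and the messages would travel over the interconnect, which is why the cost of communication matters as much as the cost of computation.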
Present: Supercomputers Today
In 1993, based on the ideas of Hans Meuer, a professor of Computer Science at the University of Mannheim, Germany, the TOP500 project was started. The aim of this project is to list the 500 most powerful computer systems in the world. The list, which is compiled twice a year, ranks supercomputers based on their performance on the LINPACK benchmark, which is derived from a linear algebra library for digital computers and tests the floating point computing power of a system. Table 1 shows the list of fastest supercomputers since 1993.
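A LINPACK-style measurement boils down to timing the solution of a dense system of linear equations and dividing a nominal operation count by the elapsed time. The sketch below shows the idea in plain Python. It is not the actual benchmark code used for the TOP500 list. The function names and the problem size are assumptions for illustration, and the operation count 2/3 n^3 + 2 n^2 is the figure conventionally quoted for this kind of solve. Pure Python is slow, so the number it reports says little about the hardware itself.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the diagonal in this column.
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution, from the last unknown up to the first.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def estimate_mflops(n=60, seed=0):
    """Time one random n x n solve and convert it to MFLOPS."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero reading
    operations = (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2
    return operations / elapsed / 1e6


if __name__ == "__main__":
    print(f"about {estimate_mflops():.2f} MFLOPS in pure Python")
```

Real benchmark runs use far larger matrices and heavily tuned numerical libraries, but the reported figure is still an operation count divided by a measured time.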
After IBM's ASCI White, the Earth Simulator, developed by NEC in Japan, topped the list from 2002 to 2004. It was built to run global climate models and was capable of operating at over 35 TFLOPS. IBM returned with BlueGene to reposition itself as the leader in building the fastest supercomputers. Several BlueGene designs were announced - BlueGene/L (released March 2005), BlueGene/C (in design), BlueGene/P (released June 2007) and BlueGene/Q (due 2011). BlueGene remained the fastest supercomputer until 2008, when it was replaced by RoadRunner, also designed by IBM. Other powerful supercomputers released during this period include Cray's XT-3 Red Storm, Cray's XT-4 Franklin, Cray's XT-5 Jaguar, Dell's Thunderbird, SGI's Columbia and HP's Cluster Platform.
According to the list released in November 2008, the top three supercomputers and their sustained LINPACK performance are:
- IBM's RoadRunner at Los Alamos National Laboratory, USA -- 1.105 PFLOPS
- Cray's Jaguar XT5 at Oak Ridge National Laboratory, USA -- 1.059 PFLOPS
- SGI's Pleiades Altix ICE 8200EX at NASA/Ames Research Center, USA -- 487.01 TFLOPS
| Period | Supercomputer Name | Maker |
| 06/1993 - 11/1993 | CM-5 (Connection Machine) | Thinking Machines Corp. |
| 11/1993 - 06/1994 | Numerical Wind Tunnel | Fujitsu |
| 06/1994 - 11/1994 | Paragon XP/S | Intel |
| 11/1994 - 06/1996 | Numerical Wind Tunnel | Fujitsu |
| 06/1996 - 11/1996 | SR 2201 | Hitachi |
| 11/1996 - 06/1997 | CP-PACS | Hitachi |
| 06/1997 - 11/2000 | ASCI Red | Intel |
| 11/2000 - 06/2002 | ASCI White | IBM |
| 06/2002 - 11/2004 | Earth Simulator | NEC |
| 11/2004 - 06/2008 | BlueGene | IBM |
| 06/2008 - 06/2009 | RoadRunner | IBM |
Table 1: Fastest Supercomputers (1993 - 2009)
IBM's RoadRunner
IBM's RoadRunner is a hybrid system. It uses two different processor architectures - dual-core AMD Opteron server processors based on the AMD64 architecture and IBM's Cell processors based on the POWER architecture. RoadRunner, which was built by IBM at Los Alamos National Laboratory in the United States, sports 6,562 Opteron processors, which take care of standard processing such as file system I/O, and 12,240 PowerXCell 8i processors, which handle CPU-intensive tasks like mathematical calculations. The system boasts 98TB of memory and 2PB of external storage, and has a peak performance of 1.7 PFLOPS. This design is significantly different from the BlueGene systems, which were based entirely on PowerPC processors. The idea behind BlueGene was to trade processor speed for lower power consumption. Those systems thus had a notably higher number of processors than other supercomputers delivering the same performance.
Cray's Jaguar XT-5
Cray's Jaguar XT-5, which was ranked second in November 2008, is an updated version of Cray's XT-4 supercomputer. It is based on AMD's quad-core Opteron processors. Each Cray XT-5 blade includes 4 compute nodes for high scalability, and each compute node can be configured with 4GB to 32GB of DDR2 memory. XT-5 blades are interconnected using Cray's SeaStar2+ chips, which provide a very high bi-directional link speed of 9.6GB/s. The system installed at Oak Ridge National Laboratory in the United States is a combination of XT-4 and XT-5 machines. In total, the system peaks at 1.6 PFLOPS and consists of 45,376 Opteron processors with 362TB of memory and 10PB of storage space.
Silicon Graphics' Altix
Altix differs from the two supercomputers above in that it is based on Intel processors and is a distributed shared memory machine. The system installed at NASA/Ames Research Center/NAS, nicknamed Pleiades, consists of 12,800 Intel Xeon processors with 51 TB of RAM and over 1 PB of storage. The system peaks at 608 TFLOPS. It supplements Columbia, which, with 14,336 cores and 51 TFLOPS, ranked second in 2004, just behind IBM's BlueGene/L. The nodes in Altix are connected using NUMAlink4, developed by SGI, which is capable of providing bandwidth of up to 6.4GB/s.
| Year | Accomplishments |
| 1962 | |
| 1965 | |
| 1969 | |
| 1972 | |
| 1976 | |
| 1982 | |
| 1985 | |
| 1987 | |
| 1992 | |
| 1993 | |
| 1997 | |
| 1999 | |
| 2000 | |
| 2002 | |
| 2004 | |
| 2006 | |
| 2007 | |
| 2008 | |
Table 2: Chronology of Supercomputers
Future: Supercomputer Next
Ordinary individuals do not need a supercomputer for their regular computing. Supercomputers are primarily a necessity for scientists performing massive computations at ultra-high speed. They are used in nearly every conceivable domain - space exploration, nuclear energy, climate prediction, environmental simulations, gene technology, maths and physics, and many others. Yet while supercomputers excel at highly computation-intensive tasks, they are not the fastest computers on the planet. The human brain, which controls thousands of muscles and nerves and performs audio and visual processing at extremely high speed, all in a fraction of a second, is regarded as the fastest processor in the world. Even 10 PFLOPS is too slow to simulate the whole body, including tissue, blood flow and movement. In size, heat and power consumption, a supercomputer is no match for the brain. This is an indication that we still have a long way to go.
Constructing a supercomputer is a very challenging and very expensive task. It may take several years for a supercomputer to move from the laboratory to the market, with costs in the range of $150 - $200 million or more. Most of this work can only be done with the support of government funds and government-funded research centers. Designers of the world's fastest supercomputers - IBM, Cray, SGI, Sun, HP, Hitachi and many others - are working to create multi-petaflop machines. IBM is planning a 50 PFLOPS machine by the end of 2013, and it is estimated that within the next decade we will have an exaflop machine. But the process of building faster machines is crippled by input/output units: I/O is not scaling as fast as Moore's Law. A lot of research is being conducted on improving the design and performance of parallel file systems, including the introduction of solid state drives. Other challenges include the search for experts in computational science, mathematics and computer science who can understand these complex systems and design software that takes advantage of the enormous computing power these supercomputers provide. As the future unfolds, it will be interesting to see what we accomplish next.
Acknowledgement
I would like to thank my advisor Dr. John A. Chandy, for providing inputs and sharing his experiences and opinions.
Biography
Sumit Narayan is a Ph.D. candidate at the University of Connecticut, Storrs, USA. He holds a Master's from the University of Connecticut and a Bachelor of Engineering from the University of Madras. His research interests include high performance computing, parallel file systems, storage system architectures and I/O subsystems.
