IBM Corp. is using a cluster of prototype blade servers built around its multicore Cell microprocessor to breathe life into a three-dimensional model of a beating human heart at the Cebit trade show in Hanover.
Visitors to IBM's stand can don a pair of special glasses to view the model of the heart in 3-D. They can turn it around and cut into it to view cross sections of their choice using a mouse, all while the heart continues to beat in real time. Visually impressive, the demonstration also shows how much work lies ahead if programmers are to exploit the full potential of the new chip architecture.
The model was created using PV-4D, a software tool for visualizing large quantities of data resulting from scientific computation or image processing. Researchers at Germany's Fraunhofer Institute originally developed PV-4D to run on clusters of commodity PCs, and recently ported it to the Cell processor architecture.
"It was optimized at a low level. That took three months," said Franz-Josef Pfreundt, division director of the institute's competence center for high-performance computing, who was at the IBM stand to explain the demonstration.
The optimization took that long because programming the Cell processor requires different skills from those needed to program other processors.
"There is a lack of people with that knowledge in doing parallel programming," Pfreundt said.
Each Cell chip contains one PowerPC processor core and eight specialized vector processors (or SPEs, synergistic processor elements, in IBM parlance). Each SPE has 256K bytes of local memory for code and data, and communicates with main memory at up to 25G bytes per second. The SPEs can also communicate with the PowerPC core or with one another through a 200G-byte-per-second bus.
"A single SPE is more powerful than the Power processor for 32-bit integer and floating-point calculation," according to Pfreundt.
Making use of that power, though, means dividing a problem into many small, independent, parallel tasks, according to Utz Bacher, an employee of IBM Germany who worked on porting the kernel of the Linux operating system to the Cell processor.
"To get the performance out of the chip, you will have to exploit the SPEs, and if you want to use the SPEs that means you have to somehow do threading," he said.
The SPEs use an all-new instruction set that differs from that of the main PowerPC core, so code for them must be built with a separate compiler.
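The kind of decomposition Bacher describes can be sketched in a few lines. The example below is Python, not Cell code. It cuts an array into eight independent slices, hands each to a worker, and combines the partial results. The worker count mirrors the eight SPEs, but the function names, the sum-of-squares kernel and the use of Python's thread pool are illustrative assumptions and come from neither IBM's tools nor PV-4D.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_WORKERS = 8  # stands in for the eight SPEs; purely illustrative


def split_into_tasks(data, num_tasks):
    """Cut data into num_tasks contiguous, non-overlapping slices."""
    size, extra = divmod(len(data), num_tasks)
    tasks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        tasks.append(data[start:end])
        start = end
    return tasks


def process_slice(values):
    """Hypothetical per-task kernel: sum of squares over one slice."""
    return sum(v * v for v in values)


def parallel_sum_of_squares(data, num_workers=NUM_WORKERS):
    """Run the kernel on every slice independently, then combine the results."""
    tasks = split_into_tasks(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_slice, tasks))
    return sum(partials)
```

Because no slice depends on another, the tasks can run in any order or all at once, which is the property that lets work be spread across the SPEs. In the standard Python interpreter, threads generally do not speed up this kind of arithmetic, so the sketch shows only the structure, not the performance.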
Programmers of microprocessors with more traditional architectures can use optimizing compilers to squeeze the most performance out of the hardware, using techniques such as automatically prefetching data from main memory and storing it in the processor's cache before it is needed.
For the Cell, that's still very much a manual process: "You need to explicitly take care of early fetching from main memory. There is no automatic fetch," Bacher said.
That means extra work for programmers, who will need to understand the limitations of Cell's memory architecture and include fetch instructions in their code rather than leaving it to the chip or the compiler.
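One common way to handle explicit fetching is double buffering: request the next block of data before starting work on the current one, so the transfer and the computation overlap. The sketch below shows the pattern in Python under stated assumptions. The `fetch` function merely copies a slice of a list and stands in for a real transfer from main memory, and the chunk size, function names and summing kernel are hypothetical. It is not code from the Cell toolchain or from the demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # elements per transfer; hypothetical, chosen for illustration


def fetch(main_memory, index):
    """Stand-in for an explicit transfer from main memory into local memory."""
    start = index * CHUNK
    return list(main_memory[start:start + CHUNK])


def compute(chunk):
    """Hypothetical kernel that works only on data already held locally."""
    return sum(chunk)


def process_with_prefetch(main_memory):
    """Process all chunks, always fetching chunk i+1 while computing chunk i."""
    num_chunks = (len(main_memory) + CHUNK - 1) // CHUNK
    if num_chunks == 0:
        return []
    results = []
    with ThreadPoolExecutor(max_workers=1) as mover:
        # Start the first transfer early, before any computation.
        pending = mover.submit(fetch, main_memory, 0)
        for i in range(num_chunks):
            current = pending.result()  # wait for the transfer to finish
            if i + 1 < num_chunks:
                # Request the next chunk before computing on the current one.
                pending = mover.submit(fetch, main_memory, i + 1)
            results.append(compute(current))
    return results
```

The programmer, not the hardware, decides when each transfer starts. Issue the fetch too late and the computation stalls waiting for data, which is the bookkeeping Bacher says Cell programmers currently have to do by hand.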
"Maybe the compiler can take care of that in the future," Bacher said. "IBM Research is looking into that and they have really interesting ideas on how to optimize that."
IBM plans to introduce the Cell blade server in the third quarter. It hasn't set a price yet.