When you looked at a chip in the past, you were primarily concerned with two things: its speed, usually expressed in GHz, and how much power it consumed. Today the IBM engineers preparing the newest POWER chip, the 14nm POWER9, are tuning it for the different workloads it might run, such as cognitive or cloud; for different deployment options, such as scale-up or scale-out; and for a host of other attributes. EE Times described the chip in late August from the Hot Chips conference, where it was publicly unveiled.
IBM POWER9 chip
IBM describes it as a chip family, but it may be best described as the product of an entire chip community, the OpenPOWER Foundation. Innovations include CAPI 2.0, New CAPI, Nvidia’s NVLink 2.0, PCIe Gen4, and more. It spans a range of acceleration options from HSDC clusters to extreme virtualization capabilities for the cloud. POWER9 is not just about high speed transaction processing; IBM wants the chip to interpret and reason, ingest and analyze.
POWER has gone far beyond the POWER chips that enabled Watson to (barely) beat the human Jeopardy champions. Going forward, IBM is counting on POWER9 and Watson to excel at cognitive computing, a combination of high speed analytics and self-learning. POWER9 systems should not only be lightning fast but get smarter with each new transaction.
For z System shops, POWER9 offers a glimpse into the design thinking IBM might follow with the next mainframe, probably the z14, which will need comparable performance and flexibility. IBM has already set up the Open Mainframe Project, which hasn’t delivered much yet but is still young. It took the OpenPOWER group a couple of years to deliver meaningful innovations. Stay tuned.
The POWER9 chip is incredibly dense (below). You can deploy it in either a scale-up or a scale-out architecture: one version targets two-socket servers with 8 DDR4 ports, and another targets servers with multiple chips and buffered DIMMs.
IBM POWER9 silicon layout
IBM describes the POWER9 as a premier acceleration platform. That means it offers extreme processor/accelerator bandwidth and reduced latency; coherent memory and virtual addressing capability for all accelerators; and robust accelerated compute options through the OpenPOWER community.
It includes state-of-the-art I/O and acceleration attachment signaling:
- PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
- 25G Link x 48 lanes – 300 GB/s duplex bandwidth
And robust accelerated compute options based on open standards, including:
- On-Chip Acceleration—Gzip x1, 842 Compression x2, AES/SHA x2
- CAPI 2.0—4x bandwidth of POWER8 using PCIe Gen 4
- NVLink 2.0—next generation of GPU/CPU bandwidth and integration using 25G Link
- New CAPI—high bandwidth, low latency and open interface using 25G Link
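The two I/O bandwidth figures above follow from simple lane arithmetic. Here is a minimal sketch, assuming the quoted numbers are raw signaling rates with no encoding overhead subtracted (16 Gbit/s per lane for PCIe Gen 4, 25 Gbit/s per lane for the 25G Link). The function and constant names are made up for illustration.

```python
# Sketch only: reproduces the duplex bandwidth figures from lane counts.
# Assumption: raw per-lane signaling rate, ignoring encoding/protocol overhead.

PCIE_GEN4_GBITS_PER_LANE = 16.0  # PCIe Gen 4 signals at 16 GT/s per lane
LINK_25G_GBITS_PER_LANE = 25.0   # the "25G Link" in the list above


def duplex_bandwidth_gb_s(lanes: int, gbits_per_lane: float) -> float:
    """Return raw duplex bandwidth in GB/s (both directions added together)."""
    one_way_gb_s = lanes * gbits_per_lane / 8  # 8 bits per byte
    return 2 * one_way_gb_s


pcie_duplex = duplex_bandwidth_gb_s(48, PCIE_GEN4_GBITS_PER_LANE)
link25_duplex = duplex_bandwidth_gb_s(48, LINK_25G_GBITS_PER_LANE)

print(f"PCIe Gen 4 x 48 lanes: {pcie_duplex:.0f} GB/s duplex")   # 192
print(f"25G Link x 48 lanes:   {link25_duplex:.0f} GB/s duplex")  # 300
```

Real-world throughput will come in lower once encoding and protocol overhead are counted, so treat these as ceilings.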
In scale-out mode it employs direct attached memory through 8 direct DDR4 ports, which deliver:
- Up to 120 GB/s of sustained bandwidth
- Low latency access
- Commodity packaging form factor
- Adaptive 64B / 128B reads
In scale-up mode it uses buffered memory through 8 buffered channels to provide:
- Up to 230 GB/s of sustained bandwidth
- Extreme capacity – up to 8TB / socket
- Superior RAS with chip kill and lane sparing
- Compatible with POWER8 system memory
- Agnostic interface for alternate memory innovations
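To put the two memory modes side by side, here is a rough back-of-the-envelope sketch. It assumes the "up to" bandwidth is spread evenly across the 8 ports or channels, which the published figures do not actually state; the dictionary layout and names are illustrative only.

```python
# Sketch only: compares the two memory attach modes using the quoted figures.
# Assumption: sustained bandwidth divides evenly across the 8 ports/channels.

MEMORY_MODES = {
    "scale-out (direct DDR4)": {"ports": 8, "sustained_gb_s": 120},
    "scale-up (buffered)": {"ports": 8, "sustained_gb_s": 230},
}


def per_port_gb_s(mode: dict) -> float:
    """Average sustained bandwidth per port or channel, in GB/s."""
    return mode["sustained_gb_s"] / mode["ports"]


per_port = {name: per_port_gb_s(mode) for name, mode in MEMORY_MODES.items()}
ratio = (MEMORY_MODES["scale-up (buffered)"]["sustained_gb_s"]
         / MEMORY_MODES["scale-out (direct DDR4)"]["sustained_gb_s"])

for name, gb_s in per_port.items():
    print(f"{name}: about {gb_s:.2f} GB/s per port")
print(f"buffered vs. direct: about {ratio:.2f}x the sustained bandwidth")
```

On these numbers the buffered option offers a little under twice the sustained bandwidth of direct-attached DDR4, which is the trade being made against commodity packaging.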
POWER9 was publicly introduced at the Hot Chips conference in late August. Commentators writing in EE Times noted that POWER9 could become a breakout chip, seeding new OEM and accelerator partners and rejuvenating IBM’s efforts against Intel in high-end servers. To achieve that kind of performance IBM deploys large chunks of memory, including 120 Mbytes of embedded DRAM in a shared L3 cache, while riding a 7 Tbit/second on-chip fabric. POWER9 should deliver as much as 2x the performance of the POWER8 or more when the new chip arrives next year, according to Brian Thompto, a lead architect for the chip, in published reports.
IBM will release four versions of POWER9. Two will use eight threads per core and 12 cores per chip, geared for IBM’s Power virtualization environment; two will use four threads per core and 24 cores per chip, targeting Linux. Each pair will come in two versions: one for two-socket servers with 8 DDR4 ports and another for multiple chips per server with buffered DIMMs.
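The four versions are really a two-by-two grid: two core layouts crossed with two memory attach options. A small sketch of that grid follows, assuming nothing beyond the numbers quoted here; the class and field names are invented for illustration and are not IBM product names.

```python
# Sketch only: enumerates the four POWER9 variants described in the text.
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CoreLayout:
    threads_per_core: int
    cores_per_chip: int
    target: str


@dataclass(frozen=True)
class MemoryAttach:
    server_type: str
    memory: str


CORE_LAYOUTS = [
    CoreLayout(8, 12, "IBM Power virtualization environment"),
    CoreLayout(4, 24, "Linux"),
]
MEMORY_OPTIONS = [
    MemoryAttach("two-socket servers", "8 DDR4 ports"),
    MemoryAttach("multiple chips per server", "buffered DIMMs"),
]

variants = list(product(CORE_LAYOUTS, MEMORY_OPTIONS))
threads_per_chip = [c.threads_per_core * c.cores_per_chip for c, _ in variants]

for (core, mem), threads in zip(variants, threads_per_chip):
    print(f"{core.threads_per_core} threads x {core.cores_per_chip} cores "
          f"= {threads} threads/chip | {core.target} | "
          f"{mem.server_type}, {mem.memory}")
```

Either core layout works out to the same 96 hardware threads per chip, so the choice is about how the threads are grouped rather than how many there are.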
The diversity of choices, according to Hot Chips observers, could help attract OEMs. IBM has been trying to encourage others to build POWER systems through its OpenPOWER group, which now sports more than 200 members. So far, it is gaining the most interest from China, where one partner plans to make its own POWER chips. The use of standard DDR4 DIMMs on some parts will lower barriers for OEMs by enabling commodity packaging and lower costs.
DancingDinosaur is Alan Radding, a veteran information technology analyst and writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com.