Posts Tagged ‘CAPI 2.0’

Is Your Enterprise Ready for AI?

May 11, 2018

According to IBM’s gospel of AI “we are in the midst of a global transformation and it is touching every aspect of our world, our lives, and our businesses.” IBM has been preaching its gospel of AI for the past year or longer, but most of its clients haven’t jumped fully aboard. “For most of our clients, AI will be a journey. This is demonstrated by the fact that most organizations are still in the early phases of AI adoption.”

AC922 with NVIDIA Tesla V100 and Enhanced NVLink GPUs

The company’s latest announcements earlier this week focus POWER9 squarely on AI. Said Tim Burke, Engineering Vice President, Cloud and Operating System Infrastructure, at Red Hat: “POWER9-based servers, running Red Hat’s leading open technologies offer a more stable and performance optimized foundation for machine learning and AI frameworks, which is required for production deployments… including PowerAI, IBM’s software platform for deep learning with IBM Power Systems that includes popular frameworks like Tensorflow and Caffe, as the first commercially supported AI software offering for [the Red Hat] platform.”

IBM insists this is not just about POWER9, and they may have a point; GPUs and other assist processors are taking on more importance as companies try to emulate the hyperscalers in their efforts to drive server efficiency while boosting power in the wake of declines in Moore’s Law. “GPUs are at the foundation of major advances in AI and deep learning around the world,” said Paresh Kharya, group product marketing manager of Accelerated Computing at NVIDIA. [Through] “the tight integration of IBM POWER9 processors and NVIDIA V100 GPUs made possible by NVIDIA NVLink, enterprises can experience incredible increases in performance for compute-intensive workloads.”

To create an AI-optimized infrastructure, IBM announced the latest additions to its POWER9 lineup, the IBM Power Systems LC922 and LC921. IBM characterizes them as balanced servers offering both compute capabilities and up to 120 terabytes of data storage, plus NVMe for rapid access to vast amounts of data. IBM included HDD in the announcement, but any serious AI workload will choke without ample SSD.

The announcement also brings an updated version of the AC922 server, which now features the recently announced 32GB NVIDIA V100 GPUs and larger system memory, enabling bigger deep learning models that improve the accuracy of AI workloads.

IBM has characterized the new models, the LC922 and LC921 servers with POWER9 processors, as data-intensive machines and AI-intensive systems. The AC922 arrived last fall. It was designed for what IBM calls the post-CPU era. The AC922 was the first to embed PCI-Express 4.0, next-generation NVIDIA NVLink, and OpenCAPI (three interface accelerators), which together can move data 9.5x faster than PCIe 3.0-based x86 systems. The AC922 was designed to drive demonstrable performance improvements across popular AI frameworks such as TensorFlow and Caffe.

In the post-CPU era, where Moore’s Law no longer rules, you need to pay as much attention to the GPU and other assist processors as to the CPU itself, maybe even more so. For example, the coherence and high speed of NVLink enable hash tables, critical for fast analytics, on GPUs. As IBM noted at the introduction of the new machines this week: Hash tables are a fundamental data structure for analytics over large datasets. For this you need large memory: small GPU memory limits hash table size and analytic performance. The CPU-GPU NVLink2 solves two key problems: large memory and high speed enable storing the full hash table in CPU memory and transferring pieces to the GPU for fast operations; coherence enables new inserts in CPU memory to get updated in GPU memory. Otherwise, modifications to data in CPU memory do not get updated in GPU memory.
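
The "full table in CPU memory, pieces on the GPU" idea can be sketched in plain Python. This is a toy model under stated assumptions, not IBM or NVIDIA code: no GPU is involved, and GPU_CAPACITY, pieces, and probe_in_pieces are hypothetical names chosen for illustration. Coherence is not modeled here; the sketch only shows why a small device memory forces the table to be processed one piece at a time.

```python
from itertools import islice

# Hypothetical limit: the most entries the "GPU" can hold at once.
GPU_CAPACITY = 4


def pieces(table, capacity):
    """Yield successive sub-dicts of `table` with at most `capacity` entries."""
    items = iter(table.items())
    while True:
        chunk = dict(islice(items, capacity))
        if not chunk:
            return
        yield chunk


def probe_in_pieces(table, probe_keys, capacity=GPU_CAPACITY):
    """Look up probe_keys against the full table, one piece at a time."""
    matches = {}
    for piece in pieces(table, capacity):  # stands in for a CPU-to-GPU transfer
        for key in probe_keys:
            if key in piece:
                matches[key] = piece[key]
    return matches


# The full hash table lives in (large) CPU memory.
cpu_table = {f"cust{i}": i * 10 for i in range(10)}
result = probe_in_pieces(cpu_table, ["cust2", "cust7", "missing"])
print(result)
```

Each pass holds no more than GPU_CAPACITY entries, so the table can be far larger than the device memory, at the cost of one transfer per piece. That per-piece transfer is where a fast CPU-GPU link matters.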

IBM has started referring to the LC922 and LC921 as big data crushers. The LC921 brings 2 POWER9 sockets in a 1U form factor; for I/O it comes with both PCIe 4.0 and CAPI 2.0; and it offers up to 40 cores (160 threads) and 2TB RAM, which is ideal for environments requiring dense computing.

The LC922 is considerably bigger. It offers balanced compute capabilities delivered with the POWER9 processor and up to 120TB of storage capacity, again with advanced I/O through PCIe 4.0/CAPI 2.0, and up to 44 cores (176 threads) and 2TB RAM. The list price, notes IBM, is ~30% less.
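
The core and thread counts quoted for the two servers imply four hardware threads per core. A quick check in Python, using only the figures above (the dictionary layout is just an illustration):

```python
# Maximum core and thread counts as quoted for each server.
servers = {
    "LC921": {"cores": 40, "threads": 160},
    "LC922": {"cores": 44, "threads": 176},
}

# Threads per core follows directly from the two quoted numbers.
threads_per_core = {
    name: spec["threads"] // spec["cores"] for name, spec in servers.items()
}
print(threads_per_core)
```

Both machines come out at the same ratio, so the LC922's extra threads come from its four additional cores rather than from a different threading mode.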

If your organization is not thinking about AI, it is probably in the minority, according to IDC.

  • 31 percent of organizations are in [AI] discovery/evaluation
  • 22 percent of organizations plan to implement AI in the next 1-2 years
  • 22 percent of organizations are running AI trials
  • 4 percent of organizations have already deployed AI
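
Adding up the IDC figures shows why the disengaged are the minority. The sketch below assumes the four categories do not overlap, which the figures as quoted here do not confirm; the category labels are shortened for illustration.

```python
# IDC shares as listed above, in percent of organizations.
idc_shares = {
    "discovery/evaluation": 31,
    "plan to implement in 1-2 years": 22,
    "running trials": 22,
    "already deployed": 4,
}

# Assumption: the categories are mutually exclusive, so they can be summed.
engaged = sum(idc_shares.values())
not_engaged = 100 - engaged
print(engaged, not_engaged)
```

Under that assumption, roughly four in five organizations are somewhere on the AI path, even though only a small share have actually deployed.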

Underpinning both servers is the IBM POWER9 CPU. The POWER9 enjoys nearly 5.6x better CPU-to-GPU bandwidth than x86, which can improve deep learning training times by nearly 4x. Even today companies are struggling to cobble together the different pieces and make them work. IBM learned that lesson and now offers a unified AI infrastructure in PowerAI and POWER9 that you can use today.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

Meet the POWER9 Chip Family

September 2, 2016

When you looked at a chip in the past you primarily were concerned with two things: the speed of the chip, usually expressed in GHz, and how much power it consumed. Today the IBM engineers preparing the newest POWER chip, the 14nm POWER9, are tweaking the chip for the different workloads it might run, such as cognitive or cloud, and different deployment options, such as scale-up or scale-out, and a host of other attributes. EE Times described it in late August from the Hot Chips conference where it was publicly unveiled.

IBM POWER9 chip

IBM describes it as a chip family but maybe it’s best described as the product of an entire chip community, the OpenPOWER Foundation. Innovations include CAPI 2.0, New CAPI, Nvidia’s NVLink 2.0, PCIe Gen4, and more. It spans a range of acceleration options from HSDC clusters to extreme virtualization capabilities for the cloud. POWER9 is not just about high speed transaction processing; IBM wants the chip to interpret and reason, ingest and analyze.

POWER has gone far beyond the POWER chips that enabled Watson to (barely) beat the human Jeopardy champions. Going forward, IBM is counting on POWER9 and Watson to excel at cognitive computing, a combination of high speed analytics and self-learning. POWER9 systems should not only be lightning fast but get smarter with each new transaction.

For z System shops, POWER9 offers a glimpse into the design thinking IBM might follow with the next mainframe, probably the z14, which will need comparable performance and flexibility. IBM already has set up the Open Mainframe Project, which hasn’t delivered much yet but is still young. It took the OpenPOWER group a couple of years to deliver meaningful innovations. Stay tuned.

The POWER9 chip is incredibly dense (below). You can deploy it as either a scale-up or scale-out architecture. You have a choice of one version for two-socket servers with 8 DDR4 ports and another for multiple chips per server with buffered DIMMs.

IBM POWER9 silicon layout

IBM describes the POWER9 as a premier acceleration platform. That means it offers extreme processor/accelerator bandwidth and reduced latency; coherent memory and virtual addressing capability for all accelerators; and robust accelerated compute options through the OpenPOWER community.

It includes state-of-the-art I/O and acceleration attachment signaling:

  • PCIe Gen 4 x 48 lanes – 192 GB/s duplex bandwidth
  • 25G Link x 48 lanes – 300 GB/s duplex bandwidth
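
Both duplex figures follow from lane count times per-lane rate times two directions. The sketch below assumes raw signaling rates of 16 Gbit/s per PCIe Gen 4 lane and 25 Gbit/s per 25G Link lane, and it ignores encoding and protocol overhead; duplex_gb_per_s is a hypothetical helper name.

```python
def duplex_gb_per_s(lanes, gbit_per_lane):
    """Raw duplex bandwidth in GB/s: lanes x per-lane rate x 2 directions."""
    gbyte_per_lane = gbit_per_lane / 8  # bits to bytes
    return lanes * gbyte_per_lane * 2


# Assumed raw per-lane rates; real payload throughput is somewhat lower.
pcie_gen4 = duplex_gb_per_s(lanes=48, gbit_per_lane=16)
link_25g = duplex_gb_per_s(lanes=48, gbit_per_lane=25)
print(pcie_gen4, link_25g)
```

With the same 48 lanes, the 25G Link gains its advantage purely from the higher per-lane rate.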

And robust accelerated compute options based on open standards, including:

  • On-Chip Acceleration—Gzip x1, 842 Compression x2, AES/SHA x2
  • CAPI 2.0—4x bandwidth of POWER8 using PCIe Gen 4
  • NVLink 2.0—next generation of GPU/CPU bandwidth and integration using 25G Link
  • New CAPI—high bandwidth, low latency and open interface using 25G Link

In scale-out mode it employs direct attached memory through 8 direct DDR4 ports, which deliver:

  • Up to 120 GB/s of sustained bandwidth
  • Low latency access
  • Commodity packaging form factor
  • Adaptive 64B / 128B reads

In scale-up mode it uses buffered memory through 8 buffered channels to provide:

  • Up to 230 GB/s of sustained bandwidth
  • Extreme capacity – up to 8TB / socket
  • Superior RAS with chip kill and lane sparing
  • Compatible with POWER8 system memory
  • Agnostic interface for alternate memory innovations
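
For a rough sense of scale, the two sustained-bandwidth figures can be divided across the eight ports or channels each mode uses. This assumes an even split, which is a simplification, and both figures are quoted as "up to" ceilings rather than guaranteed rates.

```python
# Sustained bandwidth ceilings and channel counts as quoted above.
memory_modes = {
    "scale-out (direct DDR4)": {"channels": 8, "sustained_gb_s": 120},
    "scale-up (buffered)": {"channels": 8, "sustained_gb_s": 230},
}

# Assumption: bandwidth is spread evenly over the channels.
per_channel = {
    mode: spec["sustained_gb_s"] / spec["channels"]
    for mode, spec in memory_modes.items()
}
print(per_channel)
```

On these numbers the buffered design offers close to twice the per-channel bandwidth of direct-attached DDR4, in exchange for giving up the commodity packaging.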

POWER9 was publicly introduced at the Hot Chips conference in late August. Commentators writing in EE Times noted that POWER9 could become a breakout chip, seeding new OEM and accelerator partners and rejuvenating IBM’s efforts against Intel in high-end servers. To achieve that kind of performance IBM deploys large chunks of memory, including a 120 Mbyte embedded DRAM in shared L3 cache, while riding a 7 Tbit/second on-chip fabric. POWER9 should deliver as much as 2x the performance of the POWER8 or more when the new chip arrives next year, according to Brian Thompto, a lead architect for the chip, in published reports.

As noted above, IBM will release four versions of POWER9. Two will use eight threads per core and 12 cores per chip, geared for IBM’s Power virtualization environment; two will use four threads per core and 24 cores per chip, targeting Linux. Both will come in two versions: one for two-socket servers with 8 DDR4 ports and another for multiple chips per server with buffered DIMMs.
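
The four versions are simply two core configurations crossed with two memory attachments. The enumeration below uses only the figures in the paragraph above; the labels "SMT8" and "SMT4" and the dictionary layout are shorthand chosen for this sketch.

```python
from itertools import product

# Two core configurations, as described above.
core_configs = {
    "SMT8": {"threads_per_core": 8, "cores": 12, "target": "Power virtualization"},
    "SMT4": {"threads_per_core": 4, "cores": 24, "target": "Linux"},
}

# Two memory attachments, as described above.
memory_configs = ["8 direct DDR4 ports (two-socket)", "buffered DIMMs (multi-chip)"]

variants = [
    {
        "core": core,
        "memory": memory,
        "target": spec["target"],
        "threads_per_chip": spec["threads_per_core"] * spec["cores"],
    }
    for (core, spec), memory in product(core_configs.items(), memory_configs)
]

for variant in variants:
    print(variant)
```

Either core configuration works out to the same number of hardware threads per chip, so the choice is about how those threads are grouped into cores rather than about total thread count.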

The diversity of choices, according to Hot Chips observers, could help attract OEMs. IBM has been trying to encourage others to build POWER systems through its OpenPOWER group, which now sports more than 200 members. So far, it’s gaining the most interest from China, where one partner plans to make its own POWER chips. The use of standard DDR4 DIMMs on some parts will lower barriers for OEMs by enabling commodity packaging and lower costs.


 

 

 

