IBM’s POWER8 Coherent Accelerator Processor Interface (CAPI) is not for every IT shop running Power Systems. However, for those that aim to attach devices to their POWER8 systems over the PCIe interface and want fast, efficient performance, CAPI will be unbeatable. Steve Fields, IBM Distinguished Engineer and Director of Power Systems Design, introduces it here. Some of it gets pretty geeky, but slides #12-17 make the key points.
DancingDinosaur first covered CAPI here, in April, shortly after its introduction. At that point it looked like CAPI would be a game changer, and nothing since suggests otherwise. As we described it then, CAPI sits directly on the POWER8 board and works with the same memory addresses that the processor uses. Pointers dereference the same way they do in the host application. CAPI, in effect, removes OS and device driver overhead by presenting an efficient, robust, durable and, most importantly, direct interface. In the process, it offloads complexity.
In short, CAPI:
- Transports the SMP coherence protocol over the PCI Express interface
- Provides isolation and filtering through a support unit in the processor, the Coherent Accelerator Processor Proxy (“CAPP”)
- Manages caching and address translation through the standard POWER Service Layer (PSL) in the accelerator device
- Enables accelerator Functional Units to operate as part of the application at the user (direct) level, just like a CPU
What you end up with is a coherently connected accelerator for just a fraction of the development effort otherwise required. As such, CAPI enables more efficient accelerator development. It can reduce the typical seven-step I/O model flow (1-Device Driver Call, 2-Copy or Pin Source Data, 3-MMIO Notify Accelerator, 4-Acceleration, 5-Poll/Int Completion, 6-Copy or Unpin Result Data, 7-Return From Device Driver Completion) to just three steps (1-shared memory/notify accelerator, 2-acceleration, and 3-shared memory completion). The result is an easier, more natural programming model with traditional thread-level programming and no need to restructure the application to accommodate long-latency I/O. Finally, it enables apps that would otherwise not be possible, such as those requiring pointer chasing (e.g. Java garbage collection).
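To make the three-step flow a little more concrete, here is a toy model in Python. It is not the CAPI programming interface: an ordinary thread stands in for the accelerator, and the names `Node`, `WorkElement`, `accelerator`, and `run_shared_memory_flow` are hypothetical, invented purely for illustration. The point it tries to show is that the "accelerator" follows the application's own pointers through a linked list and writes its result back into shared memory, with no copy step and no device driver call in between.

```python
import threading

# Step names taken from the text above, kept only for comparison.
TRADITIONAL_STEPS = (
    "device driver call", "copy or pin source data", "MMIO notify accelerator",
    "acceleration", "poll/interrupt completion", "copy or unpin result data",
    "return from device driver completion",
)
SHARED_MEMORY_STEPS = (
    "shared memory/notify accelerator", "acceleration", "shared memory completion",
)


class Node:
    """A linked-list node living in the application's own memory."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class WorkElement:
    """Hypothetical work descriptor shared by application and accelerator."""

    def __init__(self, head):
        self.head = head               # pointer the accelerator dereferences directly
        self.result = None             # filled in by the accelerator
        self.done = threading.Event()  # stands in for a completion flag in shared memory


def accelerator(work):
    """Step 2, acceleration: chase pointers through the caller's data in place."""
    total, node = 0, work.head
    while node is not None:
        total += node.value
        node = node.next
    work.result = total
    work.done.set()                    # Step 3, shared memory completion


def run_shared_memory_flow(values):
    """Build a linked list, hand its head to the stand-in accelerator, wait for the result."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    work = WorkElement(head)
    # Step 1, shared memory/notify accelerator: pass a pointer, copy nothing.
    worker = threading.Thread(target=accelerator, args=(work,))
    worker.start()
    work.done.wait()
    worker.join()
    return work.result
```

Calling `run_shared_memory_flow([1, 2, 3, 4])` sums the list on the stand-in accelerator thread. In a traditional flow the linked list would first have to be flattened and copied into a buffer the device can reach, which is why pointer-chasing workloads are such an awkward fit for it.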
Other advantages include an open ecosystem for accelerators built using Field Programmable Gate Arrays (FPGAs). The number and size of FPGAs can be based on application requirements, and FPGAs can attach to other components, such as private DRAM, flash memory, or a high-speed network.
Driving the need for CAPI is the insatiable demand for performance. Meeting that demand requires acceleration, and accelerators are complicated and resource-intensive to build. So IBM created CAPI, not just for pure compute but for any network-attached or storage-attached I/O. In the end it eliminates the overhead of the I/O subsystem, allowing the focus to be on the workload.
In one example, IBM reported it was able to attach an IBM Flash appliance to POWER8 via the CAPI interface. As a result it could generate Read/Write commands directly from applications and eliminate 97% of the code path length, a savings of 20-30 cores per 1M IOPS. In another test, IBM reported being able to use CAPI to integrate flash into a server; the memory-like semantics allowed the flash to replace DRAM for many in-memory workloads. The result: 5x cost savings plus large density and energy improvements. Furthermore, eliminating the I/O subsystem overhead from high-IOPS flash access freed the CPU to focus on the application workload.
Finally, in a Monte Carlo simulation of 1 million iterations, a POWER8 core with FPGA and CAPI ran a full execution of the Heston pricing model for a single security 250x faster than the POWER8 core alone. It also proved easier to code, cutting the amount of C code that had to be written by a factor of 40 compared with a non-CAPI FPGA implementation.
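For readers unfamiliar with the workload, here is a minimal CPU-only sketch of a Heston Monte Carlo run in Python. It is not IBM's benchmark code. It assumes the security being priced is a European call option, uses a simple Euler scheme with full truncation (negative variance is treated as zero), and the function name `heston_call_price` and all parameter values are illustrative choices, not taken from IBM's test. It mainly shows why the job suits an accelerator: every path is an independent loop of simple arithmetic.

```python
import math
import random


def heston_call_price(s0, strike, rate, maturity, v0, kappa, theta, xi, rho,
                      n_paths=2000, n_steps=50, seed=42):
    """Estimate a European call price under the Heston model by Monte Carlo.

    s0: spot price, v0: initial variance, kappa: mean-reversion speed,
    theta: long-run variance, xi: volatility of variance, rho: correlation
    between the price shock and the variance shock.
    """
    rng = random.Random(seed)              # seeded so runs are repeatable
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_comp = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s, v = math.log(s0), v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)                        # price shock
            z2 = rho * z1 + rho_comp * rng.gauss(0.0, 1.0)  # correlated variance shock
            v_pos = max(v, 0.0)                             # full truncation
            sqrt_v = math.sqrt(v_pos)
            log_s += (rate - 0.5 * v_pos) * dt + sqrt_v * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * sqrt_v * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths
```

A call such as `heston_call_price(100, 100, 0.05, 1.0, 0.04, 2.0, 0.04, 0.3, -0.7)` runs 2,000 paths of 50 steps each. Scaling `n_paths` to 1 million, as in the IBM test, multiplies the work by 500, which is the kind of embarrassingly parallel load an FPGA can spread across many pipelines.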
IBM is just getting started with CAPI. Coming up next will be CAPI working with Linux, mainly for use with analytics. Once Linux comes into the picture, expect more PCIe card vendors to deliver products that leverage CAPI. AIX support is expected to follow down the road.
Plan to attend IBM Enterprise2014 in Las Vegas, Oct. 6-10. Here is one intriguing CAPI presentation that will be there: Light up performance of your LAMP apps with a stack optimized for Power, by Alise Spence, Andi Gutmans, and Antonio Rosales. It will discuss how to leverage CAPI with POWER8 to create what they call a “killer stack” that brings together continuous delivery with exceptional performance at a competitive price. Other CAPI sessions are also in the works for Enterprise2014.
DancingDinosaur (Alan Radding) definitely is attending IBM Enterprise2014. You can follow DancingDinosaur on Twitter, @mainframeblog, or check out Technologywriter.com. Upcoming posts will look more closely at Enterprise2014 and explore some session content.