Posts Tagged ‘accelerators’

OpenPOWER-Open Compute-POWER9 at Open Compute Summit

March 16, 2017

Bryan Talik, President of the OpenPOWER Foundation, provides a detailed rundown on the action at the Open Compute Summit held last week in Santa Clara. After weeks of writing about cognitive computing, machine learning, blockchain, and even quantum computing, it is a nice shift to conventional computing platforms, which should still be viewed as strategic initiatives.

The OpenPOWER, Open Compute gospel was filling the air in Santa Clara. As reported, Andy Walsh, Xilinx Director of Strategic Market Development and OpenPOWER Foundation Board member, explained, “We very much support open standards and the broad innovation they foster. Open Compute and OpenPOWER are catalysts in enabling new data center capabilities in computing, storage, and networking.”

Added Adam Smith, CEO of Alpha Data:  “Open standards and communities lead to rapid innovation…We are proud to support the latest advances of OpenPOWER accelerator technology featuring Xilinx FPGAs.”

John Zannos of Canonical, the OpenPOWER Board Chair, chimed in: For 2017, the OpenPOWER Board approved four areas of focus: machine learning/AI, database and analytics, cloud applications, and containers. The strategy for 2017 also includes plans to extend OpenPOWER’s reach worldwide and promote technical innovations at various academic labs and in industry. Finally, the group plans to open additional application-oriented workgroups to further technical solutions that benefit specific application areas.

Not surprisingly, some members even see collaboration as the key to satisfying the performance demands of the computing market. “The computing industry is at an inflection point between conventional processing and specialized processing,” according to Aaron Sullivan, distinguished engineer at Rackspace.

To satisfy this shift, Rackspace and Google announced an OCP-OpenPOWER server platform last year, codenamed Zaius and Barreleye G2. It is based on POWER9. At the OCP Summit, both companies put the two products on public display.

This server platform promises to meet the performance, bandwidth, and power consumption demands of emerging applications that leverage machine learning, cognitive systems, real-time analytics, and big data platforms. The OCP players plan to continue their work alongside Google, OpenPOWER, OpenCAPI, and other Zaius project members.

The emerging applications that the Zaius and Barreleye G2 platforms target rely on none other than the strategic imperatives–cognitive, machine learning, real-time analytics–that IBM has been repeating like a mantra for months.

Open Compute Projects also were displayed at the Summit. Specifically, as reported, Google and Rackspace published the Zaius specification to Open Compute in October 2016 and had engineers on hand to explain the specification process and to give attendees a starting point for their own server designs.

Other Open Compute members reportedly were there as well. Inventec showed a POWER9 OpenPOWER server based on the Zaius server specification. Mellanox showcased ConnectX-5, its next-generation networking adaptor that features 100Gb/s InfiniBand and Ethernet. This adaptor supports PCIe Gen4 and CAPI2.0, providing higher performance than PCIe Gen3 and a coherent connection to the POWER9 processor.
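
For a rough sense of why PCIe Gen4 matters to a 100Gb/s adaptor, here is a back-of-the-envelope sketch. The per-lane rates (8 GT/s for Gen3, 16 GT/s for Gen4) and the 128b/130b encoding come from the PCIe specifications; the x16 slot width and the function names are assumptions made for illustration and say nothing about how any particular card is wired.

```python
# Per-lane signaling rates (GT/s) and 128b/130b line encoding per the PCIe specs.
GIGATRANSFERS_PER_LANE = {"Gen3": 8.0, "Gen4": 16.0}
ENCODING_EFFICIENCY = 128 / 130


def pcie_bandwidth_gbytes(gen, lanes=16):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead.

    The default of 16 lanes is an assumption for illustration.
    """
    gbits = GIGATRANSFERS_PER_LANE[gen] * ENCODING_EFFICIENCY * lanes
    return gbits / 8


def ports_at_line_rate(gen, port_gbits=100, lanes=16):
    """How many network ports of the given speed the slot could feed at full rate."""
    slot_gbits = pcie_bandwidth_gbytes(gen, lanes) * 8
    return int(slot_gbits // port_gbits)


if __name__ == "__main__":
    for gen in ("Gen3", "Gen4"):
        print(gen, round(pcie_bandwidth_gbytes(gen), 2), "GB/s,",
              ports_at_line_rate(gen), "x 100Gb/s port(s)")
```

Under these assumptions a Gen3 x16 slot has headroom for only one 100Gb/s port at line rate, while Gen4 doubles that. Real throughput will be lower once packet and protocol overhead are counted.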

Others, reported by Talik, included Wistron and E4 Computing, which showcased their newly announced OCP-form factor POWER8 server. Featuring two POWER8 processors, four NVIDIA Tesla P100 GPUs with the NVLink interconnect, and liquid cooling, the new platform represents an ideal OCP-compliant HPC system.

Talik also reported that IBM, Xilinx, and Alpha Data showed their lineups of FPGA adaptors designed for both POWER8 and POWER9. Featuring PCIe Gen3 and CAPI1.0 for POWER8, and PCIe Gen4, CAPI2.0, and 25Gb/s CAPI3.0 for POWER9, these new FPGAs bring acceleration to a whole new level. OpenPOWER member engineers were on hand to provide information regarding the CAPI SNAP developer and programming framework as well as OpenCAPI.

Not to be left out, Talik reported that IBM showcased products it previously tested and demonstrated: POWER8-based OCP and OpenPOWER Barreleye servers running IBM’s Spectrum Scale software, a full-featured global parallel file system with roots in HPC and now widely adopted in commercial enterprises across all industries for data management at petabyte scale. Guess “compute platform” isn’t quite the dirty phrase IBM has been implying for months.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

 

OpenCAPI, Gen-Z, CCIX Initiate a New Computing Era

October 20, 2016

The next-generation data center will be a more open, cooperative, and faster place, judging from the remarkably similar makeup of three open consortia: OpenCAPI, Gen-Z, and CCIX. CCIX allows processors based on different instruction set architectures to extend their cache coherency to accelerators, interconnect, and I/O.

OpenCAPI provides a way to attach accelerators and I/O devices with coherence and virtual addressing to eliminate software inefficiency associated with the traditional I/O subsystem, and to attach advanced memory technologies. The focus of OpenCAPI is on attached devices primarily within a server. Gen-Z, announced around the same time, is a new data access technology that primarily enables read and write operations among disaggregated memory and storage.

Rethink the Datacenter

It’s quite likely that your next data center will use all three. The OpenCAPI group includes AMD, Dell EMC, Google, Hewlett Packard Enterprise, IBM, Mellanox Technologies, Micron, NVIDIA and Xilinx. Their new specification promises to enable up to 10X faster server performance with the first products expected in the second half of 2017.

The Gen-Z consortium consists of Advanced Micro Devices, Broadcom, Huawei Technologies, Red Hat, Micron, Xilinx, Samsung, IBM, and Cray. Other founding members are Cavium, IDT, Mellanox Technologies, Microsemi, Seagate, SK Hynix, and Western Digital. They plan to develop a scalable computing interconnect and protocol that will enable systems to keep up with the rapidly rising tide of data that is being generated and needs to be analyzed. This will require the rapid movement of high volumes of data between memory and storage.

The CCIX initial members include Amphenol Corp., Arteris Inc., Avery Design Systems, Atos, Cadence Design Systems, Inc., Cavium, Inc., Integrated Device Technology, Inc., Keysight Technologies, Inc., Micron Technology, Inc., NetSpeed Systems, Red Hat Inc., Synopsys, Inc., Teledyne LeCroy, Texas Instruments, and TSMC.

The basic problem all three address revolves around how to make the volume and variety of new hardware forge fast communications and work together. In effect each group, from its own particular perspective, aims to boost the performance and interoperability of data center servers, devices, and components engaged in generating and handling myriad data and tasked with analyzing large amounts of that data. This will only be compounded as IoT, blockchain, and cognitive computing ramp up.

To a large extent, this results from the inability of Moore’s Law to continue doubling the number of transistors on a chip indefinitely. Future advances must rely on different sorts of hardware tweaks and designs to deliver greater price/performance.

Then in Aug. 2016 IBM announced a related chip breakthrough. It unveiled the industry’s first 7nm chip that could hold more than 20 billion tiny switches or transistors for improved computing power. The new chips could help meet demands of future cloud computing and Big Data systems, cognitive computing, mobile products and other emerging technologies, according to IBM.

Most chips in servers and other devices today are built on processes between 14 and 22 nanometers (nm). The 7nm technology represents at least a 50 percent power improvement. IBM intends to apply the new chips to analyze DNA, viruses, and exosomes. IBM expects to test this lab-on-a-chip technology starting with prostate cancer.

The point of this digression into chips and Moore’s Law is to suggest the need for tools and interfaces like OpenCAPI, Gen-Z, and CCIX. As the use cases for ultra-fast data analytics expand along with the expected proliferation of devices, speed becomes critical. How long do you want to wait for an analysis of your prostate or breast cells? If the cells are dear to you, every nanosecond matters.

For instance, OpenCAPI provides an open, high-speed pathway for different types of technology – advanced memory, accelerators, networking and storage – to more tightly integrate their functions within servers. This data-centric approach to server design puts the compute power closer to the data and removes inefficiencies in traditional system architectures. That helps eliminate system bottlenecks and can significantly improve server performance. In some cases OpenCAPI enables system designers to access memory with sub-500 nanosecond latency.

IBM plans to introduce POWER9-based servers that leverage the OpenCAPI specification in the second half of 2017. Similarly, expect other members of the OpenPOWER Foundation to introduce OpenCAPI-enabled products in the same time frame. In addition, Google and Rackspace’s new server under development, codenamed Zaius and announced at the OpenPOWER Summit in San Jose, will leverage POWER9 processor technology and is planned to provide the OpenCAPI interface in its design. Also, Mellanox plans to enable the new specification capabilities in its future products, and Xilinx plans to support OpenCAPI-enabled FPGAs.

As reported at the Gen-Z announcement, “The formation of these new consortia (CCIX, OpenCAPI, and Gen-Z), backed by more than 30 industry-leading global companies, supports the premise that the datacenter of the future will require open standards. We look forward to collaborating with CCIX and OpenCAPI as this new ecosystem takes shape,” said Kurtis Bowman, Gen-Z Consortium president. Welcome to the 7nm computing era.


Oracle Aims at Intel and IBM POWER

July 8, 2016

In late June Oracle announced the SPARC S7 processor, a new 20nm, 4.27 GHz, 8-core/64-thread SPARC processor targeted for scale-out Cloud workloads that usually go to Intel x86 servers. These are among the same workloads IBM is aiming for with POWER8, POWER9, and eventually POWER10, as reported by DancingDinosaur just a couple of weeks ago.

Oracle 5-year SPARC trajectory (does not include newly announced S series).

According to Oracle, the latest additions to the SPARC platform are built on the new 4.27 GHz, 8-core/64-thread SPARC S7 microprocessor with what Oracle calls Software-in-Silicon features such as Silicon Secured Memory and Data Analytics Accelerators, which enable organizations to run applications of all sizes on the SPARC platform at commodity price points. All existing commercial and custom applications will also run on the new SPARC enterprise cloud services and solutions unchanged while experiencing improvements in security, efficiency, and simplicity.

By comparison, the IBM POWER platform includes the POWER8, which is delivered as a 12-core, 22nm processor. The POWER9, expected in 2017, will be delivered as a 14nm processor with 24 cores and CAPI and NVLink accelerator interfaces, which should deliver more performance with greater energy efficiency. By 2018, the IBM roadmap shows POWER8/9 as a 10nm, maybe even a 7nm, processor, based on the existing micro-architecture. And an even beefier POWER10 is expected to arrive around 2020.

At the heart of Oracle’s new scale-out, commodity-priced server is the S7. According to Oracle, the SPARC S7 delivers balanced compute performance with 8 cores per processor, integrated on-chip DDR4 memory interfaces, a PCIe controller, and coherency links. The cores in the SPARC S7 are optimized for running key enterprise software, including Java applications and databases. The SPARC S7–based servers use very high levels of integration that increase bandwidth, reduce latencies, simplify board design, reduce the number of components, and increase reliability, according to Oracle. All this promises an increase in system efficiency with a corresponding improvement in the economics of deploying a scale-out infrastructure when compared to other vendor solutions.

Oracle’s SPARC S7 processor, based on Oracle’s enterprise-class M7 servers, is optimized for horizontally scalable systems with all the key functionality included in the microprocessor chip. Its Software-in-Silicon capabilities, introduced with the SPARC M7 processor, are also available in the SPARC S7 processor to enable improved data protection, cryptographic acceleration, and analytics performance. These features include Security-in-Silicon, which provides Silicon Secured Memory and cryptographic acceleration, and Data Analytics Accelerator (DAX) units, which provide in-memory query acceleration and in-line decompression.

SPARC S7 processor–based servers include single- and dual-processor systems that are complementary to the existing mid-range and high-end systems based on Oracle’s SPARC M7 processor. SPARC S7 processor–based servers include two rack-mountable models. The SPARC S7-2 server uses a compact 1U chassis, and the SPARC S7-2L server is implemented in a larger, more expandable 2U chassis. Uniformity of management interfaces and the adoption of standards also should help reduce administrative costs, while the chassis design provides density, efficiency, and economy as increasingly demanded by modern data centers. Published reports put the cost of the new Oracle systems at just above $11,000 with a single processor, 64GB of memory and two 600GB disk drives, and up to about $50,000 with two processors and a terabyte of memory.

DancingDinosaur doesn’t really have enough data to compare the new Oracle system with the new POWER8 and upcoming POWER9 systems. Neither Oracle nor IBM has provided sufficient details. Oracle doesn’t even offer a roadmap at this point, which might tell you something.

What we do know about the POWER machines is this: POWER9 promises a wealth of improvements in speeds and feeds. Although intended to serve the traditional Power Server market, it also is expanding its analytics capabilities and is being optimized for new deployment models like hyperscale, cloud, and technical computing through scale-out deployment. Available for either clustered or multiple formats, it will feature a shorter pipeline, improved branch execution, and low-latency on-die cache as well as PCIe Gen4.

According to IBM, you can expect a 3x bandwidth improvement with POWER9 over POWER8 and a 33% speed increase. POWER9 also will continue to speed hardware acceleration and will support next-gen NVLink, improved coherency, enhanced CAPI, and a new 25 Gbps high-speed link. Although the 2-socket chip will remain, IBM suggests larger socket counts are coming. It will need them to compete with Intel.

At least IBM showed its POWER roadmap. There is no comparable information from Oracle. At best, DancingDinosaur was able to dig up the following sketchy details for 2017-2019: Next Gen Core, 2017 Software-in-Silicon V1, Scale Out fully integrated Software-in-Silicon V1 or 2; 2018-2019 Core Enhancements, Increased Cache, Increased Bandwidth, Software-in-Silicon V3.

Both Oracle and IBM have made it clear neither really wants to compete in the low-cost, scale-out server market. However, as both companies’ large clients turn to scale-out, hyperscale Intel-based systems, they have no choice but to follow the money. With the OpenPOWER Foundation growing and driving innovation, mainly in the form of accelerators, IBM POWER may have an advantage driving a very competitive price/performance story against Intel. With the exception of Fujitsu as an ally of sorts, Oracle has no comparable ecosystem as far as DancingDinosaur can tell.


IBM POWER8 CAPI for Efficient Top Performance

August 21, 2014

IBM’s Power Systems POWER8 Coherent Accelerator Processor Interface (CAPI) is not for every IT shop running Power Systems. However, for those that aim to attach devices to their POWER8 systems over the PCIe interface and want fast, efficient performance, CAPI will be unbeatable. Steve Fields, IBM Distinguished Engineer and Director of Power Systems Design, introduces it here. Some of it gets pretty geeky, but slides #12-17 make the key points.

DancingDinosaur first covered CAPI here, in April, shortly after its introduction. At that point it looked like CAPI would be a game changer and nothing since suggests otherwise. As we described it then, CAPI sits directly on the POWER8 board and works with the same memory addresses that the processor uses. Pointers dereference the same way as in the host application. CAPI, in effect, removes OS and device driver overhead by presenting an efficient, robust, durable, and, most importantly, direct interface. In the process, it offloads complexity.
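
To make the shared-address idea concrete, here is a host-only analogy in Python using `ctypes`. It does not use any CAPI API; the `Node` structure and the `build_list` and `walk_sum` helpers are hypothetical names invented for this sketch. It simply shows what "pointers dereference the same way" buys you: a worker that sees the application's addresses can follow a linked structure in place instead of requiring the host to flatten and copy it first.

```python
import ctypes


class Node(ctypes.Structure):
    """A linked-list node holding a real C pointer to the next node."""
    pass


Node._fields_ = [("value", ctypes.c_int), ("next", ctypes.POINTER(Node))]


def build_list(values):
    """Build a linked list in process memory; returns the nodes (kept alive by the list)."""
    nodes = []
    for v in values:
        node = Node()
        node.value = v
        nodes.append(node)
    for current, following in zip(nodes, nodes[1:]):
        current.next = ctypes.pointer(following)
    return nodes


def walk_sum(head_ptr):
    """Chase pointers from the head, as a worker sharing the address space could."""
    total = 0
    ptr = head_ptr
    while ptr:  # a NULL ctypes pointer is falsy
        total += ptr.contents.value
        ptr = ptr.contents.next
    return total


if __name__ == "__main__":
    nodes = build_list([1, 2, 3])
    print(walk_sum(ctypes.pointer(nodes[0])))
```

Without a shared address space, the pointers stored in `next` would be meaningless to the device, so the host would have to serialize the list into a flat buffer before handing it over.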

In short, CAPI:

  • Transports the SMP Coherence Protocol over the PCI Express interface
  • Provides isolation and filtering through the support unit in the processor (“CAPP”)
  • Manages caching and address translation through the standard POWER Service Layer in the accelerator device
  • Enables accelerator Functional Units to operate as part of the application at the user (direct) level, just like a CPU

What you end up with is a coherent connected accelerator for just a fraction of the development effort otherwise required. As such, CAPI enables more efficient accelerator development. It can reduce the typical seven-step I/O model flow (1-Device Driver Call, 2-Copy or Pin Source Data, 3-MMIO Notify Accelerator, 4-Acceleration, 5-Poll/Int Completion, 6-Copy or Unpin Result Data, 7-Return From Device Driver Completion) to just three steps (1-shared memory/notify accelerator, 2-acceleration, and 3-shared memory completion). The result is an easier, more natural programming model with traditional thread-level programming and no need to restructure the application to accommodate long-latency I/O. Finally, it enables apps otherwise not possible, such as those requiring pointer chasing (e.g. Java garbage collection).
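
The difference between the two flows can be sketched as a toy model. This is not CAPI code: the `accelerate` function is a stand-in for the accelerator, and the function names and the byte-copy counting are assumptions made for illustration. The step numbers in the comments follow the seven-step and three-step lists above.

```python
def accelerate(buf):
    """Stand-in for the accelerator: increments every byte in place."""
    for i in range(len(buf)):
        buf[i] = (buf[i] + 1) % 256


def traditional_flow(app_data):
    """Driver-mediated model; returns the number of bytes copied."""
    copied = 0
    # 1. device driver call (modeled by entering this function)
    staging = bytearray(app_data)   # 2. copy (or pin) source data
    copied += len(staging)
    # 3. MMIO notify accelerator
    accelerate(staging)             # 4. acceleration
    # 5. poll / interrupt for completion
    app_data[:] = staging           # 6. copy (or unpin) result data
    copied += len(staging)
    return copied                   # 7. return from device driver


def capi_style_flow(app_data):
    """Shared-memory model; the worker touches the application's buffer directly."""
    view = memoryview(app_data)     # 1. shared memory / notify accelerator
    accelerate(view)                # 2. acceleration
    view.release()                  # 3. shared memory completion
    return 0                        # no bytes were copied


if __name__ == "__main__":
    a = bytearray(b"\x00\x01\x02")
    b = bytearray(a)
    print(traditional_flow(a), capi_style_flow(b), a == b)
```

Both paths produce the same result; the second simply never stages the data, which is the part of the saving this sketch can show.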

Other advantages include an open ecosystem for accelerators built using Field Programmable Gate Arrays (FPGA). The number and size of FPGAs can be based on application requirements, and FPGAs can attach to other components, such as private DRAM, flash memory, or a high-speed network.

Driving the need for CAPI is the insatiable demand for performance. For that, acceleration is required, which is complicated and resource-intensive to build. So IBM created CAPI, not just for pure compute but for any network-attached or storage-attached I/O. In the end it eliminates the overhead of the I/O subsystem, allowing the focus to be on the workload.

In one example IBM reported it was able to attach an IBM Flash appliance to POWER8 via the CAPI interface. As a result it could generate Read/Write commands from applications and eliminate 97% of code path length, a savings of 20-30 cores per 1M IOPS. In another test IBM reported being able to leverage CAPI to integrate flash into a server; the memory-like semantics allowed the flash to replace DRAM for many in-memory workloads. The result: 5x cost savings plus large density and energy improvements. Furthermore, by eliminating the I/O subsystem overhead from high IOPS flash access, it freed the CPU to focus on the application workload.
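
As a back-of-the-envelope reading of those figures, the sketch below scales IBM's reported numbers to other workloads. The 20-30 cores per 1M IOPS and the 97% reduction come from the text; the linear scaling with IOPS, the function names, and the 20,000-instruction baseline in the usage lines are assumptions for illustration only.

```python
def cores_freed(iops, cores_per_million=(20, 30)):
    """Scale the reported 20-30 cores saved per 1M IOPS linearly (an assumption)."""
    low, high = cores_per_million
    scale = iops / 1_000_000
    return low * scale, high * scale


def remaining_path_length(instructions, reduction=0.97):
    """Code path left after the reported 97% reduction."""
    return instructions * (1 - reduction)


if __name__ == "__main__":
    print(cores_freed(500_000))
    # 20_000 is a hypothetical baseline path length, not a figure from IBM.
    print(round(remaining_path_length(20_000)))
```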

Finally, in a Monte Carlo simulation of 1 million iterations, a POWER8 core with FPGA and CAPI ran a full execution of the Heston pricing model for a single security 250x faster than the POWER8 core alone. It also proved easier to code, reducing the lines of C code to write by 40x compared to non-CAPI FPGA.
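
For readers unfamiliar with the workload, here is a minimal CPU-only sketch of a Monte Carlo price under the Heston stochastic-volatility model. It is not IBM's implementation and has nothing to do with the FPGA code. The Euler scheme with full truncation, the European call payoff, every parameter value, and the reduced path count (far below the 1 million iterations cited) are all assumptions chosen to keep the example short and quick to run.

```python
import math
import random


def heston_call_price(s0, k, r, t, v0, kappa, theta, xi, rho,
                      n_paths=2000, n_steps=50, seed=42):
    """Monte Carlo price of a European call under the Heston model.

    Uses an Euler scheme with full truncation: negative variance is
    floored at zero inside the drift and diffusion terms.
    """
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    rho_c = math.sqrt(1.0 - rho * rho)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        v = v0
        for _ in range(n_steps):
            z1 = rng.gauss(0.0, 1.0)
            # Second shock, correlated with the first by rho.
            z2 = rho * z1 + rho_c * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)
            log_s += (r - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths


if __name__ == "__main__":
    # All parameter values below are hypothetical.
    price = heston_call_price(s0=100.0, k=100.0, r=0.02, t=1.0,
                              v0=0.04, kappa=1.5, theta=0.04, xi=0.3, rho=-0.7)
    print(round(price, 2))
```

Each path is independent of every other, which is what makes this kind of simulation a natural fit for offloading to an accelerator.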

IBM is just getting started with CAPI. Coming up next will be CAPI working with Linux, mainly for use with analytics. Once Linux comes into the picture, expect more PCIe card vendors to deliver products that leverage CAPI. AIX too comes into the picture down the road.

Plan to attend IBM Enterprise2014 in Las Vegas, Oct. 6-10. Here is one intriguing CAPI presentation that will be there: Light up performance of your LAMP apps with a stack optimized for Power, by Alise Spence, Andi Gutmans, and Antonio Rosales. It will discuss how to leverage CAPI with POWER8 to create what they call a “killer stack” that brings together continuous delivery with exceptional performance at a competitive price. Other CAPI sessions also are in the works for Enterprise2014.

DancingDinosaur (Alan Radding) definitely is attending IBM Enterprise2014. You can follow DancingDinosaur on Twitter, @mainframeblog, or check out Technologywriter.com. Upcoming posts will look more closely at Enterprise2014 and explore some session content.

