Posts Tagged ‘I/O’

Value and Power of LinuxOne Emperor II

February 4, 2018

There is much value n the mainframe but it doesn’t become clear until you do a full TCO analysis. When you talk to an IBMer about the cost of a mainframe the conversation immediately shifts to TCO, usually in the form of how many x86 systems you would have to deploy to handle a comparable workload with similar quality of service.  The LinuxONE Emperor II, introduced in September, can beat those comparisons.

LinuxONE Emperor II

Proponents of x86 boast about the low acquisition cost of x86 systems. They are right if you are only thinking about a low initial acquisition cost. But you also have to think about the cost of software for each low-cost core you purchase, and for many enterprise workloads you will need to acquire a lot of cores. This is where costs can mount quickly.

As a result, software will likely become the highest TCO item because many software products are priced per core.  Often the amount charged for cores is determined by the server’s maximum number of physical cores, regardless of whether they actually are activated. In addition, some architectures require more cores per workload. Ouch! An inexpensive device suddenly becomes a pricy machine when all those cores are tallied and priced.

Finally, x86 to IBM Z core ratios differ per workload, but x86 almost invariably requires more cores than a z-based workload; remember, any LinuxONE is a Z System. For example, the same WebSphere workload on x86 that requires 10 – 12 cores may require only one IFL on the Z. The lesson here: whether you’re talking about system software or middleware, you have to consider the impact of software on TCO.

The Emperor II delivers stunning specs. The machine can be packed with up to 170 cores, as much as 32 TB of memory, and 160 PCIe slots. And it is flexible; use this capacity, for instance, to add more system resources—cores or memory—to service an existing Linux instance or clone more Linux instances. Think of it as scale-out capabilities on steroids, taking you far beyond what you can achieve in the x86 world and do it with just a few keystrokes. As IBM puts it, you might:

  • Dynamically add cores, memory, I/O adapters, devices, and network cards without disruption.
  • Grow horizontally by adding Linux instances or grow vertically by adding resources (memory, cores, slots) to existing Linux guests.
  • Provision for peak utilization.
  • After the peak subsides automatically return unused resources to the resource pool for reallocation to another workload.

So, what does this mean to most enterprise Linux data centers? For example, IBM often cites a large insurance firm. The insurer needed fast and flexible provisioning for its database workloads. The company’s approach directed it to deploy more x86 servers to address growth. Unfortunately, the management of software for all those cores had become time consuming and costly. The company deployed 32 x86 servers with 768 cores running 384 competitor’s database licenses.

By leveraging elastic pricing on the Emperor II, for example, it only needed one machine running 63 IFLs serving 64 competitor’s database licenses.  It estimated savings of $15.6 million over 5 years just by eliminating charges for unused cores. (Full disclosure: these figures are provided by IBM; DancingDinosaur did not interview the insurer to verify this data.) Also, note there are many variables at play here around workloads and architecture, usage patterns, labor costs, and more. As IBM warns: Your results may vary.

And then there is security. Since the Emperor II is a Z it delivers all the security of the newest z14, although in a slightly different form. Specifically, it provides:

  • Ultimate workload isolation and pervasive encryption through Secure Service Containers
  • Encryption of data at rest without application change and with better performance than x86
  • Protection of data in flight over the network with full end-to-end network security
  • Use of Protected Keys to secure data without giving up performance
  • Industry-leading secure Java performance via TLS (2-3x faster than Intel)

BTW the Emperor II also anchors IBM’s Blockchain cloud service. That calls for security to the max. In the end. the Emperor II is unlike any x86 Linux system.

  • EAL 5+ isolation, best in class crypto key protection, and Secure Service Containers
  • 640 Power cores in its I/O channels (not included in the core count)
  • Leading I/O capacity and performance in the industry
  • IBM’s shared memory vertical scale architecture with a better architecture for stateful workloads like databases and systems of record
  • Hardware designed to give good response time even with 100% utilization, which simplifies the solution and reduces the extra costs x86 users assume are necessary because they’re used to keeping a utilization safety margin.

This goes far beyond TCO.  Just remember all the things the Emperor II brings: scalability, reliability, container-based security and flexibility, and more.

…and Go Pats!

DancingDinosaur is Alan Radding, a Boston-based veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

OpenCAPI, Gen-Z, CCIX Initiate a New Computing Era

October 20, 2016

The next generation data center will be a more open, cooperative, and faster place judging from the remarkably similar makeup of three open consortia, OpenCAPI , Gen-Z, and CCIX. CCIX allows processors based on different instruction set architectures to extend their cache coherency to accelerators, interconnect, and I/O.

OpenCAPI provides a way to attach accelerators and I/O devices with coherence and virtual addressing to eliminate software inefficiency associated with the traditional I/O subsystem, and to attach advanced memory technologies.  The focus of OpenCAPI is on attached devices primarily within a server. Gen-Z, announced around the same time, is a new data access technology that primarily enables read and write operations among disaggregated memory and storage.

open-power-rethink-datacenter

Rethink the Datacenter

It’s quite likely that your next data center will use all three. The OpenCAPI group includes AMD, Dell EMC, Google, Hewlett Packard Enterprise, IBM, Mellanox Technologies, Micron, NVIDIA and Xilinx. Their new specification promises to enable up to 10X faster server performance with the first products expected in the second half of 2017.

The Gen-Z consortium consists Advanced Micro Devices, Broadcom, Huawei Technologies, Red Hat, Micron, Xilinx, Samsung, IBM, and Cray. Other founding members are Cavium, IDT, Mellanox Technologies, Microsemi, Seagate, SK Hynix, and Western Digital. They plan to develop a scalable computing interconnect and protocol that will enable systems to keep with the rapidly rising tide of data that is being generated and that needs to be analyzed. This will require the rapid movement of high volumes of data between memory and storage.

The CCIX initial members include Amphenol Corp., Arteris Inc., Avery Design Systems, Atos, Cadence Design Systems, Inc., Cavium, Inc., Integrated Device Technology, Inc., Keysight Technologies, Inc., Micron Technology, Inc., NetSpeed Systems, Red Hat Inc., Synopsys, Inc., Teledyne LeCroy, Texas Instruments, and TSMC.

The basic problem all three address revolves around how to make the volume and variety of new hardware forge fast communications and work together. In effect each group, from its own particular perspective, aims to boost the performance and interoperability of data center servers, devices, and components engaged in generating and handling myriad data and tasked with analyzing large amounts of that data. This will only be compounded as IoT, blockchain, and cognitive computing ramp up.

To a large extent, this results from the inability of Moore’s Law to continue to double the number of processors indefinitely. Future advances must rely on different sorts of hardware tweaks and designs to deliver greater price/performance.

Then in Aug. 2016 IBM announced a related chip breakthrough.  It unveiled the industry’s first 7 nm chip that could hold more than 20 billion tiny switches or transistors for improved computing power. The new chips could help meet demands of future cloud computing and Big Data systems, cognitive computing, mobile products and other emerging technologies, according to IBM.

Most chips today in servers and other devices use microprocessors between 14 and 22 nanometers (nm). The 7nm technology represents at least a 50 percent power improvement. IBM intends to apply the new chips to analyze DNA, viruses, and exosomes. IBM expects to test this lab-on-a-chip technology starting with prostate cancer.

The point of this digression into chips and Moore’s Law is to suggest the need for tools and interfaces like Open CAPI, Gen-Z, and CCIX. As the use cases for ultra fast data analytics expands along with the expected proliferation of devices speed becomes critical. How long do you want to wait for an analysis of your prostate or breast cells? If the cells are dear to you, every nanosecond matters.

For instance, OpenCAPI provides an open, high-speed pathway for different types of technology – advanced memory, accelerators, networking and storage – to more tightly integrate their functions within servers. This data-centric approach to server design puts the compute power closer to the data and removes inefficiencies in traditional system architectures to help eliminate system bottlenecks that significantly improve server performance.  In some cases OpenCAPI enables system designers to access memory with sub-500 nanosecond latency.

IBM plans to introduce POWER9-based servers that leverage the OpenCAPI specification in the second half of 2017. Similarly, expect other members of OpenPOWER Foundation to introduce OpenCAPI enabled products in the same time frame. In addition, Google and Rackspace’s new server under development, codenamed Zaius and announced at the OpenPOWER Summit in San Jose, will leverage POWER9 processor technology and plans to provide the OpenCAPI interface in its design. Also, Mellanox plans to enable the new specification capabilities in its future products and Xilinx plans to support OpenCAPI enabled FPGAs

As reported at the Gen-Z announcement, “The formation of these new consortia (CCIX, OpenCAPI, and Gen-Z), backed by more than 30 industry-leading global companies, supports the premise that the datacenter of the future will require open standards. We look forward to collaborating with CCIX and OpenCAPI as this new ecosystem takes shape,” said Kurtis Bowman, Gen-Z Consortium president. Welcome to the 7nm computing era.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghostwriter. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

 

IBM POWER8 CAPI for Efficient Top Performance

August 21, 2014

IBM’s Power Systems Power8 Coherent Accelerator Processor Interface (CAPI) is not for every IT shop running Power Systems. However, for those that aim to attach devices to their POWER8 systems over the PCIe interface and want fast, efficient performance CAPI will be unbeatable.  Steve Fields, IBM Distinguished Engineer and Director of Power Systems Design introduces it here. Some of it gets pretty geeky but slides #12-17 make the key points.

DancingDinosaur first covered CAPI here, in April, shortly after its introduction. At that point it looked like CAPI would be a game changer and nothing since suggests otherwise. As we described it then, CAPI sits directly on the POWER8 board and works with the same memory addresses that the processor uses. Pointers de-reference the same as the host application. CAPI, in effect, removes OS and device driver overhead by presenting an efficient, robust, durable and, most importantly, a direct interface. In the process, it offloads complexity.

In short, CAPI provides:

  • SMP Coherence Protocol transported over PCI Express interface
  • Provides isolation and filtering through the support unit in the processor (“CAPP”)
  • Manages caching and address translation through the standard POWER Service Layer in the accelerator device
  • Enables accelerator Functional Units to operate as part of the application at the user (direct) level, just like a CPU

What you end up with is a coherent connected accelerator for just a fraction of the development effort otherwise required. As such, CAPI enables more efficient accelerator development. It can reduce the typical seven-step I/O model flow (1-Device Driver Call, 2-Copy or Pin Source Data, 3-MMIO Notify Accelerator, 4-Acceleration, 5-Poll/Int Completion, 6-Copy or Unpin Result Data, 7-Return From Device Driver Completion) to just three steps (1-shared memory/notify accelerator, 2-acceleration, and 3-shared memory completion). The result is an easier, more natural programming model with traditional thread-level programming and no need to restructure the application to accommodate long latency I/O.  Finally it enables apps otherwise not possible, such as those requiring pointer chasing (e.g. Java garbage-collection).

Other advantages include an open ecosystem for accelerators built using Field Programmable Gate Arrays (FPGA). The number and size of FPGAs can be based on application requirements, and FPGAs can attach to other components, such as private DRAM, flash memory, or a high-speed network.

Driving the need for CAPI is the insatiable demand for performance.  For that, acceleration is required, which is complicated and resource-intensive to build. So IBM created CAPI, not just for pure compute but for any network-attached or storage-attached I/O. In the end it eliminates the overhead of the I/O subsystem, allowing the focus to be on the workload.

In one example IBM reported it was able to attach an IBM Flash appliance to POWER8 via the CAPI interface. As a result it could generate Read/Write commands from applications and eliminate 97% of code path length, a savings of 20-30 cores per 1M IOPS. In another test IBM reported being able to leverage CAPI to integrate flash into a server; the memory-like semantics allowed the flash to replace DRAM for many in-memory workloads. The result: 5x cost savings plus large density and energy improvements. Furthermore, by eliminating the I/O subsystem overhead from high IOPS flash access, it freed the CPU to focus on the application workload.

Finally, in a Monte Carlo simulation of 1 million iterations, a POWER8 core with FPGA and CAPI ran a full execution of the Heston pricing model for a single security 250x faster than the POWER8 core alone. It also proved easier to code, reducing the lines of C code to write by 40x compared to non-CAPI FPGA.

IBM is just getting started with CAPI. Coming up next will be CAPI working with Linux, mainly for use with analytics. Once Linux comes into the picture, expect more PCIe card vendors to deliver products that leverage CAPI. AIX too comes into the picture down the road.

Plan to attend IBM Enterprise2014 in Las Vegas, Oct. 6-19. Here is one intriguing CAPI presentation that will be there: Light up performance of your LAMP apps with a stack optimized for Power, by Alise Spence, Andi Gutmans, and Antonio Rosales. It will discuss how to leverage CAPI with POWER8 to create what they call a “killer stack” that brings together continuous delivery with exceptional performance at a competitive price. Other CAPI sessions also are in the works for Enterprise2014.

DancingDinosaur (Alan Radding) definitely is attending IBM Enterprise2014. You can follow DancingDinosaur on Twitter, @mainframeblog, or check out Technologywriter.com. Upcoming posts will look more closely at Enterprise2014 and explore some session content.


%d bloggers like this: