Posts Tagged ‘PCIe’

IBM Refreshes its Storage for Multi-Cloud

October 26, 2018

IBM has refreshed almost all of its storage offerings, virtually end to end: from storage services to infrastructure and cloud, to storage hardware (especially flash), to management. The Oct. 23 announcement covers a wide array of storage products.

IBM Spectrum Discover

Among the most interesting of the announcements was IBM Spectrum Discover. The product automatically enhances and then leverages metadata to augment discovery capabilities. It pulls insight from unstructured data to improve and accelerate large-scale analytics, improve data governance, and enhance storage economics. At a time when data is growing at 30 percent per year, finding the right data fast for analytics and AI can be slow and tedious. IBM Spectrum Discover rapidly ingests, consolidates, and indexes metadata for billions of files and objects, enabling you to more easily gain insights from such massive amounts of unstructured data.

As important as Spectrum Discover is, NVMe may attract more attention, in large part due to the proliferation of flash storage and the insatiable demand for increasingly faster performance. NVMe (non-volatile memory express) is the latest host controller interface and storage protocol created to accelerate the transfer of data between enterprise and client systems and solid-state drives (SSDs) over a computer’s high-speed Peripheral Component Interconnect Express (PCIe) bus.

According to IBM, NVMe addresses one of the hottest segments of the storage market. This is being driven by new solutions that, as IBM puts it, span the lifecycle of data from creation to archive.

Specifically, IBM is expanding lower-latency, higher-throughput NVMe fabric support across its storage portfolio. The primary NVMe products introduced include:

  • New NVMe-based Storwize V7000 Gen3
  • NVMe over Fibre Channel across the flash portfolio
  • NVMe over Ethernet across the flash portfolio in 2019
  • IBM Cloud Object Storage to support in 2019

The last two are IBM statements of direction, which is IBM’s way of saying they may or may not happen when or as expected.

Ironically, the economics of flash have dramatically reversed. Flash storage now reduces cost as well as boosts performance. Until fairly recently, flash was considered too costly for usual storage needs, something to be used selectively only when the increased performance or efficiency justified the cost. Thank you, Moore’s Law and the economics of mass scale.

Maybe of greater interest to DancingDinosaur readers managing mainframe data centers are the improvements to the DS8000 storage lineup. The IBM DS8880F is designed to deliver extreme performance, uncompromised availability, and deep integration with IBM Z through flash. It remains the primary storage system supporting mainframe-based IT infrastructure. Furthermore, the new custom flash provides up to double the maximum flash capacity in the same footprint. An update to the zHyperLink solution also speeds application performance by significantly reducing both write and read latency.

Designed to provide top performance for mission-critical applications, DS8880F is based on the same fundamental system architecture as IBM Watson. DS8880F, explains IBM, forms the three-tiered architecture that balances system resources for optimal throughput.

In addition, the DS8880F offers:

  • Up to 2x maximum flash capacity
  • New 15.36TB custom flash
  • Up to 8 PB of physical capacity in the same physical space
  • Improved performance for zHyperLink connectivity
  • 2X lower write latency than High Performance FICON
  • 10X lower read latency

Also included is the next generation of High-Performance Flash Enclosures (HPFE Gen2). The DS8880F family delivers extremely low application response times, which can accelerate core transaction processes while expanding business operations into nextgen applications using AI to extract value from data. (See above, Spectrum Discover).

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Follow DancingDinosaur on Twitter, @mainframeblog, and see more of his work at technologywriter.com.

IBM Power System S822LC for HPC Beat Sort Record by 3.3x

November 17, 2016

The new IBM Power System S822LC for High Performance Computing servers set a new benchmark for sorting by taking less than 99 seconds (98.8 seconds) to finish sorting 100 terabytes of data in the Indy GraySort category, improving on last year’s best result, 329 seconds, by a factor of 3.3. The win proved a victory not only for the S822LC but for the entire OpenPOWER community. The team of Tencent, IBM, and Mellanox has been named the Winner of the Sort Benchmark annual global computing competition for 2016.

Power System S822LC for HPC

Specifically, the machine, an IBM Power S822LC for High Performance Computing (HPC), features NVIDIA NVLink technology optimized for the Power architecture and NVIDIA’s latest GPU technology. The new system supports emerging computing methods of artificial intelligence, particularly deep learning. The combination, newly dubbed IBM PowerAI, provides a continued path for Watson, IBM’s cognitive solutions platform, to extend its artificial intelligence expertise in the enterprise by using several deep learning methods to train Watson.

Actually, Tencent Cloud Data Intelligence (the distributed computing platform of Tencent Cloud) won each category in both the GraySort and MinuteSort benchmarks, establishing four new world records and outperforming the 2015 best speeds by 2-5x. Said Zeus Jiang, Vice President of Tencent Cloud and General Manager of Tencent’s Data Platform Department: “In the future, the ability to manage big data will be the foundation of successful Internet businesses.”

To get this level of performance Tencent runs 512 IBM OpenPOWER LC servers and Mellanox’s 100Gb interconnect technology, improving the performance of Tencent Cloud big data products with the infrastructure. Online prices for the S822LC start at about $9600 for a 2-socket, 2U system with up to 20 cores (2.9-3.3GHz), 1 TB memory (32 DIMMs), 230 GB/sec sustained memory bandwidth, 2x SFF (HDD/SSD), 2 TB storage, 5 PCIe slots (4 CAPI enabled), and up to 2 NVIDIA K80 GPUs. Be sure to shop for volume discounts.

The 2016 Sort Benchmark Results below (apologies in advance if this table breaks apart)

Sort Benchmark Competition | 2016 Records (Tencent Cloud) | 2015 World Records | 2016 Improvement
Daytona GraySort | 44.8 TB/min | 15.9 TB/min | 2.8X greater performance
Indy GraySort | 60.7 TB/min | 18.2 TB/min | 3.3X greater performance
Daytona MinuteSort | 37 TB/min | 7.7 TB/min | 4.8X greater performance
Indy MinuteSort | 55 TB/min | 11 TB/min | 5X greater performance
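The figures above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes the Indy GraySort rates are simply the 100 TB data set divided by the elapsed times quoted earlier (98.8 and 329 seconds), and the function and variable names are made up for the example.

```python
# Cross-check of the sort benchmark numbers quoted in this post.
# Assumption: TB/min is just 100 TB divided by the elapsed time.

DATA_SET_TB = 100


def tb_per_minute(elapsed_seconds):
    """Convert a 100 TB sort time in seconds to a TB/min rate."""
    return DATA_SET_TB / elapsed_seconds * 60


# (2016 Tencent Cloud rate, 2015 record rate), both in TB/min, from the table.
results = {
    "Daytona GraySort": (44.8, 15.9),
    "Indy GraySort": (60.7, 18.2),
    "Daytona MinuteSort": (37.0, 7.7),
    "Indy MinuteSort": (55.0, 11.0),
}

# Improvement factor for each category, rounded to one decimal place.
improvements = {
    name: round(new / old, 1) for name, (new, old) in results.items()
}

if __name__ == "__main__":
    print(f"Indy GraySort 2016: {tb_per_minute(98.8):.1f} TB/min")
    print(f"Indy GraySort 2015: {tb_per_minute(329):.1f} TB/min")
    for name, factor in improvements.items():
        print(f"{name}: {factor}x")
```

Under that assumption, the 98.8-second and 329-second times line up with the 60.7 and 18.2 TB/min entries in the Indy GraySort row.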

Pretty impressive, huh? As IBM explains it: Tencent Cloud used 512 IBM OpenPOWER servers and Mellanox’s 100Gb interconnect technology, improving the performance of Tencent Cloud big data products with the infrastructure. Then Tom Rosamilia, IBM Senior VP, weighed in: “Industry leaders like Tencent are helping IBM and our OpenPOWER partners push performance boundaries for a cognitive era defined by big data and advanced analytics.” The computing record achieved by Tencent Cloud on OpenPOWER turned out to be an important milestone for the OpenPOWER Foundation too.

Added Amir Prescher, Sr. Vice President, Business Development, at Mellanox Technologies: “Real-time-analytics and big data environments are extremely demanding, and the network is critical in linking together the extra high performance of IBM POWER-based servers and Tencent Cloud’s massive amounts of data.” In effect, Tencent Cloud developed an optimized hardware/software platform to achieve new computing records while demonstrating that Mellanox’s 100Gb/s Ethernet technology can deliver total infrastructure efficiency and improve application performance, which should make it a favorite for big data applications.

Behind all of this were the new IBM Power System S822LC for High Performance Computing servers. Currently the servers feature a new IBM POWER8 chip designed for demanding workloads including artificial intelligence, deep learning and advanced analytics. However, a new POWER9 chip has already been previewed and is expected next year. Whatever the S822LC can do running POWER8, just imagine how much more it will do running POWER9, which IBM describes as a premier acceleration platform. DancingDinosaur covered POWER9 in early Sept. here.

To capitalize on the hardware, IBM is making a new deep learning software toolkit available, PowerAI, which runs on the recently announced IBM Power S822LC server built for artificial intelligence that features NVIDIA NVLink interconnect technology optimized for IBM’s Power architecture. The hardware-software combination provides more than 2X performance over comparable servers with 4 GPUs running AlexNet with Caffe. The same 4-GPU Power-based configuration running AlexNet with BVLC Caffe can also outperform 8 M40 GPU-based x86 configurations, making it the world’s fastest commercially available enterprise systems platform on two versions of a key deep learning framework.

Deep learning is a fast-growing machine learning method that extracts information by crunching through millions of pieces of data to detect and rank the most important aspects of the data. Publicly supported among leading consumer web and mobile application companies, deep learning is quickly being adopted by more traditional enterprises across a wide range of industry sectors: in banking to advance fraud detection through facial recognition; in automotive for self-driving automobiles; and in retail for fully automated call centers with computers that can better understand speech and answer questions. Is your data center ready for deep learning?

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

 

 

IBM POWER8 CAPI for Efficient Top Performance

August 21, 2014

IBM’s Power Systems POWER8 Coherent Accelerator Processor Interface (CAPI) is not for every IT shop running Power Systems. However, for those that aim to attach devices to their POWER8 systems over the PCIe interface and want fast, efficient performance, CAPI will be unbeatable. Steve Fields, IBM Distinguished Engineer and Director of Power Systems Design, introduces it here. Some of it gets pretty geeky but slides #12-17 make the key points.

DancingDinosaur first covered CAPI here, in April, shortly after its introduction. At that point it looked like CAPI would be a game changer and nothing since suggests otherwise. As we described it then, CAPI sits directly on the POWER8 board and works with the same memory addresses that the processor uses. Pointers are dereferenced the same as in the host application. CAPI, in effect, removes OS and device driver overhead by presenting an efficient, robust, durable and, most importantly, direct interface. In the process, it offloads complexity.

In short, CAPI:

  • Transports the SMP Coherence Protocol over the PCI Express interface
  • Provides isolation and filtering through the support unit in the processor (“CAPP”)
  • Manages caching and address translation through the standard POWER Service Layer in the accelerator device
  • Enables accelerator Functional Units to operate as part of the application at the user (direct) level, just like a CPU

What you end up with is a coherent connected accelerator for just a fraction of the development effort otherwise required. As such, CAPI enables more efficient accelerator development. It can reduce the typical seven-step I/O model flow (1-Device Driver Call, 2-Copy or Pin Source Data, 3-MMIO Notify Accelerator, 4-Acceleration, 5-Poll/Int Completion, 6-Copy or Unpin Result Data, 7-Return From Device Driver Completion) to just three steps (1-shared memory/notify accelerator, 2-acceleration, and 3-shared memory completion). The result is an easier, more natural programming model with traditional thread-level programming and no need to restructure the application to accommodate long latency I/O.  Finally it enables apps otherwise not possible, such as those requiring pointer chasing (e.g. Java garbage-collection).
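To make the seven-versus-three comparison concrete, here is a toy sketch that lists both flows as described above and counts what drops out. It is not CAPI code and calls no real CAPI API; the list and function names are made up for illustration.

```python
# Toy model of the two I/O flows described above. This only enumerates the
# steps named in the post; it does not talk to any real accelerator.

TRADITIONAL_FLOW = [
    "device driver call",
    "copy or pin source data",
    "MMIO notify accelerator",
    "acceleration",
    "poll/interrupt completion",
    "copy or unpin result data",
    "return from device driver completion",
]

CAPI_FLOW = [
    "shared memory/notify accelerator",
    "acceleration",
    "shared memory completion",
]


def overhead_steps(flow):
    """Return every step that is not the acceleration work itself."""
    return [step for step in flow if step != "acceleration"]


def steps_eliminated(before, after):
    """Count how many steps the shorter flow removes."""
    return len(before) - len(after)


if __name__ == "__main__":
    print("Traditional overhead steps:", len(overhead_steps(TRADITIONAL_FLOW)))
    print("CAPI overhead steps:", len(overhead_steps(CAPI_FLOW)))
    print("Steps eliminated:", steps_eliminated(TRADITIONAL_FLOW, CAPI_FLOW))
```

Seen this way, the acceleration step itself is unchanged; what shrinks is the bookkeeping around it, from six surrounding steps to two.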

Other advantages include an open ecosystem for accelerators built using Field Programmable Gate Arrays (FPGA). The number and size of FPGAs can be based on application requirements, and FPGAs can attach to other components, such as private DRAM, flash memory, or a high-speed network.

Driving the need for CAPI is the insatiable demand for performance.  For that, acceleration is required, which is complicated and resource-intensive to build. So IBM created CAPI, not just for pure compute but for any network-attached or storage-attached I/O. In the end it eliminates the overhead of the I/O subsystem, allowing the focus to be on the workload.

In one example IBM reported it was able to attach an IBM Flash appliance to POWER8 via the CAPI interface. As a result it could generate Read/Write commands from applications and eliminate 97% of code path length, a savings of 20-30 cores per 1M IOPS. In another test IBM reported being able to leverage CAPI to integrate flash into a server; the memory-like semantics allowed the flash to replace DRAM for many in-memory workloads. The result: 5x cost savings plus large density and energy improvements. Furthermore, by eliminating the I/O subsystem overhead from high IOPS flash access, it freed the CPU to focus on the application workload.

Finally, in a Monte Carlo simulation of 1 million iterations, a POWER8 core with FPGA and CAPI ran a full execution of the Heston pricing model for a single security 250x faster than the POWER8 core alone. It also proved easier to code, reducing the lines of C code to write by 40x compared to non-CAPI FPGA.
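For readers unfamiliar with the workload, the sketch below shows roughly what a Monte Carlo run of the Heston model involves: many simulated price paths, each with its own randomly evolving variance. It is a plain CPU illustration in Python, not IBM's benchmark code. The European call payoff, the Euler scheme with full truncation, the parameter values, and the function name are all assumptions chosen for the example, and the default path count is far below the 1 million iterations IBM cites.

```python
import math
import random


def heston_call_price(s0, strike, rate, maturity, v0, kappa, theta, xi, rho,
                      n_paths=2000, n_steps=50, seed=42):
    """Estimate a European call price under the Heston model by Monte Carlo.

    Uses an Euler discretization with full truncation: negative variance
    values are treated as zero wherever the variance enters the update.
    """
    rng = random.Random(seed)
    dt = maturity / n_steps
    sqrt_dt = math.sqrt(dt)
    payoff_sum = 0.0
    for _ in range(n_paths):
        log_s = math.log(s0)
        v = v0
        for _ in range(n_steps):
            # Two standard normal draws with correlation rho.
            z1 = rng.gauss(0.0, 1.0)
            z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
            v_pos = max(v, 0.0)
            # Log-price step driven by the current variance.
            log_s += (rate - 0.5 * v_pos) * dt + math.sqrt(v_pos) * sqrt_dt * z1
            # Mean-reverting variance step.
            v += kappa * (theta - v_pos) * dt + xi * math.sqrt(v_pos) * sqrt_dt * z2
        payoff_sum += max(math.exp(log_s) - strike, 0.0)
    # Discount the average payoff back to today.
    return math.exp(-rate * maturity) * payoff_sum / n_paths


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to exercise the function.
    price = heston_call_price(s0=100.0, strike=100.0, rate=0.02, maturity=1.0,
                              v0=0.04, kappa=1.5, theta=0.04, xi=0.3, rho=-0.7)
    print(f"Estimated call price: {price:.2f}")
```

Each path is independent of the others, which is what makes this kind of workload a natural candidate for an attached accelerator.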

IBM is just getting started with CAPI. Coming up next will be CAPI working with Linux, mainly for use with analytics. Once Linux comes into the picture, expect more PCIe card vendors to deliver products that leverage CAPI. AIX too comes into the picture down the road.

Plan to attend IBM Enterprise2014 in Las Vegas, Oct. 6-19. Here is one intriguing CAPI presentation that will be there: Light up performance of your LAMP apps with a stack optimized for Power, by Alise Spence, Andi Gutmans, and Antonio Rosales. It will discuss how to leverage CAPI with POWER8 to create what they call a “killer stack” that brings together continuous delivery with exceptional performance at a competitive price. Other CAPI sessions also are in the works for Enterprise2014.

DancingDinosaur (Alan Radding) definitely is attending IBM Enterprise2014. You can follow DancingDinosaur on Twitter, @mainframeblog, or check out Technologywriter.com. Upcoming posts will look more closely at Enterprise2014 and explore some session content.

3 Big Takeaways at IBM POWER8 Intro

April 24, 2014

POWER8 did not disappoint. IBM unveiled its latest generation of systems built on its new POWER8 technology on Wednesday, April 23.

DancingDinosaur sees three important takeaways from this announcement:

First, the OpenPOWER Foundation. It was introduced months ago and almost immediately forgotten. DancingDinosaur covered it at the time here. It had a handful of small partners. Only one was significant, Google, and it was hard to imagine Google bringing out open source POWER servers. Now the Foundation has several dozen members and it still is not clear what Google is doing there, but the Foundation clearly is gaining traction. You can expect more companies to join the Foundation in the coming weeks and months.

With the Foundation, IBM swears it is committed to a true open ecosystem, one where even competitors can license the technology and bring out their own systems. At some point don’t be surprised to see white box Power systems below IBM’s price. More likely in the short term will be specialized Power appliances. What you get as a foundation member is the Power SOC design, Bus Specifications, Reference Designs, FW OS, and Hypervisor Open Source. It also includes access to Little Endian Linux, which will ease the migration of software to POWER. BTW, Google is listed as a member focusing on open source firmware and on the cloud and high performance computing.

Second, the POWER8 processor itself and the new family of systems. The processor, designed for big data, will run more concurrent queries and run them up to 50x faster than x86, with 4x more threads per core than x86. Its I/O bandwidth is 5x faster than POWER7. It can handle 1TB of memory with 4-6x more memory bandwidth and more than 3x more on-chip cache than an x86. The processor itself will utilize 22nm circuits and run at 2.5-5 GHz.

POWER8 sports an eight-threaded core design. That means each of the 12 cores in the CPU will coordinate the processing of eight sets of instructions at a time, for a total of 96 threads. Each thread consists of a set of related instructions making up a discrete unit of work within a program. By designating sections of an application that can run as threads and coordinating the results, a chip can accomplish more work than a single-threaded chip, IBM explains. By comparison, IBM reports, Intel’s Ivy Bridge E5 Xeon CPUs have double-threaded cores, with up to eight cores, handling 16 threads at a time (compared to 96 with POWER8). Yes, there is some coordination overhead incurred as more threads are added. Still, the POWER8 chip should attract interest among white box manufacturers and users of large numbers of servers processing big data.
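The arithmetic behind that comparison is simple enough to spell out. The snippet below uses only the core and thread counts IBM cites above; the function name is made up for illustration.

```python
def hardware_threads(cores, threads_per_core):
    """Total instruction streams a chip can run at once."""
    return cores * threads_per_core


# Core and thread counts as quoted from IBM in this post.
power8_threads = hardware_threads(cores=12, threads_per_core=8)
ivy_bridge_threads = hardware_threads(cores=8, threads_per_core=2)

# How many times more threads per chip POWER8 offers in this comparison.
thread_ratio = power8_threads / ivy_bridge_threads

if __name__ == "__main__":
    print(f"POWER8: {power8_threads} threads per chip")
    print(f"Ivy Bridge E5 (as cited): {ivy_bridge_threads} threads per chip")
    print(f"Ratio: {thread_ratio:.0f}x")
```

On these numbers POWER8 runs six times as many threads per chip, which combines the 4x threads-per-core figure with the larger core count.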

Third is CAPI, your newest acronym. If something is going to be a game-changer, this will be it. The key is to watch for adoption. Coherent Accelerator Processor Interface (CAPI) sits directly on the POWER8 and works with the same memory addresses that the processor uses. Pointers are dereferenced the same as in the host application. CAPI, in effect, removes OS and device driver overhead by presenting an efficient, robust, durable interface. In the process, it offloads complexity.

CAPI can reduce the typical seven-step I/O model flow to three steps (shared memory/notify accelerator, acceleration, and shared memory completion). The advantages revolve around virtual addressing and data caching through shared memory and reduced latency for highly referenced data. [see accompanying graphic] It also enables an easier, natural programming model with traditional thread level programming and eliminates the need to restructure the application to accommodate long latency I/O.  Finally it enables apps otherwise not possible, such as those requiring pointer chasing.

 CAPI Picture

It’s too early to determine if CAPI is a game changer but IBM has already started to benchmark some uses. For example, it ran NoSQL on POWER8 with CAPI and achieved a 5x cost reduction. When combined with IBM’s TMS flash it found it could:

  • Attack problem sets otherwise too big for the memory footprint
  • Deliver fast access to small chunks of data
  • Achieve high throughput for data or simplify object addressing through memory semantics

CAPI brings programming efficiency and simplicity. It uses the PCIe physical interface for the easiest programming and fastest, most direct I/O performance. It enables better virtual addressing and data caching. Although it was intended for acceleration it works well for I/O caching. And it has been shown to deliver a 5x cost reduction with equivalent performance when attaching to flash.  In summary, CAPI enables you to regain infrastructure control and rein in costs to deliver services otherwise not feasible.

It will take time for CAPI to catch on. Developers will need to figure out where and how best to use it. But with CAPI as part of the OpenPOWER Foundation, expect to see work taking off in a variety of directions. At a pre-briefing a few weeks ago, DancingDinosaur was able to walk through some very interesting CAPI demos.

As for the new POWER8 Systems lineup, IBM introduced 6 one- or two-socket systems, some for Linux, others for all systems. The systems, reportedly, will start below $8000.

You can follow Alan Radding/DancingDinosaur on Twitter: @mainframeblog. Also, please join me at IBM Edge2014, this May 19-23 at the Venetian in Las Vegas.  Find me in the bloggers lounge.

