Posts Tagged ‘high performance computing (HPC)’

IBM Shows Off POWER and NVIDIA GPU Setting High Performance Record 

May 4, 2017

The record achievement used 60 Power processors and 120 GPU accelerators to shatter the previous supercomputer record, which used over 700,000 processors. The results point to how dramatically the capabilities of high performance computing (HPC) have increased while the cost of HPC systems has declined. Or put another way: the effort demonstrates the ability of NVIDIA GPUs to simulate one-billion-cell models in a fraction of the time, while delivering 10x the performance and efficiency.

Courtesy of IBM: Takes a lot of processing to take you into a tornado

In short, the combined success of IBM and NVIDIA puts the power of cognitive computing within the reach of mainstream enterprise data centers. Specifically, the project performed reservoir modeling to predict the flow of oil, water, and natural gas in the subsurface of the earth before any attempt is made to extract the maximum oil in the most efficient way. The effort, in this case, involved a billion-cell simulation, which took just 92 minutes using 30 HPC servers equipped with 60 POWER processors and 120 NVIDIA Tesla P100 GPU accelerators.

“This calculation is a very salient demonstration of the computational capability and density of solution that GPUs offer. That speed lets reservoir engineers run more models and ‘what-if’ scenarios than previously,” according to Vincent Natoli, President of Stone Ridge Technology, as quoted in the IBM announcement. “By increasing compute performance and efficiency by more than an order of magnitude, we’re democratizing HPC for the reservoir simulation community,” he added.

“The milestone calculation illuminates the advantages of the IBM POWER architecture for data-intensive and cognitive workloads,” said Sumit Gupta, IBM Vice President, High Performance Computing, AI & Analytics, in the IBM announcement. “By running Stone Ridge’s ECHELON on IBM Power Systems, users can achieve faster run-times using a fraction of the hardware,” Gupta continued. “The previous record used more than 700,000 processors in a supercomputer installation that occupies nearly half a football field, while Stone Ridge did this calculation on two racks of IBM Power Systems that could fit in the space of half a ping-pong table.”

This latest advance challenges perceived misconceptions that GPUs could not be efficient on complex application codes like reservoir simulation and are better suited to simple, more naturally parallel applications such as seismic imaging. The scale, speed, and efficiency of the reported result disprove this misconception. The milestone calculation with a relatively small server infrastructure enables small and medium-size oil and energy companies to take advantage of computer-based reservoir modeling and optimize production from their asset portfolio.

Billion-cell simulations are rare in practice in the industry, but the calculation was performed to highlight the performance differences between new fully GPU-based codes like the ECHELON reservoir simulator and equivalent legacy CPU codes. ECHELON scales from the cluster to the workstation: while it can simulate a billion cells on 30 servers, it can also run smaller models on a single server or even on a single NVIDIA P100 board in a desktop workstation. The latter two use cases are more in the sweet spot for the energy industry, according to IBM.

Just as important, the company notes, this latest breakthrough showcases the ability of IBM Power Systems with NVIDIA GPUs to achieve similar performance leaps in other fields such as computational fluid dynamics, structural mechanics, climate modeling, and others that are widely used throughout the manufacturing and scientific community. By taking advantage of POWER and GPUs, organizations can literally do more with less, which is often an executive’s impossible demand.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at and here.


IBM Power System S822LC for HPC Beat Sort Record by 3.3x

November 17, 2016

The new IBM Power System S822LC for High Performance Computing servers set a new benchmark for sorting by taking less than 99 seconds (98.8 seconds) to finish sorting 100 terabytes of data in the Indy GraySort category, improving on last year’s best result, 329 seconds, by a factor of 3.3. The win proved a victory not only for the S822LC but for the entire OpenPOWER community. The team of Tencent, IBM, and Mellanox has been named the Winner of the Sort Benchmark annual global computing competition for 2016.
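The headline numbers can be cross-checked with simple arithmetic. The sketch below is only an illustration: the function and constant names are made up for this example and are not part of the Sort Benchmark rules or any IBM tooling. It converts the two elapsed times quoted above into terabytes per minute and computes the speedup.

```python
# Illustrative cross-check of the Indy GraySort figures quoted in the post.
# All names here are hypothetical helpers, not part of any benchmark toolkit.

def sort_rate_tb_per_min(terabytes, seconds):
    """Return sort throughput in terabytes per minute."""
    return terabytes / seconds * 60.0


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new run is than the old one."""
    return old_seconds / new_seconds


DATA_TB = 100.0       # data set size stated in the post
TIME_2016_S = 98.8    # 2016 winning time, in seconds
TIME_2015_S = 329.0   # 2015 best time, in seconds

rate_2016 = sort_rate_tb_per_min(DATA_TB, TIME_2016_S)
rate_2015 = sort_rate_tb_per_min(DATA_TB, TIME_2015_S)
factor = speedup(TIME_2015_S, TIME_2016_S)

print(f"2016 rate: {rate_2016:.1f} TB/min")   # prints 60.7
print(f"2015 rate: {rate_2015:.1f} TB/min")   # prints 18.2
print(f"Speedup:   {factor:.1f}x")            # prints 3.3
```

The rates that come out, 60.7 TB/min and 18.2 TB/min, line up with the Indy GraySort figures reported for 2016 and 2015.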

Power System S822LC for HPC

Specifically, the machine, an IBM Power S822LC for High Performance Computing (HPC), features NVIDIA NVLink technology optimized for the Power architecture and NVIDIA’s latest GPU technology. The new system supports emerging computing methods of artificial intelligence, particularly deep learning. The combination, newly dubbed IBM PowerAI, provides a continued path for Watson, IBM’s cognitive solutions platform, to extend its artificial intelligence expertise in the enterprise by using several deep learning methods to train Watson.

Actually Tencent Cloud Data Intelligence (the distributed computing platform of Tencent Cloud) won each category in both the GraySort and MinuteSort benchmarks, establishing four new world records with its performance, outperforming the 2015 best speeds by 2-5x. Said Zeus Jiang, Vice President of Tencent Cloud and General Manager of Tencent’s Data Platform Department: “In the future, the ability to manage big data will be the foundation of successful Internet businesses.”

To get this level of performance Tencent runs 512 IBM OpenPOWER LC servers and Mellanox’s 100Gb interconnect technology, improving the performance of Tencent Cloud big data products with the infrastructure. Online prices for the S822LC start at about $9600 for 2-socket, 2U with up to 20 cores (2.9-3.3GHz), 1 TB memory (32 DIMMs), 230 GB/sec sustained memory bandwidth, 2x SFF (HDD/SSD), 2 TB storage, 5 PCIe slots, 4 CAPI enabled, up to 2 NVIDIA K80 GPUs. Be sure to shop for volume discounts.

The 2016 Sort Benchmark Results below (apologies in advance if this table breaks apart)

Sort Benchmark Competition | 2016 Records (Tencent Cloud) | 2015 World Records | 2016 Improvement
Daytona GraySort | 44.8 TB/min | 15.9 TB/min | 2.8X greater performance
Indy GraySort | 60.7 TB/min | 18.2 TB/min | 3.3X greater performance
Daytona MinuteSort | 37 TB/min | 7.7 TB/min | 4.8X greater performance
Indy MinuteSort | 55 TB/min | 11 TB/min | 5X greater performance
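
For readers who want to verify the improvement column, here is a minimal sketch that recomputes each ratio from the two rate columns. The dictionary layout and the function name are assumptions made for this example; the numbers are copied from the table.

```python
# Recompute the "2016 Improvement" column from the reported rates.
# RESULTS maps category -> (2016 record, 2015 record), both in TB/min
# as listed in the table. The structure is illustrative only.

RESULTS = {
    "Daytona GraySort": (44.8, 15.9),
    "Indy GraySort": (60.7, 18.2),
    "Daytona MinuteSort": (37.0, 7.7),
    "Indy MinuteSort": (55.0, 11.0),
}


def improvement(new_rate, old_rate):
    """Return the ratio of the new record to the previous record."""
    return new_rate / old_rate


ratios = {name: improvement(new, old) for name, (new, old) in RESULTS.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```

Rounded to one decimal place the ratios come out to 2.8, 3.3, 4.8, and 5.0, which agrees with the improvement figures in the table.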

Pretty impressive, huh. As IBM explains it: Tencent Cloud used 512 IBM OpenPOWER servers and Mellanox’s 100Gb interconnect technology, improving the performance of Tencent Cloud big data products with the infrastructure. Then Tom Rosamilia, IBM Senior VP, weighed in: “Industry leaders like Tencent are helping IBM and our OpenPOWER partners push performance boundaries for a cognitive era defined by big data and advanced analytics.” The computing record achieved by Tencent Cloud on OpenPOWER turned out to be an important milestone for the OpenPOWER Foundation too.

Added Amir Prescher, Sr. Vice President, Business Development, at Mellanox Technologies: “Real-time-analytics and big data environments are extremely demanding, and the network is critical in linking together the extra high performance of IBM POWER-based servers and Tencent Cloud’s massive amounts of data.” In effect, Tencent Cloud developed an optimized hardware/software platform to achieve new computing records while demonstrating that Mellanox’s 100Gb/s Ethernet technology can deliver total infrastructure efficiency and improve application performance, which should make it a favorite for big data applications.

Behind all of this were the new IBM Power System S822LC for High Performance Computing servers. Currently the servers feature a new IBM POWER8 chip designed for demanding workloads including artificial intelligence, deep learning and advanced analytics. However, a new POWER9 chip has already been previewed and is expected next year. Whatever the S822LC can do running POWER8, just imagine how much more it will do running POWER9, which IBM describes as a premier acceleration platform. DancingDinosaur covered POWER9 in early Sept. here.

To capitalize on the hardware, IBM is making a new deep learning software toolkit available, PowerAI, which runs on the recently announced IBM Power S822LC server built for artificial intelligence that features NVIDIA NVLink interconnect technology optimized for IBM’s Power architecture. The hardware-software combination provides more than 2X performance over comparable servers with 4 GPUs running AlexNet with Caffe. The same 4-GPU Power-based configuration running AlexNet with BVLC Caffe can also outperform 8 M40 GPU-based x86 configurations, making it the world’s fastest commercially available enterprise systems platform on two versions of a key deep learning framework.

Deep learning is a fast-growing machine learning method that extracts information by crunching through millions of pieces of data to detect and rank the most important aspects of the data. Publicly supported among leading consumer web and mobile application companies, deep learning is quickly being adopted by more traditional enterprises across a wide range of industry sectors: in banking to advance fraud detection through facial recognition; in automotive for self-driving automobiles; and in retail for fully automated call centers with computers that can better understand speech and answer questions. Is your data center ready for deep learning?




Latest IBM Initiatives Drive Power Advantages over x86

November 20, 2015

This past week IBM announced a multi-year strategic collaboration between it and Xilinx that aims to enable higher performance and energy-efficient data center applications through Xilinx FPGA-enabled workload acceleration on IBM POWER-based systems. The goal is to deliver open acceleration infrastructures, software, and middleware to address applications like machine learning, network functions virtualization (NFV), genomics, high performance computing (HPC), and big data analytics. In the process, IBM hopes to put x86 systems at an even greater price/performance disadvantage.


Courtesy of IBM

At the same time IBM and several fellow OpenPOWER Foundation members revealed new technologies, collaborations and developer resources to enable clients to analyze data more deeply and at high speed. The new offerings center on the tight integration of IBM’s open and licensable POWER processors with accelerators and dedicated high performance processors optimized for computationally intensive software code. The accelerated POWER-based offerings come at a time when many companies are seeking the best platform for Internet of Things, machine learning, and other performance-hungry applications.

The combination of collaborations and alliances is clearly aimed at establishing Power as the high performance leader for the new generation of workloads. IBM noted that independent software vendors already are leveraging IBM Flash Storage attached to CAPI to create very large memory spaces for in-memory processing of analytics, enabling the same query workloads to run with a fraction of the number of servers compared to commodity x86 solutions. These breakthroughs enable POWER8-based systems to continue where the promise of Moore’s Law falls short, by delivering performance gains through OpenPOWER ecosystem-driven, full stack innovation. DancingDinosaur covered efforts to expand Moore’s Law on the z a few weeks back here.

The new workloads present different performance challenges. To begin, we’re talking about heterogeneous workloads that are becoming increasingly prevalent, forcing data centers to turn to application accelerators just to keep up with the demands for throughput and latency at low power. The Xilinx All Programmable FPGAs promise to deliver the power efficiency that makes accelerators practical to deploy throughout the data center. The idea is to combine IBM’s open and licensable POWER architecture with Xilinx FPGAs to deliver compelling performance, performance/watt, and lower total cost of ownership for this new generation of data center workloads.

As part of the IBM and Xilinx strategic collaboration, IBM Systems Group developers will create solution stacks for POWER-based servers, storage, and middleware systems with Xilinx FPGA accelerators for data center architectures such as OpenStack, Docker, and Spark. IBM will also develop and qualify Xilinx accelerator boards for IBM Power Systems servers. Xilinx is developing and will release POWER-based versions of its leading software defined SDAccel™ Development Environment and libraries for the OpenPOWER developer community.

But there is more than this one deal. IBM is promising new products, collaborations and further investments in accelerator-based solutions on top of the POWER processor architecture.  Most recently announced were:

The coupling of NVIDIA® Tesla® K80 GPUs, the flagship offering of the NVIDIA Tesla Accelerated Computing Platform, with Watson’s POWER-based architecture to accelerate Watson’s Retrieve and Rank API capabilities to 1.7x of its normal speed. This speed-up can further improve the cost-performance of Watson’s cloud-based services.

On the networking front Mellanox announced the world’s first smart network switch, the Switch-IB 2, capable of delivering an estimated 10x system performance improvement. NEC also announced availability of its ExpEther Technology suited for POWER architecture-based systems, along with plans to leverage IBM’s CAPI technology to deliver additional accelerated computing value in 2016.

Finally, two OpenPOWER members, E4 Computer Engineering and Penguin Computing, revealed new systems based on the OpenPOWER design concept and incorporating IBM POWER8 and NVIDIA Tesla GPU accelerators. IBM also reported having ported a series of key IBM Internet of Things, Spark, Big Data, and Cognitive applications to take advantage of the POWER architecture with accelerators.

The announcements include the names of partners and products, but product details were in short supply, as were cost and specific performance details. DancingDinosaur will continue to chase those down.


DancingDinosaur will not post the week of Thanksgiving. Have a delicious holiday.
