
IBM Big Data Innovations Heading to System z

April 4, 2013

Earlier this week IBM announced new technologies intended to help companies and governments tackle Big Data by making it simpler, faster and more economical to analyze massive amounts of data. Its latest innovations, IBM suggested, would drive reporting and analytics results as much as 25 times faster.

The biggest of IBM’s innovations is BLU Acceleration, targeted initially for DB2. It combines a number of techniques to dramatically improve analytical performance and simplify administration. A second innovation, referred to as the enhanced Big Data Platform, improves the use and performance of the InfoSphere BigInsights and InfoSphere Streams products. Finally, it announced the new IBM PureData System for Hadoop, designed to make it easier and faster to deploy Hadoop in the enterprise.

BLU Acceleration is the most innovative of the announcements, probably a bona fide industry first, although others, notably Oracle, are scrambling to do something similar. BLU Acceleration speeds access to information by extending the capabilities of in-memory systems: it loads data into RAM rather than leaving it on hard disk, and it dynamically moves unused data back out to storage. It even works, according to IBM, when a data set exceeds the size of available memory.

Another innovation included in BLU Acceleration is data skipping, which lets the system skip over irrelevant data that doesn't need to be analyzed, such as duplicate information. Other innovations include the ability to analyze data in parallel across different processors; the ability to analyze data transparently to the application, without the need to develop a separate layer of data modeling; and actionable compression, in which data no longer has to be decompressed to be analyzed because the encoding preserves data order. Finally, it leverages parallel vector processing, which enables multi-core and SIMD (Single Instruction Multiple Data) parallelism.
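The data-skipping idea is easy to see in miniature. The sketch below is purely illustrative (plain Python, not IBM's implementation): keep min/max synopsis metadata for each block of a column, and skip any block whose range cannot possibly satisfy the predicate.

```python
# Illustrative sketch of data skipping (not IBM's implementation):
# per-block min/max synopsis metadata lets a scan bypass blocks that
# cannot possibly match a predicate, without reading their data.

def build_synopsis(values, block_size=4):
    """Split a column into blocks and record each block's min and max."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def query_greater_than(synopsis, threshold):
    """Return values > threshold, skipping blocks whose max <= threshold."""
    matches, blocks_scanned = [], 0
    for blk_min, blk_max, block in synopsis:
        if blk_max <= threshold:      # no value in this block can qualify,
            continue                  # so skip it without scanning the data
        blocks_scanned += 1
        matches.extend(v for v in block if v > threshold)
    return matches, blocks_scanned

synopsis = build_synopsis([1, 2, 3, 4, 50, 60, 70, 80, 5, 6, 7, 8])
result, scanned = query_greater_than(synopsis, 40)
print(result, scanned)  # [50, 60, 70, 80] — only 1 of 3 blocks scanned
```

The payoff grows with data volume: the more blocks whose min/max range falls outside the predicate, the less data the query engine ever touches.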

During testing, IBM reported, some queries in a typical analytics workload ran more than 1,000x faster with the combined innovations of BLU Acceleration, and beta tests showed 10x storage space savings. BLU Acceleration will appear first in DB2 10.5 and Informix 12.1 TimeSeries for reporting and analytics, and will be extended to other data workloads and other products in the future.

BLU Acceleration promises to be as easy to use as load-and-go. BLU tables coexist with traditional row tables, using the same schema, storage, and memory. You can query any combination of row and BLU (columnar) tables, and IBM assures easy conversion of conventional tables to BLU tables.
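The "actionable compression" mentioned earlier, where predicates run against compressed data because the encoding preserves order, can also be sketched in a few lines. This is a hypothetical illustration of order-preserving dictionary encoding, not DB2's actual format:

```python
# Illustrative sketch of order-preserving dictionary compression:
# codes are assigned in sorted order of the distinct values, so range
# predicates can be evaluated on the compressed codes directly,
# without ever decompressing the column.

def encode_column(values):
    """Assign each distinct value a small integer code in sorted order."""
    dictionary = {v: code for code, v in enumerate(sorted(set(values)))}
    return dictionary, [dictionary[v] for v in values]

def count_less_than(codes, dictionary, threshold):
    """Count rows with value < threshold, comparing integer codes only."""
    # The threshold maps to a code boundary; rows never get decoded.
    boundary = sum(1 for v in dictionary if v < threshold)
    return sum(1 for c in codes if c < boundary)

cities = ["berlin", "austin", "tokyo", "austin", "berlin"]
dictionary, codes = encode_column(cities)
print(count_less_than(codes, dictionary, "tokyo"))  # 4 rows sort before "tokyo"
```

Because comparisons happen on small integers instead of full values, the same trick that saves storage also speeds up the scan, which is the point of calling the compression "actionable."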

DancingDinosaur likes seeing the System z included as an integral part of the BLU Acceleration program.  The z has been a DB2 workhorse and apparently will continue to be as organizations move into the emerging era of big data analytics. On top of its vast processing power and capacity, the z brings its unmatched quality of service.

Specifically, IBM has called out the z for:

  • InfoSphere BigInsights via the zEnterprise zBX for data exploration and online archiving
  • IDAA (the IBM DB2 Analytics Accelerator, built on Netezza technology) for reporting and analytics as well as operational analytics
  • DB2 for SQL and NoSQL transactions with enhanced Hadoop integration in DB2 11 (beta)
  • IMS for highest-performance transactions with enhanced Hadoop integration in IMS 13 (beta)

Of course, the zEnterprise is a full player in hybrid computing through the zBX, so zEnterprise shops have a few options to tap when they want to leverage BLU Acceleration and IBM's other big data innovations.

Finally, IBM announced the new IBM PureData System for Hadoop, which should simplify and streamline the deployment of Hadoop in the enterprise. Hadoop has become the de facto open systems approach to organizing and analyzing vast amounts of unstructured as well as structured data, such as posts to social media sites, digital pictures and videos, online transaction records, and cell phone location data. The problem with Hadoop is that it is not intuitive for conventional relational DBMS staff and IT. Vendors everywhere are scrambling to overlay a familiar SQL approach on Hadoop’s map/reduce method.
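The map/reduce model those SQL overlays are wrapping is itself simple. The toy word count below shows the three steps in plain Python (illustrative only, no Hadoop involved): map emits key/value pairs, the framework shuffles them into groups by key, and reduce aggregates each group.

```python
# Minimal sketch of Hadoop's map/reduce model (pure Python, no Hadoop):
# map emits (key, value) pairs, the framework groups values by key
# (the "shuffle"), and reduce aggregates each key's values.

from collections import defaultdict

def map_phase(records):
    """Emit (word, 1) for every word in every input record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    """Group all emitted values by key, as the framework does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum each word's counts."""
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["big data big insights", "big analytics"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 3, 'data': 1, 'insights': 1, 'analytics': 1}
```

Simple as the model is, expressing joins and aggregations this way is foreign to SQL-trained staff, which is exactly the gap the PureData System for Hadoop and the various SQL-on-Hadoop overlays aim to close.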

The new IBM PureData System for Hadoop promises to cut the ramp-up time organizations need to adopt enterprise-class Hadoop technology from weeks to minutes, with powerful, easy-to-use analytic tools and visualization for both business analysts and data scientists. It also provides enhanced big data tools for management, monitoring, development, and integration with many more enterprise systems. The product represents the next step in IBM's overall strategy to deliver a family of systems with built-in expertise, leveraging its decades of experience in reducing the cost and complexity associated with information technology.

IBM Technical Computing Tackles Big Data

October 26, 2012

IBM Technical Computing, also referred to as high performance computing (HPC), bolstered its Platform Computing Symphony product for big data mainly by adding enterprise-ready InfoSphere BigInsights Hadoop capabilities. The Platform Symphony product now includes Apache Hadoop, map/reduce and indexing capabilities, application accelerators, and development tools. IBM’s recommended approach to simplifying and accelerating big data analytics entails the integration of Platform Symphony, General Parallel File System (GPFS), Intelligent Cluster, and DCS3700 storage.

This is not to say that IBM is leaving the traditional supercomputing and HPC market. Its Sequoia supercomputer recently topped the industry by delivering over 16 petaflops of performance. Earlier this year it also unveiled the new LRZ SuperMUC system, built with IBM System x iDataPlex direct water cooled dx360 M4 servers encompassing more than 150,000 cores to provide a peak performance of up to three petaflops. SuperMUC, run by the Leibniz Supercomputing Centre of Germany's Bavarian Academy of Sciences, will be used to explore the frontiers of medicine, astrophysics, quantum chromodynamics, and other scientific disciplines.

But IBM is intent on broadening the scope of HPC by pushing it into mainstream business. With technical computing no longer just about supercomputers, the company wants to extend it to diverse industries. It already has a large presence in the petroleum, life sciences, financial services, automotive, aerospace, defense, and electronics industries for compute-intensive workloads. Now it is looking for new areas where a business can exploit technical computing for competitive gain. Business analytics and big data are the first candidates that come to mind.

When it comes to big data, the Platform Symphony product already has posted some serious Hadoop benchmark results:

  • Terasort, a big data benchmark that tests the efficiency of MapReduce clusters in handling very large datasets—Platform Symphony used 10x fewer cores
  • SWIM, a benchmark developed at UC Berkeley that simulates real-world workload patterns on Hadoop clusters—Platform Symphony ran 6x faster
  • Sleep, a standard measure for comparing the core scheduling efficiency of MapReduce workloads—Platform Symphony came out 60x faster

Technical computing at IBM involves System x, Power, System i, and PureFlex—just about everything except z. And it probably could run on the z too through x or p blades in the zBX.

Earlier this month IBM announced a number of technical computing enhancements, including a high-performance, low-latency big data platform encompassing IBM's Intelligent Cluster, Platform Symphony, IBM GPFS, and System Storage DCS3700. Specific to Platform Symphony is a new low-latency Hadoop multi-cluster capability that scales to 100,000 cores per application, along with shared-memory logic for better big data application performance.

Traditionally, HPC customers coded their own software to handle the nearly mind-boggling complexity of the problems they were trying to solve. To expand technical computing to mainstream business, IBM has lined up a set of ISVs to provide packaged applications covering CAE, Life Science, EDA, and more. These include Rogue Wave, ScaleMP, Ansys, Altair, Accelrys, Cadence, Synopsys, and others.

IBM also introduced the new Flex System HPC Starter Configuration, a hybrid system that can handle both POWER7 and System x. The starter config includes the Flex Enterprise Chassis, an InfiniBand (IB) chassis switch, a POWER7 compute node, and an IB expansion card for Power or x86 nodes. Platform Computing software handles workload management and optimizes resources. IBM describes it as a high-density, price/performance offering but hasn't publicly provided any pricing. Still, it should speed time to HPC.

As technical computing goes mainstream it will increasingly focus on big data and Hadoop. Compute-intensive, scientific-oriented companies already do HPC. The newcomers want to use big data techniques to identify fraud, reduce customer churn, make sense of customer sentiment, and tackle similar big data activities. Today that calls for Hadoop, which has become the de facto standard for big data, although that may change as a growing set of alternatives gains traction.

