Posts Tagged ‘ETL’

IBM On-Premises Cognitive Means z Systems Only

February 16, 2017

Just in case you missed the incessant drumbeat coming out of IBM, the company is committed to cognitive computing. That works out well for z data centers, since IBM’s cognitive system is available on-premises only for the z. Another z first: IBM just introduced Machine Learning (key for cognitive) for the private cloud, starting with the z.


There are three ways to get IBM cognitive computing solutions: the IBM Cloud, Watson, or the z System, notes Donna Dillenberger, IBM Fellow, IBM Enterprise Solutions. The z, however, is the only platform that IBM supports for cognitive computing on premises (sorry, no Power). As such, the z represents the apex of programmatic computing, at least as IBM sees it. It also is the only IBM platform that supports cognitive natively, mainly in the form of Hadoop and Spark, both of which are programmatic tools.

What if your z told you that a given strategy had a 92% chance of success? It couldn’t do that until now, with IBM’s recently released cognitive system for z.

Your z system today represents the peak of programmatic computing. That’s what everyone working in computers grew up with, going all the way back to Assembler, COBOL, and FORTRAN. Newer languages and operating systems have arrived since; today your mainframe can respond to Java or Linux and now Python and Anaconda. Still, all are based on the programmatic computing model.

IBM believes the future lies in cognitive computing. Cognitive has become the company’s latest strategic imperative, apparently trumping its previous strategic imperatives: cloud, analytics, big data, and mobile. Maybe only security, which quietly slipped in as a strategic imperative sometime in 2016, can rival cognitive, at least for now.

Similarly, IBM describes itself as a cognitive solutions and cloud platform company. IBM’s infatuation with cognitive starts with data. Only cognitive computing will enable organizations to understand the flood of myriad data pouring in—consisting of structured, local data but going beyond to unlock the world of global unstructured data; and then to decision tree-driven, deterministic applications, and eventually, probabilistic systems that co-evolve with their users by learning along with them.

You need cognitive computing. It is the only way, as IBM puts it, to move beyond the constraints of programmatic computing. In the process, cognitive can take you past keyword-based search, which provides a list of locations where an answer might be found, to an intuitive, conversational means of discovering a set of confidence-ranked possibilities.

Dillenberger suggests it won’t be difficult to get to the IBM cognitive system on z. You don’t even program a cognitive system. At most, you train it, and even then the cognitive system will do the heavy lifting by finding the most appropriate training models. If you don’t have preexisting training models, “just use what the cognitive system thinks is best,” she adds. Then the cognitive system will see what happens and learn from it, tweaking the models as necessary based on the results and new data it encounters. This also is where machine learning comes in.
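To make "finding the most appropriate training models" a little more concrete, here is a minimal sketch of automated model selection in plain Python. It is not IBM's cognitive system or any IBM API; the data, the two candidate models, and every function name are hypothetical, invented purely for illustration. The idea is simply to train each candidate, score it on data it has not seen, and keep the one with the lowest error.

```python
from statistics import mean

# Hypothetical history of (input, observed outcome) pairs, invented for illustration.
history = [(x, 2.0 * x + 1.0) for x in range(20)]
train, holdout = history[:15], history[15:]

def fit_mean(rows):
    """Candidate 1: always predict the average outcome seen in training."""
    avg = mean(y for _, y in rows)
    return lambda x: avg

def fit_line(rows):
    """Candidate 2: least-squares straight line through the training rows."""
    mx = mean(x for x, _ in rows)
    my = mean(y for _, y in rows)
    slope = (sum((x - mx) * (y - my) for x, y in rows)
             / sum((x - mx) ** 2 for x, _ in rows))
    return lambda x: my + slope * (x - mx)

def holdout_error(model, rows):
    """Mean squared error on rows the model was not trained on."""
    return mean((model(x) - y) ** 2 for x, y in rows)

def pick_best(candidates, train_rows, holdout_rows):
    """Train every candidate; return (name, model, error) with the lowest holdout error."""
    scored = []
    for name, fit in candidates.items():
        model = fit(train_rows)
        scored.append((holdout_error(model, holdout_rows), name, model))
    error, name, model = min(scored, key=lambda t: t[0])
    return name, model, error

candidates = {"mean": fit_mean, "line": fit_line}
best_name, best_model, best_error = pick_best(candidates, train, holdout)
```

Re-running `pick_best` as new rows arrive is one simple way to picture the "tweaking the models as necessary" step; a real system would presumably be far more sophisticated.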

IBM has yet to document payback and ROI data. Dillenberger, however, has spoken with early adopters. The big promised payback, of course, will come from the new insights uncovered, and that payback will be as astronomical or as meager as your execution on those insights.

But there also is the promise of a quick technical payback for z data center managers. When the data resides on z (a huge advantage for the z), you just run analytics where the data is. In such cases you can realize up to 3x the performance, Dillenberger noted. Even if you have to pull data from some other location too, you still run faster, maybe 2x faster. Other z advantages include large amounts of memory, multiple levels of cache, and multiple I/O processors that get at data without impacting CPU performance.

When the data and IBM’s cognitive system reside on the z, you can save significant money. “ETL consumed huge amounts of MIPS. But when the client did it all on the z, it completely avoided the costly ETL process,” Dillenberger noted. As a result, that client reported savings of $7-8 million a year by completely bypassing the x86 layer and ETL and running Spark natively on the z.

As Dillenberger describes it, cognitive computing on the z is here now, able to deliver a payback fast, and an even bigger payback going forward as you execute on the insights it reveals. And you already have a z, the only on-premises way to IBM’s Cognitive System.

DancingDinosaur is Alan Radding, a veteran information technology analyst, writer, and ghost-writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at and here.


Latest New Mainframe puts Apache Spark Native on the z System

April 1, 2016

IBM keeps rolling out new versions of the z System.  The latest is the z/OS Platform for Apache Spark announced earlier this week. The new machine is optimized for marketers, data analysts, and developers eager to apply advanced analytics to the z’s rich, resident data sets for real-time insights.


z/OS Platform for Apache Spark

Data is everything in the new economy; the more and better data you can grab and the faster you can analyze it, the more likely you are to win. The z, already the center of a large, expansive data environment, is well positioned to drive winning data-fueled strategies.

IBM z/OS Platform for Apache Spark enables Spark, an open-source analytics framework, to run natively on z/OS. According to IBM, the new system is available now. Its key advantage is that it enables data scientists to analyze data in place on the system of origin. This eliminates the need to perform extract, transform and load (ETL), a cumbersome, slow, and costly process. Instead, with Spark the z breaks the bind between the analytics library and the underlying file system.

Apache Spark provides an open-source cluster computing framework with in-memory processing to speed analytic applications up to 100 times faster compared to other technologies on the market today, according to IBM. Apache Spark can help reduce data interaction complexity, increase processing speed, and enhance mission-critical applications by enabling analytics that deliver deep intelligence. Considered highly versatile in many environments, Apache Spark is most regarded for its ease of use in creating algorithms that extract insight from complex data.

IBM’s goal lies not in eliminating the overhead of ETL but in fueling interest in cognitive computing. With cognitive computing, data becomes a fresh natural resource—an almost infinite and forever renewable asset—that can be used by computer systems to understand, reason and learn. To succeed in this cognitive era businesses must be able to develop and capitalize on insights before the insights are no longer relevant. That’s where the z comes in.

With this offering, according to IBM, accelerators from z Systems business partners can help organizations more easily take advantage of z Systems data and capabilities to understand market changes alongside individual client needs. With this kind of insight managers should be able to make the necessary business adjustments in real-time, which will speed time to value and advance cognitive business transformations among IBM customers.

At this point IBM has identified 3 business partners:

  1. Rocket Software, long a mainframe ISV, is bringing its new Rocket Launchpad solution, which allows z shops to try the platform using data on z/OS.
  2. DataFactZ is a new partner working with IBM to develop Spark analytics based on Spark SQL and MLlib for data and transactions processed on the mainframe.
  3. Zementis brings its in-transaction predictive analytics offering for z/OS with a standards-based execution engine for Apache Spark. The product promises to allow users to deploy and execute advanced predictive models that can help them anticipate end users’ needs, compute risk, or detect fraud in real-time at the point of greatest impact, while processing a transaction.

This last point—detecting problems in real time at the point of greatest impact—is really the whole reason for Spark on z/OS.  You have to leverage your insight before the prospect makes the buying decision or the criminal gets away with a fraudulent transaction. After that your chances are slim to none of getting a prospect to reverse the decision or to recover stolen goods. Having the data and logic processing online and in-memory on the z gives you the best chance of getting the right answer fast while you can still do something.

As IBM also notes, the z/OS Platform for Apache Spark includes Spark open source capabilities consisting of the Apache Spark core, Spark SQL, Spark Streaming, Machine Learning Library (MLlib) and GraphX, combined with the industry’s only mainframe-resident Spark data abstraction solution. The new platform helps enterprises derive insights more efficiently and securely. In the process, the platform can streamline development to speed time to insights and decisions, and simplify data access through familiar data access formats and Apache Spark APIs.

Best of all, however, are the in-memory capabilities noted above. Apache Spark uses an in-memory approach for processing data to deliver results quickly. The platform includes data abstraction and integration services that enable z/OS analytics applications to leverage standard Spark APIs. It also allows analysts to collect unstructured data and use their preferred formats and tools to sift through data.

At the same time developers and analysts can take advantage of the familiar tools and programming languages, including Scala, Python, R, and SQL to reduce time to value for actionable insights. Of course all the familiar z/OS data formats are available too: IMS, VSAM, DB2 z/OS, PDSE or SMF along with whatever you get through the Apache Spark APIs.

This year we already have seen the z13s and now the z/OS Platform for Apache Spark. Add to that the z System LinuxOne last year. z-based data centers suddenly have a handful of radically different new mainframes to consider. Can Watson, a POWER-based system, be far behind? Your guess is as good as anyone’s.


Syncsort Brings z System Integration Software to Open Source Tools

October 13, 2015

In a series of announcements last month, Syncsort integrated its DMX-h data integration software with Apache Kafka, an open distributed messaging system. This will enable mainframe shops to tap DMX-h’s easy-to-use GUI to subscribe, transform, enrich, and distribute enterprise-wide data for real-time Kafka messaging.

Spark graphic

Courtesy of IBM

Syncsort also delivered an open source contribution of an IBM z Systems mainframe connector that makes mainframe data available to the Apache Spark open-source analytics platform. Not stopping there, Syncsort is integrating the Intelligent Execution capabilities of its DMX data integration product suite with Apache Spark too. Intelligent Execution allows users to visually design data transformations once and then run them anywhere – across Hadoop, MapReduce, Spark, Linux, Windows, or Unix, on premise or in the cloud.

Said Tendü Yoğurtçu, General Manager of Syncsort’s big data business, in the latest announcement: “We are seeing increased demand for real-time analytics in industries such as healthcare, financial services, retail, and telecommunications.” With these announcements, Syncsort sees itself delivering the next generation streaming ETL and Internet of Things data integration platform.

Of course, the Syncsort offering should be unnecessary for most z System users except those that are long-term Syncsort shops or are enamored of Syncsort’s GUI. IBM already offers Spark native on z/OS and Linux on z, so there is no additional cost. BTW, Syncsort itself was just acquired. What happens with its various products remains to be seen.

Still, IBM has been on a 12-year journey to expand mainframe workloads, from Linux to Hadoop and Spark and beyond. The company has been urging mainframe shops to become fully engaged in big data, open source, and more as fast as possible. The Syncsort announcements come at a propitious time; mainframe data centers can more easily participate in the hottest use cases, such as real-time data analytics and streaming data analytics across diverse data sources, just when the need for such analytics is increasing.

Apache Spark and some of these other technologies should already be a bit familiar to z System data centers; Apache Kafka will be less familiar. DancingDinosaur noted Spark and others here, when LinuxOne was introduced.

To refresh, Apache Spark consists of a fast engine for large-scale data processing that provides over 80 high-level operators to make it easy to build parallel apps or use them interactively from the Scala, Python, and R shells. It also offers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.  As noted above Syncsort offers an open source version of the IBM z Systems mainframe connector that makes mainframe data available to the Apache Spark open-source analytics platform.
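For readers who have not worked with this style of programming, the following plain-Python sketch mimics the shape of a chain of high-level operators: filter, then map, then reduce by key. It deliberately does not use Spark or any Spark API, so it runs anywhere; the transaction records and the `reduce_by_key` helper are hypothetical, written only for illustration. In Spark the same chain would be expressed against a distributed dataset and executed in parallel.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical transaction records as (account, amount), invented for illustration.
transactions = [
    ("A", 120.0), ("B", 15.5), ("A", 30.0),
    ("C", 990.0), ("B", 4.5), ("C", 10.0),
]

def reduce_by_key(pairs, combine):
    """Group (key, value) pairs by key and fold each group's values with combine."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}

# filter -> map -> reduce-by-key: the same shape as a chain of high-level operators.
large = filter(lambda t: t[1] >= 10.0, transactions)          # drop small amounts
in_cents = map(lambda t: (t[0], round(t[1] * 100)), large)     # convert to integer cents
totals = reduce_by_key(in_cents, lambda a, b: a + b)           # sum per account
```

Each step describes what to compute rather than how to loop over the data, which is what lets an engine like Spark decide how to distribute the work.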

Spark already has emerged as one of the most active big data open source projects, initially as a fast memory-optimized processing engine for machine learning and now as the single compute platform for all types of workloads including real-time data processing, interactive queries, social graph analysis, and others. Given Spark’s success, there is a growing need to securely access data from a diverse set of sources, including mainframes, and to transform the data into a format that is easily understandable by Spark.

Apache Kafka, essentially an enterprise service bus, is less widely known. Apache Kafka brings a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. Kafka is often used in place of traditional message brokers, such as those based on JMS and AMQP, because of its higher throughput, reliability and replication. Syncsort has integrated its data integration software with Apache Kafka’s distributed messaging system to enable users to leverage DMX-h’s GUI as part of an effort to subscribe, transform, enrich, and distribute enterprise-wide data for real-time Kafka messaging.
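The publish-subscribe idea can be sketched in a few lines of plain Python. This is an in-memory toy, not Kafka and not Kafka's client API; the `TinyLogBroker` class, the topic name, and the subscriber names are all hypothetical. It shows only the core notion of a log-based broker: publishers append to a per-topic log, and each subscriber keeps its own read position, so several consumers can read the same messages independently.

```python
from collections import defaultdict

class TinyLogBroker:
    """In-memory stand-in for a log-based publish-subscribe broker (not Kafka's API)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> append-only list of messages
        self._offsets = defaultdict(int)   # (topic, subscriber) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        self._topics[topic].append(message)
        return len(self._topics[topic]) - 1

    def poll(self, topic, subscriber):
        """Return messages this subscriber has not yet seen and advance its offset."""
        log = self._topics[topic]
        start = self._offsets[(topic, subscriber)]
        self._offsets[(topic, subscriber)] = len(log)
        return log[start:]

broker = TinyLogBroker()
broker.publish("payments", {"account": "A", "amount": 120.0})
broker.publish("payments", {"account": "B", "amount": 15.5})

fraud_first = broker.poll("payments", "fraud-check")    # sees the first two messages
broker.publish("payments", {"account": "C", "amount": 990.0})
fraud_second = broker.poll("payments", "fraud-check")   # sees only the new message
reporting_all = broker.poll("payments", "reporting")    # independent reader sees all three
```

A real broker adds what the toy leaves out: durability on disk, partitioning, and replication across machines, which is where the throughput and fault-tolerance claims come from.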

According to Matei Zaharia, creator of Apache Spark and co-founder & CTO of Databricks: “Organizations look to Spark to enable a variety of use cases, including streaming data analytics across diverse data sources.” He continues: “Syncsort has recognized the importance of Spark in the big data ecosystem for real-time streaming applications and is focused on making it easy to bring diverse data sets into Spark.” IBM certainly recognizes this too, and the z System is the right platform for making all of this happen.

