In a series of announcements last month, Syncsort integrated its DMX-h data integration software with Apache Kafka, an open distributed messaging system. This will enable mainframe shops to tap DMX-h’s easy-to-use GUI to subscribe, transform, enrich, and distribute enterprise-wide data for real-time Kafka messaging.
Syncsort also delivered an open source contribution of an IBM z Systems mainframe connector that makes mainframe data available to the Apache Spark open-source analytics platform. Not stopping there, Syncsort is integrating the Intelligent Execution capabilities of its DMX data integration product suite with Apache Spark too. Intelligent Execution allows users to visually design data transformations once and then run them anywhere: across Hadoop, MapReduce, Spark, Linux, Windows, or Unix, on premises or in the cloud.
Said Tendü Yoğurtçu, General Manager of Syncsort’s big data business, in the latest announcement: “We are seeing increased demand for real-time analytics in industries such as healthcare, financial services, retail, and telecommunications.” With these announcements, Syncsort sees itself delivering the next-generation streaming ETL and Internet of Things data integration platform.
Of course, the Syncsort offer should be unnecessary for most z Systems users except those that are long-term Syncsort shops or are enamored of Syncsort’s GUI. IBM already offers Spark natively on z/OS and Linux on z, so there is no additional cost. BTW, Syncsort itself was just acquired. What happens with its various products remains to be seen.
Still, IBM has been on a 12-year journey to expand mainframe workloads, from Linux to Hadoop and Spark and beyond, and the company has been urging mainframe shops to become fully engaged in big data, open source, and more as fast as possible. The Syncsort announcements come at a propitious time: mainframe data centers can more easily participate in the hottest use cases, such as real-time data analytics and streaming data analytics across diverse data sources, just as the need for such analytics is increasing.
Apache Spark and some of these other technologies should already be a bit familiar to z Systems data centers; Apache Kafka will be less familiar. DancingDinosaur noted Spark and others here, when LinuxONE was introduced.
To refresh, Apache Spark consists of a fast engine for large-scale data processing that provides over 80 high-level operators, which make it easy to build parallel apps or to use the operators interactively from the Scala, Python, and R shells. It also offers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. As noted above, Syncsort offers an open source version of the IBM z Systems mainframe connector that makes mainframe data available to the Apache Spark open-source analytics platform.
Spark already has emerged as one of the most active big data open source projects, initially as a fast memory-optimized processing engine for machine learning and now as the single compute platform for all types of workloads including real-time data processing, interactive queries, social graph analysis, and others. Given Spark’s success, there is a growing need to securely access data from a diverse set of sources, including mainframes, and to transform the data into a format that is easily understandable by Spark.
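To make the idea of transforming mainframe data into a Spark-friendly format more concrete, here is a minimal sketch in plain Python. It is not Syncsort's connector and does not use Spark itself. It assumes the mainframe extract is a file of fixed-width records encoded in EBCDIC (code page 037), and it converts them to JSON Lines, a format Spark can read. The record layout, field names, and sample values are hypothetical and exist only for illustration.

```python
import json

# Hypothetical fixed-width record layout: (field name, start offset, length in bytes).
# A real layout would come from the definition that describes the mainframe file.
LAYOUT = [
    ("account_id", 0, 6),
    ("customer", 6, 10),
    ("balance_cents", 16, 8),
]
RECORD_LENGTH = 24


def decode_record(raw: bytes) -> dict:
    """Decode one fixed-width EBCDIC (code page 037) record into a dict."""
    text = raw.decode("cp037")
    record = {name: text[start:start + length].strip() for name, start, length in LAYOUT}
    # Assumption: the balance is stored as plain digits representing cents.
    record["balance_cents"] = int(record["balance_cents"])
    return record


def decode_file(blob: bytes) -> list:
    """Split a byte blob into fixed-length records and decode each one."""
    return [decode_record(blob[i:i + RECORD_LENGTH]) for i in range(0, len(blob), RECORD_LENGTH)]


# Build sample data by encoding text to EBCDIC, standing in for a mainframe extract.
sample = (
    "000042" + "SMITH".ljust(10) + "00012550"
    + "000043" + "JONES".ljust(10) + "00000999"
).encode("cp037")

records = decode_file(sample)

# One JSON object per line, ready to be written to a file for downstream analytics.
json_lines = "\n".join(json.dumps(r) for r in records)
```

Real mainframe files often carry packed-decimal and binary fields as well, which this sketch deliberately ignores; handling those is part of what a dedicated connector takes off the user's hands.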
Apache Kafka, which plays a role similar to an enterprise service bus, is less widely known. It provides a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. Kafka is often used in place of traditional message brokers, such as those based on JMS and AMQP, because of its higher throughput, reliability, and replication. Syncsort has integrated its data integration software with Apache Kafka’s distributed messaging system so that users can leverage DMX-h’s GUI to subscribe, transform, enrich, and distribute enterprise-wide data for real-time Kafka messaging.
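For readers new to publish-subscribe messaging, the following toy model in plain Python may help. It is an in-memory illustration of the general idea only, not the Kafka API or its implementation: producers append messages to a named topic, and each consumer group reads the topic independently while tracking its own position. The class name, method names, and topic names are hypothetical.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a log-based publish-subscribe broker (not the Kafka API)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic name -> append-only list of messages
        self._offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message to the topic log and return its offset."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for a consumer group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


broker = ToyBroker()
for payload in ("order:1", "order:2", "order:3"):
    broker.publish("orders", payload)

# Two independent consumer groups read the same topic at their own pace.
analytics_first = broker.poll("analytics", "orders", max_messages=2)
analytics_rest = broker.poll("analytics", "orders")
audit_all = broker.poll("audit", "orders")
```

Because messages stay in the log after being read, the "audit" group still receives all three messages even though the "analytics" group has already consumed them. The toy leaves out everything that makes a production system durable and fault-tolerant, such as persistence to disk, partitioning, and replication across machines.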
According to Matei Zaharia, creator of Apache Spark and co-founder & CTO of Databricks: “Organizations look to Spark to enable a variety of use cases, including streaming data analytics across diverse data sources.” He continues: “Syncsort has recognized the importance of Spark in the big data ecosystem for real-time streaming applications and is focused on making it easy to bring diverse data sets into Spark.” IBM certainly recognizes this too, and the z Systems mainframe is the right platform for making all of this happen.