Posts Tagged ‘Scala’

Latest New Mainframe puts Apache Spark Native on the z System

April 1, 2016

IBM keeps rolling out new versions of the z System.  The latest is the z/OS Platform for Apache Spark announced earlier this week. The new machine is optimized for marketers, data analysts, and developers eager to apply advanced analytics to the z’s rich, resident data sets for real-time insights.


z/OS Platform for Apache Spark

Data is everything in the new economy; the more and better data you can grab and the faster you can analyze it, the more likely you are to win. The z, already the center of a large, expansive data environment, is well positioned to drive winning data-fueled strategies.

IBM z/OS Platform for Apache Spark enables Spark, an open-source analytics framework, to run natively on z/OS. According to IBM, the new system is available now. Its key advantage:  to enable data scientists to analyze data in place on the system of origin. This eliminates the need to perform extract, transform and load (ETL), a cumbersome, slow, and costly process. Instead, with Spark the z breaks the bind between the analytics library and underlying file system.

Apache Spark provides an open-source cluster computing framework with in-memory processing that can speed analytic applications by up to 100 times compared to other technologies on the market today, according to IBM. Apache Spark can help reduce data interaction complexity, increase processing speed, and enhance mission-critical applications by enabling analytics that deliver deep intelligence. Considered highly versatile in many environments, Apache Spark is best regarded for its ease of use in creating algorithms that extract insight from complex data.

IBM’s goal lies not so much in eliminating the overhead of ETL as in fueling interest in cognitive computing. With cognitive computing, data becomes a fresh natural resource, an almost infinite and forever renewable asset, that can be used by computer systems to understand, reason, and learn. To succeed in this cognitive era, businesses must be able to develop and capitalize on insights before the insights are no longer relevant. That’s where the z comes in.

With this offering, according to IBM, accelerators from z Systems business partners can help organizations more easily take advantage of z Systems data and capabilities to understand market changes alongside individual client needs. With this kind of insight, managers should be able to make the necessary business adjustments in real time, which will speed time to value and advance cognitive business transformations among IBM customers.

At this point IBM has identified three business partners:

  1. Rocket Software, long a mainframe ISV, is bringing its new Rocket Launchpad solution, which allows z shops to try the platform using data on z/OS.
  2. DataFactZ is a new partner working with IBM to develop Spark analytics based on Spark SQL and MLlib for data and transactions processed on the mainframe.
  3. Zementis brings its in-transaction predictive analytics offering for z/OS with a standards-based execution engine for Apache Spark. The product promises to allow users to deploy and execute advanced predictive models that can help them anticipate end users’ needs, compute risk, or detect fraud in real time at the point of greatest impact, while processing a transaction.

This last point—detecting problems in real time at the point of greatest impact—is really the whole reason for Spark on z/OS.  You have to leverage your insight before the prospect makes the buying decision or the criminal gets away with a fraudulent transaction. After that your chances are slim to none of getting a prospect to reverse the decision or to recover stolen goods. Having the data and logic processing online and in-memory on the z gives you the best chance of getting the right answer fast while you can still do something.
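To make the idea of scoring inside the transaction concrete, here is a minimal sketch in plain Python. It is not Zementis code and uses no Spark API; the feature names, weights, bias, and threshold are all hypothetical values chosen for illustration. The point is only that the score is computed, and the decision made, before the transaction completes.

```python
import math

# Hypothetical model coefficients; a real deployment would load a trained model.
WEIGHTS = {"amount_ratio": 1.8, "new_merchant": 1.1, "foreign": 0.9}
BIAS = -4.0
THRESHOLD = 0.5  # assumed cut-off for holding a transaction


def fraud_score(txn):
    """Return a score in (0, 1) from a simple logistic model."""
    z = BIAS + sum(WEIGHTS[name] * txn[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


def process(txn):
    """Score the transaction inline and decide before it is committed."""
    score = fraud_score(txn)
    decision = "hold" if score >= THRESHOLD else "approve"
    return decision, score


# Two made-up transactions: one ordinary, one that looks out of pattern.
typical = {"amount_ratio": 1.0, "new_merchant": 0, "foreign": 0}
unusual = {"amount_ratio": 3.0, "new_merchant": 1, "foreign": 1}

print(process(typical)[0], process(unusual)[0])
```

With these assumed weights the ordinary purchase is approved and the out-of-pattern one is held for review, which is the kind of decision that only helps if it happens while the transaction is still in flight.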

As IBM also notes, the z/OS Platform for Apache Spark includes Spark open source capabilities consisting of the Apache Spark core, Spark SQL, Spark Streaming, Machine Learning Library (MLlib), and GraphX, combined with the industry’s only mainframe-resident Spark data abstraction solution. The new platform helps enterprises derive insights more efficiently and securely. In the process, the platform can streamline development to speed time to insight and decision, and simplify data access through familiar data access formats and Apache Spark APIs.

Best of all, however, are the in-memory capabilities noted above. Apache Spark uses an in-memory approach for processing data to deliver results quickly. The platform includes data abstraction and integration services that enable z/OS analytics applications to leverage standard Spark APIs. It also allows analysts to collect unstructured data and use their preferred formats and tools to sift through data.
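Why does in-memory matter? A toy illustration in plain Python (not Spark, and not anything IBM ships) may help. The `Source` class below is a hypothetical stand-in for a slow data store that counts how often it is read; the sketch compares three analytic passes that each go back to storage with three passes over a working set held in memory.

```python
from statistics import mean


class Source:
    """Hypothetical stand-in for a slow data store; counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def scan(self):
        self.reads += 1  # each scan models one expensive trip to storage
        return list(self._rows)


def run_analyses(get_rows):
    """Run three independent passes over the same data."""
    return {
        "count": len(get_rows()),
        "total": sum(r["amount"] for r in get_rows()),
        "average": mean(r["amount"] for r in get_rows()),
    }


rows = [{"amount": a} for a in (120.0, 80.0, 40.0)]

# Without caching: every pass goes back to the source.
uncached = Source(rows)
uncached_result = run_analyses(uncached.scan)

# With caching: read once, keep the working set in memory, reuse it.
cached = Source(rows)
in_memory = cached.scan()
cached_result = run_analyses(lambda: in_memory)

print(uncached.reads, cached.reads)
```

Both runs produce the same answers, but the cached run touches storage once instead of three times. Iterative analytics that revisit the same data many times stand to gain the most from this pattern.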

At the same time, developers and analysts can take advantage of familiar tools and programming languages, including Scala, Python, R, and SQL, to reduce time to value for actionable insights. Of course all the familiar z/OS data formats are available too: IMS, VSAM, DB2 for z/OS, PDSE, and SMF, along with whatever you get through the Apache Spark APIs.

This year we have already seen the z13s and now the z/OS Platform for Apache Spark. Add to that the z System LinuxONE last year. z-based data centers suddenly have a handful of radically different new mainframes to consider. Can Watson, a POWER-based system, be far behind? Your guess is as good as anyone’s.

DancingDinosaur is Alan Radding, a veteran information technology analyst and writer. Please follow DancingDinosaur on Twitter, @mainframeblog. See more of his IT writing at technologywriter.com and here.

IBM Continues Open Source Commitment with Apache Spark

June 18, 2015

If anyone believes IBM’s commitment to open source is a passing fad, forget it. IBM has invested billions in Linux, OpenPOWER through the OpenPOWER Foundation, and more. Its latest is the announcement of a major commitment to Apache Spark, a fast, general-purpose open source cluster computing system for big data.


Courtesy of IBM: developers work with Spark at Galvanize Hackathon

As IBM sees it, Spark brings essential advances to large-scale data processing. Specifically, it dramatically improves the performance of data-dependent apps and is expected to play a big role in the Internet of Things (IoT). In addition, it radically simplifies the process of developing intelligent apps, which are fueled by data. It does so by providing high-level APIs in Scala, Java, and Python, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing.
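The phrase "general computation graphs" is easier to picture with a small example. In Spark, transformations are recorded lazily and nothing runs until an action asks for a result. The sketch below mimics that idea in plain Python; `LazyDataset` is a hypothetical toy class, not a Spark API, and the real engine does far more (partitioning, scheduling, optimization).

```python
class LazyDataset:
    """Toy dataset that records transformations and runs them only on demand."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Record the step; do not touch the data yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # The "action": replay the recorded steps over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # lets us observe when the work actually happens
    return x * 2


readings = LazyDataset([3, 8, 15, 22])
pipeline = readings.map(double).filter(lambda x: x > 10)

ran_before_collect = len(calls)  # still 0: the pipeline is only a plan
result = pipeline.collect()

print(ran_before_collect, result)
```

Because the whole chain is known before anything executes, an engine built this way has room to reorder or combine steps, which is roughly what the "optimized engine" wording refers to.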

IBM is contributing its breakthrough IBM SystemML machine learning technology to the Spark open source ecosystem. But maybe the biggest advantage of Spark is that it can handle data coming from multiple, disparate sources.

What IBM likes in Spark is that it’s agile, fast, and easy to use. It also likes that Spark is open source, which ensures it is improved continuously by a worldwide community. Those are also some of the main reasons mainframe and Power Systems data centers should pay attention to Spark. Spark will make it easier to connect applications to data residing in your data center. If you haven’t yet noticed an uptick in mobile transactions coming into your data center, you will. These transactions benefit from Spark. And if you look out just a year or two, expect to see IoT applications adding to and needing to combine all sorts of data, much of it ending up on the mainframe or Power System in one form or another. So make sure Spark is on your radar screen.

Over the course of the next few months, IBM scientists and engineers will work with the Apache Spark open community to accelerate access to advanced machine learning capabilities and help drive speed-to-innovation in the development of smart business apps. By contributing SystemML, IBM hopes to help data scientists iterate faster to address the changing needs of business and to enable a growing ecosystem of app developers who will apply deep intelligence to everything.

To ensure that happens, IBM will commit more than 3,500 researchers and developers to work on Spark-related projects at more than a dozen labs worldwide, and open a Spark Technology Center in San Francisco for the Data Science and Developer community to foster design-led innovation in intelligent applications. IBM also aims to educate more than 1 million data scientists and data engineers on Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize, and Big Data University MOOC (Massive Open Online Course).

Of course, Spark isn’t going to be the last tool to expedite the latest app dev. With IoT just beginning to gain widespread interest, expect a flood of tools to expedite developing data-intensive IoT applications and more tools to facilitate connecting all these coming connected devices, estimated to number in the tens of billions within a few years.

DancingDinosaur applauds IBM’s decade-plus commitment to open source and its willingness to put real money and real code behind it. That means the IBM z System mainframe, the POWER platform, Linux, and the rest will be around for some time. That’s good; DancingDinosaur is not quite ready to retire.


